When device_register() fails, call put_device() to drop the reference
taken during initialization; otherwise the device structure is leaked.
As the comment above device_register() says, 'NOTE: _Never_ directly free
@dev after calling this function, even if it returned an error! Always
use put_device() to give up the reference initialized in this function
instead.'
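For reference, a minimal sketch of the required pattern (hypothetical
'dev' whose ->release() callback frees it; not the locomo code itself):

	ret = device_register(&dev->dev);
	if (ret) {
		/* Never kfree(dev) here: drop the reference instead and
		 * let dev->dev.release() free the structure. */
		put_device(&dev->dev);
		return ret;
	}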
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v5:
- modified the bug description as suggestions;
Changes in v4:
- deleted the redundant initialization;
Changes in v3:
- modified the patch as suggestions;
Changes in v2:
- modified the patch as suggestions.
---
arch/arm/common/locomo.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)
diff --git a/arch/arm/common/locomo.c b/arch/arm/common/locomo.c
index cb6ef449b987..45106066a17f 100644
--- a/arch/arm/common/locomo.c
+++ b/arch/arm/common/locomo.c
@@ -223,10 +223,8 @@ locomo_init_one_child(struct locomo *lchip, struct locomo_dev_info *info)
int ret;
dev = kzalloc(sizeof(struct locomo_dev), GFP_KERNEL);
- if (!dev) {
- ret = -ENOMEM;
- goto out;
- }
+ if (!dev)
+ return -ENOMEM;
/*
* If the parent device has a DMA mask associated with it,
@@ -254,10 +252,9 @@ locomo_init_one_child(struct locomo *lchip, struct locomo_dev_info *info)
NO_IRQ : lchip->irq_base + info->irq[0];
ret = device_register(&dev->dev);
- if (ret) {
- out:
- kfree(dev);
- }
+ if (ret)
+ put_device(&dev->dev);
+
return ret;
}
--
2.25.1
If device_add() fails, do not use device_unregister() for error
handling. device_unregister() consists of two calls: device_del() and
put_device(). device_unregister() should only be called after
device_add() has succeeded, because device_del() undoes what a
successful device_add() does. Replace the device_unregister() call with
put_device() before returning from the function.
As the comment for device_add() says, 'if device_add() succeeds, you
should call device_del() when you want to get rid of it. If device_add()
has not succeeded, use only put_device() to drop the reference count'.
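In other words, the two teardown paths look like this (a hedged sketch
using a hypothetical 'foo' device, not the tegra code):

	device_initialize(&foo->dev);
	err = device_add(&foo->dev);
	if (err) {
		/* device_add() failed: no device_del(), only drop the ref */
		put_device(&foo->dev);
		return err;
	}

	/* ... later, for a device that was successfully added: */
	device_unregister(&foo->dev);	/* device_del() + put_device() */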
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 53d2a715c240 ("phy: Add Tegra XUSB pad controller support")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v2:
- modified the bug description as suggestions.
---
drivers/phy/tegra/xusb.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/phy/tegra/xusb.c b/drivers/phy/tegra/xusb.c
index 79d4814d758d..c89df95aa6ca 100644
--- a/drivers/phy/tegra/xusb.c
+++ b/drivers/phy/tegra/xusb.c
@@ -548,16 +548,16 @@ static int tegra_xusb_port_init(struct tegra_xusb_port *port,
err = dev_set_name(&port->dev, "%s-%u", name, index);
if (err < 0)
- goto unregister;
+ goto put_device;
err = device_add(&port->dev);
if (err < 0)
- goto unregister;
+ goto put_device;
return 0;
-unregister:
- device_unregister(&port->dev);
+put_device:
+ put_device(&port->dev);
return err;
}
--
2.25.1
Fix two issues when cross-building userprogs with clang.
Reproducer, using nolibc to avoid libc requirements for cross building:
$ tail -2 init/Makefile
userprogs-always-y += test-llvm
test-llvm-userccflags += -nostdlib -nolibc -static -isystem usr/ -include $(srctree)/tools/include/nolibc/nolibc.h
$ cat init/test-llvm.c
int main(void)
{
return 0;
}
$ make ARCH=arm64 LLVM=1 allnoconfig headers_install init/
Validate that init/test-llvm builds and has the correct binary format.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Thomas Weißschuh (2):
kbuild: userprogs: fix bitsize and target detection on clang
kbuild: userprogs: use lld to link through clang
Makefile | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
---
base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
change-id: 20250213-kbuild-userprog-fixes-4f07b62ae818
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
On xiaomi-beryllium and oneplus-enchilada audio does not work reliably
with the in-kernel pd-mapper. Deferring the probe solves these issues.
Specifically, audio only works reliably with the in-kernel pd-mapper if
the first successful probe is the one triggered by remoteproc3.
I.e., probes from remoteproc0, 1, and 2 need to be deferred until
remoteproc3 has been probed.
Introduce a device specific quirk that lists the first auxdev for which
the probe must be executed. Until then, defer probes from other auxdevs.
Fixes: 1ebcde047c54 ("soc: qcom: add pd-mapper implementation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Frank Oltmanns <frank(a)oltmanns.dev>
---
The in-kernel pd-mapper has been causing audio issues on sdm845
devices (specifically, xiaomi-beryllium and oneplus-enchilada). I
observed that Stephan’s approach [1] - which defers module probing by
blocklisting the module and triggering a later probe - works reliably.
Inspired by this, I experimented with delaying the probe within the
module itself by returning -EPROBE_DEFER in qcom_pdm_probe() until a
certain time (13.9 seconds after boot, based on ktime_get()) had
elapsed. This method also restored audio functionality.
Further logging of auxdev->id in qcom_pdm_probe() led to an interesting
discovery: audio only works reliably with the in-kernel pd-mapper when
the first successful probe is triggered by remoteproc3. In other words,
probes from remoteproc0, 1, and 2 must be deferred until remoteproc3 has
been probed.
To address this, I propose introducing a quirk table (which currently
only contains sdm845) to defer probing until the correct auxiliary
device (remoteproc3) initiates the probe.
I look forward to your feedback.
Thanks,
Frank
[1]: https://lore.kernel.org/linux-arm-msm/Zwj3jDhc9fRoCCn6@linaro.org/
---
drivers/soc/qcom/qcom_pd_mapper.c | 43 +++++++++++++++++++++++++++++++++++++++
1 file changed, 43 insertions(+)
diff --git a/drivers/soc/qcom/qcom_pd_mapper.c b/drivers/soc/qcom/qcom_pd_mapper.c
index 154ca5beb47160cc404a46a27840818fe3187420..34b26df665a888ac4872f56e948e73b561ae3b6b 100644
--- a/drivers/soc/qcom/qcom_pd_mapper.c
+++ b/drivers/soc/qcom/qcom_pd_mapper.c
@@ -46,6 +46,11 @@ struct qcom_pdm_data {
struct list_head services;
};
+struct qcom_pdm_probe_first_dev_quirk {
+ const char *name;
+ u32 id;
+};
+
static DEFINE_MUTEX(qcom_pdm_mutex); /* protects __qcom_pdm_data */
static struct qcom_pdm_data *__qcom_pdm_data;
@@ -526,6 +531,11 @@ static const struct qcom_pdm_domain_data *x1e80100_domains[] = {
NULL,
};
+static const struct qcom_pdm_probe_first_dev_quirk first_dev_remoteproc3 = {
+ .id = 3,
+ .name = "pd-mapper"
+};
+
static const struct of_device_id qcom_pdm_domains[] __maybe_unused = {
{ .compatible = "qcom,apq8016", .data = NULL, },
{ .compatible = "qcom,apq8064", .data = NULL, },
@@ -566,6 +576,10 @@ static const struct of_device_id qcom_pdm_domains[] __maybe_unused = {
{},
};
+static const struct of_device_id qcom_pdm_defer[] __maybe_unused = {
+ { .compatible = "qcom,sdm845", .data = &first_dev_remoteproc3, },
+ {},
+};
static void qcom_pdm_stop(struct qcom_pdm_data *data)
{
qcom_pdm_free_domains(data);
@@ -637,6 +651,25 @@ static struct qcom_pdm_data *qcom_pdm_start(void)
return ERR_PTR(ret);
}
+static bool qcom_pdm_ready(struct auxiliary_device *auxdev)
+{
+ const struct of_device_id *match;
+ struct device_node *root;
+ struct qcom_pdm_probe_first_dev_quirk *first_dev;
+
+ root = of_find_node_by_path("/");
+ if (!root)
+ return true;
+
+ match = of_match_node(qcom_pdm_defer, root);
+ of_node_put(root);
+ if (!match)
+ return true;
+
+ first_dev = (struct qcom_pdm_probe_first_dev_quirk *) match->data;
+ return (auxdev->id == first_dev->id) && !strcmp(auxdev->name, first_dev->name);
+}
+
static int qcom_pdm_probe(struct auxiliary_device *auxdev,
const struct auxiliary_device_id *id)
@@ -647,6 +680,15 @@ static int qcom_pdm_probe(struct auxiliary_device *auxdev,
mutex_lock(&qcom_pdm_mutex);
if (!__qcom_pdm_data) {
+ if (!qcom_pdm_ready(auxdev)) {
+ pr_debug("%s: Deferring probe for device %s (id: %u)\n",
+ __func__, auxdev->name, auxdev->id);
+ ret = -EPROBE_DEFER;
+ goto probe_stop;
+ }
+ pr_debug("%s: Probing for device %s (id: %u), starting pdm\n",
+ __func__, auxdev->name, auxdev->id);
+
data = qcom_pdm_start();
if (IS_ERR(data))
@@ -659,6 +701,7 @@ static int qcom_pdm_probe(struct auxiliary_device *auxdev,
auxiliary_set_drvdata(auxdev, __qcom_pdm_data);
+probe_stop:
mutex_unlock(&qcom_pdm_mutex);
return ret;
---
base-commit: 7f048b202333b967782a98aa21bb3354dc379bbf
change-id: 20250205-qcom_pdm_defer-3dc1271d74d9
Best regards,
--
Frank Oltmanns <frank(a)oltmanns.dev>
From: Steven Rostedt <rostedt(a)goodmis.org>
The pages_touched field represents the number of subbuffers in the ring
buffer that have content that can be read. This is used in accounting of
"dirty_pages" and "buffer_percent" to allow the user to wait for the
buffer to be filled to a certain amount before it reads the buffer in
blocking mode.
The persistent buffer never updated this value, so it remained zero and
the accounting treated the buffer as having no content. This caused user
space to wait for content even though there is enough content in the ring
buffer to satisfy the buffer_percent.
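For context, the accounting this feeds is roughly of the following shape
(a simplified sketch, not the exact ring-buffer code; wake_up_readers()
is a placeholder):

	/* A subbuffer counts as dirty once it has been touched (written)
	 * and not yet read; waiters on buffer_percent are only woken when
	 * enough subbuffers are dirty. */
	dirty = pages_touched - pages_read;
	if (dirty * 100 >= buffer_percent * nr_pages)
		wake_up_readers();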
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Vincent Donnefort <vdonnefort(a)google.com>
Link: https://lore.kernel.org/20250214123512.0631436e@gandalf.local.home
Fixes: 5f3b6e839f3ce ("ring-buffer: Validate boot range memory events")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 0419d41a2060..bb6089c2951e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1850,6 +1850,11 @@ static void rb_meta_validate_events(struct ring_buffer_per_cpu *cpu_buffer)
cpu_buffer->cpu);
goto invalid;
}
+
+ /* If the buffer has content, update pages_touched */
+ if (ret)
+ local_inc(&cpu_buffer->pages_touched);
+
entries += ret;
entry_bytes += local_read(&head_page->page->commit);
local_set(&cpu_buffer->head_page->entries, ret);
--
2.47.2
From: Steven Rostedt <rostedt(a)goodmis.org>
When trying to mmap a trace instance buffer that is attached to
reserve_mem, it would crash:
BUG: unable to handle page fault for address: ffffe97bd00025c8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 2862f3067 P4D 2862f3067 PUD 0
Oops: Oops: 0000 [#1] PREEMPT_RT SMP PTI
CPU: 4 UID: 0 PID: 981 Comm: mmap-rb Not tainted 6.14.0-rc2-test-00003-g7f1a5e3fbf9e-dirty #233
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:validate_page_before_insert+0x5/0xb0
Code: e2 01 89 d0 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 <48> 8b 46 08 a8 01 75 67 66 90 48 89 f0 8b 50 34 85 d2 74 76 48 89
RSP: 0018:ffffb148c2f3f968 EFLAGS: 00010246
RAX: ffff9fa5d3322000 RBX: ffff9fa5ccff9c08 RCX: 00000000b879ed29
RDX: ffffe97bd00025c0 RSI: ffffe97bd00025c0 RDI: ffff9fa5ccff9c08
RBP: ffffb148c2f3f9f0 R08: 0000000000000004 R09: 0000000000000004
R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
R13: 00007f16a18d5000 R14: ffff9fa5c48db6a8 R15: 0000000000000000
FS: 00007f16a1b54740(0000) GS:ffff9fa73df00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffe97bd00025c8 CR3: 00000001048c6006 CR4: 0000000000172ef0
Call Trace:
<TASK>
? __die_body.cold+0x19/0x1f
? __die+0x2e/0x40
? page_fault_oops+0x157/0x2b0
? search_module_extables+0x53/0x80
? validate_page_before_insert+0x5/0xb0
? kernelmode_fixup_or_oops.isra.0+0x5f/0x70
? __bad_area_nosemaphore+0x16e/0x1b0
? bad_area_nosemaphore+0x16/0x20
? do_kern_addr_fault+0x77/0x90
? exc_page_fault+0x22b/0x230
? asm_exc_page_fault+0x2b/0x30
? validate_page_before_insert+0x5/0xb0
? vm_insert_pages+0x151/0x400
__rb_map_vma+0x21f/0x3f0
ring_buffer_map+0x21b/0x2f0
tracing_buffers_mmap+0x70/0xd0
__mmap_region+0x6f0/0xbd0
mmap_region+0x7f/0x130
do_mmap+0x475/0x610
vm_mmap_pgoff+0xf2/0x1d0
ksys_mmap_pgoff+0x166/0x200
__x64_sys_mmap+0x37/0x50
x64_sys_call+0x1670/0x1d70
do_syscall_64+0xbb/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The reason was that the code that maps the ring buffer pages to user space
has:
page = virt_to_page((void *)cpu_buffer->subbuf_ids[s]);
And uses that in:
vm_insert_pages(vma, vma->vm_start, pages, &nr_pages);
But virt_to_page() does not work with vmap()'d memory, which is what the
persistent ring buffer uses. It would be rather trivial to allow this, but
for now just disable mmap() for instances that have their ring buffer
allocated from the reserve_mem option.
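For reference, allowing this would mean distinguishing linearly mapped
memory from vmap()'d memory when looking up the struct page; a hedged
sketch of that distinction (not what this patch does, which simply
disables mmap() for such buffers):

	struct page *page;

	if (is_vmalloc_addr(addr))		/* vmap()/vmalloc() memory */
		page = vmalloc_to_page(addr);
	else					/* linear (direct-map) memory */
		page = virt_to_page(addr);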
If an mmap() is performed on a persistent buffer it will return -ENODEV
just like it would if the .mmap field wasn't defined in the
file_operations structure.
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Vincent Donnefort <vdonnefort(a)google.com>
Link: https://lore.kernel.org/20250214115547.0d7287d3@gandalf.local.home
Fixes: e645535a954ad ("tracing: Add option to use memmapped memory for trace boot instance")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 25ff37aab00f..0e6d517e74e0 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8279,6 +8279,10 @@ static int tracing_buffers_mmap(struct file *filp, struct vm_area_struct *vma)
struct trace_iterator *iter = &info->iter;
int ret = 0;
+ /* Currently the boot mapped buffer is not supported for mmap */
+ if (iter->tr->flags & TRACE_ARRAY_FL_BOOT)
+ return -ENODEV;
+
ret = get_snapshot_map(iter->tr);
if (ret)
return ret;
--
2.47.2
From: Steven Rostedt <rostedt(a)goodmis.org>
The meta data for a mapped ring buffer contains an array of indexes of all
the subbuffers. The first entry is the reader page, and the rest of the
entries lay out the order in which the ring buffer linked list is to be
created.
The validator currently makes sure that all the entries are within the
range of 0 and nr_subbufs. But it does not check if there are any
duplicates.
While working on the ring buffer, I corrupted this array by adding
duplicates. The validator did not catch it and created the ring buffer
linked list on top of it. Luckily, the corruption only caused the reader
page to also be in the writer path, which presented corrupted data but did
not crash the kernel. But if there were duplicates on the writer side,
it could corrupt the ring buffer linked list and cause a crash.
Create a bitmask array sized to the number of subbuffers and clear it.
When walking through the subbuf array to check that the entries are
within range, also test whether each entry's bit is already set in
subbuf_mask. If it is, there is a duplicate, so fail the validation.
If not, set the corresponding bit and continue.
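The check added below boils down to the following generic form
(illustrative sketch only, with shortened names):

	bitmap_zero(subbuf_mask, nr_subbufs);
	for (i = 0; i < nr_subbufs; i++) {
		if (buffers[i] >= nr_subbufs)
			return false;		/* out of range */
		if (test_bit(buffers[i], subbuf_mask))
			return false;		/* duplicate entry */
		set_bit(buffers[i], subbuf_mask);
	}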
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Vincent Donnefort <vdonnefort(a)google.com>
Link: https://lore.kernel.org/20250214102820.7509ddea@gandalf.local.home
Fixes: c76883f18e59b ("ring-buffer: Add test if range of boot buffer is valid")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 07b421115692..0419d41a2060 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1672,7 +1672,8 @@ static void *rb_range_buffer(struct ring_buffer_per_cpu *cpu_buffer, int idx)
* must be the same.
*/
static bool rb_meta_valid(struct ring_buffer_meta *meta, int cpu,
- struct trace_buffer *buffer, int nr_pages)
+ struct trace_buffer *buffer, int nr_pages,
+ unsigned long *subbuf_mask)
{
int subbuf_size = PAGE_SIZE;
struct buffer_data_page *subbuf;
@@ -1680,6 +1681,9 @@ static bool rb_meta_valid(struct ring_buffer_meta *meta, int cpu,
unsigned long buffers_end;
int i;
+ if (!subbuf_mask)
+ return false;
+
/* Check the meta magic and meta struct size */
if (meta->magic != RING_BUFFER_META_MAGIC ||
meta->struct_size != sizeof(*meta)) {
@@ -1712,6 +1716,8 @@ static bool rb_meta_valid(struct ring_buffer_meta *meta, int cpu,
subbuf = rb_subbufs_from_meta(meta);
+ bitmap_clear(subbuf_mask, 0, meta->nr_subbufs);
+
/* Is the meta buffers and the subbufs themselves have correct data? */
for (i = 0; i < meta->nr_subbufs; i++) {
if (meta->buffers[i] < 0 ||
@@ -1725,6 +1731,12 @@ static bool rb_meta_valid(struct ring_buffer_meta *meta, int cpu,
return false;
}
+ if (test_bit(meta->buffers[i], subbuf_mask)) {
+ pr_info("Ring buffer boot meta [%d] array has duplicates\n", cpu);
+ return false;
+ }
+
+ set_bit(meta->buffers[i], subbuf_mask);
subbuf = (void *)subbuf + subbuf_size;
}
@@ -1889,17 +1901,22 @@ static void rb_meta_init_text_addr(struct ring_buffer_meta *meta)
static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
{
struct ring_buffer_meta *meta;
+ unsigned long *subbuf_mask;
unsigned long delta;
void *subbuf;
int cpu;
int i;
+ /* Create a mask to test the subbuf array */
+ subbuf_mask = bitmap_alloc(nr_pages + 1, GFP_KERNEL);
+ /* If subbuf_mask fails to allocate, then rb_meta_valid() will return false */
+
for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
void *next_meta;
meta = rb_range_meta(buffer, nr_pages, cpu);
- if (rb_meta_valid(meta, cpu, buffer, nr_pages)) {
+ if (rb_meta_valid(meta, cpu, buffer, nr_pages, subbuf_mask)) {
/* Make the mappings match the current address */
subbuf = rb_subbufs_from_meta(meta);
delta = (unsigned long)subbuf - meta->first_buffer;
@@ -1943,6 +1960,7 @@ static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
subbuf += meta->subbuf_size;
}
}
+ bitmap_free(subbuf_mask);
}
static void *rbm_start(struct seq_file *m, loff_t *pos)
--
2.47.2
From: Steven Rostedt <rostedt(a)goodmis.org>
Currently, if __tracing_resize_ring_buffer() returns an error,
tracing_resize_ring_buffer() returns -ENOMEM. But it may not be a memory
issue that caused the function to fail. If the ring buffer is memory
mapped, then resizing of the ring buffer is disabled. But if the user
tries to resize the buffer anyway, they get -ENOMEM returned, which is
confusing because there is plenty of memory. The actual error returned
was -EBUSY, which makes much more sense to the user.
Cc: stable(a)vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Vincent Donnefort <vdonnefort(a)google.com>
Link: https://lore.kernel.org/20250213134132.7e4505d7@gandalf.local.home
Fixes: 117c39200d9d7 ("ring-buffer: Introducing ring-buffer mapping functions")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
---
kernel/trace/trace.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 1496a5ac33ae..25ff37aab00f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5977,8 +5977,6 @@ static int __tracing_resize_ring_buffer(struct trace_array *tr,
ssize_t tracing_resize_ring_buffer(struct trace_array *tr,
unsigned long size, int cpu_id)
{
- int ret;
-
guard(mutex)(&trace_types_lock);
if (cpu_id != RING_BUFFER_ALL_CPUS) {
@@ -5987,11 +5985,7 @@ ssize_t tracing_resize_ring_buffer(struct trace_array *tr,
return -EINVAL;
}
- ret = __tracing_resize_ring_buffer(tr, size, cpu_id);
- if (ret < 0)
- ret = -ENOMEM;
-
- return ret;
+ return __tracing_resize_ring_buffer(tr, size, cpu_id);
}
static void update_last_data(struct trace_array *tr)
--
2.47.2
From: Steven Rostedt <rostedt(a)goodmis.org>
Memory mapping the tracing ring buffer disables resizing of the buffer.
But if there is an error in the memory mapping, such as an invalid
parameter, the function returns without re-enabling resizing, preventing
the ring buffer from ever being resized again.
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Vincent Donnefort <vdonnefort(a)google.com>
Link: https://lore.kernel.org/20250213131957.530ec3c5@gandalf.local.home
Fixes: 117c39200d9d7 ("ring-buffer: Introducing ring-buffer mapping functions")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index b8e0ae15ca5b..07b421115692 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -7126,6 +7126,7 @@ int ring_buffer_map(struct trace_buffer *buffer, int cpu,
kfree(cpu_buffer->subbuf_ids);
cpu_buffer->subbuf_ids = NULL;
rb_free_meta_page(cpu_buffer);
+ atomic_dec(&cpu_buffer->resize_disabled);
}
unlock:
--
2.47.2
commit e4b9852a0f4afe40604afb442e3af4452722050a upstream.
Below two paths could overlap each other if we power off a drive quickly
after powering it on. There are multiple races in nvme_setup_io_queues()
because of shutdown_lock missing and improper use of NVMEQ_ENABLED bit.
nvme_reset_work()                              nvme_remove()
 nvme_setup_io_queues()                         nvme_dev_disable()
 ...                                            ...
A1  clear NVMEQ_ENABLED bit for admin queue       lock
    retry:                                    B1    nvme_suspend_io_queues()
A2    pci_free_irq() admin queue              B2    nvme_suspend_queue() admin queue
A3    pci_free_irq_vectors()                        nvme_pci_disable()
A4    nvme_setup_irqs();                      B3      pci_free_irq_vectors()
      ...                                         unlock
A5    queue_request_irq() for admin queue
      set NVMEQ_ENABLED bit
      ...
      nvme_create_io_queues()
A6      result = queue_request_irq();
        set NVMEQ_ENABLED bit
      ...
      fail to allocate enough IO queues:
A7      nvme_suspend_io_queues()
        goto retry
If B3 runs in between A1 and A2, it will crash because the irqaction has
not yet been freed by A2. B2 is supposed to free the admin queue IRQ, but
it simply cannot do so because A1 has already cleared the NVMEQ_ENABLED
bit.
Fix: combine A1 and A2 so the IRQ is freed as soon as the NVMEQ_ENABLED
bit is cleared.
With #1 solved, A2 could still race with B3 if A2 is freeing the IRQ while
B3 is checking the irqaction. A3 could also race with B2 if B2 is freeing
the IRQ while A3 is checking the irqaction.
Fix: A2 and A3 take the lock for mutual exclusion.
A3 could race with B3 since they could run free_msi_irqs() in parallel.
Fix: A3 takes the lock for mutual exclusion.
A4 could fail to allocate all needed IRQ vectors if A3 and A4 are
interrupted by B3.
Fix: A4 takes the lock for mutual exclusion.
If A5/A6 happens after B2/B1, B3 will crash because the irqaction is not
NULL; it has just been allocated by A5/A6.
Fix: take the lock around queue_request_irq() and the setting of the
NVMEQ_ENABLED bit.
A7 could get a chance to pci_free_irq() for a certain IO queue while B3 is
checking the irqaction.
Fix: A7 takes the lock.
nvme_dev->online_queues needs to be protected by shutdown_lock. Since it
is not atomic, both paths could otherwise modify it based on their own
stale copy.
Co-developed-by: Yuanyuan Zhong <yzhong(a)purestorage.com>
Signed-off-by: Casey Chen <cachen(a)purestorage.com>
Reviewed-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
[noahm(a)debian.org: backported to 5.10]
Link: https://lore.kernel.org/linux-nvme/20210707211432.29536-1-cachen@purestorag…
Signed-off-by: Noah Meyerhans <noahm(a)debian.org>
---
drivers/nvme/host/pci.c | 66 ++++++++++++++++++++++++++++++++++++-----
1 file changed, 58 insertions(+), 8 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 875ebef6adc7..ae04bdce560a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1563,6 +1563,28 @@ static void nvme_init_queue(struct nvme_queue *nvmeq, u16 qid)
wmb(); /* ensure the first interrupt sees the initialization */
}
+/*
+ * Try getting shutdown_lock while setting up IO queues.
+ */
+static int nvme_setup_io_queues_trylock(struct nvme_dev *dev)
+{
+ /*
+ * Give up if the lock is being held by nvme_dev_disable.
+ */
+ if (!mutex_trylock(&dev->shutdown_lock))
+ return -ENODEV;
+
+ /*
+ * Controller is in wrong state, fail early.
+ */
+ if (dev->ctrl.state != NVME_CTRL_CONNECTING) {
+ mutex_unlock(&dev->shutdown_lock);
+ return -ENODEV;
+ }
+
+ return 0;
+}
+
static int nvme_create_queue(struct nvme_queue *nvmeq, int qid, bool polled)
{
struct nvme_dev *dev = nvmeq->dev;
@@ -1591,8 +1613,11 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid, bool polled)
goto release_cq;
nvmeq->cq_vector = vector;
- nvme_init_queue(nvmeq, qid);
+ result = nvme_setup_io_queues_trylock(dev);
+ if (result)
+ return result;
+ nvme_init_queue(nvmeq, qid);
if (!polled) {
result = queue_request_irq(nvmeq);
if (result < 0)
@@ -1600,10 +1625,12 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid, bool polled)
}
set_bit(NVMEQ_ENABLED, &nvmeq->flags);
+ mutex_unlock(&dev->shutdown_lock);
return result;
release_sq:
dev->online_queues--;
+ mutex_unlock(&dev->shutdown_lock);
adapter_delete_sq(dev, qid);
release_cq:
adapter_delete_cq(dev, qid);
@@ -2182,7 +2209,18 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
if (nr_io_queues == 0)
return 0;
- clear_bit(NVMEQ_ENABLED, &adminq->flags);
+ /*
+ * Free IRQ resources as soon as NVMEQ_ENABLED bit transitions
+ * from set to unset. If there is a window to it is truely freed,
+ * pci_free_irq_vectors() jumping into this window will crash.
+ * And take lock to avoid racing with pci_free_irq_vectors() in
+ * nvme_dev_disable() path.
+ */
+ result = nvme_setup_io_queues_trylock(dev);
+ if (result)
+ return result;
+ if (test_and_clear_bit(NVMEQ_ENABLED, &adminq->flags))
+ pci_free_irq(pdev, 0, adminq);
if (dev->cmb_use_sqes) {
result = nvme_cmb_qdepth(dev, nr_io_queues,
@@ -2198,14 +2236,17 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
result = nvme_remap_bar(dev, size);
if (!result)
break;
- if (!--nr_io_queues)
- return -ENOMEM;
+ if (!--nr_io_queues) {
+ result = -ENOMEM;
+ goto out_unlock;
+ }
} while (1);
adminq->q_db = dev->dbs;
retry:
/* Deregister the admin queue's interrupt */
- pci_free_irq(pdev, 0, adminq);
+ if (test_and_clear_bit(NVMEQ_ENABLED, &adminq->flags))
+ pci_free_irq(pdev, 0, adminq);
/*
* If we enable msix early due to not intx, disable it again before
@@ -2214,8 +2255,10 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
pci_free_irq_vectors(pdev);
result = nvme_setup_irqs(dev, nr_io_queues);
- if (result <= 0)
- return -EIO;
+ if (result <= 0) {
+ result = -EIO;
+ goto out_unlock;
+ }
dev->num_vecs = result;
result = max(result - 1, 1);
@@ -2229,8 +2272,9 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
*/
result = queue_request_irq(adminq);
if (result)
- return result;
+ goto out_unlock;
set_bit(NVMEQ_ENABLED, &adminq->flags);
+ mutex_unlock(&dev->shutdown_lock);
result = nvme_create_io_queues(dev);
if (result || dev->online_queues < 2)
@@ -2239,6 +2283,9 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
if (dev->online_queues - 1 < dev->max_qid) {
nr_io_queues = dev->online_queues - 1;
nvme_disable_io_queues(dev);
+ result = nvme_setup_io_queues_trylock(dev);
+ if (result)
+ return result;
nvme_suspend_io_queues(dev);
goto retry;
}
@@ -2247,6 +2294,9 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
dev->io_queues[HCTX_TYPE_READ],
dev->io_queues[HCTX_TYPE_POLL]);
return 0;
+out_unlock:
+ mutex_unlock(&dev->shutdown_lock);
+ return result;
}
static void nvme_del_queue_end(struct request *req, blk_status_t error)
--
2.39.5
The patch titled
Subject: Revert "selftests/mm: remove local __NR_* definitions"
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
revert-selftests-mm-remove-local-__nr_-definitions.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: John Hubbard <jhubbard(a)nvidia.com>
Subject: Revert "selftests/mm: remove local __NR_* definitions"
Date: Thu, 13 Feb 2025 19:38:50 -0800
This reverts commit a5c6bc590094a1a73cf6fa3f505e1945d2bf2461.
The general approach described in commit e076eaca5906 ("selftests: break
the dependency upon local header files") was taken one step too far here:
it should not have been extended to include the syscall numbers. This is
because doing so would require per-arch support in tools/include/uapi, and
no such support exists.
This revert fixes two separate reports of test failures, from Dave
Hansen[1], and Li Wang[2]. An excerpt of Dave's report:
Before this commit (a5c6bc590094a1a73cf6fa3f505e1945d2bf2461) things are
fine. But after, I get:
running PKEY tests for unsupported CPU/OS
An excerpt of Li's report:
I just found that mlock2_() return a wrong value in mlock2-test
[1] https://lore.kernel.org/dc585017-6740-4cab-a536-b12b37a7582d@intel.com
[2] https://lore.kernel.org/CAEemH2eW=UMu9+turT2jRie7+6ewUazXmA6kL+VBo3cGDGU6RA…
Link: https://lkml.kernel.org/r/20250214033850.235171-1-jhubbard@nvidia.com
Fixes: a5c6bc590094 ("selftests/mm: remove local __NR_* definitions")
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Li Wang <liwang(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jeff Xu <jeffxu(a)chromium.org>
Cc: Andrei Vagin <avagin(a)google.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Kees Cook <kees(a)kernel.org>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Rich Felker <dalias(a)libc.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/hugepage-mremap.c | 2 -
tools/testing/selftests/mm/ksm_functional_tests.c | 8 +++++-
tools/testing/selftests/mm/memfd_secret.c | 14 ++++++++++-
tools/testing/selftests/mm/mkdirty.c | 8 +++++-
tools/testing/selftests/mm/mlock2.h | 1
tools/testing/selftests/mm/protection_keys.c | 2 -
tools/testing/selftests/mm/uffd-common.c | 4 +++
tools/testing/selftests/mm/uffd-stress.c | 15 +++++++++++-
tools/testing/selftests/mm/uffd-unit-tests.c | 14 ++++++++++-
9 files changed, 60 insertions(+), 8 deletions(-)
--- a/tools/testing/selftests/mm/hugepage-mremap.c~revert-selftests-mm-remove-local-__nr_-definitions
+++ a/tools/testing/selftests/mm/hugepage-mremap.c
@@ -15,7 +15,7 @@
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
-#include <asm-generic/unistd.h>
+#include <unistd.h>
#include <sys/mman.h>
#include <errno.h>
#include <fcntl.h> /* Definition of O_* constants */
--- a/tools/testing/selftests/mm/ksm_functional_tests.c~revert-selftests-mm-remove-local-__nr_-definitions
+++ a/tools/testing/selftests/mm/ksm_functional_tests.c
@@ -11,7 +11,7 @@
#include <string.h>
#include <stdbool.h>
#include <stdint.h>
-#include <asm-generic/unistd.h>
+#include <unistd.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
@@ -369,6 +369,7 @@ unmap:
munmap(map, size);
}
+#ifdef __NR_userfaultfd
static void test_unmerge_uffd_wp(void)
{
struct uffdio_writeprotect uffd_writeprotect;
@@ -429,6 +430,7 @@ close_uffd:
unmap:
munmap(map, size);
}
+#endif
/* Verify that KSM can be enabled / queried with prctl. */
static void test_prctl(void)
@@ -684,7 +686,9 @@ int main(int argc, char **argv)
exit(test_child_ksm());
}
+#ifdef __NR_userfaultfd
tests++;
+#endif
ksft_print_header();
ksft_set_plan(tests);
@@ -696,7 +700,9 @@ int main(int argc, char **argv)
test_unmerge();
test_unmerge_zero_pages();
test_unmerge_discarded();
+#ifdef __NR_userfaultfd
test_unmerge_uffd_wp();
+#endif
test_prot_none();
--- a/tools/testing/selftests/mm/memfd_secret.c~revert-selftests-mm-remove-local-__nr_-definitions
+++ a/tools/testing/selftests/mm/memfd_secret.c
@@ -17,7 +17,7 @@
#include <stdlib.h>
#include <string.h>
-#include <asm-generic/unistd.h>
+#include <unistd.h>
#include <errno.h>
#include <stdio.h>
#include <fcntl.h>
@@ -28,6 +28,8 @@
#define pass(fmt, ...) ksft_test_result_pass(fmt, ##__VA_ARGS__)
#define skip(fmt, ...) ksft_test_result_skip(fmt, ##__VA_ARGS__)
+#ifdef __NR_memfd_secret
+
#define PATTERN 0x55
static const int prot = PROT_READ | PROT_WRITE;
@@ -332,3 +334,13 @@ int main(int argc, char *argv[])
ksft_finished();
}
+
+#else /* __NR_memfd_secret */
+
+int main(int argc, char *argv[])
+{
+ printf("skip: skipping memfd_secret test (missing __NR_memfd_secret)\n");
+ return KSFT_SKIP;
+}
+
+#endif /* __NR_memfd_secret */
--- a/tools/testing/selftests/mm/mkdirty.c~revert-selftests-mm-remove-local-__nr_-definitions
+++ a/tools/testing/selftests/mm/mkdirty.c
@@ -9,7 +9,7 @@
*/
#include <fcntl.h>
#include <signal.h>
-#include <asm-generic/unistd.h>
+#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <stdlib.h>
@@ -265,6 +265,7 @@ munmap:
munmap(mmap_mem, mmap_size);
}
+#ifdef __NR_userfaultfd
static void test_uffdio_copy(void)
{
struct uffdio_register uffdio_register;
@@ -322,6 +323,7 @@ munmap:
munmap(dst, pagesize);
free(src);
}
+#endif /* __NR_userfaultfd */
int main(void)
{
@@ -334,7 +336,9 @@ int main(void)
thpsize / 1024);
tests += 3;
}
+#ifdef __NR_userfaultfd
tests += 1;
+#endif /* __NR_userfaultfd */
ksft_print_header();
ksft_set_plan(tests);
@@ -364,7 +368,9 @@ int main(void)
if (thpsize)
test_pte_mapped_thp();
/* Placing a fresh page via userfaultfd may set the PTE dirty. */
+#ifdef __NR_userfaultfd
test_uffdio_copy();
+#endif /* __NR_userfaultfd */
err = ksft_get_fail_cnt();
if (err)
--- a/tools/testing/selftests/mm/mlock2.h~revert-selftests-mm-remove-local-__nr_-definitions
+++ a/tools/testing/selftests/mm/mlock2.h
@@ -3,7 +3,6 @@
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
-#include <asm-generic/unistd.h>
static int mlock2_(void *start, size_t len, int flags)
{
--- a/tools/testing/selftests/mm/protection_keys.c~revert-selftests-mm-remove-local-__nr_-definitions
+++ a/tools/testing/selftests/mm/protection_keys.c
@@ -42,7 +42,7 @@
#include <sys/wait.h>
#include <sys/stat.h>
#include <fcntl.h>
-#include <asm-generic/unistd.h>
+#include <unistd.h>
#include <sys/ptrace.h>
#include <setjmp.h>
--- a/tools/testing/selftests/mm/uffd-common.c~revert-selftests-mm-remove-local-__nr_-definitions
+++ a/tools/testing/selftests/mm/uffd-common.c
@@ -673,7 +673,11 @@ int uffd_open_dev(unsigned int flags)
int uffd_open_sys(unsigned int flags)
{
+#ifdef __NR_userfaultfd
return syscall(__NR_userfaultfd, flags);
+#else
+ return -1;
+#endif
}
int uffd_open(unsigned int flags)
--- a/tools/testing/selftests/mm/uffd-stress.c~revert-selftests-mm-remove-local-__nr_-definitions
+++ a/tools/testing/selftests/mm/uffd-stress.c
@@ -33,10 +33,11 @@
* pthread_mutex_lock will also verify the atomicity of the memory
* transfer (UFFDIO_COPY).
*/
-#include <asm-generic/unistd.h>
+
#include "uffd-common.h"
uint64_t features;
+#ifdef __NR_userfaultfd
#define BOUNCE_RANDOM (1<<0)
#define BOUNCE_RACINGFAULTS (1<<1)
@@ -471,3 +472,15 @@ int main(int argc, char **argv)
nr_pages, nr_pages_per_cpu);
return userfaultfd_stress();
}
+
+#else /* __NR_userfaultfd */
+
+#warning "missing __NR_userfaultfd definition"
+
+int main(void)
+{
+ printf("skip: Skipping userfaultfd test (missing __NR_userfaultfd)\n");
+ return KSFT_SKIP;
+}
+
+#endif /* __NR_userfaultfd */
--- a/tools/testing/selftests/mm/uffd-unit-tests.c~revert-selftests-mm-remove-local-__nr_-definitions
+++ a/tools/testing/selftests/mm/uffd-unit-tests.c
@@ -5,11 +5,12 @@
* Copyright (C) 2015-2023 Red Hat, Inc.
*/
-#include <asm-generic/unistd.h>
#include "uffd-common.h"
#include "../../../../mm/gup_test.h"
+#ifdef __NR_userfaultfd
+
/* The unit test doesn't need a large or random size, make it 32MB for now */
#define UFFD_TEST_MEM_SIZE (32UL << 20)
@@ -1558,3 +1559,14 @@ int main(int argc, char *argv[])
return ksft_get_fail_cnt() ? KSFT_FAIL : KSFT_PASS;
}
+#else /* __NR_userfaultfd */
+
+#warning "missing __NR_userfaultfd definition"
+
+int main(void)
+{
+ printf("Skipping %s (missing __NR_userfaultfd)\n", __file__);
+ return KSFT_SKIP;
+}
+
+#endif /* __NR_userfaultfd */
_
Patches currently in -mm which might be from jhubbard(a)nvidia.com are
revert-selftests-mm-remove-local-__nr_-definitions.patch
From: Ashish Kalra <ashish.kalra(a)amd.com>
This patch-set fixes the current SNP host enabling code, and effectively
SNP itself, which is broken when the KVM module is built-in.
Essentially, the SNP host enabling code should be invoked before KVM
initialization, which is currently not the case when KVM is built-in.
SNP host support is currently enabled in snp_rmptable_init() which is
invoked as a device_initcall(). Here device_initcall() is used as
snp_rmptable_init() expects AMD IOMMU SNP support to be enabled prior
to it and the AMD IOMMU driver enables SNP support after PCI bus enumeration.
This patch-set adds support to call snp_rmptable_init() early and
directly from iommu_snp_enable() (after checking and enabling IOMMU
SNP support) which enables SNP host support before KVM initialization
with kvm_amd module built-in.
Additionally the patch-set adds support to initialize PSP SEV driver
during KVM module probe time.
This patch-set has been tested with the following cases/scenarios:
1). kvm_amd module and PSP driver built-in.
2). kvm_amd module built-in with intremap=off kernel command line.
3). kvm_amd module built-in with iommu=off kernel command line.
4). kvm_amd and PSP driver built as modules.
5). kvm_amd built as module with iommu=off kernel command line.
6). kvm_amd module as built-in and PSP driver as module.
7). kvm_amd build as a module and PSP driver as built-in.
v4:
- Add warning if SNP support has been checked on IOMMUs and host
SNP support has been enabled but late IOMMU initialization fails
subsequently.
- Add reviewed-by's.
v3:
- Ensure that dropping the device_initcall() happens in the same
patch that wires up the IOMMU code to invoke snp_rmptable_init()
which then makes sure that snp_rmptable_init() is still getting
called and also merge patches 3 & 4.
- Fix commit logs.
v2:
- Drop calling iommu_snp_enable() early before enabling IOMMUs as
IOMMU subsystem gets initialized via subsys_initcall() and hence
snp_rmptable_init() cannot be invoked via subsys_initcall().
- Instead add support to call snp_rmptable_init() early and
directly via iommu_snp_enable().
- Fix commit logs.
Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature")
Ashish Kalra (1):
x86/sev: Fix broken SNP support with KVM module built-in
Sean Christopherson (2):
crypto: ccp: Add external API interface for PSP module initialization
KVM: SVM: Ensure PSP module is initialized if KVM module is built-in
arch/x86/include/asm/sev.h | 2 ++
arch/x86/kvm/svm/sev.c | 10 ++++++++++
arch/x86/virt/svm/sev.c | 23 +++++++----------------
drivers/crypto/ccp/sp-dev.c | 14 ++++++++++++++
drivers/iommu/amd/init.c | 34 ++++++++++++++++++++++++++++++----
include/linux/psp-sev.h | 9 +++++++++
6 files changed, 72 insertions(+), 20 deletions(-)
--
2.34.1
From: Joshua Washington <joshwash(a)google.com>
Before this patch the NETDEV_XDP_ACT_NDO_XMIT XDP feature flag is set by
default as part of driver initialization, and is never cleared. However,
this flag differs from others in that it is used as an indicator for
whether the driver is ready to perform the ndo_xdp_xmit operation as
part of an XDP_REDIRECT. Kernel helpers
xdp_features_(set|clear)_redirect_target exist to convey this meaning.
This patch ensures that the netdev is only reported as a redirect target
when XDP queues exist to forward traffic.
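For illustration, the helper pair is typically used along these lines
(a generic sketch with a hypothetical driver, not the gve code in the
diff below):

	/* Advertise ndo_xdp_xmit as a redirect target only while XDP TX
	 * queues actually exist; clear it when they are torn down. */
	if (have_xdp_tx_queues)
		xdp_features_set_redirect_target(netdev, false); /* no SG support */
	else
		xdp_features_clear_redirect_target(netdev);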
Fixes: 39a7f4aa3e4a ("gve: Add XDP REDIRECT support for GQI-QPL format")
Cc: stable(a)vger.kernel.org
Reviewed-by: Praveen Kaligineedi <pkaligineedi(a)google.com>
Reviewed-by: Jeroen de Borst <jeroendb(a)google.com>
Signed-off-by: Joshua Washington <joshwash(a)google.com>
---
drivers/net/ethernet/google/gve/gve.h | 10 ++++++++++
drivers/net/ethernet/google/gve/gve_main.c | 6 +++++-
2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index 8167cc5fb0df..78d2a19593d1 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -1116,6 +1116,16 @@ static inline u32 gve_xdp_tx_start_queue_id(struct gve_priv *priv)
return gve_xdp_tx_queue_id(priv, 0);
}
+static inline bool gve_supports_xdp_xmit(struct gve_priv *priv)
+{
+ switch (priv->queue_format) {
+ case GVE_GQI_QPL_FORMAT:
+ return true;
+ default:
+ return false;
+ }
+}
+
/* gqi napi handler defined in gve_main.c */
int gve_napi_poll(struct napi_struct *napi, int budget);
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 533e659b15b3..92237fb0b60c 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -1903,6 +1903,8 @@ static void gve_turndown(struct gve_priv *priv)
/* Stop tx queues */
netif_tx_disable(priv->dev);
+ xdp_features_clear_redirect_target(priv->dev);
+
gve_clear_napi_enabled(priv);
gve_clear_report_stats(priv);
@@ -1972,6 +1974,9 @@ static void gve_turnup(struct gve_priv *priv)
napi_schedule(&block->napi);
}
+ if (priv->num_xdp_queues && gve_supports_xdp_xmit(priv))
+ xdp_features_set_redirect_target(priv->dev, false);
+
gve_set_napi_enabled(priv);
}
@@ -2246,7 +2251,6 @@ static void gve_set_netdev_xdp_features(struct gve_priv *priv)
if (priv->queue_format == GVE_GQI_QPL_FORMAT) {
xdp_features = NETDEV_XDP_ACT_BASIC;
xdp_features |= NETDEV_XDP_ACT_REDIRECT;
- xdp_features |= NETDEV_XDP_ACT_NDO_XMIT;
xdp_features |= NETDEV_XDP_ACT_XSK_ZEROCOPY;
} else {
xdp_features = 0;
--
2.48.1.601.g30ceb7b040-goog
Fix several issues in partition probing:
- The bailout for a bad partoffset must use put_dev_sector(), since the
preceding read_part_sector() succeeded.
- If the partition table claims a silly sector size like 0xfff bytes
(which results in partition table entries straddling sector boundaries),
bail out instead of accessing out-of-bounds memory.
- We must not assume that the partition table contains proper NUL
termination - use strnlen() and strncmp() instead of strlen() and
strcmp(). (A short illustration follows this list.)
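A short illustration of the difference, assuming a fixed-size, possibly
unterminated name field (hypothetical layout, not the exact mac.c
structures):

	char name[32];		/* may lack NUL termination */

	/* Unsafe: strlen(name) / strcmp(name, "/") may read past the
	 * 32-byte field if no NUL is present.
	 * Safe: bound every string operation by the field size. */
	size_t len = strnlen(name, sizeof(name));
	bool is_root = strncmp(name, "/", sizeof(name)) == 0;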
Cc: stable(a)vger.kernel.org
Signed-off-by: Jann Horn <jannh(a)google.com>
---
block/partitions/mac.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/block/partitions/mac.c b/block/partitions/mac.c
index c80183156d68020e0e14974308ac751b3df84421..b02530d986297058de0db929fbf638a76fc44508 100644
--- a/block/partitions/mac.c
+++ b/block/partitions/mac.c
@@ -53,13 +53,25 @@ int mac_partition(struct parsed_partitions *state)
}
secsize = be16_to_cpu(md->block_size);
put_dev_sector(sect);
+
+ /*
+ * If the "block size" is not a power of 2, things get weird - we might
+ * end up with a partition straddling a sector boundary, so we wouldn't
+ * be able to read a partition entry with read_part_sector().
+ * Real block sizes are probably (?) powers of two, so just require
+ * that.
+ */
+ if (!is_power_of_2(secsize))
+ return -1;
datasize = round_down(secsize, 512);
data = read_part_sector(state, datasize / 512, &sect);
if (!data)
return -1;
partoffset = secsize % 512;
- if (partoffset + sizeof(*part) > datasize)
+ if (partoffset + sizeof(*part) > datasize) {
+ put_dev_sector(sect);
return -1;
+ }
part = (struct mac_partition *) (data + partoffset);
if (be16_to_cpu(part->signature) != MAC_PARTITION_MAGIC) {
put_dev_sector(sect);
@@ -112,8 +124,8 @@ int mac_partition(struct parsed_partitions *state)
int i, l;
goodness++;
- l = strlen(part->name);
- if (strcmp(part->name, "/") == 0)
+ l = strnlen(part->name, sizeof(part->name));
+ if (strncmp(part->name, "/", sizeof(part->name)) == 0)
goodness++;
for (i = 0; i <= l - 4; ++i) {
if (strncasecmp(part->name + i, "root",
---
base-commit: ab68d7eb7b1a64f3f4710da46cc5f93c6c154942
change-id: 20250214-partition-mac-2e7114c62223
--
Jann Horn <jannh(a)google.com>
Hi!
Recently I was pointed to this driver as an example of how consumers
can get a pointer to the supplier's driver data, and I noticed a leak.
Callers of of_qcom_ice_get() leak the device reference taken by
of_find_device_by_node(). Introduce devm_of_qcom_ice_get().
qcom_ice_put() is intentionally not exported, as the consumers need the
ICE instance for the entire lifetime of their device. Update the
consumers to use the devm variant and make of_qcom_ice_get() static
afterwards.
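The devm variant follows the usual shape of such wrappers; a rough sketch
under the assumption that an internal put helper exists (hypothetical
names, not necessarily the exact code in this series):

	static void devm_of_qcom_ice_put(void *ice)
	{
		qcom_ice_put(ice);	/* drops the dev reference taken on lookup */
	}

	struct qcom_ice *devm_of_qcom_ice_get(struct device *dev)
	{
		struct qcom_ice *ice = of_qcom_ice_get(dev);
		int ret;

		if (IS_ERR_OR_NULL(ice))
			return ice;

		ret = devm_add_action_or_reset(dev, devm_of_qcom_ice_put, ice);
		if (ret)
			return ERR_PTR(ret);

		return ice;
	}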
This set touches mmc and scsi subsystems. Since the fix is trivial for
them, I'd suggest taking everything through the SoC tree with Acked-by
tags if people consider this fine. Note that the mmc and scsi patches
depend on the first patch that introduces devm_of_qcom_ice_get().
Thanks!
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)linaro.org>
---
Changes in v2:
- add kernel doc for newly introduced devm_of_qcom_ice_get().
- update cover letter and commit message of first patch.
- collect R-b and A-b tags.
- Link to v1: https://lore.kernel.org/r/20250116-qcom-ice-fix-dev-leak-v1-0-84d937683790@…
---
Tudor Ambarus (4):
soc: qcom: ice: introduce devm_of_qcom_ice_get
mmc: sdhci-msm: fix dev reference leaked through of_qcom_ice_get
scsi: ufs: qcom: fix dev reference leaked through of_qcom_ice_get
soc: qcom: ice: make of_qcom_ice_get() static
drivers/mmc/host/sdhci-msm.c | 2 +-
drivers/soc/qcom/ice.c | 51 ++++++++++++++++++++++++++++++++++++++++++--
drivers/ufs/host/ufs-qcom.c | 2 +-
include/soc/qcom/ice.h | 3 ++-
4 files changed, 53 insertions(+), 5 deletions(-)
---
base-commit: b323d8e7bc03d27dec646bfdccb7d1a92411f189
change-id: 20250110-qcom-ice-fix-dev-leak-bbff59a964fb
Best regards,
--
Tudor Ambarus <tudor.ambarus(a)linaro.org>
From: Saranya R <quic_sarar(a)quicinc.com>
When a client process A calls pdr_add_lookup() to add a lookup for a
service and schedules the locator work, a process B may later receive a
new-server packet indicating that the locator is up and call
pdr_locator_new_server(), which eventually sets
pdr->locator_init_complete to true. Process A sees this, takes the list
lock and queries the domain list, but the query times out due to a
deadlock: the response is queued to the same ordered qmi->wq, which
cannot process it until process B's new-server work completes, and that
work is blocked on the list lock held by process A.
Fix it by removing the unnecessary list iteration: the iteration is
already done inside the locator work, so just call schedule_work() here.
Process A                              Process B

                                       process_scheduled_works()
pdr_add_lookup()                         qmi_data_ready_work()
  process_scheduled_works()                pdr_locator_new_server()
                                             pdr->locator_init_complete=true;
    pdr_locator_work()
      mutex_lock(&pdr->list_lock);
      pdr_locate_service()                   mutex_lock(&pdr->list_lock);
        pdr_get_domain_list()
          pr_err("PDR: %s get domain list
                  txn wait failed: %d\n",
                  req->service_name, ret);
Timeout error log due to deadlock:
"
PDR: tms/servreg get domain list txn wait failed: -110
PDR: service lookup for msm/adsp/sensor_pd:tms/servreg failed: -110
"
Thanks to Bjorn and Johan for letting me know that this commit also fixes
an audio regression when using the in-kernel pd-mapper as that makes it
easier to hit this race. [1]
Link: https://lore.kernel.org/lkml/Zqet8iInnDhnxkT9@hovoldconsulting.com/ # [1]
Fixes: fbe639b44a82 ("soc: qcom: Introduce Protection Domain Restart helpers")
CC: stable(a)vger.kernel.org
Reviewed-by: Bjorn Andersson <bjorn.andersson(a)oss.qualcomm.com>
Tested-by: Bjorn Andersson <bjorn.andersson(a)oss.qualcomm.com>
Tested-by: Johan Hovold <johan+linaro(a)kernel.org>
Signed-off-by: Saranya R <quic_sarar(a)quicinc.com>
Co-developed-by: Mukesh Ojha <mukesh.ojha(a)oss.qualcomm.com>
Signed-off-by: Mukesh Ojha <mukesh.ojha(a)oss.qualcomm.com>
---
Changes in v3:
- Corrected author and added Co-developed-by for myself.
- Added T-by and R-by tags.
- Modified commit message updated with the link of the issue
which also gets fixed by this commit.
Changes in v2:
- Added Fixes tag.
drivers/soc/qcom/pdr_interface.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/drivers/soc/qcom/pdr_interface.c b/drivers/soc/qcom/pdr_interface.c
index 328b6153b2be..71be378d2e43 100644
--- a/drivers/soc/qcom/pdr_interface.c
+++ b/drivers/soc/qcom/pdr_interface.c
@@ -75,7 +75,6 @@ static int pdr_locator_new_server(struct qmi_handle *qmi,
{
struct pdr_handle *pdr = container_of(qmi, struct pdr_handle,
locator_hdl);
- struct pdr_service *pds;
mutex_lock(&pdr->lock);
/* Create a local client port for QMI communication */
@@ -87,12 +86,7 @@ static int pdr_locator_new_server(struct qmi_handle *qmi,
mutex_unlock(&pdr->lock);
/* Service pending lookup requests */
- mutex_lock(&pdr->list_lock);
- list_for_each_entry(pds, &pdr->lookups, node) {
- if (pds->need_locator_lookup)
- schedule_work(&pdr->locator_work);
- }
- mutex_unlock(&pdr->list_lock);
+ schedule_work(&pdr->locator_work);
return 0;
}
--
2.34.1
Since the conversion to using the TZ allocator, the efivars service is
registered before the memory pool has been allocated, something which
can lead to a NULL-pointer dereference in case of a racing EFI variable
access.
Make sure that all resources have been set up before registering the
efivars.
Fixes: 6612103ec35a ("firmware: qcom: qseecom: convert to using the TZ allocator")
Cc: stable(a)vger.kernel.org # 6.11
Cc: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
Note that commit 40289e35ca52 ("firmware: qcom: scm: enable the TZ mem
allocator") looks equally broken as it allocates the tzmem pool only
after qcom_scm_is_available() returns true and other driver can start
making SCM calls.
That one appears to be a bit harder to fix as qcom_tzmem_enable()
currently depends on SCM being available, but someone should definitely
look into untangling that mess.
Johan
.../firmware/qcom/qcom_qseecom_uefisecapp.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/firmware/qcom/qcom_qseecom_uefisecapp.c b/drivers/firmware/qcom/qcom_qseecom_uefisecapp.c
index 447246bd04be..98a463e9774b 100644
--- a/drivers/firmware/qcom/qcom_qseecom_uefisecapp.c
+++ b/drivers/firmware/qcom/qcom_qseecom_uefisecapp.c
@@ -814,15 +814,6 @@ static int qcom_uefisecapp_probe(struct auxiliary_device *aux_dev,
qcuefi->client = container_of(aux_dev, struct qseecom_client, aux_dev);
- auxiliary_set_drvdata(aux_dev, qcuefi);
- status = qcuefi_set_reference(qcuefi);
- if (status)
- return status;
-
- status = efivars_register(&qcuefi->efivars, &qcom_efivar_ops);
- if (status)
- qcuefi_set_reference(NULL);
-
memset(&pool_config, 0, sizeof(pool_config));
pool_config.initial_size = SZ_4K;
pool_config.policy = QCOM_TZMEM_POLICY_MULTIPLIER;
@@ -833,6 +824,15 @@ static int qcom_uefisecapp_probe(struct auxiliary_device *aux_dev,
if (IS_ERR(qcuefi->mempool))
return PTR_ERR(qcuefi->mempool);
+ auxiliary_set_drvdata(aux_dev, qcuefi);
+ status = qcuefi_set_reference(qcuefi);
+ if (status)
+ return status;
+
+ status = efivars_register(&qcuefi->efivars, &qcom_efivar_ops);
+ if (status)
+ qcuefi_set_reference(NULL);
+
return status;
}
--
2.45.2
Changes in v10:
- Updated the commit log of patch #1 to make the reasoning - that it makes
applying the subsequent patch cleaner/nicer clear - Bjorn
- Substantially rewrites final patch commit to mostly reflect Bjorn's
summation of my long and rambling previous paragraphs.
Being a visual person, I've included some example pseudo-code which
hopefully makes the intent clearer plus some ASCII art >= Klimt.
- Link to v9: https://lore.kernel.org/r/20241230-b4-linux-next-24-11-18-clock-multiple-po…
Changes in v9:
- Added patch to unwind pm subdomains in reverse order.
It would also be possible to squash this patch into patch#2 but,
my own preference is for more granular patches like this instead of
"slipping in" functional changes in larger patches like #2. - bod
- Unwinding pm subdomain on error in patch #2.
To facilitate this change patch #1 was created - Vlad
- Drops Bjorn's RB on patch #2. There is a small churn in this patch
but enough that a reviewer might reasonably expect RB to be given again.
- Amends commit log for patch #3 further.
v8 added a lot to the commit log to provide further information but, it
is clear from the comments I received on the commit log that the added
verbiage was occlusive not elucidative.
Reduce down the commit log of patch #3 - especially Q&A item #1.
Sometimes less is more.
- Link to v8: https://lore.kernel.org/r/20241211-b4-linux-next-24-11-18-clock-multiple-po…
Changes in v8:
- Picks up change I agreed with Vlad but failed to cherry-pick into my b4
tree - Vlad/Bod
- Rewords the commit log for patch #3. As I read it I decided I might
translate bits of it from thought-stream into English - Bod
- Link to v7: https://lore.kernel.org/r/20241211-b4-linux-next-24-11-18-clock-multiple-po…
Changes in v7:
- Expand commit log in patch #3
I've discussed with Bjorn on IRC and video what to put into the log here
and captured most of what we discussed.
Mostly the point here is voting for voltages in the power-domain list
is up to the drivers to do with performance states/opp-tables not for the
GDSC code. - Bjorn/Bryan
- Link to v6: https://lore.kernel.org/r/20241129-b4-linux-next-24-11-18-clock-multiple-po…
Changes in v6:
- Passes NULL to second parameter of devm_pm_domain_attach_list - Vlad
- Link to v5: https://lore.kernel.org/r/20241128-b4-linux-next-24-11-18-clock-multiple-po…
Changes in v5:
- In-lines devm_pm_domain_attach_list() in probe() directly - Vlad
- Link to v4: https://lore.kernel.org/r/20241127-b4-linux-next-24-11-18-clock-multiple-po…
v4:
- Adds Bjorn's RB to first patch - Bjorn
- Drops the 'd' in "and int" - Bjorn
- Amends commit log of patch 3 to capture a number of open questions -
Bjorn
- Link to v3: https://lore.kernel.org/r/20241126-b4-linux-next-24-11-18-clock-multiple-po…
v3:
- Fixes commit log "per which" - Bryan
- Link to v2: https://lore.kernel.org/r/20241125-b4-linux-next-24-11-18-clock-multiple-po…
v2:
The main change in this version is Bjorn's pointing out that pm_runtime_*
inside of the gdsc_enable/gdsc_disable path would be recursive and cause a
lockdep splat. Dmitry alluded to this too.
Bjorn pointed to stuff being done lower in the gdsc_register() routine that
might be a starting point.
I iterated around that idea and came up with patch #3. When a gdsc has no
parent and the pd_list is non-NULL then attach that orphan GDSC to the
clock controller power-domain list.
Existing subdomain code in gdsc_register() will connect the parent GDSCs in
the clock-controller to the clock-controller subdomain, the new code here
does that same job for a list of power-domains the clock controller depends
on.
To Dmitry's point about MMCX and MCX dependencies for the registers inside
of the clock controller, I have switched off all references in a test dtsi
and confirmed that accessing the clock-controller regs themselves isn't
required.
On the second point I also verified my test branch with lockdep on which
was a concern with the pm_domain version of this solution but I wanted to
cover it anyway with the new approach for completeness sake.
Here's the item-by-item list of changes:
- Adds a patch to capture pm_genpd_add_subdomain() result code - Bryan
- Changes changelog of second patch to remove singleton and generally
to make the commit log easier to understand - Bjorn
- Uses demv_pm_domain_attach_list - Vlad
- Changes error check to if (ret < 0 && ret != -EEXIST) - Vlad
- Retains passing &pd_data instead of NULL - because NULL doesn't do
the same thing - Bryan/Vlad
- Retains standalone function qcom_cc_pds_attach() because the pd_data
enumeration looks neater in a standalone function - Bryan/Vlad
- Drops pm_runtime in favour of gdsc_add_subdomain_list() for each
power-domain in the pd_list.
The pd_list will be whatever is pointed to by power-domains = <>
in the dtsi - Bjorn
- Link to v1: https://lore.kernel.org/r/20241118-b4-linux-next-24-11-18-clock-multiple-po…
v1:
On x1e80100 and its SKUs, the Camera Clock Controller (CAMCC) has
multiple power-domains which power it. Usually, with a single power-domain,
the core platform code will automatically switch on the singleton
power-domain for you. If you have multiple power-domains for a device, in
this case the clock controller, you need to switch those power-domains
on/off yourself.
The clock controllers can also contain Global Distributed
Switch Controllers - GDSCs which themselves can be referenced from dtsi
nodes ultimately triggering a gdsc_en() in drivers/clk/qcom/gdsc.c.
As an example:
  cci0: cci@ac4a000 {
      power-domains = <&camcc TITAN_TOP_GDSC>;
  };
This series adds support for attaching a power-domain list to the
clock controllers and the GDSCs those controllers provide, so that in the
case of the above example gdsc_toggle_logic() will bring the power-domain
list up and down with pm_runtime_resume_and_get() and pm_runtime_put_sync()
respectively.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
Bryan O'Donoghue (4):
clk: qcom: gdsc: Release pm subdomains in reverse add order
clk: qcom: gdsc: Capture pm_genpd_add_subdomain result code
clk: qcom: common: Add support for power-domain attachment
clk: qcom: Support attaching GDSCs to multiple parents
drivers/clk/qcom/common.c | 6 ++++
drivers/clk/qcom/gdsc.c | 75 +++++++++++++++++++++++++++++++++++++++--------
drivers/clk/qcom/gdsc.h | 1 +
3 files changed, 69 insertions(+), 13 deletions(-)
---
base-commit: 0907e7fb35756464aa34c35d6abb02998418164b
change-id: 20241118-b4-linux-next-24-11-18-clock-multiple-power-domains-a5f994dc452a
Best regards,
--
Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
Hi all,
Here's a bunch of hand-ported bug fixes for 6.12 LTS. Most of the
patches fix a warning about dquot reclaim needing to read dquot buffers
in from disk by pinning buffers at transaction commit time instead of
during reclaim.
If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.
With a bit of luck, this should all go splendidly.
Comments and questions are, as always, welcome.
--D
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=ne…
---
Commits in this patchset:
* xfs: avoid nested calls to __xfs_trans_commit
* xfs: don't lose solo superblock counter update transactions
* xfs: don't lose solo dquot update transactions
* xfs: separate dquot buffer reads from xfs_dqflush
* xfs: clean up log item accesses in xfs_qm_dqflush{,_done}
* xfs: attach dquot buffer to dquot log item buffer
* xfs: convert quotacheck to attach dquot buffers
* xfs: don't over-report free space or inodes in statvfs
* xfs: release the dquot buf outside of qli_lock
* xfs: lock dquot buffer before detaching dquot from b_li_list
* xfs: fix mount hang during primary superblock recovery failure
---
fs/xfs/xfs_dquot.h | 6 +
fs/xfs/xfs_dquot_item.h | 7 +
fs/xfs/xfs_quota.h | 7 +
fs/xfs/xfs_buf_item_recover.c | 11 ++
fs/xfs/xfs_dquot.c | 199 +++++++++++++++++++++++++++++++++++------
fs/xfs/xfs_dquot_item.c | 51 ++++++++---
fs/xfs/xfs_qm.c | 48 ++++++++--
fs/xfs/xfs_qm_bhv.c | 27 ++++--
fs/xfs/xfs_trans.c | 39 ++++----
fs/xfs/xfs_trans_ail.c | 2
fs/xfs/xfs_trans_dquot.c | 31 +++++-
11 files changed, 338 insertions(+), 90 deletions(-)