As reported by Coverity, the logic at tpg_precalculate_line()
blindly rescales the buffer even when scaled_witdh is equal to
zero. If this ever happens, this will cause a division by zero.
Instead, add a WARN_ON_ONCE() to trigger such cases and return
without doing any precalculation.
Fixes: 63881df94d3e ("[media] vivid: add the Test Pattern Generator")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
---
drivers/media/common/v4l2-tpg/v4l2-tpg-core.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c b/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c
index c86343a4d0bf..940bfbf275ce 100644
--- a/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c
+++ b/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c
@@ -1795,6 +1795,9 @@ static void tpg_precalculate_line(struct tpg_data *tpg)
unsigned p;
unsigned x;
+ if (WARN_ON_ONCE(!tpg->src_width || !tpg->scaled_width))
+ return;
+
switch (tpg->pattern) {
case TPG_PAT_GREEN:
contrast = TPG_COLOR_100_RED;
--
2.47.0
In loongson_sysconf, The "core" of cores_per_node and cores_per_package
stands for a logical core, which means in a SMT system it stands for a
thread indeed. This information is gotten from SMBIOS Type4 Structure,
so in order to get a correct cores_per_package for both SMT and non-SMT
systems in parse_cpu_table() we should use SMBIOS_THREAD_PACKAGE_OFFSET
instead of SMBIOS_CORE_PACKAGE_OFFSET.
Cc: stable(a)vger.kernel.org
Reported-by: Chao Li <lichao(a)loongson.cn>
Tested-by: Chao Li <lichao(a)loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
arch/loongarch/kernel/setup.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
index 00e307203ddb..cbd3c09a93c1 100644
--- a/arch/loongarch/kernel/setup.c
+++ b/arch/loongarch/kernel/setup.c
@@ -55,6 +55,7 @@
#define SMBIOS_FREQHIGH_OFFSET 0x17
#define SMBIOS_FREQLOW_MASK 0xFF
#define SMBIOS_CORE_PACKAGE_OFFSET 0x23
+#define SMBIOS_THREAD_PACKAGE_OFFSET 0x25
#define LOONGSON_EFI_ENABLE (1 << 3)
unsigned long fw_arg0, fw_arg1, fw_arg2;
@@ -125,7 +126,7 @@ static void __init parse_cpu_table(const struct dmi_header *dm)
cpu_clock_freq = freq_temp * 1000000;
loongson_sysconf.cpuname = (void *)dmi_string_parse(dm, dmi_data[16]);
- loongson_sysconf.cores_per_package = *(dmi_data + SMBIOS_CORE_PACKAGE_OFFSET);
+ loongson_sysconf.cores_per_package = *(dmi_data + SMBIOS_THREAD_PACKAGE_OFFSET);
pr_info("CpuClock = %llu\n", cpu_clock_freq);
}
--
2.43.5
diff --git a/arch/loongarch/include/asm/bootinfo.h b/arch/loongarch/include/asm/bootinfo.h
index 6d5846dd075c..7657e016233f 100644
--- a/arch/loongarch/include/asm/bootinfo.h
+++ b/arch/loongarch/include/asm/bootinfo.h
@@ -26,6 +26,10 @@ struct loongson_board_info {
#define NR_WORDS DIV_ROUND_UP(NR_CPUS, BITS_PER_LONG)
+/*
+ * The "core" of cores_per_node and cores_per_package stands for a
+ * logical core, which means in a SMT system it stands for a thread.
+ */
struct loongson_system_configuration {
int nr_cpus;
int nr_nodes;
Hi, Conor
> On Thu, Oct 17, 2024 at 05:49:56AM +0000, Changhuang Liang wrote:
> > Hi, Conor,
> >
> > > Hi, Conor
> > >
> > > Thanks for your patch.
> > >
> > > > From: Conor Dooley <conor.dooley(a)microchip.com>
> > > >
> > > > Aurelien reported probe failures due to the csi node being enabled
> > > > without having a camera attached to it. A camera was in the
> > > > initial submissions, but was removed from the dts, as it had not
> > > > actually been present on the board, but was from an addon board
> > > > used by the developer
> > > of the relevant drivers.
> > > > The non-camera pipeline nodes were not disabled when this happened
> > > > and the probe failures are problematic for Debian. Disable them.
> > > >
> > > > CC: stable(a)vger.kernel.org
> > > > Fixes: 28ecaaa5af192 ("riscv: dts: starfive: jh7110: Add camera
> > > > subsystem
> > > > nodes")
> > >
> > > Here you write it in 13 characters, should be "Fixes: 28ecaaa5af19 ..."
> > >
> >
> > After fixing this:
> > Reviewed-by: Changhuang Liang <changhuang.liang(a)starfivetech.com>
>
> Ye, I know it was 13 not 12. I don't think that's a problem though.
Okay, that's fine.
Best Regards,
Changhuang
The patch titled
Subject: resource,kexec: walk_system_ram_res_rev must retain resource flags
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
resourcekexec-walk_system_ram_res_rev-must-retain-resource-flags.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Gregory Price <gourry(a)gourry.net>
Subject: resource,kexec: walk_system_ram_res_rev must retain resource flags
Date: Thu, 17 Oct 2024 15:03:47 -0400
walk_system_ram_res_rev() erroneously discards resource flags when passing
the information to the callback.
This causes systems with IORESOURCE_SYSRAM_DRIVER_MANAGED memory to have
these resources selected during kexec to store kexec buffers if that
memory happens to be at placed above normal system ram.
This leads to undefined behavior after reboot. If the kexec buffer is
never touched, nothing happens. If the kexec buffer is touched, it could
lead to a crash (like below) or undefined behavior.
Tested on a system with CXL memory expanders with driver managed memory,
TPM enabled, and CONFIG_IMA_KEXEC=y. Adding printk's showed the flags
were being discarded and as a result the check for
IORESOURCE_SYSRAM_DRIVER_MANAGED passes.
find_next_iomem_res: name(System RAM (kmem))
start(10000000000)
end(1034fffffff)
flags(83000200)
locate_mem_hole_top_down: start(10000000000) end(1034fffffff) flags(0)
[.] BUG: unable to handle page fault for address: ffff89834ffff000
[.] #PF: supervisor read access in kernel mode
[.] #PF: error_code(0x0000) - not-present page
[.] PGD c04c8bf067 P4D c04c8bf067 PUD c04c8be067 PMD 0
[.] Oops: 0000 [#1] SMP
[.] RIP: 0010:ima_restore_measurement_list+0x95/0x4b0
[.] RSP: 0018:ffffc900000d3a80 EFLAGS: 00010286
[.] RAX: 0000000000001000 RBX: 0000000000000000 RCX: ffff89834ffff000
[.] RDX: 0000000000000018 RSI: ffff89834ffff000 RDI: ffff89834ffff018
[.] RBP: ffffc900000d3ba0 R08: 0000000000000020 R09: ffff888132b8a900
[.] R10: 4000000000000000 R11: 000000003a616d69 R12: 0000000000000000
[.] R13: ffffffff8404ac28 R14: 0000000000000000 R15: ffff89834ffff000
[.] FS: 0000000000000000(0000) GS:ffff893d44640000(0000) knlGS:0000000000000000
[.] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[.] ata5: SATA link down (SStatus 0 SControl 300)
[.] CR2: ffff89834ffff000 CR3: 000001034d00f001 CR4: 0000000000770ef0
[.] PKRU: 55555554
[.] Call Trace:
[.] <TASK>
[.] ? __die+0x78/0xc0
[.] ? page_fault_oops+0x2a8/0x3a0
[.] ? exc_page_fault+0x84/0x130
[.] ? asm_exc_page_fault+0x22/0x30
[.] ? ima_restore_measurement_list+0x95/0x4b0
[.] ? template_desc_init_fields+0x317/0x410
[.] ? crypto_alloc_tfm_node+0x9c/0xc0
[.] ? init_ima_lsm+0x30/0x30
[.] ima_load_kexec_buffer+0x72/0xa0
[.] ima_init+0x44/0xa0
[.] __initstub__kmod_ima__373_1201_init_ima7+0x1e/0xb0
[.] ? init_ima_lsm+0x30/0x30
[.] do_one_initcall+0xad/0x200
[.] ? idr_alloc_cyclic+0xaa/0x110
[.] ? new_slab+0x12c/0x420
[.] ? new_slab+0x12c/0x420
[.] ? number+0x12a/0x430
[.] ? sysvec_apic_timer_interrupt+0xa/0x80
[.] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[.] ? parse_args+0xd4/0x380
[.] ? parse_args+0x14b/0x380
[.] kernel_init_freeable+0x1c1/0x2b0
[.] ? rest_init+0xb0/0xb0
[.] kernel_init+0x16/0x1a0
[.] ret_from_fork+0x2f/0x40
[.] ? rest_init+0xb0/0xb0
[.] ret_from_fork_asm+0x11/0x20
[.] </TASK>
Link: https://lore.kernel.org/all/20231114091658.228030-1-bhe@redhat.com/
Link: https://lkml.kernel.org/r/20241017190347.5578-1-gourry@gourry.net
Fixes: 7acf164b259d ("resource: add walk_system_ram_res_rev()")
Signed-off-by: Gregory Price <gourry(a)gourry.net>
Cc: AKASHI Takahiro <takahiro.akashi(a)linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Cc: Baoquan he <bhe(a)redhat.com>
Cc: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Ilpo J��rvinen <ilpo.jarvinen(a)linux.intel.com>
Cc: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/resource.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
--- a/kernel/resource.c~resourcekexec-walk_system_ram_res_rev-must-retain-resource-flags
+++ a/kernel/resource.c
@@ -459,9 +459,7 @@ int walk_system_ram_res_rev(u64 start, u
rams_size += 16;
}
- rams[i].start = res.start;
- rams[i++].end = res.end;
-
+ rams[i++] = res;
start = res.end + 1;
}
_
Patches currently in -mm which might be from gourry(a)gourry.net are
resourcekexec-walk_system_ram_res_rev-must-retain-resource-flags.patch
The patch titled
Subject: nilfs2: fix kernel bug due to missing clearing of checked flag
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-kernel-bug-due-to-missing-clearing-of-checked-flag.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix kernel bug due to missing clearing of checked flag
Date: Fri, 18 Oct 2024 04:33:10 +0900
Syzbot reported that in directory operations after nilfs2 detects
filesystem corruption and degrades to read-only,
__block_write_begin_int(), which is called to prepare block writes, may
fail the BUG_ON check for accesses exceeding the folio/page size,
triggering a kernel bug.
This was found to be because the "checked" flag of a page/folio was not
cleared when it was discarded by nilfs2's own routine, which causes the
sanity check of directory entries to be skipped when the directory
page/folio is reloaded. So, fix that.
This was necessary when the use of nilfs2's own page discard routine was
applied to more than just metadata files.
Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com
Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/page.c | 1 +
1 file changed, 1 insertion(+)
--- a/fs/nilfs2/page.c~nilfs2-fix-kernel-bug-due-to-missing-clearing-of-checked-flag
+++ a/fs/nilfs2/page.c
@@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct foli
folio_clear_uptodate(folio);
folio_clear_mappedtodisk(folio);
+ folio_clear_checked(folio);
head = folio_buffers(folio);
if (head) {
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-kernel-bug-due-to-missing-clearing-of-buffer-delay-flag.patch
nilfs2-fix-kernel-bug-due-to-missing-clearing-of-checked-flag.patch
If some remap_pfn_range() calls succeeded before one failed, we still have
buffer pages mapped into the userspace page tables when we drop the buffer
reference with comedi_buf_map_put(bm). The userspace mappings are only
cleaned up later in the mmap error path.
Fix it by explicitly flushing all mappings in our VMA on the error path.
See commit 79a61cc3fc04 ("mm: avoid leaving partial pfn mappings around in
error case").
Cc: stable(a)vger.kernel.org
Fixes: ed9eccbe8970 ("Staging: add comedi core")
Signed-off-by: Jann Horn <jannh(a)google.com>
---
Note: compile-tested only; I don't actually have comedi hardware, and I
don't know anything about comedi.
---
Changes in v2:
- only do the zapping in the pfnmap path (Ian Abbott)
- use zap_vma_ptes() instead of zap_page_range_single() (Ian Abbott)
- Link to v1: https://lore.kernel.org/r/20241014-comedi-tlb-v1-1-4b699144b438@google.com
---
drivers/comedi/comedi_fops.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/comedi/comedi_fops.c b/drivers/comedi/comedi_fops.c
index 1b481731df96..68e5301e6281 100644
--- a/drivers/comedi/comedi_fops.c
+++ b/drivers/comedi/comedi_fops.c
@@ -2407,6 +2407,16 @@ static int comedi_mmap(struct file *file, struct vm_area_struct *vma)
start += PAGE_SIZE;
}
+
+ /*
+ * Leaving behind a partial mapping of a buffer we're about to
+ * drop is unsafe, see remap_pfn_range_notrack().
+ * We need to zap the range here ourselves instead of relying
+ * on the automatic zapping in remap_pfn_range() because we call
+ * remap_pfn_range() in a loop.
+ */
+ if (retval)
+ zap_vma_ptes(vma, vma->vm_start, size);
}
if (retval == 0) {
---
base-commit: 6485cf5ea253d40d507cd71253c9568c5470cd27
change-id: 20241014-comedi-tlb-400246505961
--
Jann Horn <jannh(a)google.com>
From: Mikulas Patocka <mpatocka(a)redhat.com>
commit 0a9bab391e336489169b95cb0d4553d921302189 upstream.
Tasklets have an inherent problem with memory corruption. The function
tasklet_action_common calls tasklet_trylock, then it calls the tasklet
callback and then it calls tasklet_unlock. If the tasklet callback frees
the structure that contains the tasklet or if it calls some code that may
free it, tasklet_unlock will write into free memory.
The commits 8e14f610159d and d9a02e016aaf try to fix it for dm-crypt, but
it is not a sufficient fix and the data corruption can still happen [1].
There is no fix for dm-verity and dm-verity will write into free memory
with every tasklet-processed bio.
There will be atomic workqueues implemented in the kernel 6.9 [2]. They
will have better interface and they will not suffer from the memory
corruption problem.
But we need something that stops the memory corruption now and that can be
backported to the stable kernels. So, I'm proposing this commit that
disables tasklets in both dm-crypt and dm-verity. This commit doesn't
remove the tasklet support, because the tasklet code will be reused when
atomic workqueues will be implemented.
[1] https://lore.kernel.org/all/d390d7ee-f142-44d3-822a-87949e14608b@suse.de/T/
[2] https://lore.kernel.org/lkml/20240130091300.2968534-1-tj@kernel.org/
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Cc: stable(a)vger.kernel.org
Fixes: 39d42fa96ba1b ("dm crypt: add flags to optionally bypass kcryptd workqueues")
Fixes: 5721d4e5a9cdb ("dm verity: Add optional "try_verify_in_tasklet" feature")
Signed-off-by: Mike Snitzer <snitzer(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
(cherry picked from commit 30884a44e0cedc3dfda8c22432f3ba4078ec2d94)
Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi(a)oracle.com>
---
drivers/md/dm-crypt.c | 37 ++-----------------------------------
1 file changed, 2 insertions(+), 35 deletions(-)
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 9889035c343e3..95b3b69a5e3c4 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -69,10 +69,8 @@ struct dm_crypt_io {
struct bio *base_bio;
u8 *integrity_metadata;
bool integrity_metadata_from_pool:1;
- bool in_tasklet:1;
struct work_struct work;
- struct tasklet_struct tasklet;
struct convert_context ctx;
@@ -1769,7 +1767,6 @@ static void crypt_io_init(struct dm_crypt_io *io, struct crypt_config *cc,
io->ctx.r.req = NULL;
io->integrity_metadata = NULL;
io->integrity_metadata_from_pool = false;
- io->in_tasklet = false;
atomic_set(&io->io_pending, 0);
}
@@ -1778,12 +1775,6 @@ static void crypt_inc_pending(struct dm_crypt_io *io)
atomic_inc(&io->io_pending);
}
-static void kcryptd_io_bio_endio(struct work_struct *work)
-{
- struct dm_crypt_io *io = container_of(work, struct dm_crypt_io, work);
- bio_endio(io->base_bio);
-}
-
/*
* One of the bios was finished. Check for completion of
* the whole request and correctly clean up the buffer.
@@ -1807,20 +1798,6 @@ static void crypt_dec_pending(struct dm_crypt_io *io)
base_bio->bi_status = error;
- /*
- * If we are running this function from our tasklet,
- * we can't call bio_endio() here, because it will call
- * clone_endio() from dm.c, which in turn will
- * free the current struct dm_crypt_io structure with
- * our tasklet. In this case we need to delay bio_endio()
- * execution to after the tasklet is done and dequeued.
- */
- if (io->in_tasklet) {
- INIT_WORK(&io->work, kcryptd_io_bio_endio);
- queue_work(cc->io_queue, &io->work);
- return;
- }
-
bio_endio(base_bio);
}
@@ -2264,11 +2241,6 @@ static void kcryptd_crypt(struct work_struct *work)
kcryptd_crypt_write_convert(io);
}
-static void kcryptd_crypt_tasklet(unsigned long work)
-{
- kcryptd_crypt((struct work_struct *)work);
-}
-
static void kcryptd_queue_crypt(struct dm_crypt_io *io)
{
struct crypt_config *cc = io->cc;
@@ -2280,15 +2252,10 @@ static void kcryptd_queue_crypt(struct dm_crypt_io *io)
* irqs_disabled(): the kernel may run some IO completion from the idle thread, but
* it is being executed with irqs disabled.
*/
- if (in_hardirq() || irqs_disabled()) {
- io->in_tasklet = true;
- tasklet_init(&io->tasklet, kcryptd_crypt_tasklet, (unsigned long)&io->work);
- tasklet_schedule(&io->tasklet);
+ if (!(in_hardirq() || irqs_disabled())) {
+ kcryptd_crypt(&io->work);
return;
}
-
- kcryptd_crypt(&io->work);
- return;
}
INIT_WORK(&io->work, kcryptd_crypt);
--
2.46.0
If there is an error during some initialization related to firmware,
the buffers dp->tx_ring[i].tx_status are released.
However this is released again when the device is unbinded (ath12k_pci),
and we get:
WARNING: CPU: 0 PID: 2098 at mm/slub.c:4689 free_large_kmalloc+0x4d/0x80
Call Trace:
free_large_kmalloc
ath12k_dp_free
ath12k_core_deinit
ath12k_pci_remove
...
The issue is always reproducible from a VM because the MSI addressing
initialization is failing.
In order to fix the issue, just set the buffers to NULL after releasing in
order to avoid the double free.
cc: stable(a)vger.kernel.org
Fixes: d889913205cf ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices")
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm(a)redhat.com>
---
v4:
- Send with cover letter to get reference with 1/2
v3: https://lore.kernel.org/linux-wireless/20241017074854.176765-1-jtornosm@red…
v2: https://lore.kernel.org/linux-wireless/20241016123722.206899-1-jtornosm@red…
v1: https://lore.kernel.org/linux-wireless/20241010175102.207324-3-jtornosm@red…
drivers/net/wireless/ath/ath12k/dp.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/ath/ath12k/dp.c b/drivers/net/wireless/ath/ath12k/dp.c
index 789d430e4455..15061782a2df 100644
--- a/drivers/net/wireless/ath/ath12k/dp.c
+++ b/drivers/net/wireless/ath/ath12k/dp.c
@@ -1277,8 +1277,10 @@ void ath12k_dp_free(struct ath12k_base *ab)
ath12k_dp_rx_reo_cmd_list_cleanup(ab);
- for (i = 0; i < ab->hw_params->max_tx_ring; i++)
+ for (i = 0; i < ab->hw_params->max_tx_ring; i++) {
kfree(dp->tx_ring[i].tx_status);
+ dp->tx_ring[i].tx_status = NULL;
+ }
ath12k_dp_rx_free(ab);
/* Deinit any SOC level resource */
--
2.47.0
If there is an error during some initialization related to firmware,
the function ath12k_dp_cc_cleanup is called to release resources.
However this is released again when the device is unbinded (ath12k_pci),
and we get:
BUG: kernel NULL pointer dereference, address: 0000000000000020
at RIP: 0010:ath12k_dp_cc_cleanup.part.0+0xb6/0x500 [ath12k]
Call Trace:
ath12k_dp_cc_cleanup
ath12k_dp_free
ath12k_core_deinit
ath12k_pci_remove
...
The issue is always reproducible from a VM because the MSI addressing
initialization is failing.
In order to fix the issue, just set to NULL the released structure in
ath12k_dp_cc_cleanup at the end.
cc: stable(a)vger.kernel.org
Fixes: d889913205cf ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices")
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm(a)redhat.com>
---
v4:
- Send with cover letter to get reference with 2/2
v3: https://lore.kernel.org/linux-wireless/20241017074654.176678-1-jtornosm@red…
v2: https://lore.kernel.org/linux-wireless/20241016123452.206671-1-jtornosm@red…
v1: https://lore.kernel.org/linux-wireless/20241010175102.207324-2-jtornosm@red…
drivers/net/wireless/ath/ath12k/dp.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireless/ath/ath12k/dp.c b/drivers/net/wireless/ath/ath12k/dp.c
index 61aa78d8bd8c..789d430e4455 100644
--- a/drivers/net/wireless/ath/ath12k/dp.c
+++ b/drivers/net/wireless/ath/ath12k/dp.c
@@ -1241,6 +1241,7 @@ static void ath12k_dp_cc_cleanup(struct ath12k_base *ab)
}
kfree(dp->spt_info);
+ dp->spt_info = NULL;
}
static void ath12k_dp_reoq_lut_cleanup(struct ath12k_base *ab)
--
2.46.2
Hi, Conor,
> Hi, Conor
>
> Thanks for your patch.
>
> > From: Conor Dooley <conor.dooley(a)microchip.com>
> >
> > Aurelien reported probe failures due to the csi node being enabled
> > without having a camera attached to it. A camera was in the initial
> > submissions, but was removed from the dts, as it had not actually been
> > present on the board, but was from an addon board used by the developer
> of the relevant drivers.
> > The non-camera pipeline nodes were not disabled when this happened and
> > the probe failures are problematic for Debian. Disable them.
> >
> > CC: stable(a)vger.kernel.org
> > Fixes: 28ecaaa5af192 ("riscv: dts: starfive: jh7110: Add camera
> > subsystem
> > nodes")
>
> Here you write it in 13 characters, should be "Fixes: 28ecaaa5af19 ..."
>
After fixing this:
Reviewed-by: Changhuang Liang <changhuang.liang(a)starfivetech.com>
It seems it is necessary to set WILC MAC address after operation mode,
otherwise the MAC address of the WILC MAC is reset back to what is in
nvmem. This causes a failure to associate with AP after the WILC MAC
address was overridden by userspace.
Test case:
"
ap$ cat << EOF > hostap.conf
interface=wlan0
ssid=ssid
hw_mode=g
channel=6
wpa=2
wpa_passphrase=pass
wpa_key_mgmt=WPA-PSK
EOF
ap$ hostapd -d hostap.conf
ap$ ifconfig wlan0 10.0.0.1
"
"
sta$ ifconfig wlan0 hw ether 00:11:22:33:44:55
sta$ wpa_supplicant -i wlan0 -c <(wpa_passphrase ssid pass)
sta$ ifconfig wlan0 10.0.0.2
sta$ ping 10.0.0.1 # fails without this patch
"
AP still indicates SA with original MAC address from nvmem without this patch:
"
nl80211: RX frame da=ff:ff:ff:ff:ff:ff sa=60:01:23:45:67:89 bssid=ff:ff:ff:ff:ff:ff ...
^^^^^^^^^^^^^^^^^
"
Fixes: 83d9b54ee5d4 ("wifi: wilc1000: read MAC address from fuse at probe")
Tested-by: Alexis Lothoré <alexis.lothore(a)bootlin.com>
Signed-off-by: Marek Vasut <marex(a)denx.de>
---
Cc: Ajay Singh <ajay.kathat(a)microchip.com>
Cc: Alexis Lothoré <alexis.lothore(a)bootlin.com>
Cc: Claudiu Beznea <claudiu.beznea(a)tuxon.dev>
Cc: Kalle Valo <kvalo(a)kernel.org>
Cc: Marek Vasut <marex(a)denx.de>
Cc: linux-wireless(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
---
V2: Add TB, Fixes, CC Stable
---
drivers/net/wireless/microchip/wilc1000/netdev.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/wireless/microchip/wilc1000/netdev.c b/drivers/net/wireless/microchip/wilc1000/netdev.c
index 9ecf3fb29b558..8bc127c5a538c 100644
--- a/drivers/net/wireless/microchip/wilc1000/netdev.c
+++ b/drivers/net/wireless/microchip/wilc1000/netdev.c
@@ -608,6 +608,9 @@ static int wilc_mac_open(struct net_device *ndev)
return ret;
}
+ wilc_set_operation_mode(vif, wilc_get_vif_idx(vif), vif->iftype,
+ vif->idx);
+
netdev_dbg(ndev, "Mac address: %pM\n", ndev->dev_addr);
ret = wilc_set_mac_address(vif, ndev->dev_addr);
if (ret) {
@@ -618,9 +621,6 @@ static int wilc_mac_open(struct net_device *ndev)
return ret;
}
- wilc_set_operation_mode(vif, wilc_get_vif_idx(vif), vif->iftype,
- vif->idx);
-
mgmt_regs.interface_stypes = vif->mgmt_reg_stypes;
/* so we detect a change */
vif->mgmt_reg_stypes = 0;
--
2.45.2
From: Conor Dooley <conor.dooley(a)microchip.com>
Aurelien reported probe failures due to the csi node being enabled
without having a camera attached to it. A camera was in the initial
submissions, but was removed from the dts, as it had not actually been
present on the board, but was from an addon board used by the
developer of the relevant drivers. The non-camera pipeline nodes were
not disabled when this happened and the probe failures are problematic
for Debian. Disable them.
CC: stable(a)vger.kernel.org
Fixes: 28ecaaa5af192 ("riscv: dts: starfive: jh7110: Add camera subsystem nodes")
Closes: https://lore.kernel.org/all/Zw1-vcN4CoVkfLjU@aurel32.net/
Reported-by: Aurelien Jarno <aurelien(a)aurel32.net>
Signed-off-by: Conor Dooley <conor.dooley(a)microchip.com>
---
CC: Emil Renner Berthing <kernel(a)esmil.dk>
CC: Rob Herring <robh(a)kernel.org>
CC: Krzysztof Kozlowski <krzk+dt(a)kernel.org>
CC: Conor Dooley <conor+dt(a)kernel.org>
CC: Changhuang Liang <changhuang.liang(a)starfivetech.com>
CC: devicetree(a)vger.kernel.org
CC: linux-riscv(a)lists.infradead.org
CC: linux-kernel(a)vger.kernel.org
---
arch/riscv/boot/dts/starfive/jh7110-common.dtsi | 2 --
1 file changed, 2 deletions(-)
diff --git a/arch/riscv/boot/dts/starfive/jh7110-common.dtsi b/arch/riscv/boot/dts/starfive/jh7110-common.dtsi
index c7771b3b64758..d6c55f1cc96a9 100644
--- a/arch/riscv/boot/dts/starfive/jh7110-common.dtsi
+++ b/arch/riscv/boot/dts/starfive/jh7110-common.dtsi
@@ -128,7 +128,6 @@ &camss {
assigned-clocks = <&ispcrg JH7110_ISPCLK_DOM4_APB_FUNC>,
<&ispcrg JH7110_ISPCLK_MIPI_RX0_PXL>;
assigned-clock-rates = <49500000>, <198000000>;
- status = "okay";
ports {
#address-cells = <1>;
@@ -151,7 +150,6 @@ camss_from_csi2rx: endpoint {
&csi2rx {
assigned-clocks = <&ispcrg JH7110_ISPCLK_VIN_SYS>;
assigned-clock-rates = <297000000>;
- status = "okay";
ports {
#address-cells = <1>;
--
2.45.2
The IMAGE_DLLCHARACTERISTICS_NX_COMPAT informs the firmware that the
EFI binary does not rely on pages that are both executable and
writable.
The flag is used by some distro versions of GRUB to decide if the EFI
binary may be executed.
As the Linux kernel neither has RWX sections nor needs RWX pages for
relocation we should set the flag.
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt(a)canonical.com>
---
arch/riscv/kernel/efi-header.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/riscv/kernel/efi-header.S b/arch/riscv/kernel/efi-header.S
index 515b2dfbca75..c5f17c2710b5 100644
--- a/arch/riscv/kernel/efi-header.S
+++ b/arch/riscv/kernel/efi-header.S
@@ -64,7 +64,7 @@ extra_header_fields:
.long efi_header_end - _start // SizeOfHeaders
.long 0 // CheckSum
.short IMAGE_SUBSYSTEM_EFI_APPLICATION // Subsystem
- .short 0 // DllCharacteristics
+ .short IMAGE_DLL_CHARACTERISTICS_NX_COMPAT // DllCharacteristics
.quad 0 // SizeOfStackReserve
.quad 0 // SizeOfStackCommit
.quad 0 // SizeOfHeapReserve
--
2.45.2
The existing code moves VF to the same namespace as the synthetic NIC
during netvsc_register_vf(). But, if the synthetic device is moved to a
new namespace after the VF registration, the VF won't be moved together.
To make the behavior more consistent, add a namespace check for synthetic
NIC's NETDEV_REGISTER event (generated during its move), and move the VF
if it is not in the same namespace.
Cc: stable(a)vger.kernel.org
Fixes: c0a41b887ce6 ("hv_netvsc: move VF to same namespace as netvsc device")
Suggested-by: Stephen Hemminger <stephen(a)networkplumber.org>
Signed-off-by: Haiyang Zhang <haiyangz(a)microsoft.com>
---
v2: Move my fix to synthetic NIC's NETDEV_REGISTER event as suggested by Stephen.
---
drivers/net/hyperv/netvsc_drv.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 153b97f8ec0d..54e98356ee93 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2798,6 +2798,30 @@ static struct hv_driver netvsc_drv = {
},
};
+/* Set VF's namespace same as the synthetic NIC */
+static void netvsc_event_set_vf_ns(struct net_device *ndev)
+{
+ struct net_device_context *ndev_ctx = netdev_priv(ndev);
+ struct net_device *vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
+ int ret;
+
+ if (!vf_netdev)
+ return;
+
+ if (!net_eq(dev_net(ndev), dev_net(vf_netdev))) {
+ ret = dev_change_net_namespace(vf_netdev, dev_net(ndev),
+ "eth%d");
+ if (ret)
+ netdev_err(vf_netdev,
+ "Cannot move to same namespace as %s: %d\n",
+ ndev->name, ret);
+ else
+ netdev_info(vf_netdev,
+ "Moved VF to namespace with: %s\n",
+ ndev->name);
+ }
+}
+
/*
* On Hyper-V, every VF interface is matched with a corresponding
* synthetic interface. The synthetic interface is presented first
@@ -2810,6 +2834,11 @@ static int netvsc_netdev_event(struct notifier_block *this,
struct net_device *event_dev = netdev_notifier_info_to_dev(ptr);
int ret = 0;
+ if (event_dev->netdev_ops == &device_ops && event == NETDEV_REGISTER) {
+ netvsc_event_set_vf_ns(event_dev);
+ return NOTIFY_DONE;
+ }
+
ret = check_dev_is_matching_vf(event_dev);
if (ret != 0)
return NOTIFY_DONE;
--
2.34.1
The quilt patch titled
Subject: maple_tree: correct tree corruption on spanning store
has been removed from the -mm tree. Its filename was
maple_tree-correct-tree-corruption-on-spanning-store.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Subject: maple_tree: correct tree corruption on spanning store
Date: Mon, 7 Oct 2024 16:28:32 +0100
Patch series "maple_tree: correct tree corruption on spanning store", v3.
There has been a nasty yet subtle maple tree corruption bug that appears
to have been in existence since the inception of the algorithm.
This bug seems far more likely to happen since commit f8d112a4e657
("mm/mmap: avoid zeroing vma tree in mmap_region()"), which is the point
at which reports started to be submitted concerning this bug.
We were made definitely aware of the bug thanks to the kind efforts of
Bert Karwatzki who helped enormously in my being able to track this down
and identify the cause of it.
The bug arises when an attempt is made to perform a spanning store across
two leaf nodes, where the right leaf node is the rightmost child of the
shared parent, AND the store completely consumes the right-mode node.
This results in mas_wr_spanning_store() mitakenly duplicating the new and
existing entries at the maximum pivot within the range, and thus maple
tree corruption.
The fix patch corrects this by detecting this scenario and disallowing the
mistaken duplicate copy.
The fix patch commit message goes into great detail as to how this occurs.
This series also includes a test which reliably reproduces the issue, and
asserts that the fix works correctly.
Bert has kindly tested the fix and confirmed it resolved his issues. Also
Mikhail Gavrilov kindly reported what appears to be precisely the same
bug, which this fix should also resolve.
This patch (of 2):
There has been a subtle bug present in the maple tree implementation from
its inception.
This arises from how stores are performed - when a store occurs, it will
overwrite overlapping ranges and adjust the tree as necessary to
accommodate this.
A range may always ultimately span two leaf nodes. In this instance we
walk the two leaf nodes, determine which elements are not overwritten to
the left and to the right of the start and end of the ranges respectively
and then rebalance the tree to contain these entries and the newly
inserted one.
This kind of store is dubbed a 'spanning store' and is implemented by
mas_wr_spanning_store().
In order to reach this stage, mas_store_gfp() invokes
mas_wr_preallocate(), mas_wr_store_type() and mas_wr_walk() in turn to
walk the tree and update the object (mas) to traverse to the location
where the write should be performed, determining its store type.
When a spanning store is required, this function returns false stopping at
the parent node which contains the target range, and mas_wr_store_type()
marks the mas->store_type as wr_spanning_store to denote this fact.
When we go to perform the store in mas_wr_spanning_store(), we first
determine the elements AFTER the END of the range we wish to store (that
is, to the right of the entry to be inserted) - we do this by walking to
the NEXT pivot in the tree (i.e. r_mas.last + 1), starting at the node we
have just determined contains the range over which we intend to write.
We then turn our attention to the entries to the left of the entry we are
inserting, whose state is represented by l_mas, and copy these into a 'big
node', which is a special node which contains enough slots to contain two
leaf node's worth of data.
We then copy the entry we wish to store immediately after this - the copy
and the insertion of the new entry is performed by mas_store_b_node().
After this we copy the elements to the right of the end of the range which
we are inserting, if we have not exceeded the length of the node (i.e.
r_mas.offset <= r_mas.end).
Herein lies the bug - under very specific circumstances, this logic can
break and corrupt the maple tree.
Consider the following tree:
Height
0 Root Node
/ \
pivot = 0xffff / \ pivot = ULONG_MAX
/ \
1 A [-----] ...
/ \
pivot = 0x4fff / \ pivot = 0xffff
/ \
2 (LEAVES) B [-----] [-----] C
^--- Last pivot 0xffff.
Now imagine we wish to store an entry in the range [0x4000, 0xffff] (note
that all ranges expressed in maple tree code are inclusive):
1. mas_store_gfp() descends the tree, finds node A at <=0xffff, then
determines that this is a spanning store across nodes B and C. The mas
state is set such that the current node from which we traverse further
is node A.
2. In mas_wr_spanning_store() we try to find elements to the right of pivot
0xffff by searching for an index of 0x10000:
- mas_wr_walk_index() invokes mas_wr_walk_descend() and
mas_wr_node_walk() in turn.
- mas_wr_node_walk() loops over entries in node A until EITHER it
finds an entry whose pivot equals or exceeds 0x10000 OR it
reaches the final entry.
- Since no entry has a pivot equal to or exceeding 0x10000, pivot
0xffff is selected, leading to node C.
- mas_wr_walk_traverse() resets the mas state to traverse node C. We
loop around and invoke mas_wr_walk_descend() and mas_wr_node_walk()
in turn once again.
- Again, we reach the last entry in node C, which has a pivot of
0xffff.
3. We then copy the elements to the left of 0x4000 in node B to the big
node via mas_store_b_node(), and insert the new [0x4000, 0xffff] entry
too.
4. We determine whether we have any entries to copy from the right of the
end of the range via - and with r_mas set up at the entry at pivot
0xffff, r_mas.offset <= r_mas.end, and then we DUPLICATE the entry at
pivot 0xffff.
5. BUG! The maple tree is corrupted with a duplicate entry.
This requires a very specific set of circumstances - we must be spanning
the last element in a leaf node, which is the last element in the parent
node.
spanning store across two leaf nodes with a range that ends at that shared
pivot.
A potential solution to this problem would simply be to reset the walk
each time we traverse r_mas, however given the rarity of this situation it
seems that would be rather inefficient.
Instead, this patch detects if the right hand node is populated, i.e. has
anything we need to copy.
We do so by only copying elements from the right of the entry being
inserted when the maximum value present exceeds the last, rather than
basing this on offset position.
The patch also updates some comments and eliminates the unused bool return
value in mas_wr_walk_index().
The work performed in commit f8d112a4e657 ("mm/mmap: avoid zeroing vma
tree in mmap_region()") seems to have made the probability of this event
much more likely, which is the point at which reports started to be
submitted concerning this bug.
The motivation for this change arose from Bert Karwatzki's report of
encountering mm instability after the release of kernel v6.12-rc1 which,
after the use of CONFIG_DEBUG_VM_MAPLE_TREE and similar configuration
options, was identified as maple tree corruption.
After Bert very generously provided his time and ability to reproduce this
event consistently, I was able to finally identify that the issue
discussed in this commit message was occurring for him.
Link: https://lkml.kernel.org/r/cover.1728314402.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/48b349a2a0f7c76e18772712d0997a5e12ab0a3b.17283144…
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reported-by: Bert Karwatzki <spasswolf(a)web.de>
Closes: https://lore.kernel.org/all/20241001023402.3374-1-spasswolf@web.de/
Tested-by: Bert Karwatzki <spasswolf(a)web.de>
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov(a)gmail.com>
Closes: https://lore.kernel.org/all/CABXGCsOPwuoNOqSMmAvWO2Fz4TEmPnjFj-b7iF+XFRu1h7…
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov(a)gmail.com>
Reviewed-by: Wei Yang <richard.weiyang(a)gmail.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Sidhartha Kumar <sidhartha.kumar(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
--- a/lib/maple_tree.c~maple_tree-correct-tree-corruption-on-spanning-store
+++ a/lib/maple_tree.c
@@ -2196,6 +2196,8 @@ static inline void mas_node_or_none(stru
/*
* mas_wr_node_walk() - Find the correct offset for the index in the @mas.
+ * If @mas->index cannot be found within the containing
+ * node, we traverse to the last entry in the node.
* @wr_mas: The maple write state
*
* Uses mas_slot_locked() and does not need to worry about dead nodes.
@@ -3532,7 +3534,7 @@ static bool mas_wr_walk(struct ma_wr_sta
return true;
}
-static bool mas_wr_walk_index(struct ma_wr_state *wr_mas)
+static void mas_wr_walk_index(struct ma_wr_state *wr_mas)
{
struct ma_state *mas = wr_mas->mas;
@@ -3541,11 +3543,9 @@ static bool mas_wr_walk_index(struct ma_
wr_mas->content = mas_slot_locked(mas, wr_mas->slots,
mas->offset);
if (ma_is_leaf(wr_mas->type))
- return true;
+ return;
mas_wr_walk_traverse(wr_mas);
-
}
- return true;
}
/*
* mas_extend_spanning_null() - Extend a store of a %NULL to include surrounding %NULLs.
@@ -3765,8 +3765,8 @@ static noinline void mas_wr_spanning_sto
memset(&b_node, 0, sizeof(struct maple_big_node));
/* Copy l_mas and store the value in b_node. */
mas_store_b_node(&l_wr_mas, &b_node, l_mas.end);
- /* Copy r_mas into b_node. */
- if (r_mas.offset <= r_mas.end)
+ /* Copy r_mas into b_node if there is anything to copy. */
+ if (r_mas.max > r_mas.last)
mas_mab_cp(&r_mas, r_mas.offset, r_mas.end,
&b_node, b_node.b_end + 1);
else
_
Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are
fork-do-not-invoke-uffd-on-fork-if-error-occurs.patch
fork-only-invoke-khugepaged-ksm-hooks-if-no-error.patch
selftests-mm-add-pkey_sighandler_xx-hugetlb_dio-to-gitignore.patch
mm-refactor-mm_access-to-not-return-null.patch
mm-refactor-mm_access-to-not-return-null-fix.patch
mm-madvise-unrestrict-process_madvise-for-current-process.patch
maple_tree-do-not-hash-pointers-on-dump-in-debug-mode.patch
misc_minor_alloc was allocating id using ida for minor only in case of
MISC_DYNAMIC_MINOR but misc_minor_free was always freeing ids
using ida_free causing a mismatch and following warn:
> > WARNING: CPU: 0 PID: 159 at lib/idr.c:525 ida_free+0x3e0/0x41f
> > ida_free called for id=127 which is not allocated.
> > <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
...
> > [<60941eb4>] ida_free+0x3e0/0x41f
> > [<605ac993>] misc_minor_free+0x3e/0xbc
> > [<605acb82>] misc_deregister+0x171/0x1b3
misc_minor_alloc is changed to allocate id from ida for all minors
falling in the range of dynamic/ misc dynamic minors
Fixes: ab760791c0cf ("char: misc: Increase the maximum number of dynamic misc devices to 1048448")
Signed-off-by: Vimal Agrawal <avimalin(a)gmail.com>
Cc: stable(a)vger.kernel.org
---
v2: Added Fixes:
added missed case for static minor in misc_minor_alloc
v3: Removed kunit changes as that will be added as second patch in this two patch series
v4: Updated Signed-off-by: to match from:
drivers/char/misc.c | 39 ++++++++++++++++++++++++++++++---------
1 file changed, 30 insertions(+), 9 deletions(-)
diff --git a/drivers/char/misc.c b/drivers/char/misc.c
index 541edc26ec89..2cf595d2e10b 100644
--- a/drivers/char/misc.c
+++ b/drivers/char/misc.c
@@ -63,16 +63,30 @@ static DEFINE_MUTEX(misc_mtx);
#define DYNAMIC_MINORS 128 /* like dynamic majors */
static DEFINE_IDA(misc_minors_ida);
-static int misc_minor_alloc(void)
+static int misc_minor_alloc(int minor)
{
- int ret;
-
- ret = ida_alloc_max(&misc_minors_ida, DYNAMIC_MINORS - 1, GFP_KERNEL);
- if (ret >= 0) {
- ret = DYNAMIC_MINORS - ret - 1;
+ int ret = 0;
+
+ if (minor == MISC_DYNAMIC_MINOR) {
+ /* allocate free id */
+ ret = ida_alloc_max(&misc_minors_ida, DYNAMIC_MINORS - 1, GFP_KERNEL);
+ if (ret >= 0) {
+ ret = DYNAMIC_MINORS - ret - 1;
+ } else {
+ ret = ida_alloc_range(&misc_minors_ida, MISC_DYNAMIC_MINOR + 1,
+ MINORMASK, GFP_KERNEL);
+ }
} else {
- ret = ida_alloc_range(&misc_minors_ida, MISC_DYNAMIC_MINOR + 1,
- MINORMASK, GFP_KERNEL);
+ /* specific minor, check if it is in dynamic or misc dynamic range */
+ if (minor < DYNAMIC_MINORS) {
+ minor = DYNAMIC_MINORS - minor - 1;
+ ret = ida_alloc_range(&misc_minors_ida, minor, minor, GFP_KERNEL);
+ } else if (minor > MISC_DYNAMIC_MINOR) {
+ ret = ida_alloc_range(&misc_minors_ida, minor, minor, GFP_KERNEL);
+ } else {
+ /* case of non-dynamic minors, no need to allocate id */
+ ret = 0;
+ }
}
return ret;
}
@@ -219,7 +233,7 @@ int misc_register(struct miscdevice *misc)
mutex_lock(&misc_mtx);
if (is_dynamic) {
- int i = misc_minor_alloc();
+ int i = misc_minor_alloc(misc->minor);
if (i < 0) {
err = -EBUSY;
@@ -228,6 +242,7 @@ int misc_register(struct miscdevice *misc)
misc->minor = i;
} else {
struct miscdevice *c;
+ int i;
list_for_each_entry(c, &misc_list, list) {
if (c->minor == misc->minor) {
@@ -235,6 +250,12 @@ int misc_register(struct miscdevice *misc)
goto out;
}
}
+
+ i = misc_minor_alloc(misc->minor);
+ if (i < 0) {
+ err = -EBUSY;
+ goto out;
+ }
}
dev = MKDEV(MISC_MAJOR, misc->minor);
--
2.17.1
misc_minor_alloc was allocating id using ida for minor only in case of
MISC_DYNAMIC_MINOR but misc_minor_free was always freeing ids
using ida_free causing a mismatch and following warn:
> > WARNING: CPU: 0 PID: 159 at lib/idr.c:525 ida_free+0x3e0/0x41f
> > ida_free called for id=127 which is not allocated.
> > <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
...
> > [<60941eb4>] ida_free+0x3e0/0x41f
> > [<605ac993>] misc_minor_free+0x3e/0xbc
> > [<605acb82>] misc_deregister+0x171/0x1b3
misc_minor_alloc is changed to allocate id from ida for all minors
falling in the range of dynamic/ misc dynamic minors
Fixes: ab760791c0cf ("char: misc: Increase the maximum number of dynamic misc devices to 1048448")
Signed-off-by: Vimal Agrawal <vimal.agrawal(a)sophos.com>
Cc: stable(a)vger.kernel.org
---
v2: Added Fixes:
added missed case for static minor in misc_minor_alloc
v3: removed kunit changes as that will be added as second patch in this two patch series
drivers/char/misc.c | 35 ++++++++++++++++++++++++++++-------
1 file changed, 28 insertions(+), 7 deletions(-)
diff --git a/drivers/char/misc.c b/drivers/char/misc.c
index 541edc26ec89..9d0cd3459b4f 100644
--- a/drivers/char/misc.c
+++ b/drivers/char/misc.c
@@ -63,16 +63,30 @@ static DEFINE_MUTEX(misc_mtx);
#define DYNAMIC_MINORS 128 /* like dynamic majors */
static DEFINE_IDA(misc_minors_ida);
-static int misc_minor_alloc(void)
+static int misc_minor_alloc(int minor)
{
int ret = 0;
- ret = ida_alloc_max(&misc_minors_ida, DYNAMIC_MINORS - 1, GFP_KERNEL);
- if (ret >= 0) {
- ret = DYNAMIC_MINORS - ret - 1;
- } else {
- ret = ida_alloc_range(&misc_minors_ida, MISC_DYNAMIC_MINOR + 1,
+ if (minor == MISC_DYNAMIC_MINOR) {
+ /* allocate free id */
+ ret = ida_alloc_max(&misc_minors_ida, DYNAMIC_MINORS - 1, GFP_KERNEL);
+ if (ret >= 0) {
+ ret = DYNAMIC_MINORS - ret - 1;
+ } else {
+ ret = ida_alloc_range(&misc_minors_ida, MISC_DYNAMIC_MINOR + 1,
MINORMASK, GFP_KERNEL);
+ }
+ } else {
+ /* specific minor, check if it is in dynamic or misc dynamic range */
+ if (minor < DYNAMIC_MINORS) {
+ minor = DYNAMIC_MINORS - minor - 1;
+ ret = ida_alloc_range(&misc_minors_ida, minor, minor, GFP_KERNEL);
+ } else if (minor > MISC_DYNAMIC_MINOR) {
+ ret = ida_alloc_range(&misc_minors_ida, minor, minor, GFP_KERNEL);
+ } else {
+ /* case of non-dynamic minors, no need to allocate id */
+ ret = 0;
+ }
}
return ret;
}
@@ -219,7 +233,7 @@ int misc_register(struct miscdevice *misc)
mutex_lock(&misc_mtx);
if (is_dynamic) {
- int i = misc_minor_alloc();
+ int i = misc_minor_alloc(misc->minor);
if (i < 0) {
err = -EBUSY;
@@ -228,6 +242,7 @@ int misc_register(struct miscdevice *misc)
misc->minor = i;
} else {
struct miscdevice *c;
+ int i;
list_for_each_entry(c, &misc_list, list) {
if (c->minor == misc->minor) {
@@ -235,6 +250,12 @@ int misc_register(struct miscdevice *misc)
goto out;
}
}
+
+ i = misc_minor_alloc(misc->minor);
+ if (i < 0) {
+ err = -EBUSY;
+ goto out;
+ }
}
dev = MKDEV(MISC_MAJOR, misc->minor);
--
2.17.1
misc_minor_alloc was allocating id using ida for minor only in case of
MISC_DYNAMIC_MINOR but misc_minor_free was always freeing ids
using ida_free causing a mismatch and following warn:
> > WARNING: CPU: 0 PID: 159 at lib/idr.c:525 ida_free+0x3e0/0x41f
> > ida_free called for id=127 which is not allocated.
> > <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
...
> > [<60941eb4>] ida_free+0x3e0/0x41f
> > [<605ac993>] misc_minor_free+0x3e/0xbc
> > [<605acb82>] misc_deregister+0x171/0x1b3
misc_minor_alloc is changed to allocate id from ida for all minors
falling in the range of dynamic/ misc dynamic minors
Fixes: ab760791c0cf ("char: misc: Increase the maximum number of dynamic misc devices to 1048448")
Signed-off-by: Vimal Agrawal <vimal.agrawal(a)sophos.com>
Cc: stable(a)vger.kernel.org
---
v2: Added Fixes:
added missed case for static minor in misc_minor_alloc
v3: removed kunit changes as that will be added as second patch in this two patch series
drivers/char/misc.c | 35 ++++++++++++++++++++++++++++-------
1 file changed, 28 insertions(+), 7 deletions(-)
diff --git a/drivers/char/misc.c b/drivers/char/misc.c
index 541edc26ec89..9d0cd3459b4f 100644
--- a/drivers/char/misc.c
+++ b/drivers/char/misc.c
@@ -63,16 +63,30 @@ static DEFINE_MUTEX(misc_mtx);
#define DYNAMIC_MINORS 128 /* like dynamic majors */
static DEFINE_IDA(misc_minors_ida);
-static int misc_minor_alloc(void)
+static int misc_minor_alloc(int minor)
{
int ret;
- ret = ida_alloc_max(&misc_minors_ida, DYNAMIC_MINORS - 1, GFP_KERNEL);
- if (ret >= 0) {
- ret = DYNAMIC_MINORS - ret - 1;
- } else {
- ret = ida_alloc_range(&misc_minors_ida, MISC_DYNAMIC_MINOR + 1,
+ if (minor == MISC_DYNAMIC_MINOR) {
+ /* allocate free id */
+ ret = ida_alloc_max(&misc_minors_ida, DYNAMIC_MINORS - 1, GFP_KERNEL);
+ if (ret >= 0) {
+ ret = DYNAMIC_MINORS - ret - 1;
+ } else {
+ ret = ida_alloc_range(&misc_minors_ida, MISC_DYNAMIC_MINOR + 1,
MINORMASK, GFP_KERNEL);
+ }
+ } else {
+ /* specific minor, check if it is in dynamic or misc dynamic range */
+ if (minor < DYNAMIC_MINORS) {
+ minor = DYNAMIC_MINORS - minor - 1;
+ ret = ida_alloc_range(&misc_minors_ida, minor, minor, GFP_KERNEL);
+ } else if (minor > MISC_DYNAMIC_MINOR) {
+ ret = ida_alloc_range(&misc_minors_ida, minor, minor, GFP_KERNEL);
+ } else {
+ /* case of non-dynamic minors, no need to allocate id */
+ ret = 0;
+ }
}
return ret;
}
@@ -219,7 +233,7 @@ int misc_register(struct miscdevice *misc)
mutex_lock(&misc_mtx);
if (is_dynamic) {
- int i = misc_minor_alloc();
+ int i = misc_minor_alloc(misc->minor);
if (i < 0) {
err = -EBUSY;
@@ -228,6 +242,7 @@ int misc_register(struct miscdevice *misc)
misc->minor = i;
} else {
struct miscdevice *c;
+ int i;
list_for_each_entry(c, &misc_list, list) {
if (c->minor == misc->minor) {
@@ -235,6 +250,12 @@ int misc_register(struct miscdevice *misc)
goto out;
}
}
+
+ i = misc_minor_alloc(misc->minor);
+ if (i < 0) {
+ err = -EBUSY;
+ goto out;
+ }
}
dev = MKDEV(MISC_MAJOR, misc->minor);
--
2.17.1
If there is an error during some initialization related to firmware,
the buffers dp->tx_ring[i].tx_status are released.
However this is released again when the device is unbinded (ath12k_pci),
and we get:
WARNING: CPU: 0 PID: 2098 at mm/slub.c:4689 free_large_kmalloc+0x4d/0x80
Call Trace:
free_large_kmalloc
ath12k_dp_free
ath12k_core_deinit
ath12k_pci_remove
...
The issue is always reproducible from a VM because the MSI addressing
initialization is failing.
In order to fix the issue, just set the buffers to NULL after releasing in
order to avoid the double free.
cc: stable(a)vger.kernel.org
Fixes: d889913205cf ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices")
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm(a)redhat.com>
---
v3:
- Remove unnecessary check to not free if buffer is NULL.
- Trim backtrace.
- Fix typos.
v2: https://lore.kernel.org/linux-wireless/20241016123722.206899-1-jtornosm@red…
v1: https://lore.kernel.org/linux-wireless/20241010175102.207324-3-jtornosm@red…
drivers/net/wireless/ath/ath12k/dp.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/ath/ath12k/dp.c b/drivers/net/wireless/ath/ath12k/dp.c
index 789d430e4455..15061782a2df 100644
--- a/drivers/net/wireless/ath/ath12k/dp.c
+++ b/drivers/net/wireless/ath/ath12k/dp.c
@@ -1277,8 +1277,10 @@ void ath12k_dp_free(struct ath12k_base *ab)
ath12k_dp_rx_reo_cmd_list_cleanup(ab);
- for (i = 0; i < ab->hw_params->max_tx_ring; i++)
+ for (i = 0; i < ab->hw_params->max_tx_ring; i++) {
kfree(dp->tx_ring[i].tx_status);
+ dp->tx_ring[i].tx_status = NULL;
+ }
ath12k_dp_rx_free(ab);
/* Deinit any SOC level resource */
--
2.47.0
If there is an error during some initialization related to firmware,
the function ath12k_dp_cc_cleanup is called to release resources.
However this is released again when the device is unbinded (ath12k_pci),
and we get:
BUG: kernel NULL pointer dereference, address: 0000000000000020
at RIP: 0010:ath12k_dp_cc_cleanup.part.0+0xb6/0x500 [ath12k]
Call Trace:
ath12k_dp_cc_cleanup
ath12k_dp_free
ath12k_core_deinit
ath12k_pci_remove
...
The issue is always reproducible from a VM because the MSI addressing
initialization is failing.
In order to fix the issue, just set to NULL the released structure in
ath12k_dp_cc_cleanup at the end.
cc: stable(a)vger.kernel.org
Fixes: d889913205cf ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices")
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm(a)redhat.com>
---
v3:
- Trim backtrace.
- Fix typos.
v2: https://lore.kernel.org/linux-wireless/20241016123452.206671-1-jtornosm@red…
v1: https://lore.kernel.org/linux-wireless/20241010175102.207324-2-jtornosm@red…
drivers/net/wireless/ath/ath12k/dp.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireless/ath/ath12k/dp.c b/drivers/net/wireless/ath/ath12k/dp.c
index 61aa78d8bd8c..789d430e4455 100644
--- a/drivers/net/wireless/ath/ath12k/dp.c
+++ b/drivers/net/wireless/ath/ath12k/dp.c
@@ -1241,6 +1241,7 @@ static void ath12k_dp_cc_cleanup(struct ath12k_base *ab)
}
kfree(dp->spt_info);
+ dp->spt_info = NULL;
}
static void ath12k_dp_reoq_lut_cleanup(struct ath12k_base *ab)
--
2.46.2
The quilt patch titled
Subject: mm/mglru: only clear kswapd_failures if reclaimable
has been removed from the -mm tree. Its filename was
mm-mglru-only-clear-kswapd_failures-if-reclaimable.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Wei Xu <weixugc(a)google.com>
Subject: mm/mglru: only clear kswapd_failures if reclaimable
Date: Mon, 14 Oct 2024 22:12:11 +0000
lru_gen_shrink_node() unconditionally clears kswapd_failures, which can
prevent kswapd from sleeping and cause 100% kswapd cpu usage even when
kswapd repeatedly fails to make progress in reclaim.
Only clear kswap_failures in lru_gen_shrink_node() if reclaim makes some
progress, similar to shrink_node().
I happened to run into this problem in one of my tests recently. It
requires a combination of several conditions: The allocator needs to
allocate a right amount of pages such that it can wake up kswapd
without itself being OOM killed; there is no memory for kswapd to
reclaim (My test disables swap and cleans page cache first); no other
process frees enough memory at the same time.
Link: https://lkml.kernel.org/r/20241014221211.832591-1-weixugc@google.com
Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Wei Xu <weixugc(a)google.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Brian Geffon <bgeffon(a)google.com>
Cc: Jan Alexander Steffens <heftig(a)archlinux.org>
Cc: Suleiman Souhlal <suleiman(a)google.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/vmscan.c~mm-mglru-only-clear-kswapd_failures-if-reclaimable
+++ a/mm/vmscan.c
@@ -4963,8 +4963,8 @@ static void lru_gen_shrink_node(struct p
blk_finish_plug(&plug);
done:
- /* kswapd should never fail */
- pgdat->kswapd_failures = 0;
+ if (sc->nr_reclaimed > reclaimed)
+ pgdat->kswapd_failures = 0;
}
/******************************************************************************
_
Patches currently in -mm which might be from weixugc(a)google.com are
The quilt patch titled
Subject: mm/swapfile: skip HugeTLB pages for unuse_vma
has been removed from the -mm tree. Its filename was
mm-swapfile-skip-hugetlb-pages-for-unuse_vma.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Liu Shixin <liushixin2(a)huawei.com>
Subject: mm/swapfile: skip HugeTLB pages for unuse_vma
Date: Tue, 15 Oct 2024 09:45:21 +0800
I got a bad pud error and lost a 1GB HugeTLB when calling swapoff. The
problem can be reproduced by the following steps:
1. Allocate an anonymous 1GB HugeTLB and some other anonymous memory.
2. Swapout the above anonymous memory.
3. run swapoff and we will get a bad pud error in kernel message:
mm/pgtable-generic.c:42: bad pud 00000000743d215d(84000001400000e7)
We can tell that pud_clear_bad is called by pud_none_or_clear_bad in
unuse_pud_range() by ftrace. And therefore the HugeTLB pages will never
be freed because we lost it from page table. We can skip HugeTLB pages
for unuse_vma to fix it.
Link: https://lkml.kernel.org/r/20241015014521.570237-1-liushixin2@huawei.com
Fixes: 0fe6e20b9c4c ("hugetlb, rmap: add reverse mapping for hugepage")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Acked-by: Muchun Song <muchun.song(a)linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/swapfile.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/swapfile.c~mm-swapfile-skip-hugetlb-pages-for-unuse_vma
+++ a/mm/swapfile.c
@@ -2313,7 +2313,7 @@ static int unuse_mm(struct mm_struct *mm
mmap_read_lock(mm);
for_each_vma(vmi, vma) {
- if (vma->anon_vma) {
+ if (vma->anon_vma && !is_vm_hugetlb_page(vma)) {
ret = unuse_vma(vma, type);
if (ret)
break;
_
Patches currently in -mm which might be from liushixin2(a)huawei.com are
The quilt patch titled
Subject: mm: don't install PMD mappings when THPs are disabled by the hw/process/vma
has been removed from the -mm tree. Its filename was
mm-dont-install-pmd-mappings-when-thps-are-disabled-by-the-hw-process-vma.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm: don't install PMD mappings when THPs are disabled by the hw/process/vma
Date: Fri, 11 Oct 2024 12:24:45 +0200
We (or rather, readahead logic :) ) might be allocating a THP in the
pagecache and then try mapping it into a process that explicitly disabled
THP: we might end up installing PMD mappings.
This is a problem for s390x KVM, which explicitly remaps all PMD-mapped
THPs to be PTE-mapped in s390_enable_sie()->thp_split_mm(), before
starting the VM.
For example, starting a VM backed on a file system with large folios
supported makes the VM crash when the VM tries accessing such a mapping
using KVM.
Is it also a problem when the HW disabled THP using
TRANSPARENT_HUGEPAGE_UNSUPPORTED? At least on x86 this would be the case
without X86_FEATURE_PSE.
In the future, we might be able to do better on s390x and only disallow
PMD mappings -- what s390x and likely TRANSPARENT_HUGEPAGE_UNSUPPORTED
really wants. For now, fix it by essentially performing the same check as
would be done in __thp_vma_allowable_orders() or in shmem code, where this
works as expected, and disallow PMD mappings, making us fallback to PTE
mappings.
Link: https://lkml.kernel.org/r/20241011102445.934409-3-david@redhat.com
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Leo Fu <bfu(a)redhat.com>
Tested-by: Thomas Huth <thuth(a)redhat.com>
Cc: Thomas Huth <thuth(a)redhat.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Christian Borntraeger <borntraeger(a)linux.ibm.com>
Cc: Janosch Frank <frankja(a)linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda(a)linux.ibm.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 9 +++++++++
1 file changed, 9 insertions(+)
--- a/mm/memory.c~mm-dont-install-pmd-mappings-when-thps-are-disabled-by-the-hw-process-vma
+++ a/mm/memory.c
@@ -4920,6 +4920,15 @@ vm_fault_t do_set_pmd(struct vm_fault *v
pmd_t entry;
vm_fault_t ret = VM_FAULT_FALLBACK;
+ /*
+ * It is too late to allocate a small folio, we already have a large
+ * folio in the pagecache: especially s390 KVM cannot tolerate any
+ * PMD mappings, but PTE-mapped THP are fine. So let's simply refuse any
+ * PMD mappings if THPs are disabled.
+ */
+ if (thp_disabled_by_hw() || vma_thp_disabled(vma, vma->vm_flags))
+ return ret;
+
if (!thp_vma_suitable_order(vma, haddr, PMD_ORDER))
return ret;
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-pagewalk-fix-usage-of-pmd_leaf-pud_leaf-without-present-check.patch
selftests-mm-hugetlb_fault_after_madv-use-default-hguetlb-page-size.patch
selftests-mm-hugetlb_fault_after_madv-improve-test-output.patch
The quilt patch titled
Subject: mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()
has been removed from the -mm tree. Its filename was
mm-huge_memory-add-vma_thp_disabled-and-thp_disabled_by_hw.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Subject: mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()
Date: Fri, 11 Oct 2024 12:24:44 +0200
Patch series "mm: don't install PMD mappings when THPs are disabled by the
hw/process/vma".
During testing, it was found that we can get PMD mappings in processes
where THP (and more precisely, PMD mappings) are supposed to be disabled.
While it works as expected for anon+shmem, the pagecache is the
problematic bit.
For s390 KVM this currently means that a VM backed by a file located on
filesystem with large folio support can crash when KVM tries accessing the
problematic page, because the readahead logic might decide to use a
PMD-sized THP and faulting it into the page tables will install a PMD
mapping, something that s390 KVM cannot tolerate.
This might also be a problem with HW that does not support PMD mappings,
but I did not try reproducing it.
Fix it by respecting the ways to disable THPs when deciding whether we can
install a PMD mapping. khugepaged should already be taking care of not
collapsing if THPs are effectively disabled for the hw/process/vma.
This patch (of 2):
Add vma_thp_disabled() and thp_disabled_by_hw() helpers to be shared by
shmem_allowable_huge_orders() and __thp_vma_allowable_orders().
[david(a)redhat.com: rename to vma_thp_disabled(), split out thp_disabled_by_hw() ]
Link: https://lkml.kernel.org/r/20241011102445.934409-2-david@redhat.com
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Leo Fu <bfu(a)redhat.com>
Tested-by: Thomas Huth <thuth(a)redhat.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Boqiao Fu <bfu(a)redhat.com>
Cc: Christian Borntraeger <borntraeger(a)linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda(a)linux.ibm.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Janosch Frank <frankja(a)linux.ibm.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/huge_mm.h | 18 ++++++++++++++++++
mm/huge_memory.c | 13 +------------
mm/shmem.c | 7 +------
3 files changed, 20 insertions(+), 18 deletions(-)
--- a/include/linux/huge_mm.h~mm-huge_memory-add-vma_thp_disabled-and-thp_disabled_by_hw
+++ a/include/linux/huge_mm.h
@@ -322,6 +322,24 @@ struct thpsize {
(transparent_hugepage_flags & \
(1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))
+static inline bool vma_thp_disabled(struct vm_area_struct *vma,
+ unsigned long vm_flags)
+{
+ /*
+ * Explicitly disabled through madvise or prctl, or some
+ * architectures may disable THP for some mappings, for
+ * example, s390 kvm.
+ */
+ return (vm_flags & VM_NOHUGEPAGE) ||
+ test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags);
+}
+
+static inline bool thp_disabled_by_hw(void)
+{
+ /* If the hardware/firmware marked hugepage support disabled. */
+ return transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED);
+}
+
unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags);
unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
--- a/mm/huge_memory.c~mm-huge_memory-add-vma_thp_disabled-and-thp_disabled_by_hw
+++ a/mm/huge_memory.c
@@ -109,18 +109,7 @@ unsigned long __thp_vma_allowable_orders
if (!vma->vm_mm) /* vdso */
return 0;
- /*
- * Explicitly disabled through madvise or prctl, or some
- * architectures may disable THP for some mappings, for
- * example, s390 kvm.
- * */
- if ((vm_flags & VM_NOHUGEPAGE) ||
- test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
- return 0;
- /*
- * If the hardware/firmware marked hugepage support disabled.
- */
- if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
+ if (thp_disabled_by_hw() || vma_thp_disabled(vma, vm_flags))
return 0;
/* khugepaged doesn't collapse DAX vma, but page fault is fine. */
--- a/mm/shmem.c~mm-huge_memory-add-vma_thp_disabled-and-thp_disabled_by_hw
+++ a/mm/shmem.c
@@ -1664,12 +1664,7 @@ unsigned long shmem_allowable_huge_order
loff_t i_size;
int order;
- if (vma && ((vm_flags & VM_NOHUGEPAGE) ||
- test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)))
- return 0;
-
- /* If the hardware/firmware marked hugepage support disabled. */
- if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
+ if (thp_disabled_by_hw() || (vma && vma_thp_disabled(vma, vm_flags)))
return 0;
global_huge = shmem_huge_global_enabled(inode, index, write_end,
_
Patches currently in -mm which might be from wangkefeng.wang(a)huawei.com are
mm-remove-unused-hugepage-for-vma_alloc_folio.patch
The quilt patch titled
Subject: mm: khugepaged: fix the arguments order in khugepaged_collapse_file trace point
has been removed from the -mm tree. Its filename was
mm-khugepaged-fix-the-arguments-order-in-khugepaged_collapse_file-trace-point.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Yang Shi <yang(a)os.amperecomputing.com>
Subject: mm: khugepaged: fix the arguments order in khugepaged_collapse_file trace point
Date: Fri, 11 Oct 2024 18:17:02 -0700
The "addr" and "is_shmem" arguments have different order in TP_PROTO and
TP_ARGS. This resulted in the incorrect trace result:
text-hugepage-644429 [276] 392092.878683: mm_khugepaged_collapse_file:
mm=0xffff20025d52c440, hpage_pfn=0x200678c00, index=512, addr=1, is_shmem=0,
filename=text-hugepage, nr=512, result=failed
The value of "addr" is wrong because it was treated as bool value, the
type of is_shmem.
Fix the order in TP_PROTO to keep "addr" is before "is_shmem" since the
original patch review suggested this order to achieve best packing.
And use "lx" for "addr" instead of "ld" in TP_printk because address is
typically shown in hex.
After the fix, the trace result looks correct:
text-hugepage-7291 [004] 128.627251: mm_khugepaged_collapse_file:
mm=0xffff0001328f9500, hpage_pfn=0x20016ea00, index=512, addr=0x400000,
is_shmem=0, filename=text-hugepage, nr=512, result=failed
Link: https://lkml.kernel.org/r/20241012011702.1084846-1-yang@os.amperecomputing.…
Fixes: 4c9473e87e75 ("mm/khugepaged: add tracepoint to collapse_file()")
Signed-off-by: Yang Shi <yang(a)os.amperecomputing.com>
Cc: Gautam Menghani <gautammenghani201(a)gmail.com>
Cc: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Cc: <stable(a)vger.kernel.org> [6.2+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/trace/events/huge_memory.h | 4 ++--
mm/khugepaged.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
--- a/include/trace/events/huge_memory.h~mm-khugepaged-fix-the-arguments-order-in-khugepaged_collapse_file-trace-point
+++ a/include/trace/events/huge_memory.h
@@ -208,7 +208,7 @@ TRACE_EVENT(mm_khugepaged_scan_file,
TRACE_EVENT(mm_khugepaged_collapse_file,
TP_PROTO(struct mm_struct *mm, struct folio *new_folio, pgoff_t index,
- bool is_shmem, unsigned long addr, struct file *file,
+ unsigned long addr, bool is_shmem, struct file *file,
int nr, int result),
TP_ARGS(mm, new_folio, index, addr, is_shmem, file, nr, result),
TP_STRUCT__entry(
@@ -233,7 +233,7 @@ TRACE_EVENT(mm_khugepaged_collapse_file,
__entry->result = result;
),
- TP_printk("mm=%p, hpage_pfn=0x%lx, index=%ld, addr=%ld, is_shmem=%d, filename=%s, nr=%d, result=%s",
+ TP_printk("mm=%p, hpage_pfn=0x%lx, index=%ld, addr=%lx, is_shmem=%d, filename=%s, nr=%d, result=%s",
__entry->mm,
__entry->hpfn,
__entry->index,
--- a/mm/khugepaged.c~mm-khugepaged-fix-the-arguments-order-in-khugepaged_collapse_file-trace-point
+++ a/mm/khugepaged.c
@@ -2227,7 +2227,7 @@ rollback:
folio_put(new_folio);
out:
VM_BUG_ON(!list_empty(&pagelist));
- trace_mm_khugepaged_collapse_file(mm, new_folio, index, is_shmem, addr, file, HPAGE_PMD_NR, result);
+ trace_mm_khugepaged_collapse_file(mm, new_folio, index, addr, is_shmem, file, HPAGE_PMD_NR, result);
return result;
}
_
Patches currently in -mm which might be from yang(a)os.amperecomputing.com are
The quilt patch titled
Subject: mm/damon/tests/sysfs-kunit.h: fix memory leak in damon_sysfs_test_add_targets()
has been removed from the -mm tree. Its filename was
mm-damon-fix-memory-leak-in-damon_sysfs_test_add_targets.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jinjie Ruan <ruanjinjie(a)huawei.com>
Subject: mm/damon/tests/sysfs-kunit.h: fix memory leak in damon_sysfs_test_add_targets()
Date: Thu, 10 Oct 2024 20:53:23 +0800
The sysfs_target->regions allocated in damon_sysfs_regions_alloc() is not
freed in damon_sysfs_test_add_targets(), which cause the following memory
leak, free it to fix it.
unreferenced object 0xffffff80c2a8db80 (size 96):
comm "kunit_try_catch", pid 187, jiffies 4294894363
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace (crc 0):
[<0000000001e3714d>] kmemleak_alloc+0x34/0x40
[<000000008e6835c1>] __kmalloc_cache_noprof+0x26c/0x2f4
[<000000001286d9f8>] damon_sysfs_test_add_targets+0x1cc/0x738
[<0000000032ef8f77>] kunit_try_run_case+0x13c/0x3ac
[<00000000f3edea23>] kunit_generic_run_threadfn_adapter+0x80/0xec
[<00000000adf936cf>] kthread+0x2e8/0x374
[<0000000041bb1628>] ret_from_fork+0x10/0x20
Link: https://lkml.kernel.org/r/20241010125323.3127187-1-ruanjinjie@huawei.com
Fixes: b8ee5575f763 ("mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets()")
Signed-off-by: Jinjie Ruan <ruanjinjie(a)huawei.com>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/tests/sysfs-kunit.h | 1 +
1 file changed, 1 insertion(+)
--- a/mm/damon/tests/sysfs-kunit.h~mm-damon-fix-memory-leak-in-damon_sysfs_test_add_targets
+++ a/mm/damon/tests/sysfs-kunit.h
@@ -67,6 +67,7 @@ static void damon_sysfs_test_add_targets
damon_destroy_ctx(ctx);
kfree(sysfs_targets->targets_arr);
kfree(sysfs_targets);
+ kfree(sysfs_target->regions);
kfree(sysfs_target);
}
_
Patches currently in -mm which might be from ruanjinjie(a)huawei.com are
The quilt patch titled
Subject: lib: alloc_tag_module_unload must wait for pending kfree_rcu calls
has been removed from the -mm tree. Its filename was
lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Florian Westphal <fw(a)strlen.de>
Subject: lib: alloc_tag_module_unload must wait for pending kfree_rcu calls
Date: Mon, 7 Oct 2024 22:52:24 +0200
Ben Greear reports following splat:
------------[ cut here ]------------
net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
...
Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
codetag_unload_module+0x19b/0x2a0
? codetag_load_module+0x80/0x80
nf_nat module exit calls kfree_rcu on those addresses, but the free
operation is likely still pending by the time alloc_tag checks for leaks.
Wait for outstanding kfree_rcu operations to complete before checking
resolves this warning.
Reproducer:
unshare -n iptables-nft -t nat -A PREROUTING -p tcp
grep nf_nat /proc/allocinfo # will list 4 allocations
rmmod nft_chain_nat
rmmod nf_nat # will WARN.
[akpm(a)linux-foundation.org: add comment]
Link: https://lkml.kernel.org/r/20241007205236.11847-1-fw@strlen.de
Fixes: a473573964e5 ("lib: code tagging module support")
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Reported-by: Ben Greear <greearb(a)candelatech.com>
Closes: https://lore.kernel.org/netdev/bdaaef9d-4364-4171-b82b-bcfc12e207eb@candela…
Cc: Uladzislau Rezki <urezki(a)gmail.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/codetag.c | 3 +++
1 file changed, 3 insertions(+)
--- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls
+++ a/lib/codetag.c
@@ -228,6 +228,9 @@ bool codetag_unload_module(struct module
if (!mod)
return true;
+ /* await any module's kfree_rcu() operations to complete */
+ kvfree_rcu_barrier();
+
mutex_lock(&codetag_lock);
list_for_each_entry(cttype, &codetag_types, link) {
struct codetag_module *found = NULL;
_
Patches currently in -mm which might be from fw(a)strlen.de are
The quilt patch titled
Subject: mm/mremap: fix move_normal_pmd/retract_page_tables race
has been removed from the -mm tree. Its filename was
mm-mremap-fix-move_normal_pmd-retract_page_tables-race.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: mm/mremap: fix move_normal_pmd/retract_page_tables race
Date: Mon, 07 Oct 2024 23:42:04 +0200
In mremap(), move_page_tables() looks at the type of the PMD entry and the
specified address range to figure out by which method the next chunk of
page table entries should be moved.
At that point, the mmap_lock is held in write mode, but no rmap locks are
held yet. For PMD entries that point to page tables and are fully covered
by the source address range, move_pgt_entry(NORMAL_PMD, ...) is called,
which first takes rmap locks, then does move_normal_pmd().
move_normal_pmd() takes the necessary page table locks at source and
destination, then moves an entire page table from the source to the
destination.
The problem is: The rmap locks, which protect against concurrent page
table removal by retract_page_tables() in the THP code, are only taken
after the PMD entry has been read and it has been decided how to move it.
So we can race as follows (with two processes that have mappings of the
same tmpfs file that is stored on a tmpfs mount with huge=advise); note
that process A accesses page tables through the MM while process B does it
through the file rmap:
process A process B
========= =========
mremap
mremap_to
move_vma
move_page_tables
get_old_pmd
alloc_new_pmd
*** PREEMPT ***
madvise(MADV_COLLAPSE)
do_madvise
madvise_walk_vmas
madvise_vma_behavior
madvise_collapse
hpage_collapse_scan_file
collapse_file
retract_page_tables
i_mmap_lock_read(mapping)
pmdp_collapse_flush
i_mmap_unlock_read(mapping)
move_pgt_entry(NORMAL_PMD, ...)
take_rmap_locks
move_normal_pmd
drop_rmap_locks
When this happens, move_normal_pmd() can end up creating bogus PMD entries
in the line `pmd_populate(mm, new_pmd, pmd_pgtable(pmd))`. The effect
depends on arch-specific and machine-specific details; on x86, you can end
up with physical page 0 mapped as a page table, which is likely
exploitable for user->kernel privilege escalation.
Fix the race by letting process B recheck that the PMD still points to a
page table after the rmap locks have been taken. Otherwise, we bail and
let the caller fall back to the PTE-level copying path, which will then
bail immediately at the pmd_none() check.
Bug reachability: Reaching this bug requires that you can create
shmem/file THP mappings - anonymous THP uses different code that doesn't
zap stuff under rmap locks. File THP is gated on an experimental config
flag (CONFIG_READ_ONLY_THP_FOR_FS), so on normal distro kernels you need
shmem THP to hit this bug. As far as I know, getting shmem THP normally
requires that you can mount your own tmpfs with the right mount flags,
which would require creating your own user+mount namespace; though I don't
know if some distros maybe enable shmem THP by default or something like
that.
Bug impact: This issue can likely be used for user->kernel privilege
escalation when it is reachable.
Link: https://lkml.kernel.org/r/20241007-move_normal_pmd-vs-collapse-fix-2-v1-1-5…
Fixes: 1d65b771bc08 ("mm/khugepaged: retract_page_tables() without mmap or vma lock")
Signed-off-by: Jann Horn <jannh(a)google.com>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Co-developed-by: David Hildenbrand <david(a)redhat.com>
Closes: https://project-zero.issues.chromium.org/371047675
Acked-by: Qi Zheng <zhengqi.arch(a)bytedance.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Joel Fernandes <joel(a)joelfernandes.org>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mremap.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
--- a/mm/mremap.c~mm-mremap-fix-move_normal_pmd-retract_page_tables-race
+++ a/mm/mremap.c
@@ -238,6 +238,7 @@ static bool move_normal_pmd(struct vm_ar
{
spinlock_t *old_ptl, *new_ptl;
struct mm_struct *mm = vma->vm_mm;
+ bool res = false;
pmd_t pmd;
if (!arch_supports_page_table_move())
@@ -277,19 +278,25 @@ static bool move_normal_pmd(struct vm_ar
if (new_ptl != old_ptl)
spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
- /* Clear the pmd */
pmd = *old_pmd;
+
+ /* Racing with collapse? */
+ if (unlikely(!pmd_present(pmd) || pmd_leaf(pmd)))
+ goto out_unlock;
+ /* Clear the pmd */
pmd_clear(old_pmd);
+ res = true;
VM_BUG_ON(!pmd_none(*new_pmd));
pmd_populate(mm, new_pmd, pmd_pgtable(pmd));
flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
+out_unlock:
if (new_ptl != old_ptl)
spin_unlock(new_ptl);
spin_unlock(old_ptl);
- return true;
+ return res;
}
#else
static inline bool move_normal_pmd(struct vm_area_struct *vma,
_
Patches currently in -mm which might be from jannh(a)google.com are
mm-mark-mas-allocation-in-vms_abort_munmap_vmas-as-__gfp_nofail.patch
The quilt patch titled
Subject: selftests/mm: fix deadlock for fork after pthread_create on ARM
has been removed from the -mm tree. Its filename was
selftests-mm-fix-deadlock-for-fork-after-pthread_create-on-arm.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Edward Liaw <edliaw(a)google.com>
Subject: selftests/mm: fix deadlock for fork after pthread_create on ARM
Date: Thu, 3 Oct 2024 21:17:11 +0000
On Android with arm, there is some synchronization needed to avoid a
deadlock when forking after pthread_create.
Link: https://lkml.kernel.org/r/20241003211716.371786-3-edliaw@google.com
Fixes: cff294582798 ("selftests/mm: extend and rename uffd pagemap test")
Signed-off-by: Edward Liaw <edliaw(a)google.com>
Cc: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/uffd-unit-tests.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/tools/testing/selftests/mm/uffd-unit-tests.c~selftests-mm-fix-deadlock-for-fork-after-pthread_create-on-arm
+++ a/tools/testing/selftests/mm/uffd-unit-tests.c
@@ -241,6 +241,9 @@ static void *fork_event_consumer(void *d
fork_event_args *args = data;
struct uffd_msg msg = { 0 };
+ /* Ready for parent thread to fork */
+ pthread_barrier_wait(&ready_for_fork);
+
/* Read until a full msg received */
while (uffd_read_msg(args->parent_uffd, &msg));
@@ -308,8 +311,12 @@ static int pagemap_test_fork(int uffd, b
/* Prepare a thread to resolve EVENT_FORK */
if (with_event) {
+ pthread_barrier_init(&ready_for_fork, NULL, 2);
if (pthread_create(&thread, NULL, fork_event_consumer, &args))
err("pthread_create()");
+ /* Wait for child thread to start before forking */
+ pthread_barrier_wait(&ready_for_fork);
+ pthread_barrier_destroy(&ready_for_fork);
}
child = fork();
_
Patches currently in -mm which might be from edliaw(a)google.com are
The quilt patch titled
Subject: selftests/mm: replace atomic_bool with pthread_barrier_t
has been removed from the -mm tree. Its filename was
selftests-mm-replace-atomic_bool-with-pthread_barrier_t.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Edward Liaw <edliaw(a)google.com>
Subject: selftests/mm: replace atomic_bool with pthread_barrier_t
Date: Thu, 3 Oct 2024 21:17:10 +0000
Patch series "selftests/mm: fix deadlock after pthread_create".
On Android arm, pthread_create followed by a fork caused a deadlock in the
case where the fork required work to be completed by the created thread.
Update the synchronization primitive to use pthread_barrier instead of
atomic_bool.
Apply the same fix to the wp-fork-with-event test.
This patch (of 2):
Swap synchronization primitive with pthread_barrier, so that stdatomic.h
does not need to be included.
The synchronization is needed on Android ARM64; we see a deadlock with
pthread_create when the parent thread races forward before the child has a
chance to start doing work.
Link: https://lkml.kernel.org/r/20241003211716.371786-1-edliaw@google.com
Link: https://lkml.kernel.org/r/20241003211716.371786-2-edliaw@google.com
Fixes: cff294582798 ("selftests/mm: extend and rename uffd pagemap test")
Signed-off-by: Edward Liaw <edliaw(a)google.com>
Cc: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/uffd-common.c | 5 +++--
tools/testing/selftests/mm/uffd-common.h | 3 +--
tools/testing/selftests/mm/uffd-unit-tests.c | 14 ++++++++------
3 files changed, 12 insertions(+), 10 deletions(-)
--- a/tools/testing/selftests/mm/uffd-common.c~selftests-mm-replace-atomic_bool-with-pthread_barrier_t
+++ a/tools/testing/selftests/mm/uffd-common.c
@@ -18,7 +18,7 @@ bool test_uffdio_wp = true;
unsigned long long *count_verify;
uffd_test_ops_t *uffd_test_ops;
uffd_test_case_ops_t *uffd_test_case_ops;
-atomic_bool ready_for_fork;
+pthread_barrier_t ready_for_fork;
static int uffd_mem_fd_create(off_t mem_size, bool hugetlb)
{
@@ -519,7 +519,8 @@ void *uffd_poll_thread(void *arg)
pollfd[1].fd = pipefd[cpu*2];
pollfd[1].events = POLLIN;
- ready_for_fork = true;
+ /* Ready for parent thread to fork */
+ pthread_barrier_wait(&ready_for_fork);
for (;;) {
ret = poll(pollfd, 2, -1);
--- a/tools/testing/selftests/mm/uffd-common.h~selftests-mm-replace-atomic_bool-with-pthread_barrier_t
+++ a/tools/testing/selftests/mm/uffd-common.h
@@ -33,7 +33,6 @@
#include <inttypes.h>
#include <stdint.h>
#include <sys/random.h>
-#include <stdatomic.h>
#include "../kselftest.h"
#include "vm_util.h"
@@ -105,7 +104,7 @@ extern bool map_shared;
extern bool test_uffdio_wp;
extern unsigned long long *count_verify;
extern volatile bool test_uffdio_copy_eexist;
-extern atomic_bool ready_for_fork;
+extern pthread_barrier_t ready_for_fork;
extern uffd_test_ops_t anon_uffd_test_ops;
extern uffd_test_ops_t shmem_uffd_test_ops;
--- a/tools/testing/selftests/mm/uffd-unit-tests.c~selftests-mm-replace-atomic_bool-with-pthread_barrier_t
+++ a/tools/testing/selftests/mm/uffd-unit-tests.c
@@ -774,7 +774,7 @@ static void uffd_sigbus_test_common(bool
char c;
struct uffd_args args = { 0 };
- ready_for_fork = false;
+ pthread_barrier_init(&ready_for_fork, NULL, 2);
fcntl(uffd, F_SETFL, uffd_flags | O_NONBLOCK);
@@ -791,8 +791,9 @@ static void uffd_sigbus_test_common(bool
if (pthread_create(&uffd_mon, NULL, uffd_poll_thread, &args))
err("uffd_poll_thread create");
- while (!ready_for_fork)
- ; /* Wait for the poll_thread to start executing before forking */
+ /* Wait for child thread to start before forking */
+ pthread_barrier_wait(&ready_for_fork);
+ pthread_barrier_destroy(&ready_for_fork);
pid = fork();
if (pid < 0)
@@ -833,7 +834,7 @@ static void uffd_events_test_common(bool
char c;
struct uffd_args args = { 0 };
- ready_for_fork = false;
+ pthread_barrier_init(&ready_for_fork, NULL, 2);
fcntl(uffd, F_SETFL, uffd_flags | O_NONBLOCK);
if (uffd_register(uffd, area_dst, nr_pages * page_size,
@@ -844,8 +845,9 @@ static void uffd_events_test_common(bool
if (pthread_create(&uffd_mon, NULL, uffd_poll_thread, &args))
err("uffd_poll_thread create");
- while (!ready_for_fork)
- ; /* Wait for the poll_thread to start executing before forking */
+ /* Wait for child thread to start before forking */
+ pthread_barrier_wait(&ready_for_fork);
+ pthread_barrier_destroy(&ready_for_fork);
pid = fork();
if (pid < 0)
_
Patches currently in -mm which might be from edliaw(a)google.com are
The quilt patch titled
Subject: fat: fix uninitialized variable
has been removed from the -mm tree. Its filename was
fat-fix-uninitialized-variable.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
Subject: fat: fix uninitialized variable
Date: Fri, 04 Oct 2024 15:03:49 +0900
syszbot produced this with a corrupted fs image. In theory, however an IO
error would trigger this also.
This affects just an error report, so should not be a serious error.
Link: https://lkml.kernel.org/r/87r08wjsnh.fsf@mail.parknet.co.jp
Link: https://lkml.kernel.org/r/66ff2c95.050a0220.49194.03e9.GAE@google.com
Signed-off-by: OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
Reported-by: syzbot+ef0d7bc412553291aa86(a)syzkaller.appspotmail.com
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/fat/namei_vfat.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/fat/namei_vfat.c~fat-fix-uninitialized-variable
+++ a/fs/fat/namei_vfat.c
@@ -1037,7 +1037,7 @@ error_inode:
if (corrupt < 0) {
fat_fs_error(new_dir->i_sb,
"%s: Filesystem corrupted (i_pos %lld)",
- __func__, sinfo.i_pos);
+ __func__, new_i_pos);
}
goto out;
}
_
Patches currently in -mm which might be from hirofumi(a)mail.parknet.co.jp are
The LG Gram Pro 16 2-in-1 (2024) the 16T90SP has its keybopard IRQ (1)
described as ActiveLow in the DSDT, which the kernel overrides to EdgeHigh
which breaks the keyboard.
Add the 16T90SP to the irq1_level_low_skip_override[] quirk table to fix
this.
Reported-by: Dirk Holten <dirk.holten(a)gmx.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219382
Cc: stable(a)vger.kernel.org
Suggested-by: Dirk Holten <dirk.holten(a)gmx.de>
Signed-off-by: Christian Heusel <christian(a)heusel.eu>
---
Note that I do not have the relevant hardware since I'm sending in this
quirk at the request of someone else.
Also does this change need a "Fixes: ..." tag?
---
drivers/acpi/resource.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
index 129bceb1f4a27df93439bcefdb27fd9c91258028..dd6249fb76c24f08db4149883be4548130d0ef1e 100644
--- a/drivers/acpi/resource.c
+++ b/drivers/acpi/resource.c
@@ -502,6 +502,11 @@ static const struct dmi_system_id irq1_level_low_skip_override[] = {
DMI_MATCH(DMI_SYS_VENDOR, "LG Electronics"),
DMI_MATCH(DMI_BOARD_NAME, "17U70P"),
},
+ /* LG Electronics 16T90SP */
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "LG Electronics"),
+ DMI_MATCH(DMI_BOARD_NAME, "16T90SP"),
+ },
},
{ }
};
---
base-commit: 8e929cb546ee42c9a61d24fae60605e9e3192354
change-id: 20241016-lg-gram-pro-keyboard-9a9d8b9aa647
Best regards,
--
Christian Heusel <christian(a)heusel.eu>
There is a probability that the host machine will also restart
when the virtual machine is restarting.
Commit ad362fe07fec ("KVM: arm64: vgic-its: Avoid potential UAF
in LPI translation cache") released the reference count of an IRQ
when it shouldn't have. This led to a situation where, when the
system finally released the IRQ, it found that the structure had
already been freed, triggering a
'refcount_t: underflow; use-after-free' error.
In fact, the function "vgic_put_irq" should be called by
"vgic_its_inject_cached_translation" instead of
"vgic_its_trigger_msi".
Call trace:
its_free_ite+0x90/0xa0
vgic_its_free_device+0x3c/0xa0
vgic_its_destroy+0x4c/0xb8
kvm_put_kvm+0x214/0x358
kvm_vcpu_release+0x24/0x38
__fput+0x84/0x278
____fput+0x20/0x30
task_work_run+0xcc/0x190
do_exit+0x36c/0xa88
do_group_exit+0x4c/0xb8
__arm64_sys_exit_group+0x24/0x28
invoke_syscall+0x54/0x120
el0_svc_common.constprop.4+0x16c/0x1f0
do_el0_svc+0x34/0xb0
el0_svc+0x1c/0x28
el0_sync_handler+0x8c/0xb0
el0_sync+0x148/0x180
Fixes: ad362fe07fec ("KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache")
Cc: stable(a)vger.kernel.org
Signed-off-by: Wenyao Hai <haiwenyao(a)uniontech.com>
Signed-off-by: WangYuli <wangyuli(a)uniontech.com>
---
arch/arm64/kvm/vgic/vgic-its.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index ba945ba78cc7..fb5f57cbab42 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -679,6 +679,7 @@ static int vgic_its_trigger_msi(struct kvm *kvm, struct vgic_its *its,
raw_spin_lock_irqsave(&irq->irq_lock, flags);
irq->pending_latch = true;
vgic_queue_irq_unlock(kvm, irq, flags);
+ vgic_put_irq(kvm, irq);
return 0;
}
@@ -697,7 +698,6 @@ int vgic_its_inject_cached_translation(struct kvm *kvm, struct kvm_msi *msi)
raw_spin_lock_irqsave(&irq->irq_lock, flags);
irq->pending_latch = true;
vgic_queue_irq_unlock(kvm, irq, flags);
- vgic_put_irq(kvm, irq);
return 0;
}
--
2.45.2
With these features are enabled, the EEVDF scheduler introduces a large
performance degradation, observed in multiple database tests on kernel
versions using EEVDF, across multiple architectures (x86, aarch64, amd64)
and CPU generations.
Disable the features to minimize default performance impact.
Cc: <stable(a)vger.kernel.org> # 6.6.x
Fixes: 86bfbb7ce4f6 ("sched/fair: Add lag based placement")
Fixes: 63304558ba5d ("sched/eevdf: Curb wakeup-preemption")
Signed-off-by: Cristian Prundeanu <cpru(a)amazon.com>
---
kernel/sched/features.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index a3d331dd2d8f..8a5ca80665b3 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -4,7 +4,7 @@
* Using the avg_vruntime, do the right thing and preserve lag across
* sleep+wake cycles. EEVDF placement strategy #1, #2 if disabled.
*/
-SCHED_FEAT(PLACE_LAG, true)
+SCHED_FEAT(PLACE_LAG, false)
/*
* Give new tasks half a slice to ease into the competition.
*/
@@ -17,7 +17,7 @@ SCHED_FEAT(PLACE_REL_DEADLINE, true)
* Inhibit (wakeup) preemption until the current task has either matched the
* 0-lag point or until is has exhausted it's slice.
*/
-SCHED_FEAT(RUN_TO_PARITY, true)
+SCHED_FEAT(RUN_TO_PARITY, false)
/*
* Allow wakeup of tasks with a shorter slice to cancel RUN_TO_PARITY for
* current.
--
2.40.1
Hi, Conor
Thanks for your patch.
> From: Conor Dooley <conor.dooley(a)microchip.com>
>
> Aurelien reported probe failures due to the csi node being enabled without
> having a camera attached to it. A camera was in the initial submissions, but
> was removed from the dts, as it had not actually been present on the board,
> but was from an addon board used by the developer of the relevant drivers.
> The non-camera pipeline nodes were not disabled when this happened and
> the probe failures are problematic for Debian. Disable them.
>
> CC: stable(a)vger.kernel.org
> Fixes: 28ecaaa5af192 ("riscv: dts: starfive: jh7110: Add camera subsystem
> nodes")
Here you write it in 13 characters, should be "Fixes: 28ecaaa5af19 ..."
Best Regards
Changhuang.
> Closes: https://lore.kernel.org/all/Zw1-vcN4CoVkfLjU@aurel32.net/
> Reported-by: Aurelien Jarno <aurelien(a)aurel32.net>
> Signed-off-by: Conor Dooley <conor.dooley(a)microchip.com>
> ---
> CC: Emil Renner Berthing <kernel(a)esmil.dk>
> CC: Rob Herring <robh(a)kernel.org>
> CC: Krzysztof Kozlowski <krzk+dt(a)kernel.org>
> CC: Conor Dooley <conor+dt(a)kernel.org>
> CC: Changhuang Liang <changhuang.liang(a)starfivetech.com>
> CC: devicetree(a)vger.kernel.org
> CC: linux-riscv(a)lists.infradead.org
> CC: linux-kernel(a)vger.kernel.org
> ---
> arch/riscv/boot/dts/starfive/jh7110-common.dtsi | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/arch/riscv/boot/dts/starfive/jh7110-common.dtsi
> b/arch/riscv/boot/dts/starfive/jh7110-common.dtsi
> index c7771b3b64758..d6c55f1cc96a9 100644
> --- a/arch/riscv/boot/dts/starfive/jh7110-common.dtsi
> +++ b/arch/riscv/boot/dts/starfive/jh7110-common.dtsi
> @@ -128,7 +128,6 @@ &camss {
> assigned-clocks = <&ispcrg JH7110_ISPCLK_DOM4_APB_FUNC>,
> <&ispcrg JH7110_ISPCLK_MIPI_RX0_PXL>;
> assigned-clock-rates = <49500000>, <198000000>;
> - status = "okay";
>
> ports {
> #address-cells = <1>;
> @@ -151,7 +150,6 @@ camss_from_csi2rx: endpoint { &csi2rx {
> assigned-clocks = <&ispcrg JH7110_ISPCLK_VIN_SYS>;
> assigned-clock-rates = <297000000>;
> - status = "okay";
>
> ports {
> #address-cells = <1>;
> --
> 2.45.2
The patch titled
Subject: ocfs2: pass u64 to ocfs2_truncate_inline maybe overflow
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
ocfs2-pass-u64-to-ocfs2_truncate_inline-maybe-overflow.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Edward Adam Davis <eadavis(a)qq.com>
Subject: ocfs2: pass u64 to ocfs2_truncate_inline maybe overflow
Date: Wed, 16 Oct 2024 19:43:47 +0800
Syzbot reported a kernel BUG in ocfs2_truncate_inline. There are two
reasons for this: first, the parameter value passed is greater than
ocfs2_max_inline_data_with_xattr, second, the start and end parameters of
ocfs2_truncate_inline are "unsigned int".
So, we need to add a sanity check for byte_start and byte_len right before
ocfs2_truncate_inline() in ocfs2_remove_inode_range(), if they are greater
than ocfs2_max_inline_data_with_xattr return -EINVAL.
Link: https://lkml.kernel.org/r/tencent_D48DB5122ADDAEDDD11918CFB68D93258C07@qq.c…
Fixes: 1afc32b95233 ("ocfs2: Write support for inline data")
Signed-off-by: Edward Adam Davis <eadavis(a)qq.com>
Reported-by: syzbot+81092778aac03460d6b7(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=81092778aac03460d6b7
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/file.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/fs/ocfs2/file.c~ocfs2-pass-u64-to-ocfs2_truncate_inline-maybe-overflow
+++ a/fs/ocfs2/file.c
@@ -1784,6 +1784,14 @@ int ocfs2_remove_inode_range(struct inod
return 0;
if (OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) {
+ int id_count = ocfs2_max_inline_data_with_xattr(inode->i_sb, di);
+
+ if (byte_start > id_count || byte_start + byte_len > id_count) {
+ ret = -EINVAL;
+ mlog_errno(ret);
+ goto out;
+ }
+
ret = ocfs2_truncate_inline(inode, di_bh, byte_start,
byte_start + byte_len, 0);
if (ret) {
_
Patches currently in -mm which might be from eadavis(a)qq.com are
ocfs2-pass-u64-to-ocfs2_truncate_inline-maybe-overflow.patch
Hey,
Would you be interested in acquiring the attendees list of NEPCON NAGOYA 2024?
List contains: Names, Titles, Phone Numbers, Company Details, and more…
Interested? Let me know so that I’ll send you the pricing for the same.
Kind Regards,
Jane Wilkins
Marketing Executive
If you do not wish to receive our emails, please reply with "Not Interested."
Jeongjun Park <aha310510(a)gmail.com> wrote:
>
> I got the following KCSAN report during syzbot testing:
>
> ==================================================================
> BUG: KCSAN: data-race in generic_fillattr / inode_set_ctime_current
>
> write to 0xffff888102eb3260 of 4 bytes by task 6565 on cpu 1:
> inode_set_ctime_to_ts include/linux/fs.h:1638 [inline]
> inode_set_ctime_current+0x169/0x1d0 fs/inode.c:2626
> shmem_mknod+0x117/0x180 mm/shmem.c:3443
> shmem_create+0x34/0x40 mm/shmem.c:3497
> lookup_open fs/namei.c:3578 [inline]
> open_last_lookups fs/namei.c:3647 [inline]
> path_openat+0xdbc/0x1f00 fs/namei.c:3883
> do_filp_open+0xf7/0x200 fs/namei.c:3913
> do_sys_openat2+0xab/0x120 fs/open.c:1416
> do_sys_open fs/open.c:1431 [inline]
> __do_sys_openat fs/open.c:1447 [inline]
> __se_sys_openat fs/open.c:1442 [inline]
> __x64_sys_openat+0xf3/0x120 fs/open.c:1442
> x64_sys_call+0x1025/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:258
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> read to 0xffff888102eb3260 of 4 bytes by task 3498 on cpu 0:
> inode_get_ctime_nsec include/linux/fs.h:1623 [inline]
> inode_get_ctime include/linux/fs.h:1629 [inline]
> generic_fillattr+0x1dd/0x2f0 fs/stat.c:62
> shmem_getattr+0x17b/0x200 mm/shmem.c:1157
> vfs_getattr_nosec fs/stat.c:166 [inline]
> vfs_getattr+0x19b/0x1e0 fs/stat.c:207
> vfs_statx_path fs/stat.c:251 [inline]
> vfs_statx+0x134/0x2f0 fs/stat.c:315
> vfs_fstatat+0xec/0x110 fs/stat.c:341
> __do_sys_newfstatat fs/stat.c:505 [inline]
> __se_sys_newfstatat+0x58/0x260 fs/stat.c:499
> __x64_sys_newfstatat+0x55/0x70 fs/stat.c:499
> x64_sys_call+0x141f/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:263
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> value changed: 0x2755ae53 -> 0x27ee44d3
>
> Since there is no special protection when shmem_getattr() calls
> generic_fillattr(), data-race occurs by functions such as shmem_unlink()
> or shmem_mknod(). This can cause unexpected results, so commenting it out
> is not enough.
>
> Therefore, when calling generic_fillattr() from shmem_getattr(), it is
> appropriate to protect the inode using inode_lock_shared() and
> inode_unlock_shared() to prevent data-race.
>
Cc: stable(a)vger.kernel.org
I think this patch should be applied from next rc version and also stable
version. When calling generic_fillattr(), if you don't hold read lock,
data-race will occur in inode member variables, which can cause unexpected
behavior. This problem is also present in several stable versions, so I think
it should be fixed as soon as possible.
Regards,
Jeongjun Park
> Reported-by: syzbot <syzkaller(a)googlegroups.com>
> Fixes: 44a30220bc0a ("shmem: recalculate file inode when fstat")
> Signed-off-by: Jeongjun Park <aha310510(a)gmail.com>
> ---
> mm/shmem.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 5a77acf6ac6a..9beeb47c3743 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1154,7 +1154,9 @@ static int shmem_getattr(struct mnt_idmap *idmap,
> stat->attributes_mask |= (STATX_ATTR_APPEND |
> STATX_ATTR_IMMUTABLE |
> STATX_ATTR_NODUMP);
> + inode_lock_shared(inode);
> generic_fillattr(idmap, request_mask, inode, stat);
> + inode_unlock_shared(inode);
>
> if (shmem_is_huge(inode, 0, false, NULL, 0))
> stat->blksize = HPAGE_PMD_SIZE;
> --
The patch titled
Subject: mm/gup: stop leaking pinned pages in low memory conditions
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-gup-stop-leaking-pinned-pages-in-low-memory-conditions.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: John Hubbard <jhubbard(a)nvidia.com>
Subject: mm/gup: stop leaking pinned pages in low memory conditions
Date: Wed, 16 Oct 2024 13:22:42 -0700
If a driver tries to call any of the pin_user_pages*(FOLL_LONGTERM) family
of functions, and requests "too many" pages, then the call will
erroneously leave pages pinned. This is visible in user space as an
actual memory leak.
Repro is trivial: just make enough pin_user_pages(FOLL_LONGTERM) calls to
exhaust memory.
The root cause of the problem is this sequence, within
__gup_longterm_locked():
__get_user_pages_locked()
rc = check_and_migrate_movable_pages()
...which gets retried in a loop. The loop error handling is incomplete,
clearly due to a somewhat unusual and complicated tri-state error API.
But anyway, if -ENOMEM, or in fact, any unexpected error is returned from
check_and_migrate_movable_pages(), then __gup_longterm_locked() happily
returns the error, while leaving the pages pinned.
In the failed case, which is an app that requests (via a device driver)
30720000000 bytes to be pinned, and then exits, I see this:
$ grep foll /proc/vmstat
nr_foll_pin_acquired 7502048
nr_foll_pin_released 2048
And after applying this patch, it returns to balanced pins:
$ grep foll /proc/vmstat
nr_foll_pin_acquired 7502048
nr_foll_pin_released 7502048
Fix this by unpinning the pages that __get_user_pages_locked() has
pinned, in such error cases.
Link: https://lkml.kernel.org/r/20241016202242.456953-1-jhubbard@nvidia.com
Fixes: 24a95998e9ba ("mm/gup.c: simplify and fix check_and_migrate_movable_pages() return codes")
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Shigeru Yoshida <syoshida(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 11 +++++++++++
1 file changed, 11 insertions(+)
--- a/mm/gup.c~mm-gup-stop-leaking-pinned-pages-in-low-memory-conditions
+++ a/mm/gup.c
@@ -2492,6 +2492,17 @@ static long __gup_longterm_locked(struct
/* FOLL_LONGTERM implies FOLL_PIN */
rc = check_and_migrate_movable_pages(nr_pinned_pages, pages);
+
+ /*
+ * The __get_user_pages_locked() call happens before we know
+ * that whether it's possible to successfully complete the whole
+ * operation. To compensate for this, if we get an unexpected
+ * error (such as -ENOMEM) then we must unpin everything, before
+ * erroring out.
+ */
+ if (rc != -EAGAIN && rc != 0)
+ unpin_user_pages(pages, nr_pinned_pages);
+
} while (rc == -EAGAIN);
memalloc_pin_restore(flags);
return rc ? rc : nr_pinned_pages;
_
Patches currently in -mm which might be from jhubbard(a)nvidia.com are
mm-gup-stop-leaking-pinned-pages-in-low-memory-conditions.patch
kaslr-rename-physmem_end-and-physmem_end-to-direct_map_physmem_end.patch
The patch titled
Subject: x86/traps: move kmsan check after instrumentation_begin
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
x86-traps-move-kmsan-check-after-instrumentation_begin.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Sabyrzhan Tasbolatov <snovitoll(a)gmail.com>
Subject: x86/traps: move kmsan check after instrumentation_begin
Date: Wed, 16 Oct 2024 20:24:07 +0500
During x86_64 kernel build with CONFIG_KMSAN, the objtool warns following:
AR built-in.a
AR vmlinux.a
LD vmlinux.o
vmlinux.o: warning: objtool: handle_bug+0x4: call to
kmsan_unpoison_entry_regs() leaves .noinstr.text section
OBJCOPY modules.builtin.modinfo
GEN modules.builtin
MODPOST Module.symvers
CC .vmlinux.export.o
Moving kmsan_unpoison_entry_regs() _after_ instrumentation_begin() fixes
the warning.
There is decode_bug(regs->ip, &imm) is left before KMSAN unpoisoining, but
it has the return condition and if we include it after
instrumentation_begin() it results the warning "return with
instrumentation enabled", hence, I'm concerned that regs will not be KMSAN
unpoisoned if `ud_type == BUG_NONE` is true.
Link: https://lkml.kernel.org/r/20241016152407.3149001-1-snovitoll@gmail.com
Fixes: ba54d194f8da ("x86/traps: avoid KMSAN bugs originating from handle_bug()")
Signed-off-by: Sabyrzhan Tasbolatov <snovitoll(a)gmail.com>
Reviewed-by: Alexander Potapenko <glider(a)google.com>
Cc: Borislav Petkov (AMD) <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/kernel/traps.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
--- a/arch/x86/kernel/traps.c~x86-traps-move-kmsan-check-after-instrumentation_begin
+++ a/arch/x86/kernel/traps.c
@@ -261,12 +261,6 @@ static noinstr bool handle_bug(struct pt
int ud_type;
u32 imm;
- /*
- * Normally @regs are unpoisoned by irqentry_enter(), but handle_bug()
- * is a rare case that uses @regs without passing them to
- * irqentry_enter().
- */
- kmsan_unpoison_entry_regs(regs);
ud_type = decode_bug(regs->ip, &imm);
if (ud_type == BUG_NONE)
return handled;
@@ -276,6 +270,12 @@ static noinstr bool handle_bug(struct pt
*/
instrumentation_begin();
/*
+ * Normally @regs are unpoisoned by irqentry_enter(), but handle_bug()
+ * is a rare case that uses @regs without passing them to
+ * irqentry_enter().
+ */
+ kmsan_unpoison_entry_regs(regs);
+ /*
* Since we're emulating a CALL with exceptions, restore the interrupt
* state to what it was at the exception site.
*/
_
Patches currently in -mm which might be from snovitoll(a)gmail.com are
x86-traps-move-kmsan-check-after-instrumentation_begin.patch
mm-kasan-kmsan-copy_from-to_kernel_nofault.patch
-Wenum-enum-conversion and -Wenum-compare-conditional were strengthened
in clang-19 to warn in C mode, which caused the kernel to move them to
W=1 in commit 75b5ab134bb5 ("kbuild: Move
-Wenum-{compare-conditional,enum-conversion} into W=1") because there
were numerous instances of each that would break builds with -Werror.
Unfortunately, this is not a full solution, as more and more developers,
subsystems, and distributors are building with W=1 as well, so they
continue to see the numerous instances of these warnings.
Since the move to W=1, there have not been many new instances that have
appeared through various build reports and the ones that have appeared
seem to be following similar existing patterns, suggesting that most
instances of these warnings will not be real issues. The only
alternatives for silencing these warnings are adding casts (which is
generally seen as an ugly practice) or refactoring the enums to macro
defines or a unified enum (which may be undesirable because of type
safety in other parts of the code).
Disable the warnings altogether so that W=1 users do not see them.
Cc: stable(a)vger.kernel.org
Fixes: 75b5ab134bb5 ("kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1")
Link: https://lore.kernel.org/ZwRA9SOcOjjLJcpi@google.com/
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
I am open to discussion around whether or not to disable this entirely
but I think it needs to be out of W=1.
---
scripts/Makefile.extrawarn | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/scripts/Makefile.extrawarn b/scripts/Makefile.extrawarn
index 1d13cecc7cc7808610e635ddc03476cf92b3a8c1..3f124c80bee5133899047c07bc05e3173ee4c7f0 100644
--- a/scripts/Makefile.extrawarn
+++ b/scripts/Makefile.extrawarn
@@ -31,6 +31,8 @@ KBUILD_CFLAGS-$(CONFIG_CC_NO_ARRAY_BOUNDS) += -Wno-array-bounds
ifdef CONFIG_CC_IS_CLANG
# The kernel builds with '-std=gnu11' so use of GNU extensions is acceptable.
KBUILD_CFLAGS += -Wno-gnu
+KBUILD_CFLAGS += -Wno-enum-compare-conditional
+KBUILD_CFLAGS += -Wno-enum-enum-conversion
else
# gcc inanely warns about local variables called 'main'
@@ -129,8 +131,6 @@ endif
KBUILD_CFLAGS += $(call cc-disable-warning, pointer-to-enum-cast)
KBUILD_CFLAGS += -Wno-tautological-constant-out-of-range-compare
KBUILD_CFLAGS += $(call cc-disable-warning, unaligned-access)
-KBUILD_CFLAGS += -Wno-enum-compare-conditional
-KBUILD_CFLAGS += -Wno-enum-enum-conversion
endif
endif
---
base-commit: 8e929cb546ee42c9a61d24fae60605e9e3192354
change-id: 20241016-disable-two-clang-enum-warnings-e7994d44f948
Best regards,
--
Nathan Chancellor <nathan(a)kernel.org>
Add missing Broadcast_AND region to the LLCC block for x1e80100,
as the LLCC version on this platform is 4.1 and it provides the region.
This also fixes the following error caused by the missing region:
[ 3.797768] qcom-llcc 25000000.system-cache-controller: error -EINVAL: invalid resource (null)
This error started showing up only after the new regmap region called
Broadcast_AND that has been added to the llcc-qcom driver.
Cc: stable(a)vger.kernel.org # 6.11: 055afc34fd21: soc: qcom: llcc: Add regmap for Broadcast_AND region
Fixes: af16b00578a7 ("arm64: dts: qcom: Add base X1E80100 dtsi and the QCP dts")
Signed-off-by: Abel Vesa <abel.vesa(a)linaro.org>
---
Changes in v2:
- fixed subject line to say x1e80100 instead of sm8450
- mentioned the reason why the new error is showing up
and how it is related to the llcc-qcom driver
- cc'ed stable with patch dependency for cherry-picking
- Link to v1: https://lore.kernel.org/r/20240718-x1e80100-dts-llcc-add-broadcastand_regio…
---
arch/arm64/boot/dts/qcom/x1e80100.dtsi | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100.dtsi b/arch/arm64/boot/dts/qcom/x1e80100.dtsi
index 0e6802c1d2d8375987c614ec69c440e2f38d25c6..fbf1acf8b0d84a2d2c723785242a65f47e63340b 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100.dtsi
+++ b/arch/arm64/boot/dts/qcom/x1e80100.dtsi
@@ -6093,7 +6093,8 @@ system-cache-controller@25000000 {
<0 0x25a00000 0 0x200000>,
<0 0x25c00000 0 0x200000>,
<0 0x25e00000 0 0x200000>,
- <0 0x26000000 0 0x200000>;
+ <0 0x26000000 0 0x200000>,
+ <0 0x26200000 0 0x200000>;
reg-names = "llcc0_base",
"llcc1_base",
"llcc2_base",
@@ -6102,7 +6103,8 @@ system-cache-controller@25000000 {
"llcc5_base",
"llcc6_base",
"llcc7_base",
- "llcc_broadcast_base";
+ "llcc_broadcast_base",
+ "llcc_broadcast_and_base";
interrupts = <GIC_SPI 266 IRQ_TYPE_LEVEL_HIGH>;
};
---
base-commit: d61a00525464bfc5fe92c6ad713350988e492b88
change-id: 20240718-x1e80100-dts-llcc-add-broadcastand_region-797413ee2a8f
Best regards,
--
Abel Vesa <abel.vesa(a)linaro.org>
This problem reported by Clement LE GOFFIC manifest when
using CONFIG_KASAN_IN_VMALLOC and VMAP_STACK:
https://lore.kernel.org/linux-arm-kernel/a1a1d062-f3a2-4d05-9836-3b098de9db…
After some analysis it seems we are missing to sync the
VMALLOC shadow memory in top level PGD to all CPUs.
Add some code to perform this sync, and the bug appears
to go away.
As suggested by Ard, also perform a dummy read from the
shadow memory of the new VMAP_STACK in the low level
assembly.
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
---
Linus Walleij (2):
ARM: ioremap: Flush PGDs for VMALLOC shadow
ARM: entry: Do a dummy read from VMAP shadow
arch/arm/kernel/entry-armv.S | 8 ++++++++
arch/arm/mm/ioremap.c | 7 +++++++
2 files changed, 15 insertions(+)
---
base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
change-id: 20241015-arm-kasan-vmalloc-crash-fcbd51416457
Best regards,
--
Linus Walleij <linus.walleij(a)linaro.org>
When copying a namespace we won't have added the new copy into the
namespace rbtree until after the copy succeeded. Calling free_mnt_ns()
will try to remove the copy from the rbtree which is invalid. Simply
free the namespace skeleton directly.
Fixes: 1901c92497bd ("fs: keep an index of current mount namespaces")
Cc: stable(a)vger.kernel.org # v6.11+
Reported-by: Brad Spengler <spender(a)grsecurity.net>
Tested-by: Brad Spengler <spender(a)grsecurity.net>
Suggested-by: Brad Spengler <spender(a)grsecurity.net>
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
---
In vfs.fixes unless I hear objections.
---
fs/namespace.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index 93c377816d75..d26f5e6d2ca3 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3944,7 +3944,9 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
new = copy_tree(old, old->mnt.mnt_root, copy_flags);
if (IS_ERR(new)) {
namespace_unlock();
- free_mnt_ns(new_ns);
+ ns_free_inum(&new_ns->ns);
+ dec_mnt_namespaces(new_ns->ucounts);
+ mnt_ns_release(new_ns);
return ERR_CAST(new);
}
if (user_ns != ns->user_ns) {
--
2.45.2
iowarrior_read() uses the iowarrior dev structure, but does not use any
lock on the structure. This can cause various bugs including data-races,
so it is more appropriate to use a mutex lock to safely protect the
iowarrior dev structure. When using a mutex lock, you should split the
branch to prevent blocking when the O_NONBLOCK flag is set.
In addition, it is unnecessary to check for NULL on the iowarrior dev
structure obtained by reading file->private_data. Therefore, it is
better to remove the check.
Cc: stable(a)vger.kernel.org
Fixes: 946b960d13c1 ("USB: add driver for iowarrior devices.")
Signed-off-by: Jeongjun Park <aha310510(a)gmail.com>
---
v1 -> v2: Added cc tag and change log
drivers/usb/misc/iowarrior.c | 46 ++++++++++++++++++++++++++++--------
1 file changed, 36 insertions(+), 10 deletions(-)
diff --git a/drivers/usb/misc/iowarrior.c b/drivers/usb/misc/iowarrior.c
index 6d28467ce352..a513766b4985 100644
--- a/drivers/usb/misc/iowarrior.c
+++ b/drivers/usb/misc/iowarrior.c
@@ -277,28 +277,45 @@ static ssize_t iowarrior_read(struct file *file, char __user *buffer,
struct iowarrior *dev;
int read_idx;
int offset;
+ int retval;
dev = file->private_data;
+ if (file->f_flags & O_NONBLOCK) {
+ retval = mutex_trylock(&dev->mutex);
+ if (!retval)
+ return -EAGAIN;
+ } else {
+ retval = mutex_lock_interruptible(&dev->mutex);
+ if (retval)
+ return -ERESTARTSYS;
+ }
+
/* verify that the device wasn't unplugged */
- if (!dev || !dev->present)
- return -ENODEV;
+ if (!dev->present) {
+ retval = -ENODEV;
+ goto exit;
+ }
dev_dbg(&dev->interface->dev, "minor %d, count = %zd\n",
dev->minor, count);
/* read count must be packet size (+ time stamp) */
if ((count != dev->report_size)
- && (count != (dev->report_size + 1)))
- return -EINVAL;
+ && (count != (dev->report_size + 1))) {
+ retval = -EINVAL;
+ goto exit;
+ }
/* repeat until no buffer overrun in callback handler occur */
do {
atomic_set(&dev->overflow_flag, 0);
if ((read_idx = read_index(dev)) == -1) {
/* queue empty */
- if (file->f_flags & O_NONBLOCK)
- return -EAGAIN;
+ if (file->f_flags & O_NONBLOCK) {
+ retval = -EAGAIN;
+ goto exit;
+ }
else {
//next line will return when there is either new data, or the device is unplugged
int r = wait_event_interruptible(dev->read_wait,
@@ -309,28 +326,37 @@ static ssize_t iowarrior_read(struct file *file, char __user *buffer,
-1));
if (r) {
//we were interrupted by a signal
- return -ERESTART;
+ retval = -ERESTART;
+ goto exit;
}
if (!dev->present) {
//The device was unplugged
- return -ENODEV;
+ retval = -ENODEV;
+ goto exit;
}
if (read_idx == -1) {
// Can this happen ???
- return 0;
+ retval = 0;
+ goto exit;
}
}
}
offset = read_idx * (dev->report_size + 1);
if (copy_to_user(buffer, dev->read_queue + offset, count)) {
- return -EFAULT;
+ retval = -EFAULT;
+ goto exit;
}
} while (atomic_read(&dev->overflow_flag));
read_idx = ++read_idx == MAX_INTERRUPT_BUFFER ? 0 : read_idx;
atomic_set(&dev->read_idx, read_idx);
+ mutex_unlock(&dev->mutex);
return count;
+
+exit:
+ mutex_unlock(&dev->mutex);
+ return retval;
}
/*
--
On 11.10.24 10:48, Joel GUITTET wrote:
>
> I faced some issue related to scaling frequency on ZynqMP device
> using v5.15.167 kernel. As an exemple setting the scaling frequency
> below show it's not properly set:
>
> cat /sys/devices/system/cpu/cpufreq/policy0/
> scaling_available_frequencies 299999 399999 599999 1199999
>
> echo 399999 > /sys/devices/system/cpu/cpufreq/policy0/
> scaling_setspeed
>
> cat /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq 399999
>
> cat /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq 299999
> ====> Should be 399999
>
> After analysis of this issue with the help of Xilinx, it appears
> that a commit was backported on the 5.15.y branch, but probably it
> should not, or not as is. The commit is
> 9117fc44fd3a9538261e530ba5a022dfc9519620 modifying drivers/clk/
> zynqmp/divider.c.
FWIW, that is 1fe15be1fb6135 ("drivers: clk: zynqmp: update divider
round rate logic") [v6.8-rc1].
> Is anybody reading this message able to answer why it was
> backported ?
Looks like because it fixes a bug. I CCed the original author and those
that handled the patch, maybe they can help us out and tell us what's
the best strategy forward here.
> The information I have until now is that it is intended
> recent kernel version. Dependencies for this modification is
> currently under clarification with Xilinx (maybe another commit to
> backport).
>
> By the way, reverting this commit fix the issue shown above.
Does 6.12-rc work fine for you? Because if not, we should fix the
problem there.
Ciao, Thorsten
The NVMe regulator has been left enabled by the boot firmware. Mark it
as such to avoid disabling the regulator temporarily during boot.
Fixes: eb57cbe730d1 ("arm64: dts: qcom: x1e80100: Describe the PCIe 6a resources")
Cc: stable(a)vger.kernel.org # 6.11
Cc: Abel Vesa <abel.vesa(a)linaro.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e80100-qcp.dts | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100-qcp.dts b/arch/arm64/boot/dts/qcom/x1e80100-qcp.dts
index 1c3a6a7b3ed6..5ef030c60abe 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100-qcp.dts
+++ b/arch/arm64/boot/dts/qcom/x1e80100-qcp.dts
@@ -253,6 +253,8 @@ vreg_nvme: regulator-nvme {
pinctrl-names = "default";
pinctrl-0 = <&nvme_reg_en>;
+
+ regulator-boot-on;
};
};
--
2.45.2
The NVMe regulator has been left enabled by the boot firmware. Mark it
as such to avoid disabling the regulator temporarily during boot.
Fixes: d0e2f8f62dff ("arm64: dts: qcom: Add device tree for ASUS Vivobook S 15")
Cc: stable(a)vger.kernel.org # 6.11
Cc: Xilin Wu <wuxilin123(a)gmail.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e80100-asus-vivobook-s15.dts | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100-asus-vivobook-s15.dts b/arch/arm64/boot/dts/qcom/x1e80100-asus-vivobook-s15.dts
index 20616bd4aa6c..fb4a48a1e2a8 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100-asus-vivobook-s15.dts
+++ b/arch/arm64/boot/dts/qcom/x1e80100-asus-vivobook-s15.dts
@@ -134,6 +134,8 @@ vreg_nvme: regulator-nvme {
pinctrl-0 = <&nvme_reg_en>;
pinctrl-names = "default";
+
+ regulator-boot-on;
};
};
--
2.45.2
The NVMe regulator has been left enabled by the boot firmware. Mark it
as such to avoid disabling the regulator temporarily during boot.
Fixes: eb57cbe730d1 ("arm64: dts: qcom: x1e80100: Describe the PCIe 6a resources")
Cc: stable(a)vger.kernel.org # 6.11
Cc: Abel Vesa <abel.vesa(a)linaro.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/x1e80100-crd.dts | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/x1e80100-crd.dts b/arch/arm64/boot/dts/qcom/x1e80100-crd.dts
index 10b28d870f08..5d2eec8590ce 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100-crd.dts
+++ b/arch/arm64/boot/dts/qcom/x1e80100-crd.dts
@@ -300,6 +300,8 @@ vreg_nvme: regulator-nvme {
pinctrl-names = "default";
pinctrl-0 = <&nvme_reg_en>;
+
+ regulator-boot-on;
};
vreg_wwan: regulator-wwan {
--
2.45.2
The environ pointer itself is never NULL, this is guaranteed by crt.h.
However if the environment is empty, environ will point to a NULL
pointer.
While this case will be checked by the loop later, this only happens
after the first loop iteration.
To avoid reading invalid memory inside the loop, fix the test that
checks for an empty environment.
Fixes: 077d0a392446 ("tools/nolibc/stdlib: add a simple getenv() implementation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
tools/include/nolibc/stdlib.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/include/nolibc/stdlib.h b/tools/include/nolibc/stdlib.h
index 75aa273c23a6153db6a32facaea16457a522703b..c31967378cf1f699283d801487c1a91d17e4d1ce 100644
--- a/tools/include/nolibc/stdlib.h
+++ b/tools/include/nolibc/stdlib.h
@@ -90,7 +90,7 @@ char *getenv(const char *name)
{
int idx, i;
- if (environ) {
+ if (*environ) {
for (idx = 0; environ[idx]; idx++) {
for (i = 0; name[i] && name[i] == environ[idx][i];)
i++;
---
base-commit: 2f87d0916ce0d2925cedbc9e8f5d6291ba2ac7b2
change-id: 20241016-nolibc-getenv-f99f4d3df2cd
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Treat each completed full size write to /dev/ttyDBC0 as a separate usb
transfer. Make sure the size of the TRBs matches the size of the tty
write by first queuing as many max packet size TRBs as possible up to
the last TRB which will be cut short to match the size of the tty write.
This solves an issue where userspace writes several transfers back to
back via /dev/ttyDBC0 into a kfifo before dbgtty can find available
request to turn that kfifo data into TRBs on the transfer ring.
The boundary between transfer was lost as xhci-dbgtty then turned
everyting in the kfifo into as many 'max packet size' TRBs as possible.
DbC would then send more data to the host than intended for that
transfer, causing host to issue a babble error.
Refuse to write more data to kfifo until previous tty write data is
turned into properly sized TRBs with data size boundaries matching tty
write size
Tested-by: Uday M Bhat <uday.m.bhat(a)intel.com>
Tested-by: Łukasz Bartosik <ukaszb(a)chromium.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
---
drivers/usb/host/xhci-dbgcap.h | 1 +
drivers/usb/host/xhci-dbgtty.c | 55 ++++++++++++++++++++++++++++++----
2 files changed, 51 insertions(+), 5 deletions(-)
diff --git a/drivers/usb/host/xhci-dbgcap.h b/drivers/usb/host/xhci-dbgcap.h
index 8ec813b6e9fd..9dc8f4d8077c 100644
--- a/drivers/usb/host/xhci-dbgcap.h
+++ b/drivers/usb/host/xhci-dbgcap.h
@@ -110,6 +110,7 @@ struct dbc_port {
struct tasklet_struct push;
struct list_head write_pool;
+ unsigned int tx_boundary;
bool registered;
};
diff --git a/drivers/usb/host/xhci-dbgtty.c b/drivers/usb/host/xhci-dbgtty.c
index b8e78867e25a..d719c16ea30b 100644
--- a/drivers/usb/host/xhci-dbgtty.c
+++ b/drivers/usb/host/xhci-dbgtty.c
@@ -24,6 +24,29 @@ static inline struct dbc_port *dbc_to_port(struct xhci_dbc *dbc)
return dbc->priv;
}
+static unsigned int
+dbc_kfifo_to_req(struct dbc_port *port, char *packet)
+{
+ unsigned int len;
+
+ len = kfifo_len(&port->port.xmit_fifo);
+
+ if (len == 0)
+ return 0;
+
+ len = min(len, DBC_MAX_PACKET);
+
+ if (port->tx_boundary)
+ len = min(port->tx_boundary, len);
+
+ len = kfifo_out(&port->port.xmit_fifo, packet, len);
+
+ if (port->tx_boundary)
+ port->tx_boundary -= len;
+
+ return len;
+}
+
static int dbc_start_tx(struct dbc_port *port)
__releases(&port->port_lock)
__acquires(&port->port_lock)
@@ -36,7 +59,7 @@ static int dbc_start_tx(struct dbc_port *port)
while (!list_empty(pool)) {
req = list_entry(pool->next, struct dbc_request, list_pool);
- len = kfifo_out(&port->port.xmit_fifo, req->buf, DBC_MAX_PACKET);
+ len = dbc_kfifo_to_req(port, req->buf);
if (len == 0)
break;
do_tty_wake = true;
@@ -200,14 +223,32 @@ static ssize_t dbc_tty_write(struct tty_struct *tty, const u8 *buf,
{
struct dbc_port *port = tty->driver_data;
unsigned long flags;
+ unsigned int written = 0;
spin_lock_irqsave(&port->port_lock, flags);
- if (count)
- count = kfifo_in(&port->port.xmit_fifo, buf, count);
- dbc_start_tx(port);
+
+ /*
+ * Treat tty write as one usb transfer. Make sure the writes are turned
+ * into TRB request having the same size boundaries as the tty writes.
+ * Don't add data to kfifo before previous write is turned into TRBs
+ */
+ if (port->tx_boundary) {
+ spin_unlock_irqrestore(&port->port_lock, flags);
+ return 0;
+ }
+
+ if (count) {
+ written = kfifo_in(&port->port.xmit_fifo, buf, count);
+
+ if (written == count)
+ port->tx_boundary = kfifo_len(&port->port.xmit_fifo);
+
+ dbc_start_tx(port);
+ }
+
spin_unlock_irqrestore(&port->port_lock, flags);
- return count;
+ return written;
}
static int dbc_tty_put_char(struct tty_struct *tty, u8 ch)
@@ -241,6 +282,10 @@ static unsigned int dbc_tty_write_room(struct tty_struct *tty)
spin_lock_irqsave(&port->port_lock, flags);
room = kfifo_avail(&port->port.xmit_fifo);
+
+ if (port->tx_boundary)
+ room = 0;
+
spin_unlock_irqrestore(&port->port_lock, flags);
return room;
--
2.25.1
The stream contex type (SCT) bitfield is used both in the stream context
data structure, and in the 'Set TR Dequeue pointer' command TRB.
In both cases it uses bits 3:1
The SCT_FOR_TRB(p) macro used to set the stream context type (SCT) field
for the 'Set TR Dequeue pointer' command TRB incorrectly shifts the value
1 bit left before masking the three bits.
Fix this by first masking and rshifting, just like the similar
SCT_FOR_CTX(p) macro does
This issue has not been visibile as the lost bit 3 is only used with
secondary stream arrays (SSA). Xhci driver currently only supports using
a primary stream array with Linear stream addressing.
Fixes: 95241dbdf828 ("xhci: Set SCT field for Set TR dequeue on streams")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
---
drivers/usb/host/xhci.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 620502de971a..f0fb696d5619 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1001,7 +1001,7 @@ enum xhci_setup_dev {
/* Set TR Dequeue Pointer command TRB fields, 6.4.3.9 */
#define TRB_TO_STREAM_ID(p) ((((p) & (0xffff << 16)) >> 16))
#define STREAM_ID_FOR_TRB(p) ((((p)) & 0xffff) << 16)
-#define SCT_FOR_TRB(p) (((p) << 1) & 0x7)
+#define SCT_FOR_TRB(p) (((p) & 0x7) << 1)
/* Link TRB specific fields */
#define TRB_TC (1<<1)
--
2.25.1
The struct vchiq_arm_state 'platform_state' is currently allocated
dynamically using kzalloc(). Unfortunately, it is never freed and is
subjected to memory leaks in the error handling paths of the probe()
function.
To address the issue, use device resource management helper
devm_kzalloc(), to ensure cleanup after its allocation.
Fixes: 71bad7f08641 ("staging: add bcm2708 vchiq driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Umang Jain <umang.jain(a)ideasonboard.com>
Reviewed-by: Dan Carpenter <dan.carpenter(a)linaro.org>
---
drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
index af623ad87c15..7ece82c361ee 100644
--- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
+++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
@@ -285,7 +285,7 @@ vchiq_platform_init_state(struct vchiq_state *state)
{
struct vchiq_arm_state *platform_state;
- platform_state = kzalloc(sizeof(*platform_state), GFP_KERNEL);
+ platform_state = devm_kzalloc(state->dev, sizeof(*platform_state), GFP_KERNEL);
if (!platform_state)
return -ENOMEM;
--
2.45.2
The PLL checks are comparing 64 bit integers with 32 bit
ones. Depending on the values of the variables, this may
underflow.
Fix it ensuring that both sides of the expression are u64.
Fixes: 852b50aeed15 ("media: On Semi AR0521 sensor driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
---
drivers/media/i2c/ar0521.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/media/i2c/ar0521.c b/drivers/media/i2c/ar0521.c
index fc27238dd4d3..24873149096c 100644
--- a/drivers/media/i2c/ar0521.c
+++ b/drivers/media/i2c/ar0521.c
@@ -255,10 +255,10 @@ static u32 calc_pll(struct ar0521_dev *sensor, u32 freq, u16 *pre_ptr, u16 *mult
continue; /* Minimum value */
if (new_mult > 254)
break; /* Maximum, larger pre won't work either */
- if (sensor->extclk_freq * (u64)new_mult < AR0521_PLL_MIN *
+ if (sensor->extclk_freq * (u64)new_mult < (u64)AR0521_PLL_MIN *
new_pre)
continue;
- if (sensor->extclk_freq * (u64)new_mult > AR0521_PLL_MAX *
+ if (sensor->extclk_freq * (u64)new_mult > (u64)AR0521_PLL_MAX *
new_pre)
break; /* Larger pre won't work either */
new_pll = div64_round_up(sensor->extclk_freq * (u64)new_mult,
--
2.47.0
The error check logic at get_ctrl() is broken: if ptr_to_user()
fails to fill a control due to an error, no errors are returned
and v4l2_g_ctrl() returns success on a failed operation, which
may cause applications to fail.
Add an error check at get_ctrl() and ensure that it will
be returned to userspace without filling the control value if
get_ctrl() fails.
Fixes: 71c689dc2e73 ("media: v4l2-ctrls: split up into four source files")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
---
drivers/media/v4l2-core/v4l2-ctrls-api.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/media/v4l2-core/v4l2-ctrls-api.c b/drivers/media/v4l2-core/v4l2-ctrls-api.c
index e5a364efd5e6..be32dccf9830 100644
--- a/drivers/media/v4l2-core/v4l2-ctrls-api.c
+++ b/drivers/media/v4l2-core/v4l2-ctrls-api.c
@@ -753,9 +753,10 @@ static int get_ctrl(struct v4l2_ctrl *ctrl, struct v4l2_ext_control *c)
for (i = 0; i < master->ncontrols; i++)
cur_to_new(master->cluster[i]);
ret = call_op(master, g_volatile_ctrl);
- new_to_user(c, ctrl);
+ if (!ret)
+ ret = new_to_user(c, ctrl);
} else {
- cur_to_user(c, ctrl);
+ ret = cur_to_user(c, ctrl);
}
v4l2_ctrl_unlock(master);
return ret;
@@ -770,7 +771,10 @@ int v4l2_g_ctrl(struct v4l2_ctrl_handler *hdl, struct v4l2_control *control)
if (!ctrl || !ctrl->is_int)
return -EINVAL;
ret = get_ctrl(ctrl, &c);
- control->value = c.value;
+
+ if (!ret)
+ control->value = c.value;
+
return ret;
}
EXPORT_SYMBOL(v4l2_g_ctrl);
--
2.47.0
The logic at tpg_precalculate_line() blindly rescales the
buffer even when scaled_witdh is equal to zero. If this ever
happens, this will cause a division by zero.
Instead, add a WARN_ON() to trigger such cases and return
without doing any precalculation.
Fixes: 63881df94d3e ("[media] vivid: add the Test Pattern Generator")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
---
drivers/media/common/v4l2-tpg/v4l2-tpg-core.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c b/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c
index c86343a4d0bf..a22f31515d7e 100644
--- a/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c
+++ b/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c
@@ -1795,6 +1795,9 @@ static void tpg_precalculate_line(struct tpg_data *tpg)
unsigned p;
unsigned x;
+ if (WARN_ON(tpg->src_width == 0))
+ return;
+
switch (tpg->pattern) {
case TPG_PAT_GREEN:
contrast = TPG_COLOR_100_RED;
--
2.47.0
The logic at get_edid_tag_location() returns either an offset
or an error condition. However, the error condition uses a
non-standard "-1" value.
Use instead -ENOENT to indicate that the tag was not found.
Fixes: 056f2821b631 ("media: cec: extron-da-hd-4k-plus: add the Extron DA HD 4K Plus CEC driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
---
.../cec/usb/extron-da-hd-4k-plus/extron-da-hd-4k-plus.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/media/cec/usb/extron-da-hd-4k-plus/extron-da-hd-4k-plus.c b/drivers/media/cec/usb/extron-da-hd-4k-plus/extron-da-hd-4k-plus.c
index a526464af88c..7d03a36df5cf 100644
--- a/drivers/media/cec/usb/extron-da-hd-4k-plus/extron-da-hd-4k-plus.c
+++ b/drivers/media/cec/usb/extron-da-hd-4k-plus/extron-da-hd-4k-plus.c
@@ -348,12 +348,12 @@ static int get_edid_tag_location(const u8 *edid, unsigned int size,
/* Return if not a CTA-861 extension block */
if (size < 256 || edid[0] != 0x02 || edid[1] != 0x03)
- return -1;
+ return -ENOENT;
/* search tag */
d = edid[0x02] & 0x7f;
if (d <= 4)
- return -1;
+ return -ENOENT;
i = 0x04;
end = 0x00 + d;
@@ -371,7 +371,7 @@ static int get_edid_tag_location(const u8 *edid, unsigned int size,
return offset + i;
i += len + 1;
} while (i < end);
- return -1;
+ return -ENOENT;
}
static void extron_edid_crc(u8 *edid)
--
2.47.0
fepriv->auto_sub_step is unsigned. Setting it to -1 is just a
trick to avoid calling continue.
It relies to have this code just afterwards:
if (!ready) fepriv->auto_sub_step++;
Simplify the code by simply setting it to zero and use
continue to return to the while loop.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
---
drivers/media/dvb-core/dvb_frontend.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/media/dvb-core/dvb_frontend.c b/drivers/media/dvb-core/dvb_frontend.c
index d48f48fda87c..c9283100332a 100644
--- a/drivers/media/dvb-core/dvb_frontend.c
+++ b/drivers/media/dvb-core/dvb_frontend.c
@@ -443,8 +443,8 @@ static int dvb_frontend_swzigzag_autotune(struct dvb_frontend *fe, int check_wra
default:
fepriv->auto_step++;
- fepriv->auto_sub_step = -1; /* it'll be incremented to 0 in a moment */
- break;
+ fepriv->auto_sub_step = 0;
+ continue;
}
if (!ready) fepriv->auto_sub_step++;
--
2.47.0
The dvbdev contains a static variable used to store dvb minors.
The behavior of it depends if CONFIG_DVB_DYNAMIC_MINORS is set
or not. When not set, dvb_register_device() won't check for
boundaries, as it will rely that a previous call to
dvb_register_adapter() would already be enforcing it.
On a similar way, dvb_device_open() uses the assumption
that the register functions already did the needed checks.
This can be fragile if some device ends using different
calls. This also generate warnings on static check analysers.
So, add explicit guards to prevent potential risk of OOM issues.
Fixes: 5dd3f3071070 ("V4L/DVB (9361): Dynamic DVB minor allocation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
---
drivers/media/dvb-core/dvbdev.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/drivers/media/dvb-core/dvbdev.c b/drivers/media/dvb-core/dvbdev.c
index b43695bc51e7..14f323fbada7 100644
--- a/drivers/media/dvb-core/dvbdev.c
+++ b/drivers/media/dvb-core/dvbdev.c
@@ -86,10 +86,15 @@ static DECLARE_RWSEM(minor_rwsem);
static int dvb_device_open(struct inode *inode, struct file *file)
{
struct dvb_device *dvbdev;
+ unsigned int minor = iminor(inode);
+
+ if (minor >= MAX_DVB_MINORS)
+ return -ENODEV;
mutex_lock(&dvbdev_mutex);
down_read(&minor_rwsem);
- dvbdev = dvb_minors[iminor(inode)];
+
+ dvbdev = dvb_minors[minor];
if (dvbdev && dvbdev->fops) {
int err = 0;
@@ -525,7 +530,7 @@ int dvb_register_device(struct dvb_adapter *adap, struct dvb_device **pdvbdev,
for (minor = 0; minor < MAX_DVB_MINORS; minor++)
if (!dvb_minors[minor])
break;
- if (minor == MAX_DVB_MINORS) {
+ if (minor >= MAX_DVB_MINORS) {
if (new_node) {
list_del(&new_node->list_head);
kfree(dvbdevfops);
@@ -540,6 +545,14 @@ int dvb_register_device(struct dvb_adapter *adap, struct dvb_device **pdvbdev,
}
#else
minor = nums2minor(adap->num, type, id);
+ if (minor >= MAX_DVB_MINORS) {
+ dvb_media_device_free(dvbdev);
+ list_del(&dvbdev->list_head);
+ kfree(dvbdev);
+ *pdvbdev = NULL;
+ mutex_unlock(&dvbdev_register_lock);
+ return ret;
+ }
#endif
dvbdev->minor = minor;
dvb_minors[minor] = dvb_device_get(dvbdev);
--
2.47.0
From: Ard Biesheuvel <ardb(a)kernel.org>
GCC and Clang both implement stack protector support based on Thread
Local Storage (TLS) variables, and this is used in the kernel to
implement per-task stack cookies, by copying a task's stack cookie into
a per-CPU variable every time it is scheduled in.
Both now also implement -mstack-protector-guard-symbol=, which permits
the TLS variable to be specified directly. This is useful because it
will allow us to move away from using a fixed offset of 40 bytes into
the per-CPU area on x86_64, which requires a lot of special handling in
the per-CPU code and the runtime relocation code.
However, while GCC is rather lax in its implementation of this command
line option, Clang actually requires that the provided symbol name
refers to a TLS variable (i.e., one declared with __thread), although it
also permits the variable to be undeclared entirely, in which case it
will use an implicit declaration of the right type.
The upshot of this is that Clang will emit the correct references to the
stack cookie variable in most cases, e.g.,
10d: 64 a1 00 00 00 00 mov %fs:0x0,%eax
10f: R_386_32 __stack_chk_guard
However, if a non-TLS definition of the symbol in question is visible in
the same compilation unit (which amounts to the whole of vmlinux if LTO
is enabled), it will drop the per-CPU prefix and emit a load from a
bogus address.
Work around this by using a symbol name that never occurs in C code, and
emit it as an alias in the linker script.
Fixes: 3fb0fdb3bbe7 ("x86/stackprotector/32: Make the canary into a regular percpu variable")
Cc: <stable(a)vger.kernel.org>
Cc: Fangrui Song <i(a)maskray.me>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Uros Bizjak <ubizjak(a)gmail.com>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1854
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
---
v2: add declaration of alias to asm-prototypes.h, but expose it only to
genksyms
arch/x86/Makefile | 5 +++--
arch/x86/entry/entry.S | 16 ++++++++++++++++
arch/x86/include/asm/asm-prototypes.h | 3 +++
arch/x86/kernel/cpu/common.c | 2 ++
arch/x86/kernel/vmlinux.lds.S | 3 +++
5 files changed, 27 insertions(+), 2 deletions(-)
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index cd75e78a06c1..5b773b34768d 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -142,9 +142,10 @@ ifeq ($(CONFIG_X86_32),y)
ifeq ($(CONFIG_STACKPROTECTOR),y)
ifeq ($(CONFIG_SMP),y)
- KBUILD_CFLAGS += -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard
+ KBUILD_CFLAGS += -mstack-protector-guard-reg=fs \
+ -mstack-protector-guard-symbol=__ref_stack_chk_guard
else
- KBUILD_CFLAGS += -mstack-protector-guard=global
+ KBUILD_CFLAGS += -mstack-protector-guard=global
endif
endif
else
diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
index d9feadffa972..a503e6d535f8 100644
--- a/arch/x86/entry/entry.S
+++ b/arch/x86/entry/entry.S
@@ -46,3 +46,19 @@ EXPORT_SYMBOL_GPL(mds_verw_sel);
.popsection
THUNK warn_thunk_thunk, __warn_thunk
+
+#ifndef CONFIG_X86_64
+/*
+ * Clang's implementation of TLS stack cookies requires the variable in
+ * question to be a TLS variable. If the variable happens to be defined as an
+ * ordinary variable with external linkage in the same compilation unit (which
+ * amounts to the whole of vmlinux with LTO enabled), Clang will drop the
+ * segment register prefix from the references, resulting in broken code. Work
+ * around this by avoiding the symbol used in -mstack-protector-guard-symbol=
+ * entirely in the C code, and use an alias emitted by the linker script
+ * instead.
+ */
+#ifdef CONFIG_STACKPROTECTOR
+EXPORT_SYMBOL(__ref_stack_chk_guard);
+#endif
+#endif
diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index 25466c4d2134..3674006e3974 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -20,3 +20,6 @@
extern void cmpxchg8b_emu(void);
#endif
+#if defined(__GENKSYMS__) && defined(CONFIG_STACKPROTECTOR)
+extern unsigned long __ref_stack_chk_guard;
+#endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 07a34d723505..ba83f54dfaa8 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2085,8 +2085,10 @@ void syscall_init(void)
#ifdef CONFIG_STACKPROTECTOR
DEFINE_PER_CPU(unsigned long, __stack_chk_guard);
+#ifndef CONFIG_SMP
EXPORT_PER_CPU_SYMBOL(__stack_chk_guard);
#endif
+#endif
#endif /* CONFIG_X86_64 */
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 2b7c8c14c6fd..a80ad2bf8da4 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -490,6 +490,9 @@ SECTIONS
. = ASSERT((_end - LOAD_OFFSET <= KERNEL_IMAGE_SIZE),
"kernel image bigger than KERNEL_IMAGE_SIZE");
+/* needed for Clang - see arch/x86/entry/entry.S */
+PROVIDE(__ref_stack_chk_guard = __stack_chk_guard);
+
#ifdef CONFIG_X86_64
/*
* Per-cpu symbols which need to be offset from __per_cpu_load
--
2.47.0.rc0.187.ge670bccf7e-goog
Commit 904140fa4553 ("dt-bindings: pinctrl: samsung: use Exynos7
fallbacks for newer wake-up controllers") added
samsung,exynos7-wakeup-eint fallback to some compatibles, so the
intention in the if:then: conditions was to handle the cases:
1. Single Exynos7 compatible or Exynos5433+Exynos7 or
Exynos7885+Exynos7: only one interrupt
2. Exynos850+Exynos7: no interrupts
This was not implemented properly however and if:then: block matches
only single Exynos5433 or Exynos7885 compatibles, which do not exist in
DTS anymore, so basically is a no-op and no enforcement on number of
interrupts is made by the binding.
Fix the if:then: condition so interrupts in the Exynos5433 and
Exynos7885 wake-up pin controller will be properly constrained.
Fixes: 904140fa4553 ("dt-bindings: pinctrl: samsung: use Exynos7 fallbacks for newer wake-up controllers")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
---
Cc: Igor Belwon <igor.belwon(a)mentallysanemainliners.org>
---
.../samsung,pinctrl-wakeup-interrupt.yaml | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/Documentation/devicetree/bindings/pinctrl/samsung,pinctrl-wakeup-interrupt.yaml b/Documentation/devicetree/bindings/pinctrl/samsung,pinctrl-wakeup-interrupt.yaml
index 91516fedc872..49cb2b1a3d28 100644
--- a/Documentation/devicetree/bindings/pinctrl/samsung,pinctrl-wakeup-interrupt.yaml
+++ b/Documentation/devicetree/bindings/pinctrl/samsung,pinctrl-wakeup-interrupt.yaml
@@ -92,14 +92,17 @@ allOf:
- if:
properties:
compatible:
- # Match without "contains", to skip newer variants which are still
- # compatible with samsung,exynos7-wakeup-eint
- enum:
- - samsung,s5pv210-wakeup-eint
- - samsung,exynos4210-wakeup-eint
- - samsung,exynos5433-wakeup-eint
- - samsung,exynos7-wakeup-eint
- - samsung,exynos7885-wakeup-eint
+ oneOf:
+ # Match without "contains", to skip newer variants which are still
+ # compatible with samsung,exynos7-wakeup-eint
+ - enum:
+ - samsung,exynos4210-wakeup-eint
+ - samsung,exynos7-wakeup-eint
+ - samsung,s5pv210-wakeup-eint
+ - contains:
+ enum:
+ - samsung,exynos5433-wakeup-eint
+ - samsung,exynos7885-wakeup-eint
then:
properties:
interrupts:
--
2.43.0
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: ffd95846c6ec6cf1f93da411ea10d504036cab42
Gitweb: https://git.kernel.org/tip/ffd95846c6ec6cf1f93da411ea10d504036cab42
Author: Zhang Rui <rui.zhang(a)intel.com>
AuthorDate: Tue, 15 Oct 2024 14:15:22 +08:00
Committer: Dave Hansen <dave.hansen(a)linux.intel.com>
CommitterDate: Tue, 15 Oct 2024 05:45:18 -07:00
x86/apic: Always explicitly disarm TSC-deadline timer
New processors have become pickier about the local APIC timer state
before entering low power modes. These low power modes are used (for
example) when you close your laptop lid and suspend. If you put your
laptop in a bag and it is not in this low power mode, it is likely
to get quite toasty while it quickly sucks the battery dry.
The problem boils down to some CPUs' inability to power down until the
CPU recognizes that the local APIC timer is shut down. The current
kernel code works in one-shot and periodic modes but does not work for
deadline mode. Deadline mode has been the supported and preferred mode
on Intel CPUs for over a decade and uses an MSR to drive the timer
instead of an APIC register.
Disable the TSC Deadline timer in lapic_timer_shutdown() by writing to
MSR_IA32_TSC_DEADLINE when in TSC-deadline mode. Also avoid writing
to the initial-count register (APIC_TMICT) which is ignored in
TSC-deadline mode.
Note: The APIC_LVTT|=APIC_LVT_MASKED operation should theoretically be
enough to tell the hardware that the timer will not fire in any of the
timer modes. But mitigating AMD erratum 411[1] also requires clearing
out APIC_TMICT. Solely setting APIC_LVT_MASKED is also ineffective in
practice on Intel Lunar Lake systems, which is the motivation for this
change.
1. 411 Processor May Exit Message-Triggered C1E State Without an Interrupt if Local APIC Timer Reaches Zero - https://www.amd.com/content/dam/amd/en/documents/archived-tech-docs/revisio…
Fixes: 279f1461432c ("x86: apic: Use tsc deadline for oneshot when available")
Suggested-by: Dave Hansen <dave.hansen(a)intel.com>
Signed-off-by: Zhang Rui <rui.zhang(a)intel.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
Tested-by: Todd Brandt <todd.e.brandt(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20241015061522.25288-1-rui.zhang%40intel.com
---
arch/x86/kernel/apic/apic.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 6513c53..c5fb28e 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -440,7 +440,19 @@ static int lapic_timer_shutdown(struct clock_event_device *evt)
v = apic_read(APIC_LVTT);
v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR);
apic_write(APIC_LVTT, v);
- apic_write(APIC_TMICT, 0);
+
+ /*
+ * Setting APIC_LVT_MASKED (above) should be enough to tell
+ * the hardware that this timer will never fire. But AMD
+ * erratum 411 and some Intel CPU behavior circa 2024 say
+ * otherwise. Time for belt and suspenders programming: mask
+ * the timer _and_ zero the counter registers:
+ */
+ if (v & APIC_LVT_TIMER_TSCDEADLINE)
+ wrmsrl(MSR_IA32_TSC_DEADLINE, 0);
+ else
+ apic_write(APIC_TMICT, 0);
+
return 0;
}
The patch titled
Subject: fork: only invoke khugepaged, ksm hooks if no error
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
fork-only-invoke-khugepaged-ksm-hooks-if-no-error.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Subject: fork: only invoke khugepaged, ksm hooks if no error
Date: Tue, 15 Oct 2024 18:56:06 +0100
There is no reason to invoke these hooks early against an mm that is in an
incomplete state.
The change in commit d24062914837 ("fork: use __mt_dup() to duplicate
maple tree in dup_mmap()") makes this more pertinent as we may be in a
state where entries in the maple tree are not yet consistent.
Their placement early in dup_mmap() only appears to have been meaningful
for early error checking, and since functionally it'd require a very small
allocation to fail (in practice 'too small to fail') that'd only occur in
the most dire circumstances, meaning the fork would fail or be OOM'd in
any case.
Since both khugepaged and KSM tracking are there to provide optimisations
to memory performance rather than critical functionality, it doesn't
really matter all that much if, under such dire memory pressure, we fail
to register an mm with these.
As a result, we follow the example of commit d2081b2bf819 ("mm:
khugepaged: make khugepaged_enter() void function") and make ksm_fork() a
void function also.
We only expose the mm to these functions once we are done with them and
only if no error occurred in the fork operation.
Link: https://lkml.kernel.org/r/e0cb8b840c9d1d5a6e84d4f8eff5f3f2022aa10c.17290143…
Fixes: d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reported-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Reviewed-by: Jann Horn <jannh(a)google.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Linus Torvalds <torvalds(a)linuxfoundation.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/ksm.h | 10 ++++------
kernel/fork.c | 7 ++-----
2 files changed, 6 insertions(+), 11 deletions(-)
--- a/include/linux/ksm.h~fork-only-invoke-khugepaged-ksm-hooks-if-no-error
+++ a/include/linux/ksm.h
@@ -54,12 +54,11 @@ static inline long mm_ksm_zero_pages(str
return atomic_long_read(&mm->ksm_zero_pages);
}
-static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
+static inline void ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
{
+ /* Adding mm to ksm is best effort on fork. */
if (test_bit(MMF_VM_MERGEABLE, &oldmm->flags))
- return __ksm_enter(mm);
-
- return 0;
+ __ksm_enter(mm);
}
static inline int ksm_execve(struct mm_struct *mm)
@@ -107,9 +106,8 @@ static inline int ksm_disable(struct mm_
return 0;
}
-static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
+static inline void ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
{
- return 0;
}
static inline int ksm_execve(struct mm_struct *mm)
--- a/kernel/fork.c~fork-only-invoke-khugepaged-ksm-hooks-if-no-error
+++ a/kernel/fork.c
@@ -653,11 +653,6 @@ static __latent_entropy int dup_mmap(str
mm->exec_vm = oldmm->exec_vm;
mm->stack_vm = oldmm->stack_vm;
- retval = ksm_fork(mm, oldmm);
- if (retval)
- goto out;
- khugepaged_fork(mm, oldmm);
-
/* Use __mt_dup() to efficiently build an identical maple tree. */
retval = __mt_dup(&oldmm->mm_mt, &mm->mm_mt, GFP_KERNEL);
if (unlikely(retval))
@@ -760,6 +755,8 @@ loop_out:
vma_iter_free(&vmi);
if (!retval) {
mt_set_in_rcu(vmi.mas.tree);
+ ksm_fork(mm, oldmm);
+ khugepaged_fork(mm, oldmm);
} else if (mpnt) {
/*
* The entire maple tree has already been duplicated. If the
_
Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are
mm-mmap-correct-error-handling-in-mmap_region.patch
maple_tree-correct-tree-corruption-on-spanning-store.patch
maple_tree-add-regression-test-for-spanning-store-bug.patch
maintainers-add-memory-mapping-vma-co-maintainers.patch
fork-do-not-invoke-uffd-on-fork-if-error-occurs.patch
fork-only-invoke-khugepaged-ksm-hooks-if-no-error.patch
selftests-mm-add-pkey_sighandler_xx-hugetlb_dio-to-gitignore.patch
mm-refactor-mm_access-to-not-return-null.patch
mm-refactor-mm_access-to-not-return-null-fix.patch
mm-madvise-unrestrict-process_madvise-for-current-process.patch
maple_tree-do-not-hash-pointers-on-dump-in-debug-mode.patch
The patch titled
Subject: fork: do not invoke uffd on fork if error occurs
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
fork-do-not-invoke-uffd-on-fork-if-error-occurs.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Subject: fork: do not invoke uffd on fork if error occurs
Date: Tue, 15 Oct 2024 18:56:05 +0100
Patch series "fork: do not expose incomplete mm on fork".
During fork we may place the virtual memory address space into an
inconsistent state before the fork operation is complete.
In addition, we may encounter an error during the fork operation that
indicates that the virtual memory address space is invalidated.
As a result, we should not be exposing it in any way to external machinery
that might interact with the mm or VMAs, machinery that is not designed to
deal with incomplete state.
We specifically update the fork logic to defer khugepaged and ksm to the
end of the operation and only to be invoked if no error arose, and
disallow uffd from observing fork events should an error have occurred.
This patch (of 2):
Currently on fork we expose the virtual address space of a process to
userland unconditionally if uffd is registered in VMAs, regardless of
whether an error arose in the fork.
This is performed in dup_userfaultfd_complete() which is invoked
unconditionally, and performs two duties - invoking registered handlers
for the UFFD_EVENT_FORK event via dup_fctx(), and clearing down
userfaultfd_fork_ctx objects established in dup_userfaultfd().
This is problematic, because the virtual address space may not yet be
correctly initialised if an error arose.
The change in commit d24062914837 ("fork: use __mt_dup() to duplicate
maple tree in dup_mmap()") makes this more pertinent as we may be in a
state where entries in the maple tree are not yet consistent.
We address this by, on fork error, ensuring that we roll back state that
we would otherwise expect to clean up through the event being handled by
userland and perform the memory freeing duty otherwise performed by
dup_userfaultfd_complete().
We do this by implementing a new function, dup_userfaultfd_fail(), which
performs the same loop, only decrementing reference counts.
Note that we perform mmgrab() on the parent and child mm's, however
userfaultfd_ctx_put() will mmdrop() this once the reference count drops to
zero, so we will avoid memory leaks correctly here.
Link: https://lkml.kernel.org/r/cover.1729014377.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/d3691d58bb58712b6fb3df2be441d175bd3cdf07.17290143…
Fixes: d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reported-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Linus Torvalds <torvalds(a)linuxfoundation.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/userfaultfd.c | 28 ++++++++++++++++++++++++++++
include/linux/userfaultfd_k.h | 5 +++++
kernel/fork.c | 5 ++++-
3 files changed, 37 insertions(+), 1 deletion(-)
--- a/fs/userfaultfd.c~fork-do-not-invoke-uffd-on-fork-if-error-occurs
+++ a/fs/userfaultfd.c
@@ -692,6 +692,34 @@ void dup_userfaultfd_complete(struct lis
}
}
+void dup_userfaultfd_fail(struct list_head *fcs)
+{
+ struct userfaultfd_fork_ctx *fctx, *n;
+
+ /*
+ * An error has occurred on fork, we will tear memory down, but have
+ * allocated memory for fctx's and raised reference counts for both the
+ * original and child contexts (and on the mm for each as a result).
+ *
+ * These would ordinarily be taken care of by a user handling the event,
+ * but we are no longer doing so, so manually clean up here.
+ *
+ * mm tear down will take care of cleaning up VMA contexts.
+ */
+ list_for_each_entry_safe(fctx, n, fcs, list) {
+ struct userfaultfd_ctx *octx = fctx->orig;
+ struct userfaultfd_ctx *ctx = fctx->new;
+
+ atomic_dec(&octx->mmap_changing);
+ VM_BUG_ON(atomic_read(&octx->mmap_changing) < 0);
+ userfaultfd_ctx_put(octx);
+ userfaultfd_ctx_put(ctx);
+
+ list_del(&fctx->list);
+ kfree(fctx);
+ }
+}
+
void mremap_userfaultfd_prep(struct vm_area_struct *vma,
struct vm_userfaultfd_ctx *vm_ctx)
{
--- a/include/linux/userfaultfd_k.h~fork-do-not-invoke-uffd-on-fork-if-error-occurs
+++ a/include/linux/userfaultfd_k.h
@@ -249,6 +249,7 @@ static inline bool vma_can_userfault(str
extern int dup_userfaultfd(struct vm_area_struct *, struct list_head *);
extern void dup_userfaultfd_complete(struct list_head *);
+void dup_userfaultfd_fail(struct list_head *);
extern void mremap_userfaultfd_prep(struct vm_area_struct *,
struct vm_userfaultfd_ctx *);
@@ -351,6 +352,10 @@ static inline void dup_userfaultfd_compl
{
}
+static inline void dup_userfaultfd_fail(struct list_head *l)
+{
+}
+
static inline void mremap_userfaultfd_prep(struct vm_area_struct *vma,
struct vm_userfaultfd_ctx *ctx)
{
--- a/kernel/fork.c~fork-do-not-invoke-uffd-on-fork-if-error-occurs
+++ a/kernel/fork.c
@@ -775,7 +775,10 @@ out:
mmap_write_unlock(mm);
flush_tlb_mm(oldmm);
mmap_write_unlock(oldmm);
- dup_userfaultfd_complete(&uf);
+ if (!retval)
+ dup_userfaultfd_complete(&uf);
+ else
+ dup_userfaultfd_fail(&uf);
fail_uprobe_end:
uprobe_end_dup_mmap();
return retval;
_
Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are
mm-mmap-correct-error-handling-in-mmap_region.patch
maple_tree-correct-tree-corruption-on-spanning-store.patch
maple_tree-add-regression-test-for-spanning-store-bug.patch
maintainers-add-memory-mapping-vma-co-maintainers.patch
fork-do-not-invoke-uffd-on-fork-if-error-occurs.patch
fork-only-invoke-khugepaged-ksm-hooks-if-no-error.patch
selftests-mm-add-pkey_sighandler_xx-hugetlb_dio-to-gitignore.patch
mm-refactor-mm_access-to-not-return-null.patch
mm-refactor-mm_access-to-not-return-null-fix.patch
mm-madvise-unrestrict-process_madvise-for-current-process.patch
maple_tree-do-not-hash-pointers-on-dump-in-debug-mode.patch
The patch titled
Subject: nilfs2: fix kernel bug due to missing clearing of buffer delay flag
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-kernel-bug-due-to-missing-clearing-of-buffer-delay-flag.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix kernel bug due to missing clearing of buffer delay flag
Date: Wed, 16 Oct 2024 06:32:07 +0900
Syzbot reported that after nilfs2 reads a corrupted file system image and
degrades to read-only, the BUG_ON check for the buffer delay flag in
submit_bh_wbc() may fail, causing a kernel bug.
This is because the buffer delay flag is not cleared when clearing the
buffer state flags to discard a page/folio or a buffer head. So, fix
this.
This became necessary when the use of nilfs2's own page clear routine was
expanded. This state inconsistency does not occur if the buffer is
written normally by log writing.
Link: https://lkml.kernel.org/r/20241015213300.7114-1-konishi.ryusuke@gmail.com
Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+985ada84bf055a575c07(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=985ada84bf055a575c07
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/page.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/fs/nilfs2/page.c~nilfs2-fix-kernel-bug-due-to-missing-clearing-of-buffer-delay-flag
+++ a/fs/nilfs2/page.c
@@ -77,7 +77,8 @@ void nilfs_forget_buffer(struct buffer_h
const unsigned long clear_bits =
(BIT(BH_Uptodate) | BIT(BH_Dirty) | BIT(BH_Mapped) |
BIT(BH_Async_Write) | BIT(BH_NILFS_Volatile) |
- BIT(BH_NILFS_Checked) | BIT(BH_NILFS_Redirected));
+ BIT(BH_NILFS_Checked) | BIT(BH_NILFS_Redirected) |
+ BIT(BH_Delay));
lock_buffer(bh);
set_mask_bits(&bh->b_state, clear_bits, 0);
@@ -406,7 +407,8 @@ void nilfs_clear_folio_dirty(struct foli
const unsigned long clear_bits =
(BIT(BH_Uptodate) | BIT(BH_Dirty) | BIT(BH_Mapped) |
BIT(BH_Async_Write) | BIT(BH_NILFS_Volatile) |
- BIT(BH_NILFS_Checked) | BIT(BH_NILFS_Redirected));
+ BIT(BH_NILFS_Checked) | BIT(BH_NILFS_Redirected) |
+ BIT(BH_Delay));
bh = head;
do {
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-propagate-directory-read-errors-from-nilfs_find_entry.patch
nilfs2-fix-kernel-bug-due-to-missing-clearing-of-buffer-delay-flag.patch
tpm2_load_null() has weak and broken error handling:
- The return value of tpm2_create_primary() is ignored.
- Leaks TPM return codes from tpm2_load_context() to the caller.
- If the key name comparison succeeds returns previous error
instead of zero to the caller.
Implement a proper error rollback.
Cc: stable(a)vger.kernel.org # v6.10+
Fixes: eb24c9788cd9 ("tpm: disable the TPM if NULL name changes")
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
v6:
- Address Stefan's remark:
https://lore.kernel.org/linux-integrity/def4ec2d-584b-405f-9d5e-99267013c3c…
v5:
- Fix the TPM error code leak from tpm2_load_context().
v4:
- No changes.
v3:
- Update log messages. Previously the log message incorrectly stated
on load failure that integrity check had been failed, even tho the
check is done *after* the load operation.
v2:
- Refined the commit message.
- Reverted tpm2_create_primary() changes. They are not required if
tmp_null_key is used as the parameter.
---
drivers/char/tpm/tpm2-sessions.c | 43 +++++++++++++++++---------------
1 file changed, 23 insertions(+), 20 deletions(-)
diff --git a/drivers/char/tpm/tpm2-sessions.c b/drivers/char/tpm/tpm2-sessions.c
index 253639767c1e..1215c53f0ae7 100644
--- a/drivers/char/tpm/tpm2-sessions.c
+++ b/drivers/char/tpm/tpm2-sessions.c
@@ -915,33 +915,36 @@ static int tpm2_parse_start_auth_session(struct tpm2_auth *auth,
static int tpm2_load_null(struct tpm_chip *chip, u32 *null_key)
{
- int rc;
unsigned int offset = 0; /* dummy offset for null seed context */
u8 name[SHA256_DIGEST_SIZE + 2];
+ u32 tmp_null_key;
+ int rc;
rc = tpm2_load_context(chip, chip->null_key_context, &offset,
- null_key);
- if (rc != -EINVAL)
- return rc;
+ &tmp_null_key);
+ if (rc != -EINVAL) {
+ if (!rc)
+ *null_key = tmp_null_key;
+ goto err;
+ }
- /* an integrity failure may mean the TPM has been reset */
- dev_err(&chip->dev, "NULL key integrity failure!\n");
- /* check the null name against what we know */
- tpm2_create_primary(chip, TPM2_RH_NULL, NULL, name);
- if (memcmp(name, chip->null_key_name, sizeof(name)) == 0)
- /* name unchanged, assume transient integrity failure */
- return rc;
- /*
- * Fatal TPM failure: the NULL seed has actually changed, so
- * the TPM must have been illegally reset. All in-kernel TPM
- * operations will fail because the NULL primary can't be
- * loaded to salt the sessions, but disable the TPM anyway so
- * userspace programmes can't be compromised by it.
- */
- dev_err(&chip->dev, "NULL name has changed, disabling TPM due to interference\n");
+ rc = tpm2_create_primary(chip, TPM2_RH_NULL, &tmp_null_key, name);
+ if (rc)
+ goto err;
+
+ /* Return the null key if the name has not been changed: */
+ if (memcmp(name, chip->null_key_name, sizeof(name)) == 0) {
+ *null_key = tmp_null_key;
+ return 0;
+ }
+
+ /* Deduce from the name change TPM interference: */
+ dev_err(&chip->dev, "the null key integrity check failedh\n");
+ tpm2_flush_context(chip, tmp_null_key);
chip->flags |= TPM_CHIP_FLAG_DISABLE;
- return rc;
+err:
+ return rc ? -ENODEV : 0;
}
/**
--
2.47.0
These patches address some issues I spotted while looking at kprobes and
uprobes.
Patch 1 is the most pressing, as a uprobes user can trigger a kernel
BUG(). Patches 2 and 3 fix latent endianness bugs which only manifest on
big-endian kernels, and patchs 4-6 clean things up so that it's harder
to get this wrong again in future.
Mark.
Mark Rutland (6):
arm64: probes: Remove broken LDR (literal) uprobe support
arm64: probes: Fix simulate_ldr*_literal()
arm64: probes: Fix uprobes for big-endian kernels
arm64: probes: Move kprobes-specific fields
arm64: probes: Cleanup kprobes endianness conversions
arm64: probes: Remove probe_opcode_t
arch/arm64/include/asm/probes.h | 11 +++----
arch/arm64/include/asm/uprobes.h | 8 ++---
arch/arm64/kernel/probes/decode-insn.c | 22 ++++++++-----
arch/arm64/kernel/probes/decode-insn.h | 2 +-
arch/arm64/kernel/probes/kprobes.c | 39 ++++++++++++------------
arch/arm64/kernel/probes/simulate-insn.c | 18 +++++------
arch/arm64/kernel/probes/uprobes.c | 8 ++---
7 files changed, 53 insertions(+), 55 deletions(-)
--
2.30.2
If some remap_pfn_range() calls succeeded before one failed, we still have
buffer pages mapped into the userspace page tables when we drop the buffer
reference with comedi_buf_map_put(bm). The userspace mappings are only
cleaned up later in the mmap error path.
Fix it by explicitly flushing all mappings in our VMA on the comedi_mmap()
error path.
See commit 79a61cc3fc04 ("mm: avoid leaving partial pfn mappings around in
error case").
Cc: stable(a)vger.kernel.org
Fixes: ed9eccbe8970 ("Staging: add comedi core")
Signed-off-by: Jann Horn <jannh(a)google.com>
---
Note: compile-tested only; I don't actually have comedi hardware, and I
don't know anything about comedi.
---
drivers/comedi/comedi_fops.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/comedi/comedi_fops.c b/drivers/comedi/comedi_fops.c
index 1b481731df96..0e573df8646f 100644
--- a/drivers/comedi/comedi_fops.c
+++ b/drivers/comedi/comedi_fops.c
@@ -2414,6 +2414,15 @@ static int comedi_mmap(struct file *file, struct vm_area_struct *vma)
vma->vm_private_data = bm;
vma->vm_ops->open(vma);
+ } else {
+ /*
+ * Leaving behind a partial mapping of a buffer we're about to
+ * drop is unsafe, see remap_pfn_range_notrack().
+ * We need to zap the range here ourselves instead of relying
+ * on the automatic zapping in remap_pfn_range() because we call
+ * remap_pfn_range() in a loop.
+ */
+ zap_page_range_single(vma, vma->vm_start, size, NULL);
}
done:
---
base-commit: 6485cf5ea253d40d507cd71253c9568c5470cd27
change-id: 20241014-comedi-tlb-400246505961
--
Jann Horn <jannh(a)google.com>
From: Ard Biesheuvel <ardb(a)kernel.org>
cmdline_ptr is an out parameter, which is not allocated by the function
itself, and likely points into the caller's stack.
cmdline refers to the pool allocation that should be freed when cleaning
up after a failure, so pass this instead to free_pool().
Fixes: 42c8ea3dca09 ("efi: libstub: Factor out EFI stub entrypoint ...")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
---
drivers/firmware/efi/libstub/efi-stub.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/firmware/efi/libstub/efi-stub.c b/drivers/firmware/efi/libstub/efi-stub.c
index f09e277ba210..fc71dcab43e0 100644
--- a/drivers/firmware/efi/libstub/efi-stub.c
+++ b/drivers/firmware/efi/libstub/efi-stub.c
@@ -148,7 +148,7 @@ efi_status_t efi_handle_cmdline(efi_loaded_image_t *image, char **cmdline_ptr)
return EFI_SUCCESS;
fail_free_cmdline:
- efi_bs_call(free_pool, cmdline_ptr);
+ efi_bs_call(free_pool, cmdline);
return status;
}
--
2.47.0.rc1.288.g06298d1525-goog
MPTCP connection requests toward a listening socket created by the
in-kernel PM for a port based signal endpoint will never be accepted,
they need to be explicitly rejected.
- Patch 1: Explicitly reject such requests. A fix for >= v5.12.
- Patch 2: Cover this case in the MPTCP selftests to avoid regressions.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Changes in v2:
- This new version fixes the root cause for the issue Cong Wang sent a
patch for a few weeks ago, see the v1, and the explanations below. The
new version is very different from the v1, from a different author.
Thanks to Cong Wang for the first analysis, and to Paolo for having
spot the root cause, and sent a fix for it.
- Link to v1: https://lore.kernel.org/r/20240908180620.822579-1-xiyou.wangcong@gmail.com
- Link: https://lore.kernel.org/r/a5289a0d-2557-40b8-9575-6f1a0bbf06e4@redhat.com
---
Paolo Abeni (2):
mptcp: prevent MPC handshake on port-based signal endpoints
selftests: mptcp: join: test for prohibited MPC to port-based endp
net/mptcp/mib.c | 1 +
net/mptcp/mib.h | 1 +
net/mptcp/pm_netlink.c | 1 +
net/mptcp/protocol.h | 1 +
net/mptcp/subflow.c | 11 +++
tools/testing/selftests/net/mptcp/mptcp_join.sh | 117 +++++++++++++++++-------
6 files changed, 101 insertions(+), 31 deletions(-)
---
base-commit: 174714f0e505070a16be6fbede30d32b81df789f
change-id: 20241014-net-mptcp-mpc-port-endp-4f88bd428ec7
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
A boot delay was introduced by commit 79540d133ed6 ("net: macb: Fix
handling of fixed-link node"). This delay was caused by the call to
`mdiobus_register()` in cases where a fixed-link PHY was present. The
MDIO bus registration triggered unnecessary PHY address scans, leading
to a 20-second delay due to attempts to detect Clause 45 (C45)
compatible PHYs, despite no MDIO bus being attached.
The commit 79540d133ed6 ("net: macb: Fix handling of fixed-link node")
was originally introduced to fix a regression caused by commit
7897b071ac3b4 ("net: macb: convert to phylink"), which caused the driver
to misinterpret fixed-link nodes as PHY nodes. This resulted in warnings
like:
mdio_bus f0028000.ethernet-ffffffff: fixed-link has invalid PHY address
mdio_bus f0028000.ethernet-ffffffff: scan phy fixed-link at address 0
...
mdio_bus f0028000.ethernet-ffffffff: scan phy fixed-link at address 31
This patch reworks the logic to avoid registering and allocation of the
MDIO bus when:
- The device tree contains a fixed-link node.
- There is no "mdio" child node in the device tree.
If a child node named "mdio" exists, the MDIO bus will be registered to
support PHYs attached to the MACB's MDIO bus. Otherwise, with only a
fixed-link, the MDIO bus is skipped.
Tested on a sama5d35 based system with a ksz8863 switch attached to
macb0.
Fixes: 79540d133ed6 ("net: macb: Fix handling of fixed-link node")
Signed-off-by: Oleksij Rempel <o.rempel(a)pengutronix.de>
Cc: stable(a)vger.kernel.org
Reviewed-by: Andrew Lunn <andrew(a)lunn.ch>
---
changes v2:
- s/some PHYs may be attached/some devices may be attached/
---
drivers/net/ethernet/cadence/macb_main.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index f06babec04a0b..56901280ba047 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -930,9 +930,6 @@ static int macb_mdiobus_register(struct macb *bp)
return ret;
}
- if (of_phy_is_fixed_link(np))
- return mdiobus_register(bp->mii_bus);
-
/* Only create the PHY from the device tree if at least one PHY is
* described. Otherwise scan the entire MDIO bus. We do this to support
* old device tree that did not follow the best practices and did not
@@ -953,8 +950,19 @@ static int macb_mdiobus_register(struct macb *bp)
static int macb_mii_init(struct macb *bp)
{
+ struct device_node *child, *np = bp->pdev->dev.of_node;
int err = -ENXIO;
+ /* With fixed-link, we don't need to register the MDIO bus,
+ * except if we have a child named "mdio" in the device tree.
+ * In that case, some devices may be attached to the MACB's MDIO bus.
+ */
+ child = of_get_child_by_name(np, "mdio");
+ if (child)
+ of_node_put(child);
+ else if (of_phy_is_fixed_link(np))
+ return macb_mii_probe(bp->dev);
+
/* Enable management port */
macb_writel(bp, NCR, MACB_BIT(MPE));
--
2.39.5
Hello,
I am offering my late husband?s Yamaha piano to anyone who would truly appreciate it. If you or someone you know would be interested in receiving this instrument for free, please do not hesitate to contact me.
Warm regards,
Josey
[ Upstream commit 7c2fd76048e95dd267055b5f5e0a48e6e7c81fd9 ]
On an NVMe namespace that does not support metadata, it is possible to
send an IO command with metadata through io-passthru. This allows issues
like [1] to trigger in the completion code path.
nvme_map_user_request() doesn't check if the namespace supports metadata
before sending it forward. It also allows admin commands with metadata
to be processed as it ignores metadata when bdev == NULL and may report
success.
Reject an IO command with metadata when the NVMe namespace doesn't
support it and reject an admin command if it has metadata.
[1] https://lore.kernel.org/all/mb61pcylvnym8.fsf@amazon.com/
Suggested-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Sagi Grimberg <sagi(a)grimberg.me>
Reviewed-by: Anuj Gupta <anuj20.g(a)samsung.com>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
[ Move the changes from nvme_map_user_request() to nvme_submit_user_cmd()
to make it work on 5.10 ]
Signed-off-by: Puranjay Mohan <pjy(a)amazon.com>
---
drivers/nvme/host/core.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 30a642c8f5374..bee55902fe6ce 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1121,11 +1121,16 @@ static int nvme_submit_user_cmd(struct request_queue *q,
bool write = nvme_is_write(cmd);
struct nvme_ns *ns = q->queuedata;
struct gendisk *disk = ns ? ns->disk : NULL;
+ bool supports_metadata = disk && blk_get_integrity(disk);
+ bool has_metadata = meta_buffer && meta_len;
struct request *req;
struct bio *bio = NULL;
void *meta = NULL;
int ret;
+ if (has_metadata && !supports_metadata)
+ return -EINVAL;
+
req = nvme_alloc_request(q, cmd, 0);
if (IS_ERR(req))
return PTR_ERR(req);
@@ -1141,7 +1146,7 @@ static int nvme_submit_user_cmd(struct request_queue *q,
goto out;
bio = req->bio;
bio->bi_disk = disk;
- if (disk && meta_buffer && meta_len) {
+ if (has_metadata) {
meta = nvme_add_user_metadata(bio, meta_buffer, meta_len,
meta_seed, write);
if (IS_ERR(meta)) {
--
2.40.1
From: Aurabindo Pillai <aurabindo.pillai(a)amd.com>
[Why&How]
vblank immediate disable currently does not work for all asics. On
DCN401, the vblank interrupts never stop coming, and hence we never
get a chance to trigger idle optimizations.
Add a workaround to enable immediate disable only on APUs for now. This
adds a 2-frame delay for triggering idle optimization, which is a
negligible overhead.
Fixes: db11e20a1144 ("drm/amd/display: use a more lax vblank enable policy for older ASICs")
Fixes: 6dfb3a42a914 ("drm/amd/display: use a more lax vblank enable policy for DCN35+")
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Harry Wentland <harry.wentland(a)amd.com>
Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira(a)amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai(a)amd.com>
Signed-off-by: Wayne Lin <wayne.lin(a)amd.com>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index a4882b16ace2..6ea54eb5d68d 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -8379,7 +8379,8 @@ static void manage_dm_interrupts(struct amdgpu_device *adev,
if (amdgpu_ip_version(adev, DCE_HWIP, 0) <
IP_VERSION(3, 5, 0) ||
acrtc_state->stream->link->psr_settings.psr_version <
- DC_PSR_VERSION_UNSUPPORTED) {
+ DC_PSR_VERSION_UNSUPPORTED ||
+ !(adev->flags & AMD_IS_APU)) {
timing = &acrtc_state->stream->timing;
/* at least 2 frames */
--
2.37.3
From: Matt Fleming <mfleming(a)cloudflare.com>
Under memory pressure it's possible for GFP_ATOMIC order-0 allocations
to fail even though free pages are available in the highatomic reserves.
GFP_ATOMIC allocations cannot trigger unreserve_highatomic_pageblock()
since it's only run from reclaim.
Given that such allocations will pass the watermarks in
__zone_watermark_unusable_free(), it makes sense to fallback to
highatomic reserves the same way that ALLOC_OOM can.
This fixes order-0 page allocation failures observed on Cloudflare's
fleet when handling network packets:
kswapd1: page allocation failure: order:0, mode:0x820(GFP_ATOMIC),
nodemask=(null),cpuset=/,mems_allowed=0-7
CPU: 10 PID: 696 Comm: kswapd1 Kdump: loaded Tainted: G O 6.6.43-CUSTOM #1
Hardware name: MACHINE
Call Trace:
<IRQ>
dump_stack_lvl+0x3c/0x50
warn_alloc+0x13a/0x1c0
__alloc_pages_slowpath.constprop.0+0xc9d/0xd10
__alloc_pages+0x327/0x340
__napi_alloc_skb+0x16d/0x1f0
bnxt_rx_page_skb+0x96/0x1b0 [bnxt_en]
bnxt_rx_pkt+0x201/0x15e0 [bnxt_en]
__bnxt_poll_work+0x156/0x2b0 [bnxt_en]
bnxt_poll+0xd9/0x1c0 [bnxt_en]
__napi_poll+0x2b/0x1b0
bpf_trampoline_6442524138+0x7d/0x1000
__napi_poll+0x5/0x1b0
net_rx_action+0x342/0x740
handle_softirqs+0xcf/0x2b0
irq_exit_rcu+0x6c/0x90
sysvec_apic_timer_interrupt+0x72/0x90
</IRQ>
Fixes: 1d91df85f399 ("mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs")
Suggested-by: Vlastimil Babka <vbabka(a)suse.cz>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: linux-mm(a)kvack.org
Cc: stable(a)vger.kernel.org # v5.9+
Link: https://lore.kernel.org/all/CAGis_TWzSu=P7QJmjD58WWiu3zjMTVKSzdOwWE8ORaGytz…
Signed-off-by: Matt Fleming <mfleming(a)cloudflare.com>
---
mm/page_alloc.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
v2: Update comment and add Fixes, Reviewed-by, and Cc: stable tags.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8afab64814dc..94a2ffe28008 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2893,12 +2893,12 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
page = __rmqueue(zone, order, migratetype, alloc_flags);
/*
- * If the allocation fails, allow OOM handling access
- * to HIGHATOMIC reserves as failing now is worse than
- * failing a high-order atomic allocation in the
- * future.
+ * If the allocation fails, allow OOM handling and
+ * order-0 (atomic) allocs access to HIGHATOMIC
+ * reserves as failing now is worse than failing a
+ * high-order atomic allocation in the future.
*/
- if (!page && (alloc_flags & ALLOC_OOM))
+ if (!page && (alloc_flags & (ALLOC_OOM|ALLOC_NON_BLOCK)))
page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
if (!page) {
--
2.34.1
Hi Greg,
How are you?
What is the process to backport Pedro's recent mseal fixes to 6.10 ?
Specifically those 5 commits:
67203f3f2a63d429272f0c80451e5fcc469fdb46
selftests/mm: add mseal test for no-discard madvise
4d1b3416659be70a2251b494e85e25978de06519
mm: move can_modify_vma to mm/vma.h
4a2dd02b09160ee43f96c759fafa7b56dfc33816
mm/mprotect: replace can_modify_mm with can_modify_vma
23c57d1fa2b9530e38f7964b4e457fed5a7a0ae8
mseal: replace can_modify_mm_madv with a vma variant
f28bdd1b17ec187eaa34845814afaaff99832762
selftests/mm: add more mseal traversal tests
There will be merge conflicts, I can backport them to 5.10 and test
to help the backporting process.
Those 5 fixes are needed for two reasons: maintain the consistency of
mseal's semantics across releases, and for ease of backporting future
fixes.
PS: There are also three other commits for munmap and remap (see below),
they have dependency on Michael Ellerman's arch_unmap() patch [1] and maybe
uprobe change [2]. If Michael and Oleg are OK with backporting their
patches, then great !
Otherwise, since those commits below don't change mseal's semantics, I
think it is OK to just backport above 5 patches.
df2a7df9a9aa32c3df227de346693e6e802c8591
mm/munmap: replace can_modify_mm with can_modify_vma
38075679b5f157eeacd46c900e9cfc684bdbc167
mm/mremap: replace can_modify_mm with can_modify_vma
5b3db2b812a1f86dfab587324d198a5d10c7a5cf
mm: remove can_modify_mm()
[1] https://lore.kernel.org/all/20240812082605.743814-1-mpe@ellerman.id.au/
[2] https://lore.kernel.org/all/20240911131320.GA3448@redhat.com/
Thanks!
-Jeff
From: Aurabindo Pillai <aurabindo.pillai(a)amd.com>
[Why&How]
Disabling P-State support on full updates for DCN401 results in
introducing additional communication with SMU. A UCLK hard min message
to SMU takes 4 seconds to go through, which was due to DCN not allowing
pstate switch, which was caused by incorrect value for TTU watermark
before blanking the HUBP prior to DPG on for servicing the test request.
Fix the issue temporarily by disallowing pstate changes for compliance
test while test request handler is reworked for a proper fix.
Fixes: 67ea53a4bd9d ("drm/amd/display: Disable DCN401 UCLK P-State support on full updates")
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Dillon Varone <dillon.varone(a)amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai(a)amd.com>
Signed-off-by: Wayne Lin <wayne.lin(a)amd.com>
---
.../drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
index 8eaf292bc4eb..b0fea0856866 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
@@ -46,6 +46,7 @@
#include "dm_helpers.h"
#include "ddc_service_types.h"
+#include "clk_mgr.h"
static u32 edid_extract_panel_id(struct edid *edid)
{
@@ -1181,6 +1182,8 @@ bool dm_helpers_dp_handle_test_pattern_request(
struct pipe_ctx *pipe_ctx = NULL;
struct amdgpu_dm_connector *aconnector = link->priv;
struct drm_device *dev = aconnector->base.dev;
+ struct dc_state *dc_state = ctx->dc->current_state;
+ struct clk_mgr *clk_mgr = ctx->dc->clk_mgr;
int i;
for (i = 0; i < MAX_PIPES; i++) {
@@ -1281,6 +1284,16 @@ bool dm_helpers_dp_handle_test_pattern_request(
pipe_ctx->stream->test_pattern.type = test_pattern;
pipe_ctx->stream->test_pattern.color_space = test_pattern_color_space;
+ /* Temp W/A for compliance test failure */
+ dc_state->bw_ctx.bw.dcn.clk.p_state_change_support = false;
+ dc_state->bw_ctx.bw.dcn.clk.dramclk_khz = clk_mgr->dc_mode_softmax_enabled ?
+ clk_mgr->bw_params->dc_mode_softmax_memclk : clk_mgr->bw_params->max_memclk_mhz;
+ dc_state->bw_ctx.bw.dcn.clk.idle_dramclk_khz = dc_state->bw_ctx.bw.dcn.clk.dramclk_khz;
+ ctx->dc->clk_mgr->funcs->update_clocks(
+ ctx->dc->clk_mgr,
+ dc_state,
+ false);
+
dc_link_dp_set_test_pattern(
(struct dc_link *) link,
test_pattern,
--
2.37.3
During fuzz testing, the following issue was discovered:
BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x598/0x2a30
_copy_to_iter+0x598/0x2a30
__skb_datagram_iter+0x168/0x1060
skb_copy_datagram_iter+0x5b/0x220
netlink_recvmsg+0x362/0x1700
sock_recvmsg+0x2dc/0x390
__sys_recvfrom+0x381/0x6d0
__x64_sys_recvfrom+0x130/0x200
x64_sys_call+0x32c8/0x3cc0
do_syscall_64+0xd8/0x1c0
entry_SYSCALL_64_after_hwframe+0x79/0x81
Uninit was stored to memory at:
copy_to_user_state_extra+0xcc1/0x1e00
dump_one_state+0x28c/0x5f0
xfrm_state_walk+0x548/0x11e0
xfrm_dump_sa+0x1e0/0x840
netlink_dump+0x943/0x1c40
__netlink_dump_start+0x746/0xdb0
xfrm_user_rcv_msg+0x429/0xc00
netlink_rcv_skb+0x613/0x780
xfrm_netlink_rcv+0x77/0xc0
netlink_unicast+0xe90/0x1280
netlink_sendmsg+0x126d/0x1490
__sock_sendmsg+0x332/0x3d0
____sys_sendmsg+0x863/0xc30
___sys_sendmsg+0x285/0x3e0
__x64_sys_sendmsg+0x2d6/0x560
x64_sys_call+0x1316/0x3cc0
do_syscall_64+0xd8/0x1c0
entry_SYSCALL_64_after_hwframe+0x79/0x81
Uninit was created at:
__kmalloc+0x571/0xd30
attach_auth+0x106/0x3e0
xfrm_add_sa+0x2aa0/0x4230
xfrm_user_rcv_msg+0x832/0xc00
netlink_rcv_skb+0x613/0x780
xfrm_netlink_rcv+0x77/0xc0
netlink_unicast+0xe90/0x1280
netlink_sendmsg+0x126d/0x1490
__sock_sendmsg+0x332/0x3d0
____sys_sendmsg+0x863/0xc30
___sys_sendmsg+0x285/0x3e0
__x64_sys_sendmsg+0x2d6/0x560
x64_sys_call+0x1316/0x3cc0
do_syscall_64+0xd8/0x1c0
entry_SYSCALL_64_after_hwframe+0x79/0x81
Bytes 328-379 of 732 are uninitialized
Memory access of size 732 starts at ffff88800e18e000
Data copied to user address 00007ff30f48aff0
CPU: 2 PID: 18167 Comm: syz-executor.0 Not tainted 6.8.11 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Fixes copying of xfrm algorithms where some random
data of the structure fields can end up in userspace.
Padding in structures may be filled with random (possibly sensitve)
data and should never be given directly to user-space.
A similar issue was resolved in the commit
8222d5910dae ("xfrm: Zero padding when dumping algos and encap")
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: c7a5899eb26e ("xfrm: redact SA secret with lockdown confidentiality")
Cc: stable(a)vger.kernel.org
Co-developed-by: Boris Tonofa <b.tonofa(a)ideco.ru>
Signed-off-by: Boris Tonofa <b.tonofa(a)ideco.ru>
Signed-off-by: Petr Vaganov <p.vaganov(a)ideco.ru>
---
v3: Corrected commit description "This patch fixes copying..." to
"Fixes copying..." according to accepted rules of Linux kernel commits,
as suggested by Markus Elfring <elfring(a)users.sourceforge.net>.
v2: Fixed typo in sizeof(sizeof(ap->alg_name) expression.
The third argument for the strscpy_pad macro was chosen by
analogy with those in other functions of this file - as did
commit 8222d5910dae ("xfrm: Zero padding when dumping algos and encap").
I still think it would be better to leave the strscpy_pad() macro with
three arguments in this patch, mainly so that it would be consistent
with the existing similar code in this module.
Regarding strncpy() above, we don't think it is
a real issue since the uncopied parts should be already padded with zeroes
by nla_reserve().
net/xfrm/xfrm_user.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 55f039ec3d59..0083faabe8be 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1098,7 +1098,9 @@ static int copy_to_user_auth(struct xfrm_algo_auth *auth, struct sk_buff *skb)
if (!nla)
return -EMSGSIZE;
ap = nla_data(nla);
- memcpy(ap, auth, sizeof(struct xfrm_algo_auth));
+ strscpy_pad(ap->alg_name, auth->alg_name, sizeof(ap->alg_name));
+ ap->alg_key_len = auth->alg_key_len;
+ ap->alg_trunc_len = auth->alg_trunc_len;
if (redact_secret && auth->alg_key_len)
memset(ap->alg_key, 0, (auth->alg_key_len + 7) / 8);
else
--
2.46.2
New processors have become pickier about the local APIC timer state
before entering low power modes. These low power modes are used (for
example) when you close your laptop lid and suspend. If you put your
laptop in a bag in this unnecessarily-high-power state, it is likely
to get quite toasty while it quickly sucks the battery dry.
The problem boils down to some CPUs' inability to power down until the
kernel fully disables the local APIC timer. The current kernel code
works in one-shot and periodic modes but does not work for deadline
mode. Deadline mode has been the supported and preferred mode on
Intel CPUs for over a decade and uses an MSR to drive the timer
instead of an APIC register.
Disable the TSC Deadline timer in lapic_timer_shutdown() by writing to
MSR_IA32_TSC_DEADLINE when in TSC-deadline mode. Also avoid writing
to the initial-count register (APIC_TMICT) which is ignored in
TSC-deadline mode.
Note: The APIC_LVTT|=APIC_LVT_MASKED operation should theoretically be
enough to tell the hardware that the timer will not fire in any of the
timer modes. But mitigating AMD erratum 411[1] also requires clearing
out APIC_TMICT. Solely setting APIC_LVT_MASKED is also ineffective in
practice on Intel Lunar Lake systems, which is the motivation for this
change.
1. 411 Processor May Exit Message-Triggered C1E State Without an Interrupt if Local APIC Timer Reaches Zero - https://www.amd.com/content/dam/amd/en/documents/archived-tech-docs/revisio…
Cc: stable(a)vger.kernel.org
Fixes: 279f1461432c ("x86: apic: Use tsc deadline for oneshot when available")
Suggested-by: Dave Hansen <dave.hansen(a)intel.com>
Signed-off-by: Zhang Rui <rui.zhang(a)intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
Tested-by: Todd Brandt <todd.e.brandt(a)intel.com>
---
V2
- Improve changelog
V3
- Subject and changelog rewrite
- Check LAPIC Timer mode using APIC_LVTT value instead of extra CPU feature flag check
- Avoid APIC_TMICT write which is ignored in TSC-deadline mode
V4
- Add back Fixes tag and stable tag which was missing in V3
- Update patch recipients
---
arch/x86/kernel/apic/apic.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 6513c53c9459..5436a4083065 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -440,7 +440,19 @@ static int lapic_timer_shutdown(struct clock_event_device *evt)
v = apic_read(APIC_LVTT);
v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR);
apic_write(APIC_LVTT, v);
- apic_write(APIC_TMICT, 0);
+
+ /*
+ * Setting APIC_LVT_MASKED should be enough to tell the
+ * hardware that this timer will never fire. But AMD
+ * erratum 411 and some Intel CPU behavior circa 2024
+ * say otherwise. Time for belt and suspenders programming,
+ * mask the timer and zero the counter registers:
+ */
+ if (v & APIC_LVT_TIMER_TSCDEADLINE)
+ wrmsrl(MSR_IA32_TSC_DEADLINE, 0);
+ else
+ apic_write(APIC_TMICT, 0);
+
return 0;
}
--
2.34.1
The patch titled
Subject: mm/swapfile: skip HugeTLB pages for unuse_vma
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-swapfile-skip-hugetlb-pages-for-unuse_vma.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Liu Shixin <liushixin2(a)huawei.com>
Subject: mm/swapfile: skip HugeTLB pages for unuse_vma
Date: Tue, 15 Oct 2024 09:45:21 +0800
I got a bad pud error and lost a 1GB HugeTLB when calling swapoff. The
problem can be reproduced by the following steps:
1. Allocate an anonymous 1GB HugeTLB and some other anonymous memory.
2. Swapout the above anonymous memory.
3. run swapoff and we will get a bad pud error in kernel message:
mm/pgtable-generic.c:42: bad pud 00000000743d215d(84000001400000e7)
We can tell that pud_clear_bad is called by pud_none_or_clear_bad in
unuse_pud_range() by ftrace. And therefore the HugeTLB pages will never
be freed because we lost it from page table. We can skip HugeTLB pages
for unuse_vma to fix it.
Link: https://lkml.kernel.org/r/20241015014521.570237-1-liushixin2@huawei.com
Fixes: 0fe6e20b9c4c ("hugetlb, rmap: add reverse mapping for hugepage")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Acked-by: Muchun Song <muchun.song(a)linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/swapfile.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/swapfile.c~mm-swapfile-skip-hugetlb-pages-for-unuse_vma
+++ a/mm/swapfile.c
@@ -2313,7 +2313,7 @@ static int unuse_mm(struct mm_struct *mm
mmap_read_lock(mm);
for_each_vma(vmi, vma) {
- if (vma->anon_vma) {
+ if (vma->anon_vma && !is_vm_hugetlb_page(vma)) {
ret = unuse_vma(vma, type);
if (ret)
break;
_
Patches currently in -mm which might be from liushixin2(a)huawei.com are
mm-swapfile-skip-hugetlb-pages-for-unuse_vma.patch
I got a bad pud error and lost a 1GB HugeTLB when calling swapoff.
The problem can be reproduced by the following steps:
1. Allocate an anonymous 1GB HugeTLB and some other anonymous memory.
2. Swapout the above anonymous memory.
3. run swapoff and we will get a bad pud error in kernel message:
mm/pgtable-generic.c:42: bad pud 00000000743d215d(84000001400000e7)
We can tell that pud_clear_bad is called by pud_none_or_clear_bad
in unuse_pud_range() by ftrace. And therefore the HugeTLB pages will
never be freed because we lost it from page table. We can skip
HugeTLB pages for unuse_vma to fix it.
Fixes: 0fe6e20b9c4c ("hugetlb, rmap: add reverse mapping for hugepage")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
---
mm/swapfile.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 0cded32414a1..f4ef91513fc9 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2312,7 +2312,7 @@ static int unuse_mm(struct mm_struct *mm, unsigned int type)
mmap_read_lock(mm);
for_each_vma(vmi, vma) {
- if (vma->anon_vma) {
+ if (vma->anon_vma && !is_vm_hugetlb_page(vma)) {
ret = unuse_vma(vma, type);
if (ret)
break;
--
2.34.1
Previously, the domain_context_clear() function incorrectly called
pci_for_each_dma_alias() to set up context entries for non-PCI devices.
This could lead to kernel hangs or other unexpected behavior.
Add a check to only call pci_for_each_dma_alias() for PCI devices. For
non-PCI devices, domain_context_clear_one() is called directly.
Reported-by: Todd Brandt <todd.e.brandt(a)intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219363
Fixes: 9a16ab9d6402 ("iommu/vt-d: Make context clearing consistent with context mapping")
Cc: stable(a)vger.kernel.org
Signed-off-by: Lu Baolu <baolu.lu(a)linux.intel.com>
---
drivers/iommu/intel/iommu.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9f6b0780f2ef..e860bc9439a2 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3340,8 +3340,10 @@ static int domain_context_clear_one_cb(struct pci_dev *pdev, u16 alias, void *op
*/
static void domain_context_clear(struct device_domain_info *info)
{
- if (!dev_is_pci(info->dev))
+ if (!dev_is_pci(info->dev)) {
domain_context_clear_one(info, info->bus, info->devfn);
+ return;
+ }
pci_for_each_dma_alias(to_pci_dev(info->dev),
&domain_context_clear_one_cb, info);
--
2.43.0
The added test case from commit
09bcf9254838 ("selftests/ftrace: Add new test case which checks non unique symbol")
failed, it is fixed by
b022f0c7e404 ("tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols").
Backport it and its fix commit to 5.4.y together. Resolved minor context change conflicts.
Resend the patch series after backpporting to 5.10.y first.
Andrii Nakryiko (1):
tracing/kprobes: Fix symbol counting logic by looking at modules as
well
Francis Laniel (1):
tracing/kprobes: Return EADDRNOTAVAIL when func matches several
symbols
kernel/trace/trace_kprobe.c | 76 +++++++++++++++++++++++++++++++++++++
kernel/trace/trace_probe.h | 1 +
2 files changed, 77 insertions(+)
--
2.46.0
Hello all,
We are experiencing a boot hang issue when booting kernel version
6.1.83+ on a Dell Inc. PowerEdge R770 equipped with an Intel Xeon
6710E processor. After extensive testing and use of `git bisect`, we
have traced the issue to commit:
`586e19c88a0c ("iommu/vt-d: Retrieve IOMMU perfmon capability information")`
This commit appears to be part of a larger patchset, which can be found here:
[Patchset on lore.kernel.org](https://lore.kernel.org/lkml/7c4b3e4e-1c5d-04f1-1891-84f68…
We attempted to boot with the `intel_iommu=off` option, but the system
hangs in the same manner. However, the system boots successfully after
disabling `CONFIG_INTEL_IOMMU_PERF_EVENTS`.
I'm reporting here in case others hit the same issue.
Any assistance or guidance on understanding/resolving this issue would
be greatly appreciated.
Thank you.
Jinpu Wang @ IONOS Cloud
The patch below does not apply to the 6.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y
git checkout FETCH_HEAD
git cherry-pick -x c9d952b9103b600ddafc5d1c0e2f2dbd30f0b805
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101436-avalanche-approach-5c90@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^..
Possible dependencies:
c9d952b9103b ("io_uring/rw: fix cflags posting for single issue multishot read")
6733e678ba12 ("io_uring/kbuf: pass in 'len' argument for buffer commit")
ecd5c9b29643 ("io_uring/kbuf: add io_kbuf_commit() helper")
03e02e8f95fe ("io_uring/kbuf: use 'bl' directly rather than req->buf_list")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c9d952b9103b600ddafc5d1c0e2f2dbd30f0b805 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Sat, 5 Oct 2024 19:06:50 -0600
Subject: [PATCH] io_uring/rw: fix cflags posting for single issue multishot
read
If multishot gets disabled, and hence the request will get terminated
rather than persist for more iterations, then posting the CQE with the
right cflags is still important. Most notably, the buffer reference
needs to be included.
Refactor the return of __io_read() a bit, so that the provided buffer
is always put correctly, and hence returned to the application.
Reported-by: Sharon Rosner <Sharon Rosner>
Link: https://github.com/axboe/liburing/issues/1257
Cc: stable(a)vger.kernel.org
Fixes: 2a975d426c82 ("io_uring/rw: don't allow multishot reads without NOWAIT support")
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/rw.c b/io_uring/rw.c
index f023ff49c688..93ad92605884 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -972,17 +972,21 @@ int io_read_mshot(struct io_kiocb *req, unsigned int issue_flags)
if (issue_flags & IO_URING_F_MULTISHOT)
return IOU_ISSUE_SKIP_COMPLETE;
return -EAGAIN;
- }
-
- /*
- * Any successful return value will keep the multishot read armed.
- */
- if (ret > 0 && req->flags & REQ_F_APOLL_MULTISHOT) {
+ } else if (ret <= 0) {
+ io_kbuf_recycle(req, issue_flags);
+ if (ret < 0)
+ req_set_fail(req);
+ } else {
/*
- * Put our buffer and post a CQE. If we fail to post a CQE, then
+ * Any successful return value will keep the multishot read
+ * armed, if it's still set. Put our buffer and post a CQE. If
+ * we fail to post a CQE, or multishot is no longer set, then
* jump to the termination path. This request is then done.
*/
cflags = io_put_kbuf(req, ret, issue_flags);
+ if (!(req->flags & REQ_F_APOLL_MULTISHOT))
+ goto done;
+
rw->len = 0; /* similarly to above, reset len to 0 */
if (io_req_post_cqe(req, ret, cflags | IORING_CQE_F_MORE)) {
@@ -1003,6 +1007,7 @@ int io_read_mshot(struct io_kiocb *req, unsigned int issue_flags)
* Either an error, or we've hit overflow posting the CQE. For any
* multishot request, hitting overflow will terminate it.
*/
+done:
io_req_set_res(req, ret, cflags);
io_req_rw_cleanup(req, issue_flags);
if (issue_flags & IO_URING_F_MULTISHOT)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 2942dfab630444d46aaa37fb7d629b620abbf6ba
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101406-matted-figure-a255@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
2942dfab6304 ("net: ethernet: cortina: Restore TSO support")
ac631873c9e7 ("net: ethernet: cortina: Drop TSO support")
d4d0c5b4d279 ("net: ethernet: cortina: Handle large frames")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2942dfab630444d46aaa37fb7d629b620abbf6ba Mon Sep 17 00:00:00 2001
From: Linus Walleij <linus.walleij(a)linaro.org>
Date: Mon, 27 May 2024 21:26:44 +0200
Subject: [PATCH] net: ethernet: cortina: Restore TSO support
An earlier commit deleted the TSO support in the Cortina Gemini
driver because the driver was confusing gso_size and MTU,
probably because what the Linux kernel calls "gso_size" was
called "MTU" in the datasheet.
Restore the functionality properly reading the gso_size from
the skbuff.
Tested with iperf3, running a server on a different machine
and client on the device with the cortina gemini ethernet:
Connecting to host 192.168.1.2, port 5201
60008000.ethernet-port eth0: segment offloading mss = 05ea len=1c8a
60008000.ethernet-port eth0: segment offloading mss = 05ea len=1c8a
60008000.ethernet-port eth0: segment offloading mss = 05ea len=27da
60008000.ethernet-port eth0: segment offloading mss = 05ea len=0b92
60008000.ethernet-port eth0: segment offloading mss = 05ea len=2bda
(...)
(The hardware MSS 0x05ea here includes the ethernet headers.)
If I disable all segment offloading on the receiving host and
dump packets using tcpdump -xx like this:
ethtool -K enp2s0 gro off gso off tso off
tcpdump -xx -i enp2s0 host 192.168.1.136
I get segmented packages such as this when running iperf3:
23:16:54.024139 IP OpenWrt.lan.59168 > Fecusia.targus-getdata1:
Flags [.], seq 1486:2934, ack 1, win 4198,
options [nop,nop,TS val 3886192908 ecr 3601341877], length 1448
0x0000: fc34 9701 a0c6 14d6 4da8 3c4f 0800 4500
0x0010: 05dc 16a0 4000 4006 9aa1 c0a8 0188 c0a8
0x0020: 0102 e720 1451 ff25 9822 4c52 29cf 8010
0x0030: 1066 ac8c 0000 0101 080a e7a2 990c d6a8
(...)
0x05c0: 5e49 e109 fe8c 4617 5e18 7a82 7eae d647
0x05d0: e8ee ae64 dc88 c897 3f8a 07a4 3a33 6b1b
0x05e0: 3501 a30f 2758 cc44 4b4a
Several such packets often follow after each other verifying
the segmentation into 0x05a8 (1448) byte packages also on the
reveiving end. As can be seen, the ethernet frames are
0x05ea (1514) in size.
Performance with iperf3 before this patch: ~15.5 Mbit/s
Performance with iperf3 after this patch: ~175 Mbit/s
This was running a 60 second test (twice) the best measurement
was 179 Mbit/s.
For comparison if I run iperf3 with UDP I get around 1.05 Mbit/s
both before and after this patch.
While this is a gigabit ethernet interface, the CPU is a cheap
D-Link DIR-685 router (based on the ARMv5 Faraday FA526 at
~50 MHz), and the software is not supposed to drive traffic,
as the device has a DSA chip, so this kind of numbers can be
expected.
Fixes: ac631873c9e7 ("net: ethernet: cortina: Drop TSO support")
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index 5f0c9e1771db..7ebd61a3a49b 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -79,7 +79,8 @@ MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
#define GMAC0_IRQ4_8 (GMAC0_MIB_INT_BIT | GMAC0_RX_OVERRUN_INT_BIT)
#define GMAC_OFFLOAD_FEATURES (NETIF_F_SG | NETIF_F_IP_CSUM | \
- NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM)
+ NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM | \
+ NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6)
/**
* struct gmac_queue_page - page buffer per-page info
@@ -1148,13 +1149,25 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
skb_frag_t *skb_frag;
dma_addr_t mapping;
void *buffer;
+ u16 mss;
int ret;
- /* TODO: implement proper TSO using MTU in word3 */
word1 = skb->len;
word3 = SOF_BIT;
- if (skb->len >= ETH_FRAME_LEN) {
+ mss = skb_shinfo(skb)->gso_size;
+ if (mss) {
+ /* This means we are dealing with TCP and skb->len is the
+ * sum total of all the segments. The TSO will deal with
+ * chopping this up for us.
+ */
+ /* The accelerator needs the full frame size here */
+ mss += skb_tcp_all_headers(skb);
+ netdev_dbg(netdev, "segment offloading mss = %04x len=%04x\n",
+ mss, skb->len);
+ word1 |= TSS_MTU_ENABLE_BIT;
+ word3 |= mss;
+ } else if (skb->len >= ETH_FRAME_LEN) {
/* Hardware offloaded checksumming isn't working on frames
* bigger than 1514 bytes. A hypothesis about this is that the
* checksum buffer is only 1518 bytes, so when the frames get
@@ -1169,7 +1182,9 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
return ret;
}
word1 |= TSS_BYPASS_BIT;
- } else if (skb->ip_summed == CHECKSUM_PARTIAL) {
+ }
+
+ if (skb->ip_summed == CHECKSUM_PARTIAL) {
int tcp = 0;
/* We do not switch off the checksumming on non TCP/UDP
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 2942dfab630444d46aaa37fb7d629b620abbf6ba
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101404-contently-large-e751@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
2942dfab6304 ("net: ethernet: cortina: Restore TSO support")
ac631873c9e7 ("net: ethernet: cortina: Drop TSO support")
d4d0c5b4d279 ("net: ethernet: cortina: Handle large frames")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2942dfab630444d46aaa37fb7d629b620abbf6ba Mon Sep 17 00:00:00 2001
From: Linus Walleij <linus.walleij(a)linaro.org>
Date: Mon, 27 May 2024 21:26:44 +0200
Subject: [PATCH] net: ethernet: cortina: Restore TSO support
An earlier commit deleted the TSO support in the Cortina Gemini
driver because the driver was confusing gso_size and MTU,
probably because what the Linux kernel calls "gso_size" was
called "MTU" in the datasheet.
Restore the functionality properly reading the gso_size from
the skbuff.
Tested with iperf3, running a server on a different machine
and client on the device with the cortina gemini ethernet:
Connecting to host 192.168.1.2, port 5201
60008000.ethernet-port eth0: segment offloading mss = 05ea len=1c8a
60008000.ethernet-port eth0: segment offloading mss = 05ea len=1c8a
60008000.ethernet-port eth0: segment offloading mss = 05ea len=27da
60008000.ethernet-port eth0: segment offloading mss = 05ea len=0b92
60008000.ethernet-port eth0: segment offloading mss = 05ea len=2bda
(...)
(The hardware MSS 0x05ea here includes the ethernet headers.)
If I disable all segment offloading on the receiving host and
dump packets using tcpdump -xx like this:
ethtool -K enp2s0 gro off gso off tso off
tcpdump -xx -i enp2s0 host 192.168.1.136
I get segmented packages such as this when running iperf3:
23:16:54.024139 IP OpenWrt.lan.59168 > Fecusia.targus-getdata1:
Flags [.], seq 1486:2934, ack 1, win 4198,
options [nop,nop,TS val 3886192908 ecr 3601341877], length 1448
0x0000: fc34 9701 a0c6 14d6 4da8 3c4f 0800 4500
0x0010: 05dc 16a0 4000 4006 9aa1 c0a8 0188 c0a8
0x0020: 0102 e720 1451 ff25 9822 4c52 29cf 8010
0x0030: 1066 ac8c 0000 0101 080a e7a2 990c d6a8
(...)
0x05c0: 5e49 e109 fe8c 4617 5e18 7a82 7eae d647
0x05d0: e8ee ae64 dc88 c897 3f8a 07a4 3a33 6b1b
0x05e0: 3501 a30f 2758 cc44 4b4a
Several such packets often follow after each other verifying
the segmentation into 0x05a8 (1448) byte packages also on the
reveiving end. As can be seen, the ethernet frames are
0x05ea (1514) in size.
Performance with iperf3 before this patch: ~15.5 Mbit/s
Performance with iperf3 after this patch: ~175 Mbit/s
This was running a 60 second test (twice) the best measurement
was 179 Mbit/s.
For comparison if I run iperf3 with UDP I get around 1.05 Mbit/s
both before and after this patch.
While this is a gigabit ethernet interface, the CPU is a cheap
D-Link DIR-685 router (based on the ARMv5 Faraday FA526 at
~50 MHz), and the software is not supposed to drive traffic,
as the device has a DSA chip, so this kind of numbers can be
expected.
Fixes: ac631873c9e7 ("net: ethernet: cortina: Drop TSO support")
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index 5f0c9e1771db..7ebd61a3a49b 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -79,7 +79,8 @@ MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
#define GMAC0_IRQ4_8 (GMAC0_MIB_INT_BIT | GMAC0_RX_OVERRUN_INT_BIT)
#define GMAC_OFFLOAD_FEATURES (NETIF_F_SG | NETIF_F_IP_CSUM | \
- NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM)
+ NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM | \
+ NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6)
/**
* struct gmac_queue_page - page buffer per-page info
@@ -1148,13 +1149,25 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
skb_frag_t *skb_frag;
dma_addr_t mapping;
void *buffer;
+ u16 mss;
int ret;
- /* TODO: implement proper TSO using MTU in word3 */
word1 = skb->len;
word3 = SOF_BIT;
- if (skb->len >= ETH_FRAME_LEN) {
+ mss = skb_shinfo(skb)->gso_size;
+ if (mss) {
+ /* This means we are dealing with TCP and skb->len is the
+ * sum total of all the segments. The TSO will deal with
+ * chopping this up for us.
+ */
+ /* The accelerator needs the full frame size here */
+ mss += skb_tcp_all_headers(skb);
+ netdev_dbg(netdev, "segment offloading mss = %04x len=%04x\n",
+ mss, skb->len);
+ word1 |= TSS_MTU_ENABLE_BIT;
+ word3 |= mss;
+ } else if (skb->len >= ETH_FRAME_LEN) {
/* Hardware offloaded checksumming isn't working on frames
* bigger than 1514 bytes. A hypothesis about this is that the
* checksum buffer is only 1518 bytes, so when the frames get
@@ -1169,7 +1182,9 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
return ret;
}
word1 |= TSS_BYPASS_BIT;
- } else if (skb->ip_summed == CHECKSUM_PARTIAL) {
+ }
+
+ if (skb->ip_summed == CHECKSUM_PARTIAL) {
int tcp = 0;
/* We do not switch off the checksumming on non TCP/UDP
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 2942dfab630444d46aaa37fb7d629b620abbf6ba
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101403-widget-outpour-ede3@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
2942dfab6304 ("net: ethernet: cortina: Restore TSO support")
ac631873c9e7 ("net: ethernet: cortina: Drop TSO support")
d4d0c5b4d279 ("net: ethernet: cortina: Handle large frames")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2942dfab630444d46aaa37fb7d629b620abbf6ba Mon Sep 17 00:00:00 2001
From: Linus Walleij <linus.walleij(a)linaro.org>
Date: Mon, 27 May 2024 21:26:44 +0200
Subject: [PATCH] net: ethernet: cortina: Restore TSO support
An earlier commit deleted the TSO support in the Cortina Gemini
driver because the driver was confusing gso_size and MTU,
probably because what the Linux kernel calls "gso_size" was
called "MTU" in the datasheet.
Restore the functionality properly reading the gso_size from
the skbuff.
Tested with iperf3, running a server on a different machine
and client on the device with the cortina gemini ethernet:
Connecting to host 192.168.1.2, port 5201
60008000.ethernet-port eth0: segment offloading mss = 05ea len=1c8a
60008000.ethernet-port eth0: segment offloading mss = 05ea len=1c8a
60008000.ethernet-port eth0: segment offloading mss = 05ea len=27da
60008000.ethernet-port eth0: segment offloading mss = 05ea len=0b92
60008000.ethernet-port eth0: segment offloading mss = 05ea len=2bda
(...)
(The hardware MSS 0x05ea here includes the ethernet headers.)
If I disable all segment offloading on the receiving host and
dump packets using tcpdump -xx like this:
ethtool -K enp2s0 gro off gso off tso off
tcpdump -xx -i enp2s0 host 192.168.1.136
I get segmented packages such as this when running iperf3:
23:16:54.024139 IP OpenWrt.lan.59168 > Fecusia.targus-getdata1:
Flags [.], seq 1486:2934, ack 1, win 4198,
options [nop,nop,TS val 3886192908 ecr 3601341877], length 1448
0x0000: fc34 9701 a0c6 14d6 4da8 3c4f 0800 4500
0x0010: 05dc 16a0 4000 4006 9aa1 c0a8 0188 c0a8
0x0020: 0102 e720 1451 ff25 9822 4c52 29cf 8010
0x0030: 1066 ac8c 0000 0101 080a e7a2 990c d6a8
(...)
0x05c0: 5e49 e109 fe8c 4617 5e18 7a82 7eae d647
0x05d0: e8ee ae64 dc88 c897 3f8a 07a4 3a33 6b1b
0x05e0: 3501 a30f 2758 cc44 4b4a
Several such packets often follow after each other verifying
the segmentation into 0x05a8 (1448) byte packages also on the
reveiving end. As can be seen, the ethernet frames are
0x05ea (1514) in size.
Performance with iperf3 before this patch: ~15.5 Mbit/s
Performance with iperf3 after this patch: ~175 Mbit/s
This was running a 60 second test (twice) the best measurement
was 179 Mbit/s.
For comparison if I run iperf3 with UDP I get around 1.05 Mbit/s
both before and after this patch.
While this is a gigabit ethernet interface, the CPU is a cheap
D-Link DIR-685 router (based on the ARMv5 Faraday FA526 at
~50 MHz), and the software is not supposed to drive traffic,
as the device has a DSA chip, so this kind of numbers can be
expected.
Fixes: ac631873c9e7 ("net: ethernet: cortina: Drop TSO support")
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index 5f0c9e1771db..7ebd61a3a49b 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -79,7 +79,8 @@ MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
#define GMAC0_IRQ4_8 (GMAC0_MIB_INT_BIT | GMAC0_RX_OVERRUN_INT_BIT)
#define GMAC_OFFLOAD_FEATURES (NETIF_F_SG | NETIF_F_IP_CSUM | \
- NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM)
+ NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM | \
+ NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6)
/**
* struct gmac_queue_page - page buffer per-page info
@@ -1148,13 +1149,25 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
skb_frag_t *skb_frag;
dma_addr_t mapping;
void *buffer;
+ u16 mss;
int ret;
- /* TODO: implement proper TSO using MTU in word3 */
word1 = skb->len;
word3 = SOF_BIT;
- if (skb->len >= ETH_FRAME_LEN) {
+ mss = skb_shinfo(skb)->gso_size;
+ if (mss) {
+ /* This means we are dealing with TCP and skb->len is the
+ * sum total of all the segments. The TSO will deal with
+ * chopping this up for us.
+ */
+ /* The accelerator needs the full frame size here */
+ mss += skb_tcp_all_headers(skb);
+ netdev_dbg(netdev, "segment offloading mss = %04x len=%04x\n",
+ mss, skb->len);
+ word1 |= TSS_MTU_ENABLE_BIT;
+ word3 |= mss;
+ } else if (skb->len >= ETH_FRAME_LEN) {
/* Hardware offloaded checksumming isn't working on frames
* bigger than 1514 bytes. A hypothesis about this is that the
* checksum buffer is only 1518 bytes, so when the frames get
@@ -1169,7 +1182,9 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
return ret;
}
word1 |= TSS_BYPASS_BIT;
- } else if (skb->ip_summed == CHECKSUM_PARTIAL) {
+ }
+
+ if (skb->ip_summed == CHECKSUM_PARTIAL) {
int tcp = 0;
/* We do not switch off the checksumming on non TCP/UDP
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 2942dfab630444d46aaa37fb7d629b620abbf6ba
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101401-skinless-commodity-1b95@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
2942dfab6304 ("net: ethernet: cortina: Restore TSO support")
ac631873c9e7 ("net: ethernet: cortina: Drop TSO support")
d4d0c5b4d279 ("net: ethernet: cortina: Handle large frames")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2942dfab630444d46aaa37fb7d629b620abbf6ba Mon Sep 17 00:00:00 2001
From: Linus Walleij <linus.walleij(a)linaro.org>
Date: Mon, 27 May 2024 21:26:44 +0200
Subject: [PATCH] net: ethernet: cortina: Restore TSO support
An earlier commit deleted the TSO support in the Cortina Gemini
driver because the driver was confusing gso_size and MTU,
probably because what the Linux kernel calls "gso_size" was
called "MTU" in the datasheet.
Restore the functionality properly reading the gso_size from
the skbuff.
Tested with iperf3, running a server on a different machine
and client on the device with the cortina gemini ethernet:
Connecting to host 192.168.1.2, port 5201
60008000.ethernet-port eth0: segment offloading mss = 05ea len=1c8a
60008000.ethernet-port eth0: segment offloading mss = 05ea len=1c8a
60008000.ethernet-port eth0: segment offloading mss = 05ea len=27da
60008000.ethernet-port eth0: segment offloading mss = 05ea len=0b92
60008000.ethernet-port eth0: segment offloading mss = 05ea len=2bda
(...)
(The hardware MSS 0x05ea here includes the ethernet headers.)
If I disable all segment offloading on the receiving host and
dump packets using tcpdump -xx like this:
ethtool -K enp2s0 gro off gso off tso off
tcpdump -xx -i enp2s0 host 192.168.1.136
I get segmented packages such as this when running iperf3:
23:16:54.024139 IP OpenWrt.lan.59168 > Fecusia.targus-getdata1:
Flags [.], seq 1486:2934, ack 1, win 4198,
options [nop,nop,TS val 3886192908 ecr 3601341877], length 1448
0x0000: fc34 9701 a0c6 14d6 4da8 3c4f 0800 4500
0x0010: 05dc 16a0 4000 4006 9aa1 c0a8 0188 c0a8
0x0020: 0102 e720 1451 ff25 9822 4c52 29cf 8010
0x0030: 1066 ac8c 0000 0101 080a e7a2 990c d6a8
(...)
0x05c0: 5e49 e109 fe8c 4617 5e18 7a82 7eae d647
0x05d0: e8ee ae64 dc88 c897 3f8a 07a4 3a33 6b1b
0x05e0: 3501 a30f 2758 cc44 4b4a
Several such packets often follow after each other verifying
the segmentation into 0x05a8 (1448) byte packages also on the
reveiving end. As can be seen, the ethernet frames are
0x05ea (1514) in size.
Performance with iperf3 before this patch: ~15.5 Mbit/s
Performance with iperf3 after this patch: ~175 Mbit/s
This was running a 60 second test (twice) the best measurement
was 179 Mbit/s.
For comparison if I run iperf3 with UDP I get around 1.05 Mbit/s
both before and after this patch.
While this is a gigabit ethernet interface, the CPU is a cheap
D-Link DIR-685 router (based on the ARMv5 Faraday FA526 at
~50 MHz), and the software is not supposed to drive traffic,
as the device has a DSA chip, so this kind of numbers can be
expected.
Fixes: ac631873c9e7 ("net: ethernet: cortina: Drop TSO support")
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index 5f0c9e1771db..7ebd61a3a49b 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -79,7 +79,8 @@ MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
#define GMAC0_IRQ4_8 (GMAC0_MIB_INT_BIT | GMAC0_RX_OVERRUN_INT_BIT)
#define GMAC_OFFLOAD_FEATURES (NETIF_F_SG | NETIF_F_IP_CSUM | \
- NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM)
+ NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM | \
+ NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6)
/**
* struct gmac_queue_page - page buffer per-page info
@@ -1148,13 +1149,25 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
skb_frag_t *skb_frag;
dma_addr_t mapping;
void *buffer;
+ u16 mss;
int ret;
- /* TODO: implement proper TSO using MTU in word3 */
word1 = skb->len;
word3 = SOF_BIT;
- if (skb->len >= ETH_FRAME_LEN) {
+ mss = skb_shinfo(skb)->gso_size;
+ if (mss) {
+ /* This means we are dealing with TCP and skb->len is the
+ * sum total of all the segments. The TSO will deal with
+ * chopping this up for us.
+ */
+ /* The accelerator needs the full frame size here */
+ mss += skb_tcp_all_headers(skb);
+ netdev_dbg(netdev, "segment offloading mss = %04x len=%04x\n",
+ mss, skb->len);
+ word1 |= TSS_MTU_ENABLE_BIT;
+ word3 |= mss;
+ } else if (skb->len >= ETH_FRAME_LEN) {
/* Hardware offloaded checksumming isn't working on frames
* bigger than 1514 bytes. A hypothesis about this is that the
* checksum buffer is only 1518 bytes, so when the frames get
@@ -1169,7 +1182,9 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
return ret;
}
word1 |= TSS_BYPASS_BIT;
- } else if (skb->ip_summed == CHECKSUM_PARTIAL) {
+ }
+
+ if (skb->ip_summed == CHECKSUM_PARTIAL) {
int tcp = 0;
/* We do not switch off the checksumming on non TCP/UDP
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 119d51e225febc8152476340a880f5415a01e99e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101443-lunchroom-refinish-4251@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
119d51e225fe ("mptcp: fallback when MPTCP opts are dropped after 1st data")
ae66fb2ba6c3 ("mptcp: Do TCP fallback on early DSS checksum failure")
4cf86ae84c71 ("mptcp: strict local address ID selection")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 119d51e225febc8152476340a880f5415a01e99e Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Tue, 8 Oct 2024 13:04:54 +0200
Subject: [PATCH] mptcp: fallback when MPTCP opts are dropped after 1st data
As reported by Christoph [1], before this patch, an MPTCP connection was
wrongly reset when a host received a first data packet with MPTCP
options after the 3wHS, but got the next ones without.
According to the MPTCP v1 specs [2], a fallback should happen in this
case, because the host didn't receive a DATA_ACK from the other peer,
nor receive data for more than the initial window which implies a
DATA_ACK being received by the other peer.
The patch here re-uses the same logic as the one used in other places:
by looking at allow_infinite_fallback, which is disabled at the creation
of an additional subflow. It's not looking at the first DATA_ACK (or
implying one received from the other side) as suggested by the RFC, but
it is in continuation with what was already done, which is safer, and it
fixes the reported issue. The next step, looking at this first DATA_ACK,
is tracked in [4].
This patch has been validated using the following Packetdrill script:
0 socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
// 3WHS is OK
+0.0 < S 0:0(0) win 65535 <mss 1460, sackOK, nop, nop, nop, wscale 6, mpcapable v1 flags[flag_h] nokey>
+0.0 > S. 0:0(0) ack 1 <mss 1460, nop, nop, sackOK, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey]>
+0.1 < . 1:1(0) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[ckey=2, skey]>
+0 accept(3, ..., ...) = 4
// Data from the client with valid MPTCP options (no DATA_ACK: normal)
+0.1 < P. 1:501(500) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[skey, ckey] mpcdatalen 500, nop, nop>
// From here, the MPTCP options will be dropped by a middlebox
+0.0 > . 1:1(0) ack 501 <dss dack8=501 dll=0 nocs>
+0.1 read(4, ..., 500) = 500
+0 write(4, ..., 100) = 100
// The server replies with data, still thinking MPTCP is being used
+0.0 > P. 1:101(100) ack 501 <dss dack8=501 dsn8=1 ssn=1 dll=100 nocs, nop, nop>
// But the client already did a fallback to TCP, because the two previous packets have been received without MPTCP options
+0.1 < . 501:501(0) ack 101 win 2048
+0.0 < P. 501:601(100) ack 101 win 2048
// The server should fallback to TCP, not reset: it didn't get a DATA_ACK, nor data for more than the initial window
+0.0 > . 101:101(0) ack 601
Note that this script requires Packetdrill with MPTCP support, see [3].
Fixes: dea2b1ea9c70 ("mptcp: do not reset MP_CAPABLE subflow on mapping errors")
Cc: stable(a)vger.kernel.org
Reported-by: Christoph Paasch <cpaasch(a)apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/518 [1]
Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback [2]
Link: https://github.com/multipath-tcp/packetdrill [3]
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/519 [4]
Reviewed-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-3-c6fb8e93e55…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index e1046a696ab5..25dde81bcb75 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1282,7 +1282,7 @@ static bool subflow_can_fallback(struct mptcp_subflow_context *subflow)
else if (READ_ONCE(msk->csum_enabled))
return !subflow->valid_csum_seen;
else
- return !subflow->fully_established;
+ return READ_ONCE(msk->allow_infinite_fallback);
}
static void mptcp_subflow_fail(struct mptcp_sock *msk, struct sock *ssk)
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 18fd04ad856df07733f5bb07e7f7168e7443d393
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101446-approve-rants-581d@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
18fd04ad856d ("Bluetooth: hci_conn: Fix UAF in hci_enhanced_setup_sync")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 18fd04ad856df07733f5bb07e7f7168e7443d393 Mon Sep 17 00:00:00 2001
From: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Date: Wed, 2 Oct 2024 11:17:26 -0400
Subject: [PATCH] Bluetooth: hci_conn: Fix UAF in hci_enhanced_setup_sync
This checks if the ACL connection remains valid as it could be destroyed
while hci_enhanced_setup_sync is pending on cmd_sync leading to the
following trace:
BUG: KASAN: slab-use-after-free in hci_enhanced_setup_sync+0x91b/0xa60
Read of size 1 at addr ffff888002328ffd by task kworker/u5:2/37
CPU: 0 UID: 0 PID: 37 Comm: kworker/u5:2 Not tainted 6.11.0-rc6-01300-g810be445d8d6 #7099
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
Workqueue: hci0 hci_cmd_sync_work
Call Trace:
<TASK>
dump_stack_lvl+0x5d/0x80
? hci_enhanced_setup_sync+0x91b/0xa60
print_report+0x152/0x4c0
? hci_enhanced_setup_sync+0x91b/0xa60
? __virt_addr_valid+0x1fa/0x420
? hci_enhanced_setup_sync+0x91b/0xa60
kasan_report+0xda/0x1b0
? hci_enhanced_setup_sync+0x91b/0xa60
hci_enhanced_setup_sync+0x91b/0xa60
? __pfx_hci_enhanced_setup_sync+0x10/0x10
? __pfx___mutex_lock+0x10/0x10
hci_cmd_sync_work+0x1c2/0x330
process_one_work+0x7d9/0x1360
? __pfx_lock_acquire+0x10/0x10
? __pfx_process_one_work+0x10/0x10
? assign_work+0x167/0x240
worker_thread+0x5b7/0xf60
? __kthread_parkme+0xac/0x1c0
? __pfx_worker_thread+0x10/0x10
? __pfx_worker_thread+0x10/0x10
kthread+0x293/0x360
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2f/0x70
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Allocated by task 34:
kasan_save_stack+0x30/0x50
kasan_save_track+0x14/0x30
__kasan_kmalloc+0x8f/0xa0
__hci_conn_add+0x187/0x17d0
hci_connect_sco+0x2e1/0xb90
sco_sock_connect+0x2a2/0xb80
__sys_connect+0x227/0x2a0
__x64_sys_connect+0x6d/0xb0
do_syscall_64+0x71/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Freed by task 37:
kasan_save_stack+0x30/0x50
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x60
__kasan_slab_free+0x101/0x160
kfree+0xd0/0x250
device_release+0x9a/0x210
kobject_put+0x151/0x280
hci_conn_del+0x448/0xbf0
hci_abort_conn_sync+0x46f/0x980
hci_cmd_sync_work+0x1c2/0x330
process_one_work+0x7d9/0x1360
worker_thread+0x5b7/0xf60
kthread+0x293/0x360
ret_from_fork+0x2f/0x70
ret_from_fork_asm+0x1a/0x30
Cc: stable(a)vger.kernel.org
Fixes: e07a06b4eb41 ("Bluetooth: Convert SCO configure_datapath to hci_sync")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
index d083117ee36c..c4c74b82ed21 100644
--- a/net/bluetooth/hci_conn.c
+++ b/net/bluetooth/hci_conn.c
@@ -289,6 +289,9 @@ static int hci_enhanced_setup_sync(struct hci_dev *hdev, void *data)
kfree(conn_handle);
+ if (!hci_conn_valid(hdev, conn))
+ return -ECANCELED;
+
bt_dev_dbg(hdev, "hcon %p", conn);
configure_datapath_sync(hdev, &conn->codec);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 0b2ad4f6f2bec74a5287d96cb2325a5e11706f22
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101425-pug-throwback-22c7@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
0b2ad4f6f2be ("drm/vc4: Stop the active perfmon before being destroyed")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0b2ad4f6f2bec74a5287d96cb2325a5e11706f22 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ma=C3=ADra=20Canal?= <mcanal(a)igalia.com>
Date: Fri, 4 Oct 2024 09:36:00 -0300
Subject: [PATCH] drm/vc4: Stop the active perfmon before being destroyed
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Upon closing the file descriptor, the active performance monitor is not
stopped. Although all perfmons are destroyed in `vc4_perfmon_close_file()`,
the active performance monitor's pointer (`vc4->active_perfmon`) is still
retained.
If we open a new file descriptor and submit a few jobs with performance
monitors, the driver will attempt to stop the active performance monitor
using the stale pointer in `vc4->active_perfmon`. However, this pointer
is no longer valid because the previous process has already terminated,
and all performance monitors associated with it have been destroyed and
freed.
To fix this, when the active performance monitor belongs to a given
process, explicitly stop it before destroying and freeing it.
Cc: stable(a)vger.kernel.org # v4.17+
Cc: Boris Brezillon <bbrezillon(a)kernel.org>
Cc: Juan A. Suarez Romero <jasuarez(a)igalia.com>
Fixes: 65101d8c9108 ("drm/vc4: Expose performance counters to userspace")
Signed-off-by: Maíra Canal <mcanal(a)igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez(a)igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241004123817.890016-2-mcana…
diff --git a/drivers/gpu/drm/vc4/vc4_perfmon.c b/drivers/gpu/drm/vc4/vc4_perfmon.c
index c4ac2c946238..c00a5cc2316d 100644
--- a/drivers/gpu/drm/vc4/vc4_perfmon.c
+++ b/drivers/gpu/drm/vc4/vc4_perfmon.c
@@ -116,6 +116,11 @@ void vc4_perfmon_open_file(struct vc4_file *vc4file)
static int vc4_perfmon_idr_del(int id, void *elem, void *data)
{
struct vc4_perfmon *perfmon = elem;
+ struct vc4_dev *vc4 = (struct vc4_dev *)data;
+
+ /* If the active perfmon is being destroyed, stop it first */
+ if (perfmon == vc4->active_perfmon)
+ vc4_perfmon_stop(vc4, perfmon, false);
vc4_perfmon_put(perfmon);
@@ -130,7 +135,7 @@ void vc4_perfmon_close_file(struct vc4_file *vc4file)
return;
mutex_lock(&vc4file->perfmon.lock);
- idr_for_each(&vc4file->perfmon.idr, vc4_perfmon_idr_del, NULL);
+ idr_for_each(&vc4file->perfmon.idr, vc4_perfmon_idr_del, vc4);
idr_destroy(&vc4file->perfmon.idr);
mutex_unlock(&vc4file->perfmon.lock);
mutex_destroy(&vc4file->perfmon.lock);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 0b2ad4f6f2bec74a5287d96cb2325a5e11706f22
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101424-skinning-tightness-ecf1@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
0b2ad4f6f2be ("drm/vc4: Stop the active perfmon before being destroyed")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0b2ad4f6f2bec74a5287d96cb2325a5e11706f22 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ma=C3=ADra=20Canal?= <mcanal(a)igalia.com>
Date: Fri, 4 Oct 2024 09:36:00 -0300
Subject: [PATCH] drm/vc4: Stop the active perfmon before being destroyed
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Upon closing the file descriptor, the active performance monitor is not
stopped. Although all perfmons are destroyed in `vc4_perfmon_close_file()`,
the active performance monitor's pointer (`vc4->active_perfmon`) is still
retained.
If we open a new file descriptor and submit a few jobs with performance
monitors, the driver will attempt to stop the active performance monitor
using the stale pointer in `vc4->active_perfmon`. However, this pointer
is no longer valid because the previous process has already terminated,
and all performance monitors associated with it have been destroyed and
freed.
To fix this, when the active performance monitor belongs to a given
process, explicitly stop it before destroying and freeing it.
Cc: stable(a)vger.kernel.org # v4.17+
Cc: Boris Brezillon <bbrezillon(a)kernel.org>
Cc: Juan A. Suarez Romero <jasuarez(a)igalia.com>
Fixes: 65101d8c9108 ("drm/vc4: Expose performance counters to userspace")
Signed-off-by: Maíra Canal <mcanal(a)igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez(a)igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241004123817.890016-2-mcana…
diff --git a/drivers/gpu/drm/vc4/vc4_perfmon.c b/drivers/gpu/drm/vc4/vc4_perfmon.c
index c4ac2c946238..c00a5cc2316d 100644
--- a/drivers/gpu/drm/vc4/vc4_perfmon.c
+++ b/drivers/gpu/drm/vc4/vc4_perfmon.c
@@ -116,6 +116,11 @@ void vc4_perfmon_open_file(struct vc4_file *vc4file)
static int vc4_perfmon_idr_del(int id, void *elem, void *data)
{
struct vc4_perfmon *perfmon = elem;
+ struct vc4_dev *vc4 = (struct vc4_dev *)data;
+
+ /* If the active perfmon is being destroyed, stop it first */
+ if (perfmon == vc4->active_perfmon)
+ vc4_perfmon_stop(vc4, perfmon, false);
vc4_perfmon_put(perfmon);
@@ -130,7 +135,7 @@ void vc4_perfmon_close_file(struct vc4_file *vc4file)
return;
mutex_lock(&vc4file->perfmon.lock);
- idr_for_each(&vc4file->perfmon.idr, vc4_perfmon_idr_del, NULL);
+ idr_for_each(&vc4file->perfmon.idr, vc4_perfmon_idr_del, vc4);
idr_destroy(&vc4file->perfmon.idr);
mutex_unlock(&vc4file->perfmon.lock);
mutex_destroy(&vc4file->perfmon.lock);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 0b2ad4f6f2bec74a5287d96cb2325a5e11706f22
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101412-stipulate-plentiful-20b6@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
0b2ad4f6f2be ("drm/vc4: Stop the active perfmon before being destroyed")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0b2ad4f6f2bec74a5287d96cb2325a5e11706f22 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ma=C3=ADra=20Canal?= <mcanal(a)igalia.com>
Date: Fri, 4 Oct 2024 09:36:00 -0300
Subject: [PATCH] drm/vc4: Stop the active perfmon before being destroyed
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Upon closing the file descriptor, the active performance monitor is not
stopped. Although all perfmons are destroyed in `vc4_perfmon_close_file()`,
the active performance monitor's pointer (`vc4->active_perfmon`) is still
retained.
If we open a new file descriptor and submit a few jobs with performance
monitors, the driver will attempt to stop the active performance monitor
using the stale pointer in `vc4->active_perfmon`. However, this pointer
is no longer valid because the previous process has already terminated,
and all performance monitors associated with it have been destroyed and
freed.
To fix this, when the active performance monitor belongs to a given
process, explicitly stop it before destroying and freeing it.
Cc: stable(a)vger.kernel.org # v4.17+
Cc: Boris Brezillon <bbrezillon(a)kernel.org>
Cc: Juan A. Suarez Romero <jasuarez(a)igalia.com>
Fixes: 65101d8c9108 ("drm/vc4: Expose performance counters to userspace")
Signed-off-by: Maíra Canal <mcanal(a)igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez(a)igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241004123817.890016-2-mcana…
diff --git a/drivers/gpu/drm/vc4/vc4_perfmon.c b/drivers/gpu/drm/vc4/vc4_perfmon.c
index c4ac2c946238..c00a5cc2316d 100644
--- a/drivers/gpu/drm/vc4/vc4_perfmon.c
+++ b/drivers/gpu/drm/vc4/vc4_perfmon.c
@@ -116,6 +116,11 @@ void vc4_perfmon_open_file(struct vc4_file *vc4file)
static int vc4_perfmon_idr_del(int id, void *elem, void *data)
{
struct vc4_perfmon *perfmon = elem;
+ struct vc4_dev *vc4 = (struct vc4_dev *)data;
+
+ /* If the active perfmon is being destroyed, stop it first */
+ if (perfmon == vc4->active_perfmon)
+ vc4_perfmon_stop(vc4, perfmon, false);
vc4_perfmon_put(perfmon);
@@ -130,7 +135,7 @@ void vc4_perfmon_close_file(struct vc4_file *vc4file)
return;
mutex_lock(&vc4file->perfmon.lock);
- idr_for_each(&vc4file->perfmon.idr, vc4_perfmon_idr_del, NULL);
+ idr_for_each(&vc4file->perfmon.idr, vc4_perfmon_idr_del, vc4);
idr_destroy(&vc4file->perfmon.idr);
mutex_unlock(&vc4file->perfmon.lock);
mutex_destroy(&vc4file->perfmon.lock);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 5c14e51d2d7df49fe0d4e64a12c58d2542f452ff
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101451-twisting-shorts-fd38@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
5c14e51d2d7d ("net: dsa: lan9303: ensure chip reset and wait for READY status")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5c14e51d2d7df49fe0d4e64a12c58d2542f452ff Mon Sep 17 00:00:00 2001
From: Anatolij Gustschin <agust(a)denx.de>
Date: Fri, 4 Oct 2024 13:36:54 +0200
Subject: [PATCH] net: dsa: lan9303: ensure chip reset and wait for READY
status
Accessing device registers seems to be not reliable, the chip
revision is sometimes detected wrongly (0 instead of expected 1).
Ensure that the chip reset is performed via reset GPIO and then
wait for 'Device Ready' status in HW_CFG register before doing
any register initializations.
Cc: stable(a)vger.kernel.org
Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
Signed-off-by: Anatolij Gustschin <agust(a)denx.de>
[alex: reworked using read_poll_timeout()]
Signed-off-by: Alexander Sverdlin <alexander.sverdlin(a)siemens.com>
Reviewed-by: Vladimir Oltean <olteanv(a)gmail.com>
Link: https://patch.msgid.link/20241004113655.3436296-1-alexander.sverdlin@siemen…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index 268949939636..d246f95d57ec 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -6,6 +6,7 @@
#include <linux/module.h>
#include <linux/gpio/consumer.h>
#include <linux/regmap.h>
+#include <linux/iopoll.h>
#include <linux/mutex.h>
#include <linux/mii.h>
#include <linux/of.h>
@@ -839,6 +840,8 @@ static void lan9303_handle_reset(struct lan9303 *chip)
if (!chip->reset_gpio)
return;
+ gpiod_set_value_cansleep(chip->reset_gpio, 1);
+
if (chip->reset_duration != 0)
msleep(chip->reset_duration);
@@ -864,8 +867,34 @@ static int lan9303_disable_processing(struct lan9303 *chip)
static int lan9303_check_device(struct lan9303 *chip)
{
int ret;
+ int err;
u32 reg;
+ /* In I2C-managed configurations this polling loop will clash with
+ * switch's reading of EEPROM right after reset and this behaviour is
+ * not configurable. While lan9303_read() already has quite long retry
+ * timeout, seems not all cases are being detected as arbitration error.
+ *
+ * According to datasheet, EEPROM loader has 30ms timeout (in case of
+ * missing EEPROM).
+ *
+ * Loading of the largest supported EEPROM is expected to take at least
+ * 5.9s.
+ */
+ err = read_poll_timeout(lan9303_read, ret,
+ !ret && reg & LAN9303_HW_CFG_READY,
+ 20000, 6000000, false,
+ chip->regmap, LAN9303_HW_CFG, ®);
+ if (ret) {
+ dev_err(chip->dev, "failed to read HW_CFG reg: %pe\n",
+ ERR_PTR(ret));
+ return ret;
+ }
+ if (err) {
+ dev_err(chip->dev, "HW_CFG not ready: 0x%08x\n", reg);
+ return err;
+ }
+
ret = lan9303_read(chip->regmap, LAN9303_CHIP_REV, ®);
if (ret) {
dev_err(chip->dev, "failed to read chip revision register: %d\n",
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 0b2ad4f6f2bec74a5287d96cb2325a5e11706f22
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101431-able-caress-7fe5@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
0b2ad4f6f2be ("drm/vc4: Stop the active perfmon before being destroyed")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0b2ad4f6f2bec74a5287d96cb2325a5e11706f22 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ma=C3=ADra=20Canal?= <mcanal(a)igalia.com>
Date: Fri, 4 Oct 2024 09:36:00 -0300
Subject: [PATCH] drm/vc4: Stop the active perfmon before being destroyed
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Upon closing the file descriptor, the active performance monitor is not
stopped. Although all perfmons are destroyed in `vc4_perfmon_close_file()`,
the active performance monitor's pointer (`vc4->active_perfmon`) is still
retained.
If we open a new file descriptor and submit a few jobs with performance
monitors, the driver will attempt to stop the active performance monitor
using the stale pointer in `vc4->active_perfmon`. However, this pointer
is no longer valid because the previous process has already terminated,
and all performance monitors associated with it have been destroyed and
freed.
To fix this, when the active performance monitor belongs to a given
process, explicitly stop it before destroying and freeing it.
Cc: stable(a)vger.kernel.org # v4.17+
Cc: Boris Brezillon <bbrezillon(a)kernel.org>
Cc: Juan A. Suarez Romero <jasuarez(a)igalia.com>
Fixes: 65101d8c9108 ("drm/vc4: Expose performance counters to userspace")
Signed-off-by: Maíra Canal <mcanal(a)igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez(a)igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241004123817.890016-2-mcana…
diff --git a/drivers/gpu/drm/vc4/vc4_perfmon.c b/drivers/gpu/drm/vc4/vc4_perfmon.c
index c4ac2c946238..c00a5cc2316d 100644
--- a/drivers/gpu/drm/vc4/vc4_perfmon.c
+++ b/drivers/gpu/drm/vc4/vc4_perfmon.c
@@ -116,6 +116,11 @@ void vc4_perfmon_open_file(struct vc4_file *vc4file)
static int vc4_perfmon_idr_del(int id, void *elem, void *data)
{
struct vc4_perfmon *perfmon = elem;
+ struct vc4_dev *vc4 = (struct vc4_dev *)data;
+
+ /* If the active perfmon is being destroyed, stop it first */
+ if (perfmon == vc4->active_perfmon)
+ vc4_perfmon_stop(vc4, perfmon, false);
vc4_perfmon_put(perfmon);
@@ -130,7 +135,7 @@ void vc4_perfmon_close_file(struct vc4_file *vc4file)
return;
mutex_lock(&vc4file->perfmon.lock);
- idr_for_each(&vc4file->perfmon.idr, vc4_perfmon_idr_del, NULL);
+ idr_for_each(&vc4file->perfmon.idr, vc4_perfmon_idr_del, vc4);
idr_destroy(&vc4file->perfmon.idr);
mutex_unlock(&vc4file->perfmon.lock);
mutex_destroy(&vc4file->perfmon.lock);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 5c14e51d2d7df49fe0d4e64a12c58d2542f452ff
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101400-rectal-cattle-3833@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
5c14e51d2d7d ("net: dsa: lan9303: ensure chip reset and wait for READY status")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5c14e51d2d7df49fe0d4e64a12c58d2542f452ff Mon Sep 17 00:00:00 2001
From: Anatolij Gustschin <agust(a)denx.de>
Date: Fri, 4 Oct 2024 13:36:54 +0200
Subject: [PATCH] net: dsa: lan9303: ensure chip reset and wait for READY
status
Accessing device registers seems to be not reliable, the chip
revision is sometimes detected wrongly (0 instead of expected 1).
Ensure that the chip reset is performed via reset GPIO and then
wait for 'Device Ready' status in HW_CFG register before doing
any register initializations.
Cc: stable(a)vger.kernel.org
Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
Signed-off-by: Anatolij Gustschin <agust(a)denx.de>
[alex: reworked using read_poll_timeout()]
Signed-off-by: Alexander Sverdlin <alexander.sverdlin(a)siemens.com>
Reviewed-by: Vladimir Oltean <olteanv(a)gmail.com>
Link: https://patch.msgid.link/20241004113655.3436296-1-alexander.sverdlin@siemen…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index 268949939636..d246f95d57ec 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -6,6 +6,7 @@
#include <linux/module.h>
#include <linux/gpio/consumer.h>
#include <linux/regmap.h>
+#include <linux/iopoll.h>
#include <linux/mutex.h>
#include <linux/mii.h>
#include <linux/of.h>
@@ -839,6 +840,8 @@ static void lan9303_handle_reset(struct lan9303 *chip)
if (!chip->reset_gpio)
return;
+ gpiod_set_value_cansleep(chip->reset_gpio, 1);
+
if (chip->reset_duration != 0)
msleep(chip->reset_duration);
@@ -864,8 +867,34 @@ static int lan9303_disable_processing(struct lan9303 *chip)
static int lan9303_check_device(struct lan9303 *chip)
{
int ret;
+ int err;
u32 reg;
+ /* In I2C-managed configurations this polling loop will clash with
+ * switch's reading of EEPROM right after reset and this behaviour is
+ * not configurable. While lan9303_read() already has quite long retry
+ * timeout, seems not all cases are being detected as arbitration error.
+ *
+ * According to datasheet, EEPROM loader has 30ms timeout (in case of
+ * missing EEPROM).
+ *
+ * Loading of the largest supported EEPROM is expected to take at least
+ * 5.9s.
+ */
+ err = read_poll_timeout(lan9303_read, ret,
+ !ret && reg & LAN9303_HW_CFG_READY,
+ 20000, 6000000, false,
+ chip->regmap, LAN9303_HW_CFG, ®);
+ if (ret) {
+ dev_err(chip->dev, "failed to read HW_CFG reg: %pe\n",
+ ERR_PTR(ret));
+ return ret;
+ }
+ if (err) {
+ dev_err(chip->dev, "HW_CFG not ready: 0x%08x\n", reg);
+ return err;
+ }
+
ret = lan9303_read(chip->regmap, LAN9303_CHIP_REV, ®);
if (ret) {
dev_err(chip->dev, "failed to read chip revision register: %d\n",
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 69313850dce33ce8c24b38576a279421f4c60996
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101413-famine-default-fecd@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
69313850dce3 ("btrfs: add cancellation points to trim loops")
a99fcb015897 ("btrfs: split remaining space to discard in chunks")
602035d7fecf ("btrfs: add forward declarations and headers, part 2")
22b46bdc5f11 ("btrfs: add forward declarations and headers, part 1")
2b712e3bb2c4 ("btrfs: remove unused included headers")
f86f7a75e2fb ("btrfs: use the flags of an extent map to identify the compression type")
1a9fb16c6052 ("btrfs: avoid useless rbtree iterations when attempting to merge extent map")
ad21f15b0f79 ("btrfs: switch to the new mount API")
f044b318675f ("btrfs: handle the ro->rw transition for mounting different subvolumes")
3bb17a25bcb0 ("btrfs: add get_tree callback for new mount API")
eddb1a433f26 ("btrfs: add reconfigure callback for fs_context")
0f85e244dfc5 ("btrfs: add fs context handling functions")
17b3612022fe ("btrfs: add parse_param callback for the new mount API")
15ddcdd34ebf ("btrfs: add fs_parameter definitions")
a6a8f22a4af6 ("btrfs: move space cache settings into open_ctree")
2b41b19dd6d0 ("btrfs: split out the mount option validation code into its own helper")
3c0e918b8fb3 ("btrfs: remove no longer used EXTENT_MAP_DELALLOC block start value")
7dc66abb5a47 ("btrfs: use a dedicated data structure for chunk maps")
2ecec0d6a5b5 ("btrfs: unexport extent_map_block_end()")
3128b548c759 ("btrfs: split assert into two different asserts when removing block group")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 69313850dce33ce8c24b38576a279421f4c60996 Mon Sep 17 00:00:00 2001
From: Luca Stefani <luca.stefani.ge1(a)gmail.com>
Date: Tue, 17 Sep 2024 22:33:05 +0200
Subject: [PATCH] btrfs: add cancellation points to trim loops
There are reports that system cannot suspend due to running trim because
the task responsible for trimming the device isn't able to finish in
time, especially since we have a free extent discarding phase, which can
trim a lot of unallocated space. There are no limits on the trim size
(unlike the block group part).
Since trime isn't a critical call it can be interrupted at any time,
in such cases we stop the trim, report the amount of discarded bytes and
return an error.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219180
Link: https://bugzilla.suse.com/show_bug.cgi?id=1229737
CC: stable(a)vger.kernel.org # 5.15+
Signed-off-by: Luca Stefani <luca.stefani.ge1(a)gmail.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ad70548d1f72..d9f511babd89 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1316,6 +1316,11 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
start += bytes_to_discard;
bytes_left -= bytes_to_discard;
*discarded_bytes += bytes_to_discard;
+
+ if (btrfs_trim_interrupted()) {
+ ret = -ERESTARTSYS;
+ break;
+ }
}
return ret;
@@ -6470,7 +6475,7 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
start += len;
*trimmed += bytes;
- if (fatal_signal_pending(current)) {
+ if (btrfs_trim_interrupted()) {
ret = -ERESTARTSYS;
break;
}
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index eaa1dbd31352..f4bcb2530660 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -3809,7 +3809,7 @@ static int trim_no_bitmap(struct btrfs_block_group *block_group,
if (async && *total_trimmed)
break;
- if (fatal_signal_pending(current)) {
+ if (btrfs_trim_interrupted()) {
ret = -ERESTARTSYS;
break;
}
@@ -4000,7 +4000,7 @@ static int trim_bitmaps(struct btrfs_block_group *block_group,
}
block_group->discard_cursor = start;
- if (fatal_signal_pending(current)) {
+ if (btrfs_trim_interrupted()) {
if (start != offset)
reset_trimming_bitmap(ctl, offset);
ret = -ERESTARTSYS;
diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
index 83774bfd7b3b..9f1dbfdee8ca 100644
--- a/fs/btrfs/free-space-cache.h
+++ b/fs/btrfs/free-space-cache.h
@@ -10,6 +10,7 @@
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/mutex.h>
+#include <linux/freezer.h>
#include "fs.h"
struct inode;
@@ -56,6 +57,11 @@ static inline bool btrfs_free_space_trimming_bitmap(
return (info->trim_state == BTRFS_TRIM_STATE_TRIMMING);
}
+static inline bool btrfs_trim_interrupted(void)
+{
+ return fatal_signal_pending(current) || freezing(current);
+}
+
/*
* Deltas are an effective way to populate global statistics. Give macro names
* to make it clear what we're doing. An example is discard_extents in
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 69313850dce33ce8c24b38576a279421f4c60996
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101412-backfire-pacific-1f0b@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
69313850dce3 ("btrfs: add cancellation points to trim loops")
a99fcb015897 ("btrfs: split remaining space to discard in chunks")
602035d7fecf ("btrfs: add forward declarations and headers, part 2")
22b46bdc5f11 ("btrfs: add forward declarations and headers, part 1")
2b712e3bb2c4 ("btrfs: remove unused included headers")
f86f7a75e2fb ("btrfs: use the flags of an extent map to identify the compression type")
1a9fb16c6052 ("btrfs: avoid useless rbtree iterations when attempting to merge extent map")
ad21f15b0f79 ("btrfs: switch to the new mount API")
f044b318675f ("btrfs: handle the ro->rw transition for mounting different subvolumes")
3bb17a25bcb0 ("btrfs: add get_tree callback for new mount API")
eddb1a433f26 ("btrfs: add reconfigure callback for fs_context")
0f85e244dfc5 ("btrfs: add fs context handling functions")
17b3612022fe ("btrfs: add parse_param callback for the new mount API")
15ddcdd34ebf ("btrfs: add fs_parameter definitions")
a6a8f22a4af6 ("btrfs: move space cache settings into open_ctree")
2b41b19dd6d0 ("btrfs: split out the mount option validation code into its own helper")
3c0e918b8fb3 ("btrfs: remove no longer used EXTENT_MAP_DELALLOC block start value")
7dc66abb5a47 ("btrfs: use a dedicated data structure for chunk maps")
2ecec0d6a5b5 ("btrfs: unexport extent_map_block_end()")
3128b548c759 ("btrfs: split assert into two different asserts when removing block group")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 69313850dce33ce8c24b38576a279421f4c60996 Mon Sep 17 00:00:00 2001
From: Luca Stefani <luca.stefani.ge1(a)gmail.com>
Date: Tue, 17 Sep 2024 22:33:05 +0200
Subject: [PATCH] btrfs: add cancellation points to trim loops
There are reports that system cannot suspend due to running trim because
the task responsible for trimming the device isn't able to finish in
time, especially since we have a free extent discarding phase, which can
trim a lot of unallocated space. There are no limits on the trim size
(unlike the block group part).
Since trime isn't a critical call it can be interrupted at any time,
in such cases we stop the trim, report the amount of discarded bytes and
return an error.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219180
Link: https://bugzilla.suse.com/show_bug.cgi?id=1229737
CC: stable(a)vger.kernel.org # 5.15+
Signed-off-by: Luca Stefani <luca.stefani.ge1(a)gmail.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ad70548d1f72..d9f511babd89 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1316,6 +1316,11 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
start += bytes_to_discard;
bytes_left -= bytes_to_discard;
*discarded_bytes += bytes_to_discard;
+
+ if (btrfs_trim_interrupted()) {
+ ret = -ERESTARTSYS;
+ break;
+ }
}
return ret;
@@ -6470,7 +6475,7 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
start += len;
*trimmed += bytes;
- if (fatal_signal_pending(current)) {
+ if (btrfs_trim_interrupted()) {
ret = -ERESTARTSYS;
break;
}
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index eaa1dbd31352..f4bcb2530660 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -3809,7 +3809,7 @@ static int trim_no_bitmap(struct btrfs_block_group *block_group,
if (async && *total_trimmed)
break;
- if (fatal_signal_pending(current)) {
+ if (btrfs_trim_interrupted()) {
ret = -ERESTARTSYS;
break;
}
@@ -4000,7 +4000,7 @@ static int trim_bitmaps(struct btrfs_block_group *block_group,
}
block_group->discard_cursor = start;
- if (fatal_signal_pending(current)) {
+ if (btrfs_trim_interrupted()) {
if (start != offset)
reset_trimming_bitmap(ctl, offset);
ret = -ERESTARTSYS;
diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
index 83774bfd7b3b..9f1dbfdee8ca 100644
--- a/fs/btrfs/free-space-cache.h
+++ b/fs/btrfs/free-space-cache.h
@@ -10,6 +10,7 @@
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/mutex.h>
+#include <linux/freezer.h>
#include "fs.h"
struct inode;
@@ -56,6 +57,11 @@ static inline bool btrfs_free_space_trimming_bitmap(
return (info->trim_state == BTRFS_TRIM_STATE_TRIMMING);
}
+static inline bool btrfs_trim_interrupted(void)
+{
+ return fatal_signal_pending(current) || freezing(current);
+}
+
/*
* Deltas are an effective way to populate global statistics. Give macro names
* to make it clear what we're doing. An example is discard_extents in
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 69313850dce33ce8c24b38576a279421f4c60996
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101412-diaphragm-sly-ea40@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
69313850dce3 ("btrfs: add cancellation points to trim loops")
a99fcb015897 ("btrfs: split remaining space to discard in chunks")
602035d7fecf ("btrfs: add forward declarations and headers, part 2")
22b46bdc5f11 ("btrfs: add forward declarations and headers, part 1")
2b712e3bb2c4 ("btrfs: remove unused included headers")
f86f7a75e2fb ("btrfs: use the flags of an extent map to identify the compression type")
1a9fb16c6052 ("btrfs: avoid useless rbtree iterations when attempting to merge extent map")
ad21f15b0f79 ("btrfs: switch to the new mount API")
f044b318675f ("btrfs: handle the ro->rw transition for mounting different subvolumes")
3bb17a25bcb0 ("btrfs: add get_tree callback for new mount API")
eddb1a433f26 ("btrfs: add reconfigure callback for fs_context")
0f85e244dfc5 ("btrfs: add fs context handling functions")
17b3612022fe ("btrfs: add parse_param callback for the new mount API")
15ddcdd34ebf ("btrfs: add fs_parameter definitions")
a6a8f22a4af6 ("btrfs: move space cache settings into open_ctree")
2b41b19dd6d0 ("btrfs: split out the mount option validation code into its own helper")
3c0e918b8fb3 ("btrfs: remove no longer used EXTENT_MAP_DELALLOC block start value")
7dc66abb5a47 ("btrfs: use a dedicated data structure for chunk maps")
2ecec0d6a5b5 ("btrfs: unexport extent_map_block_end()")
3128b548c759 ("btrfs: split assert into two different asserts when removing block group")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 69313850dce33ce8c24b38576a279421f4c60996 Mon Sep 17 00:00:00 2001
From: Luca Stefani <luca.stefani.ge1(a)gmail.com>
Date: Tue, 17 Sep 2024 22:33:05 +0200
Subject: [PATCH] btrfs: add cancellation points to trim loops
There are reports that system cannot suspend due to running trim because
the task responsible for trimming the device isn't able to finish in
time, especially since we have a free extent discarding phase, which can
trim a lot of unallocated space. There are no limits on the trim size
(unlike the block group part).
Since trime isn't a critical call it can be interrupted at any time,
in such cases we stop the trim, report the amount of discarded bytes and
return an error.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219180
Link: https://bugzilla.suse.com/show_bug.cgi?id=1229737
CC: stable(a)vger.kernel.org # 5.15+
Signed-off-by: Luca Stefani <luca.stefani.ge1(a)gmail.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ad70548d1f72..d9f511babd89 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1316,6 +1316,11 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
start += bytes_to_discard;
bytes_left -= bytes_to_discard;
*discarded_bytes += bytes_to_discard;
+
+ if (btrfs_trim_interrupted()) {
+ ret = -ERESTARTSYS;
+ break;
+ }
}
return ret;
@@ -6470,7 +6475,7 @@ static int btrfs_trim_free_extents(struct btrfs_device *device, u64 *trimmed)
start += len;
*trimmed += bytes;
- if (fatal_signal_pending(current)) {
+ if (btrfs_trim_interrupted()) {
ret = -ERESTARTSYS;
break;
}
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index eaa1dbd31352..f4bcb2530660 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -3809,7 +3809,7 @@ static int trim_no_bitmap(struct btrfs_block_group *block_group,
if (async && *total_trimmed)
break;
- if (fatal_signal_pending(current)) {
+ if (btrfs_trim_interrupted()) {
ret = -ERESTARTSYS;
break;
}
@@ -4000,7 +4000,7 @@ static int trim_bitmaps(struct btrfs_block_group *block_group,
}
block_group->discard_cursor = start;
- if (fatal_signal_pending(current)) {
+ if (btrfs_trim_interrupted()) {
if (start != offset)
reset_trimming_bitmap(ctl, offset);
ret = -ERESTARTSYS;
diff --git a/fs/btrfs/free-space-cache.h b/fs/btrfs/free-space-cache.h
index 83774bfd7b3b..9f1dbfdee8ca 100644
--- a/fs/btrfs/free-space-cache.h
+++ b/fs/btrfs/free-space-cache.h
@@ -10,6 +10,7 @@
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/mutex.h>
+#include <linux/freezer.h>
#include "fs.h"
struct inode;
@@ -56,6 +57,11 @@ static inline bool btrfs_free_space_trimming_bitmap(
return (info->trim_state == BTRFS_TRIM_STATE_TRIMMING);
}
+static inline bool btrfs_trim_interrupted(void)
+{
+ return fatal_signal_pending(current) || freezing(current);
+}
+
/*
* Deltas are an effective way to populate global statistics. Give macro names
* to make it clear what we're doing. An example is discard_extents in
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x a99fcb0158978ed332009449b484e5f3ca2d7df4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101458-lyrically-defile-5bc8@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
a99fcb015897 ("btrfs: split remaining space to discard in chunks")
29e70be261d9 ("btrfs: use SECTOR_SHIFT to convert physical offset to LBA")
b7d463a1d125 ("btrfs: store a pointer to the original btrfs_bio in struct compressed_bio")
690834e47cf7 ("btrfs: pass a btrfs_bio to btrfs_submit_compressed_read")
7edb9a3e7200 ("btrfs: move zero filling of compressed read bios into common code")
32586c5bca72 ("btrfs: factor out a btrfs_free_compressed_pages helper")
10e924bc320a ("btrfs: factor out a btrfs_add_compressed_bio_pages helper")
d7294e4deeb9 ("btrfs: use the bbio file offset in add_ra_bio_pages")
e7aff33e3161 ("btrfs: use the bbio file offset in btrfs_submit_compressed_read")
798c9fc74d03 ("btrfs: remove redundant free_extent_map in btrfs_submit_compressed_read")
544fe4a903ce ("btrfs: embed a btrfs_bio into struct compressed_bio")
d5e4377d5051 ("btrfs: split zone append bios in btrfs_submit_bio")
35a8d7da3ca8 ("btrfs: remove now spurious bio submission helpers")
285599b6fe15 ("btrfs: remove the fs_info argument to btrfs_submit_bio")
48253076c3a9 ("btrfs: open code submit_encoded_read_bio")
30493ff49f81 ("btrfs: remove stripe boundary calculation for compressed I/O")
2380220e1e13 ("btrfs: remove stripe boundary calculation for buffered I/O")
67d669825090 ("btrfs: pass the iomap bio to btrfs_submit_bio")
852eee62d31a ("btrfs: allow btrfs_submit_bio to split bios")
69ccf3f4244a ("btrfs: handle recording of zoned writes in the storage layer")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a99fcb0158978ed332009449b484e5f3ca2d7df4 Mon Sep 17 00:00:00 2001
From: Luca Stefani <luca.stefani.ge1(a)gmail.com>
Date: Tue, 17 Sep 2024 22:33:04 +0200
Subject: [PATCH] btrfs: split remaining space to discard in chunks
Per Qu Wenruo in case we have a very large disk, e.g. 8TiB device,
mostly empty although we will do the split according to our super block
locations, the last super block ends at 256G, we can submit a huge
discard for the range [256G, 8T), causing a large delay.
Split the space left to discard based on BTRFS_MAX_DISCARD_CHUNK_SIZE in
preparation of introduction of cancellation points to trim. The value
of the chunk size is arbitrary, it can be higher or derived from actual
device capabilities but we can't easily read that using
bio_discard_limit().
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219180
Link: https://bugzilla.suse.com/show_bug.cgi?id=1229737
CC: stable(a)vger.kernel.org # 5.15+
Signed-off-by: Luca Stefani <luca.stefani.ge1(a)gmail.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a5966324607d..ad70548d1f72 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1300,13 +1300,24 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
bytes_left = end - start;
}
- if (bytes_left) {
+ while (bytes_left) {
+ u64 bytes_to_discard = min(BTRFS_MAX_DISCARD_CHUNK_SIZE, bytes_left);
+
ret = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
- bytes_left >> SECTOR_SHIFT,
+ bytes_to_discard >> SECTOR_SHIFT,
GFP_NOFS);
- if (!ret)
- *discarded_bytes += bytes_left;
+
+ if (ret) {
+ if (ret != -EOPNOTSUPP)
+ break;
+ continue;
+ }
+
+ start += bytes_to_discard;
+ bytes_left -= bytes_to_discard;
+ *discarded_bytes += bytes_to_discard;
}
+
return ret;
}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 03d2d60afe0c..4481575dd70f 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -30,6 +30,12 @@ struct btrfs_zoned_device_info;
#define BTRFS_MAX_DATA_CHUNK_SIZE (10ULL * SZ_1G)
+/*
+ * Arbitratry maximum size of one discard request to limit potentially long time
+ * spent in blkdev_issue_discard().
+ */
+#define BTRFS_MAX_DISCARD_CHUNK_SIZE (SZ_1G)
+
extern struct mutex uuid_mutex;
#define BTRFS_STRIPE_LEN SZ_64K
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x a99fcb0158978ed332009449b484e5f3ca2d7df4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101454-pouncing-unrobed-994a@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
a99fcb015897 ("btrfs: split remaining space to discard in chunks")
29e70be261d9 ("btrfs: use SECTOR_SHIFT to convert physical offset to LBA")
b7d463a1d125 ("btrfs: store a pointer to the original btrfs_bio in struct compressed_bio")
690834e47cf7 ("btrfs: pass a btrfs_bio to btrfs_submit_compressed_read")
7edb9a3e7200 ("btrfs: move zero filling of compressed read bios into common code")
32586c5bca72 ("btrfs: factor out a btrfs_free_compressed_pages helper")
10e924bc320a ("btrfs: factor out a btrfs_add_compressed_bio_pages helper")
d7294e4deeb9 ("btrfs: use the bbio file offset in add_ra_bio_pages")
e7aff33e3161 ("btrfs: use the bbio file offset in btrfs_submit_compressed_read")
798c9fc74d03 ("btrfs: remove redundant free_extent_map in btrfs_submit_compressed_read")
544fe4a903ce ("btrfs: embed a btrfs_bio into struct compressed_bio")
d5e4377d5051 ("btrfs: split zone append bios in btrfs_submit_bio")
35a8d7da3ca8 ("btrfs: remove now spurious bio submission helpers")
285599b6fe15 ("btrfs: remove the fs_info argument to btrfs_submit_bio")
48253076c3a9 ("btrfs: open code submit_encoded_read_bio")
30493ff49f81 ("btrfs: remove stripe boundary calculation for compressed I/O")
2380220e1e13 ("btrfs: remove stripe boundary calculation for buffered I/O")
67d669825090 ("btrfs: pass the iomap bio to btrfs_submit_bio")
852eee62d31a ("btrfs: allow btrfs_submit_bio to split bios")
69ccf3f4244a ("btrfs: handle recording of zoned writes in the storage layer")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a99fcb0158978ed332009449b484e5f3ca2d7df4 Mon Sep 17 00:00:00 2001
From: Luca Stefani <luca.stefani.ge1(a)gmail.com>
Date: Tue, 17 Sep 2024 22:33:04 +0200
Subject: [PATCH] btrfs: split remaining space to discard in chunks
Per Qu Wenruo in case we have a very large disk, e.g. 8TiB device,
mostly empty although we will do the split according to our super block
locations, the last super block ends at 256G, we can submit a huge
discard for the range [256G, 8T), causing a large delay.
Split the space left to discard based on BTRFS_MAX_DISCARD_CHUNK_SIZE in
preparation of introduction of cancellation points to trim. The value
of the chunk size is arbitrary, it can be higher or derived from actual
device capabilities but we can't easily read that using
bio_discard_limit().
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219180
Link: https://bugzilla.suse.com/show_bug.cgi?id=1229737
CC: stable(a)vger.kernel.org # 5.15+
Signed-off-by: Luca Stefani <luca.stefani.ge1(a)gmail.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a5966324607d..ad70548d1f72 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1300,13 +1300,24 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
bytes_left = end - start;
}
- if (bytes_left) {
+ while (bytes_left) {
+ u64 bytes_to_discard = min(BTRFS_MAX_DISCARD_CHUNK_SIZE, bytes_left);
+
ret = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
- bytes_left >> SECTOR_SHIFT,
+ bytes_to_discard >> SECTOR_SHIFT,
GFP_NOFS);
- if (!ret)
- *discarded_bytes += bytes_left;
+
+ if (ret) {
+ if (ret != -EOPNOTSUPP)
+ break;
+ continue;
+ }
+
+ start += bytes_to_discard;
+ bytes_left -= bytes_to_discard;
+ *discarded_bytes += bytes_to_discard;
}
+
return ret;
}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 03d2d60afe0c..4481575dd70f 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -30,6 +30,12 @@ struct btrfs_zoned_device_info;
#define BTRFS_MAX_DATA_CHUNK_SIZE (10ULL * SZ_1G)
+/*
+ * Arbitratry maximum size of one discard request to limit potentially long time
+ * spent in blkdev_issue_discard().
+ */
+#define BTRFS_MAX_DISCARD_CHUNK_SIZE (SZ_1G)
+
extern struct mutex uuid_mutex;
#define BTRFS_STRIPE_LEN SZ_64K
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 76503e1fa1a53ef041a120825d5ce81c7fe7bdd7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101402-aspirate-saline-b8e9@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
76503e1fa1a5 ("selftests/mm: fix incorrect buffer->mirror size in hmm2 double_map test")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 76503e1fa1a53ef041a120825d5ce81c7fe7bdd7 Mon Sep 17 00:00:00 2001
From: Donet Tom <donettom(a)linux.ibm.com>
Date: Fri, 27 Sep 2024 00:07:52 -0500
Subject: [PATCH] selftests/mm: fix incorrect buffer->mirror size in hmm2
double_map test
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The hmm2 double_map test was failing due to an incorrect buffer->mirror
size. The buffer->mirror size was 6, while buffer->ptr size was 6 *
PAGE_SIZE. The test failed because the kernel's copy_to_user function was
attempting to copy a 6 * PAGE_SIZE buffer to buffer->mirror. Since the
size of buffer->mirror was incorrect, copy_to_user failed.
This patch corrects the buffer->mirror size to 6 * PAGE_SIZE.
Test Result without this patch
==============================
# RUN hmm2.hmm2_device_private.double_map ...
# hmm-tests.c:1680:double_map:Expected ret (-14) == 0 (0)
# double_map: Test terminated by assertion
# FAIL hmm2.hmm2_device_private.double_map
not ok 53 hmm2.hmm2_device_private.double_map
Test Result with this patch
===========================
# RUN hmm2.hmm2_device_private.double_map ...
# OK hmm2.hmm2_device_private.double_map
ok 53 hmm2.hmm2_device_private.double_map
Link: https://lkml.kernel.org/r/20240927050752.51066-1-donettom@linux.ibm.com
Fixes: fee9f6d1b8df ("mm/hmm/test: add selftests for HMM")
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: Jérôme Glisse <jglisse(a)redhat.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Przemek Kitszel <przemyslaw.kitszel(a)intel.com>
Cc: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/hmm-tests.c b/tools/testing/selftests/mm/hmm-tests.c
index d2cfc9b494a0..141bf63cbe05 100644
--- a/tools/testing/selftests/mm/hmm-tests.c
+++ b/tools/testing/selftests/mm/hmm-tests.c
@@ -1657,7 +1657,7 @@ TEST_F(hmm2, double_map)
buffer->fd = -1;
buffer->size = size;
- buffer->mirror = malloc(npages);
+ buffer->mirror = malloc(size);
ASSERT_NE(buffer->mirror, NULL);
/* Reserve a range of addresses. */
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 76503e1fa1a53ef041a120825d5ce81c7fe7bdd7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101401-envelope-repurpose-0c02@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
76503e1fa1a5 ("selftests/mm: fix incorrect buffer->mirror size in hmm2 double_map test")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 76503e1fa1a53ef041a120825d5ce81c7fe7bdd7 Mon Sep 17 00:00:00 2001
From: Donet Tom <donettom(a)linux.ibm.com>
Date: Fri, 27 Sep 2024 00:07:52 -0500
Subject: [PATCH] selftests/mm: fix incorrect buffer->mirror size in hmm2
double_map test
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The hmm2 double_map test was failing due to an incorrect buffer->mirror
size. The buffer->mirror size was 6, while buffer->ptr size was 6 *
PAGE_SIZE. The test failed because the kernel's copy_to_user function was
attempting to copy a 6 * PAGE_SIZE buffer to buffer->mirror. Since the
size of buffer->mirror was incorrect, copy_to_user failed.
This patch corrects the buffer->mirror size to 6 * PAGE_SIZE.
Test Result without this patch
==============================
# RUN hmm2.hmm2_device_private.double_map ...
# hmm-tests.c:1680:double_map:Expected ret (-14) == 0 (0)
# double_map: Test terminated by assertion
# FAIL hmm2.hmm2_device_private.double_map
not ok 53 hmm2.hmm2_device_private.double_map
Test Result with this patch
===========================
# RUN hmm2.hmm2_device_private.double_map ...
# OK hmm2.hmm2_device_private.double_map
ok 53 hmm2.hmm2_device_private.double_map
Link: https://lkml.kernel.org/r/20240927050752.51066-1-donettom@linux.ibm.com
Fixes: fee9f6d1b8df ("mm/hmm/test: add selftests for HMM")
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: Jérôme Glisse <jglisse(a)redhat.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Przemek Kitszel <przemyslaw.kitszel(a)intel.com>
Cc: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/hmm-tests.c b/tools/testing/selftests/mm/hmm-tests.c
index d2cfc9b494a0..141bf63cbe05 100644
--- a/tools/testing/selftests/mm/hmm-tests.c
+++ b/tools/testing/selftests/mm/hmm-tests.c
@@ -1657,7 +1657,7 @@ TEST_F(hmm2, double_map)
buffer->fd = -1;
buffer->size = size;
- buffer->mirror = malloc(npages);
+ buffer->mirror = malloc(size);
ASSERT_NE(buffer->mirror, NULL);
/* Reserve a range of addresses. */
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 76503e1fa1a53ef041a120825d5ce81c7fe7bdd7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101400-sensually-talon-9006@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
76503e1fa1a5 ("selftests/mm: fix incorrect buffer->mirror size in hmm2 double_map test")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 76503e1fa1a53ef041a120825d5ce81c7fe7bdd7 Mon Sep 17 00:00:00 2001
From: Donet Tom <donettom(a)linux.ibm.com>
Date: Fri, 27 Sep 2024 00:07:52 -0500
Subject: [PATCH] selftests/mm: fix incorrect buffer->mirror size in hmm2
double_map test
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The hmm2 double_map test was failing due to an incorrect buffer->mirror
size. The buffer->mirror size was 6, while buffer->ptr size was 6 *
PAGE_SIZE. The test failed because the kernel's copy_to_user function was
attempting to copy a 6 * PAGE_SIZE buffer to buffer->mirror. Since the
size of buffer->mirror was incorrect, copy_to_user failed.
This patch corrects the buffer->mirror size to 6 * PAGE_SIZE.
Test Result without this patch
==============================
# RUN hmm2.hmm2_device_private.double_map ...
# hmm-tests.c:1680:double_map:Expected ret (-14) == 0 (0)
# double_map: Test terminated by assertion
# FAIL hmm2.hmm2_device_private.double_map
not ok 53 hmm2.hmm2_device_private.double_map
Test Result with this patch
===========================
# RUN hmm2.hmm2_device_private.double_map ...
# OK hmm2.hmm2_device_private.double_map
ok 53 hmm2.hmm2_device_private.double_map
Link: https://lkml.kernel.org/r/20240927050752.51066-1-donettom@linux.ibm.com
Fixes: fee9f6d1b8df ("mm/hmm/test: add selftests for HMM")
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: Jérôme Glisse <jglisse(a)redhat.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Przemek Kitszel <przemyslaw.kitszel(a)intel.com>
Cc: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/tools/testing/selftests/mm/hmm-tests.c b/tools/testing/selftests/mm/hmm-tests.c
index d2cfc9b494a0..141bf63cbe05 100644
--- a/tools/testing/selftests/mm/hmm-tests.c
+++ b/tools/testing/selftests/mm/hmm-tests.c
@@ -1657,7 +1657,7 @@ TEST_F(hmm2, double_map)
buffer->fd = -1;
buffer->size = size;
- buffer->mirror = malloc(npages);
+ buffer->mirror = malloc(size);
ASSERT_NE(buffer->mirror, NULL);
/* Reserve a range of addresses. */
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 4cc2718f621a6a57a02581125bb6d914ce74d23b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101406-gallows-construct-02e9@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
4cc2718f621a ("drm/i915/hdcp: fix connector refcounting")
848a4e5c096d ("drm/i915: add a dedicated workqueue inside drm_i915_private")
f48eab290287 ("drm/i915/dp: Add link training debug and error printing helpers")
f60500f31e99 ("drm/i915/display/dp: 128/132b LT requirement")
40053823baad ("drm/i915/display: move modeset probe/remove functions to intel_display_driver.c")
15e4f0b541d4 ("drm/i915/display: rename intel_modeset_probe_defer() -> intel_display_driver_probe_defer()")
ff2c80be1a00 ("drm/i915/display: move intel_modeset_probe_defer() to intel_display_driver.[ch]")
77316e755213 ("drm/i915/display: start high level display driver file")
99cfbed19d06 ("drm/i915/vrr: Relocate VRR enable/disable")
ecaeecea9263 ("drm/i915/vrr: Tell intel_crtc_update_active_timings() about VRR explicitly")
fa9e4fce52ec ("drm/i915/vrr: Make delayed vblank operational in VRR mode on adl/dg2")
b25e07419fee ("drm/i915/vrr: Eliminate redundant function arguments")
6a9856075563 ("drm/i915: Generalize planes_{enabling,disabling}()")
c5de248484af ("drm/i915/dpt: Add a modparam to disable DPT via the chicken bit")
5a08585d38d6 ("drm/i915: Add PLANE_CHICKEN registers")
1a324a40b452 ("i915/display/dp: SDP CRC16 for 128b132b link layer")
b5202a93cd37 ("drm/i915: Extract intel_crtc_scanline_offset()")
84f4ebe8c1ab ("drm/i915: Relocate intel_crtc_update_active_timings()")
6e8acb6686d8 ("drm/i915: Add belts and suspenders locking for seamless M/N changes")
8cb1f95cca68 ("drm/i915: Update vblank timestamping stuff on seamless M/N change")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4cc2718f621a6a57a02581125bb6d914ce74d23b Mon Sep 17 00:00:00 2001
From: Jani Nikula <jani.nikula(a)intel.com>
Date: Tue, 24 Sep 2024 18:30:22 +0300
Subject: [PATCH] drm/i915/hdcp: fix connector refcounting
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
We acquire a connector reference before scheduling an HDCP prop work,
and expect the work function to release the reference.
However, if the work was already queued, it won't be queued multiple
times, and the reference is not dropped.
Release the reference immediately if the work was already queued.
Fixes: a6597faa2d59 ("drm/i915: Protect workers against disappearing connectors")
Cc: Sean Paul <seanpaul(a)chromium.org>
Cc: Suraj Kandpal <suraj.kandpal(a)intel.com>
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: stable(a)vger.kernel.org # v5.10+
Reviewed-by: Suraj Kandpal <suraj.kandpal(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240924153022.2255299-1-jani…
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
(cherry picked from commit abc0742c79bdb3b164eacab24aea0916d2ec1cb5)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_hdcp.c b/drivers/gpu/drm/i915/display/intel_hdcp.c
index 6980b98792c2..377939de0ff4 100644
--- a/drivers/gpu/drm/i915/display/intel_hdcp.c
+++ b/drivers/gpu/drm/i915/display/intel_hdcp.c
@@ -1094,7 +1094,8 @@ static void intel_hdcp_update_value(struct intel_connector *connector,
hdcp->value = value;
if (update_property) {
drm_connector_get(&connector->base);
- queue_work(i915->unordered_wq, &hdcp->prop_work);
+ if (!queue_work(i915->unordered_wq, &hdcp->prop_work))
+ drm_connector_put(&connector->base);
}
}
@@ -2524,7 +2525,8 @@ void intel_hdcp_update_pipe(struct intel_atomic_state *state,
mutex_lock(&hdcp->mutex);
hdcp->value = DRM_MODE_CONTENT_PROTECTION_DESIRED;
drm_connector_get(&connector->base);
- queue_work(i915->unordered_wq, &hdcp->prop_work);
+ if (!queue_work(i915->unordered_wq, &hdcp->prop_work))
+ drm_connector_put(&connector->base);
mutex_unlock(&hdcp->mutex);
}
@@ -2541,7 +2543,9 @@ void intel_hdcp_update_pipe(struct intel_atomic_state *state,
*/
if (!desired_and_not_enabled && !content_protection_type_changed) {
drm_connector_get(&connector->base);
- queue_work(i915->unordered_wq, &hdcp->prop_work);
+ if (!queue_work(i915->unordered_wq, &hdcp->prop_work))
+ drm_connector_put(&connector->base);
+
}
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 4cc2718f621a6a57a02581125bb6d914ce74d23b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101405-upstate-waggle-385b@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
4cc2718f621a ("drm/i915/hdcp: fix connector refcounting")
848a4e5c096d ("drm/i915: add a dedicated workqueue inside drm_i915_private")
f48eab290287 ("drm/i915/dp: Add link training debug and error printing helpers")
f60500f31e99 ("drm/i915/display/dp: 128/132b LT requirement")
40053823baad ("drm/i915/display: move modeset probe/remove functions to intel_display_driver.c")
15e4f0b541d4 ("drm/i915/display: rename intel_modeset_probe_defer() -> intel_display_driver_probe_defer()")
ff2c80be1a00 ("drm/i915/display: move intel_modeset_probe_defer() to intel_display_driver.[ch]")
77316e755213 ("drm/i915/display: start high level display driver file")
99cfbed19d06 ("drm/i915/vrr: Relocate VRR enable/disable")
ecaeecea9263 ("drm/i915/vrr: Tell intel_crtc_update_active_timings() about VRR explicitly")
fa9e4fce52ec ("drm/i915/vrr: Make delayed vblank operational in VRR mode on adl/dg2")
b25e07419fee ("drm/i915/vrr: Eliminate redundant function arguments")
6a9856075563 ("drm/i915: Generalize planes_{enabling,disabling}()")
c5de248484af ("drm/i915/dpt: Add a modparam to disable DPT via the chicken bit")
5a08585d38d6 ("drm/i915: Add PLANE_CHICKEN registers")
1a324a40b452 ("i915/display/dp: SDP CRC16 for 128b132b link layer")
b5202a93cd37 ("drm/i915: Extract intel_crtc_scanline_offset()")
84f4ebe8c1ab ("drm/i915: Relocate intel_crtc_update_active_timings()")
6e8acb6686d8 ("drm/i915: Add belts and suspenders locking for seamless M/N changes")
8cb1f95cca68 ("drm/i915: Update vblank timestamping stuff on seamless M/N change")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4cc2718f621a6a57a02581125bb6d914ce74d23b Mon Sep 17 00:00:00 2001
From: Jani Nikula <jani.nikula(a)intel.com>
Date: Tue, 24 Sep 2024 18:30:22 +0300
Subject: [PATCH] drm/i915/hdcp: fix connector refcounting
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
We acquire a connector reference before scheduling an HDCP prop work,
and expect the work function to release the reference.
However, if the work was already queued, it won't be queued multiple
times, and the reference is not dropped.
Release the reference immediately if the work was already queued.
Fixes: a6597faa2d59 ("drm/i915: Protect workers against disappearing connectors")
Cc: Sean Paul <seanpaul(a)chromium.org>
Cc: Suraj Kandpal <suraj.kandpal(a)intel.com>
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: stable(a)vger.kernel.org # v5.10+
Reviewed-by: Suraj Kandpal <suraj.kandpal(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240924153022.2255299-1-jani…
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
(cherry picked from commit abc0742c79bdb3b164eacab24aea0916d2ec1cb5)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_hdcp.c b/drivers/gpu/drm/i915/display/intel_hdcp.c
index 6980b98792c2..377939de0ff4 100644
--- a/drivers/gpu/drm/i915/display/intel_hdcp.c
+++ b/drivers/gpu/drm/i915/display/intel_hdcp.c
@@ -1094,7 +1094,8 @@ static void intel_hdcp_update_value(struct intel_connector *connector,
hdcp->value = value;
if (update_property) {
drm_connector_get(&connector->base);
- queue_work(i915->unordered_wq, &hdcp->prop_work);
+ if (!queue_work(i915->unordered_wq, &hdcp->prop_work))
+ drm_connector_put(&connector->base);
}
}
@@ -2524,7 +2525,8 @@ void intel_hdcp_update_pipe(struct intel_atomic_state *state,
mutex_lock(&hdcp->mutex);
hdcp->value = DRM_MODE_CONTENT_PROTECTION_DESIRED;
drm_connector_get(&connector->base);
- queue_work(i915->unordered_wq, &hdcp->prop_work);
+ if (!queue_work(i915->unordered_wq, &hdcp->prop_work))
+ drm_connector_put(&connector->base);
mutex_unlock(&hdcp->mutex);
}
@@ -2541,7 +2543,9 @@ void intel_hdcp_update_pipe(struct intel_atomic_state *state,
*/
if (!desired_and_not_enabled && !content_protection_type_changed) {
drm_connector_get(&connector->base);
- queue_work(i915->unordered_wq, &hdcp->prop_work);
+ if (!queue_work(i915->unordered_wq, &hdcp->prop_work))
+ drm_connector_put(&connector->base);
+
}
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 4cc2718f621a6a57a02581125bb6d914ce74d23b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101404-ferment-uptight-0034@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
4cc2718f621a ("drm/i915/hdcp: fix connector refcounting")
848a4e5c096d ("drm/i915: add a dedicated workqueue inside drm_i915_private")
f48eab290287 ("drm/i915/dp: Add link training debug and error printing helpers")
f60500f31e99 ("drm/i915/display/dp: 128/132b LT requirement")
40053823baad ("drm/i915/display: move modeset probe/remove functions to intel_display_driver.c")
15e4f0b541d4 ("drm/i915/display: rename intel_modeset_probe_defer() -> intel_display_driver_probe_defer()")
ff2c80be1a00 ("drm/i915/display: move intel_modeset_probe_defer() to intel_display_driver.[ch]")
77316e755213 ("drm/i915/display: start high level display driver file")
99cfbed19d06 ("drm/i915/vrr: Relocate VRR enable/disable")
ecaeecea9263 ("drm/i915/vrr: Tell intel_crtc_update_active_timings() about VRR explicitly")
fa9e4fce52ec ("drm/i915/vrr: Make delayed vblank operational in VRR mode on adl/dg2")
b25e07419fee ("drm/i915/vrr: Eliminate redundant function arguments")
6a9856075563 ("drm/i915: Generalize planes_{enabling,disabling}()")
c5de248484af ("drm/i915/dpt: Add a modparam to disable DPT via the chicken bit")
5a08585d38d6 ("drm/i915: Add PLANE_CHICKEN registers")
1a324a40b452 ("i915/display/dp: SDP CRC16 for 128b132b link layer")
b5202a93cd37 ("drm/i915: Extract intel_crtc_scanline_offset()")
84f4ebe8c1ab ("drm/i915: Relocate intel_crtc_update_active_timings()")
6e8acb6686d8 ("drm/i915: Add belts and suspenders locking for seamless M/N changes")
8cb1f95cca68 ("drm/i915: Update vblank timestamping stuff on seamless M/N change")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4cc2718f621a6a57a02581125bb6d914ce74d23b Mon Sep 17 00:00:00 2001
From: Jani Nikula <jani.nikula(a)intel.com>
Date: Tue, 24 Sep 2024 18:30:22 +0300
Subject: [PATCH] drm/i915/hdcp: fix connector refcounting
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
We acquire a connector reference before scheduling an HDCP prop work,
and expect the work function to release the reference.
However, if the work was already queued, it won't be queued multiple
times, and the reference is not dropped.
Release the reference immediately if the work was already queued.
Fixes: a6597faa2d59 ("drm/i915: Protect workers against disappearing connectors")
Cc: Sean Paul <seanpaul(a)chromium.org>
Cc: Suraj Kandpal <suraj.kandpal(a)intel.com>
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: stable(a)vger.kernel.org # v5.10+
Reviewed-by: Suraj Kandpal <suraj.kandpal(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240924153022.2255299-1-jani…
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
(cherry picked from commit abc0742c79bdb3b164eacab24aea0916d2ec1cb5)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_hdcp.c b/drivers/gpu/drm/i915/display/intel_hdcp.c
index 6980b98792c2..377939de0ff4 100644
--- a/drivers/gpu/drm/i915/display/intel_hdcp.c
+++ b/drivers/gpu/drm/i915/display/intel_hdcp.c
@@ -1094,7 +1094,8 @@ static void intel_hdcp_update_value(struct intel_connector *connector,
hdcp->value = value;
if (update_property) {
drm_connector_get(&connector->base);
- queue_work(i915->unordered_wq, &hdcp->prop_work);
+ if (!queue_work(i915->unordered_wq, &hdcp->prop_work))
+ drm_connector_put(&connector->base);
}
}
@@ -2524,7 +2525,8 @@ void intel_hdcp_update_pipe(struct intel_atomic_state *state,
mutex_lock(&hdcp->mutex);
hdcp->value = DRM_MODE_CONTENT_PROTECTION_DESIRED;
drm_connector_get(&connector->base);
- queue_work(i915->unordered_wq, &hdcp->prop_work);
+ if (!queue_work(i915->unordered_wq, &hdcp->prop_work))
+ drm_connector_put(&connector->base);
mutex_unlock(&hdcp->mutex);
}
@@ -2541,7 +2543,9 @@ void intel_hdcp_update_pipe(struct intel_atomic_state *state,
*/
if (!desired_and_not_enabled && !content_protection_type_changed) {
drm_connector_get(&connector->base);
- queue_work(i915->unordered_wq, &hdcp->prop_work);
+ if (!queue_work(i915->unordered_wq, &hdcp->prop_work))
+ drm_connector_put(&connector->base);
+
}
}
Hi,
I just noticed this series' `Fixes:` tag was dropped when it was
committed. However, we believe it should be considered for backporting
to stable, so let this be a heads-up to the stable team.
Bence
On 2024. 08. 14. 4:30, patchwork-bot+netdevbpf(a)kernel.org wrote:
> Hello:
>
> This series was applied to netdev/net-next.git (main)
> by Jakub Kicinski <kuba(a)kernel.org>:
>
> On Mon, 12 Aug 2024 11:47:13 +0200 you wrote:
>> This function is used in `fec_ptp_enable_pps()` through
>> struct cyclecounter read(). Moving the declaration makes
>> it clearer, what's happening.
>>
>> Fixes: 61d5e2a251fb ("fec: Fix timer capture timing in `fec_ptp_enable_pps()`")
>> Suggested-by: Frank Li <Frank.li(a)nxp.com>
>> Link: https://lore.kernel.org/netdev/20240805144754.2384663-1-csokas.bence@prolan…
>> Signed-off-by: Csókás, Bence <csokas.bence(a)prolan.hu>
>>
>> [...]
>
> Here is the summary with links:
> - [v3,net-next,1/2] net: fec: Move `fec_ptp_read()` to the top of the file
> https://git.kernel.org/netdev/net-next/c/4374a1fe580a
> - [v3,net-next,2/2] net: fec: Remove duplicated code
> https://git.kernel.org/netdev/net-next/c/713ebaed68d8
>
> You are awesome, thank you!
The struct vchiq_arm_state 'platform_state' is currently allocated
dynamically using kzalloc(). Unfortunately, it is never freed and is
subjected to memory leaks in the error handling paths of the probe()
function.
To address the issue, use device resource management helper
devm_kzalloc(), to ensure cleanup after its allocation.
Fixes: 71bad7f08641 ("staging: add bcm2708 vchiq driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Umang Jain <umang.jain(a)ideasonboard.com>
---
drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
index af623ad87c15..7ece82c361ee 100644
--- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
+++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
@@ -285,7 +285,7 @@ vchiq_platform_init_state(struct vchiq_state *state)
{
struct vchiq_arm_state *platform_state;
- platform_state = kzalloc(sizeof(*platform_state), GFP_KERNEL);
+ platform_state = devm_kzalloc(state->dev, sizeof(*platform_state), GFP_KERNEL);
if (!platform_state)
return -ENOMEM;
--
2.45.2
On 2024/10/11 8:17, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> Revert "mm/filemap: avoid buffered read/write race to read inconsistent data"
>
> to the 6.6-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> revert-mm-filemap-avoid-buffered-read-write-race-to-.patch
> and it can be found in the queue-6.6 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
Hi Sasha,
The current patch is a cleanup after adding
smp_load_acquire/store_release() to i_size_read/write().
In my opinion stable versions continue to use smp_rmb just fine,
no need to backport the current patch to stable.
In addition, the current patch belongs to a patch set:
https://lore.kernel.org/all/20240124142857.4146716-1-libaokun1@huawei.com/
If the current patch does need to be backported to stable, then the
entire patch set needs to be backported or problems will be introduced.
All the patches in the patch set are listed below:
d8f899d13d72 ("fs: make the i_size_read/write helpers be
smp_load_acquire/store_release()")
4b944f8ef996 ("Revert "mm/filemap: avoid buffered read/write race to
read inconsistent data"")
ad72872eb3ae ("asm-generic: remove extra type checking in
acquire/release for non-SMP case")
Thanks,
Baokun
>
> commit d87bcb0965636a9b665231560ce147d06a7a18d8
> Author: Baokun Li <libaokun1(a)huawei.com>
> Date: Wed Jan 24 22:28:56 2024 +0800
>
> Revert "mm/filemap: avoid buffered read/write race to read inconsistent data"
>
> [ Upstream commit 4b944f8ef99641d5af287c7d9df91d20ef5d3e88 ]
>
> This reverts commit e2c27b803bb6 ("mm/filemap: avoid buffered read/write
> race to read inconsistent data"). After making the i_size_read/write
> helpers be smp_load_acquire/store_release(), it is already guaranteed that
> changes to page contents are visible before we see increased inode size,
> so the extra smp_rmb() in filemap_read() can be removed.
>
> Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
> Link: https://lore.kernel.org/r/20240124142857.4146716-3-libaokun1@huawei.com
> Signed-off-by: Christian Brauner <brauner(a)kernel.org>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index e6c112f3a211f..cd028f3be6026 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2694,15 +2694,6 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
> goto put_folios;
> end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
>
> - /*
> - * Pairs with a barrier in
> - * block_write_end()->mark_buffer_dirty() or other page
> - * dirtying routines like iomap_write_end() to ensure
> - * changes to page contents are visible before we see
> - * increased inode size.
> - */
> - smp_rmb();
> -
> /*
> * Once we start copying data, we don't want to be touching any
> * cachelines that might be contended:
This series refactors some useless goto instructions as a preparation
for the fix of a missing of_node_put() by means of the cleanup
attribute.
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
Changes in v2:
- Refactor vchiq_probe() to remove goto instructions.
- Declare and initialize the node right before its first usage.
- Link to v1: https://lore.kernel.org/r/20241013-vchiq_arm-of_node_put-v1-1-f72b2a6e47d0@…
---
Javier Carrasco (2):
staging: vchiq_arm: refactor goto instructions in vchiq_probe()
staging: vchiq_arm: Fix missing refcount decrement in error path for fw_node
.../vc04_services/interface/vchiq_arm/vchiq_arm.c | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)
---
base-commit: d61a00525464bfc5fe92c6ad713350988e492b88
change-id: 20241013-vchiq_arm-of_node_put-60a5eaaafd70
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
Supposing first scenario with a virtio_blk driver.
CPU0 CPU1
blk_mq_try_issue_directly()
__blk_mq_issue_directly()
q->mq_ops->queue_rq()
virtio_queue_rq()
blk_mq_stop_hw_queue()
virtblk_done()
blk_mq_request_bypass_insert() 1) store
blk_mq_start_stopped_hw_queue()
clear_bit(BLK_MQ_S_STOPPED) 3) store
blk_mq_run_hw_queue()
if (!blk_mq_hctx_has_pending()) 4) load
return
blk_mq_sched_dispatch_requests()
blk_mq_run_hw_queue()
if (!blk_mq_hctx_has_pending())
return
blk_mq_sched_dispatch_requests()
if (blk_mq_hctx_stopped()) 2) load
return
__blk_mq_sched_dispatch_requests()
Supposing another scenario.
CPU0 CPU1
blk_mq_requeue_work()
blk_mq_insert_request() 1) store
virtblk_done()
blk_mq_start_stopped_hw_queue()
blk_mq_run_hw_queues() clear_bit(BLK_MQ_S_STOPPED) 3) store
blk_mq_run_hw_queue()
if (!blk_mq_hctx_has_pending()) 4) load
return
blk_mq_sched_dispatch_requests()
if (blk_mq_hctx_stopped()) 2) load
continue
blk_mq_run_hw_queue()
Both scenarios are similar, the full memory barrier should be inserted
between 1) and 2), as well as between 3) and 4) to make sure that either
CPU0 sees BLK_MQ_S_STOPPED is cleared or CPU1 sees dispatch list.
Otherwise, either CPU will not rerun the hardware queue causing starvation
of the request.
The easy way to fix it is to add the essential full memory barrier into
helper of blk_mq_hctx_stopped(). In order to not affect the fast path
(hardware queue is not stopped most of the time), we only insert the
barrier into the slow path. Actually, only slow path needs to care about
missing of dispatching the request to the low-level device driver.
Fixes: 320ae51feed5 ("blk-mq: new multi-queue block IO queueing mechanism")
Cc: stable(a)vger.kernel.org
Cc: Muchun Song <muchun.song(a)linux.dev>
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Ming Lei <ming.lei(a)redhat.com>
---
block/blk-mq.c | 6 ++++++
block/blk-mq.h | 13 +++++++++++++
2 files changed, 19 insertions(+)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index ff6df6c7eeb25..b90c1680cb780 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2413,6 +2413,12 @@ void blk_mq_start_stopped_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
return;
clear_bit(BLK_MQ_S_STOPPED, &hctx->state);
+ /*
+ * Pairs with the smp_mb() in blk_mq_hctx_stopped() to order the
+ * clearing of BLK_MQ_S_STOPPED above and the checking of dispatch
+ * list in the subsequent routine.
+ */
+ smp_mb__after_atomic();
blk_mq_run_hw_queue(hctx, async);
}
EXPORT_SYMBOL_GPL(blk_mq_start_stopped_hw_queue);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 260beea8e332c..f36f3bff70d86 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -228,6 +228,19 @@ static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data
static inline bool blk_mq_hctx_stopped(struct blk_mq_hw_ctx *hctx)
{
+ /* Fast path: hardware queue is not stopped most of the time. */
+ if (likely(!test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
+ return false;
+
+ /*
+ * This barrier is used to order adding of dispatch list before and
+ * the test of BLK_MQ_S_STOPPED below. Pairs with the memory barrier
+ * in blk_mq_start_stopped_hw_queue() so that dispatch code could
+ * either see BLK_MQ_S_STOPPED is cleared or dispatch list is not
+ * empty to avoid missing dispatching requests.
+ */
+ smp_mb();
+
return test_bit(BLK_MQ_S_STOPPED, &hctx->state);
}
--
2.20.1
Supposing the following scenario.
CPU0 CPU1
blk_mq_insert_request() 1) store
blk_mq_unquiesce_queue()
blk_queue_flag_clear() 3) store
blk_mq_run_hw_queues()
blk_mq_run_hw_queue()
if (!blk_mq_hctx_has_pending()) 4) load
return
blk_mq_run_hw_queue()
if (blk_queue_quiesced()) 2) load
return
blk_mq_sched_dispatch_requests()
The full memory barrier should be inserted between 1) and 2), as well as
between 3) and 4) to make sure that either CPU0 sees QUEUE_FLAG_QUIESCED is
cleared or CPU1 sees dispatch list or setting of bitmap of software queue.
Otherwise, either CPU will not rerun the hardware queue causing starvation.
So the first solution is to 1) add a pair of memory barrier to fix the
problem, another solution is to 2) use hctx->queue->queue_lock to
synchronize QUEUE_FLAG_QUIESCED. Here, we chose 2) to fix it since memory
barrier is not easy to be maintained.
Fixes: f4560ffe8cec ("blk-mq: use QUEUE_FLAG_QUIESCED to quiesce queue")
Cc: stable(a)vger.kernel.org
Cc: Muchun Song <muchun.song(a)linux.dev>
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Ming Lei <ming.lei(a)redhat.com>
---
block/blk-mq.c | 47 ++++++++++++++++++++++++++++++++++-------------
1 file changed, 34 insertions(+), 13 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index b2d0f22de0c7f..ff6df6c7eeb25 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2202,6 +2202,24 @@ void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs)
}
EXPORT_SYMBOL(blk_mq_delay_run_hw_queue);
+static inline bool blk_mq_hw_queue_need_run(struct blk_mq_hw_ctx *hctx)
+{
+ bool need_run;
+
+ /*
+ * When queue is quiesced, we may be switching io scheduler, or
+ * updating nr_hw_queues, or other things, and we can't run queue
+ * any more, even blk_mq_hctx_has_pending() can't be called safely.
+ *
+ * And queue will be rerun in blk_mq_unquiesce_queue() if it is
+ * quiesced.
+ */
+ __blk_mq_run_dispatch_ops(hctx->queue, false,
+ need_run = !blk_queue_quiesced(hctx->queue) &&
+ blk_mq_hctx_has_pending(hctx));
+ return need_run;
+}
+
/**
* blk_mq_run_hw_queue - Start to run a hardware queue.
* @hctx: Pointer to the hardware queue to run.
@@ -2222,20 +2240,23 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
might_sleep_if(!async && hctx->flags & BLK_MQ_F_BLOCKING);
- /*
- * When queue is quiesced, we may be switching io scheduler, or
- * updating nr_hw_queues, or other things, and we can't run queue
- * any more, even __blk_mq_hctx_has_pending() can't be called safely.
- *
- * And queue will be rerun in blk_mq_unquiesce_queue() if it is
- * quiesced.
- */
- __blk_mq_run_dispatch_ops(hctx->queue, false,
- need_run = !blk_queue_quiesced(hctx->queue) &&
- blk_mq_hctx_has_pending(hctx));
+ need_run = blk_mq_hw_queue_need_run(hctx);
+ if (!need_run) {
+ unsigned long flags;
- if (!need_run)
- return;
+ /*
+ * Synchronize with blk_mq_unquiesce_queue(), because we check
+ * if hw queue is quiesced locklessly above, we need the use
+ * ->queue_lock to make sure we see the up-to-date status to
+ * not miss rerunning the hw queue.
+ */
+ spin_lock_irqsave(&hctx->queue->queue_lock, flags);
+ need_run = blk_mq_hw_queue_need_run(hctx);
+ spin_unlock_irqrestore(&hctx->queue->queue_lock, flags);
+
+ if (!need_run)
+ return;
+ }
if (async || !cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask)) {
blk_mq_delay_run_hw_queue(hctx, 0);
--
2.20.1
Supposing the following scenario with a virtio_blk driver.
CPU0 CPU1 CPU2
blk_mq_try_issue_directly()
__blk_mq_issue_directly()
q->mq_ops->queue_rq()
virtio_queue_rq()
blk_mq_stop_hw_queue()
virtblk_done()
blk_mq_try_issue_directly()
if (blk_mq_hctx_stopped())
blk_mq_request_bypass_insert() blk_mq_run_hw_queue()
blk_mq_run_hw_queue() blk_mq_run_hw_queue()
blk_mq_insert_request()
return
After CPU0 has marked the queue as stopped, CPU1 will see the queue is
stopped. But before CPU1 puts the request on the dispatch list, CPU2
receives the interrupt of completion of request, so it will run the
hardware queue and marks the queue as non-stopped. Meanwhile, CPU1 also
runs the same hardware queue. After both CPU1 and CPU2 complete
blk_mq_run_hw_queue(), CPU1 just puts the request to the same hardware
queue and returns. It misses dispatching a request. Fix it by running
the hardware queue explicitly. And blk_mq_request_issue_directly()
should handle a similar situation. Fix it as well.
Fixes: d964f04a8fde ("blk-mq: fix direct issue")
Cc: stable(a)vger.kernel.org
Cc: Muchun Song <muchun.song(a)linux.dev>
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Ming Lei <ming.lei(a)redhat.com>
---
block/blk-mq.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index e3c3c0c21b553..b2d0f22de0c7f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2619,6 +2619,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
if (blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(rq->q)) {
blk_mq_insert_request(rq, 0);
+ blk_mq_run_hw_queue(hctx, false);
return;
}
@@ -2649,6 +2650,7 @@ static blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last)
if (blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(rq->q)) {
blk_mq_insert_request(rq, 0);
+ blk_mq_run_hw_queue(hctx, false);
return BLK_STS_OK;
}
--
2.20.1
An error path was introduced without including the required call to
of_node_put() to decrement the node's refcount and avoid leaking memory.
If the call to kzalloc() for 'mgmt' fails, the probe returns without
decrementing the refcount.
Use the automatic cleanup facility to fix the bug and protect the code
against new error paths where the call to of_node_put() might be missing
again.
Cc: stable(a)vger.kernel.org
Fixes: 1c9e16b73166 ("staging: vc04_services: vchiq_arm: Split driver static and runtime data")
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
index 27ceaac8f6cc..792cf3a807e1 100644
--- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
+++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
@@ -1332,7 +1332,8 @@ MODULE_DEVICE_TABLE(of, vchiq_of_match);
static int vchiq_probe(struct platform_device *pdev)
{
- struct device_node *fw_node;
+ struct device_node *fw_node __free(device_node) =
+ of_find_compatible_node(NULL, NULL, "raspberrypi,bcm2835-firmware");
const struct vchiq_platform_info *info;
struct vchiq_drv_mgmt *mgmt;
int ret;
@@ -1341,8 +1342,6 @@ static int vchiq_probe(struct platform_device *pdev)
if (!info)
return -EINVAL;
- fw_node = of_find_compatible_node(NULL, NULL,
- "raspberrypi,bcm2835-firmware");
if (!fw_node) {
dev_err(&pdev->dev, "Missing firmware node\n");
return -ENOENT;
@@ -1353,7 +1352,6 @@ static int vchiq_probe(struct platform_device *pdev)
return -ENOMEM;
mgmt->fw = devm_rpi_firmware_get(&pdev->dev, fw_node);
- of_node_put(fw_node);
if (!mgmt->fw)
return -EPROBE_DEFER;
---
base-commit: d61a00525464bfc5fe92c6ad713350988e492b88
change-id: 20241013-vchiq_arm-of_node_put-60a5eaaafd70
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
When trying to use kunit for s390 with
./tools/testing/kunit/kunit.py run --arch=s390 --kunitconfig drivers/base/test --cross_compile=$CROSS_COMPILE
I ran into some bugs.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Thomas Weißschuh (2):
s390/sclp: deactivate sclp after all its users
s390/sclp_vt220: convert newlines to CRLF instead of LFCR
drivers/s390/char/sclp.c | 3 ++-
drivers/s390/char/sclp_vt220.c | 4 ++--
2 files changed, 4 insertions(+), 3 deletions(-)
---
base-commit: 6485cf5ea253d40d507cd71253c9568c5470cd27
change-id: 20241014-s390-kunit-47cbc26a99e6
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Commit 808a40b69468 ("drm/fbdev-dma: Implement damage handling and
deferred I/O") added deferred I/O for fbdev-dma. Also select the
Kconfig symbol FB_DEFERRED_IO (via FB_DMAMEM_HELPERS_DEFERRED). Fixes
build errors about missing fbdefio, such as
drivers/gpu/drm/drm_fbdev_dma.c:218:26: error: 'struct drm_fb_helper' has no member named 'fbdefio'
218 | fb_helper->fbdefio.delay = HZ / 20;
| ^~
drivers/gpu/drm/drm_fbdev_dma.c:219:26: error: 'struct drm_fb_helper' has no member named 'fbdefio'
219 | fb_helper->fbdefio.deferred_io = drm_fb_helper_deferred_io;
| ^~
drivers/gpu/drm/drm_fbdev_dma.c:221:21: error: 'struct fb_info' has no member named 'fbdefio'
221 | info->fbdefio = &fb_helper->fbdefio;
| ^~
drivers/gpu/drm/drm_fbdev_dma.c:221:43: error: 'struct drm_fb_helper' has no member named 'fbdefio'
221 | info->fbdefio = &fb_helper->fbdefio;
| ^~
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202410050241.Mox9QRjP-lkp@intel.com/
Fixes: 808a40b69468 ("drm/fbdev-dma: Implement damage handling and deferred I/O")
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Javier Martinez Canillas <javierm(a)redhat.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: <stable(a)vger.kernel.org> # v6.11+
Reviewed-by: Jonathan Cavitt <jonathan.cavitt(a)intel.com>
---
drivers/gpu/drm/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 1df4e627e3d3..db2e206a117c 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -338,7 +338,7 @@ config DRM_TTM_HELPER
config DRM_GEM_DMA_HELPER
tristate
depends on DRM
- select FB_DMAMEM_HELPERS if DRM_FBDEV_EMULATION
+ select FB_DMAMEM_HELPERS_DEFERRED if DRM_FBDEV_EMULATION
help
Choose this if you need the GEM DMA helper functions
--
2.46.0
The new test case which checks non unique symbol kprobe_non_uniq_symbol.tc
failed because of missing kernel functionality support from commit
b022f0c7e404 ("tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols").
Backport it and its fix commit to 5.4.y together. Resolved minor context change conflicts.
Andrii Nakryiko (1):
tracing/kprobes: Fix symbol counting logic by looking at modules as
well
Francis Laniel (1):
tracing/kprobes: Return EADDRNOTAVAIL when func matches several
symbols
kernel/trace/trace_kprobe.c | 76 +++++++++++++++++++++++++++++++++++++
kernel/trace/trace_probe.h | 1 +
2 files changed, 77 insertions(+)
--
2.46.0
From: Ma Ke <make24(a)iscas.ac.cn>
commit 3ff810b9bebe5578a245cfa97c252ab602e703f1 upstream.
Return devm_of_clk_add_hw_provider() in order to transfer the error, if it
fails due to resource allocation failure or device tree clock provider
registration failure.
Fixes: bdd229ab26be ("ASoC: rt5682s: Add driver for ALC5682I-VS codec")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
Link: https://patch.msgid.link/20240717115436.3449492-1-make24@iscas.ac.cn
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Vitaliy Shevtsov <v.shevtsov(a)maxima.ru>
---
sound/soc/codecs/rt5682s.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/sound/soc/codecs/rt5682s.c b/sound/soc/codecs/rt5682s.c
index 80c673aa14db..07d514b4ce70 100644
--- a/sound/soc/codecs/rt5682s.c
+++ b/sound/soc/codecs/rt5682s.c
@@ -2828,7 +2828,9 @@ static int rt5682s_register_dai_clks(struct snd_soc_component *component)
}
if (dev->of_node) {
- devm_of_clk_add_hw_provider(dev, of_clk_hw_simple_get, dai_clk_hw);
+ ret = devm_of_clk_add_hw_provider(dev, of_clk_hw_simple_get, dai_clk_hw);
+ if (ret)
+ return ret;
} else {
ret = devm_clk_hw_register_clkdev(dev, dai_clk_hw,
init.name, dev_name(dev));
--
2.46.2
The struct vchiq_arm_state 'platform_state' is currently allocated
dynamically using kzalloc(). Unfortunately, it is never freed and is
subjected to memory leaks in the error handling paths of the probe()
function.
To address the issue, use device resource management helper
devm_kzalloc(), to ensure cleanup after its allocation.
Cc: stable(a)vger.kernel.org
Signed-off-by: Umang Jain <umang.jain(a)ideasonboard.com>
---
drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
index 29e78700463f..146442a3552c 100644
--- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
+++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
@@ -285,7 +285,7 @@ vchiq_platform_init_state(struct vchiq_state *state)
{
struct vchiq_arm_state *platform_state;
- platform_state = kzalloc(sizeof(*platform_state), GFP_KERNEL);
+ platform_state = devm_kzalloc(state->dev, sizeof(*platform_state), GFP_KERNEL);
if (!platform_state)
return -ENOMEM;
--
2.45.2
This patchset adds the dependencies to apply commit a6f88ac32c6e
("lockdep: fix deadlock issue between lockdep and rcu") to the
5.4-stable tree. See the "FAILED" report at [1].
Note the dependencies actually fix a UAF and a bad recursion pattern.
Thus it makes sense to also backport them.
[1] https://lore.kernel.org/all/2024100226-unselfish-triangle-e5eb@gregkh/
Peter Zijlstra (2):
locking/lockdep: Fix bad recursion pattern
locking/lockdep: Rework lockdep_lock
Waiman Long (1):
locking/lockdep: Avoid potential access of invalid memory in
lock_class
Zhiguo Niu (1):
lockdep: fix deadlock issue between lockdep and rcu
kernel/locking/lockdep.c | 215 +++++++++++++++++++++++----------------
1 file changed, 125 insertions(+), 90 deletions(-)
--
2.47.0.rc1.288.g06298d1525-goog
From: Daniel Palmer <daniel(a)0x0f.com>
[ Upstream commit 82c5b53140faf89c31ea2b3a0985a2f291694169 ]
Currently this driver prints this line with what looks like
a rogue format specifier when the device is probed:
[ 2.840000] eth%d: MVME147 at 0xfffe1800, irq 12, Hardware Address xx:xx:xx:xx:xx:xx
Change the printk() for netdev_info() and move it after the
registration has completed so it prints out the name of the
interface properly.
Signed-off-by: Daniel Palmer <daniel(a)0x0f.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/ethernet/amd/mvme147.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/amd/mvme147.c b/drivers/net/ethernet/amd/mvme147.c
index 0a920448522f3..0bb27f4dd6428 100644
--- a/drivers/net/ethernet/amd/mvme147.c
+++ b/drivers/net/ethernet/amd/mvme147.c
@@ -105,10 +105,6 @@ struct net_device * __init mvme147lance_probe(int unit)
address = address >> 8;
dev->dev_addr[3] = address&0xff;
- printk("%s: MVME147 at 0x%08lx, irq %d, Hardware Address %pM\n",
- dev->name, dev->base_addr, MVME147_LANCE_IRQ,
- dev->dev_addr);
-
lp = netdev_priv(dev);
lp->ram = __get_dma_pages(GFP_ATOMIC, 3); /* 32K */
if (!lp->ram) {
@@ -138,6 +134,9 @@ struct net_device * __init mvme147lance_probe(int unit)
return ERR_PTR(err);
}
+ netdev_info(dev, "MVME147 at 0x%08lx, irq %d, Hardware Address %pM\n",
+ dev->base_addr, MVME147_LANCE_IRQ, dev->dev_addr);
+
return dev;
}
--
2.43.0
From: Daniel Palmer <daniel(a)0x0f.com>
[ Upstream commit 82c5b53140faf89c31ea2b3a0985a2f291694169 ]
Currently this driver prints this line with what looks like
a rogue format specifier when the device is probed:
[ 2.840000] eth%d: MVME147 at 0xfffe1800, irq 12, Hardware Address xx:xx:xx:xx:xx:xx
Change the printk() for netdev_info() and move it after the
registration has completed so it prints out the name of the
interface properly.
Signed-off-by: Daniel Palmer <daniel(a)0x0f.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/ethernet/amd/mvme147.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/amd/mvme147.c b/drivers/net/ethernet/amd/mvme147.c
index 72abd3f82249b..eb7b1da2fb06f 100644
--- a/drivers/net/ethernet/amd/mvme147.c
+++ b/drivers/net/ethernet/amd/mvme147.c
@@ -106,10 +106,6 @@ struct net_device * __init mvme147lance_probe(int unit)
address = address >> 8;
dev->dev_addr[3] = address&0xff;
- printk("%s: MVME147 at 0x%08lx, irq %d, Hardware Address %pM\n",
- dev->name, dev->base_addr, MVME147_LANCE_IRQ,
- dev->dev_addr);
-
lp = netdev_priv(dev);
lp->ram = __get_dma_pages(GFP_ATOMIC, 3); /* 32K */
if (!lp->ram) {
@@ -139,6 +135,9 @@ struct net_device * __init mvme147lance_probe(int unit)
return ERR_PTR(err);
}
+ netdev_info(dev, "MVME147 at 0x%08lx, irq %d, Hardware Address %pM\n",
+ dev->base_addr, MVME147_LANCE_IRQ, dev->dev_addr);
+
return dev;
}
--
2.43.0
From: Daniel Palmer <daniel(a)0x0f.com>
[ Upstream commit 82c5b53140faf89c31ea2b3a0985a2f291694169 ]
Currently this driver prints this line with what looks like
a rogue format specifier when the device is probed:
[ 2.840000] eth%d: MVME147 at 0xfffe1800, irq 12, Hardware Address xx:xx:xx:xx:xx:xx
Change the printk() for netdev_info() and move it after the
registration has completed so it prints out the name of the
interface properly.
Signed-off-by: Daniel Palmer <daniel(a)0x0f.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/ethernet/amd/mvme147.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/amd/mvme147.c b/drivers/net/ethernet/amd/mvme147.c
index 3f2e4cdd0b83e..133fe0f1166b0 100644
--- a/drivers/net/ethernet/amd/mvme147.c
+++ b/drivers/net/ethernet/amd/mvme147.c
@@ -106,10 +106,6 @@ struct net_device * __init mvme147lance_probe(int unit)
address = address >> 8;
dev->dev_addr[3] = address&0xff;
- printk("%s: MVME147 at 0x%08lx, irq %d, Hardware Address %pM\n",
- dev->name, dev->base_addr, MVME147_LANCE_IRQ,
- dev->dev_addr);
-
lp = netdev_priv(dev);
lp->ram = __get_dma_pages(GFP_ATOMIC, 3); /* 32K */
if (!lp->ram) {
@@ -139,6 +135,9 @@ struct net_device * __init mvme147lance_probe(int unit)
return ERR_PTR(err);
}
+ netdev_info(dev, "MVME147 at 0x%08lx, irq %d, Hardware Address %pM\n",
+ dev->base_addr, MVME147_LANCE_IRQ, dev->dev_addr);
+
return dev;
}
--
2.43.0
From: Andrew Ballance <andrewjballance(a)gmail.com>
[ Upstream commit 9931122d04c6d431b2c11b5bb7b10f28584067f0 ]
A incorrectly formatted chunk may decompress into
more than LZNT_CHUNK_SIZE bytes and a index out of bounds
will occur in s_max_off.
Signed-off-by: Andrew Ballance <andrewjballance(a)gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/ntfs3/lznt.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/ntfs3/lznt.c b/fs/ntfs3/lznt.c
index 28f654561f279..09db01c1098cd 100644
--- a/fs/ntfs3/lznt.c
+++ b/fs/ntfs3/lznt.c
@@ -236,6 +236,9 @@ static inline ssize_t decompress_chunk(u8 *unc, u8 *unc_end, const u8 *cmpr,
/* Do decompression until pointers are inside range. */
while (up < unc_end && cmpr < cmpr_end) {
+ // return err if more than LZNT_CHUNK_SIZE bytes are written
+ if (up - unc > LZNT_CHUNK_SIZE)
+ return -EINVAL;
/* Correct index */
while (unc + s_max_off[index] < up)
index += 1;
--
2.43.0
From: Andrew Ballance <andrewjballance(a)gmail.com>
[ Upstream commit 9931122d04c6d431b2c11b5bb7b10f28584067f0 ]
A incorrectly formatted chunk may decompress into
more than LZNT_CHUNK_SIZE bytes and a index out of bounds
will occur in s_max_off.
Signed-off-by: Andrew Ballance <andrewjballance(a)gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/ntfs3/lznt.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/ntfs3/lznt.c b/fs/ntfs3/lznt.c
index 28f654561f279..09db01c1098cd 100644
--- a/fs/ntfs3/lznt.c
+++ b/fs/ntfs3/lznt.c
@@ -236,6 +236,9 @@ static inline ssize_t decompress_chunk(u8 *unc, u8 *unc_end, const u8 *cmpr,
/* Do decompression until pointers are inside range. */
while (up < unc_end && cmpr < cmpr_end) {
+ // return err if more than LZNT_CHUNK_SIZE bytes are written
+ if (up - unc > LZNT_CHUNK_SIZE)
+ return -EINVAL;
/* Correct index */
while (unc + s_max_off[index] < up)
index += 1;
--
2.43.0