From: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com>
The vmalloc area runs out on our ARM64 system during an erofs test, with
vm_map_ram() failing[1]. By following the debug log, we find that
vm_map_ram()->vb_alloc() allocates a new vb->va (corresponding to a 4MB
vmalloc area) because list_for_each_entry_rcu returns immediately when
vbq->free->next points to vbq->free. That is to say, 65536 page faults
after the list is broken are enough to run out of the whole vmalloc
area. The cause is a vbq->free->next that points back to vbq->free,
which makes list_for_each_entry_rcu unable to iterate the list and
detect the breakage.
[1]
PID: 1 TASK: ffffff80802b4e00 CPU: 6 COMMAND: "init"
#0 [ffffffc08006afe0] __switch_to at ffffffc08111d5cc
#1 [ffffffc08006b040] __schedule at ffffffc08111dde0
#2 [ffffffc08006b0a0] schedule at ffffffc08111e294
#3 [ffffffc08006b0d0] schedule_preempt_disabled at ffffffc08111e3f0
#4 [ffffffc08006b140] __mutex_lock at ffffffc08112068c
#5 [ffffffc08006b180] __mutex_lock_slowpath at ffffffc08111f8f8
#6 [ffffffc08006b1a0] mutex_lock at ffffffc08111f834
#7 [ffffffc08006b1d0] reclaim_and_purge_vmap_areas at ffffffc0803ebc3c
#8 [ffffffc08006b290] alloc_vmap_area at ffffffc0803e83fc
#9 [ffffffc08006b300] vm_map_ram at ffffffc0803e78c0
Fixes: fc1e0d980037 ("mm/vmalloc: prevent stale TLBs in fully utilized blocks")
For a detailed explanation of how the list gets broken, please refer to the URL below:
https://lore.kernel.org/all/20240531024820.5507-1-hailong.liu@oppo.com/
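To make the failure mode easier to see, here is a minimal userspace
sketch (the structures and the walk below are simplified stand-ins, not
the kernel implementation): once the head of the free list points back
at itself, an entry-style walk visits zero blocks, so every subsequent
fault falls through to allocating a fresh 4MB block.

/*
 * Minimal userspace sketch, not the kernel code: simplified list_head
 * and vmap_block stand-ins that only illustrate why an entry-style
 * walk over vbq->free sees zero blocks once head->next == head.
 */
#include <stdio.h>
#include <stddef.h>

struct list_head { struct list_head *next; };
struct vmap_block { unsigned long free; struct list_head free_list; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

int main(void)
{
	struct vmap_block a = { .free = 64 }, b = { .free = 128 };
	struct list_head head;
	struct list_head *pos;
	int visited = 0;

	/* Healthy queue: head -> a -> b -> back to head. */
	head.next = &a.free_list;
	a.free_list.next = &b.free_list;
	b.free_list.next = &head;

	/* The breakage described above: head points back at itself. */
	head.next = &head;

	for (pos = head.next; pos != &head; pos = pos->next) {
		struct vmap_block *vb =
			container_of(pos, struct vmap_block, free_list);
		(void)vb;
		visited++;
	}

	/* Prints 0: no partially free block is ever found. */
	printf("blocks visited: %d\n", visited);
	return 0;
}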
Suggested-by: Hailong.Liu <hailong.liu(a)oppo.com>
Signed-off-by: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com>
---
v2: introduce a cpu field in vmap_block to record the owning CPU number
v3: use get_cpu/put_cpu to prevent the task from being scheduled to another core
v4: replace get_cpu/put_cpu with another API to avoid disabling preemption
---
mm/vmalloc.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 22aa63f4ef63..89eb034f4ac6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2458,6 +2458,7 @@ struct vmap_block {
struct list_head free_list;
struct rcu_head rcu_head;
struct list_head purge;
+ unsigned int cpu;
};
/* Queue of free and dirty vmap blocks, for allocation and flushing purposes */
@@ -2585,8 +2586,15 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
free_vmap_area(va);
return ERR_PTR(err);
}
-
- vbq = raw_cpu_ptr(&vmap_block_queue);
+ /*
+ * list_add_tail_rcu can happen on another core
+ * rather than vb->cpu due to task migration, which
+ * is safe as list_add_tail_rcu will ensure the list's
+ * integrity together with list_for_each_rcu on the
+ * read side.
+ */
+ vb->cpu = raw_smp_processor_id();
+ vbq = per_cpu_ptr(&vmap_block_queue, vb->cpu);
spin_lock(&vbq->lock);
list_add_tail_rcu(&vb->free_list, &vbq->free);
spin_unlock(&vbq->lock);
@@ -2614,9 +2622,10 @@ static void free_vmap_block(struct vmap_block *vb)
}
static bool purge_fragmented_block(struct vmap_block *vb,
- struct vmap_block_queue *vbq, struct list_head *purge_list,
- bool force_purge)
+ struct list_head *purge_list, bool force_purge)
{
+ struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, vb->cpu);
+
if (vb->free + vb->dirty != VMAP_BBMAP_BITS ||
vb->dirty == VMAP_BBMAP_BITS)
return false;
@@ -2664,7 +2673,7 @@ static void purge_fragmented_blocks(int cpu)
continue;
spin_lock(&vb->lock);
- purge_fragmented_block(vb, vbq, &purge, true);
+ purge_fragmented_block(vb, &purge, true);
spin_unlock(&vb->lock);
}
rcu_read_unlock();
@@ -2801,7 +2810,7 @@ static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush)
* not purgeable, check whether there is dirty
* space to be flushed.
*/
- if (!purge_fragmented_block(vb, vbq, &purge_list, false) &&
+ if (!purge_fragmented_block(vb, &purge_list, false) &&
vb->dirty_max && vb->dirty != VMAP_BBMAP_BITS) {
unsigned long va_start = vb->va->va_start;
unsigned long s, e;
--
2.25.1
The patch titled
Subject: mm: shmem: fix getting incorrect lruvec when replacing a shmem folio
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-shmem-fix-getting-incorrect-lruvec-when-replacing-a-shmem-folio.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Subject: mm: shmem: fix getting incorrect lruvec when replacing a shmem folio
Date: Thu, 13 Jun 2024 16:21:19 +0800
When testing shmem swapin, I encountered the warning below on my machine.
The reason is that replacing an old shmem folio with a new one causes
mem_cgroup_migrate() to clear the old folio's memcg data. As a result,
the old folio cannot get the correct memcg's lruvec needed to remove
itself from the LRU list when it is being freed. This can lead to
serious problems, such as LRU list crashes due to holding the wrong LRU
lock, and incorrect LRU statistics.
To fix this issue, fall back to using mem_cgroup_replace_folio() to
replace the old shmem folio.
[ 5241.100311] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5d9960
[ 5241.100317] head: order:4 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 5241.100319] flags: 0x17fffe0000040068(uptodate|lru|head|swapbacked|node=0|zone=2|lastcpupid=0x3ffff)
[ 5241.100323] raw: 17fffe0000040068 fffffdffd6687948 fffffdffd69ae008 0000000000000000
[ 5241.100325] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 5241.100326] head: 17fffe0000040068 fffffdffd6687948 fffffdffd69ae008 0000000000000000
[ 5241.100327] head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 5241.100328] head: 17fffe0000000204 fffffdffd6665801 ffffffffffffffff 0000000000000000
[ 5241.100329] head: 0000000a00000010 0000000000000000 00000000ffffffff 0000000000000000
[ 5241.100330] page dumped because: VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled())
[ 5241.100338] ------------[ cut here ]------------
[ 5241.100339] WARNING: CPU: 19 PID: 78402 at include/linux/memcontrol.h:775 folio_lruvec_lock_irqsave+0x140/0x150
[...]
[ 5241.100374] pc : folio_lruvec_lock_irqsave+0x140/0x150
[ 5241.100375] lr : folio_lruvec_lock_irqsave+0x138/0x150
[ 5241.100376] sp : ffff80008b38b930
[...]
[ 5241.100398] Call trace:
[ 5241.100399] folio_lruvec_lock_irqsave+0x140/0x150
[ 5241.100401] __page_cache_release+0x90/0x300
[ 5241.100404] __folio_put+0x50/0x108
[ 5241.100406] shmem_replace_folio+0x1b4/0x240
[ 5241.100409] shmem_swapin_folio+0x314/0x528
[ 5241.100411] shmem_get_folio_gfp+0x3b4/0x930
[ 5241.100412] shmem_fault+0x74/0x160
[ 5241.100414] __do_fault+0x40/0x218
[ 5241.100417] do_shared_fault+0x34/0x1b0
[ 5241.100419] do_fault+0x40/0x168
[ 5241.100420] handle_pte_fault+0x80/0x228
[ 5241.100422] __handle_mm_fault+0x1c4/0x440
[ 5241.100424] handle_mm_fault+0x60/0x1f0
[ 5241.100426] do_page_fault+0x120/0x488
[ 5241.100429] do_translation_fault+0x4c/0x68
[ 5241.100431] do_mem_abort+0x48/0xa0
[ 5241.100434] el0_da+0x38/0xc0
[ 5241.100436] el0t_64_sync_handler+0x68/0xc0
[ 5241.100437] el0t_64_sync+0x14c/0x150
[ 5241.100439] ---[ end trace 0000000000000000 ]---
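As a hedged illustration only (a simplified userspace model, not the
kernel API; all names below are stand-ins), the sketch shows why
clearing the old folio's memcg data before the old folio is released
breaks the lruvec lookup, while a replace-style helper that leaves the
old folio's memcg data intact keeps the release path working.

/*
 * Simplified userspace model of the ordering problem described above.
 * model_migrate() clears the source's memcg pointer, so the release
 * path can no longer pick the right LRU lock; model_replace() charges
 * the new folio while leaving the old folio's memcg intact.
 */
#include <assert.h>
#include <stdio.h>

struct memcg { const char *name; };
struct folio { struct memcg *memcg; };

/* Models a migrate-style helper: transfer and clear the source. */
static void model_migrate(struct folio *old, struct folio *new)
{
	new->memcg = old->memcg;
	old->memcg = NULL;
}

/* Models a replace-style helper: charge new, keep old intact. */
static void model_replace(struct folio *old, struct folio *new)
{
	new->memcg = old->memcg;
}

/* Models the release path that must find the folio's lruvec/lock. */
static void model_release(struct folio *folio)
{
	assert(folio->memcg && "no memcg: cannot pick the right LRU lock");
	printf("released folio under memcg %s\n", folio->memcg->name);
}

int main(void)
{
	struct memcg cg = { "shmem-cg" };
	struct folio old = { &cg }, new = { NULL };

	model_replace(&old, &new);	/* the fixed ordering */
	model_release(&old);		/* old still knows its memcg */

	model_migrate(&old, &new);	/* the buggy ordering ... */
	model_release(&old);		/* ... trips the assertion */
	return 0;
}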
Link: https://lkml.kernel.org/r/3c11000dd6c1df83015a8321a859e9775ebbc23e.17182661…
Fixes: 85ce2c517ade ("memcontrol: only transfer the memcg data for migration")
Signed-off-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Nhat Pham <nphamcs(a)gmail.com>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/shmem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/shmem.c~mm-shmem-fix-getting-incorrect-lruvec-when-replacing-a-shmem-folio
+++ a/mm/shmem.c
@@ -1786,7 +1786,7 @@ static int shmem_replace_folio(struct fo
xa_lock_irq(&swap_mapping->i_pages);
error = shmem_replace_entry(swap_mapping, swap_index, old, new);
if (!error) {
- mem_cgroup_migrate(old, new);
+ mem_cgroup_replace_folio(old, new);
__lruvec_stat_mod_folio(new, NR_FILE_PAGES, 1);
__lruvec_stat_mod_folio(new, NR_SHMEM, 1);
__lruvec_stat_mod_folio(old, NR_FILE_PAGES, -1);
_
Patches currently in -mm which might be from baolin.wang(a)linux.alibaba.com are
mm-shmem-fix-getting-incorrect-lruvec-when-replacing-a-shmem-folio.patch
mm-memory-extend-finish_fault-to-support-large-folio.patch
mm-memory-extend-finish_fault-to-support-large-folio-fix.patch
mm-shmem-add-thp-validation-for-pmd-mapped-thp-related-statistics.patch
mm-shmem-add-multi-size-thp-sysfs-interface-for-anonymous-shmem.patch
mm-shmem-add-multi-size-thp-sysfs-interface-for-anonymous-shmem-fix.patch
mm-shmem-add-mthp-support-for-anonymous-shmem.patch
mm-shmem-add-mthp-size-alignment-in-shmem_get_unmapped_area.patch
mm-shmem-add-mthp-counters-for-anonymous-shmem.patch
The patch titled
Subject: mm: fix incorrect vbq reference in purge_fragmented_block
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-fix-incorrect-vbq-reference-in-purge_fragmented_block.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com>
Subject: mm: fix incorrect vbq reference in purge_fragmented_block
Date: Fri, 7 Jun 2024 10:31:16 +0800
The vmalloc area runs out on our ARM64 system during an erofs test, with
vm_map_ram() failing[1]. By following the debug log, we find that
vm_map_ram()->vb_alloc() allocates a new vb->va (corresponding to a 4MB
vmalloc area) because list_for_each_entry_rcu returns immediately when
vbq->free->next points to vbq->free. That is to say, 65536 page faults
after the list is broken are enough to run out of the whole vmalloc
area. The cause is a vbq->free->next that points back to vbq->free,
which makes list_for_each_entry_rcu unable to iterate the list and
detect the breakage.
[1]
PID: 1 TASK: ffffff80802b4e00 CPU: 6 COMMAND: "init"
#0 [ffffffc08006afe0] __switch_to at ffffffc08111d5cc
#1 [ffffffc08006b040] __schedule at ffffffc08111dde0
#2 [ffffffc08006b0a0] schedule at ffffffc08111e294
#3 [ffffffc08006b0d0] schedule_preempt_disabled at ffffffc08111e3f0
#4 [ffffffc08006b140] __mutex_lock at ffffffc08112068c
#5 [ffffffc08006b180] __mutex_lock_slowpath at ffffffc08111f8f8
#6 [ffffffc08006b1a0] mutex_lock at ffffffc08111f834
#7 [ffffffc08006b1d0] reclaim_and_purge_vmap_areas at ffffffc0803ebc3c
#8 [ffffffc08006b290] alloc_vmap_area at ffffffc0803e83fc
#9 [ffffffc08006b300] vm_map_ram at ffffffc0803e78c0
For a detailed description of the broken list, please see
https://lore.kernel.org/all/20240531024820.5507-1-hailong.liu@oppo.com/
Link: https://lkml.kernel.org/r/20240607023116.1720640-1-zhaoyang.huang@unisoc.com
Fixes: fc1e0d980037 ("mm/vmalloc: prevent stale TLBs in fully utilized blocks")
Signed-off-by: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com>
Suggested-by: Hailong.Liu <hailong.liu(a)oppo.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmalloc.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
--- a/mm/vmalloc.c~mm-fix-incorrect-vbq-reference-in-purge_fragmented_block
+++ a/mm/vmalloc.c
@@ -2498,6 +2498,7 @@ struct vmap_block {
struct list_head free_list;
struct rcu_head rcu_head;
struct list_head purge;
+ unsigned int cpu;
};
/* Queue of free and dirty vmap blocks, for allocation and flushing purposes */
@@ -2625,8 +2626,15 @@ static void *new_vmap_block(unsigned int
free_vmap_area(va);
return ERR_PTR(err);
}
-
- vbq = raw_cpu_ptr(&vmap_block_queue);
+ /*
+ * list_add_tail_rcu can happen on another core
+ * rather than vb->cpu due to task migration, which
+ * is safe as list_add_tail_rcu will ensure the list's
+ * integrity together with list_for_each_rcu on the
+ * read side.
+ */
+ vb->cpu = raw_smp_processor_id();
+ vbq = per_cpu_ptr(&vmap_block_queue, vb->cpu);
spin_lock(&vbq->lock);
list_add_tail_rcu(&vb->free_list, &vbq->free);
spin_unlock(&vbq->lock);
@@ -2654,9 +2662,10 @@ static void free_vmap_block(struct vmap_
}
static bool purge_fragmented_block(struct vmap_block *vb,
- struct vmap_block_queue *vbq, struct list_head *purge_list,
- bool force_purge)
+ struct list_head *purge_list, bool force_purge)
{
+ struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, vb->cpu);
+
if (vb->free + vb->dirty != VMAP_BBMAP_BITS ||
vb->dirty == VMAP_BBMAP_BITS)
return false;
@@ -2704,7 +2713,7 @@ static void purge_fragmented_blocks(int
continue;
spin_lock(&vb->lock);
- purge_fragmented_block(vb, vbq, &purge, true);
+ purge_fragmented_block(vb, &purge, true);
spin_unlock(&vb->lock);
}
rcu_read_unlock();
@@ -2841,7 +2850,7 @@ static void _vm_unmap_aliases(unsigned l
* not purgeable, check whether there is dirty
* space to be flushed.
*/
- if (!purge_fragmented_block(vb, vbq, &purge_list, false) &&
+ if (!purge_fragmented_block(vb, &purge_list, false) &&
vb->dirty_max && vb->dirty != VMAP_BBMAP_BITS) {
unsigned long va_start = vb->va->va_start;
unsigned long s, e;
_
Patches currently in -mm which might be from zhaoyang.huang(a)unisoc.com are
mm-fix-incorrect-vbq-reference-in-purge_fragmented_block.patch
mm-optimization-on-page-allocation-when-cma-enabled.patch
Hello,
on a machine booted with Linux 6.6.23 through 6.6.32:
Writing /dev/zero with dd to a mounted cifs share with vers=1.0 or
vers=2.0 slows down drastically in my setup after approx. 46GB of data
has been written.
The whole machine becomes unresponsive, as if it were under very high IO
load. It still answers pings, but opening a new ssh session takes far too
long. I can stop the dd (ctrl-c), and after a few minutes the machine is
fine again.
cifs with vers=3.1.1 seems to be fine with 6.6.32.
Linux 6.10-rc3 is fine with vers=1.0 and vers=2.0.
Bisected down to:
cifs-fix-writeback-data-corruption.patch
which is:
Upstream commit f3dc1bdb6b0b0693562c7c54a6c28bafa608ba3c
and
linux-stable commit e45deec35bf7f1f4f992a707b2d04a8c162f2240
Reverting this patch on 6.6.32 fixes the problem for me.
Thomas
Recently it was reported that the following USB storage devices are unusable
with Linux kernel 6.9:
* Kingston DataTraveler G2
* Garmin FR35
This is because attempting to read the IO hint VPD page causes these devices
to reset. Hence do not read the IO hint VPD page from USB storage devices.
Cc: Alan Stern <stern(a)rowland.harvard.edu>
Cc: linux-usb(a)vger.kernel.org
Cc: Joao Machado <jocrismachado(a)gmail.com>
Cc: Andy Shevchenko <andy.shevchenko(a)gmail.com>
Cc: Christian Heusel <christian(a)heusel.eu>
Cc: stable(a)vger.kernel.org
Fixes: 4f53138fffc2 ("scsi: sd: Translate data lifetime information")
Reported-by: Joao Machado <jocrismachado(a)gmail.com>
Closes: https://lore.kernel.org/linux-scsi/20240130214911.1863909-1-bvanassche@acm.…
Tested-by: Andy Shevchenko <andy.shevchenko(a)gmail.com>
Bisected-by: Christian Heusel <christian(a)heusel.eu>
Reported-by: Andy Shevchenko <andy.shevchenko(a)gmail.com>
Closes: https://lore.kernel.org/linux-scsi/CACLx9VdpUanftfPo2jVAqXdcWe8Y43MsDeZmMPo…
Signed-off-by: Bart Van Assche <bvanassche(a)acm.org>
---
drivers/usb/storage/scsiglue.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c
index b31464740f6c..9a7185c68872 100644
--- a/drivers/usb/storage/scsiglue.c
+++ b/drivers/usb/storage/scsiglue.c
@@ -79,6 +79,8 @@ static int slave_alloc (struct scsi_device *sdev)
if (us->protocol == USB_PR_BULK && us->max_lun > 0)
sdev->sdev_bflags |= BLIST_FORCELUN;
+ sdev->sdev_bflags |= BLIST_SKIP_IO_HINTS;
+
return 0;
}