On Fri, Apr 11, 2025 at 9:59 AM Suren Baghdasaryan surenb@google.com wrote:
On Fri, Apr 11, 2025 at 9:27 AM Vlastimil Babka vbabka@suse.cz wrote:
On 4/11/25 17:57, Suren Baghdasaryan wrote:
ktest recently reported crashes while running several buffered io tests with __alloc_tagging_slab_alloc_hook() at the top of the crash call stack. The signature indicates an invalid address dereference with low bits of slab->obj_exts being set. The bits were outside of the range used by page_memcg_data_flags and objext_flags and hence were not masked out by slab_obj_exts() when obtaining the pointer stored in slab->obj_exts. The typical crash log looks like this:
00510 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010 00510 Mem abort info: 00510 ESR = 0x0000000096000045 00510 EC = 0x25: DABT (current EL), IL = 32 bits 00510 SET = 0, FnV = 0 00510 EA = 0, S1PTW = 0 00510 FSC = 0x05: level 1 translation fault 00510 Data abort info: 00510 ISV = 0, ISS = 0x00000045, ISS2 = 0x00000000 00510 CM = 0, WnR = 1, TnD = 0, TagAccess = 0 00510 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 00510 user pgtable: 4k pages, 39-bit VAs, pgdp=0000000104175000 00510 [0000000000000010] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000 00510 Internal error: Oops: 0000000096000045 [#1] SMP 00510 Modules linked in: 00510 CPU: 10 UID: 0 PID: 7692 Comm: cat Not tainted 6.15.0-rc1-ktest-g189e17946605 #19327 NONE 00510 Hardware name: linux,dummy-virt (DT) 00510 pstate: 20001005 (nzCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--) 00510 pc : __alloc_tagging_slab_alloc_hook+0xe0/0x190 00510 lr : __kmalloc_noprof+0x150/0x310 00510 sp : ffffff80c87df6c0 00510 x29: ffffff80c87df6c0 x28: 000000000013d1ff x27: 000000000013d200 00510 x26: ffffff80c87df9e0 x25: 0000000000000000 x24: 0000000000000001 00510 x23: ffffffc08041953c x22: 000000000000004c x21: ffffff80c0002180 00510 x20: fffffffec3120840 x19: ffffff80c4821000 x18: 0000000000000000 00510 x17: fffffffec3d02f00 x16: fffffffec3d02e00 x15: fffffffec3d00700 00510 x14: fffffffec3d00600 x13: 0000000000000200 x12: 0000000000000006 00510 x11: ffffffc080bb86c0 x10: 0000000000000000 x9 : ffffffc080201e58 00510 x8 : ffffff80c4821060 x7 : 0000000000000000 x6 : 0000000055555556 00510 x5 : 0000000000000001 x4 : 0000000000000010 x3 : 0000000000000060 00510 x2 : 0000000000000000 x1 : ffffffc080f50cf8 x0 : ffffff80d801d000 00510 Call trace: 00510 __alloc_tagging_slab_alloc_hook+0xe0/0x190 (P) 00510 __kmalloc_noprof+0x150/0x310 00510 __bch2_folio_create+0x5c/0xf8 00510 bch2_folio_create+0x2c/0x40 00510 bch2_readahead+0xc0/0x460 00510 read_pages+0x7c/0x230 00510 page_cache_ra_order+0x244/0x3a8 00510 page_cache_async_ra+0x124/0x170 00510 filemap_readahead.isra.0+0x58/0xa0 00510 filemap_get_pages+0x454/0x7b0 00510 filemap_read+0xdc/0x418 00510 bch2_read_iter+0x100/0x1b0 00510 vfs_read+0x214/0x300 00510 ksys_read+0x6c/0x108 00510 __arm64_sys_read+0x20/0x30 00510 invoke_syscall.constprop.0+0x54/0xe8 00510 do_el0_svc+0x44/0xc8 00510 el0_svc+0x18/0x58 00510 el0t_64_sync_handler+0x104/0x130 00510 el0t_64_sync+0x154/0x158 00510 Code: d5384100 f9401c01 b9401aa3 b40002e1 (f8227881) 00510 ---[ end trace 0000000000000000 ]--- 00510 Kernel panic - not syncing: Oops: Fatal exception 00510 SMP: stopping secondary CPUs 00510 Kernel Offset: disabled 00510 CPU features: 0x0000,000000e0,00000410,8240500b 00510 Memory Limit: none
Investigation indicates that these bits are already set when we allocate slab page and are not zeroed out after allocation. We are not yet sure why these crashes start happening only recently but regardless of the reason, not initializing a field that gets used later is wrong. Fix it by initializing slab->obj_exts during slab page allocation.
slab->obj_exts overlays page->memcg_data and the checks on page alloc and page free should catch any non-zero values, i.e. page_expected_state() page_bad_reason() so if anyone is e.g. UAF-writing there or leaving garbage there while freeing the page it's a bug.
Perhaps CONFIG_MEMCG is disabled in the ktests and thus the checks are not happening? We could extend them for CONFIG_SLAB_OBJ_EXT checking _unused_slab_obj_exts perhaps. But it would be a short lived change, see below.
Correct, CONFIG_MEMCG was disabled during these tests. We added BUG_ON() in the slab allocation path to trigger on these low bits and it did trigger but the same assertion in the freeing path did not catch anything. We suspected 4996fc547f5b ("mm: let _folio_nr_pages overlay memcg_data in first tail page") to cause this but Kent's bisection did not confirm that.
Fixes: 21c690a349ba ("mm: introduce slabobj_ext to support slab object extensions") Reported-by: Kent Overstreet kent.overstreet@linux.dev Tested-by: Kent Overstreet kent.overstreet@linux.dev Signed-off-by: Suren Baghdasaryan surenb@google.com Acked-by: Kent Overstreet kent.overstreet@linux.dev Cc: stable@vger.kernel.org
We'll need this anyway for the not so far future when struct slab is separated from struct page so it's fine but it would still be great to find the underlying buggy code which this is going to hide.
Yeah, we will try to find the culprit. For now to prevent others from stepping on this mine I would like to get this in. Thanks!
Kent asked me to forward this (his email is misbehaving for some reason):
Yes, ktest doesn't flip on CONFIG_MEMCG.
Those checks you're talking about are also behind CONFIG_DEBUG_VM, which isn't normally on. I did do some runs with it on and it didn't fire - only additional asserts Suren and I added - so something's missing.
In the meantime, this needs to go in quickly as a hotfix because it's a 6.15-rc1 regression, and I've been getting distros to enable memory allocation profiling and I'd be shocked if it doesn't cause memcg crashes as well.
mm/slub.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/mm/slub.c b/mm/slub.c index b46f87662e71..dc9e729e1d26 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1973,6 +1973,11 @@ static inline void handle_failed_objexts_alloc(unsigned long obj_exts, #define OBJCGS_CLEAR_MASK (__GFP_DMA | __GFP_RECLAIMABLE | \ __GFP_ACCOUNT | __GFP_NOFAIL)
+static inline void init_slab_obj_exts(struct slab *slab) +{
slab->obj_exts = 0;
+}
int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s, gfp_t gfp, bool new_slab) { @@ -2058,6 +2063,10 @@ static inline bool need_slab_obj_ext(void)
#else /* CONFIG_SLAB_OBJ_EXT */
+static inline void init_slab_obj_exts(struct slab *slab) +{ +}
static int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s, gfp_t gfp, bool new_slab) { @@ -2637,6 +2646,7 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node) slab->objects = oo_objects(oo); slab->inuse = 0; slab->frozen = 0;
init_slab_obj_exts(slab); account_slab(slab, oo_order(oo), s, flags);
base-commit: c496b37f9061db039b413c03ccd33506175fe6ec