[PATCH 4.14] mm, slub: restore the original intention of prefetch_freepointer()

List overview All Threads
Download

newer

older

stable-rc/linux-4.4.y build: 174...

stable-rc/linux-5.4.y boot: 172...

Sven Eckelmann

26 Apr 2020 26 Apr '20

7:06 a.m.

From: Vlastimil Babka vbabka@suse.cz

commit 0882ff9190e3bc51e2d78c3aadd7c690eeaa91d5 upstream.

In SLUB, prefetch_freepointer() is used when allocating an object from cache's freelist, to make sure the next object in the list is cache-hot, since it's probable it will be allocated soon.

Commit 2482ddec670f ("mm: add SLUB free list pointer obfuscation") has unintentionally changed the prefetch in a way where the prefetch is turned to a real fetch, and only the next->next pointer is prefetched. In case there is not a stream of allocations that would benefit from prefetching, the extra real fetch might add a useless cache miss to the allocation. Restore the previous behavior.

Link: http://lkml.kernel.org/r/20180809085245.22448-1-vbabka@suse.cz Fixes: 2482ddec670f ("mm: add SLUB free list pointer obfuscation") Signed-off-by: Vlastimil Babka vbabka@suse.cz Acked-by: Kees Cook keescook@chromium.org Cc: Daniel Micay danielmicay@gmail.com Cc: Eric Dumazet edumazet@google.com Cc: Christoph Lameter cl@linux.com Cc: Pekka Enberg penberg@kernel.org Cc: David Rientjes rientjes@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Matthias Schiffer mschiffer@universe-factory.net Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sven Eckelmann sven@narfation.org --- The original problem is explained in the patch description as performance problem. And maybe this could also be one reason why it was never submitted for a stable kernel.

But tests on mips ath79 (OpenWrt ar71xx target) showed that it most likely related to "random" data bus errors. At least applying this patch seemed to have solved it for Matthias Schiffer mschiffer@universe-factory.net and some other persons who where debugging/testing this problem with him.

More details about it can be found in https://github.com/freifunk-gluon/gluon/issues/1982 --- mm/slub.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c index 3c1a16f03b2b..481518c3f61a 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -269,8 +269,7 @@ static inline void *get_freepointer(struct kmem_cache *s, void *object)

static void prefetch_freepointer(const struct kmem_cache *s, void *object) { - if (object) - prefetch(freelist_dereference(s, object + s->offset)); + prefetch(object + s->offset); }

static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)

-- 2.20.1

Show replies by date

Sasha Levin

26 Apr 26 Apr

11:14 p.m.

On Sun, Apr 26, 2020 at 09:06:17AM +0200, Sven Eckelmann wrote:

...

From: Vlastimil Babka vbabka@suse.cz

commit 0882ff9190e3bc51e2d78c3aadd7c690eeaa91d5 upstream.

In SLUB, prefetch_freepointer() is used when allocating an object from cache's freelist, to make sure the next object in the list is cache-hot, since it's probable it will be allocated soon.

Commit 2482ddec670f ("mm: add SLUB free list pointer obfuscation") has unintentionally changed the prefetch in a way where the prefetch is turned to a real fetch, and only the next->next pointer is prefetched. In case there is not a stream of allocations that would benefit from prefetching, the extra real fetch might add a useless cache miss to the allocation. Restore the previous behavior.

Link: http://lkml.kernel.org/r/20180809085245.22448-1-vbabka@suse.cz Fixes: 2482ddec670f ("mm: add SLUB free list pointer obfuscation") Signed-off-by: Vlastimil Babka vbabka@suse.cz Acked-by: Kees Cook keescook@chromium.org Cc: Daniel Micay danielmicay@gmail.com Cc: Eric Dumazet edumazet@google.com Cc: Christoph Lameter cl@linux.com Cc: Pekka Enberg penberg@kernel.org Cc: David Rientjes rientjes@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Matthias Schiffer mschiffer@universe-factory.net Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sven Eckelmann sven@narfation.org

The original problem is explained in the patch description as performance problem. And maybe this could also be one reason why it was never submitted for a stable kernel.

But tests on mips ath79 (OpenWrt ar71xx target) showed that it most likely related to "random" data bus errors. At least applying this patch seemed to have solved it for Matthias Schiffer mschiffer@universe-factory.net and some other persons who where debugging/testing this problem with him.

More details about it can be found in https://github.com/freifunk-gluon/gluon/issues/1982

Interesting... I wonder why this issue has started only now.

I've queued it up for 4.14, thanks!

-- Thanks, Sasha

Sven Eckelmann

27 Apr 27 Apr

7:01 a.m.

On Monday, 27 April 2020 01:14:26 CEST Sasha Levin wrote:

...

On Sun, Apr 26, 2020 at 09:06:17AM +0200, Sven Eckelmann wrote:

...
From: Vlastimil Babka vbabka@suse.cz

commit 0882ff9190e3bc51e2d78c3aadd7c690eeaa91d5 upstream.

[...]

...

...

The original problem is explained in the patch description as performance problem. And maybe this could also be one reason why it was never submitted for a stable kernel.

But tests on mips ath79 (OpenWrt ar71xx target) showed that it most likely related to "random" data bus errors. At least applying this patch seemed to have solved it for Matthias Schiffer mschiffer@universe-factory.net and some other persons who where debugging/testing this problem with him.

More details about it can be found in https://github.com/freifunk-gluon/gluon/issues/1982

Interesting... I wonder why this issue has started only now.

Unfortunately, I don't know the details. So I (actually we) would love to get some feedback from the slub experts. Not that there is another problem which we just don't grasp yet.

Just some background information about the "why" from freifunk-gluon's perspective:

OpenWrt 19.07 was released (despite its name) at the beginning of 2020. And it was the first release using kernel 4.14 on the most used target: ar71xx (ath79). The wireless community network firmware projects (freifunk-gluon in this example) updated their frameworks to this OpenWrt release in the last months and just now started to roll it out on their networks.

And while the wireless community networks around here usually don't track the connected clients, the health of the APs is often tracked on some central system. And some people then just noticed a sudden spike of reboots on their APs. Since ar71xx is (often) the most used architecture at the moment, this could be spotted rather easily if you spend some time looking at graphs.

Kind regards, Sven

Vlastimil Babka

8:45 a.m.

On 4/27/20 9:01 AM, Sven Eckelmann wrote:

...

On Monday, 27 April 2020 01:14:26 CEST Sasha Levin wrote:

...
On Sun, Apr 26, 2020 at 09:06:17AM +0200, Sven Eckelmann wrote:

...
From: Vlastimil Babka vbabka@suse.cz

commit 0882ff9190e3bc51e2d78c3aadd7c690eeaa91d5 upstream.

[...]

...
...

The original problem is explained in the patch description as performance problem. And maybe this could also be one reason why it was never submitted for a stable kernel.

But tests on mips ath79 (OpenWrt ar71xx target) showed that it most likely related to "random" data bus errors. At least applying this patch seemed to have solved it for Matthias Schiffer mschiffer@universe-factory.net and some other persons who where debugging/testing this problem with him.

More details about it can be found in https://github.com/freifunk-gluon/gluon/issues/1982

Hmm, doesn't explain much how the fix was eventually found, but nevermind, good job.

...

...
Interesting... I wonder why this issue has started only now.

Unfortunately, I don't know the details. So I (actually we) would love to get some feedback from the slub experts. Not that there is another problem which we just don't grasp yet.

I think the prefetch my go to an address that would cause a real fetch to page fault. Under normal circumstances that could be only the NULL pointer that terminates a freelist, otherwise the address should be valid.

So that could mean: 1) prefetch() on mips is implemented/compiled wrong? 2) the CPU really has issues with prefetch causing a page fault 3) the prefetch gets reordered between LL/SC and there's some bug similar to this one described in arch/mips/include/asm/sync.h:

/* * Some Loongson 3 CPUs have a bug wherein execution of a memory access (load, * store or prefetch) in between an LL & SC can cause the SC instruction to * erroneously succeed, breaking atomicity. Whilst it's unusual to write code * containing such sequences, this bug bites harder than we might otherwise * expect due to reordering & speculation:

...

Just some background information about the "why" from freifunk-gluon's perspective:

OpenWrt 19.07 was released (despite its name) at the beginning of 2020. And it was the first release using kernel 4.14 on the most used target: ar71xx (ath79). The wireless community network firmware projects (freifunk-gluon in this example) updated their frameworks to this OpenWrt release in the last months and just now started to roll it out on their networks.

And while the wireless community networks around here usually don't track the connected clients, the health of the APs is often tracked on some central system. And some people then just noticed a sudden spike of reboots on their APs. Since ar71xx is (often) the most used architecture at the moment, this could be spotted rather easily if you spend some time looking at graphs.

Kind regards, Sven

Matthias Schiffer

7:08 p.m.

On 4/27/20 10:45 AM, Vlastimil Babka wrote:

...

On 4/27/20 9:01 AM, Sven Eckelmann wrote:

...
On Monday, 27 April 2020 01:14:26 CEST Sasha Levin wrote:

...
On Sun, Apr 26, 2020 at 09:06:17AM +0200, Sven Eckelmann wrote:

...
From: Vlastimil Babka vbabka@suse.cz

commit 0882ff9190e3bc51e2d78c3aadd7c690eeaa91d5 upstream.

[...]

...
...

The original problem is explained in the patch description as performance problem. And maybe this could also be one reason why it was never submitted for a stable kernel.

But tests on mips ath79 (OpenWrt ar71xx target) showed that it most likely related to "random" data bus errors. At least applying this patch seemed to have solved it for Matthias Schiffer mschiffer@universe-factory.net and some other persons who where debugging/testing this problem with him.

More details about it can be found in https://github.com/freifunk-gluon/gluon/issues/1982

Hmm, doesn't explain much how the fix was eventually found, but nevermind, good job.

The fact that the location of the data bus error was so imprecise made me suspect that no regular load could be the cause - therefore I looked at that prefetch in detail and eventually found your patch.

...

...
...
Interesting... I wonder why this issue has started only now.

Unfortunately, I don't know the details. So I (actually we) would love to get some feedback from the slub experts. Not that there is another problem which we just don't grasp yet.

I think the prefetch my go to an address that would cause a real fetch to page fault. Under normal circumstances that could be only the NULL pointer that terminates a freelist, otherwise the address should be valid.

For further analysis, I just replaced the prefetch as implemented in 4.14 (i.e. before applying the patch in question) with a regular load (excluding NULL, which would lead to an immediate fault on boot). With the test program, I quickly hit a fault, at an address that looks completely bogus (i.e. neither NULL nor an address looking like it might be mapped to anything). Is this expected with the incorrect prefetch_freepointer() implementation of 4.14? Is it possible that prefetch_freepointer() of 4.14 is even more broken than suspected before? Note that we hit this issue with the "names_cache" slab, which has page-sized objects, if that might provide any clue...

In any case, it seems like the "pref" instruction should not be used on bogus addresses on the ath79 platform... The exact behaviour is also hardware-dependent: On some SoCs, the error would be visible as a data bus error, while others just reset without any way to find out what was going wrong.

Matthias

...

So that could mean:

prefetch() on mips is implemented/compiled wrong?

the CPU really has issues with prefetch causing a page fault

the prefetch gets reordered between LL/SC and there's some bug similar to

this one described in arch/mips/include/asm/sync.h:

/*

Some Loongson 3 CPUs have a bug wherein execution of a memory access (load,

store or prefetch) in between an LL & SC can cause the SC instruction to

erroneously succeed, breaking atomicity. Whilst it's unusual to write code

containing such sequences, this bug bites harder than we might otherwise

expect due to reordering & speculation:

...
Just some background information about the "why" from freifunk-gluon's perspective:

OpenWrt 19.07 was released (despite its name) at the beginning of 2020. And it was the first release using kernel 4.14 on the most used target: ar71xx (ath79). The wireless community network firmware projects (freifunk-gluon in this example) updated their frameworks to this OpenWrt release in the last months and just now started to roll it out on their networks.

And while the wireless community networks around here usually don't track the connected clients, the health of the APs is often tracked on some central system. And some people then just noticed a sudden spike of reboots on their APs. Since ar71xx is (often) the most used architecture at the moment, this could be spotted rather easily if you spend some time looking at graphs.

Kind regards, Sven

Vlastimil Babka

28 Apr 28 Apr

11:43 a.m.

On 4/27/20 9:08 PM, Matthias Schiffer wrote:

...

On 4/27/20 10:45 AM, Vlastimil Babka wrote:

...
On 4/27/20 9:01 AM, Sven Eckelmann wrote:

...
...
...
More details about it can be found in https://github.com/freifunk-gluon/gluon/issues/1982

Hmm, doesn't explain much how the fix was eventually found, but nevermind, good job.

The fact that the location of the data bus error was so imprecise made me suspect that no regular load could be the cause - therefore I looked at that prefetch in detail and eventually found your patch.

Ah, great, thanks.

...

...
I think the prefetch my go to an address that would cause a real fetch to page fault. Under normal circumstances that could be only the NULL pointer that terminates a freelist, otherwise the address should be valid.

For further analysis, I just replaced the prefetch as implemented in 4.14 (i.e. before applying the patch in question) with a regular load (excluding NULL, which would lead to an immediate fault on boot). With the test program, I quickly hit a fault, at an address that looks completely bogus (i.e. neither NULL nor an address looking like it might be mapped to anything). Is this expected with the incorrect prefetch_freepointer() implementation of 4.14? Is it possible that prefetch_freepointer() of 4.14 is even more broken than suspected before? Note that we hit this issue with the "names_cache" slab, which has page-sized objects, if that might provide any clue...

Thanks for the test and it might indeed be worthwile to chase it down as it's really unclear to me what's going on and there might be a worse underlying issue that we just mask. One question, is CONFIG_SLAB_FREELIST_HARDENED enabled in your config? That could play a role in seeing addresses that look completely bogus.

Looking closer at prefetch_freepointer() usage from slab_alloc_node() we start with

object = c->freelist;

if that's not NULL, we do

void *next_object = get_freepointer_safe(s, object);

then there's the tricky this_cpu_cmpxchg_double() operation. We can assume that if that succeeds, c->freelist == next_object So, if next_object was already a broken pointer, the next allocation should crash, even if we don't crash due to the prefetch.

It's possible next_object is NULL. But then the 4.14 implementation should be actually fine, prefetch_freepointer() has "if (object)" condition so it won't access (not just prefetch) that NULL. On the contrary, my patch removes that condition, as it also didn't exist before commit 2482ddec670fb. So AFAICS it's possible to trigger a NULL prefetch after my backport, and apparently the CPU doesn't mind that?

What if next_object is not NULL? Then it should be a valid pointer, as stated above. With my patch, then it's just prefetched and that's it. 4.14 implementation will read (not prefetch) the next free pointer in the chain, which AFAICT should also be either valid or NULL, and then prefetch that. So both implementation. should be prefetching either a valid value, or NULL? Aha! But we have already put next_object to c->freelist by the this_cpu_cmpxchg_double(), so we could have been then interrupted, and the interrupt handler or another thread scheduled instead of us on the same CPU have now allocated next_object from c->freelist and filled it with its own data, overwriting the free pointer. Then we resume execution and indeed can read bogus value as a free pointer and try to prefetch it... so that might be the reason.

...

In any case, it seems like the "pref" instruction should not be used on bogus addresses on the ath79 platform... The exact behaviour is also hardware-dependent: On some SoCs, the error would be visible as a data bus error, while others just reset without any way to find out what was going wrong.

If I knew that, and realized the scenario above is possible, I would have marked the patch stable myself. Sorry.

...

Matthias

Vlastimil Babka

10:13 a.m.

On 4/26/20 9:06 AM, Sven Eckelmann wrote:

...

From: Vlastimil Babka vbabka@suse.cz

commit 0882ff9190e3bc51e2d78c3aadd7c690eeaa91d5 upstream.

In SLUB, prefetch_freepointer() is used when allocating an object from cache's freelist, to make sure the next object in the list is cache-hot, since it's probable it will be allocated soon.

Commit 2482ddec670f ("mm: add SLUB free list pointer obfuscation") has unintentionally changed the prefetch in a way where the prefetch is turned to a real fetch, and only the next->next pointer is prefetched. In case there is not a stream of allocations that would benefit from prefetching, the extra real fetch might add a useless cache miss to the allocation. Restore the previous behavior.

Link: http://lkml.kernel.org/r/20180809085245.22448-1-vbabka@suse.cz Fixes: 2482ddec670f ("mm: add SLUB free list pointer obfuscation") Signed-off-by: Vlastimil Babka vbabka@suse.cz Acked-by: Kees Cook keescook@chromium.org Cc: Daniel Micay danielmicay@gmail.com Cc: Eric Dumazet edumazet@google.com Cc: Christoph Lameter cl@linux.com Cc: Pekka Enberg penberg@kernel.org Cc: David Rientjes rientjes@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Matthias Schiffer mschiffer@universe-factory.net Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sven Eckelmann sven@narfation.org

Regardless of how the discussion of exact root cause turns out, I agree with the stable inclusion, consider this as Ack, if it makes sense in that context :)

...

The original problem is explained in the patch description as performance problem. And maybe this could also be one reason why it was never submitted for a stable kernel.

But tests on mips ath79 (OpenWrt ar71xx target) showed that it most likely related to "random" data bus errors. At least applying this patch seemed to have solved it for Matthias Schiffer mschiffer@universe-factory.net and some other persons who where debugging/testing this problem with him.

More details about it can be found in https://github.com/freifunk-gluon/gluon/issues/1982

mm/slub.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c index 3c1a16f03b2b..481518c3f61a 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -269,8 +269,7 @@ static inline void *get_freepointer(struct kmem_cache *s, void *object) static void prefetch_freepointer(const struct kmem_cache *s, void *object) {
if (object)
prefetch(freelist_dereference(s, object + s->offset));
prefetch(object + s->offset); }

static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)

2069

days inactive

2071

days old

linux-stable-mirror@lists.linaro.org

6 comments

participants

tags (0)

participants (4)

Matthias Schiffer
Sasha Levin
Sven Eckelmann
Vlastimil Babka