[PATCH v2 1/1] hung_task: fix warnings caused by unaligned lock pointers


Lance Yang

9 Sep 2025

2:52 p.m.

From: Lance Yang lance.yang@linux.dev

The blocker tracking mechanism assumes that lock pointers are at least 4-byte aligned to use their lower bits for type encoding.

However, as reported by Eero Tamminen, some architectures like m68k only guarantee 2-byte alignment of 32-bit values. This breaks the assumption and causes two related WARN_ON_ONCE checks to trigger.

To fix this, the runtime checks are adjusted to silently ignore any lock that is not 4-byte aligned, effectively disabling the feature in such cases and avoiding the related warnings.

Thanks to Geert Uytterhoeven for bisecting!

Reported-by: Eero Tamminen oak@helsinkinet.fi
Closes: https://lore.kernel.org/lkml/CAMuHMdW7Ab13DdGs2acMQcix5ObJK0O2dG_Fxzr8_g58Rc...
Fixes: e711faaafbe5 ("hung_task: replace blocker_mutex with encoded blocker")
Cc: stable@vger.kernel.org
Reviewed-by: Masami Hiramatsu (Google) mhiramat@kernel.org
Signed-off-by: Lance Yang lance.yang@linux.dev
---
v1 -> v2:
 - Pick RB from Masami - thanks!
 - Update the changelog and comments
 - https://lore.kernel.org/lkml/20250823050036.7748-1-lance.yang@linux.dev/

 include/linux/hung_task.h | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/hung_task.h b/include/linux/hung_task.h
index 34e615c76ca5..c4403eeb7144 100644
--- a/include/linux/hung_task.h
+++ b/include/linux/hung_task.h
@@ -20,6 +20,10 @@
  * always zero. So we can use these bits to encode the specific blocking
  * type.
  *
+ * Note that on architectures where this is not guaranteed, or for any
+ * unaligned lock, this tracking mechanism is silently skipped for that
+ * lock.
+ *
  * Type encoding:
  * 00 - Blocked on mutex (BLOCKER_TYPE_MUTEX)
  * 01 - Blocked on semaphore (BLOCKER_TYPE_SEM)
@@ -45,7 +49,7 @@ static inline void hung_task_set_blocker(void *lock, unsigned long type)
 	 * If the lock pointer matches the BLOCKER_TYPE_MASK, return
 	 * without writing anything.
 	 */
-	if (WARN_ON_ONCE(lock_ptr & BLOCKER_TYPE_MASK))
+	if (lock_ptr & BLOCKER_TYPE_MASK)
 		return;

 	WRITE_ONCE(current->blocker, lock_ptr | type);
@@ -53,8 +57,6 @@ static inline void hung_task_set_blocker(void *lock, unsigned long type)

 static inline void hung_task_clear_blocker(void)
 {
-	WARN_ON_ONCE(!READ_ONCE(current->blocker));
-
 	WRITE_ONCE(current->blocker, 0UL);
 }

-- 
2.49.0
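
The encoding that the patch relies on can be illustrated outside the kernel. The following is only a userspace sketch, not the kernel code: `demo_blocker` stands in for `current->blocker`, all `demo_*` helpers are hypothetical, and the value `0x03` for `BLOCKER_TYPE_MASK` is an assumption (the patch only shows that the two lowest bits carry the type). It shows how a lock address and a type share one word, and how a pointer that is not 4-byte aligned is skipped without a warning.

```c
#include <assert.h>
#include <stdint.h>

/* Names mirror the patch; the mask value is an assumption for this sketch. */
#define BLOCKER_TYPE_MUTEX 0x00UL
#define BLOCKER_TYPE_SEM   0x01UL
#define BLOCKER_TYPE_MASK  0x03UL

static_assert(sizeof(unsigned long) >= sizeof(void *),
              "sketch assumes a pointer fits in unsigned long, as in the kernel");

/* Hypothetical stand-in for current->blocker. */
static unsigned long demo_blocker;

/* Store lock address and type in one word; skip locks whose low bits are set. */
static void demo_set_blocker(const void *lock, unsigned long type)
{
    unsigned long lock_ptr = (unsigned long)(uintptr_t)lock;

    if (lock_ptr & BLOCKER_TYPE_MASK)
        return; /* not 4-byte aligned: tracking is silently skipped */

    demo_blocker = lock_ptr | type;
}

static void demo_clear_blocker(void)
{
    demo_blocker = 0UL;
}

/* Decode the two halves of the encoded word again. */
static unsigned long demo_blocker_type(void)
{
    return demo_blocker & BLOCKER_TYPE_MASK;
}

static void *demo_blocker_lock(void)
{
    return (void *)(uintptr_t)(demo_blocker & ~BLOCKER_TYPE_MASK);
}
```

With the early return in place, an unaligned lock simply leaves the blocker word at zero, which is why the second warning in `hung_task_clear_blocker()` also had to go.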


Kent Overstreet

9 Sep

4:46 p.m.

On Tue, Sep 09, 2025 at 10:52:43PM +0800, Lance Yang wrote:

...

From: Lance Yang lance.yang@linux.dev

The blocker tracking mechanism assumes that lock pointers are at least 4-byte aligned to use their lower bits for type encoding.

However, as reported by Eero Tamminen, some architectures like m68k only guarantee 2-byte alignment of 32-bit values. This breaks the assumption and causes two related WARN_ON_ONCE checks to trigger.

Isn't m68k the only architecture that's weird like this?

...

To fix this, the runtime checks are adjusted to silently ignore any lock that is not 4-byte aligned, effectively disabling the feature in such cases and avoiding the related warnings.

Thanks to Geert Uytterhoeven for bisecting!

Reported-by: Eero Tamminen oak@helsinkinet.fi
Closes: https://lore.kernel.org/lkml/CAMuHMdW7Ab13DdGs2acMQcix5ObJK0O2dG_Fxzr8_g58Rc...
Fixes: e711faaafbe5 ("hung_task: replace blocker_mutex with encoded blocker")
Cc: stable@vger.kernel.org
Reviewed-by: Masami Hiramatsu (Google) mhiramat@kernel.org
Signed-off-by: Lance Yang lance.yang@linux.dev
---
v1 -> v2:
 - Pick RB from Masami - thanks!
 - Update the changelog and comments
 - https://lore.kernel.org/lkml/20250823050036.7748-1-lance.yang@linux.dev/

 include/linux/hung_task.h | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/hung_task.h b/include/linux/hung_task.h
index 34e615c76ca5..c4403eeb7144 100644
--- a/include/linux/hung_task.h
+++ b/include/linux/hung_task.h
@@ -20,6 +20,10 @@
  * always zero. So we can use these bits to encode the specific blocking
  * type.
  *
+ * Note that on architectures where this is not guaranteed, or for any
+ * unaligned lock, this tracking mechanism is silently skipped for that
+ * lock.
+ *
  * Type encoding:
  * 00 - Blocked on mutex (BLOCKER_TYPE_MUTEX)
  * 01 - Blocked on semaphore (BLOCKER_TYPE_SEM)
@@ -45,7 +49,7 @@ static inline void hung_task_set_blocker(void *lock, unsigned long type)
 	 * If the lock pointer matches the BLOCKER_TYPE_MASK, return
 	 * without writing anything.
 	 */
-	if (WARN_ON_ONCE(lock_ptr & BLOCKER_TYPE_MASK))
+	if (lock_ptr & BLOCKER_TYPE_MASK)
 		return;

 	WRITE_ONCE(current->blocker, lock_ptr | type);
@@ -53,8 +57,6 @@ static inline void hung_task_set_blocker(void *lock, unsigned long type)

 static inline void hung_task_clear_blocker(void)
 {
-	WARN_ON_ONCE(!READ_ONCE(current->blocker));
-
 	WRITE_ONCE(current->blocker, 0UL);
 }

-- 
2.49.0

John Paul Adrian Glaubitz

4:55 p.m.

On Tue, 2025-09-09 at 12:46 -0400, Kent Overstreet wrote:

...

On Tue, Sep 09, 2025 at 10:52:43PM +0800, Lance Yang wrote:

...
From: Lance Yang lance.yang@linux.dev

The blocker tracking mechanism assumes that lock pointers are at least 4-byte aligned to use their lower bits for type encoding.

However, as reported by Eero Tamminen, some architectures like m68k only guarantee 2-byte alignment of 32-bit values. This breaks the assumption and causes two related WARN_ON_ONCE checks to trigger.

Isn't m68k the only architecture that's weird like this?

Yes, and it does this on Linux only. I have been trying to change it upstream though as the official SysV ELF ABI for m68k requires a 4-byte natural alignment [1].

Adrian

...

[1] https://people.debian.org/~glaubitz/m68k-sysv-abi.pdf (p. 29)

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

Kent Overstreet

7:02 p.m.

On Tue, Sep 09, 2025 at 06:55:42PM +0200, John Paul Adrian Glaubitz wrote:

...

On Tue, 2025-09-09 at 12:46 -0400, Kent Overstreet wrote:

...
On Tue, Sep 09, 2025 at 10:52:43PM +0800, Lance Yang wrote:

...
From: Lance Yang lance.yang@linux.dev

The blocker tracking mechanism assumes that lock pointers are at least 4-byte aligned to use their lower bits for type encoding.

However, as reported by Eero Tamminen, some architectures like m68k only guarantee 2-byte alignment of 32-bit values. This breaks the assumption and causes two related WARN_ON_ONCE checks to trigger.

Isn't m68k the only architecture that's weird like this?

Yes, and it does this on Linux only. I have been trying to change it upstream though as the official SysV ELF ABI for m68k requires a 4-byte natural alignment [1].

Better to make it an explicit ifdef on the architecture, then...
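
For comparison, a compile-time opt-out of the kind suggested here might look like the sketch below. It is only an illustration in userspace C: `DEMO_BLOCKER_TRACKING` and `demo_can_track()` are hypothetical names, and `__m68k__` is the macro GCC is assumed to predefine for m68k targets. The runtime alignment test is kept as well, so the sketch shows both approaches side by side rather than the one the kernel ended up with.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical switch: disable tracking wholesale on m68k builds. */
#if defined(__m68k__)
#define DEMO_BLOCKER_TRACKING 0
#else
#define DEMO_BLOCKER_TRACKING 1
#endif

/* True when the lock may be tracked: feature enabled and low two bits clear. */
static bool demo_can_track(const void *lock)
{
    if (!DEMO_BLOCKER_TRACKING)
        return false;

    return ((uintptr_t)lock & 0x3u) == 0;
}
```

The architecture switch covers only the known ABI, whereas the runtime test also covers unaligned locks from any other cause.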

Finn Thain

10 Sep

12:45 a.m.

On Tue, 9 Sep 2025, John Paul Adrian Glaubitz wrote:

...

I have been trying to change it upstream though

That ship sailed decades ago.

...

as the official SysV ELF ABI for m68k requires a 4-byte natural alignment [1] ...

[1] https://people.debian.org/~glaubitz/m68k-sysv-abi.pdf (p. 29)

GNU/Linux is not AT&T Unix and was never intended to be that. Hence, your old System V (trademark) binaries are not going to work, unfortunately.

Geert Uytterhoeven

7:34 a.m.

On Tue, 9 Sept 2025 at 18:55, John Paul Adrian Glaubitz glaubitz@physik.fu-berlin.de wrote:

...

On Tue, 2025-09-09 at 12:46 -0400, Kent Overstreet wrote:

...
On Tue, Sep 09, 2025 at 10:52:43PM +0800, Lance Yang wrote:

...
From: Lance Yang lance.yang@linux.dev

The blocker tracking mechanism assumes that lock pointers are at least 4-byte aligned to use their lower bits for type encoding.

However, as reported by Eero Tamminen, some architectures like m68k only guarantee 2-byte alignment of 32-bit values. This breaks the assumption and causes two related WARN_ON_ONCE checks to trigger.

Isn't m68k the only architecture that's weird like this?

Yes, and it does this on Linux only. I have been trying to change it upstream though as the official SysV ELF ABI for m68k requires a 4-byte natural alignment [1].

M68k does this on various OSes and ABIs that predate or are not explicitly compatible with the SysV ELF ABI.

Other architectures like CRIS (1-byte alignment!) are no longer supported by Linux.

FWIW, doubles (and doublewords) are not naturally aligned in the SysV ELF ABI for i386, while doubles (no mention of doublewords) are naturally aligned in the SysV ELF ABI for m68k.

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
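
The ABI differences discussed above can be observed directly on any given toolchain. The helpers below are a small sketch (all `demo_*` names are hypothetical) that reports the alignment the compiler uses for a few basic types and the offset at which a `long` lands after a single `char`. No expected output is given because the values depend on the target ABI, which is the point of the discussion.

```c
#include <stddef.h>
#include <stdio.h>

/* A long placed after one char: its offset reveals the ABI's member alignment. */
struct demo_probe {
    char pad;
    long value;
};

static size_t demo_long_alignment(void)
{
    return _Alignof(long);
}

static size_t demo_long_offset(void)
{
    return offsetof(struct demo_probe, value);
}

/* Print the alignments this compiler and target actually use. */
static void demo_print_alignments(void)
{
    printf("_Alignof(int)       = %zu\n", (size_t)_Alignof(int));
    printf("_Alignof(long)      = %zu\n", demo_long_alignment());
    printf("_Alignof(long long) = %zu\n", (size_t)_Alignof(long long));
    printf("_Alignof(double)    = %zu\n", (size_t)_Alignof(double));
    printf("offset of long after char = %zu\n", demo_long_offset());
}
```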

John Paul Adrian Glaubitz

7:37 a.m.

On Wed, 2025-09-10 at 09:34 +0200, Geert Uytterhoeven wrote:

...

...
...
Isn't m68k the only architecture that's weird like this?

Yes, and it does this on Linux only. I have been trying to change it upstream though as the official SysV ELF ABI for m68k requires a 4-byte natural alignment [1].

M68k does this on various OSes and ABIs that predate or are not explicitly compatible with the SysV ELF ABI.

I know. I was talking in the context of SysV ELF systems.

...

Other architectures like CRIS (1-byte alignment!) are no longer supported by Linux.

Yes, that's why we should take care of the alignment ;-).

...

FWIW, doubles (and doublewords) are not naturally aligned in the SysV ELF ABI for i386, while doubles (no mention of doublewords) are naturally aligned in the SysV ELF ABI for m68k.

I wouldn't consider i386 a role model for us ;-).

Adrian


Finn Thain

12:07 a.m.

On Tue, 9 Sep 2025, Kent Overstreet wrote:

...

On Tue, Sep 09, 2025 at 10:52:43PM +0800, Lance Yang wrote:

...
From: Lance Yang lance.yang@linux.dev

The blocker tracking mechanism assumes that lock pointers are at least 4-byte aligned to use their lower bits for type encoding.

However, as reported by Eero Tamminen, some architectures like m68k only guarantee 2-byte alignment of 32-bit values. This breaks the assumption and causes two related WARN_ON_ONCE checks to trigger.

Isn't m68k the only architecture that's weird like this?

No. Historically, Linux/CRIS did not naturally align integer types either. AFAIK, there's no standard that demands natural alignment of integer types. Linux ABIs differ significantly.

For example, Linux/i386 does not naturally align long longs. Therefore, x86 may be expected to become the next m68k (or CRIS) unless such assumptions are avoided and alignment requirements are made explicit.

The real problem here is the algorithm. Some under-resourced distros choose to blame the ABI instead of the algorithm, because in doing so, they are freed from having to work to improve upstream code bases.

IMHO, good C doesn't make alignment assumptions, because that hinders source code portability and reuse, as well as algorithm extensibility. We've seen it before. The issue here [1] is no different from the pointer abuse which we fixed in Cpython [2].

Linux is probably the only non-trivial program that could be feasibly rebuilt with -malign-int without ill effect (i.e. without breaking userland) but that sort of workaround would not address the root cause (i.e. algorithms with bad assumptions).

[1] https://lore.kernel.org/lkml/CAMuHMdW7Ab13DdGs2acMQcix5ObJK0O2dG_Fxzr8_g58Rc...

[2] https://github.com/python/cpython/pull/135016
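
One way to read "alignment requirements are made explicit" is to state the requirement on the type itself instead of relying on the ABI. The sketch below assumes GCC or Clang (the `aligned` attribute is a compiler extension) and uses a hypothetical `struct demo_lock`; it is not the change that was proposed for the kernel, only an illustration of the idea.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical lock type that requests at least 4-byte alignment itself.
 * On a struct, the attribute can only raise the alignment, never lower it.
 */
struct demo_lock {
    unsigned long owner;
} __attribute__((aligned(4)));

/* The requirement is now checked at build time instead of assumed. */
static_assert(_Alignof(struct demo_lock) >= 4,
              "demo_lock must leave the two low pointer bits free");

/* Returns nonzero when the two low bits of the address are available. */
static int demo_low_bits_free(const struct demo_lock *lock)
{
    return ((uintptr_t)lock & 0x3u) == 0;
}
```

With the requirement attached to the type, every properly declared object of that type leaves the low bits free, whatever the default ABI alignment is.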

Kent Overstreet

12:51 a.m.

On Wed, Sep 10, 2025 at 10:07:04AM +1000, Finn Thain wrote:

...

On Tue, 9 Sep 2025, Kent Overstreet wrote:

...
On Tue, Sep 09, 2025 at 10:52:43PM +0800, Lance Yang wrote:

...
From: Lance Yang lance.yang@linux.dev

The blocker tracking mechanism assumes that lock pointers are at least 4-byte aligned to use their lower bits for type encoding.

However, as reported by Eero Tamminen, some architectures like m68k only guarantee 2-byte alignment of 32-bit values. This breaks the assumption and causes two related WARN_ON_ONCE checks to trigger.

Isn't m68k the only architecture that's weird like this?

No. Historically, Linux/CRIS did not naturally align integer types either. AFAIK, there's no standard that demands natural alignment of integer types. Linux ABIs differ significantly.

For example, Linux/i386 does not naturally align long longs. Therefore, x86 may be expected to become the next m68k (or CRIS) unless such assumptions are avoided and alignment requirements are made explicit.

That doesn't really apply; i386's long long is ugly but it's not as much of an issue in practice, because it's greater than a machine word.

...

The real problem here is the algorithm. Some under-resourced distros choose to blame the ABI instead of the algorithm, because in doing so, they are freed from having to work to improve upstream code bases.

Hang on, let's avoid playing the blame game. It's perfectly reasonable to view standards not as holy religious texts that must be adhered to; these things were written down when specifications were much looser.

...

IMHO, good C doesn't make alignment assumptions, because that hinders source code portability and reuse, as well as algorithm extensibility. We've seen it before. The issue here [1] is no different from the pointer abuse which we fixed in Cpython [2].

That kind of thinking really dates from before multithreaded and even lockless algorithms became absolutely pervasive, especially in the kernel.

These days, READ_ONCE() and WRITE_ONCE() are pervasive, and since C lacks any notion of atomics in the type system (the place this primarily comes up), it would go a long ways towards improving portability and eliminating nasty land mines.

Finn Thain

1:35 a.m.

On Tue, 9 Sep 2025, Kent Overstreet wrote:

...

On Wed, Sep 10, 2025 at 10:07:04AM +1000, Finn Thain wrote:

...
On Tue, 9 Sep 2025, Kent Overstreet wrote:

...
On Tue, Sep 09, 2025 at 10:52:43PM +0800, Lance Yang wrote:

...
From: Lance Yang lance.yang@linux.dev

The blocker tracking mechanism assumes that lock pointers are at least 4-byte aligned to use their lower bits for type encoding.

However, as reported by Eero Tamminen, some architectures like m68k only guarantee 2-byte alignment of 32-bit values. This breaks the assumption and causes two related WARN_ON_ONCE checks to trigger.

Isn't m68k the only architecture that's weird like this?

No. Historically, Linux/CRIS did not naturally align integer types either. AFAIK, there's no standard that demands natural alignment of integer types. Linux ABIs differ significantly.

For example, Linux/i386 does not naturally align long longs. Therefore, x86 may be expected to become the next m68k (or CRIS) unless such assumptions are avoided and alignment requirements are made explicit.

That doesn't really apply; i386's long long is ugly but it's not as much of an issue in practice, because it's greater than a machine word.

Similarly, on m68k, there is no issue with __alignof(long) == 2 because these platforms don't trap on misaligned access. But that seems a bit irrelevant to the real issue, which is not specific architectural quirks, but the algorithms and their ongoing development.

...

...

...
IMHO, good C doesn't make alignment assumptions, because that hinders source code portability and reuse, as well as algorithm extensibility. We've seen it before. The issue here [1] is no different from the pointer abuse which we fixed in Cpython [2].

That kind of thinking really dates from before multithreaded and even lockless algorithms became absolutely pervasive, especially in the kernel.

What I meant was, "assumptions hinder portability etc." not "good C hinders portability etc." (my bad).

...

These days, READ_ONCE() and WRITE_ONCE() are pervasive, and since C lacks any notion of atomics in the type system (the place this primarily comes up), it would go a long ways towards improving portability and eliminating nasty land mines.

Natural alignment would seem to be desirable for new ABIs, until you realize that it implies wasted RAM on embedded systems and reduced data locality (that is, cooler caches if you did this on i386).

Kent Overstreet

1:48 a.m.

On Wed, Sep 10, 2025 at 11:35:56AM +1000, Finn Thain wrote:

...

Similarly, on m68k, there is no issue with __alignof(long) == 2 because these platforms don't trap on misaligned access. But that seems a bit irrelevant to the real issue, which is not specific architectural quirks, but the algorithms and their ongoing development.

Err, I believe the topic was just alignment and the breaking of commonly held expectations :)

...

...
...

...
IMHO, good C doesn't make alignment assumptions, because that hinders source code portability and reuse, as well as algorithm extensibility. We've seen it before. The issue here [1] is no different from the pointer abuse which we fixed in Cpython [2].

That kind of thinking really dates from before multithreaded and even lockless algorithms became absolutely pervasive, especially in the kernel.

What I meant was, "assumptions hinder portability etc." not "good C hinders portability etc." (my bad).

Of course, but given the lack of a true atomic type in C there's no good alternative way to avoid this landmine.

Also, grep for READ_ONCE/WRITE_ONCE in the kernel tree if you want to see how big the issue is - and then remember that only captures a fraction of it :)

...

...
These days, READ_ONCE() and WRITE_ONCE() are pervasive, and since C lacks any notion of atomics in the type system (the place this primarily comes up), it would go a long ways towards improving portability and eliminating nasty land mines.

Natural alignment would seem to be desirable for new ABIs, until you realize that it implies wasted RAM on embedded systems and reduced data locality (that is, cooler caches if you did this on i386).

For the data structures where it matters we tend to organize things by natural alignment already.

If anyone wanted to gather precise numbers, there's memory allocation profiling + pahole :)
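
The memory cost under discussion comes from padding. As a rough illustration (hypothetical `demo_*` types, GCC/Clang `packed` attribute assumed), the sketch below compares a structure laid out by the default ABI rules with a packed one; the difference is the kind of hole a tool such as pahole reports. The natural size depends on the target, so only the packed size is fixed.

```c
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler may insert padding before 'counter'. */
struct demo_natural {
    uint8_t  flag;
    uint64_t counter;
};

/* Packed layout: no padding, so 'counter' may be misaligned. */
struct demo_packed {
    uint8_t  flag;
    uint64_t counter;
} __attribute__((packed));

/* Bytes spent on padding by the default layout on this target. */
static size_t demo_padding_bytes(void)
{
    return sizeof(struct demo_natural) - sizeof(struct demo_packed);
}
```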

Finn Thain

6:40 a.m.

On Tue, 9 Sep 2025, Kent Overstreet wrote:

...

Err, I believe the topic was just alignment and the breaking of commonly held expectations :)

...

Also, grep for READ_ONCE/WRITE_ONCE in the kernel tree if you want to see how big the issue is

I'm already aware of the comment in include/asm-generic/rwonce.h about load tearing and 64-bit loads on 32-bit architectures. That's partly why I mentioned long long alignment on i386. Perhaps, for being so common, i386 has generally lowered expectations?
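
For readers unfamiliar with the macros, the sketch below is a simplified userspace approximation of what READ_ONCE() and WRITE_ONCE() do; it is not the kernel's definition from include/asm-generic/rwonce.h, and the `DEMO_*` and `demo_*` names are hypothetical. The volatile cast asks the compiler to perform exactly one access; whether the hardware performs that access without tearing still depends on the size and alignment of the object, which is where the alignment discussion comes in.

```c
#include <stdint.h>

/* Simplified approximations; __typeof__ is a GCC/Clang extension. */
#define DEMO_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define DEMO_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* A word-sized shared variable, as a stand-in for kernel state. */
static uint32_t demo_flag;

/* Writer side: one store that the compiler may not split or elide. */
static void demo_publish(uint32_t value)
{
    DEMO_WRITE_ONCE(demo_flag, value);
}

/* Reader side: one load that the compiler may not repeat or cache. */
static uint32_t demo_observe(void)
{
    return DEMO_READ_ONCE(demo_flag);
}
```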

Andreas Schwab

6:52 a.m.

On Sep 10 2025, Finn Thain wrote:

...

Linux is probably the only non-trivial program that could be feasibly rebuilt with -malign-int without ill effect (i.e. without breaking userland)

No, you can't. It would change the layout of basic user-level structures, breaking the syscall ABI.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

John Paul Adrian Glaubitz

7:39 a.m.

On Wed, 2025-09-10 at 08:52 +0200, Andreas Schwab wrote:

...

On Sep 10 2025, Finn Thain wrote:

...
Linux is probably the only non-trivial program that could be feasibly rebuilt with -malign-int without ill effect (i.e. without breaking userland)

No, you can't. It would change the layout of basic user-level structures, breaking the syscall ABI.

Not if you rebuild the whole userspace as well.

FWIW, the Gentoo people already created a chroot with 32-bit alignment:

https://dev.gentoo.org/~dilfridge/m68k/

It works with qemu-user. I haven't tried it on qemu-system with a 32-bit-aligned kernel yet.

Adrian


Geert Uytterhoeven

7:45 a.m.

Hi Adrian,

On Wed, 10 Sept 2025 at 09:39, John Paul Adrian Glaubitz glaubitz@physik.fu-berlin.de wrote:

...

On Wed, 2025-09-10 at 08:52 +0200, Andreas Schwab wrote:

...
On Sep 10 2025, Finn Thain wrote:

...
Linux is probably the only non-trivial program that could be feasibly rebuilt with -malign-int without ill effect (i.e. without breaking userland)

No, you can't. It would change the layout of basic user-level structures, breaking the syscall ABI.

Not if you rebuild the whole userspace as well.

Linux does not break the userspace ABI.

...

FWIW, the Gentoo people already created a chroot with 32-bit alignment:

That would be a different Linux architecture (m68k-32?).

Gr{oetje,eeting}s,

Geert

Finn Thain

8:02 a.m.

On Wed, 10 Sep 2025, Andreas Schwab wrote:

...

On Sep 10 2025, Finn Thain wrote:

...
Linux is probably the only non-trivial program that could be feasibly rebuilt with -malign-int without ill effect (i.e. without breaking userland)

No, you can't. It would change the layout of basic user-level structures, breaking the syscall ABI.

So you'd have to patch the uapi headers at the same time. I think that's "feasible", no?

Andreas Schwab

11:26 a.m.

On Sep 10 2025, Finn Thain wrote:

...

So you'd have to patch the uapi headers at the same time. I think that's "feasible", no?

It would surely be a big task.


Geert Uytterhoeven

7:36 a.m.

On Wed, 10 Sept 2025 at 02:07, Finn Thain fthain@linux-m68k.org wrote:

...

On Tue, 9 Sep 2025, Kent Overstreet wrote:

...
On Tue, Sep 09, 2025 at 10:52:43PM +0800, Lance Yang wrote:

...
From: Lance Yang lance.yang@linux.dev

The blocker tracking mechanism assumes that lock pointers are at least 4-byte aligned to use their lower bits for type encoding.

However, as reported by Eero Tamminen, some architectures like m68k only guarantee 2-byte alignment of 32-bit values. This breaks the assumption and causes two related WARN_ON_ONCE checks to trigger.

Isn't m68k the only architecture that's weird like this?

No. Historically, Linux/CRIS did not naturally align integer types either. AFAIK, there's no standard that demands natural alignment of integer types. Linux ABIs differ significantly.

For example, Linux/i386 does not naturally align long longs. Therefore, x86 may be expected to become the next m68k (or CRIS) unless such assumptions are avoided and alignment requirements are made explicit.

The real problem here is the algorithm. Some under-resourced distros choose to blame the ABI instead of the algorithm, because in doing so, they are freed from having to work to improve upstream code bases.

IMHO, good C doesn't make alignment assumptions, because that hinders source code portability and reuse, as well as algorithm extensibility. We've seen it before. The issue here [1] is no different from the pointer abuse which we fixed in Cpython [2].

Linux is probably the only non-trivial program that could be feasibly rebuilt with -malign-int without ill effect (i.e. without breaking userland) but that sort of workaround would not address the root cause (i.e. algorithms with bad assumptions).

The first step to preserve compatibility with userland would be to properly annotate the few uapi definitions that would change with -malign-int otherwise. I am still waiting for these patches...

Gr{oetje,eeting}s,

Geert
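
To make the idea of annotating a definition concrete, here is one conceivable way to pin a layout with attributes that already exist in GCC and Clang. The thread does not say which annotation was intended, so this is purely an assumption for illustration, and `struct demo_uapi_record` is a hypothetical type, not a real uapi structure. The attributes fix the 32-bit member at offset 2, as the 2-byte m68k ABI would place it, regardless of the compiler's default int alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical record whose layout must not change with the default
 * alignment: packed removes padding, aligned(2) restores 2-byte alignment
 * for the structure as a whole.
 */
struct demo_uapi_record {
    uint16_t tag;
    uint32_t value;
} __attribute__((packed, aligned(2)));

/* The layout is verified at build time on every target. */
static_assert(offsetof(struct demo_uapi_record, value) == 2,
              "value must stay at offset 2");
static_assert(sizeof(struct demo_uapi_record) == 6,
              "record must stay 6 bytes long");

static size_t demo_uapi_value_offset(void)
{
    return offsetof(struct demo_uapi_record, value);
}
```

A drawback of this approach is that `packed` also changes how the compiler may access the members, which may be part of why a dedicated attribute was suggested in the follow-up.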

Kent Overstreet

11:57 a.m.

On Wed, Sep 10, 2025 at 09:36:34AM +0200, Geert Uytterhoeven wrote:

...

On Wed, 10 Sept 2025 at 02:07, Finn Thain fthain@linux-m68k.org wrote:

...
On Tue, 9 Sep 2025, Kent Overstreet wrote:

...
On Tue, Sep 09, 2025 at 10:52:43PM +0800, Lance Yang wrote:

...
From: Lance Yang lance.yang@linux.dev

The blocker tracking mechanism assumes that lock pointers are at least 4-byte aligned to use their lower bits for type encoding.

However, as reported by Eero Tamminen, some architectures like m68k only guarantee 2-byte alignment of 32-bit values. This breaks the assumption and causes two related WARN_ON_ONCE checks to trigger.

Isn't m68k the only architecture that's weird like this?

No. Historically, Linux/CRIS did not naturally align integer types either. AFAIK, there's no standard that demands natural alignment of integer types. Linux ABIs differ significantly.

For example, Linux/i386 does not naturally align long longs. Therefore, x86 may be expected to become the next m68k (or CRIS) unless such assumptions are avoided and alignment requirements are made explicit.

The real problem here is the algorithm. Some under-resourced distros choose to blame the ABI instead of the algorithm, because in doing so, they are freed from having to work to improve upstream code bases.

IMHO, good C doesn't make alignment assumptions, because that hinders source code portability and reuse, as well as algorithm extensibility. We've seen it before. The issue here [1] is no different from the pointer abuse which we fixed in Cpython [2].

Linux is probably the only non-trivial program that could be feasibly rebuilt with -malign-int without ill effect (i.e. without breaking userland) but that sort of workaround would not address the root cause (i.e. algorithms with bad assumptions).

The first step to preserve compatibility with userland would be to properly annotate the few uapi definitions that would change with -malign-int otherwise. I am still waiting for these patches...

I think it'd need a new gcc attribute to do it sanely...

Andrew Morton

7 Oct

8:56 p.m.

Getting back to the $Subject at hand, are people OK with proceeding with Lance's original fix?

From: Lance Yang lance.yang@linux.dev
Subject: hung_task: fix warnings caused by unaligned lock pointers
Date: Tue, 9 Sep 2025 22:52:43 +0800

From: Lance Yang lance.yang@linux.dev

The blocker tracking mechanism assumes that lock pointers are at least 4-byte aligned to use their lower bits for type encoding.

However, as reported by Eero Tamminen, some architectures like m68k only guarantee 2-byte alignment of 32-bit values. This breaks the assumption and causes two related WARN_ON_ONCE checks to trigger.

To fix this, the runtime checks are adjusted to silently ignore any lock that is not 4-byte aligned, effectively disabling the feature in such cases and avoiding the related warnings.

Thanks to Geert Uytterhoeven for bisecting!

Link: https://lkml.kernel.org/r/20250909145243.17119-1-lance.yang@linux.dev
Fixes: e711faaafbe5 ("hung_task: replace blocker_mutex with encoded blocker")
Signed-off-by: Lance Yang lance.yang@linux.dev
Reported-by: Eero Tamminen oak@helsinkinet.fi
Closes: https://lore.kernel.org/lkml/CAMuHMdW7Ab13DdGs2acMQcix5ObJK0O2dG_Fxzr8_g58Rc...
Reviewed-by: Masami Hiramatsu (Google) mhiramat@kernel.org
Cc: John Paul Adrian Glaubitz glaubitz@physik.fu-berlin.de
Cc: Anna Schumaker anna.schumaker@oracle.com
Cc: Boqun Feng boqun.feng@gmail.com
Cc: Finn Thain fthain@linux-m68k.org
Cc: Geert Uytterhoeven geert@linux-m68k.org
Cc: Ingo Molnar mingo@redhat.com
Cc: Joel Granados joel.granados@kernel.org
Cc: John Stultz jstultz@google.com
Cc: Kent Overstreet kent.overstreet@linux.dev
Cc: Lance Yang lance.yang@linux.dev
Cc: Mingzhe Yang mingzhe.yang@ly.com
Cc: Peter Zijlstra peterz@infradead.org
Cc: Sergey Senozhatsky senozhatsky@chromium.org
Cc: Steven Rostedt rostedt@goodmis.org
Cc: Tomasz Figa tfiga@chromium.org
Cc: Waiman Long longman@redhat.com
Cc: Will Deacon will@kernel.org
Cc: Yongliang Gao leonylgao@tencent.com
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
---

 include/linux/hung_task.h | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

--- a/include/linux/hung_task.h~hung_task-fix-warnings-caused-by-unaligned-lock-pointers
+++ a/include/linux/hung_task.h
@@ -20,6 +20,10 @@
  * always zero. So we can use these bits to encode the specific blocking
  * type.
  *
+ * Note that on architectures where this is not guaranteed, or for any
+ * unaligned lock, this tracking mechanism is silently skipped for that
+ * lock.
+ *
  * Type encoding:
  * 00 - Blocked on mutex (BLOCKER_TYPE_MUTEX)
  * 01 - Blocked on semaphore (BLOCKER_TYPE_SEM)
@@ -45,7 +49,7 @@ static inline void hung_task_set_blocker
 	 * If the lock pointer matches the BLOCKER_TYPE_MASK, return
 	 * without writing anything.
 	 */
-	if (WARN_ON_ONCE(lock_ptr & BLOCKER_TYPE_MASK))
+	if (lock_ptr & BLOCKER_TYPE_MASK)
 		return;

 	WRITE_ONCE(current->blocker, lock_ptr | type);
@@ -53,8 +57,6 @@ static inline void hung_task_set_blocker

 static inline void hung_task_clear_blocker(void)
 {
-	WARN_ON_ONCE(!READ_ONCE(current->blocker));
-
 	WRITE_ONCE(current->blocker, 0UL);
 }

Finn Thain

8 Oct

12:40 a.m.

On Tue, 7 Oct 2025, Andrew Morton wrote:

...

Getting back to the $Subject at hand, are people OK with proceeding with Lance's original fix?

Lance's patch is probably more appropriate for -stable than the patch I proposed -- assuming a fix is needed for -stable.

Besides those two alternatives, there is also a workaround:

$ ./scripts/config -d DETECT_HUNG_TASK_BLOCKER

which may be acceptable to the interested parties (i.e. m68k users).

I don't have a preference. I'll leave it up to the bug reporters (Eero and Geert).

Lance Yang

3:03 a.m.

On 2025/10/8 08:40, Finn Thain wrote:

...

On Tue, 7 Oct 2025, Andrew Morton wrote:

...
Getting back to the $Subject at hand, are people OK with proceeding with Lance's original fix?

Lance's patch is probably more appropriate for -stable than the patch I proposed -- assuming a fix is needed for -stable.

Thanks!

Apart from that, I believe this fix is still needed for the hung task detector itself, to prevent unnecessary warnings in a few unexpected cases.

...

Besides those two alternatives, there is also a workaround:

$ ./scripts/config -d DETECT_HUNG_TASK_BLOCKER

which may be acceptable to the interested parties (i.e. m68k users).

I don't have a preference. I'll leave it up to the bug reporters (Eero and Geert).

Finn Thain

6:14 a.m.

On Wed, 8 Oct 2025, Lance Yang wrote:

...

On 2025/10/8 08:40, Finn Thain wrote:

...
On Tue, 7 Oct 2025, Andrew Morton wrote:

...
Getting back to the $Subject at hand, are people OK with proceeding with Lance's original fix?

Lance's patch is probably more appropriate for -stable than the patch I proposed -- assuming a fix is needed for -stable.

Thanks!

Apart from that, I believe this fix is still needed for the hung task detector itself, to prevent unnecessary warnings in a few unexpected cases.

Can you be more specific about those cases? A fix for a theoretical bug doesn't qualify for -stable branches. But if it's a fix for a real bug, I have misunderstood Andrew's question...

...

...
Besides those two alternatives, there is also a workaround:

$ ./scripts/config -d DETECT_HUNG_TASK_BLOCKER

which may be acceptable to the interested parties (i.e. m68k users).

I don't have a preference. I'll leave it up to the bug reporters (Eero and Geert).

Lance Yang

7:09 a.m.

On 2025/10/8 14:14, Finn Thain wrote:

...

On Wed, 8 Oct 2025, Lance Yang wrote:

...
On 2025/10/8 08:40, Finn Thain wrote:

...
On Tue, 7 Oct 2025, Andrew Morton wrote:

...
Getting back to the $Subject at hand, are people OK with proceeding with Lance's original fix?

Lance's patch is probably more appropriate for -stable than the patch I proposed -- assuming a fix is needed for -stable.

Thanks!

Apart from that, I believe this fix is still needed for the hung task detector itself, to prevent unnecessary warnings in a few unexpected cases.

Can you be more specific about those cases? A fix for a theoretical bug doesn't qualify for -stable branches. But if it's a fix for a real bug, I have misunderstood Andrew's question...

I believe it is a real bug, as it was reported by Eero and Geert[1].

The blocker tracking mechanism in -stable assumes that lock pointers are at least 4-byte aligned. As I mentioned previously[2], this assumption fails for packed structs on architectures that don't trap on unaligned access.

Of course, we could always improve the mechanism to not make assumptions. But for -stable, this fix completely resolves the issue by ignoring any unaligned pointer, whatever the cause (e.g., packed structs, non-native alignment, etc.).
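For readers following along, the encoding and the "ignore unaligned pointers" behaviour can be sketched in userspace C roughly as below. This is only an illustration under stated assumptions, not the kernel code: the global `blocker` stands in for `current->blocker`, the helper names `set_blocker()`, `blocker_lock()` and `blocker_type()` are hypothetical, and the constant values are assumed from the type table in the patch comment (00 for mutex, 01 for semaphore, two low bits for the mask).

```c
#include <stdint.h>

/* Assumed values, mirroring the type table in the patch comment. */
#define BLOCKER_TYPE_MUTEX 0x0UL
#define BLOCKER_TYPE_SEM   0x1UL
#define BLOCKER_TYPE_MASK  0x3UL

/* Hypothetical stand-in for current->blocker. */
static unsigned long blocker;

/*
 * Record the lock and its type in one word.
 * Returns 1 if the lock was recorded, 0 if it was silently skipped.
 */
static inline int set_blocker(const void *lock, unsigned long type)
{
	unsigned long lock_ptr = (unsigned long)(uintptr_t)lock;

	/*
	 * If either of the two low bits is set, the pointer is not
	 * 4-byte aligned and there is no room for the type: skip it.
	 */
	if (lock_ptr & BLOCKER_TYPE_MASK)
		return 0;

	blocker = lock_ptr | type;
	return 1;
}

/* Recover the lock address by masking off the type bits. */
static inline void *blocker_lock(void)
{
	return (void *)(uintptr_t)(blocker & ~BLOCKER_TYPE_MASK);
}

/* Recover the type from the two low bits. */
static inline unsigned long blocker_type(void)
{
	return blocker & BLOCKER_TYPE_MASK;
}
```

With a 4-byte aligned address the pointer and type round-trip through the single word; with an address that is only 2-byte aligned, `set_blocker()` returns early and the previously stored value is left untouched, which is the "silently skipped" case.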

So we can all sleep well at night again :)

[1] https://lore.kernel.org/lkml/CAMuHMdW7Ab13DdGs2acMQcix5ObJK0O2dG_Fxzr8_g58Rc...
[2] https://lore.kernel.org/lkml/cfb62b9d-9cbd-47dd-a894-3357027e2a50@linux.dev/

...

...
...
Besides those two alternatives, there is also a workaround:

$ ./scripts/config -d DETECT_HUNG_TASK_BLOCKER

which may be acceptable to the interested parties (i.e. m68k users).

I don't have a preference. I'll leave it up to the bug reporters (Eero and Geert).

Lance Yang

7:23 a.m.

On 2025/10/8 15:09, Lance Yang wrote:

...

On 2025/10/8 14:14, Finn Thain wrote:

...
On Wed, 8 Oct 2025, Lance Yang wrote:

...
On 2025/10/8 08:40, Finn Thain wrote:

...
On Tue, 7 Oct 2025, Andrew Morton wrote:

...
Getting back to the $Subject at hand, are people OK with proceeding with Lance's original fix?

Lance's patch is probably more appropriate for -stable than the patch I proposed -- assuming a fix is needed for -stable.

Thanks!

Apart from that, I believe this fix is still needed for the hung task detector itself, to prevent unnecessary warnings in a few unexpected cases.

Can you be more specific about those cases? A fix for a theoretical bug doesn't qualify for -stable branches. But if it's a fix for a real bug, I have misunderstood Andrew's question...

I believe it is a real bug, as it was reported by Eero and Geert[1].

The blocker tracking mechanism in -stable assumes that lock pointers are at least 4-byte aligned. As I mentioned previously[2], this assumption fails for packed structs on architectures that don't trap on unaligned access.

Of course, we could always improve the mechanism to not make assumptions. But for -stable, this fix completely resolves the issue by ignoring any unaligned pointer, whatever the cause (e.g., packed structs, non-native alignment, etc.).

So we can all sleep well at night again :)

[1] https://lore.kernel.org/lkml/CAMuHMdW7Ab13DdGs2acMQcix5ObJK0O2dG_Fxzr8_g58Rc1_0g@mail.gmail.com/
[2] https://lore.kernel.org/lkml/cfb62b9d-9cbd-47dd-a894-3357027e2a50@linux.dev/

Forgot to add:

In other words, we are not just fixing the bug reported by Eero and Geert, but correcting the blocker tracking mechanism's flawed assumption for -stable ;)

If you feel this doesn't qualify as a fix, I can change the Fixes: tag to point to the original commit that introduced this flawed mechanism instead.

...

...
...
...
Besides those two alternatives, there is also a workaround:

$ ./scripts/config -d DETECT_HUNG_TASK_BLOCKER

which may be acceptable to the interested parties (i.e. m68k users).

I don't have a preference. I'll leave it up to the bug reporters (Eero and Geert).

Finn Thain

10:12 a.m.

On Wed, 8 Oct 2025, Lance Yang wrote:

...

In other words, we are not just fixing the bug reported by Eero and Geert, but correcting the blocker tracking mechanism's flawed assumption for -stable ;)

If you feel this doesn't qualify as a fix, I can change the Fixes: tag to point to the original commit that introduced this flawed mechanism instead.

That's really a question for the bug reporters. I don't personally have a problem with CONFIG_DETECT_HUNG_TASK_BLOCKER so I can't say whether the fix meets the requirements set in Documentation/process/stable-kernel-rules.rst. And I still don't know what's meant by "unnecessary warnings in a few unexpected cases".

Lance Yang

1:48 p.m.

On 2025/10/8 18:12, Finn Thain wrote:

...

On Wed, 8 Oct 2025, Lance Yang wrote:

...
In other words, we are not just fixing the bug reported by Eero and Geert, but correcting the blocker tracking mechanism's flawed assumption for -stable ;)

If you feel this doesn't qualify as a fix, I can change the Fixes: tag to point to the original commit that introduced this flawed mechanism instead.

That's really a question for the bug reporters. I don't personally have a problem with CONFIG_DETECT_HUNG_TASK_BLOCKER so I can't say whether the fix meets the requirements set in Documentation/process/stable-kernel-rules.rst. And I still don't know

I'm a bit confused, as I recall you previously stating that "It's wrong and should be fixed"[1].

To clarify, is your current position that it should be fixed in general, but the fix should not be backported to -stable?

If so, then I have nothing further to add to this thread and am happy to let the maintainer @Andrew decide.

...

what's meant by "unnecessary warnings in a few unexpected cases".

The blocker tracking mechanism will trigger a warning when it encounters any unaligned lock pointer (e.g., from a packed struct). I don't think that is the expected behavior. Instead, it should simply skip any unaligned pointer it cannot handle. For the stable kernels, at least, this is the correct behavior.

[1] https://lore.kernel.org/lkml/6ec95c3f-365b-e352-301b-94ab3d8af73c@linux-m68k...

Finn Thain

9:55 p.m.

On Wed, 8 Oct 2025, Lance Yang wrote:

...

On 2025/10/8 18:12, Finn Thain wrote:

...
On Wed, 8 Oct 2025, Lance Yang wrote:

...
In other words, we are not just fixing the bug reported by Eero and Geert, but correcting the blocker tracking mechanism's flawed assumption for -stable ;)

If you feel this doesn't qualify as a fix, I can change the Fixes: tag to point to the original commit that introduced this flawed mechanism instead.

That's really a question for the bug reporters. I don't personally have a problem with CONFIG_DETECT_HUNG_TASK_BLOCKER so I can't say whether the fix meets the requirements set in Documentation/process/stable-kernel-rules.rst. And I still don't know

I'm a bit confused, as I recall you previously stating that "It's wrong and should be fixed"[1].

You took that quote out of context. Please go and read it again.

...

To clarify, is your current position that it should be fixed in general, but the fix should not be backported to -stable?

To clarify, what do you mean by "it"? Is it the commentary discussed in [1]? The misalignment of atomics? The misalignment of locks? The alignment assumptions in your code? The WARN reported by Eero and Geert?

...

If so, then I have nothing further to add to this thread and am happy to let the maintainer @Andrew decide.

...
what's meant by "unnecessary warnings in a few unexpected cases".

The blocker tracking mechanism will trigger a warning when it encounters any unaligned lock pointer (e.g., from a packed struct). I don't think that is the expected behavior.

Sure, no-one was expecting false positives.

I think you are conflating "misaligned" with "not 4-byte aligned". Your algorithm does not strictly require natural alignment, it requires 4-byte alignment of locks.
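The distinction between "naturally aligned" and "4-byte aligned" can be illustrated with a small layout sketch. It rests on assumptions: `struct demo_lock` is a hypothetical 4-byte stand-in for a lock, and the GCC/Clang `packed` attribute is used only as a portable way to reproduce, on a host that aligns 32-bit values to 4 bytes, the kind of offset that a 2-byte alignment rule can yield. It is not meant to suggest that such packed locks exist in the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-byte "lock" used only for this layout demonstration. */
struct demo_lock {
	uint32_t word;
};

/* Default layout: the compiler pads so that the lock is naturally aligned. */
struct holder_natural {
	uint16_t tag;
	struct demo_lock lock;
};

/*
 * Packed layout (GCC/Clang extension): no padding after the 2-byte tag,
 * so the lock sits at offset 2, similar to what a 2-byte alignment rule
 * for 32-bit values permits.
 */
struct holder_packed {
	uint16_t tag;
	struct demo_lock lock;
} __attribute__((packed));

/* The tagging scheme needs the two low bits of the address to be zero. */
static inline int low_bits_free(size_t addr_or_offset)
{
	return (addr_or_offset & 0x3) == 0;
}
```

In the packed variant the lock lands at offset 2, so even when the containing object starts on a 4-byte boundary the lock's address has a low bit set and cannot carry a type tag. In the default variant the offset is a multiple of whatever the ABI uses as the alignment of `uint32_t`, which is 4 on many architectures but need not be.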

Regarding your concern about packed structs, please re-read this message: https://lore.kernel.org/all/CAMuHMdV-AtPm-W-QUC1HixJ8Koy_HdESwCCOhRs3Q26=wjW...

AFAIK the problem with your code is nothing more than the usual difficulty encountered when porting between architectures that have different alignment rules for scalar variables.

Therefore, my question about the theoretical nature of the problem comes down to this.

Is the m68k architecture the only one producing actual false positives?

Do you know of actual instances of locks in packed structs?

...

Instead, it should simply skip any unaligned pointer it cannot handle. For the stable kernels, at least, this is the correct behavior.

Why? Are users of the stable branch actually affected?

...

[1] https://lore.kernel.org/lkml/6ec95c3f-365b-e352-301b-94ab3d8af73c@linux-m68k...

Lance Yang

9 Oct 2025

2:01 a.m.

@Andrew, what's your call on this?

I think we fundamentally disagree on whether this fix for known false-positive warnings is needed for -stable.

Rather than continuing this thread, let's just ask the maintainer.

Thanks, Lance

On 2025/10/9 05:55, Finn Thain wrote:

...

On Wed, 8 Oct 2025, Lance Yang wrote:

...
On 2025/10/8 18:12, Finn Thain wrote:

...
On Wed, 8 Oct 2025, Lance Yang wrote:

...
In other words, we are not just fixing the bug reported by Eero and Geert, but correcting the blocker tracking mechanism's flawed assumption for -stable ;)

If you feel this doesn't qualify as a fix, I can change the Fixes: tag to point to the original commit that introduced this flawed mechanism instead.

That's really a question for the bug reporters. I don't personally have a problem with CONFIG_DETECT_HUNG_TASK_BLOCKER so I can't say whether the fix meets the requirements set in Documentation/process/stable-kernel-rules.rst. And I still don't know

I'm a bit confused, as I recall you previously stating that "It's wrong and should be fixed"[1].

You took that quote out of context. Please go and read it again.

...
To clarify, is your current position that it should be fixed in general, but the fix should not be backported to -stable?

To clarify, what do you mean by "it"? Is it the commentary discussed in [1]? The misalignment of atomics? The misalignment of locks? The alignment assumptions in your code? The WARN reported by Eero and Geert?

...
If so, then I have nothing further to add to this thread and am happy to let the maintainer @Andrew decide.

...
what's meant by "unnecessary warnings in a few unexpected cases".

The blocker tracking mechanism will trigger a warning when it encounters any unaligned lock pointer (e.g., from a packed struct). I don't think that is the expected behavior.

Sure, no-one was expecting false positives.

I think you are conflating "misaligned" with "not 4-byte aligned". Your algorithm does not strictly require natural alignment, it requires 4-byte alignment of locks.

Regarding your concern about packed structs, please re-read this message: https://lore.kernel.org/all/CAMuHMdV-AtPm-W-QUC1HixJ8Koy_HdESwCCOhRs3Q26=wjW...

AFAIK the problem with your code is nothing more than the usual difficulty encountered when porting between architectures that have different alignment rules for scalar variables.

Therefore, my question about the theoretical nature of the problem comes down to this.

Is the m68k architecture the only one producing actual false positives?

Do you know of actual instances of locks in packed structs?

...
Instead, it should simply skip any unaligned pointer it cannot handle. For the stable kernels, at least, this is the correct behavior.

Why? Are users of the stable branch actually affected?

...
[1] https://lore.kernel.org/lkml/6ec95c3f-365b-e352-301b-94ab3d8af73c@linux-m68k...

Andrew Morton

4:04 a.m.

On Thu, 9 Oct 2025 10:01:18 +0800 Lance Yang lance.yang@linux.dev wrote:

...

I think we fundamentally disagree on whether this fix for known false-positive warnings is needed for -stable.

Having the kernel send scary warnings to our users is really bad behavior. And if we don't fix it, people will keep reporting it.

And removing a WARN_ON is a perfectly good way of fixing it. The kernel has 19,000 WARNs, probably seven of which are useful :(

From: Lance Yang lance.yang@linux.dev Subject: hung_task: fix warnings caused by unaligned lock pointers Date: Tue, 9 Sep 2025 22:52:43 +0800

From: Lance Yang lance.yang@linux.dev

The blocker tracking mechanism assumes that lock pointers are at least 4-byte aligned to use their lower bits for type encoding.

However, as reported by Eero Tamminen, some architectures like m68k only guarantee 2-byte alignment of 32-bit values. This breaks the assumption and causes two related WARN_ON_ONCE checks to trigger.

To fix this, the runtime checks are adjusted to silently ignore any lock that is not 4-byte aligned, effectively disabling the feature in such cases and avoiding the related warnings.

Thanks to Geert Uytterhoeven for bisecting!

 include/linux/hung_task.h | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

 	WRITE_ONCE(current->blocker, lock_ptr | type);
@@ -53,8 +57,6 @@ static inline void hung_task_set_blocker

 static inline void hung_task_clear_blocker(void)
 {
-	WARN_ON_ONCE(!READ_ONCE(current->blocker));
-
 	WRITE_ONCE(current->blocker, 0UL);
 }

Geert Uytterhoeven

7:11 a.m.

Hi Andrew,

On Thu, 9 Oct 2025 at 06:04, Andrew Morton akpm@linux-foundation.org wrote:

...

On Thu, 9 Oct 2025 10:01:18 +0800 Lance Yang lance.yang@linux.dev wrote:

...
I think we fundamentally disagree on whether this fix for known false-positive warnings is needed for -stable.

Having the kernel send scary warnings to our users is really bad behavior. And if we don't fix it, people will keep reporting it.

As the issue is present in v6.16 and v6.17, I think that warrants -stable.

...

And removing a WARN_ON is a perfectly good way of fixing it. The kernel has 19,000 WARNs, probably seven of which are useful :(

Right. And there is panic_on_warn...

Gr{oetje,eeting}s,

Geert

-- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds

David Laight

14 Oct 2025

10:11 a.m.

On Thu, 9 Oct 2025 09:11:06 +0200 Geert Uytterhoeven geert@linux-m68k.org wrote:

...

Hi Andrew,

On Thu, 9 Oct 2025 at 06:04, Andrew Morton akpm@linux-foundation.org wrote:

...
On Thu, 9 Oct 2025 10:01:18 +0800 Lance Yang lance.yang@linux.dev wrote:

...
I think we fundamentally disagree on whether this fix for known false-positive warnings is needed for -stable.

Having the kernel send scary warnings to our users is really bad behavior. And if we don't fix it, people will keep reporting it.

As the issue is present in v6.16 and v6.17, I think that warrants -stable.

...
And removing a WARN_ON is a perfectly good way of fixing it. The kernel has 19,000 WARNs, probably seven of which are useful :(

Right. And there is panic_on_warn...

Which, like panic_on_oops, panics before syslogd has a chance to write the error message to /var/log/kernel. Both are set in some environments.

Tracking down those crashes is a right PITA.

David

...

Gr{oetje,eeting}s,
                    Geert
-- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds

Eero Tamminen

8 Oct 2025

12:56 p.m.

Hi,

On 10/8/25 03:40, Finn Thain wrote:

...

On Tue, 7 Oct 2025, Andrew Morton wrote:

...
Getting back to the $Subject at hand, are people OK with proceeding with Lance's original fix?

Lance's patch is probably more appropriate for -stable than the patch I proposed -- assuming a fix is needed for -stable.

Besides those two alternatives, there is also a workaround:

$ ./scripts/config -d DETECT_HUNG_TASK_BLOCKER

which may be acceptable to the interested parties (i.e. m68k users).

I don't have a preference. I'll leave it up to the bug reporters (Eero and Geert).

It's good for me.

- Eero

linux-stable-mirror@lists.linaro.org

32 comments

participants (9)

Andreas Schwab
Andrew Morton
David Laight
Eero Tamminen
Finn Thain
Geert Uytterhoeven
John Paul Adrian Glaubitz
Kent Overstreet
Lance Yang