Hello, colleagues.
Commit f2d5dcb48f7ba9e3ff249d58fc1fa963d374e66a "bounds: support non-power-of-two CONFIG_NR_CPUS" (https://github.com/torvalds/linux/commit/f2d5dcb48f7ba9e3ff249d58fc1fa963d37...) was backported to 6.1.x-stable (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...), but it causes a serious regression, kernel panics, on quite a lot of hardware with AMD GPUs.
It was backported to 6.1.84: 6.1.84 has the problem, 6.1.83 does not, and the newest 6.1.88 still has it.
The problem is described here: https://gitlab.freedesktop.org/drm/amd/-/issues/3347
On Sun, Apr 28, 2024 at 05:58:08PM +0300, Mikhail Novosyolov wrote:
> Hello, colleagues.
> Commit f2d5dcb48f7ba9e3ff249d58fc1fa963d374e66a "bounds: support non-power-of-two CONFIG_NR_CPUS" (https://github.com/torvalds/linux/commit/f2d5dcb48f7ba9e3ff249d58fc1fa963d37...) was backported to 6.1.x-stable (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...), but causes a serious regression on quite a lot of hardware with AMD GPUs, kernel panics.
> It was backported to 6.1.84, 6.1.84 has problems, 6.1.83 does not, the newest 6.1.88 still has this problem.
Does v6.8.3 (which contains cf778fff03be) have this problem? How about current Linus master?
What kernel config were you using? I don't see that info on https://linux-hardware.org/?probe=9c92ac1222 (maybe my tired eyes can't see it)
(Resending in plain text, sorry for accidentally sending in HTML)
----- Original Message -----
From: "Matthew Wilcox" willy@infradead.org
To: "Михаил Новоселов" m.novosyolov@rosalinux.ru
Cc: riel@surriel.com, mgorman@techsingularity.net, peterz@infradead.org, mingo@kernel.org, akpm@linux-foundation.org, stable@vger.kernel.org, sashal@kernel.org, "Бетхер Александр" a.betkher@rosalinux.ru, "i gaptrakhmanov" i.gaptrakhmanov@rosalinux.ru
Sent: Monday, 29 April 2024 5:56:29
Subject: Re: Serious regression on 6.1.x-stable caused by "bounds: support non-power-of-two CONFIG_NR_CPUS"
> On Sun, Apr 28, 2024 at 05:58:08PM +0300, Mikhail Novosyolov wrote:
> > Hello, colleagues.
> > Commit f2d5dcb48f7ba9e3ff249d58fc1fa963d374e66a "bounds: support non-power-of-two CONFIG_NR_CPUS" (https://github.com/torvalds/linux/commit/f2d5dcb48f7ba9e3ff249d58fc1fa963d37...) was backported to 6.1.x-stable (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...), but causes a serious regression on quite a lot of hardware with AMD GPUs, kernel panics.
> > It was backported to 6.1.84, 6.1.84 has problems, 6.1.83 does not, the newest 6.1.88 still has this problem.
> Does v6.8.3 (which contains cf778fff03be) have this problem? How about current Linus master?
6.1.88 - has problem
6.6.27 - does not have problem
6.9-rc from commit efdfbbc4dcc8f98754056971f88af0f7ff906144 https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git - does not have problem
6.8.3 was not tested, but we can test it if needed.
> What kernel config were you using? I don't see that info on https://linux-hardware.org/?probe=9c92ac1222 (maybe my tired eyes can't see it)
Kernel config for 6.1: https://abf.io/import/kernel-6.1/blob/bcb3e9611f/kernel-x86_64.config
For 6.6: https://abf.io/import/kernel-6.6/blob/7404a4d3d5/kernel-x86_64.config
6.9-rc was built with the config copy-pasted from 6.6 (https://abf.io/build_lists/5028240)
On Mon, Apr 29, 2024 at 07:07:39AM +0300, Михаил Новоселов wrote:
> > > It was backported to 6.1.84, 6.1.84 has problems, 6.1.83 does not, the newest 6.1.88 still has this problem.
> > Does v6.8.3 (which contains cf778fff03be) have this problem? How about current Linus master?
> 6.1.88 - has problem
> 6.6.27 - does not have problem
> 6.9-rc from commit efdfbbc4dcc8f98754056971f88af0f7ff906144 https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git - does not have problem
> 6.8.3 was not tested, but we can test it if needed.
How curious.
> > What kernel config were you using? I don't see that info on https://linux-hardware.org/?probe=9c92ac1222 (maybe my tired eyes can't see it)
> Kernel config for 6.1: https://abf.io/import/kernel-6.1/blob/bcb3e9611f/kernel-x86_64.config
CONFIG_NR_CPUS=8192
> For 6.6: https://abf.io/import/kernel-6.6/blob/7404a4d3d5/kernel-x86_64.config
CONFIG_NR_CPUS=8192
Since you're using a power-of-two, this should have been a no-op. But bits_per() doesn't work the way I thought it did!
	#define bits_per(n)				\
	(						\
		__builtin_constant_p(n) ? (		\
			((n) == 0 || (n) == 1)		\
				? 1 : ilog2(n) + 1	\
		) :					\
CONFIG_NR_CPUS is obviously a constant, and larger than 1, so we end up calling ilog2(n) + 1. So we allocate one extra bit.
I should have changed this to DEFINE(NR_CPUS_BITS, bits_per(CONFIG_NR_CPUS - 1))
Can you test that and report back? I'll prepare a fix for mainline in the meantime.
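To make the off-by-one above concrete, here is a minimal user-space sketch, not kernel code. ilog2_sketch() and bits_per_sketch() are hypothetical stand-ins for the kernel's ilog2() and for the constant branch of bits_per() as quoted above; they are assumptions for illustration only.

```c
/*
 * User-space stand-in for the kernel's ilog2(): floor(log2(n)) for n >= 1.
 * Hypothetical helper, written as a plain loop for clarity.
 */
unsigned int ilog2_sketch(unsigned long n)
{
	unsigned int log = 0;

	while (n >>= 1)
		log++;
	return log;
}

/*
 * Mirrors only the constant branch of bits_per() quoted above:
 * 0 and 1 need one bit, anything else needs ilog2(n) + 1 bits.
 */
unsigned int bits_per_sketch(unsigned long n)
{
	return (n == 0 || n == 1) ? 1 : ilog2_sketch(n) + 1;
}
```

With CONFIG_NR_CPUS=8192, bits_per_sketch(8192) is 14, while bits_per_sketch(8192 - 1) is 13, the same as ilog2_sketch(8192), so the power-of-two case becomes a no-op again. For a non-power-of-two such as 48 (an example value, not taken from the reports), bits_per_sketch(48) and bits_per_sketch(47) are both 6, which is enough to hold CPU numbers 0..47.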
29.04.2024 15:17, Matthew Wilcox wrote:
> CONFIG_NR_CPUS=8192
> Since you're using a power-of-two, this should have been a no-op. But bits_per() doesn't work the way I thought it did!
> 	#define bits_per(n)				\
> 	(						\
> 		__builtin_constant_p(n) ? (		\
> 			((n) == 0 || (n) == 1)		\
> 				? 1 : ilog2(n) + 1	\
> 		) :					\
> CONFIG_NR_CPUS is obviously a constant, and larger than 1, so we end up calling ilog2(n) + 1. So we allocate one extra bit.
> I should have changed this to DEFINE(NR_CPUS_BITS, bits_per(CONFIG_NR_CPUS - 1))
> Can you test that and report back? I'll prepare a fix for mainline in the meantime.
Yes, this fix solved the problem
linux-stable-mirror@lists.linaro.org