Hello,
After upgrading to 6.16.9 this morning, my laptop can't boot. I cannot get any logs because the kernel seems to freeze very early, even before I'm asked for the full disk encryption passphrase.
This is a regression from 6.16.8 to 6.16.9.
I did a git bisect in the stable/linux and this is the commit causing the issue for me:
97207a4fed5348ff5c5e71a7300db9b638640879 is the first bad commit
commit 97207a4fed5348ff5c5e71a7300db9b638640879 (HEAD)
Author: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Date:   Wed Jun 25 13:54:06 2025 -0700

    drm/xe/guc: Enable extended CAT error reporting

    [ Upstream commit a7ffcea8631af91479cab10aa7fbfd0722f01d9a ]
https://lore.kernel.org/all/20250625205405.1653212-3-daniele.ceraolospurio@i...
How to reproduce:
1. Upgrade to 6.16.9
2. Enable the Xe driver by passing i915.force_probe=!7d55 xe.force_probe=7d55
3. Reboot
Best regards, Iyán
Hi! Thx for your report.
> Hello,
> After upgrading to 6.16.9 this morning, my laptop can't boot. I cannot get any logs because the kernel seems to freeze very early, even before I'm asked for the full disk encryption passphrase.
> This is a regression from 6.16.8 to 6.16.9.
Does 6.17-rc7 work for you? We need to know if this needs to be fixed in just the stable tree or if it is something that needs to be addressed in mainline as well.
> I did a git bisect in the stable/linux and this is the commit causing the issue for me:
> 97207a4fed5348ff5c5e71a7300db9b638640879 is the first bad commit
> commit 97207a4fed5348ff5c5e71a7300db9b638640879 (HEAD)
> Author: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Date:   Wed Jun 25 13:54:06 2025 -0700
>
>     drm/xe/guc: Enable extended CAT error reporting
>
>     [ Upstream commit a7ffcea8631af91479cab10aa7fbfd0722f01d9a ]
> https://lore.kernel.org/all/20250625205405.1653212-3-daniele.ceraolospurio@intel.com/
> How to reproduce:
> - Upgrade to 6.16.9
> - Enable the Xe driver by passing i915.force_probe=!7d55
>   xe.force_probe=7d55
Just wondering: why are those parameters needed? Is the hardware not fully supported by the xe driver yet?
Ciao, Thorsten
Hello,
On 27/09/2025 14:52, Thorsten Leemhuis wrote:
> Does 6.17-rc7 work for you? We need to know if this needs to be fixed in just the stable tree or if it is something that needs to be addressed in mainline as well.
6.17-rc7 boots fine, so the issue is just in the stable tree.
> Just wondering: why are those parameters needed? Is the hardware not fully supported by the xe driver yet?
i915 is still the default driver for Meteor Lake integrated GPUs, so that's why I need to pass those parameters. Lunar Lake already uses Xe by default, though. In my experience, for at least the last two kernel release cycles, I've observed better battery life and performance using the Xe driver.
I do not have concrete numbers about power usage and battery life, just my (subjective) feeling after using the laptop for a week with i915 and another week with Xe. About performance, Michael from Phoronix recently shared some benchmarks with a Core Ultra 7 155H (same CPU as the Thinkpad X1 Carbon Gen 12):
https://www.phoronix.com/review/intel-mtl-i915-xe-linux
> Ciao, Thorsten
Best, Iyán
On 27.09.25 16:19, Iyán Méndez Veiga wrote:
> On 27/09/2025 14:52, Thorsten Leemhuis wrote:
> > Does 6.17-rc7 work for you? We need to know if this needs to be fixed in just the stable tree or if it is something that needs to be addressed in mainline as well.
> 6.17-rc7 boots fine, so the issue is just in the stable tree.
Thx. Could you also try if reverting the patch from 6.16.y helps? Note, you might need to revert "drm/xe/guc: Set RCS/CCS yield policy" as well, which apparently depends on the patch that causes your problems.
> > Just wondering: why are those parameters needed? Is the hardware not fully supported by the xe driver yet?
> i915 is still the default driver for Meteor Lake integrated GPUs, so that's why I need to pass those parameters. Lunar Lake already uses Xe by default, though. In my experience, for at least the last two kernel release cycles, I've observed better battery life and performance using the Xe driver.
Thx for letting us know!
Ciao, Thorsten
On 27/09/2025 16:31, Thorsten Leemhuis wrote:
> Thx. Could you also try if reverting the patch from 6.16.y helps? Note, you might need to revert "drm/xe/guc: Set RCS/CCS yield policy" as well, which apparently depends on the patch that causes your problems.
Yes, reverting both dd1a415dcfd5 "drm/xe/guc: Set RCS/CCS yield policy" and 97207a4fed53 "drm/xe/guc: Enable extended CAT error reporting" from 6.16.y fixes the issue for me.
Best, Iyán
On Sun, Sep 28, 2025 at 01:16:34PM +0200, Iyán Méndez Veiga wrote:
> On 27/09/2025 16:31, Thorsten Leemhuis wrote:
> > Thx. Could you also try if reverting the patch from 6.16.y helps? Note, you might need to revert "drm/xe/guc: Set RCS/CCS yield policy" as well, which apparently depends on the patch that causes your problems.
> Yes, reverting both dd1a415dcfd5 "drm/xe/guc: Set RCS/CCS yield policy" and 97207a4fed53 "drm/xe/guc: Enable extended CAT error reporting" from 6.16.y fixes the issue for me.
Thanks for the report and investigation!
I'll revert these two.
On 9/28/2025 8:40 AM, Sasha Levin wrote:
> On Sun, Sep 28, 2025 at 01:16:34PM +0200, Iyán Méndez Veiga wrote:
> > On 27/09/2025 16:31, Thorsten Leemhuis wrote:
> > > Thx. Could you also try if reverting the patch from 6.16.y helps? Note, you might need to revert "drm/xe/guc: Set RCS/CCS yield policy" as well, which apparently depends on the patch that causes your problems.
> > Yes, reverting both dd1a415dcfd5 "drm/xe/guc: Set RCS/CCS yield policy" and 97207a4fed53 "drm/xe/guc: Enable extended CAT error reporting" from 6.16.y fixes the issue for me.
> Thanks for the report and investigation!
> I'll revert these two.
Hi,
Thanks for the bisect and the quick turnaround on this (and sorry for not replying earlier, I just came back from vacation :) ). Just wanted to add a quick comment as the author of both patches. I have no idea why these patches would cause issues on 6.16 but not on 6.17, nothing significant should be different between the two releases in the impacted area. However, no one has actually ever reported hitting the starvation issue mitigated by the RCS/CCS patch (which has been there since 6.13), likely because it can only be reproduced if the GPU is heavily overloaded by multiple apps; therefore, given that 6.16 is not an LTS, I'm not going to attempt to reproduce and debug this and re-send the patches for that kernel version. Please let me know if there are any concerns with this approach or if the issue pops up on 6.17.
Thanks Daniele
linux-stable-mirror@lists.linaro.org
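For readers who want to repeat the revert test discussed in this thread, here is a minimal sketch of one way it could be done on a local clone of the stable tree. The two abbreviated commit IDs are the ones quoted in the thread; the tag name v6.16.9, the branch name xe-revert-test, and the run helper are assumptions made for illustration and are not taken from the thread. The script only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch only: repeat the revert test from the thread on a local clone of
# the stable tree. Dry-run by default (commands are printed, not executed);
# set APPLY=1 to really run them.

# Assumption: the affected release is tagged v6.16.9 in the local clone.
BASE_TAG="v6.16.9"
# Assumption: hypothetical name for a throwaway test branch.
TEST_BRANCH="xe-revert-test"

# Hypothetical helper: print the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

revert_xe_guc_patches() {
    run git checkout -b "$TEST_BRANCH" "$BASE_TAG" || return 1
    # Revert the dependent patch first ("Set RCS/CCS yield policy"),
    # then the patch the bisect pointed at ("Enable extended CAT error reporting").
    run git revert --no-edit dd1a415dcfd5 || return 1
    run git revert --no-edit 97207a4fed53 || return 1
}

revert_xe_guc_patches
```

The dependent patch is reverted first so that the second revert is less likely to conflict. After both reverts, the kernel still has to be rebuilt and booted with the xe.force_probe parameters from the original report to confirm the result.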