#regzbot introduced: v6.12.34..v6.12.35
After upgrading to kernel 6.12.35, VFIO passthrough of my GPU stopped working in a Windows VM: the guest sees the device in Device Manager but reports that it did not start correctly. I compared lspci logs in the VM before and after the upgrade to 6.12.35, and here are the changes I noticed:
- the reported link speed for the passthrough GPU has changed from 2.5 to 16GT/s
- the passthrough GPU has lost its 'BusMaster' and MSI enable flags
- latency measurement feature appeared
These entries also began appearing in dmesg inside the VM when the host kernel is 6.12.35 or later:
[ 1.963177] nouveau 0000:01:00.0: sec2(gsp): mbox 1c503000 00000001
[ 1.963296] nouveau 0000:01:00.0: sec2(gsp):booter-load: boot failed: -5
...
[ 1.964580] nouveau 0000:01:00.0: gsp: init failed, -5
[ 1.964641] nouveau 0000:01:00.0: init failed with -5
[ 1.964681] nouveau: drm:00000000:00000080: init failed with -5
[ 1.964721] nouveau 0000:01:00.0: drm: Device allocation failed: -5
[ 1.966318] nouveau 0000:01:00.0: probe with driver nouveau failed with error -5
6.12.34 worked fine, and the latest 6.12 LTS release does not work either. I am using an Intel CPU and an NVIDIA GPU (for passthrough, and as the GPU of my Linux system).
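The lspci comparison described in the report can be sketched as below. This is only an illustration: the two excerpts are made-up text in the style of `lspci -vv` output (not the reporter's actual logs), and `extract_fields` and `changed_fields` are hypothetical helper names. It pulls out the three fields the report mentions by name and prints the ones that differ.

```python
import re

# Hypothetical excerpts in the style of `lspci -vv`; real captures would be
# taken inside the guest, once per host kernel.
BEFORE = """\
01:00.0 VGA compatible controller: NVIDIA Corporation Device
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV-
    Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        LnkSta: Speed 2.5GT/s, Width x16
"""
AFTER = """\
01:00.0 VGA compatible controller: NVIDIA Corporation Device
    Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV-
    Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        LnkSta: Speed 16GT/s, Width x16
"""

# One regular expression per field of interest; each has one capture group.
PATTERNS = {
    "BusMaster": r"BusMaster([+-])",
    "MSI Enable": r"MSI: Enable([+-])",
    "LnkSta Speed": r"LnkSta:\s*Speed\s+([\d.]+GT/s)",
}


def extract_fields(text):
    """Return the first match of each pattern (pass one device section)."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            fields[name] = match.group(1)
    return fields


def changed_fields(before, after):
    """Map each differing field name to a (before, after) pair."""
    old, new = extract_fields(before), extract_fields(after)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


if __name__ == "__main__":
    for name, (old, new) in changed_fields(BEFORE, AFTER).items():
        print(f"{name}: {old} -> {new}")
```

A plain `diff` of the two full captures shows the same information; the sketch only narrows the output to the flags named in the report.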
On Thu, Aug 07, 2025 at 03:31:17PM +0000, cat wrote:
> [...]
Can you use git bisect to find the offending commit?
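For readers unfamiliar with bisection, the idea behind the request can be sketched as a binary search over an ordered commit range. This is a simplified model, not how git is implemented: `first_bad` and `is_bad` are hypothetical names, the range of 400 commits and the culprit position are invented numbers, and in practice `is_bad` means building and booting that kernel and testing passthrough by hand.

```python
def first_bad(commits, is_bad):
    """Return (first bad commit, number of tests performed).

    Assumes `commits` is ordered oldest to newest, the last entry is known
    to be bad, and every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # commits[hi] is known bad
    steps = 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if is_bad(commits[mid]):
            hi = mid          # culprit is at mid or earlier
        else:
            lo = mid + 1      # culprit is strictly after mid
    return commits[hi], steps


if __name__ == "__main__":
    # Invented example: 400 commits in the range, commit 251 is the culprit.
    commits = list(range(1, 401))
    culprit, steps = first_bad(commits, lambda c: c >= 251)
    print(f"first bad commit: {culprit}, found after {steps} tests")
```

Halving the range each time means a range of 400 commits needs at most 9 tests, which is why bisection is usually affordable even when each test requires a kernel build.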
Hi,
On 07/08/25 21:22, Greg KH wrote:
> [...]
> Can you use git bisect to find the offending commit?
Additional notes: I looked at the log and am listing some possibly relevant commits, in case bisection is too costly:
68e58f579121 PCI: dwc: ep: Correct PBA offset in .set_msix() callback
523815857b1e PCI: cadence-ep: Correct PBA offset in .set_msix() callback
These two might be interesting ones to consider. Please ignore this note if bisection is already in progress as these are pure guesses.
Thanks,
Harshit
I will perform bisection, yes.
fb5873b779dd5858123c19bbd6959566771e2e83 is the first bad commit
commit fb5873b779dd5858123c19bbd6959566771e2e83
Author: Lu Baolu <baolu.lu@linux.intel.com>
Date:   Tue May 20 15:58:49 2025 +0800

    iommu/vt-d: Restore context entry setup order for aliased devices

    commit 320302baed05c6456164652541f23d2a96522c06 upstream.

    Commit 2031c469f816 ("iommu/vt-d: Add support for static identity domain") changed the context entry setup during domain attachment from a set-and-check policy to a clear-and-reset approach. This inadvertently introduced a regression affecting PCI aliased devices behind PCIe-to-PCI bridges.

    Specifically, keyboard and touchpad stopped working on several Apple Macbooks with below messages:

     kernel: platform pxa2xx-spi.3: Adding to iommu group 20
     kernel: input: Apple SPI Keyboard as /devices/pci0000:00/0000:00:1e.3/pxa2xx-spi.3/spi_master/spi2/spi-APP000D:00/input/input0
     kernel: DMAR: DRHD: handling fault status reg 3
     kernel: DMAR: [DMA Read NO_PASID] Request device [00:1e.3] fault addr 0xffffa000 [fault reason 0x06] PTE Read access is not set
     kernel: DMAR: DRHD: handling fault status reg 3
     kernel: DMAR: [DMA Read NO_PASID] Request device [00:1e.3] fault addr 0xffffa000 [fault reason 0x06] PTE Read access is not set
     kernel: applespi spi-APP000D:00: Error writing to device: 01 0e 00 00
     kernel: DMAR: DRHD: handling fault status reg 3
     kernel: DMAR: [DMA Read NO_PASID] Request device [00:1e.3] fault addr 0xffffa000 [fault reason 0x06] PTE Read access is not set
     kernel: DMAR: DRHD: handling fault status reg 3
     kernel: applespi spi-APP000D:00: Error writing to device: 01 0e 00 00

    Fix this by restoring the previous context setup order.

    Fixes: 2031c469f816 ("iommu/vt-d: Add support for static identity domain")
    Closes: https://lore.kernel.org/all/4dada48a-c5dd-4c30-9c85-5b03b0aa01f0@bfh.ch/
    Cc: stable@vger.kernel.org
    Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
    Reviewed-by: Kevin Tian <kevin.tian@intel.com>
    Reviewed-by: Yi Liu <yi.l.liu@intel.com>
    Link: https://lore.kernel.org/r/20250514060523.2862195-1-baolu.lu@linux.intel.com
    Link: https://lore.kernel.org/r/20250520075849.755012-2-baolu.lu@linux.intel.com
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 drivers/iommu/intel/iommu.c  | 11 +++++++++++
 drivers/iommu/intel/iommu.h  |  1 +
 drivers/iommu/intel/nested.c |  4 ++--
 3 files changed, 14 insertions(+), 2 deletions(-)
Hi,
On 08/08/25 14:30, cat wrote:
> fb5873b779dd5858123c19bbd6959566771e2e83 is the first bad commit
> [...]
Looks like a duplicate of https://lore.kernel.org/linux-iommu/721D44AF820A4FEB+722679cb-2226-4287-8835...
The fix for that was https://lore.kernel.org/all/468CF4B655888074+20250723120423.37924-1-bbaa@bba... which is present in 6.12.40, so updating to 6.12.40 will most likely fix the issue.
Thanks,
Harshit
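As a rough aid for checking a given machine against the suggestion above, the sketch below compares a kernel release string with 6.12.40. It rests on one assumption taken from this thread only: that the fix is present from 6.12.40 onward within the 6.12.y series. Other stable series are not covered here, so the function returns `None` for them. `parse_release` and `includes_fix` are hypothetical helper names.

```python
import re

# Per the message above: the fix is present in 6.12.40 (6.12.y series only).
FIXED_IN = (6, 12, 40)


def parse_release(release):
    """Parse the leading 'major.minor[.patch]' of a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def includes_fix(release):
    """True or False for 6.12.y releases; None when the series is not 6.12.

    False only means "older than 6.12.40"; per the report, releases up to
    6.12.34 predate the regression and do not need the fix.
    """
    version = parse_release(release)
    if version[:2] != FIXED_IN[:2]:
        return None
    return version >= FIXED_IN


if __name__ == "__main__":
    for release in ("6.12.34", "6.12.35", "6.12.40-1-lts", "6.15.2"):
        print(release, includes_fix(release))
```

On the host, `platform.release()` from the Python standard library returns the running kernel's release string and can be passed to `includes_fix` directly.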
linux-stable-mirror@lists.linaro.org