On Thu, Sep 27, 2018 at 10:10:07AM +0800, Bin Meng wrote:
On Thu, Sep 27, 2018 at 12:57 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
On Wed, Sep 26, 2018 at 08:14:01AM -0700, Bin Meng wrote:
Add more PCI IDs to the Intel GPU "spurious interrupt" quirk table, which are known to break.
Do you have a reference for this? Any public bug reports, bugzilla, Intel spec reference or errata? "Which are known to break" is pretty vague.
Sorry, I used the wrong words and should have been clearer. These devices have been validated to be broken. The test I used is very simple: just unplug the VGA cable and plug it in again, and a "spurious interrupt" will be seen on the interrupt line of the IGD device. I am not aware of any public bugs filed with Intel, nor have I seen any errata from Intel.
The original commit, f67fd55fa96f ("PCI: Add quirk for still enabled interrupts on Intel Sandy Bridge GPUs"), says some systems "crash" (not sure if that means an oops or an actual crash that requires a reboot) and on other systems, Linux disables the shared interrupt line. I assume disabling the interrupt line keeps devices using that line from working, but does not directly cause a crash.
What specific symptom do you see here? I think it might be useful to collect details, e.g., dmesg logs, /proc/interrupts contents, output of "sudo lspci -vv", etc., for the systems you're quirking here. I'm hoping we can eventually figure out a solution that doesn't require a quirk for every new GPU, and maybe that info will help find it.
See commit f67fd55fa96f ("PCI: Add quirk for still enabled interrupts on Intel Sandy Bridge GPUs"), and commit 7c82126a94e6 ("PCI: Add new ID for Intel GPU "spurious interrupt" quirk") for some history.
Based on current findings, it is highly likely that the IGD in all Intel 1st/2nd/3rd generation Core processors has this problem.
Can you include a reference to these "current findings"? I assume you have bug reports that include the device IDs you're adding? If not, how did you build this list of new IDs?
By "current findings" I mean that, given the IDs we have here plus the previous one added by Thomas, it's highly likely this VGA BIOS bug exists in every 1st/2nd/3rd generation Core processor.
The function comment added by f67fd55fa96f ("PCI: Add quirk for still enabled interrupts on Intel Sandy Bridge GPUs") suggests that this is actually a BIOS issue, not a hardware erratum, i.e., I don't see anything there that suggests a hardware defect.
But there must be a hole somewhere -- the kernel can't be expected to disable interrupts in device-specific ways when there's no driver loaded. Maybe it's simply a BIOS defect or maybe there's some interrupt or _PRT-related setup we're missing.
It's purely a VGA BIOS bug, not a system BIOS or _PRT issue. The VGA BIOS forgot to turn off the interrupt on these devices.
If this is a VGA BIOS defect, it's not very likely that it will magically be fixed for all new Intel GPUs, so in effect it sounds like we need to update this list of quirks in Linux every time a new Intel GPU comes out. That prospect is a little daunting.
Do you happen to know if Windows has the same problem? I.e., if you boot an old version of Windows with a new GPU, and unplug the VGA cable, does Windows crash? If Windows can figure out how to handle that situation gracefully, Linux should be able to do it, too.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Cc: stable@vger.kernel.org # v3.4+
 drivers/pci/quirks.c | 4 ++++
 1 file changed, 4 insertions(+)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6bc27b7..c0673a7 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3190,7 +3190,11 @@ static void disable_igfx_irq(struct pci_dev *dev)
 	pci_iounmap(dev, regs);
 }
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x0042, disable_igfx_irq);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x0046, disable_igfx_irq);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x004a, disable_igfx_irq);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x0102, disable_igfx_irq);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x0106, disable_igfx_irq);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x010a, disable_igfx_irq);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x0152, disable_igfx_irq);
--
Regards,
Bin