On Tue, Apr 04, 2023 at 01:59:33PM -0700, Aaron Plattner wrote:
On 4/4/23 1:18 PM, Daniel Vetter wrote:
Instead of calling aperture_remove_conflicting_devices() to remove the conflicting devices, just call to aperture_detach_devices() to detach the device that matches the same PCI BAR / aperture range. Since the former is just a wrapper of the latter plus a sysfb_disable() call, and now that's done in this function but only for the primary devices.
This fixes a regression introduced by ee7a69aa38d8 ("fbdev: Disable sysfb device registration when removing conflicting FBs"), where we remove the sysfb when loading a driver for an unrelated pci device, resulting in the user loosing their efifb console or similar.
Note that in practice this only is a problem with the nvidia blob, because that's the only gpu driver people might install which does not come with an fbdev driver of it's own. For everyone else the real gpu driver will restore a working console.
It might be worth noting that this also affects devices that have no driver installed, or where the driver failed to initialize or was configured not to set a mode. E.g. I reproduced this problem on a laptop with i915.modeset=0 and an NVIDIA driver that calls drm_fbdev_generic_setup. It would also reproduce on a system that sets modeset=0 (or has a GPU that's too new for its corresponding kernel driver) and that passes an NVIDIA GPU through to a VM using vfio-pci since that also calls aperture_remove_conflicting_pci_devices.
I agree that in practice this will mostly affect people with our driver until I get my changes to add drm_fbdev_generic_setup checked in. But these other cases don't seem all that unlikely to me.
Thomas Z. refactored the entire modeset=0 handling to be more consistent across drivers, so I think in practice it'll again only happen with the nvidia blob driver (unless you patch in the call to drm_firmware_drivers_only()). Or if you dont use nomodeset or similar and instead use a driver-specific module option, which isn't what howtos in distros recommend.
I can add this to the commit message if you want? -Daniel
-- Aaron
Also note that in the referenced bug there's confusion that this same bug also happens on amdgpu. But that was just another amdgpu specific regression, which just happened to happen at roughly the same time and with the same user-observable symptoms. That bug is fixed now, see https://bugzilla.kernel.org/show_bug.cgi?id=216331#c15
Note that we should not have any such issues on non-pci multi-gpu issues, because I could only find two such cases:
- SoC with some external panel over spi or similar. These panel drivers do not use drm_aperture_remove_conflicting_framebuffers(), so no problem.
- vga+mga, which is a direct console driver and entirely bypasses all this.
For the above reasons the cc: stable is just notionally, this patch will need a backport and that's up to nvidia if they care enough.
v2:
- Explain a bit better why other multi-gpu that aren't pci shouldn't have any issues with making all this fully pci specific.
v3
- polish commit message (Javier)
Fixes: ee7a69aa38d8 ("fbdev: Disable sysfb device registration when removing conflicting FBs") Tested-by: Aaron Plattner aplattner@nvidia.com Reviewed-by: Javier Martinez Canillas javierm@redhat.com References: https://bugzilla.kernel.org/show_bug.cgi?id=216303#c28 Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Aaron Plattner aplattner@nvidia.com Cc: Javier Martinez Canillas javierm@redhat.com Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Helge Deller deller@gmx.de Cc: Sam Ravnborg sam@ravnborg.org Cc: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org # v5.19+ (if someone else does the backport)
drivers/video/aperture.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c index 8f1437339e49..2394c2d310f8 100644 --- a/drivers/video/aperture.c +++ b/drivers/video/aperture.c @@ -321,15 +321,16 @@ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na primary = pdev == vga_default_device();
- if (primary)
sysfb_disable();
- for (bar = 0; bar < PCI_STD_NUM_BARS; ++bar) { if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) continue; base = pci_resource_start(pdev, bar); size = pci_resource_len(pdev, bar);
ret = aperture_remove_conflicting_devices(base, size, name);
if (ret)
return ret;
} if (primary) {aperture_detach_devices(base, size);