Hi,
I have discovered a 100% reliable soft lockup on boot on my laptop: Purism Librem 14, Intel Core i7-10710U, 48 GB RAM, Samsung Evo Plus 970 SSD, coreboot BIOS, GRUB bootloader, Arch Linux.
The last working release is kernel 6.9.10; every release from 6.10 onwards reliably exhibits the issue, which, based on journalctl logs, seems to be triggered somewhere in systemd-udev: https://gitlab.archlinux.org/-/project/42594/uploads/04583baf22189a0a8bb2f87...
Bisect points to commit 5186ba33234c9a90833f7c93ce7de80e25fac6f5
At a glance, I see two potentially problematic changes in this diff. Specifically, in the refactoring to move the call to rdt_ctrl_update inside the loop that walks over r->domains:
1. the change from on_each_cpu_mask to smp_call_function_any means that preemption is no longer disabled around the call to rdt_ctrl_update, which could plausibly be a problem
2. there's now a race condition on the msr_params struct: afaict there's no write barrier, so if the call to rdt_ctrl_update is executed on a different CPU, it could plausibly read an outdated value of the dom field, which prior to this series of patches wasn't passed as an explicit parameter, but derived inside rdt_ctrl_update
For initial report to Arch Linux bugtracker and bisect log see: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/74
Best,
Hugues
On Sun, Sep 08, 2024 at 11:53:56PM -0700, Hugues Bruant wrote:
Bisect points to commit 5186ba33234c9a90833f7c93ce7de80e25fac6f5
That's a merge commit. Meaning, the bisection likely went into the wrong direction.
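As a side note, whether a bisect result is a merge commit can be checked mechanically: `git rev-list --parents -n 1 <commit>` prints the commit hash followed by its parent hashes, and a merge has two or more parents. The helper below is only a sketch that parses one such output line; the function names and the sample hashes in the usage note are made up for illustration, and it does not run git itself.

```python
def parent_hashes(rev_list_line: str) -> list[str]:
    """Return the parent hashes from one line of
    `git rev-list --parents -n 1 <commit>` output.

    The line has the form "<commit> <parent1> [<parent2> ...]".
    """
    fields = rev_list_line.split()
    if not fields:
        raise ValueError("empty rev-list line")
    # The first field is the commit itself; the rest are its parents.
    return fields[1:]


def is_merge_commit(rev_list_line: str) -> bool:
    """A commit with two or more parents is a merge commit."""
    return len(parent_hashes(rev_list_line)) >= 2
```

For example, `is_merge_commit("c0ffee0 abc1234 def5678")` is true, while a line with a single parent such as `"c0ffee0 abc1234"` is an ordinary commit.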
Looking at your log, the first warn is in framebuffer_coreboot. Some mess in the sysfs platform devices registration.
Adding the relevant people for that:
Aug 20 20:29:36 luna kernel: sysfs: cannot create duplicate filename '/bus/platform/devices/simple-framebuffer.0'
Aug 20 20:29:36 luna kernel: CPU: 5 PID: 571 Comm: (udev-worker) Tainted: G OE 6.10.6-arch1-1 #1 703d152c24f1971e36f16e505405e456fc9e23f8
Aug 20 20:29:36 luna kernel: Hardware name: Purism Librem 14/Librem 14, BIOS 4.14-Purism-1 06/18/2021
Aug 20 20:29:36 luna kernel: Call Trace:
Aug 20 20:29:36 luna kernel: <TASK>
Aug 20 20:29:36 luna kernel: dump_stack_lvl+0x5d/0x80
Aug 20 20:29:36 luna kernel: sysfs_warn_dup.cold+0x17/0x23
Aug 20 20:29:36 luna kernel: sysfs_do_create_link_sd+0xcf/0xe0
Aug 20 20:29:36 luna kernel: bus_add_device+0x6b/0x130
Aug 20 20:29:36 luna kernel: device_add+0x3b3/0x870
Aug 20 20:29:36 luna kernel: platform_device_add+0xed/0x250
Aug 20 20:29:36 luna kernel: platform_device_register_full+0xbb/0x140
Aug 20 20:29:36 luna kernel: platform_device_register_resndata.constprop.0+0x54/0x80 [framebuffer_coreboot a587d2fc243ebaa0205c3badd33442a004d284e0]
Aug 20 20:29:36 luna kernel: framebuffer_probe+0x165/0x1b0 [framebuffer_coreboot a587d2fc243ebaa0205c3badd33442a004d284e0]
Aug 20 20:29:36 luna kernel: really_probe+0xdb/0x340
Aug 20 20:29:36 luna kernel: ? pm_runtime_barrier+0x54/0x90
Aug 20 20:29:36 luna kernel: ? __pfx___driver_attach+0x10/0x10
Aug 20 20:29:36 luna kernel: __driver_probe_device+0x78/0x110
Aug 20 20:29:36 luna kernel: driver_probe_device+0x1f/0xa0
Aug 20 20:29:36 luna kernel: __driver_attach+0xba/0x1c0
Aug 20 20:29:36 luna kernel: bus_for_each_dev+0x8c/0xe0
Aug 20 20:29:36 luna kernel: bus_add_driver+0x112/0x1f0
Aug 20 20:29:36 luna kernel: driver_register+0x72/0xd0
Aug 20 20:29:36 luna kernel: ? __pfx_framebuffer_driver_init+0x10/0x10 [framebuffer_coreboot a587d2fc243ebaa0205c3badd33442a004d284e0]
Aug 20 20:29:36 luna kernel: do_one_initcall+0x58/0x310
Aug 20 20:29:36 luna kernel: do_init_module+0x60/0x220
Aug 20 20:29:36 luna kernel: init_module_from_file+0x89/0xe0
Aug 20 20:29:36 luna kernel: idempotent_init_module+0x121/0x320
Aug 20 20:29:36 luna kernel: __x64_sys_finit_module+0x5e/0xb0
Aug 20 20:29:36 luna kernel: do_syscall_64+0x82/0x190
Aug 20 20:29:36 luna kernel: ? __do_sys_newfstatat+0x3c/0x80
Aug 20 20:29:36 luna kernel: ? syscall_exit_to_user_mode+0x72/0x200
Aug 20 20:29:36 luna kernel: ? do_syscall_64+0x8e/0x190
Aug 20 20:29:36 luna kernel: ? do_sys_openat2+0x9c/0xe0
Aug 20 20:29:36 luna kernel: ? syscall_exit_to_user_mode+0x72/0x200
Aug 20 20:29:36 luna kernel: ? do_syscall_64+0x8e/0x190
Aug 20 20:29:36 luna kernel: ? clear_bhb_loop+0x25/0x80
Aug 20 20:29:36 luna kernel: ? clear_bhb_loop+0x25/0x80
Aug 20 20:29:36 luna kernel: ? clear_bhb_loop+0x25/0x80
Aug 20 20:29:36 luna kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Aug 20 20:29:36 luna kernel: RIP: 0033:0x7b1bee2f81fd
The real issue is in i915 however.
However, you have out-of-tree modules. Try reproducing it without them.
Adding i915 people too.
Aug 20 20:29:37 luna kernel: resource: Trying to free nonexistent resource <0x00000000a0000000-0x00000000a0257fff>
Aug 20 20:29:37 luna kernel: BUG: unable to handle page fault for address: 0000000300000031
Aug 20 20:29:37 luna kernel: #PF: supervisor read access in kernel mode
Aug 20 20:29:37 luna kernel: #PF: error_code(0x0000) - not-present page
Aug 20 20:29:37 luna kernel: PGD 0 P4D 0
Aug 20 20:29:37 luna kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
Aug 20 20:29:37 luna kernel: CPU: 9 PID: 552 Comm: (udev-worker) Tainted: G OE 6.10.6-arch1-1 #1 703d152c24f1971e36f16e505405e456fc9e23f8
Aug 20 20:29:37 luna kernel: Hardware name: Purism Librem 14/Librem 14, BIOS 4.14-Purism-1 06/18/2021
Aug 20 20:29:37 luna kernel: RIP: 0010:__release_resource+0x34/0xb0
Aug 20 20:29:37 luna kernel: Code: 8d 50 38 48 8b 40 38 48 85 c0 75 27 eb 6a 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 50 30 <48> 8b 40 30 48 85 c0 74 45 48 39 c7 75 ee 40 84 f6 75 45 48 8b 4f
Aug 20 20:29:37 luna kernel: RSP: 0018:ffffb30dc207f930 EFLAGS: 00010296
Aug 20 20:29:37 luna kernel: RAX: 0000000300000001 RBX: ffff8fa34616e900 RCX: ffff8fa3424aac50
Aug 20 20:29:37 luna kernel: RDX: 0000000300000031 RSI: 0000000000000001 RDI: ffff8fa34616e900
Aug 20 20:29:37 luna kernel: RBP: ffff8fa3460e1400 R08: ffff8fa3424a97b8 R09: 0000000000000000
Aug 20 20:29:37 luna kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fa341671000
Aug 20 20:29:37 luna kernel: R13: 0000000000000000 R14: ffff8fa3416710c8 R15: ffff8fa341671000
Aug 20 20:29:37 luna kernel: FS: 00007b1bee0eb880(0000) GS:ffff8fae6e480000(0000) knlGS:0000000000000000
Aug 20 20:29:37 luna kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 20 20:29:37 luna kernel: CR2: 0000000300000031 CR3: 0000000103924002 CR4: 00000000003706f0
Aug 20 20:29:37 luna kernel: Call Trace:
Aug 20 20:29:37 luna kernel: <TASK>
Aug 20 20:29:37 luna kernel: ? __die_body.cold+0x19/0x27
Aug 20 20:29:37 luna kernel: ? page_fault_oops+0x15a/0x2d0
Aug 20 20:29:37 luna kernel: ? exc_page_fault+0x81/0x190
Aug 20 20:29:37 luna kernel: ? asm_exc_page_fault+0x26/0x30
Aug 20 20:29:37 luna kernel: ? __release_resource+0x34/0xb0
Aug 20 20:29:37 luna kernel: release_resource+0x26/0x40
Aug 20 20:29:37 luna kernel: platform_device_del+0x51/0x90
Aug 20 20:29:37 luna kernel: platform_device_unregister+0x12/0x30
Aug 20 20:29:37 luna kernel: sysfb_disable+0x2f/0x80
Aug 20 20:29:37 luna kernel: aperture_remove_conflicting_pci_devices+0x8c/0xa0
Aug 20 20:29:37 luna kernel: i915_driver_probe+0x7c8/0xac0 [i915 6caac5d02e3122d822ca0c852e7e5ed826a3aaea]
Aug 20 20:29:37 luna kernel: local_pci_probe+0x42/0x90
Aug 20 20:29:37 luna kernel: pci_device_probe+0xbd/0x290
Aug 20 20:29:37 luna kernel: ? sysfs_do_create_link_sd+0x6e/0xe0
Aug 20 20:29:37 luna kernel: really_probe+0xdb/0x340
Aug 20 20:29:37 luna kernel: ? pm_runtime_barrier+0x54/0x90
Aug 20 20:29:37 luna kernel: ? __pfx___driver_attach+0x10/0x10
Aug 20 20:29:37 luna kernel: __driver_probe_device+0x78/0x110
Aug 20 20:29:37 luna kernel: driver_probe_device+0x1f/0xa0
Aug 20 20:29:37 luna kernel: __driver_attach+0xba/0x1c0
Aug 20 20:29:37 luna kernel: bus_for_each_dev+0x8c/0xe0
Aug 20 20:29:37 luna kernel: bus_add_driver+0x112/0x1f0
Aug 20 20:29:37 luna kernel: driver_register+0x72/0xd0
Aug 20 20:29:37 luna kernel: i915_init+0x23/0x90 [i915 6caac5d02e3122d822ca0c852e7e5ed826a3aaea]
Aug 20 20:29:37 luna kernel: ? __pfx_i915_init+0x10/0x10 [i915 6caac5d02e3122d822ca0c852e7e5ed826a3aaea]
Aug 20 20:29:37 luna kernel: do_one_initcall+0x58/0x310
Aug 20 20:29:37 luna kernel: do_init_module+0x60/0x220
Aug 20 20:29:37 luna kernel: init_module_from_file+0x89/0xe0
Aug 20 20:29:37 luna kernel: idempotent_init_module+0x121/0x320
Aug 20 20:29:37 luna kernel: __x64_sys_finit_module+0x5e/0xb0
Aug 20 20:29:37 luna kernel: do_syscall_64+0x82/0x190
Aug 20 20:29:37 luna kernel: ? switch_fpu_return+0x4e/0xd0
Aug 20 20:29:37 luna kernel: ? syscall_exit_to_user_mode+0x72/0x200
Aug 20 20:29:37 luna kernel: ? do_syscall_64+0x8e/0x190
Aug 20 20:29:37 luna kernel: ? syscall_exit_to_user_mode+0x72/0x200
Aug 20 20:29:37 luna kernel: ? do_syscall_64+0x8e/0x190
Aug 20 20:29:37 luna kernel: ? clear_bhb_loop+0x25/0x80
Aug 20 20:29:37 luna kernel: ? clear_bhb_loop+0x25/0x80
Aug 20 20:29:37 luna kernel: ? clear_bhb_loop+0x25/0x80
Aug 20 20:29:37 luna kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Aug 20 20:29:37 luna kernel: RIP: 0033:0x7b1bee2f81fd
Aug 20 20:29:37 luna kernel: Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e3 fa 0c 00 f7 d8 64 89 01 48
Aug 20 20:29:37 luna kernel: RSP: 002b:00007ffe062c2ac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Aug 20 20:29:37 luna kernel: RAX: ffffffffffffffda RBX: 000056171c8d0a00 RCX: 00007b1bee2f81fd
Aug 20 20:29:37 luna kernel: RDX: 0000000000000004 RSI: 00007b1bee0e5061 RDI: 0000000000000026
Aug 20 20:29:37 luna kernel: RBP: 00007ffe062c2b80 R08: 0000000000000001 R09: 00007ffe062c2b10
Aug 20 20:29:37 luna kernel: R10: 0000000000000040 R11: 0000000000000246 R12: 00007b1bee0e5061
Aug 20 20:29:37 luna kernel: R13: 0000000000020000 R14: 000056171c8d18c0 R15: 000056171c8d31e0
Aug 20 20:29:37 luna kernel: </TASK>
Aug 20 20:29:37 luna kernel: Modules linked in: intel_powerclamp ath9k(+) snd_compress coretemp ac97_bus ath9k_common snd_pcm_dmaengine kvm_intel snd_hda_intel ath9k_hw joydev snd_intel_dspcfg mousedev ath snd_intel_sdw_acpi i915(+) kvm snd_hda_codec iTCO_wdt mac80211 snd_hda_core processor_thermal_device_pci_legacy intel_pmc_bxt snd_hwdep processor_thermal_device hid_multitouch ee1004 iTCO_vendor_support processor_thermal_wt_hint drm_buddy snd_pcm rapl processor_thermal_rfim hid_generic spi_nor r8169 i2c_i801 i2c_algo_bit libarc4 memconsole_coreboot processor_thermal_rapl snd_timer intel_cstate intel_rapl_msr framebuffer_coreboot memconsole cbmem intel_uncore snd intel_rapl_common realtek ttm i2c_smbus cfg80211 mtd processor_thermal_wt_req psmouse mdio_devres pcspkr soundcore i2c_mux processor_thermal_power_floor drm_display_helper intel_lpss_pci libphy processor_thermal_mbox intel_lpss cec rfkill int340x_thermal_zone intel_pmc_core i2c_hid_acpi idma64 intel_gtt intel_soc_dts_iosf intel_pch_thermal i2c_hid intel_vsec intel_hid video
Aug 20 20:29:37 luna kernel: pmt_telemetry pmt_class pinctrl_cannonlake wmi sparse_keymap coreboot_table mac_hid pkcs8_key_parser crypto_user loop acpi_call(OE) nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 uas usb_storage dm_crypt cbc encrypted_keys trusted asn1_encoder tee dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel serio_raw sha512_ssse3 atkbd sha256_ssse3 sha1_ssse3 libps2 aesni_intel vivaldi_fmap nvme crypto_simd nvme_core spi_intel_pci cryptd xhci_pci spi_intel i8042 nvme_auth xhci_pci_renesas serio librem_ec_acpi(OE)
Aug 20 20:29:37 luna kernel: CR2: 0000000300000031
Aug 20 20:29:37 luna kernel: ---[ end trace 0000000000000000 ]---
Hi
On 09.09.24 at 10:02, Borislav Petkov wrote:
Aug 20 20:29:37 luna kernel: resource: Trying to free nonexistent resource <0x00000000a0000000-0x00000000a0257fff>
Aug 20 20:29:37 luna kernel: BUG: unable to handle page fault for address: 0000000300000031
[...]
Aug 20 20:29:37 luna kernel: RIP: 0010:__release_resource+0x34/0xb0
[...]
Aug 20 20:29:37 luna kernel: release_resource+0x26/0x40
Aug 20 20:29:37 luna kernel: platform_device_del+0x51/0x90
Aug 20 20:29:37 luna kernel: platform_device_unregister+0x12/0x30
Aug 20 20:29:37 luna kernel: sysfb_disable+0x2f/0x80
Aug 20 20:29:37 luna kernel: aperture_remove_conflicting_pci_devices+0x8c/0xa0
It looks like another report of a known problem. Please try the patch at
https://patchwork.freedesktop.org/patch/610171/?series=137587&rev=1
Best regards
Thomas
On Mon, Sep 9, 2024 at 2:49 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
Hi
On 09.09.24 at 10:02, Borislav Petkov wrote:
Aug 20 20:29:37 luna kernel: resource: Trying to free nonexistent resource <0x00000000a0000000-0x00000000a0257fff>
Aug 20 20:29:37 luna kernel: BUG: unable to handle page fault for address: 0000000300000031
[...]
Aug 20 20:29:37 luna kernel: RIP: 0010:__release_resource+0x34/0xb0
[...]
Aug 20 20:29:37 luna kernel: release_resource+0x26/0x40
Aug 20 20:29:37 luna kernel: platform_device_del+0x51/0x90
Aug 20 20:29:37 luna kernel: platform_device_unregister+0x12/0x30
Aug 20 20:29:37 luna kernel: sysfb_disable+0x2f/0x80
Aug 20 20:29:37 luna kernel: aperture_remove_conflicting_pci_devices+0x8c/0xa0
It looks like another report of a known problem. Please try the patch at
https://patchwork.freedesktop.org/patch/610171/?series=137587&rev=1
Thanks for the suggestion. I tried 6.11-rc7 which I am told includes this patch. The first boot attempt was successful, but 4 subsequent attempts ran into soft lockup again: it seems this patch makes the soft lockup less reliable to reproduce but does not entirely fix it.
Noteworthy: after applying this patch, there seems to be slightly more variability in where the soft lockup happens, instead of always in the i915 driver probe triggered by udev. See attached boot logs for details.
Best regards
Thomas
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
I have discovered a 100% reliable soft lockup on boot on my laptop: Purism Librem 14, Intel Core i7-10710U, 48 GB RAM, Samsung Evo Plus 970 SSD, coreboot BIOS, GRUB bootloader, Arch Linux.
The last working release is kernel 6.9.10; every release from 6.10 onwards reliably exhibits the issue, which, based on journalctl logs, seems to be triggered somewhere in systemd-udev: https://gitlab.archlinux.org/-/project/42594/uploads/04583baf22189a0a8bb2f87...
Bisect points to commit 5186ba33234c9a90833f7c93ce7de80e25fac6f5
Does that Intel Core i7-10710U even execute the RDT code? Most client parts don't support RDT. You can check if yours does by looking for "rdt_a" in /proc/cpuinfo.
-Tony
On Mon, Sep 9, 2024 at 9:10 AM Luck, Tony <tony.luck@intel.com> wrote:
Does that Intel Core i7-10710U even execute the RDT code? Most client parts don't support RDT. You can check if yours does by looking for "rdt_a" in /proc/cpuinfo.
Thanks for the suggestion. You're right, I do not see `rdt_a` in `/proc/cpuinfo`
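For anyone who wants to script the check Tony describes, here is a minimal sketch. It assumes the x86 layout of /proc/cpuinfo, where each logical CPU has a "flags" line listing feature names separated by whitespace; the function name `cpu_has_flag` is made up for illustration. It matches whole flag names, so a longer flag that merely contains "rdt_a" does not count.

```python
def cpu_has_flag(cpuinfo_text: str, flag: str) -> bool:
    """Return True if any "flags" line in /proc/cpuinfo-style text lists `flag`.

    Assumes the x86 format: "flags<TAB>: fpu vme de ...".
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only look at lines whose key is exactly "flags".
        if sep and key.strip() == "flags":
            # Compare against whole whitespace-separated tokens.
            if flag in value.split():
                return True
    return False
```

On a running Linux system this could be used as `cpu_has_flag(open("/proc/cpuinfo").read(), "rdt_a")`; a False result means the CPU does not advertise RDT allocation.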
(Tweaking subject; this indeed isn't related to the regression at all)
Hi,
On Mon, Sep 09, 2024 at 10:02:00AM +0200, Borislav Petkov wrote:
Looking at your log, the first warn is in framebuffer_coreboot. Some mess in the sysfs platform devices registration.
Adding the relevant people for that:
Aug 20 20:29:36 luna kernel: sysfs: cannot create duplicate filename '/bus/platform/devices/simple-framebuffer.0'
[...]
Aug 20 20:29:36 luna kernel: platform_device_register_resndata.constprop.0+0x54/0x80 [framebuffer_coreboot a587d2fc243ebaa0205c3badd33442a004d284e0]
Aug 20 20:29:36 luna kernel: framebuffer_probe+0x165/0x1b0 [framebuffer_coreboot a587d2fc243ebaa0205c3badd33442a004d284e0]
[...]
Looks like it might be a conflict with drivers/firmware/sysfb_simplefb.c, which also uses the "simple-framebuffer" name with a constant ID of 0. It's possible both drivers should be switched to use PLATFORM_DEVID_AUTO? Or at least one of them. Or they should use different base names.
I'm not really sure what the best option is (does anyone rely on or care about the device naming?), and I don't actually use this driver. But here's an untested diff to try if you'd really like. If you test it, feel free to submit as a proper patch with my:
Signed-off-by: Brian Norris briannorris@chromium.org
diff --git a/drivers/firmware/google/framebuffer-coreboot.c b/drivers/firmware/google/framebuffer-coreboot.c
index daadd71d8ddd..3f1b8f664c3f 100644
--- a/drivers/firmware/google/framebuffer-coreboot.c
+++ b/drivers/firmware/google/framebuffer-coreboot.c
@@ -62,7 +62,8 @@ static int framebuffer_probe(struct coreboot_device *dev)
 		return -EINVAL;
 
 	pdev = platform_device_register_resndata(&dev->dev,
-						 "simple-framebuffer", 0,
+						 "simple-framebuffer",
+						 PLATFORM_DEVID_AUTO,
 						 &res, 1, &pdata,
 						 sizeof(pdata));
 	if (IS_ERR(pdev))
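To make the naming collision discussed above concrete, here is a minimal userspace sketch in C. It is not kernel code: `toy_bus`, `toy_register` and `TOY_DEVID_AUTO` are hypothetical names invented for this illustration, and the ".auto" suffix is only my understanding of how the platform bus names auto-numbered devices. It shows why a second registration with a fixed ID of 0 fails with -EEXIST (the -17 seen in the log), while an automatically chosen ID does not collide.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for this sketch only; not the kernel's definitions. */
#define TOY_DEVID_AUTO (-2)
#define TOY_MAX_DEVS   8
#define TOY_NAME_LEN   64

struct toy_bus {
	char names[TOY_MAX_DEVS][TOY_NAME_LEN];
	int count;
	int next_auto_id;
};

/*
 * Register "<base>.<id>" on the toy bus. With TOY_DEVID_AUTO the bus picks
 * the instance number itself and appends ".auto" to the name (assumption).
 * Returns 0 on success, -EEXIST on a duplicate name, -ENOMEM if full.
 */
static int toy_register(struct toy_bus *bus, const char *base, int id)
{
	char name[TOY_NAME_LEN];
	int i;

	if (id == TOY_DEVID_AUTO)
		snprintf(name, sizeof(name), "%s.%d.auto", base,
			 bus->next_auto_id++);
	else
		snprintf(name, sizeof(name), "%s.%d", base, id);

	/* A name that is already present cannot be created a second time. */
	for (i = 0; i < bus->count; i++)
		if (strcmp(bus->names[i], name) == 0)
			return -EEXIST;

	if (bus->count == TOY_MAX_DEVS)
		return -ENOMEM;

	strcpy(bus->names[bus->count++], name);
	return 0;
}
```

Registering "simple-framebuffer" twice with ID 0 on a zero-initialized `struct toy_bus` returns 0 and then -EEXIST; a third call with `TOY_DEVID_AUTO` succeeds under a different name. Whether user space depends on the fixed name is a separate question that the sketch does not answer.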
Hi,
Thanks for your patch.
FYI: kernel test robot notices the stable kernel rule is not satisfied.
The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#opti...
Rule: add the tag "Cc: stable@vger.kernel.org" in the sign-off area to have the patch automatically included in the stable tree.
Subject: [NOT A REGRESSION] firmware: framebuffer-coreboot: duplicate device name "simple-framebuffer.0"
Link: https://lore.kernel.org/stable/ZuCGkjoxKxpnhEh6%40google.com
Brian Norris briannorris@chromium.org writes:
Hello Brian,
(Tweaking subject; this indeed isn't related to the regression at all)
Hi,
On Mon, Sep 09, 2024 at 10:02:00AM +0200, Borislav Petkov wrote:
Looking at your log, the first warn is in framebuffer_coreboot. Some mess in the sysfs platform devices registration.
Adding the relevant people for that:
[...]
Looks like it might be a conflict with drivers/firmware/sysfb_simplefb.c, which also uses the "simple-framebuffer" name with a constant ID of 0. It's possible both drivers should be switched to use PLATFORM_DEVID_AUTO? Or at least one of them. Or they should use different base names.
I'm unsure about PLATFORM_DEVID_AUTO because I don't know if there are user-space programs that assume this to always be "simple-framebuffer.0".
I'm not really sure what the best option is (does anyone rely on or care about the device naming?), and I don't actually use this driver. But here's an untested diff to try if you'd really like. If you test it, feel free to submit as a proper patch with my:
I've discussed this with Thomas Zimmermann (simpledrm maintainer), and he suggests that the problem is that the system framebuffer information is provided both in a Coreboot table entry (AFAIU, LB_TAG_FRAMEBUFFER) and in the boot_params, which leads to struct screen_info being filled as well.
We had the same problem for EFI systems that passed a DTB to the kernel instead of ACPI; in those cases both a "simple-framebuffer" DT node and an EFI-GOP table could be provided.
Commit 3310288f6135 ("of/platform: Disable sysfb if a simple-framebuffer node is found") solved that issue. I've typed up the same approach for Coreboot, to handle it in the same way. Please let me know what you think:
From 6955149fb13af1c0cba2e5c1fbb1ac9367a09ae2 Mon Sep 17 00:00:00 2001
From: Javier Martinez Canillas javierm@redhat.com
Date: Thu, 12 Sep 2024 12:55:29 +0200
Subject: [RFC PATCH] firmware: coreboot: Disable sysfb if Coreboot already provides a FB
On Coreboot platforms, a system framebuffer may be provided to the Linux kernel by filling a LB_TAG_FRAMEBUFFER entry in the Coreboot table. But it seems SeaBIOS payload can also provide a VGA mode in the boot params.
If that is the case, early arch x86 boot code will fill the global struct screen_info data.
The data is used by the Generic System Framebuffers (sysfb) framework to add a platform device with platform data about the system framebuffer.
But if there is information about the system framebuffer in the Coreboot table as well, the framebuffer_coreboot driver will also try to do the same and add another device for the system framebuffer. This will fail though because there's already a simple-framebuffer.0 device registered:
sysfs: cannot create duplicate filename '/bus/platform/devices/simple-framebuffer.0'
...
coreboot: could not register framebuffer
framebuffer coreboot8: probe with driver framebuffer failed with error -17
To prevent the issue, make the framebuffer_coreboot driver disable sysfb if there is system framebuffer data in the Coreboot table. That way only this driver will register a device and sysfb will not attempt to do it (or will remove its registered device if it already ran before).
Reported-by: Brian Norris briannorris@chromium.org
Link: https://lore.kernel.org/all/ZuCG-DggNThuF4pj@b20ea791c01f/T/#ma7fb65acbc1a56...
Suggested-by: Thomas Zimmermann tzimmermann@suse.de
Signed-off-by: Javier Martinez Canillas javierm@redhat.com
---
 drivers/firmware/google/framebuffer-coreboot.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
diff --git a/drivers/firmware/google/framebuffer-coreboot.c b/drivers/firmware/google/framebuffer-coreboot.c
index daadd71d8ddd..0a28aa5b17dc 100644
--- a/drivers/firmware/google/framebuffer-coreboot.c
+++ b/drivers/firmware/google/framebuffer-coreboot.c
@@ -61,6 +61,19 @@ static int framebuffer_probe(struct coreboot_device *dev)
 	if (res.end <= res.start)
 		return -EINVAL;
 
+	/*
+	 * Since a "simple-framebuffer" device is already added
+	 * here, disable the Generic System Framebuffers (sysfb)
+	 * to prevent it from registering another device for the
+	 * system framebuffer later (e.g: using the screen_info
+	 * data that may had been filled as well).
+	 *
+	 * This can happen for example on Coreboot systems, that
+	 * advertise a LB_TAG_FRAMEBUFFER entry in their Coreboot
+	 * table and also a VESA mode by the BIOS used as payload.
+	 */
+	sysfb_disable();
+
 	pdev = platform_device_register_resndata(&dev->dev,
 						 "simple-framebuffer", 0,
 						 &res, 1, &pdata,
On Coreboot platforms, a system framebuffer may be provided to the Linux kernel by filling a LB_TAG_FRAMEBUFFER entry in the Coreboot table. But it seems SeaBIOS payload can also provide a VGA mode in the boot params.
[...]
To prevent the issue, make the framebuffer_core driver to disable sysfb if there is system framebuffer data in the Coreboot table. That way only this driver will register a device and sysfb would not attempt to do it (or remove its registered device if was already executed before).
I wonder if the priority should be the other way around? coreboot's framebuffer is generally only valid when coreboot exits to the payload (e.g. SeaBIOS). Only if the payload doesn't touch the display controller or if there is no payload and coreboot directly hands off to a kernel does the kernel driver for LB_TAG_FRAMEBUFFER make sense. But if there is some other framebuffer information passed to the kernel from a firmware component running after coreboot, most likely that one is more up to date and the framebuffer described by the coreboot table doesn't work anymore (because the payload usually doesn't modify the coreboot tables again, even if it changes hardware state). So if there are two drivers fighting over which firmware framebuffer description is the correct one, the coreboot driver should probably give way.
Julius Werner jwerner@chromium.org writes:
Hello Julius,
[...]
That's a very good point. I'm actually not familiar with Coreboot and I made an educated guess (in the case of DT, for example, that's the main source of truth, and I didn't know if a Coreboot table was in a similar vein).
Maybe something like the following (untested) patch then?
From de1c32017006f4671d91b695f4d6b4e99c073ab2 Mon Sep 17 00:00:00 2001
From: Javier Martinez Canillas javierm@redhat.com
Date: Thu, 12 Sep 2024 18:31:55 +0200
Subject: [PATCH] firmware: coreboot: Don't register a pdev if screen_info data is available
On Coreboot platforms, a system framebuffer may be provided to the Linux kernel by filling a LB_TAG_FRAMEBUFFER entry in the Coreboot table. But a Coreboot payload (e.g: SeaBIOS) could also provide this information to the Linux kernel.
If that is the case, early arch x86 boot code will fill the global struct screen_info data, and that data is used by the Generic System Framebuffers (sysfb) framework to add a platform device with platform data about the system framebuffer.
But later the framebuffer_coreboot driver will try to register a device for the same framebuffer (using the information from the Coreboot table), which will lead to an error because a simple-framebuffer.0 device is already registered:
sysfs: cannot create duplicate filename '/bus/platform/devices/simple-framebuffer.0'
...
coreboot: could not register framebuffer
framebuffer coreboot8: probe with driver framebuffer failed with error -17
To prevent the issue, make the framebuffer_coreboot driver not register a platform device if the global struct screen_info data has been filled.
Reported-by: Brian Norris briannorris@chromium.org
Link: https://lore.kernel.org/all/ZuCG-DggNThuF4pj@b20ea791c01f/T/#ma7fb65acbc1a56...
Suggested-by: Julius Werner jwerner@chromium.org
Signed-off-by: Javier Martinez Canillas javierm@redhat.com
---
 drivers/firmware/google/framebuffer-coreboot.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
diff --git a/drivers/firmware/google/framebuffer-coreboot.c b/drivers/firmware/google/framebuffer-coreboot.c
index daadd71d8ddd..4e50da17cd7e 100644
--- a/drivers/firmware/google/framebuffer-coreboot.c
+++ b/drivers/firmware/google/framebuffer-coreboot.c
@@ -15,6 +15,7 @@
 #include <linux/module.h>
 #include <linux/platform_data/simplefb.h>
 #include <linux/platform_device.h>
+#include <linux/screen_info.h>
 
 #include "coreboot_table.h"
 
@@ -27,6 +28,7 @@ static int framebuffer_probe(struct coreboot_device *dev)
 	int i;
 	u32 length;
 	struct lb_framebuffer *fb = &dev->framebuffer;
+	struct screen_info *si = &screen_info;
 	struct platform_device *pdev;
 	struct resource res;
 	struct simplefb_platform_data pdata = {
@@ -36,6 +38,20 @@ static int framebuffer_probe(struct coreboot_device *dev)
 		.format = NULL,
 	};
 
+	/*
+	 * If the global screen_info data has been filled, the Generic
+	 * System Framebuffers (sysfb) will already register a platform
+	 * and pass the screen_info as platform_data to a driver that
+	 * could scan-out using the system provided framebuffer.
+	 *
+	 * On Coreboot systems, the advertise LB_TAG_FRAMEBUFFER entry
+	 * in the Coreboot table should only be used if the payload did
+	 * not set video mode info and passed it to the Linux kernel.
+	 */
+	if (si->orig_video_isVGA == VIDEO_TYPE_VLFB ||
+            si->orig_video_isVGA == VIDEO_TYPE_EFI)
+		return -EINVAL;
+
 	if (!fb->physical_address)
 		return -ENODEV;
Hi Javier,
thanks for the patch.
On 12.09.24 at 18:33, Javier Martinez Canillas wrote:
[...]
+	struct screen_info *si = &screen_info;
Probably 'const'.
 	struct platform_device *pdev;
 	struct resource res;
 	struct simplefb_platform_data pdata = {
@@ -36,6 +38,20 @@ static int framebuffer_probe(struct coreboot_device *dev)
 		.format = NULL,
 	};
 
+	/*
+	 * If the global screen_info data has been filled, the Generic
+	 * System Framebuffers (sysfb) will already register a platform
+	 * and pass the screen_info as platform_data to a driver that
+	 * could scan-out using the system provided framebuffer.
+	 *
+	 * On Coreboot systems, the advertise LB_TAG_FRAMEBUFFER entry
+	 * in the Coreboot table should only be used if the payload did
+	 * not set video mode info and passed it to the Linux kernel.
+	 */
+	if (si->orig_video_isVGA == VIDEO_TYPE_VLFB ||
+            si->orig_video_isVGA == VIDEO_TYPE_EFI)
Rather call screen_info_video_type(si) [1] to get the type. If it returns 0, the screen_info is unset and the coreboot code can handle the framebuffer. In any other case, the framebuffer went through a bootloader, which might have modified it. This also handles awkward cases, such as if the bootloader programs a VGA text mode.
[1] https://elixir.bootlin.com/linux/v6.10.10/source/include/linux/screen_info.h...
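As a rough illustration of the rule suggested here (use the coreboot table entry only when screen_info is unset), the following userspace C sketch models the decision. Everything prefixed with `toy_` is hypothetical: `toy_video_type()` is a deliberately simplified stand-in that just echoes the field back, whereas the real screen_info_video_type() helper, as far as I understand it, does more work to classify the mode. The sketch only shows the priority order, not the actual driver change.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the one screen_info field the patch looks at. */
struct toy_screen_info {
	uint8_t orig_video_isVGA;	/* 0 is treated as "nothing passed in" */
};

/*
 * Simplified, hypothetical model of a "which video type is this?" helper.
 * It only returns the field; the kernel helper is more thorough.
 */
static unsigned int toy_video_type(const struct toy_screen_info *si)
{
	return si->orig_video_isVGA;
}

/*
 * Decide whether the coreboot table framebuffer should be used.
 * Any non-zero type means a later boot stage described the display,
 * so the coreboot entry gives way.
 */
static bool toy_coreboot_fb_usable(const struct toy_screen_info *si,
				   uint64_t coreboot_fb_phys_addr)
{
	if (toy_video_type(si) != 0)
		return false;

	/* Without a physical address there is no framebuffer to hand over. */
	return coreboot_fb_phys_addr != 0;
}
```

Taking the pointer as const mirrors the "Probably 'const'" remark above: the check only reads the global data and never modifies it.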
With these changes:
Reviewed-by: Thomas Zimmermann tzimmermann@suse.de
Best regards Thomas
+		return -EINVAL;
+
 	if (!fb->physical_address)
 		return -ENODEV;
Thomas Zimmermann tzimmermann@suse.de writes:
Hello Thomas,
Hi Javier,
thanks for the patch.
Thanks for your feedback.
On 12.09.24 at 18:33, Javier Martinez Canillas wrote:
Julius Werner jwerner@chromium.org writes:
[...]
+	struct screen_info *si = &screen_info;
Probably 'const'.
Ok.
 	struct platform_device *pdev;
 	struct resource res;
 	struct simplefb_platform_data pdata = {
@@ -36,6 +38,20 @@ static int framebuffer_probe(struct coreboot_device *dev)
 		.format = NULL,
 	};
 
+	/*
+	 * If the global screen_info data has been filled, the Generic
+	 * System Framebuffers (sysfb) will already register a platform
+	 * and pass the screen_info as platform_data to a driver that
+	 * could scan-out using the system provided framebuffer.
+	 *
+	 * On Coreboot systems, the advertise LB_TAG_FRAMEBUFFER entry
+	 * in the Coreboot table should only be used if the payload did
+	 * not set video mode info and passed it to the Linux kernel.
+	 */
+	if (si->orig_video_isVGA == VIDEO_TYPE_VLFB ||
+            si->orig_video_isVGA == VIDEO_TYPE_EFI)
Rather call screen_info_video_type(si) [1] to get the type. If it
Indeed. I missed that helper, I'll change it.
returns 0, the screen_info is unset and the coreboot code can handle the framebuffer. In any other case, the framebuffer went through a bootloader, which might have modified it. This also handles awkward cases, such as if the bootloader programs a VGA text mode.
[1] https://elixir.bootlin.com/linux/v6.10.10/source/include/linux/screen_info.h...
With these changes:
Reviewed-by: Thomas Zimmermann tzimmermann@suse.de
Thanks. I'll wait for others in this thread to comment and if all agree with the solution, I'll post a proper patch (addressing your comments).
Best regards Thomas
Hi Javier,
On Thu, Sep 12, 2024 at 06:33:58PM +0200, Javier Martinez Canillas wrote:
That's a very good point. I'm actually not familiar with Coreboot and I used an educated guess (in the case of DT for example, that's the main source of truth and I didn't know if a Core table was in a similar vein).
Maybe something like the following (untested) patch then?
Julius is more familiar with the Coreboot + payload ecosystem than me, but his explanations make sense to me, as does this patch.
From de1c32017006f4671d91b695f4d6b4e99c073ab2 Mon Sep 17 00:00:00 2001
From: Javier Martinez Canillas javierm@redhat.com
Date: Thu, 12 Sep 2024 18:31:55 +0200
Subject: [PATCH] firmware: coreboot: Don't register a pdev if screen_info data is available
On Coreboot platforms, a system framebuffer may be provided to the Linux kernel by filling a LB_TAG_FRAMEBUFFER entry in the Coreboot table. But a Coreboot payload (e.g: SeaBIOS) could also provide this information to the Linux kernel.
If that the case, early arch x86 boot code will fill the global struct screen_info data and that data used by the Generic System Framebuffers (sysfb) framework to add a platform device with platform data about the system framebuffer.
Normally, these sorts of "early" and "later" ordering descriptions would set off alarm bells when talking about independent drivers. But I suppose the "early arch" code has better ordering guarantees than drivers, so this should be fine.
But later then the framebuffer_coreboot driver will try to do the same framebuffer (using the information from the Coreboot table), which will lead to an error due a simple-framebuffer.0 device already registered:
sysfs: cannot create duplicate filename '/bus/platform/devices/simple-framebuffer.0'
...
coreboot: could not register framebuffer
framebuffer coreboot8: probe with driver framebuffer failed with error -17
To prevent the issue, make the framebuffer_core driver to not register a platform device if the global struct screen_info data has been filled.
[...]
+	/*
+	 * If the global screen_info data has been filled, the Generic
+	 * System Framebuffers (sysfb) will already register a platform
Did you mean 'platform_device'?
+	 * and pass the screen_info as platform_data to a driver that
+	 * could scan-out using the system provided framebuffer.
+	 *
+	 * On Coreboot systems, the advertise LB_TAG_FRAMEBUFFER entry
s/advertise/advertised/ ?
+	 * in the Coreboot table should only be used if the payload did
+	 * not set video mode info and passed it to the Linux kernel.
s/passed/pass/
+	 */
+	if (si->orig_video_isVGA == VIDEO_TYPE_VLFB ||
+            si->orig_video_isVGA == VIDEO_TYPE_EFI)
This line is using spaces for indentation. It should use a tab, and then spaces for alignment. But presumably this will change based on Thomas's suggestions anyway.
+		return -EINVAL;
Is EINVAL right? IIUC, that will print a noisier error to the logs. I believe the "expected" sorts of return codes are ENODEV or ENXIO. (See call_driver_probe().) ENODEV seems like a fine choice, similar to several of the other return codes already used here.
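The point about "noisier" return codes can be sketched in userspace C as follows. This is a toy model of my reading of how the driver core classifies a probe() result, not the code of call_driver_probe() itself; the enum and function names are hypothetical, and probe deferral is left out because its error code is kernel-internal. It shows why -ENODEV is treated as a quiet "not my device", while something like -EINVAL or -EEXIST (-17, as in the log above) is reported as a failed probe.

```c
#include <errno.h>

/* Hypothetical log levels for this sketch only. */
enum toy_log_level {
	TOY_LOG_NONE,	/* probe succeeded, nothing to report */
	TOY_LOG_DEBUG,	/* driver declined the device, debug-level message only */
	TOY_LOG_ERR	/* driver matched but probing failed, error message */
};

/*
 * Map a probe() return value to how loudly it would be reported.
 * -ENODEV and -ENXIO are the "expected" ways to reject a device.
 */
static enum toy_log_level toy_probe_log_level(int ret)
{
	switch (ret) {
	case 0:
		return TOY_LOG_NONE;
	case -ENODEV:
	case -ENXIO:
		return TOY_LOG_DEBUG;
	default:
		return TOY_LOG_ERR;
	}
}
```

Under this model, returning -ENODEV when screen_info is already filled keeps the boot log clean on systems where the payload legitimately took over the display.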
Anyway, this seems along the right track. Thanks for tackling, and feel free to carry a:
Reviewed-by: Brian Norris briannorris@chromium.org
+
 	if (!fb->physical_address)
 		return -ENODEV;
Best regards,
Javier Martinez Canillas Core Platforms Red Hat
Brian Norris briannorris@chromium.org writes:
Hello Brian,
Hi Javier,
[...]
Normally, these sorts of "early" and "later" ordering descriptions would set off alarm bells when talking about independent drivers. But I suppose the "early arch" code has better ordering guarantees than drivers, so this should be fine.
Yes, I didn't want to imply ordering here; I was just mentioning which code was registering a "simple-framebuffer" platform_device that conflicted with this driver.
But later then the framebuffer_coreboot driver will try to do the same framebuffer (using the information from the Coreboot table), which will lead to an error due a simple-framebuffer.0 device already registered:
[...]
+	/*
+	 * If the global screen_info data has been filled, the Generic
+	 * System Framebuffers (sysfb) will already register a platform
Did you mean 'platform_device'?
Oops, yeah, I forgot to write "device" there.
+	 * and pass the screen_info as platform_data to a driver that
+	 * could scan-out using the system provided framebuffer.
+	 *
+	 * On Coreboot systems, the advertise LB_TAG_FRAMEBUFFER entry
s/advertise/advertised/ ?
Ok.
+	 * in the Coreboot table should only be used if the payload did
+	 * not set video mode info and passed it to the Linux kernel.
s/passed/pass/
Ok.
+	 */
+	if (si->orig_video_isVGA == VIDEO_TYPE_VLFB ||
+            si->orig_video_isVGA == VIDEO_TYPE_EFI)
This line is using spaces for indentation. It should use a tab, and then spaces for alignment. But presumably this will change based on Thomas's suggestions anyway.
Yes, I usually run checkpatch --strict before posting, but didn't in this case because I just shared the patch as a response.
+		return -EINVAL;
Is EINVAL right? IIUC, that will print a noisier error to the logs. I believe the "expected" sorts of return codes are ENODEV or ENXIO. (See call_driver_probe().) ENODEV seems like a fine choice, similar to several of the other return codes already used here.
You are right, -ENODEV is indeed a more suitable error code for this.
Anyway, this seems along the right track. Thanks for tackling, and feel free to carry a:
Reviewed-by: Brian Norris briannorris@chromium.org
Thanks for your comments.
On Mon, Sep 9, 2024 at 1:02 AM Borislav Petkov bp@alien8.de wrote:
On Sun, Sep 08, 2024 at 11:53:56PM -0700, Hugues Bruant wrote:
Hi,
I have discovered a 100% reliable soft lockup on boot on my laptop: Purism Librem 14, Intel Core i7-10710U, 48Gb RAM, Samsung Evo Plus 970 SSD, CoreBoot BIOS, grub bootloader, Arch Linux.
The last working release is kernel 6.9.10, every release from 6.10 onwards reliably exhibit the issue, which, based on journalctl logs, seems to be triggered somewhere in systemd-udev: https://gitlab.archlinux.org/-/project/42594/uploads/04583baf22189a0a8bb2f87...
Bisect points to commit 5186ba33234c9a90833f7c93ce7de80e25fac6f5
That's a merge commit. Meaning, the bisection likely went into the wrong direction.
I double-checked and the bisection results seem quite consistent. While merge commits are unlikely to be correct bisection results, they're entirely possible if the bug is triggered by an unexpected interaction between multiple unrelated commits.
However, you have out-of-tree modules. Try reproducing it without them.
That was the first suggestion on the Arch bug tracker. The whole bisection was done without out-of-tree modules.
Now, for the fun part: the kind soul on the Arch bugtracker who provided me with the kernel images for bisection built a patched 6.10.9 at my request, reverting just Tony's RDT changes that were flagged by the bisection: bd4955d4bc2182ccb660c9c30a4dd7f36feaf943 and e3ca96e479c91d6ee657d3caa5092a6a3a620f9f
That patch brings the boot success rate on my machine from 0/10 up to 4/10. Even though this code is not supposed to be used, its presence is clearly impactful!
The framebuffer fix seems to also have a positive (though smaller, closer to 20%) impact on boot success rate, so I'm planning to test the combination of both as a next step.
See some extra boot logs attached
linux-stable-mirror@lists.linaro.org