On Tue, 2025-11-18 at 21:09 +0100, Michal Wajdeczko wrote:
On 11/18/2025 8:50 PM, Lucas De Marchi wrote:
On Tue, Nov 18, 2025 at 08:29:09PM +0100, Michal Wajdeczko wrote:
On 11/18/2025 8:08 PM, Lucas De Marchi wrote:
Add missing stack_depot_init() call when CONFIG_DRM_XE_DEBUG_GUC is enabled to fix the following call stack:
[] BUG: kernel NULL pointer dereference, address: 0000000000000000
[] Workqueue: drm_sched_run_job_work [gpu_sched]
[] RIP: 0010:stack_depot_save_flags+0x172/0x870
[] Call Trace:
[] <TASK>
[] fast_req_track+0x58/0xb0 [xe]
Fixes: 16b7e65d299d ("drm/xe/guc: Track FAST_REQ H2Gs to report where errors came from")
Tested-by: Sagar Ghuge <sagar.ghuge@intel.com>
Cc: stable@vger.kernel.org # v6.17+
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
 drivers/gpu/drm/xe/xe_guc_ct.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index 2697d711adb2b..07ae0d601910e 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -236,6 +236,9 @@ int xe_guc_ct_init_noalloc(struct xe_guc_ct *ct)
 #if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
 	spin_lock_init(&ct->dead.lock);
 	INIT_WORK(&ct->dead.worker, ct_dead_worker_func);
+#if IS_ENABLED(CONFIG_DRM_XE_DEBUG_GUC)
+	stack_depot_init();
+#endif
shouldn't we just update our Kconfig by adding in DRM_XE_DEBUG_GUC
select STACKDEPOT_ALWAYS_INIT
didn't know about that, thanks... but that doesn't seem suitable for something that will be a module that may or may not get loaded depending on hw configuration.
true in general, but here we need stackdepot for DEBUG_GUC, which will likely be selected only by someone who already has the right platform and plans to load xe
Another counterargument here is that drm_mm (and even core_mm) are explicitly calling this during initialization. And what if we decide to add another use case in Xe for the same? Shouldn't we follow this same approach?
We could also do both here: Kconfig and in code. As mentioned, it doesn't hurt given the way the stackdepot init is written.
Thanks, Stuart
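To illustrate the point above that initializing from more than one place "doesn't hurt", here is a minimal userspace sketch of an idempotent, lock-guarded init. It is only a model and not the kernel's stackdepot code: every name in it (model_depot_init, model_depot_save, model_alloc_calls, MODEL_TABLE_ENTRIES) is hypothetical, and it assumes, as stated in the thread, that the real init is safe to call repeatedly.

```c
/* Userspace model only; this is NOT the kernel's stackdepot implementation. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MODEL_TABLE_ENTRIES 1024	/* hypothetical size for this sketch */

static pthread_mutex_t model_init_lock = PTHREAD_MUTEX_INITIALIZER;
static void **model_table;		/* NULL until the first successful init */
static int model_alloc_count;		/* number of real allocations performed */

/* Idempotent init: only the first successful call allocates the table. */
int model_depot_init(void)
{
	int ret = 0;

	pthread_mutex_lock(&model_init_lock);
	if (!model_table) {
		model_table = calloc(MODEL_TABLE_ENTRIES, sizeof(*model_table));
		if (model_table)
			model_alloc_count++;
		else
			ret = -1;
	}
	pthread_mutex_unlock(&model_init_lock);

	return ret;
}

/*
 * Saving before init is the class of bug in the report: the table is
 * still NULL. The model returns an error instead of dereferencing it.
 */
int model_depot_save(unsigned long hash, void *entry)
{
	if (!model_table)
		return -1;

	model_table[hash % MODEL_TABLE_ENTRIES] = entry;
	return 0;
}

/* Lets a caller check how many allocations actually happened. */
int model_alloc_calls(void)
{
	return model_alloc_count;
}
```

In this model a save before any init fails, and calling model_depot_init() from two different places still leaves model_alloc_calls() at 1, which is the property that would make "Kconfig and in code" harmless.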
Indeed, option 3 says:
3. Calling stack_depot_init(). Possible after boot is complete. This option is recommended for modules initialized later in the boot process, after mm_init() completes.
So I think it's preferred to do what we are doing here.
Lucas De Marchi
it's the first option listed in [1]
[1] https://elixir.bootlin.com/linux/v6.18-rc6/source/include/linux/stackdepot.h...
 #endif
 	init_waitqueue_head(&ct->wq);
 	init_waitqueue_head(&ct->g2h_fence_wq);