There was a missing call to stack_depot_init(), which is needed if CONFIG_DRM_XE_DEBUG_GUC is defined. The first patch fixes that in the simplest possible way. The second patch refactors the code to isolate the ifdefs in specific functions related to CONFIG_DRM_XE_DEBUG and CONFIG_DRM_XE_DEBUG_GUC.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
Lucas De Marchi (2):
      drm/xe/guc: Fix stack_depot usage
      drm/xe/guc_ct: Cleanup ifdef'ry

 drivers/gpu/drm/xe/xe_guc_ct.c | 204 +++++++++++++++++++++--------------------
 1 file changed, 107 insertions(+), 97 deletions(-)

base-commit: b603326a067916accf680fd623f4fc3c22bba487
change-id: 20251117-fix-debug-guc-3d79bbe9dead
Lucas De Marchi
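The idea behind the second patch, keeping each ifdef inside a small helper so the common init path has none, can be sketched roughly as below. The patch itself is not shown in this thread, so this is only an illustration in plain userspace C: `DEMO_DEBUG`, `DEMO_DEBUG_GUC`, `struct demo_ct` and all `demo_*` functions are hypothetical names, not the driver's.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for CONFIG_DRM_XE_DEBUG and
 * CONFIG_DRM_XE_DEBUG_GUC. Set either to 0 to compile that path out.
 */
#define DEMO_DEBUG	1
#define DEMO_DEBUG_GUC	1

/* Toy replacement for the CT state; the field names are made up. */
struct demo_ct {
	int dead_ready;		/* debug "dead CT" bookkeeping was set up */
	int tracking_ready;	/* stack tracking backend was initialized */
};

#if DEMO_DEBUG
static void demo_dead_init(struct demo_ct *ct)
{
	ct->dead_ready = 1;
}
#else
static void demo_dead_init(struct demo_ct *ct)
{
	(void)ct;	/* debug disabled: nothing to set up */
}
#endif

#if DEMO_DEBUG_GUC
static void demo_fast_req_track_init(struct demo_ct *ct)
{
	/* This is where a stack_depot_init()-like call would go. */
	ct->tracking_ready = 1;
}
#else
static void demo_fast_req_track_init(struct demo_ct *ct)
{
	(void)ct;
}
#endif

/* The common init path calls the helpers unconditionally, with no ifdefs. */
static void demo_ct_init(struct demo_ct *ct)
{
	assert(ct != NULL);
	ct->dead_ready = 0;
	ct->tracking_ready = 0;
	demo_dead_init(ct);
	demo_fast_req_track_init(ct);
}
```

With both macros set to 1, `demo_ct_init()` leaves both flags set; with a macro set to 0, the matching helper compiles to an empty stub and the caller does not change.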
Add missing stack_depot_init() call when CONFIG_DRM_XE_DEBUG_GUC is enabled to fix the following call stack:
	[] BUG: kernel NULL pointer dereference, address: 0000000000000000
	[] Workqueue: drm_sched_run_job_work [gpu_sched]
	[] RIP: 0010:stack_depot_save_flags+0x172/0x870
	[] Call Trace:
	[]  <TASK>
	[]  fast_req_track+0x58/0xb0 [xe]
Fixes: 16b7e65d299d ("drm/xe/guc: Track FAST_REQ H2Gs to report where errors came from")
Tested-by: Sagar Ghuge <sagar.ghuge@intel.com>
Cc: stable@vger.kernel.org # v6.17+
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_ct.c | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index 2697d711adb2b..07ae0d601910e 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -236,6 +236,9 @@ int xe_guc_ct_init_noalloc(struct xe_guc_ct *ct)
 #if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
 	spin_lock_init(&ct->dead.lock);
 	INIT_WORK(&ct->dead.worker, ct_dead_worker_func);
+#if IS_ENABLED(CONFIG_DRM_XE_DEBUG_GUC)
+	stack_depot_init();
+#endif
 #endif
 	init_waitqueue_head(&ct->wq);
 	init_waitqueue_head(&ct->g2h_fence_wq);
On Tue, 2025-11-18 at 11:08 -0800, Lucas De Marchi wrote:
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
I believe in CI we're setting the DEBUG_MM config option, which also does this. It looks like stack_depot_init() checks whether it was already initialized (statically) before doing the initialization, so calling it twice should be harmless if we do have that config set.
Thanks, Stuart
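The two behaviours discussed here, saving before initialization going wrong and a repeated init call being harmless, can be modelled with a toy depot. This is a hedged userspace sketch, not the kernel's stack depot: every `toy_*` name is hypothetical, and the real stack_depot_init() internals are not shown in this thread. Unlike the oops in the patch description, the toy save path refuses to run on an unset table instead of dereferencing it.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_SLOTS 64

static unsigned long *toy_table;	/* NULL until toy_depot_init() runs */
static int toy_init_count;		/* number of real allocations done */

/* Safe to call repeatedly: only the first successful call allocates. */
static int toy_depot_init(void)
{
	if (toy_table)
		return 0;	/* already initialized, nothing to do */

	toy_table = calloc(TOY_SLOTS, sizeof(*toy_table));
	if (!toy_table)
		return -1;

	toy_init_count++;
	return 0;
}

/*
 * Store a non-zero entry and return a non-zero handle, reusing the
 * handle if the entry is already stored. Returns 0 when the depot was
 * never initialized, the entry is 0, or the table is full.
 */
static unsigned int toy_depot_save(unsigned long entry)
{
	unsigned int i, idx;

	if (!toy_table || !entry)
		return 0;

	for (i = 0; i < TOY_SLOTS; i++) {
		idx = (unsigned int)((entry + i) % TOY_SLOTS);
		if (toy_table[idx] == entry)
			return idx + 1;		/* deduplicated */
		if (!toy_table[idx]) {
			toy_table[idx] = entry;
			return idx + 1;
		}
	}
	return 0;
}

/* Walk through the sequence: save too early, init twice, save again. */
static void toy_demo(void)
{
	assert(toy_depot_save(0xabcUL) == 0);	/* before init: rejected */
	assert(toy_depot_init() == 0);
	assert(toy_depot_init() == 0);		/* second call is a no-op */
	assert(toy_init_count == 1);
	assert(toy_depot_save(0xabcUL) != 0);	/* after init: accepted */
}
```

Calling `toy_demo()` once on a fresh process exercises all three cases; the early-return check in `toy_depot_init()` is what makes a second caller (another subsystem, or a Kconfig-driven early init) harmless in this model.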
On 11/18/2025 8:08 PM, Lucas De Marchi wrote:
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index 2697d711adb2b..07ae0d601910e 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -236,6 +236,9 @@ int xe_guc_ct_init_noalloc(struct xe_guc_ct *ct)
>  #if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
>  	spin_lock_init(&ct->dead.lock);
>  	INIT_WORK(&ct->dead.worker, ct_dead_worker_func);
> +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG_GUC)
> +	stack_depot_init();
> +#endif
shouldn't we just update our Kconfig by adding in DRM_XE_DEBUG_GUC
select STACKDEPOT_ALWAYS_INIT
it's the first option listed in [1]
[1] https://elixir.bootlin.com/linux/v6.18-rc6/source/include/linux/stackdepot.h...
On Tue, Nov 18, 2025 at 08:29:09PM +0100, Michal Wajdeczko wrote:
On 11/18/2025 8:08 PM, Lucas De Marchi wrote:
> shouldn't we just update our Kconfig by adding in DRM_XE_DEBUG_GUC
>
> 	select STACKDEPOT_ALWAYS_INIT
didn't know about that, thanks.... but that doesn't seem suitable for something that will be a module that may or may not get loaded depending on hw configuration.
Indeed, option 3 says:
3. Calling stack_depot_init(). Possible after boot is complete. This option is recommended for modules initialized later in the boot process, after mm_init() completes.
So I think it's preferred to do what we are doing here.
Lucas De Marchi
On 11/18/2025 8:50 PM, Lucas De Marchi wrote:
On Tue, Nov 18, 2025 at 08:29:09PM +0100, Michal Wajdeczko wrote:
On 11/18/2025 8:08 PM, Lucas De Marchi wrote:
>> shouldn't we just update our Kconfig by adding in DRM_XE_DEBUG_GUC
>>
>> 	select STACKDEPOT_ALWAYS_INIT
>
> didn't know about that, thanks.... but that doesn't seem suitable for something that will be a module that may or may not get loaded depending on hw configuration.
true in general, but here we need stackdepot for the DEBUG_GUC, which will likely be selected only by someone who already has the right platform and plans to load xe
On Tue, 2025-11-18 at 21:09 +0100, Michal Wajdeczko wrote:
On 11/18/2025 8:50 PM, Lucas De Marchi wrote:
On Tue, Nov 18, 2025 at 08:29:09PM +0100, Michal Wajdeczko wrote:
On 11/18/2025 8:08 PM, Lucas De Marchi wrote:
>>> shouldn't we just update our Kconfig by adding in DRM_XE_DEBUG_GUC
>>>
>>> 	select STACKDEPOT_ALWAYS_INIT
>>
>> didn't know about that, thanks.... but that doesn't seem suitable for something that will be a module that may or may not get loaded depending on hw configuration.
>
> true in general, but here we need stackdepot for the DEBUG_GUC, which will likely be selected only by someone who already has the right platform and plans to load xe
Another counterargument here is that drm_mm (and even core_mm) are explicitly calling this during initialization. And what if we decide to add another use case in Xe for the same? Shouldn't we follow this same approach?
We could also do both here - kconfig and in code. As mentioned it doesn't hurt given the way the stackdepot init is written.
Thanks, Stuart
On Tue, Nov 18, 2025 at 09:09:58PM +0100, Michal Wajdeczko wrote:
On 11/18/2025 8:50 PM, Lucas De Marchi wrote:
On Tue, Nov 18, 2025 at 08:29:09PM +0100, Michal Wajdeczko wrote:
On 11/18/2025 8:08 PM, Lucas De Marchi wrote:
>>> shouldn't we just update our Kconfig by adding in DRM_XE_DEBUG_GUC
>>>
>>> 	select STACKDEPOT_ALWAYS_INIT
>>
>> didn't know about that, thanks.... but that doesn't seem suitable for something that will be a module that may or may not get loaded depending on hw configuration.
>
> true in general, but here we need stackdepot for the DEBUG_GUC, which will likely be selected only by someone who already has the right platform and plans to load xe
conversely, if we have DRM_XE_DEBUG_GUC set there's no downside in calling stack_depot_init(). Any performance penalty argument is gone by "you are using DRM_XE_DEBUG_GUC".
	$ git grep "select STACKDEPOT_ALWAYS_INIT"
	lib/Kconfig.kasan:      select STACKDEPOT_ALWAYS_INIT
	lib/Kconfig.kmsan:      select STACKDEPOT_ALWAYS_INIT
	mm/Kconfig.debug:       select STACKDEPOT_ALWAYS_INIT if STACKTRACE_SUPPORT
	mm/Kconfig.debug:       select STACKDEPOT_ALWAYS_INIT if !DEBUG_KMEMLEAK_DEFAULT_OFF
The only current users of STACKDEPOT_ALWAYS_INIT make sense, as they are core ones. Not a single driver uses STACKDEPOT_ALWAYS_INIT. drm and ref_tracker, on the other hand, use stack_depot_init().
Lucas De Marchi
linux-stable-mirror@lists.linaro.org