On Sun, 25 May 2025 13:36:16 -0400 Kent Overstreet <kent.overstreet@linux.dev> wrote:
> We already have "trace max stack", but that only checks at process exit, so it doesn't tell you much.
Nope, it traces the stack at every function call, but it misses the leaf functions and also doesn't check interrupts as they may use a different stack.
> We could do better with tracing - just inject a trampoline that checks the current stack usage against the maximum stack usage we've seen, and emits a trace event with a stack trace if it's greater.
>
> (and now Steve's going to tell us he's already done this :)
Close ;-)
# echo 1 > /proc/sys/kernel/stack_tracer_enabled
Wait.
# cat /sys/kernel/tracing/stack_trace
        Depth    Size   Location    (33 entries)
        -----    ----   --------
  0)     8360      48   __msecs_to_jiffies+0x9/0x30
  1)     8312     104   update_group_capacity+0x95/0x970
  2)     8208     520   update_sd_lb_stats.constprop.0+0x278/0x2f40
  3)     7688     416   sched_balance_find_src_group+0x96/0xe30
  4)     7272     512   sched_balance_rq+0x53f/0x2fe0
  5)     6760     344   sched_balance_newidle+0x6c1/0x1310
  6)     6416      80   pick_next_task_fair+0x55/0xe60
  7)     6336     328   __schedule+0x8a5/0x33d0
  8)     6008      32   schedule+0xe2/0x3b0
  9)     5976      32   io_schedule+0x8f/0xf0
 10)     5944     264   rq_qos_wait+0x12a/0x200
 11)     5680     144   wbt_wait+0x159/0x260
 12)     5536      40   __rq_qos_throttle+0x50/0x90
 13)     5496     320   blk_mq_submit_bio+0x70b/0x1ff0
 14)     5176     240   __submit_bio+0x1b3/0x600
 15)     4936     248   submit_bio_noacct_nocheck+0x546/0xca0
 16)     4688     144   ext4_bio_write_folio+0x69d/0x1870
 17)     4544      64   mpage_submit_folio+0x14c/0x2b0
 18)     4480      96   mpage_process_page_bufs+0x392/0x7a0
 19)     4384     632   mpage_prepare_extent_to_map+0xa5b/0x1080
 20)     3752     496   ext4_do_writepages+0x8af/0x2ee0
 21)     3256     304   ext4_writepages+0x26f/0x5c0
 22)     2952     344   do_writepages+0x183/0x7c0
 23)     2608     152   __writeback_single_inode+0x114/0xb00
 24)     2456     744   writeback_sb_inodes+0x52b/0xdf0
 25)     1712     168   __writeback_inodes_wb+0xf4/0x270
 26)     1544     312   wb_writeback+0x547/0x800
 27)     1232     328   wb_workfn+0x7b1/0xbc0
 28)      904     352   process_one_work+0x85a/0x1450
 29)      552     176   worker_thread+0x5b7/0xf80
 30)      376     168   kthread+0x371/0x720
 31)      208      32   ret_from_fork+0x34/0x70
 32)      176     176   ret_from_fork_asm+0x1a/0x30
The code that does this is in kernel/trace/trace_stack.c
It simply attaches to the function tracer and at every function checks the current stack size.
Hmm, I need to update this because today we even pass the stack pointer via the ftrace_regs if the arch supports it. Using that would allow me to get rid of the hack:
static void check_stack(unsigned long ip, unsigned long *stack)
{
[..]
	this_size = ((unsigned long)stack) & (THREAD_SIZE-1);
	this_size = THREAD_SIZE - this_size;
[..]
static void
stack_trace_call(unsigned long ip, unsigned long parent_ip,
		 struct ftrace_ops *op, struct ftrace_regs *fregs)
{
	unsigned long stack;
[..]
	check_stack(ip, &stack);
-- Steve