On Fri, Feb 10, 2023 at 10:13 AM Kajetan Puchalski kajetan.puchalski@arm.com wrote:
Hi Yafang,
After commit 3087c61ed2c4 ("tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN"), the content of the format file under /sys/kernel/debug/tracing/events/task/task_newtask was changed from field:char comm[16]; offset:12; size:16; signed:0; to field:char comm[TASK_COMM_LEN]; offset:12; size:16; signed:0;
John reported that this change breaks older versions of perfetto. Then Mathieu pointed out that this behavioral change was caused by the use of __stringify(_len), which happens to work on macros, but not on enum labels. And he also gave the suggestion on how to fix it: :One possible solution to make this more robust would be to extend :struct trace_event_fields with one more field that indicates the length :of an array as an actual integer, without storing it in its stringified :form in the type, and do the formatting in f_show where it belongs.
The result as follows after this change, $ cat /sys/kernel/debug/tracing/events/task/task_newtask/format field:char comm[16]; offset:12; size:16; signed:0;
Fixes: 3087c61ed2c4 ("tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN") Reported-by: John Stultz jstultz@google.com Debugged-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Suggested-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Signed-off-by: Yafang Shao laoar.shao@gmail.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Alexei Starovoitov alexei.starovoitov@gmail.com Cc: Kajetan Puchalski kajetan.puchalski@arm.com Cc: Steven Rostedt rostedt@goodmis.org Cc: John Stultz jstultz@google.com Cc: stable@vger.kernel.org # v5.17+
From my testing this works the same with older Perfetto as the previous diff you sent in the older thread, ie this type of errors:
Name Value Type mismatched_sched_switch_tids 2439 error (analysis) systrace_parse_failure 3853 error (analysis)
Meaning that applying this patch on top of the old one ends up with different results than just reverting the old one so not a 100% fix but as I said before, still an improvement.
fwiw I don't mind reverting commit 3087c61ed2c4. For the record it didn't go through the bpf tree. It went through mm.
But first we need to define which part of ftrace format is an abi. I think it's a format of tracing/event/foo/format file and not the contents of it. The tracepoints come and go. Their arguments get changed too. So the contents of ftrace format files change.
In this case Perfetto stumbled on parsing field:char comm[TASK_COMM_LEN]; offset:8;
We probably should define that 'field: value offset: value' is an abi, but value-s can change. Say offset:8 becomes offset:+8, for whatever reason. If Perfetto fails to parse it, it's a Perfetto bug.
In this case it failed to parse char comm[TASK_COMM_LEN]; But that's not the only such "field:". These three were there for a long time. field:u32 rates[NUM_NL80211_BANDS]; offset:16; size:24; field:int mcast_rate[NUM_NL80211_BANDS]; offset:60; size:24; field:int mcast_rate[NUM_NL80211_BANDS]; offset:108; size:24;
I suspect Perfetto didn't have a use case to parse them, so the bug stayed dormant and a change in TASK_COMM_LEN triggered it.
We can use Yafang's patch to do: -field:%.*s %s%s; +field:%.*s %s[%d]; but it will affect both TASK_COMM_LEN and NUM_NL80211_BANDS.
In summary the TASK_COMM_LEN change from #define to enum didn't introduce anything new in the kind of "value"s being printed in ftrace files. Hence I'm arguing there is no abi breakage.
There was a question why change from #define to enum is useful to bpf. The reason is that #define-s are not seen in dwarf whereas enums and their values are. bpf tooling has ways to extract that data and auto-adjust bpf programs when enum-s disappear or their values change.