On Fri, 14 Aug 2020, Alexei Starovoitov wrote:
On Fri, Aug 14, 2020 at 02:06:37PM +0100, Alan Maguire wrote:
On Wed, 12 Aug 2020, Alexei Starovoitov wrote:
On Thu, Aug 06, 2020 at 03:42:23PM +0100, Alan Maguire wrote:
The bpf_trace_printk tracepoint is augmented with a "trace_id" field; it is used to allow tracepoint filtering as typed display information can easily be interspersed with other tracing data, making it hard to read. Specifying a trace_id will allow users to selectively trace data, eliminating noise.
Since trace_id is not seen in trace_pipe, how do you expect users to filter by it?
Sorry should have specified this. The approach is to use trace instances and filtering such that we only see events associated with a specific trace_id. There's no need for the trace event to actually display the trace_id - it's still usable as a filter. The steps involved are:
- create a trace instance within which we can specify a fresh set of trace event enablings, filters etc.
mkdir /sys/kernel/debug/tracing/instances/traceid100
- enable the filter for the specific trace id
echo "trace_id == 100" > /sys/kernel/debug/tracing/instances/traceid100/events/bpf_trace/bpf_trace_printk/filter
- enable the trace event
echo 1 > /sys/kernel/debug/tracing/instances/events/bpf_trace/bpf_trace_printk/enable
- ensure the BPF program uses a trace_id 100 when calling bpf_trace_btf()
ouch. I think you interpreted the acceptance of the commit 7fb20f9e901e ("bpf, doc: Remove references to warning message when using bpf_trace_printk()") in the wrong way.
Everything that doc had said is still valid. In particular: -A: This is done to nudge program authors into better interfaces when -programs need to pass data to user space. Like bpf_perf_event_output() -can be used to efficiently stream data via perf ring buffer. -BPF maps can be used for asynchronous data sharing between kernel -and user space. bpf_trace_printk() should only be used for debugging.
bpf_trace_printk is for debugging only. _debugging of bpf programs themselves_. What you're describing above is logging and tracing. It's not debugging of programs. perf buffer, ring buffer, and seq_file interfaces are the right interfaces for tracing, logging, and kernel debugging.
It also feels like workaround. May be let bpf prog print the whole struct in one go with multiple new lines and call trace_bpf_trace_printk(buf) once?
We can do that absolutely, but I'd be interested to get your take on the filtering mechanism before taking that approach. I'll add a description of the above mechanism to the cover letter and patch to be clearer next time too.
I think patch 3 is no go, because it takes bpf_trace_printk in the wrong direction. Instead please refactor it to use string buffer or seq_file as an output.
Fair enough. I'm thinking a helper like
long bpf_btf_snprintf(char *str, u32 str_size, struct btf_ptr *ptr, u32 ptr_size, u64 flags);
Then the user can choose perf event or ringbuf interfaces to share the results with userspace.
If the user happen to use bpf_trace_printk("%s", buf); after that to print that string buffer to trace_pipe that's user's choice. I can see such use case when program author wants to debug their bpf program. That's fine. But for kernel debugging, on demand and "always on" logging and tracing the documentation should point to sustainable interfaces that don't interfere with each other, can be run in parallel by multiple users, etc.
The problem with bpf_trace_printk() under this approach is that the string size for %s arguments is very limited; bpf_trace_printk() restricts these to 64 bytes in size. Looks like bpf_seq_printf() restricts a %s string to 128 bytes also. We could add an additional helper for the bpf_seq case which calls bpf_seq_printf() for each component in the object, i.e.
long bpf_seq_btf_printf(struct seq_file *m, struct btf_ptr *ptr, u32 ptr_size, u64 flags);
This would steer users away from bpf_trace_printk() for this use case - since it can print only a small amount of the string - while supporting all the other user-space communication mechanisms.
Alan