On Wed, Mar 13, 2024 at 12:42 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
On Mon, Mar 11, 2024 at 7:42 PM 梦龙董 <dongmenglong.8@bytedance.com> wrote:
[......]
I see. I thought you were sharing the trampoline across attachments (since the bpf prog is the same).
That seems like a good idea, and one I hadn't thought of before.
But the above approach cannot possibly work with a shared trampoline. You need to create an individual trampoline for each attachment and point them all to the single bpf prog.
tbh I'm less excited about this feature now. Sharing the prog across different attachments is nice, but it won't scale to thousands of attachments. I assumed that there would be a single trampoline with max(argno) across attachments and that attach/detach would scale to thousands.
With individual trampolines, this will work for a hundred attachments at most.
What does "a hundred attachments max" mean? Can't I trace thousands of kernel functions with one bpf program through a tracing multi-link?
Let's step back. What exact use case are you trying to solve? Not an artificial one like the selftest in patch 9, but the real one.
I have a tool for diagnosing network problems called "nettrace". It traces many kernel functions that take an "skb" argument, like this:
./nettrace -p icmp
begin trace...
***************** ffff889be8fbd500,ffff889be8fbcd00 ***************
[1272349.614564] [dev_gro_receive ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614579] [__netif_receive_skb_core] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614585] [ip_rcv ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614592] [ip_rcv_core ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614599] [skb_clone ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614616] [nf_hook_slow ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614629] [nft_do_chain ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614635] [ip_rcv_finish ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614643] [ip_route_input_slow ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614647] [fib_validate_source ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614652] [ip_local_deliver ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614658] [nf_hook_slow ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614663] [ip_local_deliver_finish] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614666] [icmp_rcv ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614671] [icmp_echo ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614675] [icmp_reply ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614715] [consume_skb ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614722] [packet_rcv ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
[1272349.614725] [consume_skb ] ICMP: 169.254.128.15 -> 172.27.0.6 ping request, seq: 48220
Currently, I have to create a separate bpf program for every kernel function I want to trace, which comes to as many as 200 programs.
With this multi-link, I only need to create 5 bpf programs, like this:
/* trace_skb_N expects the skb as the N-th argument of the traced function */
int BPF_PROG(trace_skb_1, struct sk_buff *skb);
int BPF_PROG(trace_skb_2, u64 arg0, struct sk_buff *skb);
int BPF_PROG(trace_skb_3, u64 arg0, u64 arg1, struct sk_buff *skb);
int BPF_PROG(trace_skb_4, u64 arg0, u64 arg1, u64 arg2, struct sk_buff *skb);
int BPF_PROG(trace_skb_5, u64 arg0, u64 arg1, u64 arg2, u64 arg3, struct sk_buff *skb);
Then I can attach trace_skb_1 to all the kernel functions I want to trace whose first arg is skb, attach trace_skb_2 to those whose second arg is skb, and so on.
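A rough userspace sketch of that grouping, under stated assumptions: the `skb_target` table, the skb positions used with it and the helper names are hypothetical, and a real tool would read each prototype from the kernel's BTF instead of hard-coding it. Each target is routed to trace_skb_N, where N is the 1-based position of its skb argument; targets whose skb lies past the fifth argument are counted as uncovered.

```c
#include <stddef.h>

#define NR_SKB_PROGS 5 /* trace_skb_1 .. trace_skb_5 */

/* Hypothetical description of one target: function name and the 0-based
 * position of its struct sk_buff * argument (illustrative only; a real
 * tool would look this up in the kernel's BTF). */
struct skb_target {
	const char *func;
	int skb_idx;
};

/* Return N of the trace_skb_N program to attach to t, or -1 if the skb
 * argument lies beyond the 5th position and no program covers it. */
static int prog_for_target(const struct skb_target *t)
{
	if (t->skb_idx < 0 || t->skb_idx >= NR_SKB_PROGS)
		return -1;
	return t->skb_idx + 1;
}

/* Count how many targets each program would get (counts[0] is
 * trace_skb_1). Returns the number of targets no program covers. */
static int group_targets(const struct skb_target *targets, size_t n,
			 int counts[NR_SKB_PROGS])
{
	int uncovered = 0;

	for (int p = 0; p < NR_SKB_PROGS; p++)
		counts[p] = 0;
	for (size_t i = 0; i < n; i++) {
		int prog = prog_for_target(&targets[i]);

		if (prog < 0)
			uncovered++;
		else
			counts[prog - 1]++;
	}
	return uncovered;
}
```

For example, with skb positions assumed from recent kernels, ip_rcv and consume_skb (skb first) would both land on trace_skb_1, dev_gro_receive on trace_skb_2 and ip_rcv_finish on trace_skb_3, giving one multi-link per program.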
Alternatively, I can create a single bpf program, store the index of the skb argument in the attachment cookie, and attach this program to all the kernel functions I want to trace.
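A minimal userspace model of the cookie variant, for illustration only. In a real BPF program the index would come from bpf_get_attach_cookie() and the argument could be fetched with bpf_get_func_arg(), which already rejects an out-of-range index; the sketch below merely imitates that lookup in plain C so it runs anywhere. `fake_ctx`, `get_func_arg()` and `skb_from_cookie()` are hypothetical stand-ins, not kernel APIs, and the six-slot argument array is an assumed cap.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of what a tracing program sees: args[i] holds the
 * i-th argument of the traced function, widened to 64 bits. */
struct fake_ctx {
	uint64_t args[6];  /* assumed cap on saved arguments */
	uint32_t nr_args;  /* real argument count of the traced function */
	uint64_t cookie;   /* per-attachment cookie: index of the skb arg */
};

/* Imitates the contract of bpf_get_func_arg(): 0 on success, -EINVAL
 * if n is not a valid argument index for this function. */
static int get_func_arg(const struct fake_ctx *ctx, uint32_t n, uint64_t *value)
{
	if (n >= ctx->nr_args)
		return -EINVAL;
	*value = ctx->args[n];
	return 0;
}

/* Body of the single shared program: treat the cookie as the skb index.
 * Returns the skb pointer value, or 0 if the cookie is out of range. */
static uint64_t skb_from_cookie(const struct fake_ctx *ctx)
{
	uint64_t skb;

	if (ctx->cookie > UINT32_MAX ||
	    get_func_arg(ctx, (uint32_t)ctx->cookie, &skb))
		return 0;
	return skb;
}
```

The bounds check matters because one program serves every attachment: a wrong cookie on one target should yield "no skb" rather than a read past that function's real arguments.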
That is my use case. With the multi-link I have only 1 bpf program, 1 bpf link and 200 trampolines, instead of 200 bpf programs, 200 bpf links and 200 trampolines.
The shared trampoline you mentioned seems like a wonderful idea, since it would reduce the 200 trampolines to one. Let me check that I understand it: we create one trampoline and record the maximum argument count across all the target functions; let's call it arg_count.
When generating the trampoline, we assume that the function has arg_count arguments. When attaching, we check the consistency of every target function, just as we do now.
Am I right?
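A minimal sketch of that bookkeeping, assuming a much simplified notion of "consistency": here a target is accepted if it has at least as many arguments as the program reads and no more than arg_count. The kernel's real check compares BTF types rather than counts, and `target_proto`, `shared_arg_count()` and `target_ok()` are hypothetical names; the argument counts in the usage are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a traced function's prototype. */
struct target_proto {
	const char *func;
	int nr_args;
};

/* arg_count for the shared trampoline: the largest argument count among
 * all targets, so one generated trampoline saves enough argument
 * registers for any of them. */
static int shared_arg_count(const struct target_proto *t, size_t n)
{
	int max = 0;

	for (size_t i = 0; i < n; i++)
		if (t[i].nr_args > max)
			max = t[i].nr_args;
	return max;
}

/* Simplified stand-in for the per-target check at attach time: the prog
 * must not read more arguments than the target has, and the target must
 * fit in the existing trampoline (a wider one would, by assumption, need
 * the trampoline to be regenerated). */
static bool target_ok(const struct target_proto *t, int prog_nr_args,
		      int arg_count)
{
	return prog_nr_args <= t->nr_args && t->nr_args <= arg_count;
}
```

Rejecting a target wider than arg_count, rather than silently reading past its real arguments, is where a later attach would presumably have to rebuild the trampoline; that behaviour is an assumption, not something settled in this thread.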
Thanks!
Menglong Dong