On Fri, Mar 15, 2024 at 4:00 PM 梦龙董 dongmenglong.8@bytedance.com wrote:
On Thu, Mar 14, 2024 at 8:27 AM Alexei Starovoitov alexei.starovoitov@gmail.com wrote:
On Tue, Mar 12, 2024 at 6:53 PM 梦龙董 dongmenglong.8@bytedance.com wrote:
[......]
What does "a hundred attachments max" means? Can't I trace thousands of kernel functions with a bpf program of tracing multi-link?
I mean what time does it take to attach one program to 100 fentry-s ? What is the time for 1k and for 10k ?
The kprobe multi test attaches to pretty much all funcs in /sys/kernel/tracing/available_filter_functions and it's fast enough to run in test_progs on every commit in bpf CI. See get_syms() in prog_tests/kprobe_multi_test.c
Can this new multi fentry do that? and at what speed? The answer will decide how applicable this api is going to be. Generating different trampolines for every attach point is an approach as well. Pls benchmark it too.
I see. Creating plenty of trampolines does take a lot of time, and I'll do testing on it.
I have done a simple benchmark on creating 1000 trampolines. It is slow, quite slow, which consume up to 60s. We can't do it this way.
Now, I have a bad idea. How about we introduce a "dynamic trampoline"? The basic logic of it can be:
""" save regs bpfs = trampoline_lookup_ip(ip) fentry = bpfs->fentries while fentry: fentry(ctx) fentry = fentry->next
call origin save return value
fexit = bpfs->fexits while fexit: fexit(ctx) fexit = fexit->next
xxxxxx """
And we lookup the "bpfs" by the function ip in a hash map in trampoline_lookup_ip. The type of "bpfs" is:
struct bpf_array { struct bpf_prog *fentries; struct bpf_prog *fexits; struct bpf_prog *modify_returns; }
When we need to attach the bpf progA to function A/B/C, we only need to create the bpf_arrayA, bpf_arrayB, bpf_arrayC and add the progA to them, and insert them to the hash map "direct_call_bpfs", and attach the "dynamic trampoline" to A/B/C. If bpf_arrayA exist, just add progA to the tail of bpf_arrayA->fentries. When we need to attach progB to B/C, just add progB to bpf_arrayB->fentries and bpf_arrayB->fentries.
Compared to the trampoline, extra overhead is introduced by the hash lookuping.
I have not begun to code yet, and I am not sure the overhead is acceptable. Considering that we also need to do hash lookup by the function in kprobe_multi, maybe the overhead is acceptable?
Thanks! Menglong Dong
Let's step back.
[......]
For one trampoline to handle all attach points we might need some arch support, but we can start simple. Make btf_func_model with MAX_BPF_FUNC_REG_ARGS by calling btf_distill_func_proto() with func==NULL. And use that to build a trampoline.
The challenge is how to use minimal number of trampolines when bpf_progA is attached for func1, func2, func3 and bpf_progB is attached to func3, func4, func5. We'd still need 3 trampolines: for func[12] to call bpf_progA, for func3 to call bpf_progA and bpf_progB, for func[45] to call bpf_progB.
Jiri was trying to solve it in the past. His slides from LPC: https://lpc.events/event/16/contributions/1350/attachments/1033/1983/plumber...
Pls study them and his prior patchsets to avoid stepping on the same rakes.