On 7/18/25 8:05 AM, Matt Fleming wrote:
From: Matt Fleming mfleming@cloudflare.com
Add benchmarks for the standard set of operations: lookup, update, delete. Also, include a benchmark for trie_free() which is known to have terrible performance for maps with many entries.
Benchmarks operate on tries without gaps in the key range, i.e. each test begins with a trie with valid keys in the range [0, nr_entries). This is intended to cause maximum branching when traversing the trie.
All measurements are recorded inside the kernel to remove syscall overhead.
Most benchmarks run an XDP program to generate stats but free needs to collect latencies using fentry/fexit on map_free_deferred() because it's not possible to use fentry directly on lpm_trie.c since commit c83508da5620 ("bpf: Avoid deadlock caused by nested kprobe and fentry bpf programs") and there's no way to create/destroy a map from within an XDP program.
Here is example output from an AMD EPYC 9684X 96-Core machine for each of the benchmarks using a trie with 10K entries and a 32-bit prefix length, e.g.
$ ./bench lpm-trie-$op \ --prefix_len=32 \ --producers=1 \ --nr_entries=10000
lookup: throughput 7.423 ± 0.023 M ops/s ( 7.423M ops/prod), latency 134.710 ns/op update: throughput 2.643 ± 0.015 M ops/s ( 2.643M ops/prod), latency 378.310 ns/op delete: throughput 0.712 ± 0.008 M ops/s ( 0.712M ops/prod), latency 1405.152 ns/op free: throughput 0.574 ± 0.003 K ops/s ( 0.574K ops/prod), latency 1.743 ms/op
Signed-off-by: Matt Fleming mfleming@cloudflare.com
tools/testing/selftests/bpf/Makefile | 2 + tools/testing/selftests/bpf/bench.c | 10 + tools/testing/selftests/bpf/bench.h | 1 + .../selftests/bpf/benchs/bench_lpm_trie_map.c | 345 ++++++++++++++++++ .../selftests/bpf/progs/lpm_trie_bench.c | 175 +++++++++ .../selftests/bpf/progs/lpm_trie_map.c | 19 + 6 files changed, 552 insertions(+) create mode 100644 tools/testing/selftests/bpf/benchs/bench_lpm_trie_map.c create mode 100644 tools/testing/selftests/bpf/progs/lpm_trie_bench.c create mode 100644 tools/testing/selftests/bpf/progs/lpm_trie_map.c
[...]
+static __always_inline double duration_ms(struct bench_res *res) +{
- if (!res->hits)
return 0.0;
- return res->duration_ns / res->hits / NSEC_PER_MSEC;
+}
The above function 'duration_ms' is not used.
+static void free_ops_report_progress(int iter, struct bench_res *res,
long delta_ns)
+{
- double hits_per_sec, hits_per_prod;
- double rate_divisor = 1000.0;
- char rate = 'K';
- hits_per_sec = res->hits / (res->duration_ns / (double)NSEC_PER_SEC) /
rate_divisor;
- hits_per_prod = hits_per_sec / env.producer_cnt;
- printf("Iter %3d (%7.3lfus): ", iter,
(delta_ns - NSEC_PER_SEC) / 1000.0);
- printf("hits %8.3lf%c/s (%7.3lf%c/prod)\n", hits_per_sec, rate,
hits_per_prod, rate);
+}
[...]