On Wed, 2025-02-05 at 14:33 -0800, Andrii Nakryiko wrote:
> I see two ways forward for you. Either you can break apart your BPF
> object of ~100 BPF programs into more independent BPF objects (seeing
> that programs can be independently loaded/unloaded depending on
> configuration, seems like you do have a bunch of logical independence,
> right?). I assume shared BPF maps are the biggest reason to keep all
> those programs together in one BPF object. To share BPF maps between
> multiple BPF objects libbpf provides two complementary interfaces:
>   - bpf_map__reuse_fd() for manual control
>   - BPF map pinning (could be declarative or manual)
> This way you can ensure that all BPF objects would use the same BPF
> map, where necessary.
I think this approach *could* work, but it could easily become complex for us: we'd need to track all the dependencies between programs and maps, and anything missed could lead to hard-to-debug refcounting bugs.

Further, splitting into objects incurs some performance and memory cost, because bpf_object__load_vmlinux_btf() would be called for each object and there's currently no way to share BTF data across objects. A single BPF object avoids this issue. Potentially, libbpf could cache some BTF data to lessen the impact.
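For reference, the map-sharing flow with bpf_map__reuse_fd() looks roughly like this (a sketch only; the object file names and the map name "shared_map" are placeholders, not anything from our actual setup):

```c
#include <bpf/libbpf.h>

int load_with_shared_map(void)
{
	struct bpf_object *a, *b;
	struct bpf_map *map_a, *map_b;
	int err;

	a = bpf_object__open_file("obj_a.bpf.o", NULL);
	if (!a)
		return -1;
	err = bpf_object__load(a);          /* creates the map, assigns its fd */
	if (err)
		return err;

	b = bpf_object__open_file("obj_b.bpf.o", NULL);
	if (!b)
		return -1;
	map_a = bpf_object__find_map_by_name(a, "shared_map");
	map_b = bpf_object__find_map_by_name(b, "shared_map");
	if (!map_a || !map_b)
		return -1;

	/* must happen before bpf_object__load(b): b then uses a's map fd
	 * instead of creating its own copy of the map */
	err = bpf_map__reuse_fd(map_b, bpf_map__fd(map_a));
	if (err)
		return err;

	return bpf_object__load(b);
}
```

With ~100 programs this per-pair wiring is exactly the dependency tracking I'm worried about above.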
> Alternatively, we can look at this problem as needing libbpf to only
> prepare BPF program code (doing all the relocations and stuff like
> that), but then application actually taking care of loading/unloading
> BPF program with bpf_prog_load() outside of bpf_object abstraction.
> I've had an almost ready patches splitting bpf_object__load() into two
> steps: bpf_object__prepare() and bpf_object__load() after that.
> "prepare" step would create BPF maps, load BTF information, perform
> necessary relocations and arrive at final state of BPF program code
> (which you can get with bpf_program__insns() API), but stopping just
> short of actually doing bpf_prog_load() step.
>
> This seems like it would solve your problem as well. You'd use libbpf
> to do all the low-level ELF processing and relocation, but then take
> over managing BPF program lifetime. Loading/unloading as you see fit,
> including in parallel.
>
> Is this something that would work for you?
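For concreteness, here's how I'd imagine an application using that split (a sketch: bpf_prog_load(), bpf_program__insns(), bpf_program__insn_cnt() and bpf_program__type() exist in libbpf today, but bpf_object__prepare() is from your unmerged patches, so its exact name/signature is my assumption):

```c
#include <bpf/libbpf.h>
#include <bpf/bpf.h>

int load_one_prog(struct bpf_object *obj, const char *name)
{
	struct bpf_program *prog;
	int err, fd;

	/* hypothetical: creates maps, loads BTF, performs relocations,
	 * but does not call bpf_prog_load() */
	err = bpf_object__prepare(obj);
	if (err)
		return err;

	prog = bpf_object__find_program_by_name(obj, name);
	if (!prog)
		return -1;

	/* application-controlled load, callable whenever (and, with
	 * per-call log buffers, from whichever thread) we choose */
	fd = bpf_prog_load(bpf_program__type(prog), name, "GPL",
			   bpf_program__insns(prog),
			   bpf_program__insn_cnt(prog), NULL);
	return fd < 0 ? fd : 0;
}
```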
I think this API could work, though we would need a few other modifications to correctly handle program/map dependencies and account for relocations. At a high level, we'd need:
1) A way to associate each BPF program with all the maps it will use (association of struct bpf_program * --> list of struct bpf_map * in some form). This is so that we can load/unload associated maps when we load/unload a program.
2) An API to create a BPF map, in case a new map needs to be loaded after initial startup.
3) An API to allow unloading a map while keeping map->fd reserved. This is important because the fd value is used by BPF program instructions, so without something like this, we'd have to redo the relocation process for any other BPF programs that access this map (and thus reload those programs too). This API could be implemented by dup'ing a placeholder fd.
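The fd-reservation idea in (3) is just the classic dup2() trick; here it is with ordinary fds (/dev/null and /dev/zero stand in for map objects, no BPF involved):

```c
#include <fcntl.h>
#include <unistd.h>

/* Reserve an fd *number* with a placeholder, then swap real objects in
 * and out with dup2() so the number itself never changes. Returns 0 on
 * success, -1 on failure. */
int reserve_demo(void)
{
	int reserved = open("/dev/null", O_RDONLY);  /* the stable "map fd" */
	if (reserved < 0)
		return -1;

	/* "unload": park a placeholder on the reserved number, so BPF
	 * instructions referring to that number would stay valid */
	int placeholder = open("/dev/null", O_RDONLY);
	if (placeholder < 0 || dup2(placeholder, reserved) != reserved)
		return -1;
	close(placeholder);

	/* "reload": install the new object on the same number */
	int newobj = open("/dev/zero", O_RDONLY);
	if (newobj < 0 || dup2(newobj, reserved) != reserved)
		return -1;
	close(newobj);

	/* the fd number survived and now refers to the new object */
	char c;
	long n = read(reserved, &c, 1);
	close(reserved);
	return (n == 1 && c == '\0') ? 0 : -1;
}
```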
Alternatively, if libbpf could automatically refcount maps across multiple BPF objects and load/unload them on demand, then all of the above could happen behind the scenes. This would be similar to the other approach you mentioned, but with libbpf doing the refcounting heavy lifting instead of leaving it to each application, making it more robust and elegant. It would mean changing libbpf to (a) synchronize access to some map functions and (b) allow struct bpf_map * to be shared across BPF objects. Perhaps a concept of a "collection of BPF objects" could allow for this.
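To illustrate the bookkeeping, here's a minimal refcount table of the kind libbpf could keep per shared map (purely hypothetical; the fd values are fake stand-ins for real map fds, and creation/destruction are elided):

```c
#include <string.h>

#define MAX_SHARED 16

struct shared_map {
	const char *name;
	int fd;        /* stand-in for the real map fd */
	int refcnt;
};

static struct shared_map tab[MAX_SHARED];
static int next_fake_fd = 100;

/* First user "creates" the map; later users share the same fd. */
static int map_acquire(const char *name)
{
	for (int i = 0; i < MAX_SHARED; i++) {
		if (tab[i].name && !strcmp(tab[i].name, name)) {
			tab[i].refcnt++;
			return tab[i].fd;
		}
	}
	for (int i = 0; i < MAX_SHARED; i++) {
		if (!tab[i].name) {
			tab[i] = (struct shared_map){ name, next_fake_fd++, 1 };
			return tab[i].fd;
		}
	}
	return -1;  /* table full */
}

/* Last user "destroys" the map; earlier releases just drop a ref. */
static void map_release(const char *name)
{
	for (int i = 0; i < MAX_SHARED; i++) {
		if (tab[i].name && !strcmp(tab[i].name, name)) {
			if (--tab[i].refcnt == 0)
				tab[i] = (struct shared_map){ 0 };
			return;
		}
	}
}
```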
> > This patch set also permits loading BPF programs in parallel if the
> > application wishes. We tested parallel loading with 200+ BPF
> > programs and found the load time dropped from 18 seconds to 5
> > seconds when done in parallel on a 6.8 kernel.
> bpf_object is intentionally single-threaded, so I don't think we'll
> be supporting parallel BPF program loading in the paradigm of
> bpf_object (but see the bpf_object__prepare() proposal). Even from
> API standpoint this is problematic with logging and log buffers
> basically assuming single-threaded execution of BPF program loading.
>
> All that could be changed or worked around, but your use case is not
> really a typical case, so I'm a bit hesitant at this point.
I can understand where you're coming from if no one else has mentioned a use case like this. We can get parallel loading by splitting our programs into multiple BPF objects, but unless the split is very even, load time suffers. For example, if 100 programs are split into two objects, one with 80 programs and the other with 20, the object with 80 programs becomes the bottleneck.