After switching to memcg-based bpf memory accounting, bpf memory is charged to the loader's memcg by default, which causes unexpected issues for us. For instance, the container of the loader, which loads the bpf programs and pins them on bpffs, may restart after pinning the progs and maps. After the restart, the pinned progs and maps no longer belong to the new container. Instead, they belong to the offline memcg left behind by the previous generation. This inconsistent behavior causes trouble for the memory resource management of the container.
These progs and maps have to persist across multiple generations because they are also used by other processes that are not in this container. IOW, they can't be removed when this container is restarted. As a specific example, the bpf program for the clsact qdisc is loaded by an agent running in a container. The agent not only loads the bpf program but also processes the data generated by it and does some other maintenance work.
To keep the charging behavior consistent, we first considered recharging these pinned maps and progs after the container is restarted. After the discussion[1] with Roman, however, we decided to go in another direction: don't charge them to the container in the first place. TL;DR of that discussion: recharging is not a generic solution, and it carries too much risk.
This patchset implements the no-charge solution. Two flags are introduced in union bpf_attr, one for bpf maps and another for bpf progs. A user who doesn't want the memory charged to the current memcg can set these flags. They are only permitted for sys admin, since this memory will be accounted to the root memcg only.
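The gating described above can be pictured with a small user-space model. This is a sketch under stated assumptions, not the kernel code from the patches: every `model_*`/`MODEL_*` name is hypothetical, the numeric flag values are placeholders rather than the values in the patched include/uapi/linux/bpf.h, and returning -EPERM for an unprivileged caller is an assumed error code. The only idea it encodes is the one stated in the cover letter: with the no-charge flag set (and sys admin privileges), the allocation is made without __GFP_ACCOUNT, so it is not charged to the loader's memcg.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical placeholders, NOT the real kernel/UAPI values. */
#define MODEL_GFP_KERNEL       0x1u       /* stands in for GFP_KERNEL */
#define MODEL_GFP_ACCOUNT      0x2u       /* stands in for __GFP_ACCOUNT */
#define MODEL_F_NO_CHARGE      (1u << 0)  /* stands in for BPF_F_NO_CHARGE (map flags) */
#define MODEL_F_PROG_NO_CHARGE (1u << 1)  /* stands in for BPF_F_PROG_NO_CHARGE (prog flags) */

/*
 * Returns the (modeled) gfp mask to allocate with, or a negative errno.
 * no_charge_bit selects which no-charge flag applies to this object type.
 */
static long model_gfp_for(unsigned int flags, unsigned int no_charge_bit,
                          bool cap_sys_admin)
{
    assert(no_charge_bit != 0);

    if (flags & no_charge_bit) {
        /* Only sys admin may opt out of memcg charging (error code assumed). */
        if (!cap_sys_admin)
            return -EPERM;
        /* No accounting bit: the memory is not charged to the loader's memcg. */
        return MODEL_GFP_KERNEL;
    }
    /* Default behavior: charge the memory to the loader's memcg. */
    return MODEL_GFP_KERNEL | MODEL_GFP_ACCOUNT;
}

/* Map creation consults only the map no-charge flag. */
static long model_map_gfp(unsigned int map_flags, bool cap_sys_admin)
{
    return model_gfp_for(map_flags, MODEL_F_NO_CHARGE, cap_sys_admin);
}

/* Prog loading consults only the prog no-charge flag. */
static long model_prog_gfp(unsigned int prog_flags, bool cap_sys_admin)
{
    return model_gfp_for(prog_flags, MODEL_F_PROG_NO_CHARGE, cap_sys_admin);
}
```

For example, `model_map_gfp(MODEL_F_NO_CHARGE, true)` yields a mask without the accounting bit, while the same call with `cap_sys_admin == false` is rejected. Keeping separate map and prog flags mirrors the cover letter's choice of one flag per object type, so a map flag has no effect on prog loading and vice versa.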
Patches #1~#8 are for bpf maps, and patches #9~#12 are for bpf progs. Patches #13 and #14 add selftests, which also serve as examples of how to use the flags.
[1]. https://lwn.net/Articles/887180/
Yafang Shao (14):
  bpf: Introduce no charge flag for bpf map
  bpf: Only sys admin can set no charge flag
  bpf: Enable no charge in map _CREATE_FLAG_MASK
  bpf: Introduce new parameter bpf_attr in bpf_map_area_alloc
  bpf: Allow no charge in bpf_map_area_alloc
  bpf: Allow no charge for allocation not at map creation time
  bpf: Allow no charge in map specific allocation
  bpf: Aggregate flags for BPF_PROG_LOAD command
  bpf: Add no charge flag for bpf prog
  bpf: Only sys admin can set no charge flag for bpf prog
  bpf: Set __GFP_ACCOUNT at the callsite of bpf_prog_alloc
  bpf: Allow no charge for bpf prog
  bpf: selftests: Add test case for BPF_F_NO_CHARGE
  bpf: selftests: Add test case for BPF_F_PROG_NO_CHARGE
 include/linux/bpf.h                      | 27 ++++++-
 include/uapi/linux/bpf.h                 | 21 +++--
 kernel/bpf/arraymap.c                    |  9 +--
 kernel/bpf/bloom_filter.c                |  7 +-
 kernel/bpf/bpf_local_storage.c           |  8 +-
 kernel/bpf/bpf_struct_ops.c              | 13 +--
 kernel/bpf/core.c                        | 20 +++--
 kernel/bpf/cpumap.c                      | 10 ++-
 kernel/bpf/devmap.c                      | 14 ++--
 kernel/bpf/hashtab.c                     | 14 ++--
 kernel/bpf/local_storage.c               |  4 +-
 kernel/bpf/lpm_trie.c                    |  4 +-
 kernel/bpf/queue_stack_maps.c            |  5 +-
 kernel/bpf/reuseport_array.c             |  3 +-
 kernel/bpf/ringbuf.c                     | 19 ++---
 kernel/bpf/stackmap.c                    | 13 +--
 kernel/bpf/syscall.c                     | 40 +++++++---
 kernel/bpf/verifier.c                    |  2 +-
 net/core/filter.c                        |  6 +-
 net/core/sock_map.c                      |  8 +-
 net/xdp/xskmap.c                         |  9 ++-
 tools/include/uapi/linux/bpf.h           | 21 +++--
 .../selftests/bpf/map_tests/no_charg.c   | 79 +++++++++++++++++++
 .../selftests/bpf/prog_tests/no_charge.c | 49 ++++++++++++
 24 files changed, 297 insertions(+), 108 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/map_tests/no_charg.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/no_charge.c