Hello everyone,
We have prepared patches to address an issue raised in a previous discussion. The earlier email thread is here: https://lore.kernel.org/all/CAADnVQLBt0snxv4bKwg1WKQ9wDFbaDCtZ03v1-LjOTYtsKP...
This patch series adds a new field "used_entries" to struct bpf_map_info and keeps the "count" field in struct bpf_htab up to date in both the preallocated and non-preallocated cases.
bpftool is modified to report the newly added "used_entries" field from struct bpf_map_info and to mark pre-allocated htab maps with "*". Together these make it easier to see how full a hash map currently is and how much memory it actually occupies.
We have added a new callback, map_get_used_elem, to struct bpf_map_ops as an abstraction layer, so that other map types can support the "used_entries" attribute in future changes; a sketch of what such an implementation could look like follows below.
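To make the abstraction concrete, here is a rough sketch (not part of this series, only an illustration) of what a follow-up for the queue/stack map could look like. Field names are assumed from today's kernel/bpf/queue_stack_maps.c; the used count simply falls out of the ring indices, so no hot-path accounting would be needed:

static u32 queue_stack_map_get_used_elem(struct bpf_map *map)
{
	struct bpf_queue_stack *qs = container_of(map, struct bpf_queue_stack, map);
	u32 head = READ_ONCE(qs->head), tail = READ_ONCE(qs->tail);

	/* Number of elements currently sitting in the circular buffer. */
	return head >= tail ? head - tail : qs->size - tail + head;
}

The callback would then be wired up as ".map_get_used_elem = queue_stack_map_get_used_elem" in queue_map_ops and stack_map_ops.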
Concurrency tests for pre-allocated and dynamically allocated htab maps are introduced to exercise the correctness and overhead of the htab map's used-size accounting.
Existing unit tests are extended to verify the correctness of the htab map's used size.
Thank you,
Ho-Ren (Jack) Chuang (4):
  bpf: Support reporting BPF htab map's used size for monitoring
  bpftool: Add tools support to show BPF htab map's used size
  samples/bpf: Add concurrency testing for BPF htab map's used size
  selftests/bpf: Add unit tests for BPF htab map's used size
 include/linux/bpf.h                     |   1 +
 include/uapi/linux/bpf.h                |   1 +
 kernel/bpf/hashtab.c                    |  19 +++
 kernel/bpf/syscall.c                    |   2 +
 samples/bpf/Makefile                    |   4 +
 samples/bpf/test_map_used_kern.c        |  65 ++++++++
 samples/bpf/test_map_used_user.c        | 204 ++++++++++++++++++++++++
 tools/bpf/bpftool/map.c                 |   9 +-
 tools/include/uapi/linux/bpf.h          |   1 +
 tools/testing/selftests/bpf/test_maps.c |  74 ++++++++-
 10 files changed, 377 insertions(+), 3 deletions(-)
 create mode 100644 samples/bpf/test_map_used_kern.c
 create mode 100644 samples/bpf/test_map_used_user.c
Expose a BPF htab map's used size to userspace by counting elements as they are allocated and freed.
Leverage the htab->count value for both preallocated and dynamically allocated maps. Expose the value through a new "used_entries" field in the userspace-visible struct bpf_map_info to allow monitoring. Only the hash table type (BPF_MAP_TYPE_HASH) is supported for now.
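Usage example on the userspace side, a minimal sketch assuming this patch is applied so that struct bpf_map_info carries used_entries:

#include <stdio.h>
#include <bpf/bpf.h>

static int print_map_usage(int map_fd)
{
	struct bpf_map_info info = {};
	__u32 info_len = sizeof(info);

	if (bpf_obj_get_info_by_fd(map_fd, &info, &info_len))
		return -1;

	/* used_entries is filled in only for map types implementing the new
	 * map_get_used_elem callback (BPF_MAP_TYPE_HASH in this series).
	 */
	printf("%s: %u/%u entries in use\n",
	       info.name, info.used_entries, info.max_entries);
	return 0;
}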
Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>
---
 include/linux/bpf.h      |  1 +
 include/uapi/linux/bpf.h |  1 +
 kernel/bpf/hashtab.c     | 19 +++++++++++++++++++
 kernel/bpf/syscall.c     |  2 ++
 4 files changed, 23 insertions(+)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9e7d46d16032..82ee14139b69 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -97,6 +97,7 @@ struct bpf_map_ops {
 	int (*map_pop_elem)(struct bpf_map *map, void *value);
 	int (*map_peek_elem)(struct bpf_map *map, void *value);
 	void *(*map_lookup_percpu_elem)(struct bpf_map *map, void *key, u32 cpu);
+	u32 (*map_get_used_elem)(struct bpf_map *map);
 
 	/* funcs called by prog_array and perf_event_array map */
 	void *(*map_fd_get_ptr)(struct bpf_map *map, struct file *map_file,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 17f61338f8f8..63659368cf0e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6215,6 +6215,7 @@ struct bpf_map_info {
 	__u32 id;
 	__u32 key_size;
 	__u32 value_size;
+	__u32 used_entries;
 	__u32 max_entries;
 	__u32 map_flags;
 	char name[BPF_OBJ_NAME_LEN];
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index ed3f8a53603b..bc9c00b92e57 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -913,6 +913,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 	if (htab_is_prealloc(htab)) {
 		check_and_free_fields(htab, l);
 		__pcpu_freelist_push(&htab->freelist, &l->fnode);
+		dec_elem_count(htab);
 	} else {
 		dec_elem_count(htab);
 		htab_elem_free(htab, l);
@@ -994,6 +995,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 			if (!l)
 				return ERR_PTR(-E2BIG);
 			l_new = container_of(l, struct htab_elem, fnode);
+			inc_elem_count(htab);
 		}
 	} else {
 		if (is_map_full(htab))
@@ -2186,6 +2188,22 @@ static int bpf_for_each_hash_elem(struct bpf_map *map, bpf_callback_t callback_f
 	return num_elems;
 }
 
+u32 htab_map_get_used_elem(struct bpf_map *map)
+{
+	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+
+	/* The elem count may temporarily go beyond the max after
+	 * inc_elem_count() but before dec_elem_count().
+	 */
+	if (htab->use_percpu_counter)
+		return min_t(u32, htab->map.max_entries,
+			     percpu_counter_sum(&htab->pcount) +
+			     atomic_read(&htab->count));
+	else
+		return min_t(u32, htab->map.max_entries,
+			     atomic_read(&htab->count));
+}
+
 BTF_ID_LIST_SINGLE(htab_map_btf_ids, struct, bpf_htab)
 const struct bpf_map_ops htab_map_ops = {
 	.map_meta_equal = bpf_map_meta_equal,
@@ -2202,6 +2220,7 @@ const struct bpf_map_ops htab_map_ops = {
 	.map_seq_show_elem = htab_map_seq_show_elem,
 	.map_set_for_each_callback_args = map_set_for_each_callback_args,
 	.map_for_each_callback = bpf_for_each_hash_elem,
+	.map_get_used_elem = htab_map_get_used_elem,
 	BATCH_OPS(htab),
 	.map_btf_id = &htab_map_btf_ids[0],
 	.iter_seq_info = &iter_seq_info,
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7b373a5e861f..ea4828bb22ac 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4203,6 +4203,8 @@ static int bpf_map_get_info_by_fd(struct file *file,
 	info.map_flags = map->map_flags;
 	info.map_extra = map->map_extra;
 	memcpy(info.name, map->name, sizeof(map->name));
+	if (map->ops->map_get_used_elem)
+		info.used_entries = map->ops->map_get_used_elem(map);
 
 	if (map->btf) {
 		info.btf_id = btf_obj_id(map->btf);
Add bpftool support for reporting the number of used entries of htab maps by leveraging the newly added "used_entries" field in struct bpf_map_info. It works with JSON as well.
To give a better picture of how much memory a htab map actually occupies, pre-allocated maps are now marked with "*" after the "max_entries" value.
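For illustration, the plain output would then look roughly like the following (IDs, names, counts and memlock sizes are made up; the "*" marks a pre-allocated map):

  10: hash  name touch_hash_prea  flags 0x0
          key 4B value 8B max_entries 1000* used_entries 1000 memlock 81920B
  11: hash  name touch_hash_no_p  flags 0x1
          key 4B value 8B max_entries 1000 used_entries 250 memlock 81920B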
Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>
---
 tools/bpf/bpftool/map.c        | 9 +++++++--
 tools/include/uapi/linux/bpf.h | 1 +
 2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 9a6ca9f31133..0b07abae7309 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -475,6 +475,8 @@ static int show_map_close_json(int fd, struct bpf_map_info *info)
 	jsonw_uint_field(json_wtr, "bytes_key", info->key_size);
 	jsonw_uint_field(json_wtr, "bytes_value", info->value_size);
 	jsonw_uint_field(json_wtr, "max_entries", info->max_entries);
+	if (info->type == BPF_MAP_TYPE_HASH)
+		jsonw_uint_field(json_wtr, "used_entries", info->used_entries);
 
 	if (memlock)
 		jsonw_int_field(json_wtr, "bytes_memlock", atoll(memlock));
@@ -561,8 +563,11 @@ static int show_map_close_plain(int fd, struct bpf_map_info *info)
 	frozen_str = get_fdinfo(fd, "frozen");
 
 	show_map_header_plain(info);
-	printf("\tkey %uB value %uB max_entries %u",
-	       info->key_size, info->value_size, info->max_entries);
+	printf("\tkey %uB value %uB max_entries %u%1s",
+	       info->key_size, info->value_size, info->max_entries,
+	       !(info->map_flags & BPF_F_NO_PREALLOC) ? "*" : "");
+	if (info->type == BPF_MAP_TYPE_HASH)
+		printf(" used_entries %u", info->used_entries);
 
 	if (memlock)
 		printf(" memlock %sB", memlock);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 17f61338f8f8..63659368cf0e 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -6215,6 +6215,7 @@ struct bpf_map_info {
 	__u32 id;
 	__u32 key_size;
 	__u32 value_size;
+	__u32 used_entries;
 	__u32 max_entries;
 	__u32 map_flags;
 	char name[BPF_OBJ_NAME_LEN];
Add concurrency test cases for the htab map's used size.
Support hash table type (BPF_MAP_TYPE_HASH).
Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>
---
 samples/bpf/Makefile             |   4 +
 samples/bpf/test_map_used_kern.c |  65 ++++++++++
 samples/bpf/test_map_used_user.c | 204 +++++++++++++++++++++++++++++++
 3 files changed, 273 insertions(+)
 create mode 100644 samples/bpf/test_map_used_kern.c
 create mode 100644 samples/bpf/test_map_used_user.c
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 727da3c5879b..8725d0d64a21 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -40,6 +40,7 @@ tprogs-y += tc_l2_redirect
 tprogs-y += lwt_len_hist
 tprogs-y += xdp_tx_iptunnel
 tprogs-y += test_map_in_map
+tprogs-y += test_map_used
 tprogs-y += per_socket_stats_example
 tprogs-y += xdp_rxq_info
 tprogs-y += syscall_tp
@@ -101,6 +102,7 @@ tc_l2_redirect-objs := tc_l2_redirect_user.o
 lwt_len_hist-objs := lwt_len_hist_user.o
 xdp_tx_iptunnel-objs := xdp_tx_iptunnel_user.o
 test_map_in_map-objs := test_map_in_map_user.o
+test_map_used-objs := test_map_used_user.o
 per_socket_stats_example-objs := cookie_uid_helper_example.o
 xdp_rxq_info-objs := xdp_rxq_info_user.o
 syscall_tp-objs := syscall_tp_user.o
@@ -153,6 +155,7 @@ always-y += sampleip_kern.o
 always-y += lwt_len_hist_kern.o
 always-y += xdp_tx_iptunnel_kern.o
 always-y += test_map_in_map_kern.o
+always-y += test_map_used_kern.o
 always-y += tcp_synrto_kern.o
 always-y += tcp_rwnd_kern.o
 always-y += tcp_bufs_kern.o
@@ -216,6 +219,7 @@ TPROGLDLIBS_xdp_router_ipv4 += -lm -pthread
 TPROGLDLIBS_tracex4 += -lrt
 TPROGLDLIBS_trace_output += -lrt
 TPROGLDLIBS_map_perf_test += -lrt
+TPROGLDLIBS_test_map_used += -lrt
 TPROGLDLIBS_test_overhead += -lrt
# Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline: diff --git a/samples/bpf/test_map_used_kern.c b/samples/bpf/test_map_used_kern.c new file mode 100644 index 000000000000..e908593c1f09 --- /dev/null +++ b/samples/bpf/test_map_used_kern.c @@ -0,0 +1,65 @@ +/* Copyright (c) 2022 ByteDance + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + */ +#include <linux/netdevice.h> +#include <linux/version.h> +#include <uapi/linux/bpf.h> +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_tracing.h> +#include <bpf/bpf_core_read.h> +#include "trace_common.h" + +#define MAX_ENTRIES 1000 + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __type(key, u32); + __type(value, long); + __uint(max_entries, MAX_ENTRIES); + __uint(map_flags, BPF_F_NO_PREALLOC); +} touch_hash_no_prealloc SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __type(key, u32); + __type(value, long); + __uint(max_entries, MAX_ENTRIES); +} touch_hash_prealloc SEC(".maps"); + +SEC("kprobe/" SYSCALL(sys_mount)) +int stress_hmap_alloc(struct pt_regs *ctx) +{ + u32 key, i; + long init_val = bpf_get_current_pid_tgid(); + +#pragma clang loop unroll(full) + for (i = 0; i < MAX_ENTRIES; ++i) { + key = i; + bpf_map_update_elem(&touch_hash_no_prealloc, + &key, &init_val, BPF_ANY); + } + + return 0; +} + +SEC("kprobe/" SYSCALL(sys_umount)) +int stress_hmap_prealloc(struct pt_regs *ctx) +{ + u32 key, i; + long init_val = bpf_get_current_pid_tgid(); + +#pragma clang loop unroll(full) + for (i = 0; i < MAX_ENTRIES; ++i) { + key = i; + bpf_map_update_elem(&touch_hash_prealloc, + &key, &init_val, BPF_ANY); + } + + return 0; +} + +char _license[] SEC("license") = "GPL"; +u32 _version SEC("version") = LINUX_VERSION_CODE; diff --git a/samples/bpf/test_map_used_user.c b/samples/bpf/test_map_used_user.c new file mode 100644 index 000000000000..797f6ca7434d --- /dev/null +++ b/samples/bpf/test_map_used_user.c @@ -0,0 +1,204 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (c) 2022 ByteDance + */ +#define _GNU_SOURCE +#include <sched.h> +#include <stdio.h> +#include <sys/types.h> +#include <asm/unistd.h> +#include <unistd.h> +#include <assert.h> +#include <sys/wait.h> +#include <stdlib.h> +#include <signal.h> +#include <linux/bpf.h> +#include <string.h> +#include <time.h> +#include <sys/resource.h> +#include <arpa/inet.h> +#include <errno.h> + +#include <bpf/bpf.h> +#include <bpf/libbpf.h> + +#define TEST_BIT(t) (1U << (t)) +#define MAX_NR_CPUS 1024 + +static __u64 time_get_ns(void) +{ + struct timespec ts; + + clock_gettime(CLOCK_MONOTONIC, &ts); + return ts.tv_sec * 1000000000ull + ts.tv_nsec; +} + +enum test_type { + HASH_TOUCH_PREALLOC, + HASH_TOUCH, + NR_TESTS, +}; + +const char *test_map_names[NR_TESTS] = { + [HASH_TOUCH_PREALLOC] = "hash_map", + [HASH_TOUCH] = "hash_map", +}; + +static int test_flags = ~0; +static __u32 num_map_entries; +static __u32 inner_lru_hash_size; +static __u32 max_cnt = 1000; + +static int check_test_flags(enum test_type t) +{ + return test_flags & TEST_BIT(t); +} + +static void test_hash_touch_prealloc(int cpu) +{ + __u64 start_time; + int i; + + start_time = time_get_ns(); + for (i = 0; i < max_cnt; i++) + syscall(__NR_umount2, NULL, 0); + printf("%d:hash_touch pre-alloc %lld touches per sec\n", + cpu, max_cnt * 1000000000ll / (time_get_ns() - start_time)); +} + +static void test_hash_touch(int cpu) +{ + __u64 start_time; + int i; 
+ + start_time = time_get_ns(); + for (i = 0; i < max_cnt; i++) + syscall(__NR_mount, NULL, NULL, NULL, 0, NULL); + printf("%d:hash_touch %lld touchess per sec\n", + cpu, max_cnt * 1000000000ll * 64 / (time_get_ns() - start_time)); +} + +typedef void (*test_func)(int cpu); +const test_func test_funcs[] = { + [HASH_TOUCH_PREALLOC] = test_hash_touch_prealloc, + [HASH_TOUCH] = test_hash_touch, +}; + +static void loop(int cpu) +{ + cpu_set_t cpuset; + int i; + + CPU_ZERO(&cpuset); + CPU_SET(cpu, &cpuset); + sched_setaffinity(0, sizeof(cpuset), &cpuset); + + for (i = 0; i < NR_TESTS; i++) { + if (check_test_flags(i)) + test_funcs[i](cpu); + } +} + +static void run_perf_test(int tasks) +{ + pid_t pid[tasks]; + int i; + + for (i = 0; i < tasks; i++) { + pid[i] = fork(); + if (pid[i]) + printf("Spawn process #%d [%u]\n", i, pid[i]); + + if (pid[i] == 0) { + loop(i); + exit(0); + } else if (pid[i] == -1) { + printf("couldn't spawn #%d process\n", i); + exit(1); + } + } + for (i = 0; i < tasks; i++) { + int status; + + assert(waitpid(pid[i], &status, 0) == pid[i]); + assert(status == 0); + } +} + +static void fixup_map(struct bpf_object *obj) +{ + struct bpf_map *map; + int i; + + bpf_object__for_each_map(map, obj) { + const char *name = bpf_map__name(map); + + /* Only change the max_entries for the enabled test(s) */ + for (i = 0; i < NR_TESTS; i++) { + if (!strcmp(test_map_names[i], name) && + (check_test_flags(i))) { + bpf_map__set_max_entries(map, num_map_entries); + continue; + } + } + } + + inner_lru_hash_size = num_map_entries; +} + +int main(int argc, char **argv) +{ + int nr_cpus = sysconf(_SC_NPROCESSORS_ONLN); + struct bpf_link *links[8]; + struct bpf_program *prog; + struct bpf_object *obj; + char filename[256]; + int i = 0; + + if (argc > 1) + test_flags = atoi(argv[1]) ? : test_flags; + + if (argc > 2) + nr_cpus = atoi(argv[2]) ? : nr_cpus; + + if (argc > 3) + num_map_entries = atoi(argv[3]); + + if (argc > 4) + max_cnt = atoi(argv[4]); + + snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]); + obj = bpf_object__open_file(filename, NULL); + if (libbpf_get_error(obj)) { + fprintf(stderr, "ERROR: opening BPF object file failed\n"); + return 0; + } + + /* resize BPF map prior to loading */ + if (num_map_entries > 0) + fixup_map(obj); + + /* load BPF program */ + if (bpf_object__load(obj)) { + fprintf(stderr, "ERROR: loading BPF object file failed\n"); + goto cleanup; + } + + bpf_object__for_each_program(prog, obj) { + links[i] = bpf_program__attach(prog); + if (libbpf_get_error(links[i])) { + fprintf(stderr, "ERROR: bpf_program__attach failed\n"); + links[i] = NULL; + goto cleanup; + } + i++; + } + + run_perf_test(nr_cpus); + +cleanup: + for (i--; i >= 0; i--) + bpf_link__destroy(links[i]); + + bpf_object__close(obj); + return 0; +}
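For reference, test_map_used_user.c takes up to four optional positional arguments in this order: test_flags, nr_cpus, num_map_entries and max_cnt, and it triggers the kprobes above by issuing mount(2)/umount2(2) syscalls from each forked worker. A run might look roughly like this (invocations shown are illustrative only):

  # ./test_map_used                  # all tests, all online CPUs, default 1000 iterations
  # ./test_map_used 3 4 1000 1000

While it runs, the used_entries reported by "bpftool map show" (or BPF_OBJ_GET_INFO_BY_FD) for touch_hash_prealloc and touch_hash_no_prealloc can be compared against MAX_ENTRIES to check the accounting under concurrency.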
Integrate with existing unit tests, namely the basic htab tests, the read-only/write-only htab map tests, and the concurrency tests, to verify the correctness of the htab map's used_entries.
Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>
---
 tools/testing/selftests/bpf/test_maps.c | 74 ++++++++++++++++++++++++-
 1 file changed, 73 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index b73152822aa2..3bd202d27563 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -38,6 +38,8 @@ static void test_hashmap(unsigned int task, void *data)
 {
 	long long key, next_key, first_key, value;
 	int fd;
+	struct bpf_map_info map_info = {};
+	__u32 info_len = sizeof(map_info);
 
 	fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value), 2, &map_opts);
 	if (fd < 0) {
@@ -50,16 +52,32 @@ static void test_hashmap(unsigned int task, void *data)
 	/* Insert key=1 element. */
 	assert(bpf_map_update_elem(fd, &key, &value, BPF_ANY) == 0);
 
+	/* Check used_entries is now 1. */
+	assert(bpf_obj_get_info_by_fd(fd, &map_info, &info_len) == 0);
+	assert(map_info.used_entries == 1);
+
 	value = 0;
 	/* BPF_NOEXIST means add new element if it doesn't exist. */
 	assert(bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST) < 0 &&
 	       /* key=1 already exists. */
 	       errno == EEXIST);
 
+	/* Check used_entries is still 1 because we are updating
+	 * an existing element.
+	 */
+	assert(bpf_obj_get_info_by_fd(fd, &map_info, &info_len) == 0);
+	assert(map_info.used_entries == 1);
+
 	/* -1 is an invalid flag. */
 	assert(bpf_map_update_elem(fd, &key, &value, -1) < 0 &&
 	       errno == EINVAL);
 
+	/* Check used_entries is still 1 because the last
+	 * insertion was invalid.
+	 */
+	assert(bpf_obj_get_info_by_fd(fd, &map_info, &info_len) == 0);
+	assert(map_info.used_entries == 1);
+
 	/* Check that key=1 can be found. */
 	assert(bpf_map_lookup_elem(fd, &key, &value) == 0 && value == 1234);
@@ -68,6 +86,10 @@ static void test_hashmap(unsigned int task, void *data)
 	/* Insert key=2 element. */
 	assert(bpf_map_update_elem(fd, &key, &value, BPF_ANY) == 0);
 
+	/* Check used_entries is now 2. */
+	assert(bpf_obj_get_info_by_fd(fd, &map_info, &info_len) == 0);
+	assert(map_info.used_entries == 2);
+
 	/* Check that key=2 matches the value and delete it */
 	assert(bpf_map_lookup_and_delete_elem(fd, &key, &value) == 0 && value == 1234);
@@ -89,6 +111,10 @@ static void test_hashmap(unsigned int task, void *data)
 	assert(bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST) < 0 &&
 	       errno == E2BIG);
 
+	/* Check used_entries is now 2. */
+	assert(bpf_obj_get_info_by_fd(fd, &map_info, &info_len) == 0);
+	assert(map_info.used_entries == 2);
+
 	/* Update existing element, though the map is full. */
 	key = 1;
 	assert(bpf_map_update_elem(fd, &key, &value, BPF_EXIST) == 0);
@@ -102,6 +128,10 @@ static void test_hashmap(unsigned int task, void *data)
 	key = 0;
 	assert(bpf_map_delete_elem(fd, &key) < 0 && errno == ENOENT);
 
+	/* Check used_entries is now 2. */
+	assert(bpf_obj_get_info_by_fd(fd, &map_info, &info_len) == 0);
+	assert(map_info.used_entries == 2);
+
 	/* Iterate over two elements. */
 	assert(bpf_map_get_next_key(fd, NULL, &first_key) == 0 &&
 	       (first_key == 1 || first_key == 2));
@@ -127,6 +157,10 @@ static void test_hashmap(unsigned int task, void *data)
 	assert(bpf_map_get_next_key(fd, &key, &next_key) < 0 &&
 	       errno == ENOENT);
 
+	/* Check used_entries is now 0 because both elements were deleted. */
+	assert(bpf_obj_get_info_by_fd(fd, &map_info, &info_len) == 0);
+	assert(map_info.used_entries == 0);
+
 	close(fd);
 }
@@ -292,6 +326,8 @@ static void test_hashmap_walk(unsigned int task, void *data)
 	int fd, i, max_entries = 10000;
 	long long key, value[VALUE_SIZE], next_key;
 	bool next_key_valid = true;
+	struct bpf_map_info map_info = {};
+	__u32 info_len = sizeof(map_info);
 
 	fd = helper_fill_hashmap(max_entries);
@@ -302,6 +338,9 @@ static void test_hashmap_walk(unsigned int task, void *data)
 	}
 
 	assert(i == max_entries);
+	/* Check used_entries is now max_entries. */
+	assert(bpf_obj_get_info_by_fd(fd, &map_info, &info_len) == 0);
+	assert(map_info.used_entries == max_entries);
 
 	assert(bpf_map_get_next_key(fd, NULL, &key) == 0);
 	for (i = 0; next_key_valid; i++) {
@@ -313,6 +352,9 @@ static void test_hashmap_walk(unsigned int task, void *data)
 	}
 
 	assert(i == max_entries);
+	/* Check used_entries is now max_entries. */
+	assert(bpf_obj_get_info_by_fd(fd, &map_info, &info_len) == 0);
+	assert(map_info.used_entries == max_entries);
 
 	for (i = 0; bpf_map_get_next_key(fd, !i ? NULL : &key, &next_key) == 0; i++) {
@@ -322,6 +364,9 @@ static void test_hashmap_walk(unsigned int task, void *data)
 	}
 
 	assert(i == max_entries);
+	/* Check used_entries is now max_entries. */
+	assert(bpf_obj_get_info_by_fd(fd, &map_info, &info_len) == 0);
+	assert(map_info.used_entries == max_entries);
 
 	close(fd);
 }
@@ -1303,13 +1348,14 @@ static void test_map_in_map(void)
 
 static void test_map_large(void)
 {
-
 	struct bigkey {
 		int a;
 		char b[4096];
 		long long c;
 	} key;
 	int fd, i, value;
+	struct bpf_map_info map_info = {};
+	__u32 info_len = sizeof(map_info);
 
 	fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value),
 			    MAP_SIZE, &map_opts);
@@ -1341,6 +1387,10 @@ static void test_map_large(void)
 	key.a = 1;
 	assert(bpf_map_lookup_elem(fd, &key, &value) < 0 && errno == ENOENT);
 
+	/* Check used_entries is now MAP_SIZE. */
+	assert(bpf_obj_get_info_by_fd(fd, &map_info, &info_len) == 0);
+	assert(map_info.used_entries == MAP_SIZE);
+
 	close(fd);
 }
@@ -1466,6 +1516,8 @@ static void test_map_parallel(void)
 {
 	int i, fd, key = 0, value = 0, j = 0;
 	int data[2];
+	struct bpf_map_info map_info = {};
+	__u32 info_len = sizeof(map_info);
 
 	fd = bpf_map_create(BPF_MAP_TYPE_HASH, NULL, sizeof(key), sizeof(value),
 			    MAP_SIZE, &map_opts);
@@ -1504,6 +1556,10 @@ static void test_map_parallel(void)
 		       value == key);
 	}
 
+	/* Check used_entries is now MAP_SIZE. */
+	assert(bpf_obj_get_info_by_fd(fd, &map_info, &info_len) == 0);
+	assert(map_info.used_entries == MAP_SIZE);
+
 	/* Now let's delete all elemenets in parallel. */
 	data[1] = DO_DELETE;
 	run_parallel(TASKS, test_update_delete, data);
@@ -1513,6 +1569,10 @@ static void test_map_parallel(void)
 	assert(bpf_map_get_next_key(fd, NULL, &key) < 0 && errno == ENOENT);
 	assert(bpf_map_get_next_key(fd, &key, &key) < 0 && errno == ENOENT);
 
+	/* Check used_entries is now 0. */
+	assert(bpf_obj_get_info_by_fd(fd, &map_info, &info_len) == 0);
+	assert(map_info.used_entries == 0);
+
 	key = 0;
 	bpf_map_delete_elem(fd, &key);
 	if (j++ < 5)
@@ -1524,6 +1584,8 @@ static void test_map_rdonly(void)
 {
 	int fd, key = 0, value = 0;
 	__u32 old_flags;
+	struct bpf_map_info map_info = {};
+	__u32 info_len = sizeof(map_info);
 
 	old_flags = map_opts.map_flags;
 	map_opts.map_flags |= BPF_F_RDONLY;
@@ -1546,6 +1608,10 @@ static void test_map_rdonly(void)
 	assert(bpf_map_lookup_elem(fd, &key, &value) < 0 && errno == ENOENT);
 	assert(bpf_map_get_next_key(fd, &key, &value) < 0 && errno == ENOENT);
 
+	/* Check used_entries is now 0. */
+	assert(bpf_obj_get_info_by_fd(fd, &map_info, &info_len) == 0);
+	assert(map_info.used_entries == 0);
+
 	close(fd);
 }
@@ -1553,6 +1619,8 @@ static void test_map_wronly_hash(void)
 {
 	int fd, key = 0, value = 0;
 	__u32 old_flags;
+	struct bpf_map_info map_info = {};
+	__u32 info_len = sizeof(map_info);
 
 	old_flags = map_opts.map_flags;
 	map_opts.map_flags |= BPF_F_WRONLY;
@@ -1574,6 +1642,10 @@ static void test_map_wronly_hash(void)
 	assert(bpf_map_lookup_elem(fd, &key, &value) < 0 && errno == EPERM);
 	assert(bpf_map_get_next_key(fd, &key, &value) < 0 && errno == EPERM);
 
+	/* Check used_entries is now 1. */
+	assert(bpf_obj_get_info_by_fd(fd, &map_info, &info_len) == 0);
+	assert(map_info.used_entries == 1);
+
 	close(fd);
 }
On Fri, Nov 4, 2022 at 7:52 PM Ho-Ren (Jack) Chuang <horenchuang@bytedance.com> wrote:
> Hello everyone,
>
> We have prepared patches to address an issue from a previous discussion. The previous discussion email thread is here: https://lore.kernel.org/all/CAADnVQLBt0snxv4bKwg1WKQ9wDFbaDCtZ03v1-LjOTYtsKP...
Rephrasing what was said earlier. We're not keeping the count of elements in a preallocated hash map and we are not going to add one. The bpf prog needs to do the accounting on its own if it needs this kind of statistics. Keeping the count for non-prealloc is already significant performance overhead. We don't trade performance for stats.
Hi Alexei,
We understand the concern on added performance overhead. We had some discussion about this while working on the patch and decided to give it a try (my bad).
Adding some more context. We are leveraging the BPF_OBJ_GET_INFO_BY_FD syscall to trace CPU usage per prog and memory usage per map. We would like to use this patch to add an interface for map types to return their internal "count". For instance, we are thinking of having the map types below report a "count", none of which would add overhead to the hot path:
1. ringbuf returns its "count" by calculating the distance between producer_pos and consumer_pos
2. queue and stack return their "count" from the head's position
3. dev map hash returns its "count" from items
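To make item 1 concrete, here is a rough sketch of the ringbuf case (assuming the internals in kernel/bpf/ringbuf.c keep their current field names; note the value is the number of unconsumed bytes rather than a record count):

static u32 ringbuf_map_get_used_elem(struct bpf_map *map)
{
	struct bpf_ringbuf_map *rb_map = container_of(map, struct bpf_ringbuf_map, map);
	unsigned long cons = smp_load_acquire(&rb_map->rb->consumer_pos);
	unsigned long prod = smp_load_acquire(&rb_map->rb->producer_pos);

	/* Distance between producer and consumer; nothing is added to the hot path. */
	return prod - cons;
}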
There are other map types that, like the hashtab pre-allocation case, would have to add overhead to the hot path in order to keep such a count. I think we can find alternative solutions for those (e.g. iterate the map and count on demand, or count only when the bpf_stats_enabled switch is on). There are cases where this can't be done at the application level, because applications don't see the map's internal state and so can't do the counting themselves.
We can remove the counting for the pre-allocated case in this patch. Please let us know what you think.
Thanks, Hao
On Sat, Nov 5, 2022 at 9:20 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> On Fri, Nov 4, 2022 at 7:52 PM Ho-Ren (Jack) Chuang <horenchuang@bytedance.com> wrote:
> > Hello everyone,
> >
> > We have prepared patches to address an issue from a previous discussion. The previous discussion email thread is here: https://lore.kernel.org/all/CAADnVQLBt0snxv4bKwg1WKQ9wDFbaDCtZ03v1-LjOTYtsKP...
>
> Rephrasing what was said earlier. We're not keeping the count of elements in a preallocated hash map and we are not going to add one. The bpf prog needs to do the accounting on its own if it needs this kind of statistics. Keeping the count for non-prealloc is already significant performance overhead. We don't trade performance for stats.
Hi Alexei, we can put the added counting behind the existing bpf_stats_enabled switch. The switch is off by default, so I believe there would be no extra overhead on the hot path unless stats are explicitly enabled. Could you please reconsider with that in mind?
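For what it's worth, a rough sketch of what that gating could look like in hashtab.c, reusing the existing bpf_stats_enabled_key static key (only an idea, not tested):

#include <linux/filter.h>	/* bpf_stats_enabled_key */

static void htab_count_inc(struct bpf_htab *htab)
{
	/* Patched out by the static key, so the preallocated fast path
	 * costs nothing unless BPF_ENABLE_STATS/the sysctl is turned on.
	 */
	if (static_branch_unlikely(&bpf_stats_enabled_key))
		inc_elem_count(htab);
}

One caveat is that counts accumulated while the switch was off would be incomplete, so the reported used_entries would only be meaningful for maps whose whole lifetime was spent with stats enabled.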
On Mon, Nov 7, 2022 at 4:30 PM Hao Xiang <hao.xiang@bytedance.com> wrote:
> Hi Alexei,
>
> We understand the concern on added performance overhead. We had some discussion about this while working on the patch and decided to give it a try (my bad).
>
> Adding some more context. We are leveraging the BPF_OBJ_GET_INFO_BY_FD syscall to trace CPU usage per prog and memory usage per map. We would like to use this patch to add an interface for map types to return their internal "count". For instance, we are thinking of having the map types below report a "count", none of which would add overhead to the hot path:
> 1. ringbuf returns its "count" by calculating the distance between producer_pos and consumer_pos
> 2. queue and stack return their "count" from the head's position
> 3. dev map hash returns its "count" from items
>
> There are other map types that, like the hashtab pre-allocation case, would have to add overhead to the hot path in order to keep such a count. I think we can find alternative solutions for those (e.g. iterate the map and count on demand, or count only when the bpf_stats_enabled switch is on). There are cases where this can't be done at the application level, because applications don't see the map's internal state and so can't do the counting themselves.
>
> We can remove the counting for the pre-allocated case in this patch. Please let us know what you think.
>
> Thanks,
> Hao
>
> On Sat, Nov 5, 2022 at 9:20 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > On Fri, Nov 4, 2022 at 7:52 PM Ho-Ren (Jack) Chuang <horenchuang@bytedance.com> wrote:
> > > Hello everyone,
> > >
> > > We have prepared patches to address an issue from a previous discussion. The previous discussion email thread is here: https://lore.kernel.org/all/CAADnVQLBt0snxv4bKwg1WKQ9wDFbaDCtZ03v1-LjOTYtsKP...
> >
> > Rephrasing what was said earlier. We're not keeping the count of elements in a preallocated hash map and we are not going to add one. The bpf prog needs to do the accounting on its own if it needs this kind of statistics. Keeping the count for non-prealloc is already significant performance overhead. We don't trade performance for stats.