On Tue, Apr 6, 2021 at 11:55 AM Pedro Tammela pctammela@gmail.com wrote:
This macro was refactored out of the bpf selftests.
Since percpu values are rounded up to '8' in the kernel, a careless user in userspace might encounter unexpected values when parsing the output of the batched operations.
I wonder if a user has to be more careful, though? This BPF_PERCPU_TYPE, __bpf_percpu_align and bpf_percpu macros seem to create just another opaque layer. It actually seems detrimental to me.
I'd rather emphasize in the documentation (e.g., in bpf_map_lookup_elem) that all per-cpu maps are aligning values at 8 bytes, so user has to make sure that array of values provided to bpf_map_lookup_elem() has each element size rounded up to 8.
In practice, I'd recommend users to always use __u64/__s64 when having primitive integers in a map (they are not saving anything by using int, it just creates an illusion of savings). Well, maybe on 32-bit arches they would save a bit of CPU, but not on typical 64-bit architectures. As for using structs as values, always mark them as __attribute__((aligned(8))).
Basically, instead of obscuring the real use some more, let's clarify and maybe even provide some examples in documentation?
Now that both array and hash maps have support for batched ops in the percpu variant, let's provide a convenient macro to declare percpu map value types.
Updates the tests to a "reference" usage of the new macro.
Signed-off-by: Pedro Tammela pctammela@mojatatu.com
tools/lib/bpf/bpf.h | 10 ++++ tools/testing/selftests/bpf/bpf_util.h | 7 --- .../bpf/map_tests/htab_map_batch_ops.c | 48 ++++++++++--------- .../selftests/bpf/prog_tests/map_init.c | 5 +- tools/testing/selftests/bpf/test_maps.c | 16 ++++--- 5 files changed, 46 insertions(+), 40 deletions(-)
[...]
@@ -400,11 +402,11 @@ static void test_arraymap(unsigned int task, void *data) static void test_arraymap_percpu(unsigned int task, void *data) { unsigned int nr_cpus = bpf_num_possible_cpus();
BPF_DECLARE_PERCPU(long, values);
pcpu_map_value_t values[nr_cpus]; int key, next_key, fd, i; fd = bpf_create_map(BPF_MAP_TYPE_PERCPU_ARRAY, sizeof(key),
sizeof(bpf_percpu(values, 0)), 2, 0);
sizeof(long), 2, 0); if (fd < 0) { printf("Failed to create arraymap '%s'!\n", strerror(errno)); exit(1);
@@ -459,7 +461,7 @@ static void test_arraymap_percpu(unsigned int task, void *data) static void test_arraymap_percpu_many_keys(void) { unsigned int nr_cpus = bpf_num_possible_cpus();
This just sets a bad example for anyone using selftests as an aspiration for their own code. bpf_num_possible_cpus() does exit(1) internally if libbpf_num_possible_cpus() returns error. No one should write real production code like that. So maybe let's provide a better example instead with error handling and malloc (or perhaps alloca)?
BPF_DECLARE_PERCPU(long, values);
pcpu_map_value_t values[nr_cpus]; /* nr_keys is not too large otherwise the test stresses percpu * allocator more than anything else */
@@ -467,7 +469,7 @@ static void test_arraymap_percpu_many_keys(void) int key, fd, i;
fd = bpf_create_map(BPF_MAP_TYPE_PERCPU_ARRAY, sizeof(key),
sizeof(bpf_percpu(values, 0)), nr_keys, 0);
sizeof(long), nr_keys, 0); if (fd < 0) { printf("Failed to create per-cpu arraymap '%s'!\n", strerror(errno));
-- 2.25.1