On Sat, May 02, 2020 at 01:48:51AM +0200, Daniel Borkmann wrote:
On 4/27/20 11:45 AM, Lorenz Bauer wrote:
On Sun, 26 Apr 2020 at 18:33, Alexei Starovoitov alexei.starovoitov@gmail.com wrote:
[...]
+/* Linux packet pointers are either aligned to NET_IP_ALIGN (aka 2 bytes),
- or not aligned if the arch supports efficient unaligned access.
- Since the verifier ensures that eBPF packet accesses follow these rules,
- we can tell LLVM to emit code as if we always had a larger alignment.
- It will yell at us if we end up on a platform where this is not valid.
- */
+typedef uint8_t *net_ptr __attribute__((align_value(8)));
Wow. I didn't know about this attribute. I wonder whether it can help Daniel's memcpy hack.
Yes, I think so.
Just for some more context [0]. I think the problem is a bit more complex in general. Generally, _any_ kind of pointer to some data (except for the stack) is currently treated as byte-by-byte copy from __builtin_memcpy() and other similarly available __builtin_*() helpers on BPF backend since the backend cannot make any assumptions about the data's alignment and whether unaligned access from the underlying arch is ok & efficient (the latter the verifier does judge for us however). So it's definitely not just limited to xdp->data. There is also the issue that while access to any non-stack data can be unaligned, access to the stack however cannot. I've discussed a while back with Yonghong about potential solutions. One would be to add a small patch to the BPF backend to enable __builtin_*() helpers to allow for unaligned access which could then be opt-ed in e.g. via -mattr from llc for the case when we know that the compiled program only runs on archs with efficient unaligned access anyway. However, this still potentially breaks with the BPF stack for the case when objects are, for example, larger than size 8 but with a natural alignment smaller than 8 where __builtin_memcpy() would then decide to emit dw-typed load/stores. But for these cases could then be annotated via __aligned(8) on stack. So this is basically what we do right now as a generic workaround in Cilium [0], meaning, our own memcpy/memset with optimal number of instructions and __aligned(8) where needed; most of the time this __aligned(8) is not needed, so it's really just a few places, and we also have a cocci scripts to catch these during development if needed. Anyway, real thing would be to allow the BPF stack for unaligned access as well and then BPF backend could nicely solve this in a native way w/o any workarounds, but that is tbd.
Thanks, Daniel
[0] https://github.com/cilium/cilium/blob/master/bpf/include/bpf/builtins.h
Daniel, do you mind adding such memcpy to libbpf ?