Re: [PATCH bpf-next v2 0/8] Support defragmenting IPv(4|6) packets in BPF

28 Feb 2023

      Hi Alexei,
On Mon, Feb 27, 2023 at 08:56:38PM -0800, Alexei Starovoitov wrote:
...
On Mon, Feb 27, 2023 at 5:57 PM Daniel Xu dxu@dxuuu.xyz wrote:
...
Hi Alexei,
On Mon, Feb 27, 2023 at 03:03:38PM -0800, Alexei Starovoitov wrote:
...
On Mon, Feb 27, 2023 at 12:51:02PM -0700, Daniel Xu wrote:
...
=== Context ===
In the context of a middlebox, fragmented packets are tricky to handle.
The full 5-tuple of a packet is often only available in the first
fragment which makes enforcing consistent policy difficult. There are
really only two stateless options, neither of which are very nice:

Enforce policy on first fragment and accept all subsequent fragments.
This works but may let in certain attacks or allow data exfiltration.

Enforce policy on first fragment and drop all subsequent fragments.
This does not really work b/c some protocols may rely on
fragmentation. For example, DNS may rely on oversized UDP packets for
large responses.

So stateful tracking is the only sane option. RFC 8900 [0] calls this
out as well in section 6.3:
Middleboxes [...] should process IP fragments in a manner that is
consistent with [RFC0791] and [RFC8200]. In many cases, middleboxes
must maintain state in order to achieve this goal.

=== BPF related bits ===
However, when policy is enforced through BPF, the prog is run before the
kernel reassembles fragmented packets. This leaves BPF developers in a
awkward place: implement reassembly (possibly poorly) or use a stateless
method as described above.
Fortunately, the kernel has robust support for fragmented IP packets.
This patchset wraps the existing defragmentation facilities in kfuncs so
that BPF progs running on middleboxes can reassemble fragmented packets
before applying policy.
=== Patchset details ===
This patchset is (hopefully) relatively straightforward from BPF perspective.
One thing I'd like to call out is the skb_copy()ing of the prog skb. I
did this to maintain the invariant that the ctx remains valid after prog
has run. This is relevant b/c ip_defrag() and ip_check_defrag() may
consume the skb if the skb is a fragment.
Instead of doing all that with extra skb copy can you hook bpf prog after
the networking stack already handled ip defrag?
What kind of middle box are you doing? Why does it have to run at TC layer?
Unless I'm missing something, the only other relevant hooks would be
socket hooks, right?
Unfortunately I don't think my use case can do that. We are running the
kernel as a router, so no sockets are involved.
Are you using bpf_fib_lookup and populating kernel routing
table and doing everything on your own including neigh ?
We're currently not doing any routing things in BPF yet. All the routing
manipulation has been done in iptables / netfilter so far. I'm not super
familiar with routing stuff but from what I understand there is some
relatively complicated stuff going on with BGP and ipsec tunnels at the
moment. Not sure if that answers your question.
...
Have you considered to skb redirect to another netdev that does ip defrag?
Like macvlan does it under some conditions. This can be generalized.
I had not considered that yet. Are you suggesting adding a new
passthrough netdev thing that'll defrags? I looked at the macvlan driver
and it looks like it defrags to handle some multicast corner case.
...
Recently Florian proposed to allow calling bpf progs from all existing
netfilter hooks.
You can pretend to local deliver and hook in NF_INET_LOCAL_IN ?
Does that work for forwarding cases? I'm reading through [0] and it
seems to suggest that it'll only defrag for locally destined packets:
If the destination IP address is matches with
    local NIC's IP address, the dst_input() function will brings the packets
    into the ip_local_deliver(), which will defrag the packet and pass it
    to the NF_IP_LOCAL_IN hook
Faking local delivery seems kinda ugly -- maybe I don't know any clean
ways.
[...]
[0]: https://kernelnewbies.org/Networking?action=AttachFile&do=get&target...
Thanks,
Daniel

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH bpf-next v2 0/8] Support defragmenting IPv(4|6) packets in BPF