Re: [PATCH bpf-next v2 0/8] Support defragmenting IPv(4|6) packets in BPF

1 Mar 2023


      Hi Ed,
Had some trouble with email yesterday (forgot to renew domain
registration) and this reply might not have made it out. Apologies
if it's a repost.
On Mon, Feb 27, 2023 at 10:58:47PM +0000, Edward Cree wrote:
...
On 27/02/2023 22:04, Daniel Xu wrote:
...
I don't believe full L4 headers are required in the first fragment.
Sufficiently sneaky attackers can, I think, send a byte at a time to
subvert your proposed algorithm. Storing skb data seems inevitable here.
Someone can correct me if I'm wrong here.
My thinking was that legitimate traffic would never do this and thus if
 your first fragment doesn't have enough data to make a determination
 then you just DROP the packet.
Right, that would be practical. I had some discussion with coworkers and
the other option on the table is to drop all fragments. At least for us
in the cloud, fragments are heavily frowned upon (where are they not..)
anyways.
...
...
What I find valuable about this patch series is that we can
leverage the well understood and battle hardened kernel facilities. So
avoid all the correctness and security issues that the kernel has spent
20+ years fixing.
I can certainly see the argument here.  I guess it's a question of are
 you more worried about the DoS from tricking the validator into thinking
 good fragments are bad (the reverse is irrelevant because if you can
 trick a validator into thinking your bad fragment belongs to a previously
 seen good packet, then you can equally trick a reassembler into stitching
 your bad fragment into that packet), or are you more worried about the
 DoS from tying lots of memory down in the reassembly cache.
Equal balance of concerns on my side. Ideally there are no dropping of
valid packets and DoS is very hard to achieve.
...
Even with reordering handling, a data structure to record which ranges of
 a packet have been seen takes much less memory than storing the complete
 fragment bodies.  (Just a simple bitmap of 8-byte blocks — the resolution
 of iph->frag_off — reduces size by a factor of 64, not counting all the
 overhead of a struct sk_buff for each fragment in the queue.  Or you
 could re-use the rbtree-based code from the reassembler, just with a
 freshly allocated node containing only offset & length, instead of the
 whole SKB.)
Yeah, now that you say that, it doesn't sound too bad on space side. But
I do wonder -- how much code and complexity is that going to be? For
example I think ipv6 frags have a 60s reassembly timeout which adds more
stuff to consider. And probably even more I've already forgotten.
B/c at least on the kernel side, this series is 80% code for tests. And
the kfunc wrappers are not very invasive at all.  Plus it's wrapping
infra that hasn't changed much for decades.
...
And having a BPF helper effectively consume the skb is awkward, as you
 noted; someone is likely to decide that skb_copy() is too slow, try to
 add ctx invalidation, and thereby create a whole new swathe of potential
 correctness and security issues.
Yep. I did try that. While the verifier bits weren't too tricky, there
are a lot of infra concerns to solve:
* https://github.com/danobi/linux/commit/35a66af8d54cca647b0adfc7c1da7105d2603...
* https://github.com/danobi/linux/commit/e8c86ea75e2ca8f0631632d54ef7633813087...
* https://github.com/danobi/linux/commit/972bcf769f41fbfa7f84ce00faf06b5b57bc6...
But FWIW, fragmented packets are kinda a corner case anyways. I don't
think it would be resonable to expect high perf when packets are in
play.
...
Plus, imagine trying to support this in a hardware-offload XDP device.
 They'd have to reimplement the entire frag cache, which is a much bigger
 attack surface than just a frag validator, and they couldn't leverage
 the battle-hardened kernel implementation.
Hmm, well this helper is restricted to TC progs for now. I don't quite
see a path to enabling for XDP as there would have to be at a minimum
quite a few allocations to handle frags. So not sure XDP is a factor at
the moment.
...
...
And make it trivial for the next person that comes
along to do the right thing.
Fwiw the validator approach could *also* be a helper, it doesn't have to
 be something the BPF developer writes for themselves.
But if after thinking about the possibility you still prefer your way, I
 won't try to stop you — I just wanted to ensure it had been considered.
Thank you for the discussion. The thought had come to mind originally,
but I shied away after seeing some of the reassembly details. Would be
interested in hearing more from other folks.
Thanks,
Daniel

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH bpf-next v2 0/8] Support defragmenting IPv(4|6) packets in BPF