On Tue, Jan 21, 2020 at 12:36 PM Matt Cover werekraken@gmail.com wrote:
On Tue, Jan 21, 2020 at 1:20 PM Matthew Cover werekraken@gmail.com wrote:
Allow looking up an nf_conn. This allows eBPF programs to leverage nf_conntrack state for similar purposes to socket state use cases, as provided by the socket lookup helpers. This is particularly useful when nf_conntrack state is locally available, but socket state is not.
I think there's an important distinction between accessing sockets and accessing the connection tracker: Sockets are inherently tied to local processes. They consume resources regardless of what kind of fancy networking behaviour you desire out of the stack. Connection-tracking on the other hand only consumes resources if you enable features that explicitly require that functionality. This raises some interesting questions.
The kernel disables nf_conntrack by default to alleviate the costs associated with it[0]. In the case of this proposal, the BPF program itself is trying to use nf_conntrack, so does that mean that the kernel should auto-enable nf_conntrack hooks for the current namespace (or all namespaces, given that the helper provides access into other namespaces as well) whenever a BPF program is loaded that uses this helper?
Related side note: What if you wanted to migitate the performance penalty of turning on nf_conntrack by programmatically choosing whether to populate the ct table? Do we then need to define an interface that allows a BPF program to tell nf_conntrack whether or not to track a given connection?
More importantly, nf_conntrack has a particular view in mind of what a connection is and the metadata that can be associated with a connection. On the other hand, one of the big pulls for building networking functionality in BPF is to allow flexibility. Over time, more complex use cases will arise that demand additional metadata to be stored with their connections. Cilium's connection tracking entries provides a glimpse of this[1]. I'm sure that the OVS-BPF project would have similar demands. Longer term, do we encourage such projects to migrate to this implementation, proposing metadata extensions that are programmable from BPF?
Taking the metadata question further, there is not only the metadata that arbitrary BPF programs wish to associate with nf_conntrack. There is also the various extensions that nf_conntrack itself has which could be interesting for users that depend on that state. Would we draw a line before providing access into those aspects of nf_conntrack from BPF?
Beyond metadata, there is the question of write access to nf_conntrack. Presumably if a read helper like this is added to the BPF API, it is only balanced to also add create, update and delete operations? No doubt if someone wants to build NAT or firewall functionality in BPF using nf_conntrack, they will want this. Does this take us on the track of eventually exporting the entire nf_conntrack module (or even nf_nat) internal kernel APIs as external BPF API?
If the BPF API is going to provide a connection tracker, I feel that it should aim to solve connection tracking for various potential users. This takes us from not just what this patch does, but to the full vision of where this API goes with a connection tracker implementation that could be reused by e.g. OVS-BPF or Cilium. At this point, I'm not convinced why such an implementation should exist in the BPF API rather than as a common library that can be forked and tweaked for anyone's uses.
What do you see as the split of responsibility between BPF and other subsystems long-term for your use case that motivates relying upon nf_conntrack always running?
[0] https://github.com/torvalds/linux/commit/4d3a57f23dec59f0a2362e63540b2d01b37... [1] https://github.com/cilium/cilium/blob/v1.6.5/bpf/lib/common.h#L510