On Fri, Jan 24, 2020 at 12:11 PM Joe Stringer <joe@wand.net.nz> wrote:
Joe, thank you for taking the time to respond. And thank you for your efforts on the sk helpers; I both use them and borrowed heavily from them in coding this submission.
On Tue, Jan 21, 2020 at 12:36 PM Matt Cover <werekraken@gmail.com> wrote:
On Tue, Jan 21, 2020 at 1:20 PM Matthew Cover <werekraken@gmail.com> wrote:
Allow looking up an nf_conn. This lets eBPF programs leverage nf_conntrack state for purposes similar to the socket-state use cases served by the socket lookup helpers. This is particularly useful when nf_conntrack state is locally available, but socket state is not.
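To illustrate the intended usage, here is a rough sketch of a lookup from XDP. The helper names, signatures, and IDs below (bpf_ct_lookup_tcp(), bpf_ct_release()) mirror the sk lookup helpers and are illustrative only, declared by hand with placeholder IDs:

    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/in.h>
    #include <linux/ip.h>
    #include <linux/tcp.h>
    #include <bpf/bpf_endian.h>
    #include <bpf/bpf_helpers.h>

    struct nf_conn;

    /* Sketch only: real declarations would come with the patch. */
    static struct nf_conn *(*bpf_ct_lookup_tcp)(void *ctx,
                                    struct bpf_sock_tuple *tuple,
                                    __u32 tuple_size, __u64 netns,
                                    __u64 flags) = (void *)1001;
    static void (*bpf_ct_release)(struct nf_conn *ct) = (void *)1002;

    SEC("xdp")
    int ct_lookup_example(struct xdp_md *ctx)
    {
            void *data_end = (void *)(long)ctx->data_end;
            void *data = (void *)(long)ctx->data;
            struct bpf_sock_tuple tuple = {};
            struct ethhdr *eth = data;
            struct iphdr *iph;
            struct tcphdr *tcph;
            struct nf_conn *ct;

            if ((void *)(eth + 1) > data_end ||
                eth->h_proto != bpf_htons(ETH_P_IP))
                    return XDP_PASS;
            iph = (void *)(eth + 1);
            if ((void *)(iph + 1) > data_end ||
                iph->protocol != IPPROTO_TCP)
                    return XDP_PASS;
            tcph = (void *)(iph + 1);       /* assumes no IP options */
            if ((void *)(tcph + 1) > data_end)
                    return XDP_PASS;

            tuple.ipv4.saddr = iph->saddr;
            tuple.ipv4.daddr = iph->daddr;
            tuple.ipv4.sport = tcph->source;
            tuple.ipv4.dport = tcph->dest;

            ct = bpf_ct_lookup_tcp(ctx, &tuple, sizeof(tuple.ipv4),
                                   BPF_F_CURRENT_NETNS, 0);
            if (!ct)
                    return XDP_DROP; /* e.g. drop flows conntrack never saw */
            bpf_ct_release(ct);
            return XDP_PASS;
    }

    char _license[] SEC("license") = "GPL";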
I think there's an important distinction between accessing sockets and accessing the connection tracker: Sockets are inherently tied to local processes. They consume resources regardless of what kind of fancy networking behaviour you desire out of the stack. Connection-tracking on the other hand only consumes resources if you enable features that explicitly require that functionality. This raises some interesting questions.
Sockets require local config to exist: a service must be listening. nf_conntrack entries likewise require local config to exist. To me, this is not so different.
In addition to the nf_conntrack helpers, I'm hoping to add helpers for lookups to the ipvs connection table via ip_vs_conn_in_get(). From my perspective, this is again similar. The connection is locally known/owned, but there is no socket. The state lives elsewhere in the kernel, but already exists.
There is no need to pay the memory cost for native bpf ct in these cases (i.e. when nf_conntrack is already in use or when the flow traverses ipvs rather than terminating at a socket).
The kernel disables nf_conntrack by default to alleviate the costs associated with it[0]. In the case of this proposal, the BPF program itself is trying to use nf_conntrack, so does that mean that the kernel should auto-enable nf_conntrack hooks for the current namespace (or all namespaces, given that the helper provides access into other namespaces as well) whenever a BPF program is loaded that uses this helper?
I see no reason to auto-enable nf_conntrack. When nf_conntrack is not present, unloaded, or not configured, the helpers return NULL. This is similar to the sk helpers when no service is listening on <daddr>:<dport>.
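In code terms the caller sees the same shape in both cases; continuing the illustrative declarations from the sketch above:

    /* Both lookups degrade gracefully: NULL simply means "no state". */
    struct bpf_sock *sk;
    struct nf_conn *ct;

    sk = bpf_sk_lookup_tcp(ctx, &tuple, sizeof(tuple.ipv4),
                           BPF_F_CURRENT_NETNS, 0); /* no listener  -> NULL */
    ct = bpf_ct_lookup_tcp(ctx, &tuple, sizeof(tuple.ipv4),
                           BPF_F_CURRENT_NETNS, 0); /* no conntrack -> NULL */
    if (sk)
            bpf_sk_release(sk);
    if (ct)
            bpf_ct_release(ct);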
Related side note: What if you wanted to mitigate the performance penalty of turning on nf_conntrack by programmatically choosing whether to populate the ct table? Do we then need to define an interface that allows a BPF program to tell nf_conntrack whether or not to track a given connection?
This could certainly be of interest for some use cases. Adding such functionality is neither included nor precluded by this submission.
Writing to an existing nf_conn could be added to this helper in the future. Then, as an example, an XDP program could populate ct->mark, and a restore-mark rule could be used to apply the mark to the skb (a rough sketch follows the link below). This is conceptually similar to the XDP/tc interaction example:
https://github.com/xdp-project/xdp-tutorial/tree/master/advanced01-xdp-tc-in...
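Here is that sketch. bpf_ct_set_mark() is hypothetical and not part of this submission; the lookup declarations are the illustrative ones from the earlier sketch:

    /* Hypothetical write helper with a placeholder ID; NOT part of
     * this submission.
     */
    static void (*bpf_ct_set_mark)(struct nf_conn *ct, __u32 mark) =
            (void *)1004;

    static __always_inline void classify_flow(struct xdp_md *ctx,
                                              struct bpf_sock_tuple *tuple)
    {
            struct nf_conn *ct;

            ct = bpf_ct_lookup_tcp(ctx, tuple, sizeof(tuple->ipv4),
                                   BPF_F_CURRENT_NETNS, 0);
            if (!ct)
                    return;
            bpf_ct_set_mark(ct, 0x1);       /* record the XDP verdict */
            bpf_ct_release(ct);
    }

    /* Later, a restore-mark rule applies ct->mark to skb->mark, e.g.:
     *   iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
     */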
Adding new entries would be another helper if desired. Again, there is nothing I see in this submission to preclude the addition of such a helper; it simply isn't part of my current use case.
More importantly, nf_conntrack has a particular view in mind of what a connection is and the metadata that can be associated with a connection. On the other hand, one of the big pulls for building networking functionality in BPF is to allow flexibility. Over time, more complex use cases will arise that demand additional metadata to be stored with their connections. Cilium's connection tracking entries provide a glimpse of this[1]. I'm sure that the OVS-BPF project would have similar demands. Longer term, do we encourage such projects to migrate to this implementation, proposing metadata extensions that are programmable from BPF?
Presumably, such a push would require a helper to add new ct entries, which I see as beyond the scope of this submission. However, I would imagine that, as long as a metadata-extension approach wasn't overly cumbersome to use, the performance of the two ct solutions would be the deciding factor.
Taking the metadata question further, there is not only the metadata that arbitrary BPF programs wish to associate with nf_conntrack. There are also the various extensions that nf_conntrack itself has, which could be interesting for users that depend on that state. Would we draw a line before providing access into those aspects of nf_conntrack from BPF?
I'm planning to add a bpf_tcp_nf_conn() helper which gives access to members of ip_ct_tcp. This is similar to bpf_tcp_sock() in my mind. Are these the types of extensions you mean?
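By analogy with bpf_tcp_sock(), a tentative sketch of that helper, reusing the illustrative declarations above; which ip_ct_tcp members get mirrored, and under what names, is not settled:

    #include <linux/netfilter/nf_conntrack_tcp.h> /* TCP_CONNTRACK_* */

    /* Tentative mirror of selected ip_ct_tcp members. */
    struct bpf_tcp_nf_conn {
            __u8 state;     /* tcp_conntrack state of the flow */
            __u8 retrans;   /* retransmission count from ip_ct_tcp */
    };

    /* Placeholder ID, as in the earlier sketches. */
    static struct bpf_tcp_nf_conn *(*bpf_tcp_nf_conn)(struct nf_conn *ct) =
            (void *)1005;

    static __always_inline int flow_established(struct nf_conn *ct)
    {
            struct bpf_tcp_nf_conn *tcp_ct = bpf_tcp_nf_conn(ct);

            return tcp_ct && tcp_ct->state == TCP_CONNTRACK_ESTABLISHED;
    }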
Beyond metadata, there is the question of write access to nf_conntrack. Presumably, if a read helper like this is added to the BPF API, it is only balanced to also add create, update, and delete operations? No doubt if someone wants to build NAT or firewall functionality in BPF using nf_conntrack, they will want this. Does this put us on the track of eventually exporting the internal kernel APIs of the entire nf_conntrack module (or even nf_nat) as external BPF APIs?
I touched on create and update above. Delete, like create, would almost certainly be a separate helper. This submission is not intended to put us on that track, nor do I believe it hinders such an effort. Are you worried that adding nf_conn to bpf is a slippery slope?
If the BPF API is going to provide a connection tracker, I feel that it should aim to solve connection tracking for various potential users. This takes us beyond what this patch does, to the full vision of where this API goes with a connection tracker implementation that could be reused by e.g. OVS-BPF or Cilium. At this point, I'm not convinced that such an implementation should exist in the BPF API rather than as a common library that can be forked and tweaked for anyone's uses.
What do you see as the split of responsibility between BPF and other subsystems long-term for your use case that motivates relying upon nf_conntrack always running?
I do not see this as relying on nf_conntrack always running; I see it as not always assuming that flow/connection ownership is determined by socket presence/state. Sockets, nf_conns, and ip_vs_conns are all of interest for different workloads.
[0] https://github.com/torvalds/linux/commit/4d3a57f23dec59f0a2362e63540b2d01b37...
[1] https://github.com/cilium/cilium/blob/v1.6.5/bpf/lib/common.h#L510