On Sat, Apr 03, 2021 at 12:02:14AM IST, Alexei Starovoitov wrote:
On Fri, Apr 2, 2021 at 8:27 AM Kumar Kartikeya Dwivedi memxor@gmail.com wrote:
[...]
All of these things are messy because of tc legacy. bpf tried to follow tc style with cls and act distinction and it didn't quite work. cls with direct-action is the only thing that became mainstream while tc style attach wasn't really addressed. There were several incidents where tc had tens of thousands of progs attached because of this attach/query/index weirdness described above. I think the only way to address this properly is to introduce bpf_link style of attaching to tc. Such bpf_link would support ingress/egress only. direct-action will be implied. There won't be any index and query will be obvious.
Note that we already have bpf_link support working (without support for pinning, of course) in a limited way. The (ifindex, protocol, parent_id, priority, handle, chain_index) tuple uniquely identifies a filter, so we stash this in the bpf_link and are able to operate on the exact filter during release.
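To make the tuple concrete, here is a minimal sketch of what such a stashed key could look like. The names `tc_filter_key` and `tc_filter_key_equal` are hypothetical and not taken from the patch set, and the field widths are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical key: the six fields that together identify one tc filter.
 * Field widths are assumptions, not the patch set's actual layout. */
struct tc_filter_key {
	int      ifindex;
	uint16_t protocol;
	uint32_t parent_id;
	uint32_t priority;
	uint32_t handle;
	uint32_t chain_index;
};

/* Two keys name the same filter only if every field matches. */
static bool tc_filter_key_equal(const struct tc_filter_key *a,
				const struct tc_filter_key *b)
{
	return a->ifindex == b->ifindex &&
	       a->protocol == b->protocol &&
	       a->parent_id == b->parent_id &&
	       a->priority == b->priority &&
	       a->handle == b->handle &&
	       a->chain_index == b->chain_index;
}
```

A link that keeps such a key at attach time can address exactly that filter at release, without having to query and guess.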
So I would like to propose to take this patch set a step further from what Daniel said: int bpf_tc_attach(prog_fd, ifindex, {INGRESS,EGRESS}): and make this proposed api to return FD. To detach from tc ingress/egress just close(fd).
You mean adding an fd-based TC API to the kernel?
The user processes will not conflict with each other and will not accidentally detach a bpf program that was attached by another user process. Such an api will address the existing tc query/attach/detach race conditions.
Hmm, I think we do solve the race condition by returning the id. As long as you don't misuse the interface and go around deleting filters arbitrarily (i.e. only detach using the id), programs won't step over each other's filters. Storing the id from the netlink response received during detach also eliminates any ambiguity from probing through get_info after attach. Same goes for actions, and the same applies to the bpf_link returning API (which stashes id/index).
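The argument can be illustrated with a toy in-memory model. This is not netlink and not the proposed API; `filter_table`, `toy_attach`, `toy_detach` and `toy_is_attached` are hypothetical names used only to show that detaching strictly by the id returned at attach time cannot remove another user's filter.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_FILTERS 8

/* Toy stand-in for the filter list on one hook; not real netlink state. */
struct filter_table {
	uint32_t ids[MAX_FILTERS]; /* 0 means the slot is free */
	uint32_t next_id;          /* last id handed out */
};

/* Attach: returns a fresh nonzero id, or 0 if the table is full. */
static uint32_t toy_attach(struct filter_table *t)
{
	for (int i = 0; i < MAX_FILTERS; i++) {
		if (t->ids[i] == 0) {
			t->ids[i] = ++t->next_id;
			return t->ids[i];
		}
	}
	return 0;
}

/* Report whether a filter with this id is currently attached. */
static bool toy_is_attached(const struct filter_table *t, uint32_t id)
{
	if (id == 0)
		return false;
	for (int i = 0; i < MAX_FILTERS; i++)
		if (t->ids[i] == id)
			return true;
	return false;
}

/* Detach: removes only the filter carrying exactly this id. */
static bool toy_detach(struct filter_table *t, uint32_t id)
{
	if (id == 0)
		return false;
	for (int i = 0; i < MAX_FILTERS; i++) {
		if (t->ids[i] == id) {
			t->ids[i] = 0;
			return true;
		}
	}
	return false;
}
```

In this model a second detach with the same id simply fails, and a filter attached by someone else is never touched, which is the property claimed for the id-returning API.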
Do you have any other example that can still be racy given the current API?
The only advantage of fd would be the possibility of pinning it, and extending lifetime of the filter.
And libbpf side of support for this api will be trivial. Single bpf link_create command with ifindex and ingress|egress arguments. wdyt?
-- Kartikeya