On Fri, Sep 13, 2024 at 11:24:09AM -0700, Martin KaFai Lau wrote:
On 9/13/24 2:39 AM, Tiago Lam wrote:
This follows the same rationale provided for the ipv4 counterpart, where it now runs a reverse socket lookup when source addresses and/or ports are changed, on sendmsg, to check whether egress traffic should be allowed to go through or not.
As with ipv4, the ipv6 sendmsg path is also extended here to support the IPV6_ORIGDSTADDR ancilliary message to be able to specify a source address/port.
Suggested-by: Jakub Sitnicki jakub@cloudflare.com Signed-off-by: Tiago Lam tiagolam@cloudflare.com
net/ipv6/datagram.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++ net/ipv6/udp.c | 8 ++++-- 2 files changed, 82 insertions(+), 2 deletions(-)
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c index fff78496803d..4214dda1c320 100644 --- a/net/ipv6/datagram.c +++ b/net/ipv6/datagram.c @@ -756,6 +756,27 @@ void ip6_datagram_recv_ctl(struct sock *sk, struct msghdr *msg, } EXPORT_SYMBOL_GPL(ip6_datagram_recv_ctl); +static inline bool reverse_sk_lookup(struct flowi6 *fl6, struct sock *sk,
struct in6_addr *saddr, __be16 sport)
+{
- if (static_branch_unlikely(&bpf_sk_lookup_enabled) &&
(saddr && sport) &&
(ipv6_addr_cmp(&sk->sk_v6_rcv_saddr, saddr) || inet_sk(sk)->inet_sport != sport)) {
struct sock *sk_egress;
bpf_sk_lookup_run_v6(sock_net(sk), IPPROTO_UDP, &fl6->daddr, fl6->fl6_dport,
saddr, ntohs(sport), 0, &sk_egress);
iirc, in the ingress path, the sk could also be selected by a tc bpf prog doing bpf_sk_assign. Then this re-run on sk_lookup may give an incorrect result?
If it does give the incorrect result, we still fallback to the normal egress path.
In general, is it necessary to rerun any bpf prog if the user space has specified the IP[v6]_ORIGDSTADDR.
More generally, wouldn't that also be the case if someone calls bpf_sk_assign() in both TC and sk_lookup on ingress? It can lead to some interference between the two.
It seems like the interesting cases are: 1. Calling bpf_sk_assign() on both TC and sk_lookup ingress: if this happens sk_lookup on egress should match the correct socket when doing the reverse lookup; 2. Calling bpf_sk_assign() only on ingress TC: in this case it will depend if an sk_lookup program is attached or not: a. If not, there's no reverse lookup on egress either; b. But if yes, although the reverse sk_lookup here won't match the initial socket assigned at ingress TC, the packets will still fallback to the normal egress path;
You're right in that case 2b above will continue with the same restrictions as before.
Tiago.