With the current API the only way to remove an allowed IP is to completely rebuild the allowed IPs set for a peer using WGPEER_F_REPLACE_ALLOWEDIPS. In other words, if my current configuration is such that a peer has allowed IP IPs 192.168.0.2 and 192.168.0.3 and I want to remove 192.168.0.2 the actual transition looks like this.
[192.168.0.2, 192.168.0.3] <-- Initial state [] <-- Step 1: Allowed IPs removed for peer [192.168.0.3] <-- Step 2: Allowed IPs added back for peer
This is true even if the allowed IP list is small and the update does not need to be batched into multiple WG_CMD_SET_DEVICE requests, as the removal and subsequent addition of IPs is non-atomic within a single request. Consequently, wg_allowedips_lookup_dst and wg_allowedips_lookup_src may return NULL while reconfiguring a peer even for packets bound for IPs a user did not intend to remove leading to unintended interruptions in connectivity. This presents in userspace as failed calls to sendto and sendmsg for UDP sockets. In my case, I ran netperf while repeatedly reconfiguring the allowed IPs for a peer with wg.
/usr/local/bin/netperf -H 10.102.73.72 -l 10m -t UDP_STREAM -- -R 1 -m 1024 send_data: data send error: No route to host (errno 113) netperf: send_omni: send_data failed: No route to host
While this may not be of particular concern for environments where peers and allowed IPs are mostly static, Cilium manages peers and allowed IPs in a dynamic environment where peers (i.e. Kubernetes nodes) and allowed IPs (i.e. Pods running on those nodes) can frequently change. Cilium must continually keep its WireGuard device's configuration in sync with its cluster state leading to unnecessary churn and packet drops.
This patch introduces a new flag called WGALLOWEDIP_F_REMOVE_ME which in the same way that WGPEER_F_REMOVE_ME allows a user to remove a single peer from a WireGuard device's configuration allows a user to remove an IP from a peer's set of allowed IPs. This has two benefits. First, it allows systems such as Cilium to avoid introducing connectivity blips while reconfiguring a WireGuard device. Second, it allows us to more efficiently keep the device's configuration in sync with the cluster state, as we no longer need to do frequent rebuilds of the allowed IPs list for each peer. Instead, the device's configuration can be incrementally updated. This patch also bumps WG_GENL_VERSION which can be used by clients to detect whether or not their system supports the WGALLOWEDIP_F_REMOVE_ME flag.
======= Changes =======
v1->v2 ------ * Fixed some Sparse warnings
Signed-off-by: Jordan Rife jrife@google.com --- drivers/net/wireguard/allowedips.c | 103 ++++++++++---- drivers/net/wireguard/allowedips.h | 4 + drivers/net/wireguard/netlink.c | 45 +++++-- drivers/net/wireguard/selftest/allowedips.c | 30 +++++ include/uapi/linux/wireguard.h | 11 +- tools/testing/selftests/wireguard/Makefile | 18 +++ tools/testing/selftests/wireguard/netns.sh | 38 ++++++ tools/testing/selftests/wireguard/remove-ip.c | 126 ++++++++++++++++++ 8 files changed, 333 insertions(+), 42 deletions(-) create mode 100644 tools/testing/selftests/wireguard/Makefile create mode 100644 tools/testing/selftests/wireguard/remove-ip.c
diff --git a/drivers/net/wireguard/allowedips.c b/drivers/net/wireguard/allowedips.c index 4b8528206cc8a..ff52259dd8d81 100644 --- a/drivers/net/wireguard/allowedips.c +++ b/drivers/net/wireguard/allowedips.c @@ -249,6 +249,56 @@ static int add(struct allowedips_node __rcu **trie, u8 bits, const u8 *key, return 0; }
+static void _remove(struct allowedips_node *node, struct mutex *lock) +{ + struct allowedips_node *child, **parent_bit, *parent; + bool free_parent; + + list_del_init(&node->peer_list); + RCU_INIT_POINTER(node->peer, NULL); + if (node->bit[0] && node->bit[1]) + return; + child = rcu_dereference_protected(node->bit[!rcu_access_pointer(node->bit[0])], + lockdep_is_held(lock)); + if (child) + child->parent_bit_packed = node->parent_bit_packed; + parent_bit = (struct allowedips_node **)(node->parent_bit_packed & ~3UL); + *parent_bit = child; + parent = (void *)parent_bit - + offsetof(struct allowedips_node, bit[node->parent_bit_packed & 1]); + free_parent = !rcu_access_pointer(node->bit[0]) && + !rcu_access_pointer(node->bit[1]) && + (node->parent_bit_packed & 3) <= 1 && + !rcu_access_pointer(parent->peer); + if (free_parent) + child = rcu_dereference_protected(parent->bit[!(node->parent_bit_packed & 1)], + lockdep_is_held(lock)); + call_rcu(&node->rcu, node_free_rcu); + if (!free_parent) + return; + if (child) + child->parent_bit_packed = parent->parent_bit_packed; + *(struct allowedips_node **)(parent->parent_bit_packed & ~3UL) = child; + call_rcu(&parent->rcu, node_free_rcu); +} + +static int remove(struct allowedips_node __rcu **trie, u8 bits, const u8 *key, + u8 cidr, struct wg_peer *peer, struct mutex *lock) +{ + struct allowedips_node *node; + + if (unlikely(cidr > bits || !peer)) + return -EINVAL; + if (!rcu_access_pointer(*trie) || + !node_placement(*trie, key, cidr, bits, &node, lock) || + peer != rcu_access_pointer(node->peer)) + return 0; + + _remove(node, lock); + + return 0; +} + void wg_allowedips_init(struct allowedips *table) { table->root4 = table->root6 = NULL; @@ -300,43 +350,38 @@ int wg_allowedips_insert_v6(struct allowedips *table, const struct in6_addr *ip, return add(&table->root6, 128, key, cidr, peer, lock); }
+int wg_allowedips_remove_v4(struct allowedips *table, const struct in_addr *ip, + u8 cidr, struct wg_peer *peer, struct mutex *lock) +{ + /* Aligned so it can be passed to fls */ + u8 key[4] __aligned(__alignof(u32)); + + ++table->seq; + swap_endian(key, (const u8 *)ip, 32); + return remove(&table->root4, 32, key, cidr, peer, lock); +} + +int wg_allowedips_remove_v6(struct allowedips *table, const struct in6_addr *ip, + u8 cidr, struct wg_peer *peer, struct mutex *lock) +{ + /* Aligned so it can be passed to fls64 */ + u8 key[16] __aligned(__alignof(u64)); + + ++table->seq; + swap_endian(key, (const u8 *)ip, 128); + return remove(&table->root6, 128, key, cidr, peer, lock); +} + void wg_allowedips_remove_by_peer(struct allowedips *table, struct wg_peer *peer, struct mutex *lock) { - struct allowedips_node *node, *child, **parent_bit, *parent, *tmp; - bool free_parent; + struct allowedips_node *node, *tmp;
if (list_empty(&peer->allowedips_list)) return; ++table->seq; list_for_each_entry_safe(node, tmp, &peer->allowedips_list, peer_list) { - list_del_init(&node->peer_list); - RCU_INIT_POINTER(node->peer, NULL); - if (node->bit[0] && node->bit[1]) - continue; - child = rcu_dereference_protected(node->bit[!rcu_access_pointer(node->bit[0])], - lockdep_is_held(lock)); - if (child) - child->parent_bit_packed = node->parent_bit_packed; - parent_bit = (struct allowedips_node **)(node->parent_bit_packed & ~3UL); - *parent_bit = child; - parent = (void *)parent_bit - - offsetof(struct allowedips_node, bit[node->parent_bit_packed & 1]); - free_parent = !rcu_access_pointer(node->bit[0]) && - !rcu_access_pointer(node->bit[1]) && - (node->parent_bit_packed & 3) <= 1 && - !rcu_access_pointer(parent->peer); - if (free_parent) - child = rcu_dereference_protected( - parent->bit[!(node->parent_bit_packed & 1)], - lockdep_is_held(lock)); - call_rcu(&node->rcu, node_free_rcu); - if (!free_parent) - continue; - if (child) - child->parent_bit_packed = parent->parent_bit_packed; - *(struct allowedips_node **)(parent->parent_bit_packed & ~3UL) = child; - call_rcu(&parent->rcu, node_free_rcu); + _remove(node, lock); } }
diff --git a/drivers/net/wireguard/allowedips.h b/drivers/net/wireguard/allowedips.h index 2346c797eb4d8..931958cb6e100 100644 --- a/drivers/net/wireguard/allowedips.h +++ b/drivers/net/wireguard/allowedips.h @@ -38,6 +38,10 @@ int wg_allowedips_insert_v4(struct allowedips *table, const struct in_addr *ip, u8 cidr, struct wg_peer *peer, struct mutex *lock); int wg_allowedips_insert_v6(struct allowedips *table, const struct in6_addr *ip, u8 cidr, struct wg_peer *peer, struct mutex *lock); +int wg_allowedips_remove_v4(struct allowedips *table, const struct in_addr *ip, + u8 cidr, struct wg_peer *peer, struct mutex *lock); +int wg_allowedips_remove_v6(struct allowedips *table, const struct in6_addr *ip, + u8 cidr, struct wg_peer *peer, struct mutex *lock); void wg_allowedips_remove_by_peer(struct allowedips *table, struct wg_peer *peer, struct mutex *lock); /* The ip input pointer should be __aligned(__alignof(u64))) */ diff --git a/drivers/net/wireguard/netlink.c b/drivers/net/wireguard/netlink.c index f7055180ba4aa..5f2a8553ab43d 100644 --- a/drivers/net/wireguard/netlink.c +++ b/drivers/net/wireguard/netlink.c @@ -46,7 +46,8 @@ static const struct nla_policy peer_policy[WGPEER_A_MAX + 1] = { static const struct nla_policy allowedip_policy[WGALLOWEDIP_A_MAX + 1] = { [WGALLOWEDIP_A_FAMILY] = { .type = NLA_U16 }, [WGALLOWEDIP_A_IPADDR] = NLA_POLICY_MIN_LEN(sizeof(struct in_addr)), - [WGALLOWEDIP_A_CIDR_MASK] = { .type = NLA_U8 } + [WGALLOWEDIP_A_CIDR_MASK] = { .type = NLA_U8 }, + [WGALLOWEDIP_A_FLAGS] = { .type = NLA_U32 } };
static struct wg_device *lookup_interface(struct nlattr **attrs, @@ -329,6 +330,7 @@ static int set_port(struct wg_device *wg, u16 port) static int set_allowedip(struct wg_peer *peer, struct nlattr **attrs) { int ret = -EINVAL; + u32 flags = 0; u16 family; u8 cidr;
@@ -337,19 +339,38 @@ static int set_allowedip(struct wg_peer *peer, struct nlattr **attrs) return ret; family = nla_get_u16(attrs[WGALLOWEDIP_A_FAMILY]); cidr = nla_get_u8(attrs[WGALLOWEDIP_A_CIDR_MASK]); + if (attrs[WGALLOWEDIP_A_FLAGS]) + flags = nla_get_u32(attrs[WGALLOWEDIP_A_FLAGS]);
if (family == AF_INET && cidr <= 32 && - nla_len(attrs[WGALLOWEDIP_A_IPADDR]) == sizeof(struct in_addr)) - ret = wg_allowedips_insert_v4( - &peer->device->peer_allowedips, - nla_data(attrs[WGALLOWEDIP_A_IPADDR]), cidr, peer, - &peer->device->device_update_lock); - else if (family == AF_INET6 && cidr <= 128 && - nla_len(attrs[WGALLOWEDIP_A_IPADDR]) == sizeof(struct in6_addr)) - ret = wg_allowedips_insert_v6( - &peer->device->peer_allowedips, - nla_data(attrs[WGALLOWEDIP_A_IPADDR]), cidr, peer, - &peer->device->device_update_lock); + nla_len(attrs[WGALLOWEDIP_A_IPADDR]) == sizeof(struct in_addr)) { + if (flags & WGALLOWEDIP_F_REMOVE_ME) + ret = wg_allowedips_remove_v4(&peer->device->peer_allowedips, + nla_data(attrs[WGALLOWEDIP_A_IPADDR]), + cidr, + peer, + &peer->device->device_update_lock); + else + ret = wg_allowedips_insert_v4(&peer->device->peer_allowedips, + nla_data(attrs[WGALLOWEDIP_A_IPADDR]), + cidr, + peer, + &peer->device->device_update_lock); + } else if (family == AF_INET6 && cidr <= 128 && + nla_len(attrs[WGALLOWEDIP_A_IPADDR]) == sizeof(struct in6_addr)) { + if (flags & WGALLOWEDIP_F_REMOVE_ME) + ret = wg_allowedips_remove_v6(&peer->device->peer_allowedips, + nla_data(attrs[WGALLOWEDIP_A_IPADDR]), + cidr, + peer, + &peer->device->device_update_lock); + else + ret = wg_allowedips_insert_v6(&peer->device->peer_allowedips, + nla_data(attrs[WGALLOWEDIP_A_IPADDR]), + cidr, + peer, + &peer->device->device_update_lock); + }
return ret; } diff --git a/drivers/net/wireguard/selftest/allowedips.c b/drivers/net/wireguard/selftest/allowedips.c index 3d1f64ff2e122..9f6458a889e96 100644 --- a/drivers/net/wireguard/selftest/allowedips.c +++ b/drivers/net/wireguard/selftest/allowedips.c @@ -461,6 +461,10 @@ static __init struct wg_peer *init_peer(void) wg_allowedips_insert_v##version(&t, ip##version(ipa, ipb, ipc, ipd), \ cidr, mem, &mutex)
+#define remove(version, mem, ipa, ipb, ipc, ipd, cidr) \ + wg_allowedips_remove_v##version(&t, ip##version(ipa, ipb, ipc, ipd), \ + cidr, mem, &mutex) + #define maybe_fail() do { \ ++i; \ if (!_s) { \ @@ -586,6 +590,32 @@ bool __init wg_allowedips_selftest(void) test_negative(4, a, 192, 0, 0, 0); test_negative(4, a, 255, 0, 0, 0);
+ insert(4, a, 1, 0, 0, 0, 32); + insert(4, a, 192, 0, 0, 0, 24); + insert(6, a, 0x24446801, 0x40e40800, 0xdeaebeef, 0xdefbeef, 128); + insert(6, a, 0x24446800, 0xf0e40800, 0xeeaebeef, 0, 98); + test(4, a, 1, 0, 0, 0); + test(4, a, 192, 0, 0, 1); + test(6, a, 0x24446801, 0x40e40800, 0xdeaebeef, 0xdefbeef); + test(6, a, 0x24446800, 0xf0e40800, 0xeeaebeef, 0x10101010); + /* Must be an exact match to remove */ + remove(4, a, 192, 0, 0, 0, 32); + test(4, a, 192, 0, 0, 1); + remove(4, a, 192, 0, 0, 0, 24); + test_negative(4, a, 192, 0, 0, 1); + remove(4, a, 1, 0, 0, 0, 32); + test_negative(4, a, 1, 0, 0, 0); + /* Must be an exact match to remove */ + remove(6, a, 0x24446801, 0x40e40800, 0xdeaebeef, 0xdefbeef, 96); + test(6, a, 0x24446801, 0x40e40800, 0xdeaebeef, 0xdefbeef); + remove(6, a, 0x24446801, 0x40e40800, 0xdeaebeef, 0xdefbeef, 128); + test_negative(6, a, 0x24446801, 0x40e40800, 0xdeaebeef, 0xdefbeef); + /* Must match the peer to remove */ + remove(6, b, 0x24446800, 0xf0e40800, 0xeeaebeef, 0, 98); + test(6, a, 0x24446800, 0xf0e40800, 0xeeaebeef, 0x10101010); + remove(6, a, 0x24446800, 0xf0e40800, 0xeeaebeef, 0, 98); + test_negative(6, a, 0x24446800, 0xf0e40800, 0xeeaebeef, 0x10101010); + wg_allowedips_free(&t, &mutex); wg_allowedips_init(&t); insert(4, a, 192, 168, 0, 0, 16); diff --git a/include/uapi/linux/wireguard.h b/include/uapi/linux/wireguard.h index ae88be14c9478..e219194cb9f5a 100644 --- a/include/uapi/linux/wireguard.h +++ b/include/uapi/linux/wireguard.h @@ -101,6 +101,10 @@ * WGALLOWEDIP_A_FAMILY: NLA_U16 * WGALLOWEDIP_A_IPADDR: struct in_addr or struct in6_addr * WGALLOWEDIP_A_CIDR_MASK: NLA_U8 + * WGALLOWEDIP_A_FLAGS: NLA_U32, WGALLOWEDIP_F_REMOVE_ME if + * the specified IP should be removed, + * otherwise this IP will be added if + * it is not already present. * 0: NLA_NESTED * ... * 0: NLA_NESTED @@ -132,7 +136,7 @@ #define _WG_UAPI_WIREGUARD_H
#define WG_GENL_NAME "wireguard" -#define WG_GENL_VERSION 1 +#define WG_GENL_VERSION 2
#define WG_KEY_LEN 32
@@ -184,11 +188,16 @@ enum wgpeer_attribute { }; #define WGPEER_A_MAX (__WGPEER_A_LAST - 1)
+enum wgallowedip_flag { + WGALLOWEDIP_F_REMOVE_ME = 1U << 0, + __WGALLOWEDIP_F_ALL = WGALLOWEDIP_F_REMOVE_ME +}; enum wgallowedip_attribute { WGALLOWEDIP_A_UNSPEC, WGALLOWEDIP_A_FAMILY, WGALLOWEDIP_A_IPADDR, WGALLOWEDIP_A_CIDR_MASK, + WGALLOWEDIP_A_FLAGS, __WGALLOWEDIP_A_LAST }; #define WGALLOWEDIP_A_MAX (__WGALLOWEDIP_A_LAST - 1) diff --git a/tools/testing/selftests/wireguard/Makefile b/tools/testing/selftests/wireguard/Makefile new file mode 100644 index 0000000000000..4f4db54f89cb3 --- /dev/null +++ b/tools/testing/selftests/wireguard/Makefile @@ -0,0 +1,18 @@ +# SPDX-License-Identifier: GPL-2.0 +# +# Note: To build this you must install libnl-3 and libnl-genl-3 development +# packages. +remove-ip: + gcc -I/usr/include/libnl3 \ + -I../../../../usr/include \ + remove-ip.c \ + -o remove-ip \ + -lnl-genl-3 \ + -lnl-3 + +.PHONY: all +all: remove-ip + +.PHONY: clean +clean: + rm remove-ip diff --git a/tools/testing/selftests/wireguard/netns.sh b/tools/testing/selftests/wireguard/netns.sh index 405ff262ca93d..70058d6ebbe85 100755 --- a/tools/testing/selftests/wireguard/netns.sh +++ b/tools/testing/selftests/wireguard/netns.sh @@ -28,6 +28,7 @@ exec 3>&1 export LANG=C export WG_HIDE_KEYS=never NPROC=( /sys/devices/system/cpu/cpu+([0-9]) ); NPROC=${#NPROC[@]} +SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) netns0="wg-test-$$-0" netns1="wg-test-$$-1" netns2="wg-test-$$-2" @@ -610,6 +611,43 @@ n0 wg set wg0 peer "$pub2" allowed-ips "$allowedips" } < <(n0 wg show wg0 allowed-ips) ip0 link del wg0
+# Test IP removal +allowedips=( ) +for i in {1..197}; do + allowedips+=( 192.168.0.$i ) + allowedips+=( abcd::$i ) +done +saved_ifs="$IFS" +IFS=, +allowedips="${allowedips[*]}" +IFS="$saved_ifs" +ip0 link add wg0 type wireguard +n0 wg set wg0 peer "$pub1" allowed-ips "$allowedips" +pub1_hex=$(echo "$pub1" | base64 -d | xxd -p -c 50) +n0 $SCRIPT_DIR/remove-ip wg0 "$pub1_hex" 4 192.168.0.1 +n0 $SCRIPT_DIR/remove-ip wg0 "$pub1_hex" 4 192.168.0.20 +n0 $SCRIPT_DIR/remove-ip wg0 "$pub1_hex" 4 192.168.0.100 +n0 $SCRIPT_DIR/remove-ip wg0 "$pub1_hex" 6 abcd::1 +n0 $SCRIPT_DIR/remove-ip wg0 "$pub1_hex" 6 abcd::20 +n0 $SCRIPT_DIR/remove-ip wg0 "$pub1_hex" 6 abcd::100 +n0 wg show wg0 allowed-ips +{ + read -r pub allowedips + [[ $pub == "$pub1" ]] + i=0 + for ip in $allowedips; do + [[ "$ip" != "192.168.0.1" ]] + [[ "$ip" != "192.168.0.20" ]] + [[ "$ip" != "192.168.0.100" ]] + [[ "$ip" != "abcd::1" ]] + [[ "$ip" != "abcd::20" ]] + [[ "$ip" != "abcd::100" ]] + ((++i)) + done + ((i == 388)) +} < <(n0 wg show wg0 allowed-ips) +ip0 link del wg0 + ! n0 wg show doesnotexist || false
ip0 link add wg0 type wireguard diff --git a/tools/testing/selftests/wireguard/remove-ip.c b/tools/testing/selftests/wireguard/remove-ip.c new file mode 100644 index 0000000000000..242f922d99b56 --- /dev/null +++ b/tools/testing/selftests/wireguard/remove-ip.c @@ -0,0 +1,126 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/wireguard.h> +#include <sys/socket.h> +#include <netinet/in.h> +#include <arpa/inet.h> +#include <netlink/socket.h> +#include <netlink/netlink.h> +#include <netlink/genl/ctrl.h> +#include <netlink/genl/genl.h> +#include <netlink/genl/family.h> + +#define CURVE25519_KEY_SIZE 32 + +const char *usage = "Usage: remove-ip INTERFACE_NAME PEER_PUBLIC_KEY_HEX IP_VERSION IP"; + +char h2b(char c) +{ + if ('0' <= c && c <= '9') + return c - '0'; + else if ('a' <= c && c <= 'f') + return 10 + (c - 'a'); + + return -1; +} + +int parse_key(const char *raw, unsigned char key[CURVE25519_KEY_SIZE]) +{ + int ret = 0; + int i; + + for (i = 0; i < CURVE25519_KEY_SIZE; i++) { + char h, l; + + h = h2b(raw[0]); + if (h < 0) + return -1; + + l = h2b(raw[1]); + if (l < 0) + return -1; + + key[i] = (h << 4) | l; + raw += 2; + } + + return 0; +} + +int main(int argc, char **argv) +{ + unsigned char addr[sizeof(struct in6_addr)]; + unsigned char pub_key[CURVE25519_KEY_SIZE]; + struct nl_sock *sock; + struct nl_msg *msg; + int addr_len; + int family; + int cidr; + int af; + + if (argc < 5) { + printf("Not enough arguments.\n\n%s\n", usage); + return -1; + } + + if (parse_key(argv[2], pub_key)) { + printf("Could not parse public key\n"); + return -1; + } + + switch (argv[3][0]) { + case '4': + af = AF_INET; + addr_len = sizeof(struct in_addr); + cidr = 32; + break; + case '6': + af = AF_INET6; + addr_len = sizeof(struct in6_addr); + cidr = 128; + break; + default: + printf("Invalid IP version\n"); + return -1; + } + + if (inet_pton(af, argv[4], &addr) <= 0) { + printf("Could not parse IP address\n"); + return -1; + } + + sock = nl_socket_alloc(); + genl_connect(sock); + family = genl_ctrl_resolve(sock, WG_GENL_NAME); + msg = nlmsg_alloc(); + genlmsg_put(msg, NL_AUTO_PID, NL_AUTO_SEQ, family, 0, NLM_F_ECHO, + WG_CMD_SET_DEVICE, WG_GENL_VERSION); + nla_put_string(msg, WGDEVICE_A_IFNAME, argv[1]); + + struct nlattr *peers = nla_nest_start(msg, WGDEVICE_A_PEERS); + struct nlattr *peer0 = nla_nest_start(msg, 0); + + nla_put(msg, WGPEER_A_PUBLIC_KEY, CURVE25519_KEY_SIZE, pub_key); + + struct nlattr *allowed_ips = nla_nest_start(msg, WGPEER_A_ALLOWEDIPS); + struct nlattr *allowed_ip0 = nla_nest_start(msg, 0); + + nla_put_u16(msg, WGALLOWEDIP_A_FAMILY, af); + nla_put(msg, WGALLOWEDIP_A_IPADDR, addr_len, &addr); + nla_put_u8(msg, WGALLOWEDIP_A_CIDR_MASK, cidr); + nla_put_u32(msg, WGALLOWEDIP_A_FLAGS, WGALLOWEDIP_F_REMOVE_ME); + nla_nest_end(msg, allowed_ip0); + nla_nest_end(msg, allowed_ips); + nla_nest_end(msg, peer0); + nla_nest_end(msg, peers); + + int err = nl_send_sync(sock, msg); + + if (err < 0) { + char message[256]; + + nl_perror(err, message); + printf("An error occurred: %d - %s\n", err, message); + } + + return err; +}
Hi Jason,
Just reaching out to see if you've had a chance to take a look at this. Thanks!
-Jordan
Hi Jordan,
Sorry it's taken me a bit to look at this patch. A few notes:
On Thu, Sep 05, 2024 at 03:05:41PM -0500, Jordan Rife wrote:
With the current API the only way to remove an allowed IP is to completely rebuild the allowed IPs set for a peer using WGPEER_F_REPLACE_ALLOWEDIPS. In other words, if my current configuration is such that a peer has allowed IP IPs 192.168.0.2 and 192.168.0.3 and I want to remove 192.168.0.2 the actual transition looks like this.
[192.168.0.2, 192.168.0.3] <-- Initial state [] <-- Step 1: Allowed IPs removed for peer [192.168.0.3] <-- Step 2: Allowed IPs added back for peer
This is true even if the allowed IP list is small and the update does not need to be batched into multiple WG_CMD_SET_DEVICE requests, as the removal and subsequent addition of IPs is non-atomic within a single request. Consequently, wg_allowedips_lookup_dst and wg_allowedips_lookup_src may return NULL while reconfiguring a peer even for packets bound for IPs a user did not intend to remove leading to unintended interruptions in connectivity. This presents in userspace as failed calls to sendto and sendmsg for UDP sockets. In my case, I ran netperf while repeatedly reconfiguring the allowed IPs for a peer with wg.
/usr/local/bin/netperf -H 10.102.73.72 -l 10m -t UDP_STREAM -- -R 1 -m 1024 send_data: data send error: No route to host (errno 113) netperf: send_omni: send_data failed: No route to host
That's a worthwhile point. This indeed fixes a *bug*, beyond being just a helpful new feature.
incrementally updated. This patch also bumps WG_GENL_VERSION which can be used by clients to detect whether or not their system supports the WGALLOWEDIP_F_REMOVE_ME flag.
I'm actually less enthusiastic about this part, but mainly because I haven't looked closely at what the convention for this is. I was wondering though - this adds WGALLOWEDIP_A_FLAGS, which didn't exist before. Shouldn't some upper layer return a relevant value in that case? And even within the existing flags, for WGPEER_A_FLAGS, for example, old kernels check to see if there are new flags, for this purpose, e.g.:
if (attrs[WGPEER_A_FLAGS]) flags = nla_get_u32(attrs[WGPEER_A_FLAGS]); ret = -EOPNOTSUPP; if (flags & ~__WGPEER_F_ALL) goto out;
So I think we might be able to avoid having to bump the version number. GENL is supposed to be extensible like this.
+static void _remove(struct allowedips_node *node, struct mutex *lock)
This file doesn't really do the _ prefix thing anywhere. Can you call this something more descriptive, like "remove_node"?
if (free_parent)
child = rcu_dereference_protected(
parent->bit[!(node->parent_bit_packed & 1)],
lockdep_is_held(lock));
[...]
- if (free_parent)
child = rcu_dereference_protected(parent->bit[!(node->parent_bit_packed & 1)],
lockdep_is_held(lock));
Thanks for fixing up the ugly extra \n in the original code you copy and pasted from. I remember the horror of that when I added line breaks in the original code.
- call_rcu(&node->rcu, node_free_rcu);
- if (!free_parent)
return;
- if (child)
child->parent_bit_packed = parent->parent_bit_packed;
- *(struct allowedips_node **)(parent->parent_bit_packed & ~3UL) = child;
- call_rcu(&parent->rcu, node_free_rcu);
+}
+static int remove(struct allowedips_node __rcu **trie, u8 bits, const u8 *key,
u8 cidr, struct wg_peer *peer, struct mutex *lock)
+{
- struct allowedips_node *node;
- if (unlikely(cidr > bits || !peer))
return -EINVAL;
Reasoning for this is that it copies the logic in add()?
- if (!rcu_access_pointer(*trie) ||
!node_placement(*trie, key, cidr, bits, &node, lock) ||
peer != rcu_access_pointer(node->peer))
return 0;
What's the reasoning behind returning success when it can't find the node? Because in that case it's already removed so the function is idempotent? And you checked that nothing really cares about the return value there anyway? Or is this a mistake and you meant to return something else? I can imagine good reasoning in either direction; I'd just like to learn that your choice is deliberate.
- _remove(node, lock);
- return 0;
+}
- family = nla_get_u16(attrs[WGALLOWEDIP_A_FAMILY]); cidr = nla_get_u8(attrs[WGALLOWEDIP_A_CIDR_MASK]);
- if (attrs[WGALLOWEDIP_A_FLAGS])
flags = nla_get_u32(attrs[WGALLOWEDIP_A_FLAGS]);
As I mentioned above, you need to do the dance of:
ret = -EOPNOTSUPP; if (flags & ~__WGALLOWEDIP_F_ALL) goto out;
So that we can safely extend this later.
if (family == AF_INET && cidr <= 32 &&
nla_len(attrs[WGALLOWEDIP_A_IPADDR]) == sizeof(struct in_addr))
ret = wg_allowedips_insert_v4(
&peer->device->peer_allowedips,
nla_data(attrs[WGALLOWEDIP_A_IPADDR]), cidr, peer,
&peer->device->device_update_lock);
- else if (family == AF_INET6 && cidr <= 128 &&
nla_len(attrs[WGALLOWEDIP_A_IPADDR]) == sizeof(struct in6_addr))
ret = wg_allowedips_insert_v6(
&peer->device->peer_allowedips,
nla_data(attrs[WGALLOWEDIP_A_IPADDR]), cidr, peer,
&peer->device->device_update_lock);
nla_len(attrs[WGALLOWEDIP_A_IPADDR]) == sizeof(struct in_addr)) {
if (flags & WGALLOWEDIP_F_REMOVE_ME)
ret = wg_allowedips_remove_v4(&peer->device->peer_allowedips,
nla_data(attrs[WGALLOWEDIP_A_IPADDR]),
cidr,
peer,
&peer->device->device_update_lock);
We get 100 chars now, so you can rewrite this as:
ret = wg_allowedips_remove_v4(&peer->device->peer_allowedips, nla_data(attrs[WGALLOWEDIP_A_IPADDR]), cidr, peer, &peer->device->device_update_lock);
WGALLOWEDIP_A_FLAGS: NLA_U32, WGALLOWEDIP_F_REMOVE_ME if
the specified IP should be removed,
That comma should be a semicolon because what comes after is a complete sentence, and there's no conjunction.
otherwise this IP will be added if
it is not already present.
+remove-ip:
gcc -I/usr/include/libnl3 \
-I../../../../usr/include \
remove-ip.c \
-o remove-ip \
-lnl-genl-3 \
-lnl-3
sock = nl_socket_alloc();
genl_connect(sock);
family = genl_ctrl_resolve(sock, WG_GENL_NAME);
msg = nlmsg_alloc();
genlmsg_put(msg, NL_AUTO_PID, NL_AUTO_SEQ, family, 0, NLM_F_ECHO,
WG_CMD_SET_DEVICE, WG_GENL_VERSION);
nla_put_string(msg, WGDEVICE_A_IFNAME, argv[1]);
struct nlattr *peers = nla_nest_start(msg, WGDEVICE_A_PEERS);
struct nlattr *peer0 = nla_nest_start(msg, 0);
nla_put(msg, WGPEER_A_PUBLIC_KEY, CURVE25519_KEY_SIZE, pub_key);
struct nlattr *allowed_ips = nla_nest_start(msg, WGPEER_A_ALLOWEDIPS);
struct nlattr *allowed_ip0 = nla_nest_start(msg, 0);
nla_put_u16(msg, WGALLOWEDIP_A_FAMILY, af);
nla_put(msg, WGALLOWEDIP_A_IPADDR, addr_len, &addr);
nla_put_u8(msg, WGALLOWEDIP_A_CIDR_MASK, cidr);
nla_put_u32(msg, WGALLOWEDIP_A_FLAGS, WGALLOWEDIP_F_REMOVE_ME);
nla_nest_end(msg, allowed_ip0);
nla_nest_end(msg, allowed_ips);
nla_nest_end(msg, peer0);
nla_nest_end(msg, peers);
int err = nl_send_sync(sock, msg);
if (err < 0) {
char message[256];
nl_perror(err, message);
printf("An error occurred: %d - %s\n", err, message);
}
I'm not so keen on this, simply because we've been able to do everything else in that script and keeping with the "make sure the userspace tooling" paradigm. There are two options:
1. Rewrite netns.sh all in C, only depending on libnl or whatever (which I would actually really *love* to see happen). This would change the testing paradigm, but I'd be okay with that if it's done well and all at once.
2. Add support for this new flag to wg(8) (which I think is necessary anyway for this to land; kernel features and userspace support oughta be posted at once).
Thanks for the patch. I like the feature and I'm happy you posted this.
Jason
On Thu, Sep 05, 2024 at 03:05:41PM -0500, Jordan Rife wrote:
With the current API the only way to remove an allowed IP is to completely rebuild the allowed IPs set for a peer using WGPEER_F_REPLACE_ALLOWEDIPS.
Just for posterity, there actually is another way: create a new peer with a random key, and give the allowed IP you want to remove to that peer. Moves are atomic. Then destroy that peer.
Not that this is clean or nice or something, and I like your patch. But in case somebody gets into trouble before this lands, I thought I should note it on the list.
Jason
Hi Jason,
Thanks a lot for taking a look!
I'm actually less enthusiastic about this part, but mainly because I haven't looked closely at what the convention for this is. I was wondering though - this adds WGALLOWEDIP_A_FLAGS, which didn't exist before. Shouldn't some upper layer return a relevant value in that case? And even within the existing flags, for WGPEER_A_FLAGS, for example, old kernels check to see if there are new flags, for this purpose, e.g.:
if (attrs[WGPEER_A_FLAGS]) flags = nla_get_u32(attrs[WGPEER_A_FLAGS]); ret = -EOPNOTSUPP; if (flags & ~__WGPEER_F_ALL) goto out;
So I think we might be able to avoid having to bump the version number. GENL is supposed to be extensible like this.
I think the challenge with WGALLOWEDIP_A_FLAGS in particular is that because it didn't exist since the beginning like WGPEER_A_FLAGS, there are kernels out there that have no knowledge of it and wouldn't have this check in place. While I think it's a good idea to replicate this check for WGALLOWEDIP_A_FLAGS as well for future compatibility, we still need some way for clients to probe whether or not this feature is supported in case they're running on an older kernel. If we want to keep the version number as-is, I see a few alternatives:
1. Include WGALLOWEDIP_A_FLAGS in the response of WG_CMD_GET_DEVICE. This would be a new field inside each entry of WGPEER_A_ALLOWEDIPS. If the result of WG_CMD_GET_DEVICE contains WGALLOWEDIP_A_FLAGS then a client knows that it can use features WGALLOWEDIP_F_REMOVE_ME. It could potentially be used later to contain persistent flags for an IP, but for now would just be zeroed out. This fails if there are no allowed IPs configured for the device you're trying to configure, but in the case where you're trying to use WGALLOWEDIP_F_REMOVE_ME you probably would. 2. Keep everything the same. Don't bump the version number. Clients could in theory try to use WGALLOWEDIP_A_FLAGS with WG_CMD_SET_DEVICE then check if their request had the intended effect (e.g. checking if the IP was removed).
I slightly prefer approach 1 myself, as I feel it's a bit cleaner, but I'm happy to discuss some other approaches here. I'm trying to think about how these probes could be used inside the WireGuard Go library and wg itself. Assuming approach one,
* For libraries that manage WireGuard like golang.zx2c4.com/wireguard/wgctrl the first time a client uses .Device() (WG_CMD_GET_DEVICE under the hood) there would need to be some logic that looks at WGPEER_A_ALLOWEDIPS for one of the peers and sets some internal flag like c.supportsAllowedIPFlags. When a client later calls c.ConfigureDevice() the library would disallow direct IP removal if the feature is not supported (c.supportsAllowedIPFlags != nil && !*c.supportsAllowedIPFlags). Alternatively, you could do all this up front while initializing the client by creating some dummy device + peer, adding some IP, then using WG_CMD_GET_DEVICE to see if WGALLOWEDIP_A_FLAGS is present in the result. * Similarly for wg, if the user is trying to remove an allowed IP you'd first query the allowed IP for a peer and check for WGALLOWEDIP_A_FLAGS if it doesn't exist in the response then the command would fail and print something like "not supported".
Bumping WG_GENL_VERSION is cleaner still, since clients in userspace just need to check an integer value and avoid any probing logic. However, like you I am unsure what the convention is here and whether or not this has any unintended effects.
This file doesn't really do the _ prefix thing anywhere. Can you call this something more descriptive, like "remove_node"?
Sure. I'll update this in v3.
Reasoning for this is that it copies the logic in add()?
As for the cidr > bits check, the intent was simply to fail if the user passes an invalid value for WGALLOWEDIP_A_CIDR_MASK. Although, unlike the add() case, I suppose we could just remove this check. Worst case if a user passes something high like 255 remove() would just be a no-op. It looks safe to remove !peer check as well. I can update this in v3.
What's the reasoning behind returning success when it can't find the node? Because in that case it's already removed so the function is idempotent? And you checked that nothing really cares about the return value there anyway? Or is this a mistake and you meant to return something else? I can imagine good reasoning in either direction; I'd just like to learn that your choice is deliberate.
Yes, I opted for idempotence here intentionally. At least for the use cases I have in mind the intent is basically "remove all these allowed IPs from this peer if they exist". If an allowed IP already got removed or is mapped to another peer somehow then that's fine. I'd imagine it complicates the code in userspace to have to deal with corner cases involving possible error codes returned when removing a batch of allowed IPs. You'd need to query the device again, reform your request, etc. I /think/ add() is idempotent as well in cases where you re-add an IP to a peer, so there's some symmetry here. Perhaps if there are use cases that need more stricter checks in the future we could introduce a new flag to return an error code in this case.
As I mentioned above, you need to do the dance of:
ret = -EOPNOTSUPP; if (flags & ~__WGALLOWEDIP_F_ALL) goto out;
So that we can safely extend this later.
Good idea. I will add this in v3 as well.
We get 100 chars now, so you can rewrite this as:
ret = wg_allowedips_remove_v4(&peer->device->peer_allowedips, nla_data(attrs[WGALLOWEDIP_A_IPADDR]), cidr, peer, &peer->device->device_update_lock);
Nice, will do.
That comma should be a semicolon because what comes after is a complete sentence, and there's no conjunction.
Good point. I think we might actually need a comma /after/ otherwise: "...; otherwise, ...": "WGALLOWEDIP_F_REMOVE_ME if the specified IP should be removed; otherwise, this IP will be added if it is not already present".
I'm not so keen on this, simply because we've been able to do everything else in that script and keeping with the "make sure the userspace tooling" paradigm. There are two options:
Rewrite netns.sh all in C, only depending on libnl or whatever (which I would actually really *love* to see happen). This would change the testing paradigm, but I'd be okay with that if it's done well and all at once.
Add support for this new flag to wg(8) (which I think is necessary anyway for this to land; kernel features and userspace support oughta be posted at once).
Thanks for the patch. I like the feature and I'm happy you posted this.
Option 1 seems like a heavy lift for this patch. I think option 2 makes more sense, as long as there is an understanding that netns.sh needs to be run with the latest and greatest version of wg in order for all tests to pass. Maybe we can add a version check to selectively disable the remove IP tests if wg is on an older version. I agree though that long term option 1 would be nice, as it provides a finer level of control and tests could be run without as many external dependencies. I can get rid of the remove-ip cruft and send two patches to the wireguard mailing list if that works, one for the kernel and one for wireguard-tools.
However, this raises some questions about the wg interface itself which would be used both by netns.sh and end users. Looking at the current interface for wg, the way to configure allowed IPs currently is "wg set". For example,
wg set peer ABCD... allowed-ips 192.168.0.24/32,192.168.0.44/32,192.168.0.88/32
The intended effect is to completely replace the allowed IPs for that peer rather than to make an incremental change. I think we'll need to extend the interface a bit. There are a few directions we can take here:
1. Add a new flag to "wg set" like --incremental in which case it won't use WGPEER_F_REPLACE_ALLOWEDIPS under the hood. Support addition or removal of allowed IPs with a "-" suffix on CIDRs you want to remove (192.168.0.24/32-,192.168.0.44/32-,192.168.0.88/32). 2. Same as 1 but with a new argument name, allowed-ips-diff instead of the --incremental flag. This appears more in line with the current style of wg's arguments.
There are a lot more variations here, so I wanted to get your input here before just picking a direction. Thanks again for the thorough feedback!
-Jordan
I think the challenge with WGALLOWEDIP_A_FLAGS in particular is that because it didn't exist since the beginning like WGPEER_A_FLAGS, there are kernels out there that have no knowledge of it and wouldn't have this check in place. While I think it's a good idea to replicate this check for WGALLOWEDIP_A_FLAGS as well for future compatibility, we still need some way for clients to probe whether or not this feature is supported in case they're running on an older kernel. If we want to keep the version number as-is, I see a few alternatives:
Forget about all of this actually. I was under the mistaken impression that an unrecognized attribute would be silently ignored by an older kernel, but it seems that validation is strict.
if (attrs[WGPEER_A_ALLOWEDIPS]) { struct nlattr *attr, *allowedip[WGALLOWEDIP_A_MAX + 1]; int rem;
nla_for_each_nested(attr, attrs[WGPEER_A_ALLOWEDIPS], rem) { ret = nla_parse_nested(allowedip, WGALLOWEDIP_A_MAX, attr, allowedip_policy, NULL); if (ret < 0) goto out; ret = set_allowedip(peer, allowedip); if (ret < 0) goto out; } }
nla_parse_nested() uses NL_VALIDATE_STRICT which sets NL_VALIDATE_MAXTYPE, causing __nla_validate_parse() in this case to check that no attribute types are greater than WGALLOWEDIP_A_MAX.
The WG_CMD_SET_DEVICE operation simply returns EINVAL if you try to use WGALLOWEDIP_A_FLAGS on a kernel that doesn't support it. I tested this using a patched version of wg that sets the WGALLOWEDIP_F_REMOVE_ME attribute when using an argument I added called "allowed-ips-patch".
Kernel With WGALLOWEDIP_A_FLAGS ================================== jordan@t14:~/code/wireguard-tools/src$ sudo ./wg set wg0 peer xK7O/YnTb8W/fgPA4dwAshEV06rMPMqqmy3zZN0NPS4= allowed-ips-patch 192.168.0.3/32 jordan@t14:~/code/wireguard-tools/src$ sudo ./wg interface: wg0
peer: xK7O/YnTb8W/fgPA4dwAshEV06rMPMqqmy3zZN0NPS4= allowed ips: 192.168.0.3/32 jordan@t14:~/code/wireguard-tools/src$ sudo ./wg set wg0 peer xK7O/YnTb8W/fgPA4dwAshEV06rMPMqqmy3zZN0NPS4= allowed-ips-patch -192.168.0.3/32 jordan@t14:~/code/wireguard-tools/src$ sudo ./wg interface: wg0
peer: xK7O/YnTb8W/fgPA4dwAshEV06rMPMqqmy3zZN0NPS4= allowed ips: (none) jordan@t14:~/code/wireguard-tools/src$
Kernel Without WGALLOWEDIP_A_FLAGS ================================== jordan@t14:~/code/wireguard-tools/src$ sudo ./wg set wg0 peer xK7O/YnTb8W/fgPA4dwAshEV06rMPMqqmy3zZN0NPS4= allowed-ips-patch 192.168.0.3/32 jordan@t14:~/code/wireguard-tools/src$ sudo ./wg interface: wg0
peer: xK7O/YnTb8W/fgPA4dwAshEV06rMPMqqmy3zZN0NPS4= allowed ips: 192.168.0.3/32 jordan@t14:~/code/wireguard-tools/src$ sudo ./wg set wg0 peer xK7O/YnTb8W/fgPA4dwAshEV06rMPMqqmy3zZN0NPS4= allowed-ips-patch -192.168.0.3/32 Unable to modify interface: Invalid argument jordan@t14:~/code/wireguard-tools/src$
The second command fails with "Invalid argument" (EINVAL) on the unpatched kernel. This simplifies things, as there's no need for clients to explicitly probe to see if this attribute is supported. I will do the following:
1. Revert WG_GENL_VERSION back to 1. 2. Add a check for new flags similar to the one you mentioned for WGPEER_A_FLAGS.
if (attrs[WGPEER_A_FLAGS]) flags = nla_get_u32(attrs[WGPEER_A_FLAGS]); ret = -EOPNOTSUPP; if (flags & ~__WGPEER_F_ALL) goto out;
This should be sufficient. We might want to consider how best to bubble this error up to users. In the case of wg, "Invalid argument" may not be very helpful in determining where you went wrong. We could always detect when EINVAL is returned in response to an operation that sets WGALLOWEDIP_A_FLAGS and print something more helpful like "Operation not supported on this kernel". However, these are details that can be worked out.
Sorry for the confusion!
-Jordan
On Wed, 27 Nov 2024 23:21:33 +0000 Jordan Rife wrote:
The second command fails with "Invalid argument" (EINVAL) on the unpatched kernel. This simplifies things, as there's no need for clients to explicitly probe to see if this attribute is supported. I will do the following:
Revert WG_GENL_VERSION back to 1.
Add a check for new flags similar to the one you mentioned for WGPEER_A_FLAGS.
if (attrs[WGPEER_A_FLAGS]) flags = nla_get_u32(attrs[WGPEER_A_FLAGS]); ret = -EOPNOTSUPP; if (flags & ~__WGPEER_F_ALL) goto out;
This should be sufficient. We might want to consider how best to bubble this error up to users. In the case of wg, "Invalid argument" may not be very helpful in determining where you went wrong. We could always detect when EINVAL is returned in response to an operation that sets WGALLOWEDIP_A_FLAGS and print something more helpful like "Operation not supported on this kernel". However, these are details that can be worked out.
Better still use NLA_POLICY_MASK() so that nla_parse_nested() can perform the validation and attach a machine readable info about the failure.
Better still use NLA_POLICY_MASK() so that nla_parse_nested() can perform the validation and attach a machine readable info about the failure.
This is definitely cleaner for the new WGALLOWEDIP_A_FLAGS parameter. Thanks for the suggestion.
Applying this to WGPEER_A_FLAGS would simplify the existing validation logic as well, although I think it changes the error code returned if a user provides an invalid flag from EOPNOTSUPP to EINVAL. I'm not sure if there's anything relying on this behavior. I'll let Jason make the final call there.
-Jordan
linux-kselftest-mirror@lists.linaro.org