This is similar to TCP-MD5 in functionality, but it is different enough that the wire formats are incompatible. Compared to TCP-MD5, more algorithms are supported and multiple keys can be used on the same connection. There is still no negotiation mechanism.
The expected use case is protecting long-duration BGP/LDP connections between routers with pre-shared keys. The goal of this series is to let routers that use the Linux TCP stack interoperate with vendors such as Cisco and Juniper.
Both algorithms described in RFC5926 are implemented, but the code is not easily extensible beyond them. In particular, several code paths make stack allocations sized for the RFC5926 maximum, and those would have to be increased.
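The per-algorithm sizes that bound those stack buffers can be tabulated. This is an illustrative sketch only. The struct and function names are hypothetical and are not part of this series. The sizes come from RFC5926: a 20-byte HMAC-SHA-1 traffic key, a 16-byte AES-128 traffic key, and both MACs truncated to 96 bits. The 4-byte option header comes from RFC5925. The crypto API names are plausible arguments to crypto_alloc_shash(), but whether the series uses exactly these strings is an assumption.

```c
#include <stddef.h>

/* Hypothetical descriptor table; sizes are taken from RFC5926. */
struct ao_alg_desc {
	int alg;                /* TCP_AUTHOPT_ALG_* value from the uapi header */
	const char *shash_name; /* assumed Linux crypto API algorithm name */
	size_t traffic_key_len; /* KDF output length in bytes */
	size_t mac_len;         /* truncated MAC length in bytes */
};

static const struct ao_alg_desc ao_algs[] = {
	{ 1, "hmac(sha1)", 20, 12 }, /* TCP_AUTHOPT_ALG_HMAC_SHA_1_96 */
	{ 2, "cmac(aes)",  16, 12 }, /* TCP_AUTHOPT_ALG_AES_128_CMAC_96 */
};

#define AO_NUM_ALGS (sizeof(ao_algs) / sizeof(ao_algs[0]))

/* RFC5925 option layout: Kind, Length, KeyID, RNextKeyID, then the MAC. */
#define AO_OPT_HDR_LEN 4

static const struct ao_alg_desc *ao_alg_find(int alg)
{
	for (size_t i = 0; i < AO_NUM_ALGS; i++)
		if (ao_algs[i].alg == alg)
			return &ao_algs[i];
	return NULL;
}

/* Largest traffic key any supported algorithm needs: the stack buffer bound. */
static size_t ao_max_traffic_key_len(void)
{
	size_t max = 0;

	for (size_t i = 0; i < AO_NUM_ALGS; i++)
		if (ao_algs[i].traffic_key_len > max)
			max = ao_algs[i].traffic_key_len;
	return max;
}

/* Total on-wire option length for a given algorithm. */
static size_t ao_option_len(const struct ao_alg_desc *d)
{
	return AO_OPT_HDR_LEN + d->mac_len;
}
```

Adding an algorithm with a longer key or MAC would raise these maxima. That is why fixed-size buffers sized from RFC5926 would have to grow.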
This version implements SNE and l3mdev awareness and adds more tests. Here are some known flaws and limitations:
* Interaction with TCP-MD5 is not tested in all corners.
* Interaction with FASTOPEN is not tested and is unlikely to work because of sequence number assumptions for SYN/ACK.
* It is not clear whether crypto_shash_setkey might sleep. If some implementations do, they could perhaps be excluded through alloc flags.
* The traffic key is not cached, which reduces performance.
* The user is responsible for ensuring that keys do not overlap.
* There is no useful way to list keys, which makes userspace debugging difficult.
* There is no prefixlen support equivalent to md5. Some complex FRR configs use it.
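Overlap is not enforced, so a userspace tool could check its own configuration before calling setsockopt. This is a minimal sketch. It assumes the rule to enforce is that, among keys applying to the same connection, no two share a send_id and no two share a recv_id. The struct and function names are hypothetical, and address binding is ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace view of a configured key; only the IDs matter here. */
struct ao_key_ids {
	uint8_t send_id;
	uint8_t recv_id;
};

/* Return true if any two keys share a send_id or share a recv_id. */
static bool ao_keys_overlap(const struct ao_key_ids *keys, size_t n)
{
	for (size_t i = 0; i < n; i++)
		for (size_t j = i + 1; j < n; j++)
			if (keys[i].send_id == keys[j].send_id ||
			    keys[i].recv_id == keys[j].recv_id)
				return true;
	return false;
}
```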
A test suite was added to tools/testing/selftests/tcp_authopt. The tests are written in Python using pytest and scapy. They check the API in some detail and validate packet captures. Python is already used in Linux and in kselftests, but virtualenvs are rarely used. This test suite uses `pip` to create a private virtualenv and hide its dependencies.
By raw line count, the tests form the bulk of the series. Because there is a lot of code, it was mostly split by functional area, so most files are touched by only a single commit. Many of these tests are also relevant to TCP-MD5, so perhaps it would help to split them into a separate series?
Some testing support is included in nettest and fcnal-test.sh, similar to the current level of tcp-md5 testing.
SNE was tested by creating connections in a loop until a large SEQ was randomly selected, and then making it roll over. The "connect in a loop" step ran into timewait overflow and connection failures on port reuse. After spending some time on this issue, my conclusion is that AO makes it impossible to kill remainders of old connections the way one can for unsigned or md5sig connections, because signatures depend on ISNs. If a timewait socket is closed improperly, the information required to RST the peer is lost.
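The receive-side SNE computation that such a rollover test exercises can be sketched as follows. This is not the code from this series. It is a minimal illustration that assumes the socket tracks a reference (SNE, sequence number) pair that advances with the connection. It uses the same modulo-2^32 comparison as the kernel's before() helper. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same idea as the kernel's before(): true if seq1 precedes seq2 modulo 2^32. */
static bool seq_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/*
 * Hypothetical helper: given a reference pair (ref_sne, ref_seq), return the
 * SNE to use for a segment carrying sequence number seq.
 */
static uint32_t sne_for_seq(uint32_t ref_sne, uint32_t ref_seq, uint32_t seq)
{
	if (seq_before(seq, ref_seq)) {
		/* Older segment but numerically larger: it predates the last wrap. */
		if (seq > ref_seq)
			return ref_sne - 1;
	} else {
		/* Newer segment but numerically smaller: the sequence space wrapped. */
		if (seq < ref_seq)
			return ref_sne + 1;
	}
	return ref_sne;
}
```

A sequence number just past the wrap (0x10 after a reference of 0xFFFFFF00) yields SNE + 1. A retransmission from just before the wrap yields SNE - 1. A rollover test needs to hit both cases.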
The fact that AO completely breaks all connection-less RSTs is acknowledged in the RFC and the workaround of "respect timewait" seems acceptable.
Changes for frr (old): https://github.com/FRRouting/frr/pull/9442 That PR was made early for ABI feedback and has many issues.
Changes for yabgp (old): https://github.com/cdleonard/yabgp/commits/tcp_authopt This can be used for easy interoperability testing with Cisco/Juniper/etc.
Changes since PATCH v1:
* Implement Sequence Number Extension
* Implement l3index for vrf: TCP_AUTHOPT_KEY_IFINDEX as equivalent of TCP_MD5SIG_FLAG_IFINDEX
* Expand TCP-AO tests in fcnal-test.sh to near-parity with md5.
* Show addr/port on failure similar to md5
* Remove tox dependency from test suite (create venv directly)
* Switch default pytest output format to TAP (kselftest standard)
* Fix _copy_from_sockptr_tolerant stack corruption on short sockopts. This was covered in test but error was invisible without STACKPROTECTOR=y
* Fix sysctl_tcp_authopt check in tcp_get_authopt_val before memset. This was harmless because error code is checked in getsockopt anyway.
* Fix dropping md5 packets on all sockets with AO enabled
* Fix checking (key->recv_id & TCP_AUTHOPT_KEY_ADDR_BIND) instead of key->flags in tcp_authopt_key_match_exact
* Fix PATCH 1/19 not compiling due to missing "int err" declaration
* Add ratelimited message for AO and MD5 both present
* Export all symbols required by CONFIG_IPV6=m (again)
* Fix compilation with CONFIG_TCP_AUTHOPT=y CONFIG_TCP_MD5SIG=n
* Fix checkpatch issues
* Pass -rrequirements.txt to tox to avoid dependency variation.

Link: https://lore.kernel.org/netdev/cover.1632240523.git.cdleonard@gmail.com/
Changes since RFCv3:
* Implement TCP_AUTHOPT handling for timewait and reset replies. Write tests to execute these paths by injecting packets with scapy
* Handle combining md5 and authopt: if both are configured use authopt.
* Fix locking issues around send_key, introduced in one of the later patches.
* Handle IPv4-mapped-IPv6 addresses: it used to be that an ipv4 SYN sent to an ipv6 socket with TCP-AO triggered WARN
* Implement an un-namespaced sysctl that disables this feature by default
* Allocate new key before removing any old one in setsockopt (Dmitry)
* Remove tcp_authopt_key_info.local_id because it's no longer used (Dmitry)
* Propagate errors from TCP_AUTHOPT getsockopt (Dmitry)
* Fix no-longer-correct TCP_AUTHOPT_KEY_DEL docs (Dmitry)
* Simplify crypto allocation (Eric)
* Use kzalloc instead of __GFP_ZERO (Eric)
* Add static_key_false tcp_authopt_needed (Eric)
* Clear authopt_info copied from oldsk in __tcp_authopt_openreq (Eric)
* Replace memcmp in ipv4 and ipv6 addr comparisons (Eric)
* Export symbols for CONFIG_IPV6=m (kernel test robot)
* Mark more functions static (kernel test robot)
* Fix build with CONFIG_PROVE_RCU_LIST=y (kernel test robot)

Link: https://lore.kernel.org/netdev/cover.1629840814.git.cdleonard@gmail.com/
Changes since RFCv2:
* Removed local_id from ABI and match on send_id/recv_id/addr
* Add all relevant out-of-tree tests to tools/testing/selftests
* Return an error instead of ignoring unknown flags, hopefully this makes it easier to extend.
* Check sk_family before __tcp_authopt_info_get_or_create in tcp_set_authopt_key
* Use sock_owned_by_me instead of WARN_ON(!lockdep_sock_is_held(sk))
* Fix some intermediate build failures reported by kbuild robot
* Improve documentation

Link: https://lore.kernel.org/netdev/cover.1628544649.git.cdleonard@gmail.com/
Changes since RFC:
* Split into per-topic commits for ease of review. The intermediate commits compile with a few "unused function" warnings and don't do anything useful by themselves.
* Add ABI documentation including kernel-doc on uapi
* Fix lockdep warnings from crypto by creating pools with one shash for each cpu
* Accept short options to setsockopt by padding with zeros; this approach allows increasing the size of the structs in the future.
* Support for aes-128-cmac-96
* Support for binding addresses to keys in a way similar to old tcp_md5
* Add support for retrieving received keyid/rnextkeyid and controlling the keyid/rnextkeyid being sent.

Link: https://lore.kernel.org/netdev/01383a8751e97ef826ef2adf93bfde3a08195a43.1626...
Leonard Crestez (25):
  tcp: authopt: Initial support and key management
  docs: Add user documentation for tcp_authopt
  selftests: Initial tcp_authopt test module
  selftests: tcp_authopt: Initial sockopt manipulation
  tcp: authopt: Add crypto initialization
  tcp: authopt: Compute packet signatures
  tcp: Use BIT() for OPTION_* constants
  tcp: authopt: Hook into tcp core
  tcp: authopt: Disable via sysctl by default
  selftests: tcp_authopt: Test key address binding
  tcp: authopt: Implement Sequence Number Extension
  tcp: ipv6: Add AO signing for tcp_v6_send_response
  tcp: authopt: Add support for signing skb-less replies
  tcp: ipv4: Add AO signing for skb-less replies
  selftests: tcp_authopt: Implement SNE in python
  selftests: tcp_authopt: Add scapy-based packet signing code
  selftests: tcp_authopt: Add packet-level tests
  selftests: tcp_authopt: Initial sne test
  tcp: authopt: Add key selection controls
  selftests: tcp_authopt: Add tests for rollover
  tcp: authopt: Add initial l3index support
  selftests: tcp_authopt: Initial tests for l3mdev handling
  selftests: nettest: Rename md5_prefix to key_addr_prefix
  selftests: nettest: Initial tcp_authopt support
  selftests: net/fcnal: Initial tcp_authopt support
 Documentation/networking/index.rst | 1 +
 Documentation/networking/ip-sysctl.rst | 6 +
 Documentation/networking/tcp_authopt.rst | 69 +
 include/linux/tcp.h | 9 +
 include/net/tcp.h | 1 +
 include/net/tcp_authopt.h | 271 +++
 include/uapi/linux/snmp.h | 1 +
 include/uapi/linux/tcp.h | 123 ++
 net/ipv4/Kconfig | 14 +
 net/ipv4/Makefile | 1 +
 net/ipv4/proc.c | 1 +
 net/ipv4/sysctl_net_ipv4.c | 10 +
 net/ipv4/tcp.c | 30 +
 net/ipv4/tcp_authopt.c | 1617 +++++++++++++++++
 net/ipv4/tcp_input.c | 18 +
 net/ipv4/tcp_ipv4.c | 104 +-
 net/ipv4/tcp_minisocks.c | 12 +
 net/ipv4/tcp_output.c | 100 +-
 net/ipv6/tcp_ipv6.c | 60 +-
 tools/testing/selftests/net/fcnal-test.sh | 249 +++
 tools/testing/selftests/net/nettest.c | 123 +-
 tools/testing/selftests/tcp_authopt/Makefile | 10 +
 .../testing/selftests/tcp_authopt/README.rst | 18 +
 tools/testing/selftests/tcp_authopt/config | 6 +
 .../selftests/tcp_authopt/requirements.txt | 46 +
 tools/testing/selftests/tcp_authopt/run.sh | 31 +
 tools/testing/selftests/tcp_authopt/settings | 1 +
 tools/testing/selftests/tcp_authopt/setup.cfg | 35 +
 tools/testing/selftests/tcp_authopt/setup.py | 6 +
 .../tcp_authopt/tcp_authopt_test/__init__.py | 0
 .../tcp_authopt/tcp_authopt_test/conftest.py | 71 +
 .../full_tcp_sniff_session.py | 91 +
 .../tcp_authopt_test/linux_tcp_authopt.py | 285 +++
 .../tcp_authopt_test/linux_tcp_md5sig.py | 110 ++
 .../tcp_authopt_test/linux_tcp_repair.py | 67 +
 .../tcp_authopt_test/netns_fixture.py | 85 +
 .../tcp_authopt_test/scapy_conntrack.py | 173 ++
 .../tcp_authopt_test/scapy_tcp_authopt.py | 220 +++
 .../tcp_authopt_test/scapy_utils.py | 177 ++
 .../tcp_authopt/tcp_authopt_test/server.py | 124 ++
 .../tcp_authopt/tcp_authopt_test/sne_alg.py | 111 ++
 .../tcp_authopt/tcp_authopt_test/sockaddr.py | 122 ++
 .../tcp_connection_fixture.py | 276 +++
 .../tcp_authopt/tcp_authopt_test/test_bind.py | 155 ++
 .../tcp_authopt_test/test_rollover.py | 181 ++
 .../tcp_authopt/tcp_authopt_test/test_sne.py | 202 ++
 .../tcp_authopt_test/test_sne_alg.py | 96 +
 .../tcp_authopt_test/test_sockopt.py | 203 +++
 .../tcp_authopt_test/test_vectors.py | 365 ++++
 .../tcp_authopt_test/test_verify_capture.py | 559 ++++++
 .../tcp_authopt_test/test_vrf_bind.py | 492 +++++
 .../tcp_authopt/tcp_authopt_test/utils.py | 114 ++
 .../tcp_authopt/tcp_authopt_test/validator.py | 138 ++
 .../tcp_authopt_test/vrf_netns_fixture.py | 127 ++
 54 files changed, 7471 insertions(+), 46 deletions(-)
 create mode 100644 Documentation/networking/tcp_authopt.rst
 create mode 100644 include/net/tcp_authopt.h
 create mode 100644 net/ipv4/tcp_authopt.c
 create mode 100644 tools/testing/selftests/tcp_authopt/Makefile
 create mode 100644 tools/testing/selftests/tcp_authopt/README.rst
 create mode 100644 tools/testing/selftests/tcp_authopt/config
 create mode 100644 tools/testing/selftests/tcp_authopt/requirements.txt
 create mode 100755 tools/testing/selftests/tcp_authopt/run.sh
 create mode 100644 tools/testing/selftests/tcp_authopt/settings
 create mode 100644 tools/testing/selftests/tcp_authopt/setup.cfg
 create mode 100644 tools/testing/selftests/tcp_authopt/setup.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/__init__.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/conftest.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/full_tcp_sniff_session.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_authopt.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_md5sig.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_repair.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/netns_fixture.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/scapy_conntrack.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/scapy_tcp_authopt.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/scapy_utils.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/server.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/sne_alg.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/sockaddr.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/tcp_connection_fixture.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_bind.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_rollover.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_sne.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_sne_alg.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_sockopt.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_vectors.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_verify_capture.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_vrf_bind.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/utils.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/validator.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/vrf_netns_fixture.py
base-commit: d4a07dc5ac34528f292a4f328cf3c65aba312e1b
This commit adds support for adding and removing keys but does not use them yet.
As with TCP-MD5, a single pointer to a struct tcp_authopt_info is added to struct tcp_sock, which avoids increasing memory usage for regular sockets. The data structures related to tcp_authopt are initialized on setsockopt and freed only on socket close.
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 include/linux/tcp.h | 9 ++
 include/net/tcp.h | 1 +
 include/net/tcp_authopt.h | 76 +++++++++++
 include/uapi/linux/tcp.h | 81 ++++++++++++
 net/ipv4/Kconfig | 14 ++
 net/ipv4/Makefile | 1 +
 net/ipv4/tcp.c | 30 +++++
 net/ipv4/tcp_authopt.c | 263 ++++++++++++++++++++++++++++++++++++++
 net/ipv4/tcp_ipv4.c | 2 +
 9 files changed, 477 insertions(+)
 create mode 100644 include/net/tcp_authopt.h
 create mode 100644 net/ipv4/tcp_authopt.c
diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 48d8a363319e..50038f35ba51 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -140,10 +140,12 @@ struct tcp_request_sock { static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req) { return (struct tcp_request_sock *)req; }
+struct tcp_authopt_info; + struct tcp_sock { /* inet_connection_sock has to be the first member of tcp_sock */ struct inet_connection_sock inet_conn; u16 tcp_header_len; /* Bytes of tcp header to send */ u16 gso_segs; /* Max number of segs per GSO packet */ @@ -403,10 +405,14 @@ struct tcp_sock {
/* TCP MD5 Signature Option information */ struct tcp_md5sig_info __rcu *md5sig_info; #endif
+#ifdef CONFIG_TCP_AUTHOPT + struct tcp_authopt_info __rcu *authopt_info; +#endif + /* TCP fastopen related information */ struct tcp_fastopen_request *fastopen_req; /* fastopen_rsk points to request_sock that resulted in this big * socket. Used to retransmit SYNACKs etc. */ @@ -453,10 +459,13 @@ struct tcp_timewait_sock { int tw_ts_recent_stamp; u32 tw_tx_delay; #ifdef CONFIG_TCP_MD5SIG struct tcp_md5sig_key *tw_md5_key; #endif +#ifdef CONFIG_TCP_AUTHOPT + struct tcp_authopt_info *tw_authopt_info; +#endif };
static inline struct tcp_timewait_sock *tcp_twsk(const struct sock *sk) { return (struct tcp_timewait_sock *)sk; diff --git a/include/net/tcp.h b/include/net/tcp.h index 8e8c5922a7b0..620bd2c9250b 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -184,10 +184,11 @@ void tcp_time_wait(struct sock *sk, int state, int timeo); #define TCPOPT_WINDOW 3 /* Window scaling */ #define TCPOPT_SACK_PERM 4 /* SACK Permitted */ #define TCPOPT_SACK 5 /* SACK Block */ #define TCPOPT_TIMESTAMP 8 /* Better RTT estimations/PAWS */ #define TCPOPT_MD5SIG 19 /* MD5 Signature (RFC2385) */ +#define TCPOPT_AUTHOPT 29 /* Auth Option (RFC5925) */ #define TCPOPT_MPTCP 30 /* Multipath TCP (RFC6824) */ #define TCPOPT_FASTOPEN 34 /* Fast open (RFC7413) */ #define TCPOPT_EXP 254 /* Experimental */ /* Magic number to be after the option value for sharing TCP * experimental options. See draft-ietf-tcpm-experimental-options-00.txt diff --git a/include/net/tcp_authopt.h b/include/net/tcp_authopt.h new file mode 100644 index 000000000000..42ad764e98c2 --- /dev/null +++ b/include/net/tcp_authopt.h @@ -0,0 +1,76 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _LINUX_TCP_AUTHOPT_H +#define _LINUX_TCP_AUTHOPT_H + +#include <uapi/linux/tcp.h> + +/** + * struct tcp_authopt_key_info - Representation of a Master Key Tuple as per RFC5925 + * + * Key structure lifetime is only protected by RCU so readers needs to hold a + * single rcu_read_lock until they're done with the key. 
+ */ +struct tcp_authopt_key_info { + /** @node: node in &tcp_authopt_info.head list */ + struct hlist_node node; + /** @rcu: for kfree_rcu */ + struct rcu_head rcu; + /** @flags: Combination of &enum tcp_authopt_key_flag */ + u32 flags; + /** @send_id: Same as &tcp_authopt_key.send_id */ + u8 send_id; + /** @recv_id: Same as &tcp_authopt_key.recv_id */ + u8 recv_id; + /** @alg_id: Same as &tcp_authopt_key.alg */ + u8 alg_id; + /** @keylen: Same as &tcp_authopt_key.keylen */ + u8 keylen; + /** @key: Same as &tcp_authopt_key.key */ + u8 key[TCP_AUTHOPT_MAXKEYLEN]; + /** @addr: Same as &tcp_authopt_key.addr */ + struct sockaddr_storage addr; +}; + +/** + * struct tcp_authopt_info - Per-socket information regarding tcp_authopt + * + * This is lazy-initialized in order to avoid increasing memory usage for + * regular TCP sockets. Once created it is only destroyed on socket close. + */ +struct tcp_authopt_info { + /** @head: List of tcp_authopt_key_info */ + struct hlist_head head; + /** @rcu: for kfree_rcu */ + struct rcu_head rcu; + /** @flags: Combination of &enum tcp_authopt_key_flag */ + u32 flags; + /** @src_isn: Local Initial Sequence Number */ + u32 src_isn; + /** @dst_isn: Remote Initial Sequence Number */ + u32 dst_isn; +}; + +#ifdef CONFIG_TCP_AUTHOPT +void tcp_authopt_clear(struct sock *sk); +int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen); +int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *key); +int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen); +#else +static inline int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen) +{ + return -ENOPROTOOPT; +} +static inline int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *key) +{ + return -ENOPROTOOPT; +} +static inline void tcp_authopt_clear(struct sock *sk) +{ +} +static inline int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) +{ + return -ENOPROTOOPT; +} +#endif + +#endif /* 
_LINUX_TCP_AUTHOPT_H */ diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h index 8fc09e8638b3..76d7be6b27f4 100644 --- a/include/uapi/linux/tcp.h +++ b/include/uapi/linux/tcp.h @@ -126,10 +126,12 @@ enum { #define TCP_INQ 36 /* Notify bytes available to read as a cmsg on read */
#define TCP_CM_INQ TCP_INQ
#define TCP_TX_DELAY 37 /* delay outgoing packets by XX usec */ +#define TCP_AUTHOPT 38 /* TCP Authentication Option (RFC5925) */ +#define TCP_AUTHOPT_KEY 39 /* TCP Authentication Option Key (RFC5925) */
#define TCP_REPAIR_ON 1 #define TCP_REPAIR_OFF 0 #define TCP_REPAIR_OFF_NO_WP -1 /* Turn off without window probes */ @@ -340,10 +342,89 @@ struct tcp_diag_md5sig { __u16 tcpm_keylen; __be32 tcpm_addr[4]; __u8 tcpm_key[TCP_MD5SIG_MAXKEYLEN]; };
+/** + * enum tcp_authopt_flag - flags for `tcp_authopt.flags` + */ +enum tcp_authopt_flag { + /** + * @TCP_AUTHOPT_FLAG_REJECT_UNEXPECTED: + * Configure behavior of segments with TCP-AO coming from hosts for which no + * key is configured. The default recommended by RFC is to silently accept + * such connections. + */ + TCP_AUTHOPT_FLAG_REJECT_UNEXPECTED = (1 << 2), +}; + +/** + * struct tcp_authopt - Per-socket options related to TCP Authentication Option + */ +struct tcp_authopt { + /** @flags: Combination of &enum tcp_authopt_flag */ + __u32 flags; +}; + +/** + * enum tcp_authopt_key_flag - flags for `tcp_authopt.flags` + * + * @TCP_AUTHOPT_KEY_DEL: Delete the key and ignore non-id fields + * @TCP_AUTHOPT_KEY_EXCLUDE_OPTS: Exclude TCP options from signature + * @TCP_AUTHOPT_KEY_ADDR_BIND: Key only valid for `tcp_authopt.addr` + */ +enum tcp_authopt_key_flag { + TCP_AUTHOPT_KEY_DEL = (1 << 0), + TCP_AUTHOPT_KEY_EXCLUDE_OPTS = (1 << 1), + TCP_AUTHOPT_KEY_ADDR_BIND = (1 << 2), +}; + +/** + * enum tcp_authopt_alg - Algorithms for TCP Authentication Option + */ +enum tcp_authopt_alg { + /** @TCP_AUTHOPT_ALG_HMAC_SHA_1_96: HMAC-SHA-1-96 as described in RFC5926 */ + TCP_AUTHOPT_ALG_HMAC_SHA_1_96 = 1, + /** @TCP_AUTHOPT_ALG_AES_128_CMAC_96: AES-128-CMAC-96 as described in RFC5926 */ + TCP_AUTHOPT_ALG_AES_128_CMAC_96 = 2, +}; + +/* for TCP_AUTHOPT_KEY socket option */ +#define TCP_AUTHOPT_MAXKEYLEN 80 + +/** + * struct tcp_authopt_key - TCP Authentication KEY + * + * Key are identified by the combination of: + * - send_id + * - recv_id + * - addr (iff TCP_AUTHOPT_KEY_ADDR_BIND) + * + * RFC5925 requires that key ids must not overlap for the same TCP connection. + * This is not enforced by linux. 
+ */ +struct tcp_authopt_key { + /** @flags: Combination of &enum tcp_authopt_key_flag */ + __u32 flags; + /** @send_id: keyid value for send */ + __u8 send_id; + /** @recv_id: keyid value for receive */ + __u8 recv_id; + /** @alg: One of &enum tcp_authopt_alg */ + __u8 alg; + /** @keylen: Length of the key buffer */ + __u8 keylen; + /** @key: Secret key */ + __u8 key[TCP_AUTHOPT_MAXKEYLEN]; + /** + * @addr: Key is only valid for this address + * + * Ignored unless TCP_AUTHOPT_KEY_ADDR_BIND flag is set + */ + struct __kernel_sockaddr_storage addr; +}; + /* setsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) */
#define TCP_RECEIVE_ZEROCOPY_FLAG_TLB_CLEAN_HINT 0x1 struct tcp_zerocopy_receive { __u64 address; /* in: address of mapping */ diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig index 87983e70f03f..6459f4ea6f1d 100644 --- a/net/ipv4/Kconfig +++ b/net/ipv4/Kconfig @@ -740,5 +740,19 @@ config TCP_MD5SIG RFC2385 specifies a method of giving MD5 protection to TCP sessions. Its main (only?) use is to protect BGP sessions between core routers on the Internet.
If unsure, say N. + +config TCP_AUTHOPT + bool "TCP: Authentication Option support (RFC5925)" + select CRYPTO + select CRYPTO_SHA1 + select CRYPTO_HMAC + select CRYPTO_AES + select CRYPTO_CMAC + help + RFC5925 specifies a new method of giving protection to TCP sessions. + Its intended use is to protect BGP sessions between core routers + on the Internet. It obsoletes TCP MD5 (RFC2385) but is incompatible. + + If unsure, say N. diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile index bbdd9c44f14e..d336f32ce177 100644 --- a/net/ipv4/Makefile +++ b/net/ipv4/Makefile @@ -59,10 +59,11 @@ obj-$(CONFIG_TCP_CONG_NV) += tcp_nv.o obj-$(CONFIG_TCP_CONG_VENO) += tcp_veno.o obj-$(CONFIG_TCP_CONG_SCALABLE) += tcp_scalable.o obj-$(CONFIG_TCP_CONG_LP) += tcp_lp.o obj-$(CONFIG_TCP_CONG_YEAH) += tcp_yeah.o obj-$(CONFIG_TCP_CONG_ILLINOIS) += tcp_illinois.o +obj-$(CONFIG_TCP_AUTHOPT) += tcp_authopt.o obj-$(CONFIG_NET_SOCK_MSG) += tcp_bpf.o obj-$(CONFIG_BPF_SYSCALL) += udp_bpf.o obj-$(CONFIG_NETLABEL) += cipso_ipv4.o
obj-$(CONFIG_XFRM) += xfrm4_policy.o xfrm4_state.o xfrm4_input.o \ diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index a7b1138d619c..d303fb84802a 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -271,10 +271,11 @@
#include <net/icmp.h> #include <net/inet_common.h> #include <net/tcp.h> #include <net/mptcp.h> +#include <net/tcp_authopt.h> #include <net/xfrm.h> #include <net/ip.h> #include <net/sock.h>
#include <linux/uaccess.h> @@ -3552,10 +3553,16 @@ static int do_tcp_setsockopt(struct sock *sk, int level, int optname, case TCP_MD5SIG: case TCP_MD5SIG_EXT: err = tp->af_specific->md5_parse(sk, optname, optval, optlen); break; #endif + case TCP_AUTHOPT: + err = tcp_set_authopt(sk, optval, optlen); + break; + case TCP_AUTHOPT_KEY: + err = tcp_set_authopt_key(sk, optval, optlen); + break; case TCP_USER_TIMEOUT: /* Cap the max time in ms TCP will retry or probe the window * before giving up and aborting (ETIMEDOUT) a connection. */ if (val < 0) @@ -4198,10 +4205,33 @@ static int do_tcp_getsockopt(struct sock *sk, int level, if (!err && copy_to_user(optval, &zc, len)) err = -EFAULT; return err; } #endif +#ifdef CONFIG_TCP_AUTHOPT + case TCP_AUTHOPT: { + struct tcp_authopt info; + int err; + + if (get_user(len, optlen)) + return -EFAULT; + + lock_sock(sk); + err = tcp_get_authopt_val(sk, &info); + release_sock(sk); + + if (err) + return err; + len = min_t(unsigned int, len, sizeof(info)); + if (put_user(len, optlen)) + return -EFAULT; + if (copy_to_user(optval, &info, len)) + return -EFAULT; + return 0; + } +#endif + default: return -ENOPROTOOPT; }
if (put_user(len, optlen)) diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c new file mode 100644 index 000000000000..c412a712f229 --- /dev/null +++ b/net/ipv4/tcp_authopt.c @@ -0,0 +1,263 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +#include <linux/kernel.h> +#include <net/tcp.h> +#include <net/tcp_authopt.h> +#include <crypto/hash.h> + +/* checks that ipv4 or ipv6 addr matches. */ +static bool ipvx_addr_match(struct sockaddr_storage *a1, + struct sockaddr_storage *a2) +{ + if (a1->ss_family != a2->ss_family) + return false; + if (a1->ss_family == AF_INET && + (((struct sockaddr_in *)a1)->sin_addr.s_addr != + ((struct sockaddr_in *)a2)->sin_addr.s_addr)) + return false; + if (a1->ss_family == AF_INET6 && + !ipv6_addr_equal(&((struct sockaddr_in6 *)a1)->sin6_addr, + &((struct sockaddr_in6 *)a2)->sin6_addr)) + return false; + return true; +} + +static bool tcp_authopt_key_match_exact(struct tcp_authopt_key_info *info, + struct tcp_authopt_key *key) +{ + if (info->send_id != key->send_id) + return false; + if (info->recv_id != key->recv_id) + return false; + if ((info->flags & TCP_AUTHOPT_KEY_ADDR_BIND) != (key->flags & TCP_AUTHOPT_KEY_ADDR_BIND)) + return false; + if (info->flags & TCP_AUTHOPT_KEY_ADDR_BIND) + if (!ipvx_addr_match(&info->addr, &key->addr)) + return false; + + return true; +} + +static struct tcp_authopt_key_info *tcp_authopt_key_lookup_exact(const struct sock *sk, + struct tcp_authopt_info *info, + struct tcp_authopt_key *ukey) +{ + struct tcp_authopt_key_info *key_info; + + hlist_for_each_entry_rcu(key_info, &info->head, node, lockdep_sock_is_held(sk)) + if (tcp_authopt_key_match_exact(key_info, ukey)) + return key_info; + + return NULL; +} + +static struct tcp_authopt_info *__tcp_authopt_info_get_or_create(struct sock *sk) +{ + struct tcp_sock *tp = tcp_sk(sk); + struct tcp_authopt_info *info; + + info = rcu_dereference_check(tp->authopt_info, lockdep_sock_is_held(sk)); + if (info) + return info; + + info = kzalloc(sizeof(*info), 
GFP_KERNEL); + if (!info) + return ERR_PTR(-ENOMEM); + + sk_nocaps_add(sk, NETIF_F_GSO_MASK); + INIT_HLIST_HEAD(&info->head); + rcu_assign_pointer(tp->authopt_info, info); + + return info; +} + +#define TCP_AUTHOPT_KNOWN_FLAGS ( \ + TCP_AUTHOPT_FLAG_REJECT_UNEXPECTED) + +/* Like copy_from_sockopt except tolerate different optlen for compatibility reasons + * + * If the src is shorter then it's from an old userspace and the rest of dst is + * filled with zeros. + * + * If the dst is shorter then src is from a newer userspace and we only accept + * if the rest of the option is all zeros. + * + * This allows sockopts to grow as long as for new fields zeros has no effect. + */ +static int _copy_from_sockptr_tolerant(u8 *dst, + unsigned int dstlen, + sockptr_t src, + unsigned int srclen) +{ + int err; + + /* If userspace optlen is too short fill the rest with zeros */ + if (srclen > dstlen) { + if (sockptr_is_kernel(src)) + return -EINVAL; + err = check_zeroed_user(src.user + dstlen, srclen - dstlen); + if (err < 0) + return err; + if (err == 0) + return -EINVAL; + } + err = copy_from_sockptr(dst, src, min(srclen, dstlen)); + if (err) + return err; + if (srclen < dstlen) + memset(dst + srclen, 0, dstlen - srclen); + + return err; +} + +int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen) +{ + struct tcp_authopt opt; + struct tcp_authopt_info *info; + int err; + + sock_owned_by_me(sk); + + err = _copy_from_sockptr_tolerant((u8 *)&opt, sizeof(opt), optval, optlen); + if (err) + return err; + + if (opt.flags & ~TCP_AUTHOPT_KNOWN_FLAGS) + return -EINVAL; + + info = __tcp_authopt_info_get_or_create(sk); + if (IS_ERR(info)) + return PTR_ERR(info); + + info->flags = opt.flags & TCP_AUTHOPT_KNOWN_FLAGS; + + return 0; +} + +int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *opt) +{ + struct tcp_sock *tp = tcp_sk(sk); + struct tcp_authopt_info *info; + + sock_owned_by_me(sk); + + memset(opt, 0, sizeof(*opt)); + info = 
rcu_dereference_check(tp->authopt_info, lockdep_sock_is_held(sk));
+	if (!info)
+		return -ENOENT;
+
+	opt->flags = info->flags & TCP_AUTHOPT_KNOWN_FLAGS;
+
+	return 0;
+}
+
+/* Free key nicely, for living sockets */
+static void tcp_authopt_key_del(struct sock *sk,
+				struct tcp_authopt_info *info,
+				struct tcp_authopt_key_info *key)
+{
+	sock_owned_by_me(sk);
+	hlist_del_rcu(&key->node);
+	atomic_sub(sizeof(*key), &sk->sk_omem_alloc);
+	kfree_rcu(key, rcu);
+}
+
+/* Free info and keys.
+ * Don't touch tp->authopt_info, it might not even be assigned yet.
+ */
+void tcp_authopt_free(struct sock *sk, struct tcp_authopt_info *info)
+{
+	struct hlist_node *n;
+	struct tcp_authopt_key_info *key;
+
+	hlist_for_each_entry_safe(key, n, &info->head, node) {
+		/* sk is NULL for timewait case
+		 * struct timewait_sock doesn't track sk_omem_alloc
+		 */
+		if (sk)
+			atomic_sub(sizeof(*key), &sk->sk_omem_alloc);
+		hlist_del_rcu(&key->node);
+		kfree_rcu(key, rcu);
+	}
+	kfree_rcu(info, rcu);
+}
+
+/* free everything and clear tcp_sock.authopt_info to NULL */
+void tcp_authopt_clear(struct sock *sk)
+{
+	struct tcp_authopt_info *info;
+
+	info = rcu_dereference_protected(tcp_sk(sk)->authopt_info, lockdep_sock_is_held(sk));
+	if (info) {
+		tcp_authopt_free(sk, info);
+		tcp_sk(sk)->authopt_info = NULL;
+	}
+}
+
+#define TCP_AUTHOPT_KEY_KNOWN_FLAGS ( \
+	TCP_AUTHOPT_KEY_DEL | \
+	TCP_AUTHOPT_KEY_EXCLUDE_OPTS | \
+	TCP_AUTHOPT_KEY_ADDR_BIND)
+
+int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen)
+{
+	struct tcp_authopt_key opt;
+	struct tcp_authopt_info *info;
+	struct tcp_authopt_key_info *key_info, *old_key_info;
+	int err;
+
+	sock_owned_by_me(sk);
+
+	err = _copy_from_sockptr_tolerant((u8 *)&opt, sizeof(opt), optval, optlen);
+	if (err)
+		return err;
+
+	if (opt.flags & ~TCP_AUTHOPT_KEY_KNOWN_FLAGS)
+		return -EINVAL;
+
+	if (opt.keylen > TCP_AUTHOPT_MAXKEYLEN)
+		return -EINVAL;
+
+	/* Delete is a special case: */
+	if (opt.flags & TCP_AUTHOPT_KEY_DEL) {
+		info = rcu_dereference_check(tcp_sk(sk)->authopt_info, lockdep_sock_is_held(sk));
+		if (!info)
+			return -ENOENT;
+		key_info = tcp_authopt_key_lookup_exact(sk, info, &opt);
+		if (!key_info)
+			return -ENOENT;
+		tcp_authopt_key_del(sk, info, key_info);
+		return 0;
+	}
+
+	/* check key family */
+	if (opt.flags & TCP_AUTHOPT_KEY_ADDR_BIND) {
+		if (sk->sk_family != opt.addr.ss_family)
+			return -EINVAL;
+	}
+
+	/* Initialize tcp_authopt_info if not already set */
+	info = __tcp_authopt_info_get_or_create(sk);
+	if (IS_ERR(info))
+		return PTR_ERR(info);
+
+	key_info = sock_kmalloc(sk, sizeof(*key_info), GFP_KERNEL | __GFP_ZERO);
+	if (!key_info)
+		return -ENOMEM;
+	/* If an old key exists with exact ID then remove and replace.
+	 * RCU-protected readers might observe both and pick any.
+	 */
+	old_key_info = tcp_authopt_key_lookup_exact(sk, info, &opt);
+	if (old_key_info)
+		tcp_authopt_key_del(sk, info, old_key_info);
+	key_info->flags = opt.flags & TCP_AUTHOPT_KEY_KNOWN_FLAGS;
+	key_info->send_id = opt.send_id;
+	key_info->recv_id = opt.recv_id;
+	key_info->alg_id = opt.alg;
+	key_info->keylen = opt.keylen;
+	memcpy(key_info->key, opt.key, opt.keylen);
+	memcpy(&key_info->addr, &opt.addr, sizeof(key_info->addr));
+	hlist_add_head_rcu(&key_info->node, &info->head);
+
+	return 0;
+}
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 13d868c43284..0c9f050fa0e8 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -60,10 +60,11 @@
 #include <net/net_namespace.h>
 #include <net/icmp.h>
 #include <net/inet_hashtables.h>
 #include <net/tcp.h>
+#include <net/tcp_authopt.h>
 #include <net/transp_v6.h>
 #include <net/ipv6.h>
 #include <net/inet_common.h>
 #include <net/timewait_sock.h>
 #include <net/xfrm.h>
@@ -2276,10 +2277,11 @@ void tcp_v4_destroy_sock(struct sock *sk)
 		tcp_clear_md5_list(sk);
 		kfree_rcu(rcu_dereference_protected(tp->md5sig_info, 1), rcu);
 		tp->md5sig_info = NULL;
 	}
 #endif
+	tcp_authopt_clear(sk);
 
 	/* Clean up a referenced TCP bind bucket. */
 	if (inet_csk(sk)->icsk_bind_hash)
 		inet_put_port(sk);
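The add/replace/delete rules of tcp_set_authopt_key() are easier to follow as a small userspace model. This is a Python sketch, not the kernel code. `KeyTable`, `Key` and `ident` are hypothetical names. "Exact match" is assumed to mean the same send_id, recv_id and bound address, because tcp_authopt_key_lookup_exact() is not shown in this excerpt. The address-family check, RCU and sk_omem_alloc accounting are left out.

```python
import errno
from dataclasses import dataclass
from typing import List, Optional

# Flag values mirror TCP_AUTHOPT_KEY_FLAG in the selftest wrapper
KEY_DEL = 1 << 0
KEY_EXCLUDE_OPTS = 1 << 1
KEY_ADDR_BIND = 1 << 2
KNOWN_FLAGS = KEY_DEL | KEY_EXCLUDE_OPTS | KEY_ADDR_BIND
MAXKEYLEN = 80  # TCP_AUTHOPT_MAXKEYLEN


@dataclass
class Key:
    flags: int
    send_id: int
    recv_id: int
    key: bytes = b""
    addr: Optional[str] = None  # only meaningful with KEY_ADDR_BIND


def ident(k: Key):
    # Assumed "exact match" identity: ids plus the bound address, if any
    bound = k.addr if k.flags & KEY_ADDR_BIND else None
    return (k.send_id, k.recv_id, bound)


class KeyTable:
    """Hypothetical userspace model of the per-socket key list."""

    def __init__(self) -> None:
        self.keys: List[Key] = []

    def lookup_exact(self, opt: Key) -> Optional[Key]:
        for k in self.keys:
            if ident(k) == ident(opt):
                return k
        return None

    def set_key(self, opt: Key) -> int:
        """Return 0 or a negative errno, like the setsockopt handler."""
        if opt.flags & ~KNOWN_FLAGS:
            return -errno.EINVAL
        if len(opt.key) > MAXKEYLEN:
            return -errno.EINVAL
        old = self.lookup_exact(opt)
        if opt.flags & KEY_DEL:
            # Delete is a special case: an exactly matching key must exist
            if old is None:
                return -errno.ENOENT
            self.keys = [k for k in self.keys if k is not old]
            return 0
        # Same identity: drop the old key, then add the new one at the head
        self.keys = [k for k in self.keys if k is not old]
        self.keys.insert(0, opt)
        return 0
```

In this model, adding a key with an existing identity silently replaces the old one. A key bound to an address is a separate entry from an unbound key with the same ids.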
On 11/1/21 10:34 AM, Leonard Crestez wrote:
diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c
new file mode 100644
index 000000000000..c412a712f229
--- /dev/null
+++ b/net/ipv4/tcp_authopt.c
@@ -0,0 +1,263 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <linux/kernel.h>
+#include <net/tcp.h>
+#include <net/tcp_authopt.h>
+#include <crypto/hash.h>
+
+/* checks that ipv4 or ipv6 addr matches. */
+static bool ipvx_addr_match(struct sockaddr_storage *a1,
+			    struct sockaddr_storage *a2)
+{
+	if (a1->ss_family != a2->ss_family)
+		return false;
+	if (a1->ss_family == AF_INET &&
+	    (((struct sockaddr_in *)a1)->sin_addr.s_addr !=
+	     ((struct sockaddr_in *)a2)->sin_addr.s_addr))
+		return false;
+	if (a1->ss_family == AF_INET6 &&
+	    !ipv6_addr_equal(&((struct sockaddr_in6 *)a1)->sin6_addr,
+			     &((struct sockaddr_in6 *)a2)->sin6_addr))
+		return false;
The above 2 could just be

	if (a1->ss_family == AF_INET)
		return (((struct sockaddr_in *)a1)->sin_addr.s_addr ==
			((struct sockaddr_in *)a2)->sin_addr.s_addr);
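Below is the same early-return-per-family shape as a small Python sketch over (family, address) pairs. It is an illustration only. `ipvx_addr_match` here is a hypothetical userspace analogue, not the kernel function. Returning False for an unknown family is an assumption, whereas the C version under review ends with `return true`.

```python
import socket
from typing import Tuple

# Hypothetical representation of an address: (address family, textual address)
Addr = Tuple[int, str]


def ipvx_addr_match(a1: Addr, a2: Addr) -> bool:
    """Return True if two IPv4/IPv6 addresses are equal (ports are not involved)."""
    fam1, host1 = a1
    fam2, host2 = a2
    if fam1 != fam2:
        return False
    if fam1 in (socket.AF_INET, socket.AF_INET6):
        # Compare packed binary forms, like comparing s_addr / in6_addr
        return socket.inet_pton(fam1, host1) == socket.inet_pton(fam2, host2)
    # Unknown family: report "no match" instead of falling through to True
    return False
```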
On 11/3/21 4:29 AM, David Ahern wrote:
[..]
The above 2 could just be

	if (a1->ss_family == AF_INET)
		return (((struct sockaddr_in *)a1)->sin_addr.s_addr ==
			((struct sockaddr_in *)a2)->sin_addr.s_addr);
OK. The function is a little odd in that its final "return true" is technically also reachable for an unexpected address family, but that case is prevented further up.
-- Regards, Leonard
Hi Leonard,
On 11/1/21 16:34, Leonard Crestez wrote: [..]
+struct tcp_authopt_key {
+	/** @flags: Combination of &enum tcp_authopt_key_flag */
+	__u32 flags;
+	/** @send_id: keyid value for send */
+	__u8 send_id;
+	/** @recv_id: keyid value for receive */
+	__u8 recv_id;
+	/** @alg: One of &enum tcp_authopt_alg */
+	__u8 alg;
+	/** @keylen: Length of the key buffer */
+	__u8 keylen;
+	/** @key: Secret key */
+	__u8 key[TCP_AUTHOPT_MAXKEYLEN];
+	/**
+	 * @addr: Key is only valid for this address
+	 *
+	 * Ignored unless TCP_AUTHOPT_KEY_ADDR_BIND flag is set
+	 */
+	struct __kernel_sockaddr_storage addr;
+};
[..]
+/* Free key nicely, for living sockets */
+static void tcp_authopt_key_del(struct sock *sk,
+				struct tcp_authopt_info *info,
+				struct tcp_authopt_key_info *key)
+{
+	sock_owned_by_me(sk);
+	hlist_del_rcu(&key->node);
+	atomic_sub(sizeof(*key), &sk->sk_omem_alloc);
+	kfree_rcu(key, rcu);
+}
[..]
+#define TCP_AUTHOPT_KEY_KNOWN_FLAGS ( \
+	TCP_AUTHOPT_KEY_DEL | \
+	TCP_AUTHOPT_KEY_EXCLUDE_OPTS | \
+	TCP_AUTHOPT_KEY_ADDR_BIND)
+
+int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen)
+{
[..]
+	/* Delete is a special case: */
+	if (opt.flags & TCP_AUTHOPT_KEY_DEL) {
+		info = rcu_dereference_check(tcp_sk(sk)->authopt_info, lockdep_sock_is_held(sk));
+		if (!info)
+			return -ENOENT;
+		key_info = tcp_authopt_key_lookup_exact(sk, info, &opt);
+		if (!key_info)
+			return -ENOENT;
+		tcp_authopt_key_del(sk, info, key_info);
+		return 0;
I remember we discussed in the RFC that removing a key that is currently in use may result in a random MKT being used.
I think it's possible to make this API a bit more predictable if:
- the DEL command fails to remove a key that is current/receive_next;
- opt.flags has CURR/NEXT flags with corresponding `u8 current_key` and `u8 receive_next` values. As the socket lock is held, that makes the current_key/receive_next change atomic with the deletion of an existing key that might have been in use.
As a result, the user can only remove a key that is not in use, or has to set a new current/next key at the same time. This avoids the issue of a random MKT being used to sign segments.
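Here is a minimal model of this proposal that makes the intended semantics concrete. Everything in it is hypothetical and chosen for illustration. That covers the `AoKeys` class, the single keyid space (send_id is assumed equal to recv_id) and the use of EBUSY. None of it is part of the patch set or any existing ABI.

```python
import errno
from typing import Optional, Set


class AoKeys:
    """Hypothetical model of the proposed DEL semantics (not an existing API)."""

    def __init__(self, ids: Set[int], current: int, rnext: int) -> None:
        self.ids = set(ids)
        self.current = current  # keyid used to sign outgoing segments
        self.rnext = rnext      # keyid advertised as RNextKeyID

    def delete(self, keyid: int, new_current: Optional[int] = None,
               new_rnext: Optional[int] = None) -> int:
        if keyid not in self.ids:
            return -errno.ENOENT
        cur = self.current if new_current is None else new_current
        nxt = self.rnext if new_rnext is None else new_rnext
        remaining = self.ids - {keyid}
        # Refuse to leave current/rnext pointing at a deleted or unknown key
        if cur not in remaining or nxt not in remaining:
            return -errno.EBUSY
        # With the socket lock held, the switch and the deletion happen together
        self.current, self.rnext = cur, nxt
        self.ids = remaining
        return 0
```

Under this scheme, deleting the active key requires naming its replacement in the same call.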
Thanks, Dmitry
On 11/5/21 3:22 AM, Dmitry Safonov wrote:
[..]
I remember we discussed in the RFC that removing a key that is currently in use may result in a random MKT being used.
I think it's possible to make this API a bit more predictable if:
- the DEL command fails to remove a key that is current/receive_next;
- opt.flags has CURR/NEXT flags with corresponding `u8 current_key` and `u8 receive_next` values. As the socket lock is held, that makes the current_key/receive_next change atomic with the deletion of an existing key that might have been in use.
As a result, the user can only remove a key that is not in use, or has to set a new current/next key at the same time. This avoids the issue of a random MKT being used to sign segments.
The MKT used to sign segments is already essentially random unless the user makes a deliberate choice. This is what happens if you add two keys and call connect(). But why is this a problem?
Applications which want to deliberately control the send key can do so with TCP_AUTHOPT_FLAG_LOCK_KEYID. If that flag is not set then the key with send_id == recv_rnextkeyid is preferred as suggested by the RFC, or a random one on connect.
I think your suggestion would force additional complexity on all applications for no clear gain.
Key selection controls are only added much later in the series, this is also part of the effort to split the code into readable patches. See this patch:
https://lore.kernel.org/netdev/2dc569c0d60c80c26aafcaa201ba5b5ec53ce6bd.1635...
Removing a key while traffic is happening shouldn't cause failures in recv or send code; this takes some effort but is also required to prevent auth failures when a socket is closed and transitions to timewait. I attempted to ensure this by only doing rcu_dereference for tcp_authopt_info and tcp_authopt_key_info once per packet.
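The idea of dereferencing tcp_authopt_info and tcp_authopt_key_info only once per packet can be illustrated with a copy-on-write analogy in Python. The writer publishes a new immutable key tuple, and each packet finishes with the snapshot it read at the start. This is only an analogy for the RCU pattern, not kernel code, and the names are hypothetical. Python has no grace-period mechanism, so reclaiming the old tuple is simply left to the garbage collector.

```python
import threading
from typing import Callable, Optional, Tuple


class KeyList:
    """Copy-on-write key list: writers publish a new tuple, never mutate one."""

    def __init__(self, keys: Tuple[str, ...]) -> None:
        self._keys = keys
        self._lock = threading.Lock()  # stands in for the socket lock

    def snapshot(self) -> Tuple[str, ...]:
        # Reader side: a single "dereference" per packet
        return self._keys

    def delete(self, key: str) -> None:
        with self._lock:
            self._keys = tuple(k for k in self._keys if k != key)


def process_packet(keys: KeyList,
                   midway: Optional[Callable[[], None]] = None) -> Tuple[str, Tuple[str, ...]]:
    snap = keys.snapshot()   # read the key list once...
    chosen = snap[0]         # ...select a key from that snapshot...
    if midway is not None:
        midway()             # simulate a concurrent delete at this point
    # ...and finish the packet with the same snapshot and key
    return chosen, snap
```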
-- Regards, Leonard
On 11/5/21 07:04, Leonard Crestez wrote:
On 11/5/21 3:22 AM, Dmitry Safonov wrote:
[..]
The MKT used to sign segments is already essentially random unless the user makes a deliberate choice. This is what happens if you add two keys and call connect(). But why is this a problem?
The issue is predictability and less control for a user on how the key is selected.
Let's say as a user I have two MKTs A and B. I want to use A for 6 weeks and then change to B. I want to switch to B as soon as the admin of the peer adds the key and the peer sends me (rnext_key = B.id).
With your semantics currently a random key will be used as long as I don't "lock" the id which means that rnext_key won't be respected. So there's clearly less predictability for a user to select current key in use.
Applications which want to deliberately control the send key can do so with TCP_AUTHOPT_FLAG_LOCK_KEYID. If that flag is not set then the key with send_id == recv_rnextkeyid is preferred as suggested by the RFC, or a random one on connect.
I think your suggestion would force additional complexity on all applications for no clear gain.
I disagree. From RFC (3.1):
"It is presumed that an MKT affecting a particular connection cannot be destroyed during an active connection -- or, equivalently, that its parameters are copied to an area local to the connection (i.e., instantiated) and so changes would affect only new connections."
which means that the user shouldn't be able to remove a key in use. So, by default you should return an error if the key being deleted is in use.
The only use-case for deleting a key that is in use is if it has been compromised (RFC 6.1):
"Deciding when to start using a key is a performance issue. Deciding when to remove an MKT is a security issue. Invalid MKTs are expected to be removed. TCP-AO provides no mechanism to coordinate their removal, as we consider this a key management operation."
I might misread the RFC, but it seems that shouldn't happen in an ordinary usage scenario (as long as the user doesn't --force removal of the compromised key in an exceptional case).
So, if you allow a user to set current_key/rnext_key atomically with removal - it seems to fit this --force use-case and let user more control over which key is in use.
Key selection controls are only added much later in the series, this is also part of the effort to split the code into readable patches. See this patch:
https://lore.kernel.org/netdev/2dc569c0d60c80c26aafcaa201ba5b5ec53ce6bd.1635...
A separate issue with that one (if I'm not misreading) seems to be that you're going to send segments with info->send_rnextkeyid if the deleted key was the TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID one, and you won't be able to verify the peer's inbound segments/replies.
Thanks, Dmitry
On 11/5/21 4:50 PM, Dmitry Safonov wrote:
On 11/5/21 07:04, Leonard Crestez wrote:
On 11/5/21 3:22 AM, Dmitry Safonov wrote:
[..]
The issue is predictability and less control for a user on how the key is selected.
Let's say as a user I have two MKTs A and B. I want to use A for 6 weeks and then change to B. I want to switch to B as soon as the admin of the peer adds the key and the peer sends me (rnext_key = B.id).
With your semantics currently a random key will be used as long as I don't "lock" the id which means that rnext_key won't be respected. So there's clearly less predictability for a user to select current key in use.
The RFC makes two requirements regarding keyid selection:
A) 7.2: TCP SEND [..] MUST be augmented so that the preferred outgoing MKT (current_key) can be indicated.
B) 7.5.2.e: Key must be switched to rnextkeyid if that key is available.
These requirements are in conflict, so I added a flag, TCP_AUTHOPT_FLAG_LOCK_KEYID, to determine whether the keyid is chosen by local userspace or based on the peer's rnextkeyid.
Without a "locking" bit, any key selections made from userspace would get flipped by incoming traffic. Indeed, together with your suggestion of not allowing the current key to be deleted, it would be possible for delete to fail repeatedly because the peer keeps sending us valid packets!
The expectation is that complex applications will use the "locking" functionality and handle the switch to recv_rnextkeyid themselves. Alternatively it's also possible for peers to only control rnextkeyid and perform key switch that way.
In your scenario the key will switch automatically as soon as the peer sends "B.send_id" in the rnextkeyid field.
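Here is a minimal model of the send-key selection described in this reply. With the lock flag set, the userspace-chosen keyid wins. Otherwise a key whose send_id equals the peer's last RNextKeyID is preferred (7.5.2.e as cited here). Failing that, the current key is kept, or any key is used at connect time. The function and parameter names are hypothetical. Full key selection only lands in a later patch of the series, so treat this as a sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mkt:
    send_id: int
    recv_id: int


def select_send_key(keys: List[Mkt], lock_keyid: bool, user_send_id: Optional[int],
                    recv_rnextkeyid: Optional[int], current: Optional[Mkt]) -> Optional[Mkt]:
    """Pick the MKT used to sign the next outgoing segment (hypothetical model)."""
    by_send_id = {k.send_id: k for k in keys}
    if lock_keyid and user_send_id in by_send_id:
        # Userspace pinned the key: the peer's RNextKeyID is ignored
        return by_send_id[user_send_id]
    if recv_rnextkeyid in by_send_id:
        # Switch to the key the peer asked for, if we have it
        return by_send_id[recv_rnextkeyid]
    if current in keys:
        return current
    # e.g. on connect(): no preference yet, any configured key will do
    return keys[0] if keys else None
```

In Dmitry's A/B scenario, this model switches to B as soon as the peer advertises B's id, unless the keyid is locked.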
Please note that key selection is only fully implemented in PATCH 19, without it the behavior is indeed more random. That patch was separated for ease of review and because detailed behavior is worth a separate discussion.
Entirely different key selection mechanisms are possible; for example, each key could have a "preference" score. The most detailed discussion of key rollover I found is from Cisco:
https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/iproute_pi/configuration/x...
That document describes key rollover based on lifetime intervals for each key. I believe my patches provide sufficient control to implement the same, but that this is a concern for userspace.
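If lifetime-based rollover lives in userspace, the policy part is a small computation. It picks which key should be current at a given time, and the application could then lock that keyid with TCP_AUTHOPT_FLAG_LOCK_KEYID. The lifetime model in this sketch is an assumption for illustration and is not taken from the Cisco document. It uses half-open [start, end) intervals and prefers the most recently started key when intervals overlap.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KeyLifetime:
    send_id: int
    start: float            # seconds, inclusive
    end: Optional[float]    # exclusive; None means "no expiry"


def key_for_time(keys: List[KeyLifetime], now: float) -> Optional[int]:
    """Return the send_id that should be current at `now`, or None."""
    valid = [k for k in keys if k.start <= now and (k.end is None or now < k.end)]
    if not valid:
        return None
    # Overlapping lifetimes act as a grace period; prefer the newest key
    return max(valid, key=lambda k: k.start).send_id
```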
Applications which want to deliberately control the send key can do so with TCP_AUTHOPT_FLAG_LOCK_KEYID. If that flag is not set then the key with send_id == recv_rnextkeyid is preferred as suggested by the RFC, or a random one on connect.
I think your suggestion would force additional complexity on all applications for no clear gain.
I disagree. From RFC (3.1):
"It is presumed that an MKT affecting a particular connection cannot be destroyed during an active connection -- or, equivalently, that its parameters are copied to an area local to the connection (i.e., instantiated) and so changes would affect only new connections."
which means that the user shouldn't be able to remove a key in use. So, by default you should return an error if the key being deleted is in use.
I believe this behavior belongs in application software.
The only use-case for deleting a key that is in use is if it has been compromised (RFC 6.1):
"Deciding when to start using a key is a performance issue. Deciding when to remove an MKT is a security issue. Invalid MKTs are expected to be removed. TCP-AO provides no mechanism to coordinate their removal, as we consider this a key management operation."
I might misread the RFC, but it seems that shouldn't happen in an ordinary usage scenario (as long as the user doesn't --force removal of the compromised key in an exceptional case).
So, if you allow a user to set current_key/rnext_key atomically with removal - it seems to fit this --force use-case and let user more control over which key is in use.
Control is available: user can "lock" a different key before removing the current one.
The kernel just doesn't make this mandatory.
Key selection controls are only added much later in the series, this is also part of the effort to split the code into readable patches. See this patch:
https://lore.kernel.org/netdev/2dc569c0d60c80c26aafcaa201ba5b5ec53ce6bd.1635...
A separate issue with that one (if I'm not misreading) seems to be that you're going to send segments with info->send_rnextkeyid if the deleted key was the TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID one, and you won't be able to verify the peer's inbound segments/replies.
I'm not sure I understand this.
If TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID is set then the rnextkeyid byte in output packets is controlled by user directly, this behavior is kept deliberately simple.
Otherwise we send the recv_id of the current key, this attempts to ensure symmetry.
Userspace is expected to use the recv_id of a valid key with TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID, otherwise the connection will indeed break. The kernel could enforce that this value is valid but it does not attempt to prevent userspace bugs.
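The rule for the RNextKeyID byte in outgoing segments can be written down directly. The names in this sketch are hypothetical. `can_verify_peer` is not something the kernel performs, since it deliberately does not guard against this userspace bug. It is included only to show when the connection would break: after switching, the peer signs with the key we asked for, so we need a key with a matching recv_id.

```python
from typing import Iterable


def outgoing_rnextkeyid(lock_rnextkeyid: bool, user_rnextkeyid: int,
                        current_recv_id: int) -> int:
    """Value placed in the RNextKeyID field of an outgoing segment."""
    if lock_rnextkeyid:
        # Controlled directly by userspace, without validation
        return user_rnextkeyid
    # Otherwise advertise the recv_id of the current key (keeps things symmetric)
    return current_recv_id


def can_verify_peer(rnextkeyid: int, local_recv_ids: Iterable[int]) -> bool:
    """Hypothetical userspace check: do we hold a key with this recv_id?"""
    return rnextkeyid in set(local_recv_ids)
```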
The .rst documentation contains a brief description of the user interface and includes kernel-doc generated from the uapi header.
Signed-off-by: Leonard Crestez cdleonard@gmail.com
---
 Documentation/networking/index.rst       |  1 +
 Documentation/networking/tcp_authopt.rst | 44 ++++++++++++++++++++++++
 2 files changed, 45 insertions(+)
 create mode 100644 Documentation/networking/tcp_authopt.rst
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 58bc8cd367c6..f5c324a060d8 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -100,10 +100,11 @@ Contents:
    strparser
    switchdev
    sysfs-tagging
    tc-actions-env-rules
    tcp-thin
+   tcp_authopt
    team
    timestamping
    tipc
    tproxy
    tuntap
diff --git a/Documentation/networking/tcp_authopt.rst b/Documentation/networking/tcp_authopt.rst
new file mode 100644
index 000000000000..484f66f41ad5
--- /dev/null
+++ b/Documentation/networking/tcp_authopt.rst
@@ -0,0 +1,44 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================
+TCP Authentication Option
+=========================
+
+The TCP Authentication option specified by RFC5925 replaces the TCP MD5
+Signature option. It is similar in goals but not compatible in either wire
+format or ABI.
+
+Interface
+=========
+
+Individual keys can be added to or removed from a TCP socket by using
+TCP_AUTHOPT_KEY setsockopt and a ``struct tcp_authopt_key``. There is no
+support for reading back keys and updates always replace the old key. These
+structures represent "Master Key Tuples (MKTs)" as described by the RFC.
+
+Per-socket options can be set or read using the TCP_AUTHOPT sockopt and a
+``struct tcp_authopt``. This is optional: doing setsockopt TCP_AUTHOPT_KEY is
+sufficient to enable the feature.
+
+Configuration associated with TCP Authentication is independently attached to
+each TCP socket. After listen and accept the newly returned socket gets an
+independent copy of relevant settings from the listen socket.
+
+Key binding
+-----------
+
+Keys can be bound to remote addresses in a way that is similar to TCP_MD5.
+
+ * The full address must match (/32 or /128)
+ * Ports are ignored
+ * Address binding is optional, by default keys match all addresses
+
+RFC5925 requires that key ids do not overlap when tcp identifiers (addr/port)
+overlap. This is not enforced by linux; configuring ambiguous keys will result
+in packet drops and lost connections.
+
+ABI Reference
+=============
+
+.. kernel-doc:: include/uapi/linux/tcp.h
+   :identifiers: tcp_authopt tcp_authopt_flag tcp_authopt_key tcp_authopt_key_flag tcp_authopt_alg
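The key binding rules in this documentation can be expressed as a small matching function. This Python sketch is illustrative and is not the kernel's lookup. `KeySpec`, `key_matches` and `ambiguous_recv_ids` are hypothetical names. The ambiguity check shows the kind of overlap the documentation warns about and the kernel does not reject: two keys with the same recv_id that both apply to one peer. A send_id check would be analogous.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class KeySpec:
    send_id: int
    recv_id: int
    addr: Optional[str] = None  # None: not bound, matches every peer


def key_matches(key: KeySpec, peer_addr: str, peer_port: int) -> bool:
    # peer_port is deliberately unused: ports are ignored
    if key.addr is None:
        return True  # address binding is optional
    # Full-address match only (/32 or /128), no prefix lengths
    return ipaddress.ip_address(key.addr) == ipaddress.ip_address(peer_addr)


def ambiguous_recv_ids(keys: List[KeySpec], peer_addr: str) -> List[int]:
    """recv_ids for which more than one key applies to this peer."""
    seen, dup = set(), []
    for k in keys:
        if key_matches(k, peer_addr, 0):
            if k.recv_id in seen and k.recv_id not in dup:
                dup.append(k.recv_id)
            seen.add(k.recv_id)
    return dup
```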
This test suite is written as a standalone python3 package using dependencies such as scapy.
The run.sh script wrapper called from kselftest infrastructure uses "pip" to generate an isolated virtual environment just for running these tests. The run.sh wrapper can be called from anywhere and does not rely on kselftest infrastructure.
Default output is in TAP format.
Signed-off-by: Leonard Crestez cdleonard@gmail.com
---
 tools/testing/selftests/tcp_authopt/Makefile  | 10 ++++
 .../testing/selftests/tcp_authopt/README.rst  | 18 ++++++++
 tools/testing/selftests/tcp_authopt/config    |  6 +++
 .../selftests/tcp_authopt/requirements.txt    | 46 +++++++++++++++++++
 tools/testing/selftests/tcp_authopt/run.sh    | 31 +++++++++++++
 tools/testing/selftests/tcp_authopt/settings  |  1 +
 tools/testing/selftests/tcp_authopt/setup.cfg | 35 ++++++++++++++
 tools/testing/selftests/tcp_authopt/setup.py  |  6 +++
 .../tcp_authopt/tcp_authopt_test/__init__.py  |  0
 9 files changed, 153 insertions(+)
 create mode 100644 tools/testing/selftests/tcp_authopt/Makefile
 create mode 100644 tools/testing/selftests/tcp_authopt/README.rst
 create mode 100644 tools/testing/selftests/tcp_authopt/config
 create mode 100644 tools/testing/selftests/tcp_authopt/requirements.txt
 create mode 100755 tools/testing/selftests/tcp_authopt/run.sh
 create mode 100644 tools/testing/selftests/tcp_authopt/settings
 create mode 100644 tools/testing/selftests/tcp_authopt/setup.cfg
 create mode 100644 tools/testing/selftests/tcp_authopt/setup.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/__init__.py
diff --git a/tools/testing/selftests/tcp_authopt/Makefile b/tools/testing/selftests/tcp_authopt/Makefile
new file mode 100644
index 000000000000..256ae2c16013
--- /dev/null
+++ b/tools/testing/selftests/tcp_authopt/Makefile
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0
+include ../lib.mk
+
+TEST_PROGS += ./run.sh
+TEST_FILES := \
+	requirements.txt \
+	settings \
+	setup.cfg \
+	setup.py \
+	tcp_authopt_test
diff --git a/tools/testing/selftests/tcp_authopt/README.rst b/tools/testing/selftests/tcp_authopt/README.rst
new file mode 100644
index 000000000000..e9548469c827
--- /dev/null
+++ b/tools/testing/selftests/tcp_authopt/README.rst
@@ -0,0 +1,18 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================================
+Tests for linux TCP Authentication Option
+=========================================
+
+Test suite is written in python3 using pytest and scapy. The test suite is
+mostly self-contained as a python package.
+
+The recommended way to run this is the included `run.sh` script as root, this
+will automatically create a virtual environment with the correct dependencies
+using `pip`. If not running under root it will automatically attempt to elevate
+using `sudo` after the virtualenv is created.
+
+An old separate version can be found here: https://github.com/cdleonard/tcp-authopt-test
+
+Integration with kselftest infrastructure is minimal: when in doubt just run
+this separately.
diff --git a/tools/testing/selftests/tcp_authopt/config b/tools/testing/selftests/tcp_authopt/config
new file mode 100644
index 000000000000..0d4e5d47fa72
--- /dev/null
+++ b/tools/testing/selftests/tcp_authopt/config
@@ -0,0 +1,6 @@
+# RFC5925 TCP Authentication Option and all algorithms
+CONFIG_TCP_AUTHOPT=y
+CONFIG_CRYPTO_SHA1=M
+CONFIG_CRYPTO_HMAC=M
+CONFIG_CRYPTO_AES=M
+CONFIG_CRYPTO_CMAC=M
diff --git a/tools/testing/selftests/tcp_authopt/requirements.txt b/tools/testing/selftests/tcp_authopt/requirements.txt
new file mode 100644
index 000000000000..713d4d1b7a55
--- /dev/null
+++ b/tools/testing/selftests/tcp_authopt/requirements.txt
@@ -0,0 +1,46 @@
+#
+# This file is autogenerated by pip-compile with python 3.8
+# To update, run:
+#
+#    pip-compile
+#
+argparse==1.4.0
+    # via nsenter
+attrs==21.2.0
+    # via pytest
+cffi==1.15.0
+    # via cryptography
+contextlib2==21.6.0
+    # via nsenter
+cryptography==35.0.0
+    # via tcp-authopt-test (setup.py)
+iniconfig==1.1.1
+    # via pytest
+nsenter==0.2
+    # via tcp-authopt-test (setup.py)
+packaging==21.0
+    # via pytest
+pathlib==1.0.1
+    # via nsenter
+pluggy==1.0.0
+    # via pytest
+py==1.10.0
+    # via pytest
+pycparser==2.20
+    # via cffi
+pyparsing==3.0.1
+    # via packaging
+pytest==6.2.5
+    # via
+    #   pytest-tap
+    #   tcp-authopt-test (setup.py)
+pytest-tap==3.3
+    # via tcp-authopt-test (setup.py)
+scapy==2.4.5
+    # via tcp-authopt-test (setup.py)
+tap.py==3.0
+    # via pytest-tap
+toml==0.10.2
+    # via pytest
+waiting==1.4.1
+    # via tcp-authopt-test (setup.py)
diff --git a/tools/testing/selftests/tcp_authopt/run.sh b/tools/testing/selftests/tcp_authopt/run.sh
new file mode 100755
index 000000000000..7aeb125706a4
--- /dev/null
+++ b/tools/testing/selftests/tcp_authopt/run.sh
@@ -0,0 +1,31 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Create virtualenv using pip and run pytest
+# Accepts all args that pytest does
+#
+set -e
+cd "$(dirname "${BASH_SOURCE[0]}")"
+
+if [[ -d venv ]]; then
+	echo >&2 "Using existing $(readlink -f venv)"
+else
+	echo >&2 "Creating $(readlink -f venv)"
+	python3 -m venv venv
+	(
+		. venv/bin/activate
+		pip install wheel
+		pip install -r requirements.txt
+	)
+fi
+
+cmd=(pytest -s --log-cli-level=DEBUG --tap-stream "$@")
+if [[ $(id -u) -ne 0 ]]; then
+	echo >&2 "warning: running as non-root user, attempting sudo"
+	# sudo -E to use the virtualenv:
+	cmd=(sudo bash -c ". venv/bin/activate;$(printf " %q" "${cmd[@]}")")
+	exec "${cmd[@]}"
+else
+	. venv/bin/activate
+	exec "${cmd[@]}"
+fi
diff --git a/tools/testing/selftests/tcp_authopt/settings b/tools/testing/selftests/tcp_authopt/settings
new file mode 100644
index 000000000000..6091b45d226b
--- /dev/null
+++ b/tools/testing/selftests/tcp_authopt/settings
@@ -0,0 +1 @@
+timeout=120
diff --git a/tools/testing/selftests/tcp_authopt/setup.cfg b/tools/testing/selftests/tcp_authopt/setup.cfg
new file mode 100644
index 000000000000..452083fec64b
--- /dev/null
+++ b/tools/testing/selftests/tcp_authopt/setup.cfg
@@ -0,0 +1,35 @@
+[options]
+install_requires=
+    cryptography
+    nsenter
+    pytest
+    pytest-tap
+    scapy
+    waiting
+
+[options.extras_require]
+dev =
+    black
+    isort
+    mypy
+    pip-tools
+    pre-commit
+    tox
+
+[tox:tox]
+envlist = py3
+
+[testenv]
+commands = pytest {posargs}
+deps = -rrequirements.txt
+
+[metadata]
+name = tcp-authopt-test
+version = 0.1
+
+[mypy]
+ignore_missing_imports = true
+files = .
+
+[isort]
+profile = black
diff --git a/tools/testing/selftests/tcp_authopt/setup.py b/tools/testing/selftests/tcp_authopt/setup.py
new file mode 100644
index 000000000000..055b98132e26
--- /dev/null
+++ b/tools/testing/selftests/tcp_authopt/setup.py
@@ -0,0 +1,6 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+from setuptools import setup
+
+setup()
diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/__init__.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/__init__.py
new file mode 100644
index 000000000000..e69de29bb2d1
Add a python translation of the linux ABI for tcpao and test the behavior of TCP_AUTHOPT and TCP_AUTHOPT_KEY sockopts.
This includes several corner cases not normally covered by traffic tests.
Signed-off-by: Leonard Crestez cdleonard@gmail.com
---
 .../tcp_authopt/tcp_authopt_test/conftest.py  |  71 +++++
 .../tcp_authopt_test/linux_tcp_authopt.py     | 266 ++++++++++++++++++
 .../tcp_authopt/tcp_authopt_test/sockaddr.py  | 122 ++++++++
 .../tcp_authopt_test/test_sockopt.py          | 203 +++++++++++++
 4 files changed, 662 insertions(+)
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/conftest.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_authopt.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/sockaddr.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_sockopt.py
diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/conftest.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/conftest.py
new file mode 100644
index 000000000000..a06ba848669d
--- /dev/null
+++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/conftest.py
@@ -0,0 +1,71 @@
+# SPDX-License-Identifier: GPL-2.0
+import logging
+import os
+from contextlib import ExitStack, nullcontext
+from typing import ContextManager
+
+import pytest
+
+from .linux_tcp_authopt import enable_sysctl_tcp_authopt, has_tcp_authopt
+
+logger = logging.getLogger(__name__)
+
+skipif_missing_tcp_authopt = pytest.mark.skipif(
+    not has_tcp_authopt(), reason="Need CONFIG_TCP_AUTHOPT"
+)
+
+
+def get_effective_capabilities():
+    for line in open("/proc/self/status", "r"):
+        if line.startswith("CapEff:"):
+            return int(line.split(":")[1], 16)
+
+
+def has_effective_capability(bit) -> bool:
+    return get_effective_capabilities() & (1 << bit) != 0
+
+
+def can_capture() -> bool:
+    return has_effective_capability(13)
+
+
+def raise_skip_no_netns():
+    if not has_effective_capability(12):
+        pytest.skip("Need CAP_NET_ADMIN for network namespaces")
+
+
+skipif_cant_capture = pytest.mark.skipif(
+    not can_capture(), reason="run as root to capture packets"
+)
+
+
+@pytest.fixture
+def exit_stack():
+    """Return a contextlib.ExitStack as a pytest fixture
+
+    This reduces indentation making code more readable
+    """
+    with ExitStack() as exit_stack:
+        yield exit_stack
+
+
+def pytest_configure():
+    # Silence messages regarding netns enter/exit:
+    logging.getLogger("nsenter").setLevel(logging.INFO)
+    if has_tcp_authopt():
+        enable_sysctl_tcp_authopt()
+
+
+def parametrize_product(**kw):
+    """Parametrize each key to each item in the value list"""
+    import itertools
+
+    return pytest.mark.parametrize(",".join(kw.keys()), itertools.product(*kw.values()))
+
+
+def raises_optional_exception(expected_exception, **kw) -> ContextManager:
+    """Like pytest.raises except accept expected_exception=None"""
+    if expected_exception is None:
+        return nullcontext()
+    else:
+        return pytest.raises(expected_exception, **kw)
diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_authopt.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_authopt.py
new file mode 100644
index 000000000000..b9dc9decda07
--- /dev/null
+++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_authopt.py
@@ -0,0 +1,266 @@
+# SPDX-License-Identifier: GPL-2.0
+"""Python wrapper around linux TCP_AUTHOPT ABI"""
+
+import errno
+import logging
+import socket
+import struct
+import typing
+from dataclasses import dataclass
+from enum import IntEnum, IntFlag
+
+from .sockaddr import (
+    SockaddrConvertType,
+    sockaddr_base,
+    sockaddr_convert,
+    sockaddr_storage,
+    sockaddr_unpack,
+)
+
+logger = logging.getLogger(__name__)
+
+
+def BIT(x):
+    return 1 << x
+
+
+TCP_AUTHOPT = 38
+TCP_AUTHOPT_KEY = 39
+
+TCP_AUTHOPT_MAXKEYLEN = 80
+
+
+class TCP_AUTHOPT_FLAG(IntFlag):
+    REJECT_UNEXPECTED = BIT(2)
+
+
+class TCP_AUTHOPT_KEY_FLAG(IntFlag):
+    DEL = BIT(0)
+    EXCLUDE_OPTS = BIT(1)
+    BIND_ADDR = BIT(2)
+
+
+class TCP_AUTHOPT_ALG(IntEnum):
+    HMAC_SHA_1_96 = 1
+    AES_128_CMAC_96 = 2
+
+
+@dataclass
+class tcp_authopt:
+    """Like linux struct tcp_authopt"""
+
+    flags: int = 0
+    sizeof = 4
+
+    def pack(self) -> bytes:
+        return struct.pack(
+            "I",
+            self.flags,
+        )
+
+    def __bytes__(self):
+        return self.pack()
+
+    @classmethod
+    def unpack(cls, b: bytes):
+        tup = struct.unpack("I", b)
+        return cls(*tup)
+
+
+def set_tcp_authopt(sock, opt: tcp_authopt):
+    return sock.setsockopt(socket.SOL_TCP, TCP_AUTHOPT, bytes(opt))
+
+
+def get_tcp_authopt(sock: socket.socket) -> tcp_authopt:
+    b = sock.getsockopt(socket.SOL_TCP, TCP_AUTHOPT, tcp_authopt.sizeof)
+    return tcp_authopt.unpack(b)
+
+
+class tcp_authopt_key:
+    """Like linux struct tcp_authopt_key
+
+    :ivar auto_flags: If true(default) then set "binding" flags based on non-null values attributes.
+    """
+
+    KeyArgType = typing.Union[str, bytes]
+    AddrArgType = typing.Union[None, str, bytes, SockaddrConvertType]
+
+    def __init__(
+        self,
+        flags: TCP_AUTHOPT_KEY_FLAG = TCP_AUTHOPT_KEY_FLAG(0),
+        send_id: int = 0,
+        recv_id: int = 0,
+        alg=TCP_AUTHOPT_ALG.HMAC_SHA_1_96,
+        key: KeyArgType = b"",
+        addr: AddrArgType = None,
+        auto_flags: bool = True,
+        include_options=None,
+    ):
+        self.flags = flags
+        self.send_id = send_id
+        self.recv_id = recv_id
+        self.alg = alg
+        self.key = key
+        self.addr = addr
+        self.auto_flags = auto_flags
+        if include_options is not None:
+            self.include_options = include_options
+
+    def get_real_flags(self) -> TCP_AUTHOPT_KEY_FLAG:
+        result = self.flags
+        if self.auto_flags:
+            if self.addr is not None:
+                result |= TCP_AUTHOPT_KEY_FLAG.BIND_ADDR
+            else:
+                result &= ~TCP_AUTHOPT_KEY_FLAG.BIND_ADDR
+        return result
+
+    def pack(self):
+        if len(self.key) > TCP_AUTHOPT_MAXKEYLEN:
+            raise ValueError(f"Max key length is {TCP_AUTHOPT_MAXKEYLEN}")
+        data = struct.pack(
+            "IBBBB80s",
+            self.get_real_flags(),
+            self.send_id,
+            self.recv_id,
+            self.alg,
+            len(self.key),
+            self.key,
+        )
+        data += bytes(self.addrbuf.ljust(sockaddr_storage.sizeof, b"\x00"))
+        return data
+
+    def __bytes__(self):
+        return self.pack()
+
+    @property
+    def key(self) -> KeyArgType:
+        return self._key
+
+    @key.setter
+    def key(self, val: KeyArgType) -> bytes:
+        if isinstance(val, str):
+            val = val.encode("utf-8")
+        if len(val) > TCP_AUTHOPT_MAXKEYLEN:
+            raise ValueError(f"Max key length is {TCP_AUTHOPT_MAXKEYLEN}")
+        self._key = val
+        return val
+
+    @property
+    def addr(self):
+        if not self.addrbuf:
+            return None
+        else:
+            return sockaddr_unpack(bytes(self.addrbuf))
+
+    @addr.setter
+    def addr(self, val: AddrArgType):
+        if isinstance(val, bytes):
+            if len(val) > sockaddr_storage.sizeof:
+                raise ValueError(f"Must be up to {sockaddr_storage.sizeof}")
+            self.addrbuf = val
+        elif val is None:
+            self.addrbuf = b""
+        elif isinstance(val, sockaddr_base):
+            self.addr = bytes(val)
+
else: + self.addr = sockaddr_convert(val) + return self.addr + + @property + def include_options(self) -> bool: + return not self.flags & TCP_AUTHOPT_KEY_FLAG.EXCLUDE_OPTS + + @include_options.setter + def include_options(self, value) -> bool: + if value: + self.flags &= ~TCP_AUTHOPT_KEY_FLAG.EXCLUDE_OPTS + else: + self.flags |= TCP_AUTHOPT_KEY_FLAG.EXCLUDE_OPTS + return value + + @property + def delete_flag(self) -> bool: + return bool(self.flags & TCP_AUTHOPT_KEY_FLAG.DEL) + + @delete_flag.setter + def delete_flag(self, value) -> bool: + if value: + self.flags |= TCP_AUTHOPT_KEY_FLAG.DEL + else: + self.flags &= ~TCP_AUTHOPT_KEY_FLAG.DEL + return value + + +def set_tcp_authopt_key(sock, keyopt: tcp_authopt_key): + return sock.setsockopt(socket.SOL_TCP, TCP_AUTHOPT_KEY, bytes(keyopt)) + + +def set_tcp_authopt_key_kwargs(sock, keyopt: tcp_authopt_key = None, **kw): + if keyopt is None: + keyopt = tcp_authopt_key() + for k, v in kw.items(): + setattr(keyopt, k, v) + return set_tcp_authopt_key(sock, keyopt) + + +def del_tcp_authopt_key(sock, key: tcp_authopt_key) -> bool: + """Try to delete an authopt key + + :return: True if a key was deleted, False if it was not present + """ + import copy + + key = copy.copy(key) + key.delete_flag = True + try: + sock.setsockopt(socket.SOL_TCP, TCP_AUTHOPT_KEY, bytes(key)) + return True + except OSError as e: + if e.errno == errno.ENOENT: + return False + raise + + +def get_sysctl_tcp_authopt() -> typing.Optional[bool]: + from pathlib import Path + + path = Path("/proc/sys/net/ipv4/tcp_authopt") + if path.exists(): + return path.read_text().strip() != "0" + else: + return None + + +def enable_sysctl_tcp_authopt(): + from pathlib import Path + + path = Path("/proc/sys/net/ipv4/tcp_authopt") + # Do nothing if absent + if not path.exists(): + return + try: + if path.read_text().strip() == "0": + path.write_text("1") + except: + raise Exception("Failed to enable /proc/sys/net/ipv4/tcp_authopt") + + +def has_tcp_authopt() -> bool: + 
"""Check is TCP_AUTHOPT is implemented by the OS + + Returns True if implemented but disabled by sysctl + Returns False if disabled at compile time + """ + with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock: + try: + optbuf = bytes(4) + sock.setsockopt(socket.SOL_TCP, TCP_AUTHOPT, optbuf) + return True + except OSError as e: + if e.errno == errno.ENOPROTOOPT: + return False + elif e.errno == errno.EPERM and get_sysctl_tcp_authopt() is False: + return True + else: + raise diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/sockaddr.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/sockaddr.py new file mode 100644 index 000000000000..3ad22c0b4015 --- /dev/null +++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/sockaddr.py @@ -0,0 +1,122 @@ +# SPDX-License-Identifier: GPL-2.0 +"""pack/unpack wrappers for sockaddr""" +import socket +import struct +import typing +from dataclasses import dataclass +from ipaddress import IPv4Address, IPv6Address, ip_address + + +class sockaddr_base: + def pack(self) -> bytes: + raise NotImplementedError() + + def __bytes__(self): + return self.pack() + + +class sockaddr_in(sockaddr_base): + port: int + addr: IPv4Address + sizeof = 8 + + def __init__(self, port=0, addr=None): + self.port = port + if addr is None: + addr = IPv4Address(0) + self.addr = IPv4Address(addr) + + def pack(self): + return struct.pack("HH4s", socket.AF_INET, self.port, self.addr.packed) + + @classmethod + def unpack(cls, buffer): + family, port, addr_packed = struct.unpack("HH4s", buffer[:8]) + if family != socket.AF_INET: + raise ValueError(f"Must be AF_INET not {family}") + return cls(port, addr_packed) + + +@dataclass +class sockaddr_in6(sockaddr_base): + """Like sockaddr_in6 but for python. 
Always contains scope_id""" + + port: int + addr: IPv6Address + flowinfo: int + scope_id: int + sizeof = 28 + + def __init__(self, port=0, addr=None, flowinfo=0, scope_id=0): + self.port = port + if addr is None: + addr = IPv6Address(0) + self.addr = IPv6Address(addr) + self.flowinfo = flowinfo + self.scope_id = scope_id + + def pack(self): + return struct.pack( + "HHI16sI", + socket.AF_INET6, + self.port, + self.flowinfo, + self.addr.packed, + self.scope_id, + ) + + @classmethod + def unpack(cls, buffer): + family, port, flowinfo, addr_packed, scope_id = struct.unpack( + "HHI16sI", buffer[:28] + ) + if family != socket.AF_INET6: + raise ValueError(f"Must be AF_INET6 not {family}") + return cls(port, addr_packed, flowinfo=flowinfo, scope_id=scope_id) + + +@dataclass +class sockaddr_storage(sockaddr_base): + family: int + data: bytes + sizeof = 128 + + def pack(self): + return struct.pack("H126s", self.family, self.data) + + @classmethod + def unpack(cls, buffer): + return cls(*struct.unpack("H126s", buffer)) + + +def sockaddr_unpack(buffer: bytes): + """Unpack based on family""" + family = struct.unpack("H", buffer[:2])[0] + if family == socket.AF_INET: + return sockaddr_in.unpack(buffer) + elif family == socket.AF_INET6: + return sockaddr_in6.unpack(buffer) + else: + return sockaddr_storage.unpack(buffer) + + +SockaddrConvertType = typing.Union[ + sockaddr_in, sockaddr_in6, sockaddr_storage, IPv4Address, IPv6Address, str +] + + +def sockaddr_convert(val: SockaddrConvertType) -> sockaddr_base: + """Try to convert address into some sort of sockaddr""" + if ( + isinstance(val, sockaddr_in) + or isinstance(val, sockaddr_in6) + or isinstance(val, sockaddr_storage) + ): + return val + if isinstance(val, IPv4Address): + return sockaddr_in(addr=val) + if isinstance(val, IPv6Address): + return sockaddr_in6(addr=val) + if isinstance(val, str): + return sockaddr_convert(ip_address(val)) + raise TypeError(f"Don't know how to convert {val!r} to sockaddr") diff --git 
a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_sockopt.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_sockopt.py new file mode 100644 index 000000000000..41ebde8b1b7c --- /dev/null +++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_sockopt.py @@ -0,0 +1,203 @@ +# SPDX-License-Identifier: GPL-2.0 +"""Test TCP_AUTHOPT sockopt API""" +import errno +import socket +import struct +from ipaddress import IPv4Address, IPv6Address + +import pytest + +from .conftest import skipif_missing_tcp_authopt +from .linux_tcp_authopt import ( + TCP_AUTHOPT, + TCP_AUTHOPT_ALG, + TCP_AUTHOPT_FLAG, + TCP_AUTHOPT_KEY, + TCP_AUTHOPT_KEY_FLAG, + del_tcp_authopt_key, + get_tcp_authopt, + set_tcp_authopt, + set_tcp_authopt_key, + tcp_authopt, + tcp_authopt_key, +) +from .sockaddr import sockaddr_in, sockaddr_in6, sockaddr_unpack + +pytestmark = skipif_missing_tcp_authopt + + +def test_authopt_key_pack_noaddr(): + b = bytes(tcp_authopt_key(key=b"a\x00b")) + assert b[7] == 3 + assert b[8:13] == b"a\x00b\x00\x00" + + +def test_authopt_key_pack_addr(): + b = bytes(tcp_authopt_key(key=b"a\x00b", addr="10.0.0.1")) + assert struct.unpack("H", b[88:90])[0] == socket.AF_INET + assert sockaddr_unpack(b[88:]).addr == IPv4Address("10.0.0.1") + + +def test_authopt_key_pack_addr6(): + b = bytes(tcp_authopt_key(key=b"abc", addr="fd00::1")) + assert struct.unpack("H", b[88:90])[0] == socket.AF_INET6 + assert sockaddr_unpack(b[88:]).addr == IPv6Address("fd00::1") + + +def test_tcp_authopt_key_del_without_active(exit_stack): + sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) + exit_stack.push(sock) + + # nothing happens: + key = tcp_authopt_key() + assert key.delete_flag is False + key.delete_flag = True + assert key.delete_flag is True + with pytest.raises(OSError) as e: + set_tcp_authopt_key(sock, key) + assert e.value.errno in [errno.EINVAL, errno.ENOENT] + + +def test_tcp_authopt_key_setdel(exit_stack): + sock = socket.socket(socket.AF_INET, 
socket.SOCK_STREAM) + exit_stack.push(sock) + set_tcp_authopt(sock, tcp_authopt()) + + # delete returns ENOENT + key = tcp_authopt_key() + key.delete_flag = True + with pytest.raises(OSError) as e: + set_tcp_authopt_key(sock, key) + assert e.value.errno == errno.ENOENT + + key = tcp_authopt_key(send_id=1, recv_id=2) + set_tcp_authopt_key(sock, key) + # First delete works fine: + key.delete_flag = True + set_tcp_authopt_key(sock, key) + # Duplicate delete returns ENOENT + with pytest.raises(OSError) as e: + set_tcp_authopt_key(sock, key) + assert e.value.errno == errno.ENOENT + + +def test_get_tcp_authopt(): + with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock: + with pytest.raises(OSError) as e: + sock.getsockopt(socket.SOL_TCP, TCP_AUTHOPT, 4) + assert e.value.errno == errno.ENOENT + + +def test_set_get_tcp_authopt_flags(): + with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock: + # No flags by default + set_tcp_authopt(sock, tcp_authopt()) + opt = get_tcp_authopt(sock) + assert opt.flags == 0 + + # simple flags are echoed + goodflag = TCP_AUTHOPT_FLAG.REJECT_UNEXPECTED + set_tcp_authopt(sock, tcp_authopt(flags=goodflag)) + opt = get_tcp_authopt(sock) + assert opt.flags == goodflag + + # attempting to set a badflag returns an error and has no effect + badflag = 1 << 27 + with pytest.raises(OSError) as e: + set_tcp_authopt(sock, tcp_authopt(flags=badflag)) + opt = get_tcp_authopt(sock) + assert opt.flags == goodflag + + +def test_set_ipv6_key_on_ipv4(): + """Binding a key to an ipv6 address on an ipv4 socket makes no sense""" + with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock: + key = tcp_authopt_key("abc") + key.flags = TCP_AUTHOPT_KEY_FLAG.BIND_ADDR + key.addr = IPv6Address("::1234") + with pytest.raises(OSError): + set_tcp_authopt_key(sock, key) + + +def test_set_ipv4_key_on_ipv6(): + """This could be implemented for ipv6-mapped-ipv4 but it is not + + TCP_MD5SIG has a similar limitation + """ + with 
socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as sock: + key = tcp_authopt_key(key="abc") + key.flags = TCP_AUTHOPT_KEY_FLAG.BIND_ADDR + key.addr = IPv4Address("1.2.3.4") + with pytest.raises(OSError): + set_tcp_authopt_key(sock, key) + + +def test_authopt_key_badflags(): + """Don't pretend to handle unknown flags""" + with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock: + with pytest.raises(OSError): + set_tcp_authopt_key(sock, tcp_authopt_key(flags=0xABCDEF)) + + +def test_authopt_key_longer_bad(): + """Test that passing a longer sockopt with unknown data fails + + Old kernels won't pretend to handle features they don't know about + """ + with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock: + key = tcp_authopt_key(alg=TCP_AUTHOPT_ALG.HMAC_SHA_1_96, key="aaa") + optbuf = bytes(key) + optbuf = optbuf.ljust(len(optbuf) + 256, b"\x5a") + with pytest.raises(OSError): + sock.setsockopt(socket.SOL_TCP, TCP_AUTHOPT_KEY, optbuf) + + +def test_authopt_key_longer_zeros(): + """Test that passing a longer sockopt padded with zeros works + + This ensures applications using a larger struct tcp_authopt_key won't have + to pass a shorter optlen on old kernels.
+ """ + with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock: + key = tcp_authopt_key(alg=TCP_AUTHOPT_ALG.HMAC_SHA_1_96, key="aaa") + optbuf = bytes(key) + optbuf = optbuf.ljust(len(optbuf) + 256, b"\x00") + sock.setsockopt(socket.SOL_TCP, TCP_AUTHOPT_KEY, optbuf) + # the key was added and can be deleted normally + assert del_tcp_authopt_key(sock, key) == True + assert del_tcp_authopt_key(sock, key) == False + + +def test_authopt_longer_baddata(): + with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock: + opt = tcp_authopt() + optbuf = bytes(opt) + optbuf = optbuf.ljust(len(optbuf) + 256, b"\x5a") + with pytest.raises(OSError): + sock.setsockopt(socket.SOL_TCP, TCP_AUTHOPT, optbuf) + + +def test_authopt_longer_zeros(): + with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock: + opt = tcp_authopt() + optbuf = bytes(opt) + optbuf = optbuf.ljust(len(optbuf) + 256, b"\x00") + sock.setsockopt(socket.SOL_TCP, TCP_AUTHOPT, optbuf) + + +def test_authopt_setdel_addrbind(): + with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock: + key = tcp_authopt_key(addr="1.1.1.1", recv_id=1, send_id=1) + key2 = tcp_authopt_key(addr="1.1.1.2", recv_id=1, send_id=1) + set_tcp_authopt_key(sock, key) + assert del_tcp_authopt_key(sock, key2) == False + assert del_tcp_authopt_key(sock, key) == True + assert del_tcp_authopt_key(sock, key) == False + + +def test_authopt_include_options(): + key = tcp_authopt_key() + assert key.include_options + key.include_options = False + assert key.flags & TCP_AUTHOPT_KEY_FLAG.EXCLUDE_OPTS + assert not key.include_options
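The sockopt tests above check raw byte offsets (`b[7]`, `b[8:13]`, `b[88:90]`). As a rough sketch, assuming native alignment on a typical Linux ABI, this is the buffer layout that `tcp_authopt_key.pack()` produces. `pack_authopt_key` and `pack_sockaddr_in` are hypothetical helpers, not part of the selftest.

```python
import socket
import struct

# Field layout assumed from tcp_authopt_key.pack() in the selftest wrapper:
# u32 flags, u8 send_id, u8 recv_id, u8 alg, u8 keylen, u8 key[80],
# then a 128-byte struct sockaddr_storage holding the optional bound address.
KEY_FMT = "IBBBB80s"
MAX_KEYLEN = 80
SOCKADDR_STORAGE_SIZE = 128


def pack_sockaddr_in(addr: str, port: int = 0) -> bytes:
    """8-byte sockaddr_in prefix: host-endian family, network-endian port, IPv4 address."""
    return struct.pack("=H", socket.AF_INET) + struct.pack("!H", port) + socket.inet_aton(addr)


def pack_authopt_key(flags, send_id, recv_id, alg, key: bytes, addr: bytes = b"") -> bytes:
    """Pack the option buffer passed to setsockopt(SOL_TCP, TCP_AUTHOPT_KEY, ...)."""
    if len(key) > MAX_KEYLEN:
        raise ValueError(f"Max key length is {MAX_KEYLEN}")
    # "80s" zero-pads the key; keylen carries the real length
    head = struct.pack(KEY_FMT, flags, send_id, recv_id, alg, len(key), key)
    return head + addr.ljust(SOCKADDR_STORAGE_SIZE, b"\x00")


buf = pack_authopt_key(0, 1, 2, 1, b"a\x00b", pack_sockaddr_in("10.0.0.1"))
print(len(buf))  # 216: 88 bytes of fixed fields + 128 bytes of sockaddr_storage
print(buf[7])    # 3: keylen
```

Unlike the selftest's `sockaddr_in.pack()`, this sketch packs the port in network byte order, as a real `sockaddr_in` expects; the tests only use port 0, so the difference does not show up there.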
The crypto_shash API is used to compute packet signatures. It comes with several unfortunate limitations:
1) Allocating a crypto_shash can sleep and must be done in user context.
2) Packet signatures must be computed in softirq context.
3) Packet signatures use dynamic "traffic keys", which require exclusive access to a crypto_shash for crypto_shash_setkey.
The solution is to allocate one crypto_shash per possible CPU for each algorithm at setsockopt time. The per-CPU tfm is then borrowed from softirq context with preemption disabled, the signature is computed and the tfm is returned.
The pool for each algorithm is allocated on first use.
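The lazy, mutex-protected pool setup described above can be modeled roughly in Python. `AlgPool` and `fake_alloc` are hypothetical names. `threading.Lock` stands in for the kernel mutex, and the CPU id is passed explicitly because Python has no equivalent of `preempt_disable()` plus `this_cpu_ptr()`.

```python
import threading


class AlgPool:
    """Hypothetical model of one tcp_authopt_alg_list entry and its per-CPU tfms."""

    def __init__(self, alg_name, ncpus, alloc):
        self.alg_name = alg_name
        self.ncpus = ncpus
        self.alloc = alloc  # stand-in for crypto_alloc_shash(); may block
        self.init_mutex = threading.Lock()
        self.init_done = False
        self.tfms = None

    def require(self):
        """Allocate one transform per CPU on first use (setsockopt time)."""
        with self.init_mutex:
            if self.init_done:
                return
            # If an allocation raises, tfms is left untouched and init_done
            # stays False, so a later call retries from scratch.
            self.tfms = [self.alloc(self.alg_name) for _ in range(self.ncpus)]
            self.init_done = True

    def get_tfm(self, cpu):
        """Borrow the transform owned by `cpu` (the kernel pins the CPU instead)."""
        return self.tfms[cpu]


allocated = []


def fake_alloc(name):
    allocated.append(name)
    return object()


pool = AlgPool("hmac(sha1)", ncpus=4, alloc=fake_alloc)
workers = [threading.Thread(target=pool.require) for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(allocated))  # 4: one transform per CPU, allocated only once
```

Doing all allocation in `require()` keeps the sleeping part on the setsockopt path. The packet path then only indexes into an already-built array.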
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 include/net/tcp_authopt.h |  16 ++++
 net/ipv4/tcp_authopt.c    | 166 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 182 insertions(+)
diff --git a/include/net/tcp_authopt.h b/include/net/tcp_authopt.h index 42ad764e98c2..5217b6c7c900 100644 --- a/include/net/tcp_authopt.h +++ b/include/net/tcp_authopt.h @@ -2,10 +2,24 @@ #ifndef _LINUX_TCP_AUTHOPT_H #define _LINUX_TCP_AUTHOPT_H
#include <uapi/linux/tcp.h>
+/* According to RFC5925 the length of the authentication option varies based on + * the signature algorithm. Linux only implements the algorithms defined in + * RFC5926 which have a constant length of 16. + * + * This is used in stack allocation of tcp option buffers for output. It is + * shorter than the length of the MD5 option. + * + * Input packets can have authentication options of different lengths but they + * will always be flagged as invalid (since no such algorithms are supported). + */ +#define TCPOLEN_AUTHOPT_OUTPUT 16 + +struct tcp_authopt_alg_imp; + /** * struct tcp_authopt_key_info - Representation of a Master Key Tuple as per RFC5925 * * Key structure lifetime is only protected by RCU so readers needs to hold a * single rcu_read_lock until they're done with the key. @@ -27,10 +41,12 @@ struct tcp_authopt_key_info { u8 keylen; /** @key: Same as &tcp_authopt_key.key */ u8 key[TCP_AUTHOPT_MAXKEYLEN]; /** @addr: Same as &tcp_authopt_key.addr */ struct sockaddr_storage addr; + /** @alg: Algorithm implementation matching alg_id */ + struct tcp_authopt_alg_imp *alg; };
/** * struct tcp_authopt_info - Per-socket information regarding tcp_authopt * diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index c412a712f229..5455a9ecfe6b 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -3,10 +3,164 @@ #include <linux/kernel.h> #include <net/tcp.h> #include <net/tcp_authopt.h> #include <crypto/hash.h>
+/* All current algorithms have a mac length of 12 but crypto API digestsize can be larger */ +#define TCP_AUTHOPT_MAXMACBUF 20 +#define TCP_AUTHOPT_MAX_TRAFFIC_KEY_LEN 20 +#define TCP_AUTHOPT_MACLEN 12 + +/* Constant data with per-algorithm information from RFC5926 + * The "KDF" and "MAC" happen to be the same for both algorithms. + */ +struct tcp_authopt_alg_imp { + /* Name of algorithm in crypto-api */ + const char *alg_name; + /* One of the TCP_AUTHOPT_ALG_* constants from uapi */ + u8 alg_id; + /* Length of traffic key */ + u8 traffic_key_len; + + /* shared crypto_shash */ + struct mutex init_mutex; + bool init_done; + struct crypto_shash * __percpu *tfms; +}; + +static struct tcp_authopt_alg_imp tcp_authopt_alg_list[] = { + { + .alg_id = TCP_AUTHOPT_ALG_HMAC_SHA_1_96, + .alg_name = "hmac(sha1)", + .traffic_key_len = 20, + .init_mutex = __MUTEX_INITIALIZER(tcp_authopt_alg_list[0].init_mutex), + }, + { + .alg_id = TCP_AUTHOPT_ALG_AES_128_CMAC_96, + .alg_name = "cmac(aes)", + .traffic_key_len = 16, + .init_mutex = __MUTEX_INITIALIZER(tcp_authopt_alg_list[1].init_mutex), + }, +}; + +/* get a pointer to the tcp_authopt_alg instance or NULL if id invalid */ +static inline struct tcp_authopt_alg_imp *tcp_authopt_alg_get(int alg_num) +{ + if (alg_num <= 0 || alg_num > 2) + return NULL; + return &tcp_authopt_alg_list[alg_num - 1]; +} + +static void __tcp_authopt_alg_free(struct tcp_authopt_alg_imp *alg) +{ + int cpu; + struct crypto_shash *tfm; + + if (!alg->tfms) + return; + for_each_possible_cpu(cpu) { + tfm = *per_cpu_ptr(alg->tfms, cpu); + if (tfm) { + crypto_free_shash(tfm); + *per_cpu_ptr(alg->tfms, cpu) = NULL; + } + } + free_percpu(alg->tfms); + alg->tfms = NULL; +} + +static int __tcp_authopt_alg_init(struct tcp_authopt_alg_imp *alg) +{ + struct crypto_shash *tfm; + int cpu; + int err; + + BUILD_BUG_ON(TCP_AUTHOPT_MAXMACBUF < TCPOLEN_AUTHOPT_OUTPUT); + if (WARN_ON_ONCE(alg->traffic_key_len > TCP_AUTHOPT_MAX_TRAFFIC_KEY_LEN)) + return -ENOBUFS; + + alg->tfms = 
alloc_percpu(struct crypto_shash *); + if (!alg->tfms) + return -ENOMEM; + for_each_possible_cpu(cpu) { + tfm = crypto_alloc_shash(alg->alg_name, 0, 0); + if (IS_ERR(tfm)) { + err = PTR_ERR(tfm); + goto out_err; + } + + /* sanity checks: */ + if (WARN_ON_ONCE(crypto_shash_digestsize(tfm) != alg->traffic_key_len)) { + err = -EINVAL; + goto out_err; + } + if (WARN_ON_ONCE(crypto_shash_digestsize(tfm) > TCP_AUTHOPT_MAXMACBUF)) { + err = -EINVAL; + goto out_err; + } + + *per_cpu_ptr(alg->tfms, cpu) = tfm; + } + return 0; + +out_err: + __tcp_authopt_alg_free(alg); + return err; +} + +static int tcp_authopt_alg_require(struct tcp_authopt_alg_imp *alg) +{ + int err = 0; + + mutex_lock(&alg->init_mutex); + if (alg->init_done) + goto out; + err = __tcp_authopt_alg_init(alg); + if (err) + goto out; + pr_info("initialized tcp-ao algorithm %s", alg->alg_name); + alg->init_done = true; + +out: + mutex_unlock(&alg->init_mutex); + return err; +} + +static struct crypto_shash *tcp_authopt_alg_get_tfm(struct tcp_authopt_alg_imp *alg) +{ + preempt_disable(); + return *this_cpu_ptr(alg->tfms); +} + +static void tcp_authopt_alg_put_tfm(struct tcp_authopt_alg_imp *alg, struct crypto_shash *tfm) +{ + WARN_ON(tfm != *this_cpu_ptr(alg->tfms)); + preempt_enable(); +} + +static struct crypto_shash *tcp_authopt_get_kdf_shash(struct tcp_authopt_key_info *key) +{ + return tcp_authopt_alg_get_tfm(key->alg); +} + +static void tcp_authopt_put_kdf_shash(struct tcp_authopt_key_info *key, + struct crypto_shash *tfm) +{ + return tcp_authopt_alg_put_tfm(key->alg, tfm); +} + +static struct crypto_shash *tcp_authopt_get_mac_shash(struct tcp_authopt_key_info *key) +{ + return tcp_authopt_alg_get_tfm(key->alg); +} + +static void tcp_authopt_put_mac_shash(struct tcp_authopt_key_info *key, + struct crypto_shash *tfm) +{ + return tcp_authopt_alg_put_tfm(key->alg, tfm); +} + /* checks that ipv4 or ipv6 addr matches. 
*/ static bool ipvx_addr_match(struct sockaddr_storage *a1, struct sockaddr_storage *a2) { if (a1->ss_family != a2->ss_family) @@ -202,10 +356,11 @@ void tcp_authopt_clear(struct sock *sk) int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) { struct tcp_authopt_key opt; struct tcp_authopt_info *info; struct tcp_authopt_key_info *key_info, *old_key_info; + struct tcp_authopt_alg_imp *alg; int err;
sock_owned_by_me(sk);
err = _copy_from_sockptr_tolerant((u8 *)&opt, sizeof(opt), optval, optlen); @@ -239,10 +394,20 @@ int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) /* Initialize tcp_authopt_info if not already set */ info = __tcp_authopt_info_get_or_create(sk); if (IS_ERR(info)) return PTR_ERR(info);
+ /* check the algorithm */ + alg = tcp_authopt_alg_get(opt.alg); + if (!alg) + return -EINVAL; + if (WARN_ON_ONCE(alg->alg_id != opt.alg)) + return -EINVAL; + err = tcp_authopt_alg_require(alg); + if (err) + return err; + key_info = sock_kmalloc(sk, sizeof(*key_info), GFP_KERNEL | __GFP_ZERO); if (!key_info) return -ENOMEM; /* If an old key exists with exact ID then remove and replace. * RCU-protected readers might observe both and pick any. @@ -252,10 +417,11 @@ int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) tcp_authopt_key_del(sk, info, old_key_info); key_info->flags = opt.flags & TCP_AUTHOPT_KEY_KNOWN_FLAGS; key_info->send_id = opt.send_id; key_info->recv_id = opt.recv_id; key_info->alg_id = opt.alg; + key_info->alg = alg; key_info->keylen = opt.keylen; memcpy(key_info->key, opt.key, opt.keylen); memcpy(&key_info->addr, &opt.addr, sizeof(key_info->addr)); hlist_add_head_rcu(&key_info->node, &info->head);
Computing TCP authopt packet signatures is a two-step process:
* the traffic key is computed from the TCP 4-tuple, the initial sequence numbers and the secret key.
* the packet MAC is computed from the traffic key and the content of each individual packet.
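The first step can be sketched in Python for the HMAC-SHA-1 case. It assumes IPv4 only and feeds the fields in the order used by `tcp_authopt_shash_traffic_key()` in this series. The function name is hypothetical, and the sketch has not been checked against published test vectors. AES-128-CMAC-96 is left out because Python's standard library has no CMAC.

```python
import hashlib
import hmac
import socket
import struct


def traffic_key_hmac_sha1(master_key: bytes, saddr: str, daddr: str,
                          sport: int, dport: int, sisn: int, disn: int) -> bytes:
    """Derive an HMAC-SHA-1 TCP-AO traffic key for an IPv4 connection.

    Input order: 0x01 || "TCP-AO" || saddr || daddr || sport || dport
    || sisn || disn || output length in bits (160, big-endian u16).
    """
    context = (
        socket.inet_aton(saddr)
        + socket.inet_aton(daddr)
        + struct.pack("!HHII", sport, dport, sisn, disn)
    )
    kdf_input = b"\x01TCP-AO" + context + struct.pack("!H", 160)
    # For HMAC-SHA-1 the master key is used directly as the HMAC key.
    return hmac.new(master_key, kdf_input, hashlib.sha1).digest()


# For a SYN the destination ISN is not known yet, so it is 0.
syn_key = traffic_key_hmac_sha1(b"secret", "10.0.0.1", "10.0.0.2", 40000, 179, 1000, 0)
est_key = traffic_key_hmac_sha1(b"secret", "10.0.0.1", "10.0.0.2", 40000, 179, 1000, 5000)
print(len(syn_key))        # 20
print(syn_key != est_key)  # True: the key depends on both ISNs
```

This dependence on the ISNs is why `tcp_authopt_get_isn()` special-cases SYN (destination ISN 0) and SYN/ACK (destination ISN taken as `ack_seq - 1`).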
The traffic key could be cached for established sockets but currently is not.
A single code path handles both IPv4/IPv6 and input/output. This keeps the code short but makes it slightly slower due to the many conditionals.
On output we read the local and remote IP addresses from socket members; we can't use the skb network header because it is built after the TCP options.
On input we read the IP addresses from the skb network header; we can't use the socket binding members because those are not available for SYN.
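The second step can be sketched in Python for HMAC-SHA-1 over IPv4. The caller supplies the addresses, taking them from the received IP header on input and from the socket on output, as described above. The function name is hypothetical. The hashed sequence (SNE, IPv4 pseudo-header, TCP segment with zeroed checksum and MAC, truncation to 12 bytes) follows `tcp_authopt_hash_packet()` and `TCP_AUTHOPT_MACLEN` in this series.

```python
import hashlib
import hmac
import socket
import struct

MACLEN = 12  # TCP_AUTHOPT_MACLEN: both RFC5926 algorithms truncate to 96 bits


def packet_mac_hmac_sha1(traffic_key: bytes, saddr: str, daddr: str,
                         tcp_segment: bytes, sne: int = 0) -> bytes:
    """Truncated HMAC-SHA-1 MAC over an IPv4 TCP segment.

    tcp_segment is the TCP header, options and payload, with the checksum
    and the MAC bytes inside the TCP-AO option already set to zero.
    """
    pseudo = (
        socket.inet_aton(saddr)
        + socket.inet_aton(daddr)
        + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(tcp_segment))
    )
    # SNE is hashed first; at this point in the series it is always 0.
    data = struct.pack("!I", sne) + pseudo + tcp_segment
    return hmac.new(traffic_key, data, hashlib.sha1).digest()[:MACLEN]


traffic_key = bytes(20)         # placeholder; normally produced by the KDF step
segment = bytes(20) + b"hello"  # placeholder segment; a real one carries the TCP-AO option
mac = packet_mac_hmac_sha1(traffic_key, "10.0.0.1", "10.0.0.2", segment)
print(len(mac))  # 12
```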
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 net/ipv4/tcp_authopt.c | 510 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 510 insertions(+)
diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index 5455a9ecfe6b..1fd98d67ec10 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -425,5 +425,515 @@ int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) memcpy(&key_info->addr, &opt.addr, sizeof(key_info->addr)); hlist_add_head_rcu(&key_info->node, &info->head);
return 0; } + +static int tcp_authopt_get_isn(struct sock *sk, + struct sk_buff *skb, + int input, + __be32 *sisn, + __be32 *disn) +{ + struct tcp_authopt_info *authopt_info; + struct tcphdr *th = tcp_hdr(skb); + + /* special cases for SYN and SYN/ACK */ + if (th->syn && !th->ack) { + *sisn = th->seq; + *disn = 0; + return 0; + } + if (th->syn && th->ack) { + *sisn = th->seq; + *disn = htonl(ntohl(th->ack_seq) - 1); + return 0; + } + + /* Fetching authopt_info like this should be safe because authopt_info + * is never released until the socket is closed + * + * tcp_timewait_sock is handled but not tcp_request_sock. + * for the synack case sk should be the listen socket. + */ + rcu_read_lock(); + if (unlikely(sk->sk_state == TCP_NEW_SYN_RECV)) { + /* should never happen, sk should be the listen socket */ + rcu_read_unlock(); + WARN_ONCE(1, "TCP-AO can't sign with request sock\n"); + return -EINVAL; + } else if (sk->sk_state == TCP_LISTEN) { + /* Signature computation for non-syn packet on a listen + * socket is not possible because we lack the initial + * sequence numbers. + * + * Input segments that are not matched by any request, + * established or timewait socket will get here. These + * are not normally sent by peers. + * + * Their signature might be valid but we don't have + * enough state to determine that. TCP-MD5 can attempt + * to validate and reply with a signed RST because it + * doesn't care about ISNs. + * + * Reporting an error from signature code causes the + * packet to be discarded which is good. + */ + if (input) { + /* Assume this is an ACK to a SYN/ACK + * This will incorrectly report "failed + * signature" for segments without a connection. + */ + *sisn = htonl(ntohl(th->seq) - 1); + *disn = htonl(ntohl(th->ack_seq) - 1); + rcu_read_unlock(); + return 0; + } + /* This would be an internal bug.
*/ + rcu_read_unlock(); + WARN_ONCE(1, "TCP-AO can't sign non-syn from TCP_LISTEN sock\n"); + return -EINVAL; + } else if (sk->sk_state == TCP_TIME_WAIT) { + authopt_info = tcp_twsk(sk)->tw_authopt_info; + } else { + authopt_info = rcu_dereference(tcp_sk(sk)->authopt_info); + } + if (!authopt_info) { + rcu_read_unlock(); + return -EINVAL; + } + /* Initial sequence numbers for ESTABLISHED connections from info */ + if (input) { + *sisn = htonl(authopt_info->dst_isn); + *disn = htonl(authopt_info->src_isn); + } else { + *sisn = htonl(authopt_info->src_isn); + *disn = htonl(authopt_info->dst_isn); + } + rcu_read_unlock(); + return 0; +} + +/* feed traffic key into shash */ +static int tcp_authopt_shash_traffic_key(struct shash_desc *desc, + struct sock *sk, + struct sk_buff *skb, + bool input, + bool ipv6) +{ + struct tcphdr *th = tcp_hdr(skb); + int err; + __be32 sisn, disn; + __be16 digestbits = htons(crypto_shash_digestsize(desc->tfm) * 8); + + // RFC5926 section 3.1.1.1 + err = crypto_shash_update(desc, "\x01TCP-AO", 7); + if (err) + return err; + + /* Addresses from packet on input and from sk_common on output + * This is because on output MAC is computed before prepending IP header + */ + if (input) { + if (ipv6) + err = crypto_shash_update(desc, (u8 *)&ipv6_hdr(skb)->saddr, 32); + else + err = crypto_shash_update(desc, (u8 *)&ip_hdr(skb)->saddr, 8); + if (err) + return err; + } else { + if (ipv6) { + err = crypto_shash_update(desc, (u8 *)&sk->sk_v6_rcv_saddr, 16); + if (err) + return err; + err = crypto_shash_update(desc, (u8 *)&sk->sk_v6_daddr, 16); + if (err) + return err; + } else { + err = crypto_shash_update(desc, (u8 *)&sk->sk_rcv_saddr, 4); + if (err) + return err; + err = crypto_shash_update(desc, (u8 *)&sk->sk_daddr, 4); + if (err) + return err; + } + } + + /* TCP ports from header */ + err = crypto_shash_update(desc, (u8 *)&th->source, 4); + if (err) + return err; + err = tcp_authopt_get_isn(sk, skb, input, &sisn, &disn); + if (err) + return err; +
err = crypto_shash_update(desc, (u8 *)&sisn, 4); + if (err) + return err; + err = crypto_shash_update(desc, (u8 *)&disn, 4); + if (err) + return err; + err = crypto_shash_update(desc, (u8 *)&digestbits, 2); + if (err) + return err; + + return 0; +} + +/* Convert a variable-length key to a 16-byte fixed-length key for AES-CMAC + * This is described in RFC5926 section 3.1.1.2 + */ +static int aes_setkey_derived(struct crypto_shash *tfm, u8 *key, size_t keylen) +{ + static const u8 zeros[16] = {0}; + u8 derived_key[16]; + int err; + + if (WARN_ON_ONCE(crypto_shash_digestsize(tfm) != 16)) + return -EINVAL; + err = crypto_shash_setkey(tfm, zeros, sizeof(zeros)); + if (err) + return err; + err = crypto_shash_tfm_digest(tfm, key, keylen, derived_key); + if (err) + return err; + return crypto_shash_setkey(tfm, derived_key, sizeof(derived_key)); +} + +static int tcp_authopt_setkey(struct crypto_shash *tfm, struct tcp_authopt_key_info *key) +{ + if (key->alg_id == TCP_AUTHOPT_ALG_AES_128_CMAC_96 && key->keylen != 16) + return aes_setkey_derived(tfm, key->key, key->keylen); + else + return crypto_shash_setkey(tfm, key->key, key->keylen); +} + +static int tcp_authopt_get_traffic_key(struct sock *sk, + struct sk_buff *skb, + struct tcp_authopt_key_info *key, + bool input, + bool ipv6, + u8 *traffic_key) +{ + SHASH_DESC_ON_STACK(desc, kdf_tfm); + struct crypto_shash *kdf_tfm; + int err; + + kdf_tfm = tcp_authopt_get_kdf_shash(key); + if (IS_ERR(kdf_tfm)) + return PTR_ERR(kdf_tfm); + + err = tcp_authopt_setkey(kdf_tfm, key); + if (err) + goto out; + + desc->tfm = kdf_tfm; + err = crypto_shash_init(desc); + if (err) + goto out; + + err = tcp_authopt_shash_traffic_key(desc, sk, skb, input, ipv6); + if (err) + goto out; + + err = crypto_shash_final(desc, traffic_key); + if (err) + goto out; + +out: + tcp_authopt_put_kdf_shash(key, kdf_tfm); + return err; +} + +static int crypto_shash_update_zero(struct shash_desc *desc, int len) +{ + u8 zero = 0; + int i, err; + + for (i = 0; i < 
len; ++i) { + err = crypto_shash_update(desc, &zero, 1); + if (err) + return err; + } + + return 0; +} + +static int tcp_authopt_hash_tcp4_pseudoheader(struct shash_desc *desc, + __be32 saddr, + __be32 daddr, + int nbytes) +{ + struct tcp4_pseudohdr phdr = { + .saddr = saddr, + .daddr = daddr, + .pad = 0, + .protocol = IPPROTO_TCP, + .len = htons(nbytes) + }; + return crypto_shash_update(desc, (u8 *)&phdr, sizeof(phdr)); +} + +static int tcp_authopt_hash_tcp6_pseudoheader(struct shash_desc *desc, + struct in6_addr *saddr, + struct in6_addr *daddr, + u32 plen) +{ + int err; + __be32 buf[2]; + + buf[0] = htonl(plen); + buf[1] = htonl(IPPROTO_TCP); + + err = crypto_shash_update(desc, (u8 *)saddr, sizeof(*saddr)); + if (err) + return err; + err = crypto_shash_update(desc, (u8 *)daddr, sizeof(*daddr)); + if (err) + return err; + return crypto_shash_update(desc, (u8 *)&buf, sizeof(buf)); +} + +/* TCP authopt as found in header */ +struct tcphdr_authopt { + u8 num; + u8 len; + u8 keyid; + u8 rnextkeyid; + u8 mac[0]; +}; + +/* Find TCP_AUTHOPT in header. + * + * Returns pointer to TCP_AUTHOPT or NULL if not found. + */ +static u8 *tcp_authopt_find_option(struct tcphdr *th) +{ + int length = (th->doff << 2) - sizeof(*th); + u8 *ptr = (u8 *)(th + 1); + + while (length >= 2) { + int opcode = *ptr++; + int opsize; + + switch (opcode) { + case TCPOPT_EOL: + return NULL; + case TCPOPT_NOP: + length--; + continue; + default: + if (length < 2) + return NULL; + opsize = *ptr++; + if (opsize < 2) + return NULL; + if (opsize > length) + return NULL; + if (opcode == TCPOPT_AUTHOPT) + return ptr - 2; + } + ptr += opsize - 2; + length -= opsize; + } + return NULL; +} + +/** Hash tcphdr options. + * If include_options is false then only the TCPOPT_AUTHOPT option itself is hashed + * Maybe we could skip option parsing by assuming the AUTHOPT header is at hash_location-4? 
+ */ +static int tcp_authopt_hash_opts(struct shash_desc *desc, + struct tcphdr *th, + bool include_options) +{ + int err; + /* start of options */ + u8 *tcp_opts = (u8 *)(th + 1); + /* end of options */ + u8 *tcp_data = ((u8 *)th) + th->doff * 4; + /* pointer to TCPOPT_AUTHOPT */ + u8 *authopt_ptr = tcp_authopt_find_option(th); + u8 authopt_len; + + if (!authopt_ptr) + return -EINVAL; + authopt_len = *(authopt_ptr + 1); + + if (include_options) { + err = crypto_shash_update(desc, tcp_opts, authopt_ptr - tcp_opts + 4); + if (err) + return err; + err = crypto_shash_update_zero(desc, authopt_len - 4); + if (err) + return err; + err = crypto_shash_update(desc, + authopt_ptr + authopt_len, + tcp_data - (authopt_ptr + authopt_len)); + if (err) + return err; + } else { + err = crypto_shash_update(desc, authopt_ptr, 4); + if (err) + return err; + err = crypto_shash_update_zero(desc, authopt_len - 4); + if (err) + return err; + } + + return 0; +} + +static int skb_shash_frags(struct shash_desc *desc, + struct sk_buff *skb) +{ + struct sk_buff *frag_iter; + int err, i; + + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { + skb_frag_t *f = &skb_shinfo(skb)->frags[i]; + u32 p_off, p_len, copied; + struct page *p; + u8 *vaddr; + + skb_frag_foreach_page(f, skb_frag_off(f), skb_frag_size(f), + p, p_off, p_len, copied) { + vaddr = kmap_atomic(p); + err = crypto_shash_update(desc, vaddr + p_off, p_len); + kunmap_atomic(vaddr); + if (err) + return err; + } + } + + skb_walk_frags(skb, frag_iter) { + err = skb_shash_frags(desc, frag_iter); + if (err) + return err; + } + + return 0; +} + +static int tcp_authopt_hash_packet(struct crypto_shash *tfm, + struct sock *sk, + struct sk_buff *skb, + bool input, + bool ipv6, + bool include_options, + u8 *macbuf) +{ + struct tcphdr *th = tcp_hdr(skb); + SHASH_DESC_ON_STACK(desc, tfm); + int err; + + /* NOTE: SNE unimplemented */ + __be32 sne = 0; + + desc->tfm = tfm; + err = crypto_shash_init(desc); + if (err) + return err; + + err = 
crypto_shash_update(desc, (u8 *)&sne, 4); + if (err) + return err; + + if (ipv6) { + struct in6_addr *saddr; + struct in6_addr *daddr; + + if (input) { + saddr = &ipv6_hdr(skb)->saddr; + daddr = &ipv6_hdr(skb)->daddr; + } else { + saddr = &sk->sk_v6_rcv_saddr; + daddr = &sk->sk_v6_daddr; + } + err = tcp_authopt_hash_tcp6_pseudoheader(desc, saddr, daddr, skb->len); + if (err) + return err; + } else { + __be32 saddr; + __be32 daddr; + + if (input) { + saddr = ip_hdr(skb)->saddr; + daddr = ip_hdr(skb)->daddr; + } else { + saddr = sk->sk_rcv_saddr; + daddr = sk->sk_daddr; + } + err = tcp_authopt_hash_tcp4_pseudoheader(desc, saddr, daddr, skb->len); + if (err) + return err; + } + + // TCP header with checksum set to zero + { + struct tcphdr hashed_th = *th; + + hashed_th.check = 0; + err = crypto_shash_update(desc, (u8 *)&hashed_th, sizeof(hashed_th)); + if (err) + return err; + } + + // TCP options + err = tcp_authopt_hash_opts(desc, th, include_options); + if (err) + return err; + + // Rest of SKB->data + err = crypto_shash_update(desc, (u8 *)th + th->doff * 4, skb_headlen(skb) - th->doff * 4); + if (err) + return err; + + err = skb_shash_frags(desc, skb); + if (err) + return err; + + return crypto_shash_final(desc, macbuf); +} + +/* __tcp_authopt_calc_mac - Compute packet MAC using key + * + * The macbuf output buffer must be large enough to fit the digestsize of the + * underlying transform before truncation. 
+ * This means TCP_AUTHOPT_MAXMACBUF, not TCP_AUTHOPT_MACLEN + */ +static int __tcp_authopt_calc_mac(struct sock *sk, + struct sk_buff *skb, + struct tcp_authopt_key_info *key, + bool input, + char *macbuf) +{ + struct crypto_shash *mac_tfm; + u8 traffic_key[TCP_AUTHOPT_MAX_TRAFFIC_KEY_LEN]; + int err; + bool ipv6 = (sk->sk_family != AF_INET); + + if (sk->sk_family != AF_INET && sk->sk_family != AF_INET6) + return -EINVAL; + + err = tcp_authopt_get_traffic_key(sk, skb, key, input, ipv6, traffic_key); + if (err) + return err; + + mac_tfm = tcp_authopt_get_mac_shash(key); + if (IS_ERR(mac_tfm)) + return PTR_ERR(mac_tfm); + err = crypto_shash_setkey(mac_tfm, traffic_key, key->alg->traffic_key_len); + if (err) + goto out; + + err = tcp_authopt_hash_packet(mac_tfm, + sk, + skb, + input, + ipv6, + !(key->flags & TCP_AUTHOPT_KEY_EXCLUDE_OPTS), + macbuf); + +out: + tcp_authopt_put_mac_shash(key, mac_tfm); + return err; +}
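For readers following the MAC computation above, here is a minimal userspace sketch of the byte string that tcp_authopt_hash_packet() feeds into the MAC for IPv4. It follows the same order: SNE, pseudo-header, TCP header with a zero checksum, options with the MAC field zeroed, then payload. The helper names are hypothetical. SNE is fixed at zero, as in this version of the patch. The caller is assumed to have already located the AO option (ao_off, an offset from the start of the TCP header).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append n bytes from src (or n zero bytes if src is NULL) at *pos. */
static bool ao_put(uint8_t *out, size_t cap, size_t *pos,
		   const void *src, size_t n)
{
	if (*pos + n > cap)
		return false;
	if (src)
		memcpy(out + *pos, src, n);
	else
		memset(out + *pos, 0, n);
	*pos += n;
	return true;
}

/*
 * Hypothetical helper: build the IPv4 MAC input for a TCP segment.
 * seg/seg_len: TCP header + payload as on the wire.
 * saddr/daddr: IPv4 addresses in network byte order.
 * Returns the number of bytes written to out, or 0 on error.
 */
size_t ao_mac_input_v4(const uint8_t *seg, size_t seg_len,
		       const uint8_t saddr[4], const uint8_t daddr[4],
		       size_t ao_off, bool include_options,
		       uint8_t *out, size_t cap)
{
	uint8_t sne[4] = { 0 };	/* SNE not implemented in this version */
	uint8_t pseudo[12];
	uint8_t th[20];
	size_t pos = 0, doff, ao_len;

	if (seg_len < 20 || seg_len > 0xffff)
		return 0;
	doff = (size_t)(seg[12] >> 4) * 4;	/* data offset in bytes */
	if (doff > seg_len || ao_off < 20 || ao_off + 4 > doff)
		return 0;
	if (seg[ao_off] != 29)			/* TCP-AO option kind */
		return 0;
	ao_len = seg[ao_off + 1];
	if (ao_len < 4 || ao_off + ao_len > doff)
		return 0;

	/* IPv4 pseudo-header: saddr, daddr, zero, protocol, TCP length */
	memcpy(pseudo, saddr, 4);
	memcpy(pseudo + 4, daddr, 4);
	pseudo[8] = 0;
	pseudo[9] = 6;				/* IPPROTO_TCP */
	pseudo[10] = (uint8_t)(seg_len >> 8);
	pseudo[11] = (uint8_t)seg_len;

	/* Fixed TCP header with the checksum treated as zero */
	memcpy(th, seg, sizeof(th));
	th[16] = 0;
	th[17] = 0;

	if (!ao_put(out, cap, &pos, sne, sizeof(sne)) ||
	    !ao_put(out, cap, &pos, pseudo, sizeof(pseudo)) ||
	    !ao_put(out, cap, &pos, th, sizeof(th)))
		return 0;

	if (include_options) {
		/* options before AO plus the 4-byte AO header, zeroed MAC,
		 * then any options after AO */
		if (!ao_put(out, cap, &pos, seg + 20, ao_off + 4 - 20) ||
		    !ao_put(out, cap, &pos, NULL, ao_len - 4) ||
		    !ao_put(out, cap, &pos, seg + ao_off + ao_len,
			    doff - (ao_off + ao_len)))
			return 0;
	} else {
		/* only the AO option itself, with the MAC zeroed */
		if (!ao_put(out, cap, &pos, seg + ao_off, 4) ||
		    !ao_put(out, cap, &pos, NULL, ao_len - 4))
			return 0;
	}

	/* payload */
	if (!ao_put(out, cap, &pos, seg + doff, seg_len - doff))
		return 0;
	return pos;
}
```

The sketch does not show the last step. Keying HMAC-SHA1 or AES-CMAC with the traffic key, hashing this buffer and keeping the first 12 bytes corresponds to what the patch copies into the option.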
On 11/1/21 16:34, Leonard Crestez wrote: [..]
+static int skb_shash_frags(struct shash_desc *desc,
+			   struct sk_buff *skb)
+{
+	struct sk_buff *frag_iter;
+	int err, i;
+
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		skb_frag_t *f = &skb_shinfo(skb)->frags[i];
+		u32 p_off, p_len, copied;
+		struct page *p;
+		u8 *vaddr;
+
+		skb_frag_foreach_page(f, skb_frag_off(f), skb_frag_size(f),
+				      p, p_off, p_len, copied) {
+			vaddr = kmap_atomic(p);
+			err = crypto_shash_update(desc, vaddr + p_off, p_len);
+			kunmap_atomic(vaddr);
+			if (err)
+				return err;
+		}
+	}
+
+	skb_walk_frags(skb, frag_iter) {
+		err = skb_shash_frags(desc, frag_iter);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
This seems quite sub-optimal: IIUC, shash should only be used for small amount of hashing. That's why tcp-md5 uses ahash with scatterlists. Which drives me to the question: why not reuse tcp_md5sig_pool code?
And it seems that you can avoid TCP_AUTHOPT_ALG_* enum and just supply to crypto the string from socket option (like xfrm does).
Here is my idea: https://lore.kernel.org/all/20211105014953.972946-6-dima@arista.com/T/#u
Thanks, Dmitry
On 11/5/21 3:53 AM, Dmitry Safonov wrote:
On 11/1/21 16:34, Leonard Crestez wrote: [..]
+static int skb_shash_frags(struct shash_desc *desc,
+			   struct sk_buff *skb)
+{
+	struct sk_buff *frag_iter;
+	int err, i;
+
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		skb_frag_t *f = &skb_shinfo(skb)->frags[i];
+		u32 p_off, p_len, copied;
+		struct page *p;
+		u8 *vaddr;
+
+		skb_frag_foreach_page(f, skb_frag_off(f), skb_frag_size(f),
+				      p, p_off, p_len, copied) {
+			vaddr = kmap_atomic(p);
+			err = crypto_shash_update(desc, vaddr + p_off, p_len);
+			kunmap_atomic(vaddr);
+			if (err)
+				return err;
+		}
+	}
+
+	skb_walk_frags(skb, frag_iter) {
+		err = skb_shash_frags(desc, frag_iter);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
This seems quite sub-optimal: IIUC, shash should only be used for small amount of hashing. That's why tcp-md5 uses ahash with scatterlists.
There is indeed no good reason to prefer shash over ahash. Despite the "async" in its name, ahash can be used in atomic context.
Which drives me to the question: why not reuse tcp_md5sig_pool code?
And it seems that you can avoid TCP_AUTHOPT_ALG_* enum and just supply to crypto the string from socket option (like xfrm does).
Here is my idea: https://lore.kernel.org/all/20211105014953.972946-6-dima@arista.com/T/#u
Making the md5 pool more generic and reusing it can work.
This "pool" mechanism is really just a workaround for the crypto API not supporting the allocation of a hash in softirq context. It would make a lot of sense for this functionality to be part of the crypto layer itself.
Looking at your generic tcp_sig_crypto, there is nothing actually specific to TCP in there: it's just an ahash and a per-CPU scratch buffer.
I don't understand the interest in using arbitrary crypto algorithms beyond RFC5926; this series is already complex enough. Besides complicating crypto allocation, there are various stack allocations which would need to be sized up to the maximum length of the TCP options area.
-- Regards, Leonard
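The sketch below makes the stack-buffer concern concrete. It lists the two RFC5926 algorithms with their key and MAC sizes and looks them up by a kernel crypto API name, in the spirit of Dmitry's suggestion to pass a string from the socket option. The table layout and function names are hypothetical. "hmac(sha1)" and "cmac(aes)" are the usual kernel transform names, but this series does not necessarily use them. Only names in the table are accepted, so fixed 20-byte buffers like TCP_AUTHOPT_MAX_TRAFFIC_KEY_LEN and TCP_AUTHOPT_MAXMACBUF stay large enough.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct ao_alg_params {
	const char *rfc_name;		/* RFC5926 name */
	const char *crypto_name;	/* kernel crypto API transform name */
	size_t traffic_key_len;		/* KDF output length, bytes */
	size_t digest_len;		/* untruncated MAC output, bytes */
	size_t mac_len;			/* truncated MAC placed in the option */
};

static const struct ao_alg_params ao_algs[] = {
	{ "HMAC-SHA-1-96",   "hmac(sha1)", 20, 20, 12 },
	{ "AES-128-CMAC-96", "cmac(aes)",  16, 16, 12 },
};

#define AO_NUM_ALGS (sizeof(ao_algs) / sizeof(ao_algs[0]))

/* Mirrors TCP_AUTHOPT_MAX_TRAFFIC_KEY_LEN and TCP_AUTHOPT_MAXMACBUF */
#define AO_MAX_TRAFFIC_KEY_LEN	20
#define AO_MAXMACBUF		20

/* Unknown names are rejected instead of being passed to the crypto API. */
const struct ao_alg_params *ao_alg_lookup(const char *crypto_name)
{
	for (size_t i = 0; i < AO_NUM_ALGS; i++)
		if (strcmp(ao_algs[i].crypto_name, crypto_name) == 0)
			return &ao_algs[i];
	return NULL;
}

/*
 * Returns 1 if every supported algorithm fits the fixed buffers.
 * An arbitrary name such as "hmac(sha256)" (32-byte digest) would not.
 */
int ao_algs_fit_buffers(void)
{
	for (size_t i = 0; i < AO_NUM_ALGS; i++) {
		const struct ao_alg_params *a = &ao_algs[i];

		if (a->traffic_key_len > AO_MAX_TRAFFIC_KEY_LEN ||
		    a->digest_len > AO_MAXMACBUF ||
		    a->mac_len > a->digest_len)
			return 0;
	}
	return 1;
}
```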
On 11/1/21 16:34, Leonard Crestez wrote: [..]
+/* Find TCP_AUTHOPT in header.
+ *
+ * Returns pointer to TCP_AUTHOPT or NULL if not found.
+ */
+static u8 *tcp_authopt_find_option(struct tcphdr *th)
+{
+	int length = (th->doff << 2) - sizeof(*th);
+	u8 *ptr = (u8 *)(th + 1);
+
+	while (length >= 2) {
+		int opcode = *ptr++;
+		int opsize;
+
+		switch (opcode) {
+		case TCPOPT_EOL:
+			return NULL;
+		case TCPOPT_NOP:
+			length--;
+			continue;
+		default:
+			if (length < 2)
+				return NULL;
                ^ never true, as checked by the loop condition
+			opsize = *ptr++;
+			if (opsize < 2)
+				return NULL;
+			if (opsize > length)
+				return NULL;
+			if (opcode == TCPOPT_AUTHOPT)
+				return ptr - 2;
+		}
+		ptr += opsize - 2;
+		length -= opsize;
+	}
+	return NULL;
+}
Why copy'n'paste tcp_parse_md5sig_option() rather than add a new argument to the function?
Thanks, Dmitry
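Here is a sketch of the alternative Dmitry describes. A single option walker takes the option kind as an argument, so TCP-MD5 (kind 19) and TCP-AO (kind 29) lookups share the same code. It works on a plain byte buffer rather than struct tcphdr and drops the redundant length check noted in the review. The name tcp_find_option() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCPOPT_EOL	0
#define TCPOPT_NOP	1
#define TCPOPT_MD5SIG	19
#define TCPOPT_AUTHOPT	29

/*
 * Return a pointer to the first option of the given kind in the options
 * area [opts, opts + len), or NULL if it is absent or the list is malformed.
 */
const uint8_t *tcp_find_option(const uint8_t *opts, int len, uint8_t kind)
{
	while (len >= 2) {
		uint8_t opcode = opts[0];
		uint8_t opsize;

		if (opcode == TCPOPT_EOL)
			return NULL;
		if (opcode == TCPOPT_NOP) {
			opts++;
			len--;
			continue;
		}
		/* len >= 2 here, guaranteed by the loop condition */
		opsize = opts[1];
		if (opsize < 2 || opsize > len)
			return NULL;
		if (opcode == kind)
			return opts;
		opts += opsize;
		len -= opsize;
	}
	return NULL;
}
```

In this sketch, callers would use tcp_find_option(opts, len, TCPOPT_AUTHOPT) or tcp_find_option(opts, len, TCPOPT_MD5SIG). The MD5-specific length check, if wanted, stays in the MD5 caller.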
On 11/5/21 4:08 AM, Dmitry Safonov wrote:
On 11/1/21 16:34, Leonard Crestez wrote: [..]
+/* Find TCP_AUTHOPT in header.
+ *
+ * Returns pointer to TCP_AUTHOPT or NULL if not found.
+ */
+static u8 *tcp_authopt_find_option(struct tcphdr *th)
+{
+	int length = (th->doff << 2) - sizeof(*th);
+	u8 *ptr = (u8 *)(th + 1);
+
+	while (length >= 2) {
+		int opcode = *ptr++;
+		int opsize;
+
+		switch (opcode) {
+		case TCPOPT_EOL:
+			return NULL;
+		case TCPOPT_NOP:
+			length--;
+			continue;
+		default:
+			if (length < 2)
+				return NULL;
                ^ never true, as checked by the loop condition
+			opsize = *ptr++;
+			if (opsize < 2)
+				return NULL;
+			if (opsize > length)
+				return NULL;
+			if (opcode == TCPOPT_AUTHOPT)
+				return ptr - 2;
+		}
+		ptr += opsize - 2;
+		length -= opsize;
+	}
+	return NULL;
+}
Why copy'n'paste tcp_parse_md5sig_option() rather than add a new argument to the function?
No good reason.
There is a requirement in RFC5925 that packets with both AO and MD5 signatures be dropped. This currently works but the implementation is convoluted: after an AO signature is found an error is returned if MD5 is also present.
A better solution would be to do a single scan for both options up front, for example in tcp_{v4,v6}_auth_inbound_check().
-- Regards, Leonard
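The single up-front scan Leonard suggests could look roughly like this. A hypothetical helper walks the options once and records both the MD5 and AO options. It reports when both are present so the caller can silently discard the segment, as RFC5925 section 2.2 requires. The patch treats a malformed option list as "not found"; reporting it as a separate result is a design choice of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCPOPT_EOL	0
#define TCPOPT_NOP	1
#define TCPOPT_MD5SIG	19
#define TCPOPT_AUTHOPT	29

enum tcp_sig_kind {
	TCP_SIG_NONE,		/* neither option present */
	TCP_SIG_MD5,		/* only TCP-MD5 */
	TCP_SIG_AO,		/* only TCP-AO */
	TCP_SIG_BOTH,		/* both present: discard the segment */
	TCP_SIG_MALFORMED,	/* option list could not be parsed */
};

/* One pass over [opts, opts + len); *ao and *md5 receive the first match. */
enum tcp_sig_kind tcp_scan_sig_options(const uint8_t *opts, int len,
				       const uint8_t **ao,
				       const uint8_t **md5)
{
	*ao = NULL;
	*md5 = NULL;

	while (len >= 2) {
		uint8_t opcode = opts[0];
		uint8_t opsize;

		if (opcode == TCPOPT_EOL)
			break;
		if (opcode == TCPOPT_NOP) {
			opts++;
			len--;
			continue;
		}
		opsize = opts[1];
		if (opsize < 2 || opsize > len)
			return TCP_SIG_MALFORMED;
		if (opcode == TCPOPT_AUTHOPT && !*ao)
			*ao = opts;
		else if (opcode == TCPOPT_MD5SIG && !*md5)
			*md5 = opts;
		opts += opsize;
		len -= opsize;
	}

	if (*ao && *md5)
		return TCP_SIG_BOTH;
	if (*ao)
		return TCP_SIG_AO;
	if (*md5)
		return TCP_SIG_MD5;
	return TCP_SIG_NONE;
}
```

With this shape, the inbound check could branch once on the result instead of returning an error from the AO path after a second MD5 lookup.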
Extending these flags using the existing (1 << x) pattern triggers complaints from checkpatch.
Instead of ignoring checkpatch, modify the existing values to use BIT(x) style in a separate commit.
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 net/ipv4/tcp_output.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 6867e5db3e35..96f16386f50e 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -406,17 +406,17 @@ static void tcp_init_nondata_skb(struct sk_buff *skb, u32 seq, u8 flags)
 static inline bool tcp_urg_mode(const struct tcp_sock *tp)
 {
 	return tp->snd_una != tp->snd_up;
 }
 
-#define OPTION_SACK_ADVERTISE	(1 << 0)
-#define OPTION_TS		(1 << 1)
-#define OPTION_MD5		(1 << 2)
-#define OPTION_WSCALE		(1 << 3)
-#define OPTION_FAST_OPEN_COOKIE	(1 << 8)
-#define OPTION_SMC		(1 << 9)
-#define OPTION_MPTCP		(1 << 10)
+#define OPTION_SACK_ADVERTISE	BIT(0)
+#define OPTION_TS		BIT(1)
+#define OPTION_MD5		BIT(2)
+#define OPTION_WSCALE		BIT(3)
+#define OPTION_FAST_OPEN_COOKIE	BIT(8)
+#define OPTION_SMC		BIT(9)
+#define OPTION_MPTCP		BIT(10)
 
 static void smc_options_write(__be32 *ptr, u16 *options)
 {
 #if IS_ENABLED(CONFIG_SMC)
 	if (static_branch_unlikely(&tcp_have_smc)) {
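The following userspace sketch shows what the conversion amounts to. BIT() is redefined here as 1UL shifted left by nr, intended to mirror the kernel macro. OPTION_AUTHOPT BIT(4) is the value that a later patch in this series adds. The _Static_assert checks that all flags still fit in the u16 options field of struct tcp_out_options. The has_option() helper is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's BIT() */
#define BIT(nr) (1UL << (nr))

#define OPTION_SACK_ADVERTISE	BIT(0)
#define OPTION_TS		BIT(1)
#define OPTION_MD5		BIT(2)
#define OPTION_WSCALE		BIT(3)
#define OPTION_AUTHOPT		BIT(4)	/* added later in the series */
#define OPTION_FAST_OPEN_COOKIE	BIT(8)
#define OPTION_SMC		BIT(9)
#define OPTION_MPTCP		BIT(10)

/* struct tcp_out_options keeps these in a u16, so every flag must fit. */
_Static_assert(OPTION_MPTCP <= UINT16_MAX, "OPTION_* must fit in u16");

int has_option(uint16_t options, unsigned long flag)
{
	return (options & flag) != 0;
}
```

The values are unchanged by the conversion. For example, OPTION_MD5 is still 4 and OPTION_MPTCP is still 1024, so the patch should be a pure style change.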
On 11/1/21 10:34 AM, Leonard Crestez wrote:
Extending these flags using the existing (1 << x) pattern triggers complaints from checkpatch.
Instead of ignoring checkpatch, modify the existing values to use BIT(x) style in a separate commit.
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 net/ipv4/tcp_output.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)
This one could be sent outside of this patch set since you are not adding new values. Patch sets > 20 are generally frowned upon; sending this one separately helps get the number down.
On 11/3/21 4:31 AM, David Ahern wrote:
On 11/1/21 10:34 AM, Leonard Crestez wrote:
Extending these flags using the existing (1 << x) pattern triggers complaints from checkpatch.
Instead of ignoring checkpatch, modify the existing values to use BIT(x) style in a separate commit.
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 net/ipv4/tcp_output.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)
This one could be sent outside of this patch set since you are not adding new values. Patch sets > 20 are generally frowned upon; sending this one separately helps get the number down.
In the past I've seen maintainers pick small cleanups and fixes from longer series that otherwise need further discussion.
Not sure if this practice is also common for netdev, so I posted this patch separately.
-- Regards, Leonard
The tcp_authopt feature exposes a minimal interface to the rest of the TCP stack. Only a few functions are exposed, and if the feature is disabled they return neutral values, avoiding ifdefs in the rest of the code.
Add calls into tcp authopt from send, receive and accept code.
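Here is a compilable userspace sketch of that pattern. The socket types and the MD5 check are hypothetical stand-ins. When CONFIG_TCP_AUTHOPT is not defined, tcp_authopt_inbound_check() collapses to an inline stub returning 0. A caller shaped like tcp_v4_auth_inbound_check() needs no #ifdef and simply falls through to the MD5 check.

```c
#include <assert.h>
#include <stddef.h>

/* Leave undefined to build the "feature disabled" variant. */
/* #define CONFIG_TCP_AUTHOPT */

struct sock;		/* opaque stand-ins for the kernel types */
struct sk_buff;

#ifdef CONFIG_TCP_AUTHOPT
/* Real implementation elsewhere: <0 error, 0 no AO, 1 valid AO */
int tcp_authopt_inbound_check(struct sock *sk, struct sk_buff *skb);
#else
static inline int tcp_authopt_inbound_check(struct sock *sk,
					    struct sk_buff *skb)
{
	(void)sk;
	(void)skb;
	return 0;	/* neutral value: "no TCP-AO signature present" */
}
#endif

/* Hypothetical stand-in for the MD5 check; counts how often it runs. */
int md5_calls;

static int md5_inbound_check(struct sock *sk, struct sk_buff *skb)
{
	(void)sk;
	(void)skb;
	md5_calls++;
	return 0;
}

/* Shaped like tcp_v4_auth_inbound_check(): no #ifdef needed here. */
int auth_inbound_check(struct sock *sk, struct sk_buff *skb)
{
	int aoret = tcp_authopt_inbound_check(sk, skb);

	if (aoret < 0)
		return aoret;	/* drop the segment */
	if (aoret > 0)
		return 0;	/* valid AO signature, skip MD5 */
	return md5_inbound_check(sk, skb);
}
```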
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 include/net/tcp_authopt.h |  99 +++++++++++++
 include/uapi/linux/snmp.h |   1 +
 net/ipv4/proc.c           |   1 +
 net/ipv4/tcp_authopt.c    | 288 ++++++++++++++++++++++++++++++++++++++
 net/ipv4/tcp_input.c      |  17 +++
 net/ipv4/tcp_ipv4.c       |  20 ++-
 net/ipv4/tcp_minisocks.c  |  12 ++
 net/ipv4/tcp_output.c     |  85 ++++++++++-
 net/ipv6/tcp_ipv6.c       |  21 ++-
 9 files changed, 538 insertions(+), 6 deletions(-)
diff --git a/include/net/tcp_authopt.h b/include/net/tcp_authopt.h index 5217b6c7c900..8bb76128ed11 100644 --- a/include/net/tcp_authopt.h +++ b/include/net/tcp_authopt.h @@ -65,28 +65,127 @@ struct tcp_authopt_info { /** @dst_isn: Remote Initial Sequence Number */ u32 dst_isn; };
#ifdef CONFIG_TCP_AUTHOPT +DECLARE_STATIC_KEY_FALSE(tcp_authopt_needed); + +void tcp_authopt_free(struct sock *sk, struct tcp_authopt_info *info); void tcp_authopt_clear(struct sock *sk); int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen); int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *key); int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen); +struct tcp_authopt_key_info *__tcp_authopt_select_key( + const struct sock *sk, + struct tcp_authopt_info *info, + const struct sock *addr_sk, + u8 *rnextkeyid); +static inline struct tcp_authopt_key_info *tcp_authopt_select_key( + const struct sock *sk, + const struct sock *addr_sk, + struct tcp_authopt_info **info, + u8 *rnextkeyid) +{ + if (static_branch_unlikely(&tcp_authopt_needed)) { + *info = rcu_dereference(tcp_sk(sk)->authopt_info); + + if (*info) + return __tcp_authopt_select_key(sk, *info, addr_sk, rnextkeyid); + } + return NULL; +} +int tcp_authopt_hash( + char *hash_location, + struct tcp_authopt_key_info *key, + struct tcp_authopt_info *info, + struct sock *sk, struct sk_buff *skb); +int __tcp_authopt_openreq(struct sock *newsk, const struct sock *oldsk, struct request_sock *req); +static inline int tcp_authopt_openreq( + struct sock *newsk, + const struct sock *oldsk, + struct request_sock *req) +{ + if (!rcu_dereference(tcp_sk(oldsk)->authopt_info)) + return 0; + else + return __tcp_authopt_openreq(newsk, oldsk, req); +} +static inline void tcp_authopt_time_wait( + struct tcp_timewait_sock *tcptw, + struct tcp_sock *tp) +{ + if (static_branch_unlikely(&tcp_authopt_needed)) { + /* Transfer ownership of authopt_info to the twsk + * This requires no other users of the origin sock. 
+ */ + sock_owned_by_me((struct sock *)tp); + tcptw->tw_authopt_info = tp->authopt_info; + tp->authopt_info = NULL; + } else { + tcptw->tw_authopt_info = NULL; + } +} +int __tcp_authopt_inbound_check( + struct sock *sk, + struct sk_buff *skb, + struct tcp_authopt_info *info); +/** tcp_authopt_inbound_check - check for valid TCP-AO signature. + * + * Return negative ERRNO on error, 0 if not present and 1 if present and valid + * If both TCP-AO and MD5 signatures are found this is reported as an error. + */ +static inline int tcp_authopt_inbound_check(struct sock *sk, struct sk_buff *skb) +{ + if (static_branch_unlikely(&tcp_authopt_needed)) { + struct tcp_authopt_info *info = rcu_dereference(tcp_sk(sk)->authopt_info); + + if (info) + return __tcp_authopt_inbound_check(sk, skb, info); + } + + return 0; +} #else static inline int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen) { return -ENOPROTOOPT; } static inline int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *key) { return -ENOPROTOOPT; } +static inline void tcp_authopt_free(struct sock *sk, struct tcp_authopt_info *info) +{ +} static inline void tcp_authopt_clear(struct sock *sk) { } static inline int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen) { return -ENOPROTOOPT; } +static inline int tcp_authopt_hash( + char *hash_location, + struct tcp_authopt_key_info *key, + struct tcp_authopt_key *info, + struct sock *sk, struct sk_buff *skb) +{ + return -EINVAL; +} +static inline int tcp_authopt_openreq(struct sock *newsk, + const struct sock *oldsk, + struct request_sock *req) +{ + return 0; +} +static inline void tcp_authopt_time_wait( + struct tcp_timewait_sock *tcptw, + struct tcp_sock *tp) +{ +} +static inline int tcp_authopt_inbound_check(struct sock *sk, struct sk_buff *skb) +{ + return 0; +} #endif
#endif /* _LINUX_TCP_AUTHOPT_H */ diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h index 904909d020e2..1d96030889a1 100644 --- a/include/uapi/linux/snmp.h +++ b/include/uapi/linux/snmp.h @@ -290,10 +290,11 @@ enum LINUX_MIB_TCPDUPLICATEDATAREHASH, /* TCPDuplicateDataRehash */ LINUX_MIB_TCPDSACKRECVSEGS, /* TCPDSACKRecvSegs */ LINUX_MIB_TCPDSACKIGNOREDDUBIOUS, /* TCPDSACKIgnoredDubious */ LINUX_MIB_TCPMIGRATEREQSUCCESS, /* TCPMigrateReqSuccess */ LINUX_MIB_TCPMIGRATEREQFAILURE, /* TCPMigrateReqFailure */ + LINUX_MIB_TCPAUTHOPTFAILURE, /* TCPAuthOptFailure */ __LINUX_MIB_MAX };
/* linux Xfrm mib definitions */ enum diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c index f30273afb539..70f7a8a47045 100644 --- a/net/ipv4/proc.c +++ b/net/ipv4/proc.c @@ -295,10 +295,11 @@ static const struct snmp_mib snmp4_net_list[] = { SNMP_MIB_ITEM("TcpDuplicateDataRehash", LINUX_MIB_TCPDUPLICATEDATAREHASH), SNMP_MIB_ITEM("TCPDSACKRecvSegs", LINUX_MIB_TCPDSACKRECVSEGS), SNMP_MIB_ITEM("TCPDSACKIgnoredDubious", LINUX_MIB_TCPDSACKIGNOREDDUBIOUS), SNMP_MIB_ITEM("TCPMigrateReqSuccess", LINUX_MIB_TCPMIGRATEREQSUCCESS), SNMP_MIB_ITEM("TCPMigrateReqFailure", LINUX_MIB_TCPMIGRATEREQFAILURE), + SNMP_MIB_ITEM("TCPAuthOptFailure", LINUX_MIB_TCPAUTHOPTFAILURE), SNMP_MIB_SENTINEL };
static void icmpmsg_put_line(struct seq_file *seq, unsigned long *vals, unsigned short *type, int count) diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index 1fd98d67ec10..5e80e5e5e36e 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -3,10 +3,14 @@ #include <linux/kernel.h> #include <net/tcp.h> #include <net/tcp_authopt.h> #include <crypto/hash.h>
+/* This is enabled when first struct tcp_authopt_info is allocated and never released */ +DEFINE_STATIC_KEY_FALSE(tcp_authopt_needed); +EXPORT_SYMBOL(tcp_authopt_needed); + /* All current algorithms have a mac length of 12 but crypto API digestsize can be larger */ #define TCP_AUTHOPT_MAXMACBUF 20 #define TCP_AUTHOPT_MAX_TRAFFIC_KEY_LEN 20 #define TCP_AUTHOPT_MACLEN 12
@@ -190,10 +194,55 @@ static bool tcp_authopt_key_match_exact(struct tcp_authopt_key_info *info, return false;
return true; }
+static bool tcp_authopt_key_match_skb_addr(struct tcp_authopt_key_info *key, + struct sk_buff *skb) +{ + u16 keyaf = key->addr.ss_family; + struct iphdr *iph = (struct iphdr *)skb_network_header(skb); + + if (keyaf == AF_INET && iph->version == 4) { + struct sockaddr_in *key_addr = (struct sockaddr_in *)&key->addr; + + return iph->saddr == key_addr->sin_addr.s_addr; + } else if (keyaf == AF_INET6 && iph->version == 6) { + struct ipv6hdr *ip6h = (struct ipv6hdr *)skb_network_header(skb); + struct sockaddr_in6 *key_addr = (struct sockaddr_in6 *)&key->addr; + + return ipv6_addr_equal(&ip6h->saddr, &key_addr->sin6_addr); + } + + /* This actually happens with ipv6-mapped-ipv4-addresses + * IPv6 listen sockets will be asked to validate ipv4 packets. + */ + return false; +} + +static bool tcp_authopt_key_match_sk_addr(struct tcp_authopt_key_info *key, + const struct sock *addr_sk) +{ + u16 keyaf = key->addr.ss_family; + + /* This probably can't happen even with ipv4-mapped-ipv6 */ + if (keyaf != addr_sk->sk_family) + return false; + + if (keyaf == AF_INET) { + struct sockaddr_in *key_addr = (struct sockaddr_in *)&key->addr; + + return addr_sk->sk_daddr == key_addr->sin_addr.s_addr; + } else if (keyaf == AF_INET6) { + struct sockaddr_in6 *key_addr = (struct sockaddr_in6 *)&key->addr; + + return ipv6_addr_equal(&addr_sk->sk_v6_daddr, &key_addr->sin6_addr); + } + + return false; +} + static struct tcp_authopt_key_info *tcp_authopt_key_lookup_exact(const struct sock *sk, struct tcp_authopt_info *info, struct tcp_authopt_key *ukey) { struct tcp_authopt_key_info *key_info; @@ -203,10 +252,50 @@ static struct tcp_authopt_key_info *tcp_authopt_key_lookup_exact(const struct so return key_info;
return NULL; }
+static struct tcp_authopt_key_info *tcp_authopt_lookup_send(struct tcp_authopt_info *info, + const struct sock *addr_sk, + int send_id) +{ + struct tcp_authopt_key_info *result = NULL; + struct tcp_authopt_key_info *key; + + hlist_for_each_entry_rcu(key, &info->head, node, 0) { + if (send_id >= 0 && key->send_id != send_id) + continue; + if (key->flags & TCP_AUTHOPT_KEY_ADDR_BIND) + if (!tcp_authopt_key_match_sk_addr(key, addr_sk)) + continue; + if (result && net_ratelimit()) + pr_warn("ambiguous tcp authentication keys configured for send\n"); + result = key; + } + + return result; +} + +/** + * __tcp_authopt_select_key - select key for sending + * + * @sk: socket + * @info: socket's tcp_authopt_info + * @addr_sk: socket used for address lookup. Same as sk except for synack case + * @rnextkeyid: value of rnextkeyid caller should write in packet + * + * Result is protected by RCU and can't be stored, it may only be passed to + * tcp_authopt_hash and only under a single rcu_read_lock. + */ +struct tcp_authopt_key_info *__tcp_authopt_select_key(const struct sock *sk, + struct tcp_authopt_info *info, + const struct sock *addr_sk, + u8 *rnextkeyid) +{ + return tcp_authopt_lookup_send(info, addr_sk, -1); +} + static struct tcp_authopt_info *__tcp_authopt_info_get_or_create(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); struct tcp_authopt_info *info;
@@ -216,10 +305,12 @@ static struct tcp_authopt_info *__tcp_authopt_info_get_or_create(struct sock *sk
info = kzalloc(sizeof(*info), GFP_KERNEL); if (!info) return ERR_PTR(-ENOMEM);
+ /* Never released: */ + static_branch_inc(&tcp_authopt_needed); sk_nocaps_add(sk, NETIF_F_GSO_MASK); INIT_HLIST_HEAD(&info->head); rcu_assign_pointer(tp->authopt_info, info);
return info; @@ -508,10 +599,65 @@ static int tcp_authopt_get_isn(struct sock *sk, } else { *sisn = htonl(authopt_info->src_isn); *disn = htonl(authopt_info->dst_isn); } rcu_read_unlock(); + + return 0; +} + +static int tcp_authopt_clone_keys(struct sock *newsk, + const struct sock *oldsk, + struct tcp_authopt_info *new_info, + struct tcp_authopt_info *old_info) +{ + struct tcp_authopt_key_info *old_key; + struct tcp_authopt_key_info *new_key; + + hlist_for_each_entry_rcu(old_key, &old_info->head, node, lockdep_sock_is_held(oldsk)) { + new_key = sock_kmalloc(newsk, sizeof(*new_key), GFP_ATOMIC); + if (!new_key) + return -ENOMEM; + memcpy(new_key, old_key, sizeof(*new_key)); + hlist_add_head_rcu(&new_key->node, &new_info->head); + } + + return 0; +} + +/** Called to create accepted sockets. + * + * Need to copy authopt info from listen socket. + */ +int __tcp_authopt_openreq(struct sock *newsk, const struct sock *oldsk, struct request_sock *req) +{ + struct tcp_authopt_info *old_info; + struct tcp_authopt_info *new_info; + int err; + + old_info = rcu_dereference(tcp_sk(oldsk)->authopt_info); + if (!old_info) + return 0; + + /* Clear value copies from oldsk: */ + rcu_assign_pointer(tcp_sk(newsk)->authopt_info, NULL); + + new_info = kzalloc(sizeof(*new_info), GFP_ATOMIC); + if (!new_info) + return -ENOMEM; + + new_info->src_isn = tcp_rsk(req)->snt_isn; + new_info->dst_isn = tcp_rsk(req)->rcv_isn; + INIT_HLIST_HEAD(&new_info->head); + err = tcp_authopt_clone_keys(newsk, oldsk, new_info, old_info); + if (err) { + tcp_authopt_free(newsk, new_info); + return err; + } + sk_nocaps_add(newsk, NETIF_F_GSO_MASK); + rcu_assign_pointer(tcp_sk(newsk)->authopt_info, new_info); + return 0; }
/* feed traffic key into shash */ static int tcp_authopt_shash_traffic_key(struct shash_desc *desc, @@ -814,10 +960,11 @@ static int skb_shash_frags(struct shash_desc *desc, }
static int tcp_authopt_hash_packet(struct crypto_shash *tfm, struct sock *sk, struct sk_buff *skb, + struct tcp_authopt_info *info, bool input, bool ipv6, bool include_options, u8 *macbuf) { @@ -901,10 +1048,11 @@ static int tcp_authopt_hash_packet(struct crypto_shash *tfm, * This means TCP_AUTHOPT_MAXMACBUF, not TCP_AUTHOPT_MACLEN */ static int __tcp_authopt_calc_mac(struct sock *sk, struct sk_buff *skb, struct tcp_authopt_key_info *key, + struct tcp_authopt_info *info, bool input, char *macbuf) { struct crypto_shash *mac_tfm; u8 traffic_key[TCP_AUTHOPT_MAX_TRAFFIC_KEY_LEN]; @@ -926,14 +1074,154 @@ static int __tcp_authopt_calc_mac(struct sock *sk, goto out;
err = tcp_authopt_hash_packet(mac_tfm, sk, skb, + info, input, ipv6, !(key->flags & TCP_AUTHOPT_KEY_EXCLUDE_OPTS), macbuf);
out: tcp_authopt_put_mac_shash(key, mac_tfm); return err; } + +/* tcp_authopt_hash - fill in the mac + * + * The key must come from tcp_authopt_select_key. + */ +int tcp_authopt_hash(char *hash_location, + struct tcp_authopt_key_info *key, + struct tcp_authopt_info *info, + struct sock *sk, + struct sk_buff *skb) +{ + /* MAC inside option is truncated to 12 bytes but crypto API needs output + * buffer to be large enough so we use a buffer on the stack. + */ + u8 macbuf[TCP_AUTHOPT_MAXMACBUF]; + int err; + + err = __tcp_authopt_calc_mac(sk, skb, key, info, false, macbuf); + if (err) + goto fail; + memcpy(hash_location, macbuf, TCP_AUTHOPT_MACLEN); + + return 0; + +fail: + /* If mac calculation fails and caller doesn't handle the error + * try to make it obvious inside the packet. + */ + memset(hash_location, 0, TCP_AUTHOPT_MACLEN); + return err; +} + +static struct tcp_authopt_key_info *tcp_authopt_lookup_recv(struct sock *sk, + struct sk_buff *skb, + struct tcp_authopt_info *info, + int recv_id) +{ + struct tcp_authopt_key_info *result = NULL; + struct tcp_authopt_key_info *key; + + /* multiple matches will cause occasional failures */ + hlist_for_each_entry_rcu(key, &info->head, node, 0) { + if (recv_id >= 0 && key->recv_id != recv_id) + continue; + if (key->flags & TCP_AUTHOPT_KEY_ADDR_BIND && + !tcp_authopt_key_match_skb_addr(key, skb)) + continue; + if (result && net_ratelimit()) + pr_warn("ambiguous tcp authentication keys configured for receive\n"); + result = key; + } + + return result; +} + +/* Show a rate-limited message for authentication fail */ +static void print_tcpao_notice(const char *msg, struct sk_buff *skb) +{ + struct iphdr *iph = (struct iphdr *)skb_network_header(skb); + struct tcphdr *th = (struct tcphdr *)skb_transport_header(skb); + + if (iph->version == 4) { + net_info_ratelimited("%s (%pI4, %d)->(%pI4, %d)\n", msg, + &iph->saddr, ntohs(th->source), + &iph->daddr, ntohs(th->dest)); + } else if (iph->version == 6) { + struct ipv6hdr *ip6h = 
(struct ipv6hdr *)skb_network_header(skb); + + net_info_ratelimited("%s (%pI6, %d)->(%pI6, %d)\n", msg, + &ip6h->saddr, ntohs(th->source), + &ip6h->daddr, ntohs(th->dest)); + } else { + WARN_ONCE(1, "%s unknown IP version\n", msg); + } +} + +int __tcp_authopt_inbound_check(struct sock *sk, struct sk_buff *skb, struct tcp_authopt_info *info) +{ + struct tcphdr *th = (struct tcphdr *)skb_transport_header(skb); + struct tcphdr_authopt *opt; + struct tcp_authopt_key_info *key; + u8 macbuf[TCP_AUTHOPT_MAXMACBUF]; + int err; + + opt = (struct tcphdr_authopt *)tcp_authopt_find_option(th); +#ifdef CONFIG_TCP_MD5SIG + /* RFC5925 2.2: An endpoint MUST NOT use TCP-AO for the same connection + * in which TCP MD5 is used. When both options appear, TCP MUST silently + * discard the segment. + */ + if (opt && tcp_parse_md5sig_option(th)) { + print_tcpao_notice("TCP AO and MD5 both present on same packet: discarded", skb); + NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPAUTHOPTFAILURE); + return -EINVAL; + } +#endif + key = tcp_authopt_lookup_recv(sk, skb, info, opt ? opt->keyid : -1); + + /* nothing found or expected */ + if (!opt && !key) + return 0; + if (!opt && key) { + NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPAUTHOPTFAILURE); + print_tcpao_notice("TCP Authentication Missing", skb); + return -EINVAL; + } + if (opt && !key) { + /* RFC5925 Section 7.3: + * A TCP-AO implementation MUST allow for configuration of the behavior + * of segments with TCP-AO but that do not match an MKT. The initial + * default of this configuration SHOULD be to silently accept such + * connections. 
+ */ + if (info->flags & TCP_AUTHOPT_FLAG_REJECT_UNEXPECTED) { + NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPAUTHOPTFAILURE); + print_tcpao_notice("TCP Authentication Unexpected: Rejected", skb); + return -EINVAL; + } + print_tcpao_notice("TCP Authentication Unexpected: Accepted", skb); + return 0; + } + + /* bad inbound key len */ + if (opt->len != TCPOLEN_AUTHOPT_OUTPUT) + return -EINVAL; + + err = __tcp_authopt_calc_mac(sk, skb, key, info, true, macbuf); + if (err) + return err; + + if (memcmp(macbuf, opt->mac, TCP_AUTHOPT_MACLEN)) { + NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPAUTHOPTFAILURE); + print_tcpao_notice("TCP Authentication Failed", skb); + return -EINVAL; + } + + return 1; +} +EXPORT_SYMBOL(__tcp_authopt_inbound_check); diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 246ab7b5e857..5dcde6e74bfc 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -70,10 +70,11 @@ #include <linux/sysctl.h> #include <linux/kernel.h> #include <linux/prefetch.h> #include <net/dst.h> #include <net/tcp.h> +#include <net/tcp_authopt.h> #include <net/inet_common.h> #include <linux/ipsec.h> #include <asm/unaligned.h> #include <linux/errqueue.h> #include <trace/events/tcp.h> @@ -5984,18 +5985,34 @@ void tcp_init_transfer(struct sock *sk, int bpf_op, struct sk_buff *skb) if (!icsk->icsk_ca_initialized) tcp_init_congestion_control(sk); tcp_init_buffer_space(sk); }
+static void tcp_authopt_finish_connect(struct sock *sk, struct sk_buff *skb) +{ +#ifdef CONFIG_TCP_AUTHOPT + struct tcp_authopt_info *info; + + info = rcu_dereference_protected(tcp_sk(sk)->authopt_info, lockdep_sock_is_held(sk)); + if (!info) + return; + + info->src_isn = ntohl(tcp_hdr(skb)->ack_seq) - 1; + info->dst_isn = ntohl(tcp_hdr(skb)->seq); +#endif +} + void tcp_finish_connect(struct sock *sk, struct sk_buff *skb) { struct tcp_sock *tp = tcp_sk(sk); struct inet_connection_sock *icsk = inet_csk(sk);
tcp_set_state(sk, TCP_ESTABLISHED); icsk->icsk_ack.lrcvtime = tcp_jiffies32;
+ tcp_authopt_finish_connect(sk, skb); + if (skb) { icsk->icsk_af_ops->sk_rx_dst_set(sk, skb); security_inet_conn_established(sk, skb); sk_mark_napi_id(sk, skb); } diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 0c9f050fa0e8..da43567c3753 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1955,10 +1955,26 @@ static void tcp_v4_fill_cb(struct sk_buff *skb, const struct iphdr *iph, TCP_SKB_CB(skb)->sacked = 0; TCP_SKB_CB(skb)->has_rxtstamp = skb->tstamp || skb_hwtstamps(skb)->hwtstamp; }
+static int tcp_v4_auth_inbound_check(struct sock *sk, + struct sk_buff *skb, + int dif, + int sdif) +{ + int aoret; + + aoret = tcp_authopt_inbound_check(sk, skb); + if (aoret < 0) + return aoret; + if (aoret > 0) + return 0; + + return tcp_v4_inbound_md5_hash(sk, skb, dif, sdif); +} + /* * From tcp_input.c */
int tcp_v4_rcv(struct sk_buff *skb) @@ -2012,11 +2028,11 @@ int tcp_v4_rcv(struct sk_buff *skb) struct request_sock *req = inet_reqsk(sk); bool req_stolen = false; struct sock *nsk;
sk = req->rsk_listener; - if (unlikely(tcp_v4_inbound_md5_hash(sk, skb, dif, sdif))) { + if (unlikely(tcp_v4_auth_inbound_check(sk, skb, dif, sdif))) { sk_drops_add(sk, skb); reqsk_put(req); goto discard_it; } if (tcp_checksum_complete(skb)) { @@ -2082,11 +2098,11 @@ int tcp_v4_rcv(struct sk_buff *skb) }
if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb)) goto discard_and_relse;
- if (tcp_v4_inbound_md5_hash(sk, skb, dif, sdif)) + if (tcp_v4_auth_inbound_check(sk, skb, dif, sdif)) goto discard_and_relse;
nf_reset_ct(skb);
 	if (tcp_filter(sk, skb))
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index cf913a66df17..d4828cf3d6d1 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -18,10 +18,11 @@
  *		Arnt Gulbrandsen, agulbra@nvg.unit.no
  *		Jorge Cwik, jorge@laser.satlink.net
  */
 
 #include <net/tcp.h>
+#include <net/tcp_authopt.h>
 #include <net/xfrm.h>
 #include <net/busy_poll.h>
 
 static bool tcp_in_window(u32 seq, u32 end_seq, u32 s_win, u32 e_win)
 {
@@ -300,10 +301,11 @@ void tcp_time_wait(struct sock *sk, int state, int timeo)
 				BUG_ON(tcptw->tw_md5_key && !tcp_alloc_md5sig_pool());
 			}
 		}
 	} while (0);
 #endif
+	tcp_authopt_time_wait(tcptw, tcp_sk(sk));
 
 	/* Get the TIME_WAIT timeout firing. */
 	if (timeo < rto)
 		timeo = rto;
 
@@ -342,10 +344,19 @@ void tcp_twsk_destructor(struct sock *sk)
 
 		if (twsk->tw_md5_key)
 			kfree_rcu(twsk->tw_md5_key, rcu);
 	}
 #endif
+#ifdef CONFIG_TCP_AUTHOPT
+	if (static_branch_unlikely(&tcp_authopt_needed)) {
+		struct tcp_timewait_sock *twsk = tcp_twsk(sk);
+
+		/* twsk only contains sock_common so pass NULL as sk. */
+		if (twsk->tw_authopt_info)
+			tcp_authopt_free(NULL, twsk->tw_authopt_info);
+	}
+#endif
 }
 EXPORT_SYMBOL_GPL(tcp_twsk_destructor);
 
 /* Warning : This function is called without sk_listener being locked.
  * Be sure to read socket fields once, as their value could change under us.
@@ -532,10 +543,11 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
 #ifdef CONFIG_TCP_MD5SIG
 	newtp->md5sig_info = NULL;	/*XXX*/
 	if (newtp->af_specific->md5_lookup(sk, newsk))
 		newtp->tcp_header_len += TCPOLEN_MD5SIG_ALIGNED;
 #endif
+	tcp_authopt_openreq(newsk, sk, req);
 	if (skb->len >= TCP_MSS_DEFAULT + newtp->tcp_header_len)
 		newicsk->icsk_ack.last_seg_size = skb->len - newtp->tcp_header_len;
 	newtp->rx_opt.mss_clamp = req->mss;
 	tcp_ecn_openreq_child(newtp, req);
 	newtp->fastopen_req = NULL;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 96f16386f50e..1e5acc5a38cf 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -37,10 +37,11 @@
 
 #define pr_fmt(fmt) "TCP: " fmt
 
 #include <net/tcp.h>
 #include <net/mptcp.h>
+#include <net/tcp_authopt.h>
 
 #include <linux/compiler.h>
 #include <linux/gfp.h>
 #include <linux/module.h>
 #include <linux/static_key.h>
@@ -410,10 +411,11 @@ static inline bool tcp_urg_mode(const struct tcp_sock *tp)
 
 #define OPTION_SACK_ADVERTISE	BIT(0)
 #define OPTION_TS		BIT(1)
 #define OPTION_MD5		BIT(2)
 #define OPTION_WSCALE		BIT(3)
+#define OPTION_AUTHOPT		BIT(4)
 #define OPTION_FAST_OPEN_COOKIE	BIT(8)
 #define OPTION_SMC		BIT(9)
 #define OPTION_MPTCP		BIT(10)
 
 static void smc_options_write(__be32 *ptr, u16 *options)
@@ -434,16 +436,22 @@ static void smc_options_write(__be32 *ptr, u16 *options)
 struct tcp_out_options {
 	u16 options;		/* bit field of OPTION_* */
 	u16 mss;		/* 0 to disable */
 	u8 ws;			/* window scale, 0 to disable */
 	u8 num_sack_blocks;	/* number of SACK blocks to include */
-	u8 hash_size;		/* bytes in hash_location */
 	u8 bpf_opt_len;		/* length of BPF hdr option */
+#ifdef CONFIG_TCP_AUTHOPT
+	u8 authopt_rnextkeyid; /* rnextkey */
+#endif
 	__u8 *hash_location;	/* temporary pointer, overloaded */
 	__u32 tsval, tsecr;	/* need to include OPTION_TS */
 	struct tcp_fastopen_cookie *fastopen_cookie;	/* Fast open cookie */
 	struct mptcp_out_options mptcp;
+#ifdef CONFIG_TCP_AUTHOPT
+	struct tcp_authopt_info *authopt_info;
+	struct tcp_authopt_key_info *authopt_key;
+#endif
 };
 
 static void mptcp_options_write(__be32 *ptr, const struct tcp_sock *tp,
 				struct tcp_out_options *opts)
 {
@@ -616,10 +624,25 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
 		/* overload cookie hash location */
 		opts->hash_location = (__u8 *)ptr;
 		ptr += 4;
 	}
 
+#ifdef CONFIG_TCP_AUTHOPT
+	if (unlikely(OPTION_AUTHOPT & options)) {
+		struct tcp_authopt_key_info *key = opts->authopt_key;
+
+		WARN_ON(!key);
+		*ptr = htonl((TCPOPT_AUTHOPT << 24) |
+			     (TCPOLEN_AUTHOPT_OUTPUT << 16) |
+			     (key->send_id << 8) |
+			     opts->authopt_rnextkeyid);
+		/* overload cookie hash location */
+		opts->hash_location = (__u8 *)(ptr + 1);
+		ptr += TCPOLEN_AUTHOPT_OUTPUT / 4;
+	}
+#endif
+
 	if (unlikely(opts->mss)) {
 		*ptr++ = htonl((TCPOPT_MSS << 24) |
 			       (TCPOLEN_MSS << 16) |
 			       opts->mss);
 	}
@@ -751,10 +774,28 @@ static void mptcp_set_option_cond(const struct request_sock *req,
 			}
 		}
 	}
 }
 
+static int tcp_authopt_init_options(const struct sock *sk,
+				    const struct sock *addr_sk,
+				    struct tcp_out_options *opts)
+{
+#ifdef CONFIG_TCP_AUTHOPT
+	struct tcp_authopt_key_info *key;
+
+	key = tcp_authopt_select_key(sk, addr_sk, &opts->authopt_info, &opts->authopt_rnextkeyid);
+	if (key) {
+		opts->options |= OPTION_AUTHOPT;
+		opts->authopt_key = key;
+		return TCPOLEN_AUTHOPT_OUTPUT;
+	}
+#endif
+
+	return 0;
+}
+
 /* Compute TCP options for SYN packets. This is not the final
  * network wire format yet.
  */
 static unsigned int tcp_syn_options(struct sock *sk, struct sk_buff *skb,
 				struct tcp_out_options *opts,
@@ -763,12 +804,15 @@ static unsigned int tcp_syn_options(struct sock *sk, struct sk_buff *skb,
 	struct tcp_sock *tp = tcp_sk(sk);
 	unsigned int remaining = MAX_TCP_OPTION_SPACE;
 	struct tcp_fastopen_request *fastopen = tp->fastopen_req;
 
 	*md5 = NULL;
+
+	remaining -= tcp_authopt_init_options(sk, sk, opts);
 #ifdef CONFIG_TCP_MD5SIG
 	if (static_branch_unlikely(&tcp_md5_needed) &&
+	    !(opts->options & OPTION_AUTHOPT) &&
 	    rcu_access_pointer(tp->md5sig_info)) {
 		*md5 = tp->af_specific->md5_lookup(sk, sk);
 		if (*md5) {
 			opts->options |= OPTION_MD5;
 			remaining -= TCPOLEN_MD5SIG_ALIGNED;
@@ -847,12 +891,13 @@ static unsigned int tcp_synack_options(const struct sock *sk,
 				       struct sk_buff *syn_skb)
 {
 	struct inet_request_sock *ireq = inet_rsk(req);
 	unsigned int remaining = MAX_TCP_OPTION_SPACE;
 
+	remaining -= tcp_authopt_init_options(sk, req_to_sk(req), opts);
 #ifdef CONFIG_TCP_MD5SIG
-	if (md5) {
+	if (md5 && !(opts->options & OPTION_AUTHOPT)) {
 		opts->options |= OPTION_MD5;
 		remaining -= TCPOLEN_MD5SIG_ALIGNED;
 
 		/* We can't fit any SACK blocks in a packet with MD5 + TS
 		 * options. There was discussion about disabling SACK
@@ -918,13 +963,15 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
 	unsigned int size = 0;
 	unsigned int eff_sacks;
 
 	opts->options = 0;
 
+	size += tcp_authopt_init_options(sk, sk, opts);
 	*md5 = NULL;
 #ifdef CONFIG_TCP_MD5SIG
 	if (static_branch_unlikely(&tcp_md5_needed) &&
+	    !(opts->options & OPTION_AUTHOPT) &&
 	    rcu_access_pointer(tp->md5sig_info)) {
 		*md5 = tp->af_specific->md5_lookup(sk, sk);
 		if (*md5) {
 			opts->options |= OPTION_MD5;
 			size += TCPOLEN_MD5SIG_ALIGNED;
@@ -1274,10 +1321,14 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
 
 	inet = inet_sk(sk);
 	tcb = TCP_SKB_CB(skb);
 	memset(&opts, 0, sizeof(opts));
 
+#ifdef CONFIG_TCP_AUTHOPT
+	/* for tcp_authopt_init_options inside tcp_syn_options or tcp_established_options */
+	rcu_read_lock();
+#endif
 	if (unlikely(tcb->tcp_flags & TCPHDR_SYN)) {
 		tcp_options_size = tcp_syn_options(sk, skb, &opts, &md5);
 	} else {
 		tcp_options_size = tcp_established_options(sk, skb,
 							   &opts, &md5);
@@ -1362,10 +1413,17 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
 		sk_nocaps_add(sk, NETIF_F_GSO_MASK);
 		tp->af_specific->calc_md5_hash(opts.hash_location,
 					       md5, sk, skb);
 	}
 #endif
+#ifdef CONFIG_TCP_AUTHOPT
+	if (opts.authopt_key) {
+		sk_nocaps_add(sk, NETIF_F_GSO_MASK);
+		tcp_authopt_hash(opts.hash_location, opts.authopt_key, opts.authopt_info, sk, skb);
+	}
+	rcu_read_unlock();
+#endif
 
 	/* BPF prog is the last one writing header option */
 	bpf_skops_write_hdr_opt(sk, skb, NULL, NULL, 0, &opts);
 
 	INDIRECT_CALL_INET(icsk->icsk_af_ops->send_check,
@@ -1830,12 +1888,21 @@ unsigned int tcp_current_mss(struct sock *sk)
 		u32 mtu = dst_mtu(dst);
 		if (mtu != inet_csk(sk)->icsk_pmtu_cookie)
 			mss_now = tcp_sync_mss(sk, mtu);
 	}
 
+#ifdef CONFIG_TCP_AUTHOPT
+	/* Even if the result is not used rcu_read_lock is required when scanning for
+	 * tcp authentication keys. Otherwise lockdep will complain.
+	 */
+	rcu_read_lock();
+#endif
 	header_len = tcp_established_options(sk, NULL, &opts, &md5) +
 		     sizeof(struct tcphdr);
+#ifdef CONFIG_TCP_AUTHOPT
+	rcu_read_unlock();
+#endif
 	/* The mss_cache is sized based on tp->tcp_header_len, which assumes
 	 * some common options. If this is an odd packet (because we have SACK
 	 * blocks etc) then our calculated header_len will be different, and
 	 * we have to adjust mss_now correspondingly */
 	if (header_len != tp->tcp_header_len) {
@@ -3548,10 +3615,14 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
 	}
 
 #ifdef CONFIG_TCP_MD5SIG
 	rcu_read_lock();
 	md5 = tcp_rsk(req)->af_specific->req_md5_lookup(sk, req_to_sk(req));
+#endif
+#ifdef CONFIG_TCP_AUTHOPT
+	/* for tcp_authopt_init_options inside tcp_synack_options */
+	rcu_read_lock();
 #endif
 	skb_set_hash(skb, tcp_rsk(req)->txhash, PKT_HASH_TYPE_L4);
 	/* bpf program will be interested in the tcp_flags */
 	TCP_SKB_CB(skb)->tcp_flags = TCPHDR_SYN | TCPHDR_ACK;
 	tcp_header_size = tcp_synack_options(sk, req, mss, skb, &opts, md5,
@@ -3585,10 +3656,20 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
 	if (md5)
 		tcp_rsk(req)->af_specific->calc_md5_hash(opts.hash_location,
 					md5, req_to_sk(req), skb);
 	rcu_read_unlock();
 #endif
+#ifdef CONFIG_TCP_AUTHOPT
+	/* If signature fails we do nothing */
+	if (opts.authopt_key)
+		tcp_authopt_hash(opts.hash_location,
+				 opts.authopt_key,
+				 opts.authopt_info,
+				 req_to_sk(req),
+				 skb);
+	rcu_read_unlock();
+#endif
 
 	bpf_skops_write_hdr_opt((struct sock *)sk, skb, req, syn_skb,
 				synack_type, &opts);
 
 	skb->skb_mstamp_ns = now;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 2cc9b0e53ad1..96a29caf56c7 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -40,10 +40,11 @@
 #include <linux/icmpv6.h>
 #include <linux/random.h>
 #include <linux/indirect_call_wrapper.h>
 
 #include <net/tcp.h>
+#include <net/tcp_authopt.h>
 #include <net/ndisc.h>
 #include <net/inet6_hashtables.h>
 #include <net/inet6_connection_sock.h>
 #include <net/ipv6.h>
 #include <net/transp_v6.h>
@@ -1619,10 +1620,26 @@ static void tcp_v6_fill_cb(struct sk_buff *skb, const struct ipv6hdr *hdr,
 	TCP_SKB_CB(skb)->sacked = 0;
 	TCP_SKB_CB(skb)->has_rxtstamp =
 			skb->tstamp || skb_hwtstamps(skb)->hwtstamp;
 }
 
+static int tcp_v6_auth_inbound_check(struct sock *sk,
+				     struct sk_buff *skb,
+				     int dif,
+				     int sdif)
+{
+	int aoret;
+
+	aoret = tcp_authopt_inbound_check(sk, skb);
+	if (aoret < 0)
+		return aoret;
+	if (aoret > 0)
+		return 0;
+
+	return tcp_v6_inbound_md5_hash(sk, skb, dif, sdif);
+}
+
 INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
 {
 	int sdif = inet6_sdif(skb);
 	int dif = inet6_iif(skb);
 	const struct tcphdr *th;
@@ -1671,11 +1688,11 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
 		struct request_sock *req = inet_reqsk(sk);
 		bool req_stolen = false;
 		struct sock *nsk;
 
 		sk = req->rsk_listener;
-		if (tcp_v6_inbound_md5_hash(sk, skb, dif, sdif)) {
+		if (tcp_v6_auth_inbound_check(sk, skb, dif, sdif)) {
 			sk_drops_add(sk, skb);
 			reqsk_put(req);
 			goto discard_it;
 		}
 		if (tcp_checksum_complete(skb)) {
@@ -1738,11 +1755,11 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
 	}
 
 	if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb))
 		goto discard_and_relse;
 
-	if (tcp_v6_inbound_md5_hash(sk, skb, dif, sdif))
+	if (tcp_v6_auth_inbound_check(sk, skb, dif, sdif))
 		goto discard_and_relse;
 
 	if (tcp_filter(sk, skb))
 		goto discard_and_relse;
 	th = (const struct tcphdr *)skb->data;
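For readers following the option-writing and inbound-check hunks above, here is a small Python model of both sides. It assumes TCPOPT_AUTHOPT is 29 (the option kind RFC 5925 assigns to TCP-AO) and that TCPOLEN_AUTHOPT_OUTPUT is 16, i.e. the 4-byte kind/length/KeyID/RNextKeyID header plus the 12-byte MAC of both RFC 5926 algorithms; only the symbolic names appear in the diff. MAX_TCP_OPTION_SPACE (40) and TCPOLEN_MD5SIG_ALIGNED (20) are the usual kernel values. All function names are hypothetical, and the budget model ignores SACK, timestamps and the other options the kernel also accounts for.

```python
import struct
from typing import Callable

# Assumed values: the patch only uses the symbolic names.
TCPOPT_AUTHOPT = 29                           # TCP-AO option kind (RFC 5925)
AUTHOPT_MAC_LEN = 12                          # 96-bit MAC of both RFC 5926 algorithms
TCPOLEN_AUTHOPT_OUTPUT = 4 + AUTHOPT_MAC_LEN  # header word + MAC
MAX_TCP_OPTION_SPACE = 40
TCPOLEN_MD5SIG_ALIGNED = 20


def authopt_header_word(send_id: int, rnextkeyid: int) -> bytes:
    """First 32-bit word written by tcp_options_write(), in network byte order."""
    if not (0 <= send_id <= 255 and 0 <= rnextkeyid <= 255):
        raise ValueError("KeyID and RNextKeyID are single bytes")
    word = ((TCPOPT_AUTHOPT << 24) | (TCPOLEN_AUTHOPT_OUTPUT << 16)
            | (send_id << 8) | rnextkeyid)
    return struct.pack("!I", word)


def authopt_option(send_id: int, rnextkeyid: int, mac: bytes) -> bytes:
    """Whole option: the MAC follows the header word (hash_location = ptr + 1)."""
    if len(mac) != AUTHOPT_MAC_LEN:
        raise ValueError("unexpected MAC length")
    return authopt_header_word(send_id, rnextkeyid) + mac


def option_space_left(have_ao_key: bool, have_md5_key: bool) -> int:
    """Option bytes left after the signature option; an AO key suppresses MD5."""
    remaining = MAX_TCP_OPTION_SPACE
    if have_ao_key:
        remaining -= TCPOLEN_AUTHOPT_OUTPUT
    elif have_md5_key:
        remaining -= TCPOLEN_MD5SIG_ALIGNED
    return remaining


def auth_inbound_check(aoret: int, md5_check: Callable[[], int]) -> int:
    """Same dispatch as tcp_v6_auth_inbound_check(); nonzero means drop."""
    if aoret < 0:       # AO rejected the segment
        return aoret
    if aoret > 0:       # AO accepted the segment, so MD5 is skipped
        return 0
    return md5_check()  # AO not involved, fall back to the MD5 check
```

For example, `authopt_header_word(5, 7)` yields the bytes `1d 10 05 07`, and with both an AO and an MD5 key configured only the 16 AO bytes are charged, leaving 24 bytes for other options.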
This is mainly intended to protect against local privilege escalations through a rarely used feature so it is deliberately not namespaced.
Enforcement is only at the setsockopt level; this should be enough to ensure that the tcp_authopt_needed static key never turns on.
No effort is made to handle disabling when the feature is already in use.
Signed-off-by: Leonard Crestez cdleonard@gmail.com
---
 Documentation/networking/ip-sysctl.rst |  6 ++++++
 include/net/tcp_authopt.h              |  1 +
 net/ipv4/sysctl_net_ipv4.c             | 10 ++++++++++
 net/ipv4/tcp_authopt.c                 | 13 ++++++++++++-
 4 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 16b8bf72feaf..3f00681f73d7 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -987,10 +987,16 @@ tcp_limit_output_bytes - INTEGER
 tcp_challenge_ack_limit - INTEGER
 	Limits number of Challenge ACK sent per second, as recommended
 	in RFC 5961 (Improving TCP's Robustness to Blind In-Window Attacks)
 	Default: 1000
 
+tcp_authopt - BOOLEAN
+	Enable the TCP Authentication Option (RFC5925), a replacement for TCP
+	MD5 Signatures (RFC2385).
+
+	Default: 0
+
 UDP variables
 =============
 
 udp_l3mdev_accept - BOOLEAN
 	Enabling this option allows a "global" bound socket to work
diff --git a/include/net/tcp_authopt.h b/include/net/tcp_authopt.h
index 8bb76128ed11..a505db1dd67b 100644
--- a/include/net/tcp_authopt.h
+++ b/include/net/tcp_authopt.h
@@ -65,10 +65,11 @@ struct tcp_authopt_info {
 	/** @dst_isn: Remote Initial Sequence Number */
 	u32 dst_isn;
 };
 
 #ifdef CONFIG_TCP_AUTHOPT
+extern int sysctl_tcp_authopt;
 DECLARE_STATIC_KEY_FALSE(tcp_authopt_needed);
 
 void tcp_authopt_free(struct sock *sk, struct tcp_authopt_info *info);
 void tcp_authopt_clear(struct sock *sk);
 int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen);
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 97eb54774924..cc34de6e4817 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -17,10 +17,11 @@
 #include <net/udp.h>
 #include <net/cipso_ipv4.h>
 #include <net/ping.h>
 #include <net/protocol.h>
 #include <net/netevent.h>
+#include <net/tcp_authopt.h>
 
 static int two = 2;
 static int three __maybe_unused = 3;
 static int four = 4;
 static int thousand = 1000;
@@ -583,10 +584,19 @@ static struct ctl_table ipv4_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_douintvec_minmax,
 		.extra1		= &sysctl_fib_sync_mem_min,
 		.extra2		= &sysctl_fib_sync_mem_max,
 	},
+#ifdef CONFIG_TCP_AUTHOPT
+	{
+		.procname	= "tcp_authopt",
+		.data		= &sysctl_tcp_authopt,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+#endif
 	{ }
 };
 
 static struct ctl_table ipv4_net_table[] = {
 	{
diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c
index 5e80e5e5e36e..7c49dcce7d24 100644
--- a/net/ipv4/tcp_authopt.c
+++ b/net/ipv4/tcp_authopt.c
@@ -3,10 +3,15 @@
 #include <linux/kernel.h>
 #include <net/tcp.h>
 #include <net/tcp_authopt.h>
 #include <crypto/hash.h>
 
+/* This is mainly intended to protect against local privilege escalations through
+ * a rarely used feature so it is deliberately not namespaced.
+ */
+int sysctl_tcp_authopt;
+
 /* This is enabled when first struct tcp_authopt_info is allocated and never released */
 DEFINE_STATIC_KEY_FALSE(tcp_authopt_needed);
 EXPORT_SYMBOL(tcp_authopt_needed);
 
 /* All current algorithms have a mac length of 12 but crypto API digestsize can be larger */
@@ -360,10 +365,12 @@ int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen)
 	struct tcp_authopt opt;
 	struct tcp_authopt_info *info;
 	int err;
 
 	sock_owned_by_me(sk);
+	if (!sysctl_tcp_authopt)
+		return -EPERM;
 
 	err = _copy_from_sockptr_tolerant((u8 *)&opt, sizeof(opt), optval, optlen);
 	if (err)
 		return err;
 
@@ -382,13 +389,15 @@ int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen)
 int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *opt)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct tcp_authopt_info *info;
 
+	memset(opt, 0, sizeof(*opt));
 	sock_owned_by_me(sk);
+	if (!sysctl_tcp_authopt)
+		return -EPERM;
 
-	memset(opt, 0, sizeof(*opt));
 	info = rcu_dereference_check(tp->authopt_info, lockdep_sock_is_held(sk));
 	if (!info)
 		return -ENOENT;
 
 	opt->flags = info->flags & TCP_AUTHOPT_KNOWN_FLAGS;
@@ -451,10 +460,12 @@ int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen)
 	struct tcp_authopt_key_info *key_info, *old_key_info;
 	struct tcp_authopt_alg_imp *alg;
 	int err;
 
 	sock_owned_by_me(sk);
+	if (!sysctl_tcp_authopt)
+		return -EPERM;
 
 	err = _copy_from_sockptr_tolerant((u8 *)&opt, sizeof(opt), optval, optlen);
 	if (err)
 		return err;
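The context line in tcp_authopt.c notes that all current algorithms produce a 12-byte MAC even though the crypto API digest can be longer. For HMAC-SHA-1-96 that means computing the full 20-byte HMAC-SHA-1 and keeping the first 96 bits, as in this minimal Python sketch using only the standard library. It covers the truncation step only: the TCP-AO traffic-key derivation and the exact MAC input defined by RFC 5925 are not modeled, and AES-128-CMAC-96 is left out because the Python standard library has no CMAC. The function name is hypothetical.

```python
import hashlib
import hmac

AUTHOPT_MAC_LEN = 12  # 96 bits, per the comment in tcp_authopt.c


def hmac_sha1_96(traffic_key: bytes, message: bytes) -> bytes:
    """HMAC-SHA-1 truncated to its first 96 bits."""
    full = hmac.new(traffic_key, message, hashlib.sha1).digest()  # 20 bytes
    return full[:AUTHOPT_MAC_LEN]
```

With RFC 2202 test case 1 (key `0x0b` repeated 20 times, data `"Hi There"`), the full HMAC-SHA-1 is `b617318655057264e28bc0b6fb378c8ef146be00`, so `hmac_sha1_96` returns its first 12 bytes, `b617318655057264e28bc0b6`.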
On 11/1/21 10:34 AM, Leonard Crestez wrote:
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 97eb54774924..cc34de6e4817 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -17,10 +17,11 @@
 #include <net/udp.h>
 #include <net/cipso_ipv4.h>
 #include <net/ping.h>
 #include <net/protocol.h>
 #include <net/netevent.h>
+#include <net/tcp_authopt.h>
 
 static int two = 2;
 static int three __maybe_unused = 3;
 static int four = 4;
 static int thousand = 1000;
@@ -583,10 +584,19 @@ static struct ctl_table ipv4_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_douintvec_minmax,
 		.extra1		= &sysctl_fib_sync_mem_min,
 		.extra2		= &sysctl_fib_sync_mem_max,
 	},
+#ifdef CONFIG_TCP_AUTHOPT
+	{
+		.procname	= "tcp_authopt",
+		.data		= &sysctl_tcp_authopt,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,

Just add it to the namespace set, and this could be a u8 (try to plug a hole if possible) with min/max specified:

	.maxlen         = sizeof(u8),
	.mode           = 0644,
	.extra1         = SYSCTL_ZERO,
	.extra2         = SYSCTL_ONE
see icmp_echo_enable_probe as an example. And if you are not going to clean up when toggled off, you need a handler that tells the user it can not be disabled by erroring out on attempts to disable it.
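The two requirements above, a value limited to 0..1 and a write handler that refuses to turn the feature back off, can be modeled in a few lines of Python. This is a toy model rather than kernel code: the class name is hypothetical, and returning EPERM for a disable attempt is an assumption, since the review only asks that such attempts fail with an error.

```python
import errno


class OneWaySysctl:
    """Toy model of a 0/1 knob that can be enabled but never disabled again."""

    def __init__(self) -> None:
        self.value = 0

    def write(self, new: int) -> int:
        # Bounds check, like .extra1 = SYSCTL_ZERO / .extra2 = SYSCTL_ONE
        if new not in (0, 1):
            return -errno.EINVAL
        # Refuse 1 -> 0 with an error instead of silently keeping the feature on
        if self.value == 1 and new == 0:
            return -errno.EPERM
        self.value = new
        return 0
```

Writing 1 succeeds, a later write of 0 fails with `-EPERM` while the value stays 1, and out-of-range values such as 2 are rejected with `-EINVAL`.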
On 11/3/21 4:39 AM, David Ahern wrote:
Just add it to the namespace set, and this could be a u8 (try to plug a hole if possible) with min/max specified:
	.maxlen         = sizeof(u8),
	.mode           = 0644,
	.extra1         = SYSCTL_ZERO,
	.extra2         = SYSCTL_ONE
see icmp_echo_enable_probe as an example. And if you are not going to clean up when toggled off, you need a handler that tells the user it can not be disabled by erroring out on attempts to disable it.
This is deliberately per-system because the goal is to avoid possible local privilege escalations by reducing the attack surface. Even the smallest flaw could be exploited by a malicious application establishing an authenticated connection on loopback.
Applications running in containers frequently have full access to sysctls, so making this per-namespace would defeat the original purpose. I can't think of any reason to prevent using this feature at the namespace level; it has no interesting effects outside TCP connections for which it is enabled.
I also believe that a similar sysctl would be useful for TCP-MD5.
You're right about adding additional prints.
--
Regards,
Leonard
On 11/1/21 16:34, Leonard Crestez wrote:
This is mainly intended to protect against local privilege escalations through a rarely used feature so it is deliberately not namespaced.
Enforcement is only at the setsockopt level; this should be enough to ensure that the tcp_authopt_needed static key never turns on.
No effort is made to handle disabling when the feature is already in use.
Signed-off-by: Leonard Crestez cdleonard@gmail.com
[..]
diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c
index 5e80e5e5e36e..7c49dcce7d24 100644
--- a/net/ipv4/tcp_authopt.c
+++ b/net/ipv4/tcp_authopt.c
@@ -3,10 +3,15 @@
 #include <linux/kernel.h>
 #include <net/tcp.h>
 #include <net/tcp_authopt.h>
 #include <crypto/hash.h>
 
+/* This is mainly intended to protect against local privilege escalations through
+ * a rarely used feature so it is deliberately not namespaced.
+ */
+int sysctl_tcp_authopt;
Could you add pr_warn_once() for setsockopt() without this set, so that it's visible in dmesg for a user that gets -EPERM.
Thanks, Dmitry
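The request above amounts to "fail with EPERM, but log the reason only the first time", which is what pr_warn_once() provides. A hypothetical Python model of that gate follows; the class name and log text are invented, and a list stands in for the kernel log.

```python
import errno
from typing import List


class AuthoptGate:
    """Model of setsockopt() gated on sysctl_tcp_authopt, with a warn-once log."""

    def __init__(self, sysctl_tcp_authopt: int = 0) -> None:
        self.sysctl_tcp_authopt = sysctl_tcp_authopt
        self.kernel_log: List[str] = []  # stands in for dmesg
        self._warned = False             # the one-shot flag behind pr_warn_once()

    def setsockopt(self) -> int:
        if not self.sysctl_tcp_authopt:
            if not self._warned:
                self._warned = True
                # Hypothetical message text
                self.kernel_log.append("TCP-AO disabled, set net.ipv4.tcp_authopt=1")
            return -errno.EPERM
        return 0
```

Repeated calls while the sysctl is 0 keep returning `-EPERM` but add only one log line; once the sysctl is set to 1 the call succeeds.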
By default TCP-AO keys apply to all possible peers but it's possible to have different keys for different remote hosts.
This patch adds initial tests for the behavior behind the TCP_AUTHOPT_KEY_BIND_ADDR flag. Server rejection is tested via client timeout so this can be slightly slow.
Signed-off-by: Leonard Crestez cdleonard@gmail.com
---
 .../tcp_authopt_test/netns_fixture.py         |  85 ++++++++++
 .../tcp_authopt/tcp_authopt_test/server.py    | 124 ++++++++++++++
 .../tcp_authopt/tcp_authopt_test/test_bind.py | 155 ++++++++++++++++++
 .../tcp_authopt/tcp_authopt_test/utils.py     | 114 +++++++++++++
 4 files changed, 478 insertions(+)
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/netns_fixture.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/server.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_bind.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/utils.py

diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/netns_fixture.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/netns_fixture.py
new file mode 100644
index 000000000000..9c10fafd8694
--- /dev/null
+++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/netns_fixture.py
@@ -0,0 +1,85 @@
+# SPDX-License-Identifier: GPL-2.0
+import socket
+import subprocess
+from ipaddress import IPv4Address, IPv6Address
+
+from .conftest import raise_skip_no_netns
+
+
+class NamespaceFixture:
+    """Create a pair of namespaces connected by one veth pair
+
+    Each end of the pair has multiple addresses but everything is in the same subnet
+    """
+
+    server_netns_name = "tcp_authopt_test_server"
+    client_netns_name = "tcp_authopt_test_client"
+
+    @classmethod
+    def get_ipv4_addr(cls, ns=1, index=1) -> IPv4Address:
+        return IPv4Address("10.10.0.0") + (ns << 8) + index
+
+    @classmethod
+    def get_ipv6_addr(cls, ns=1, index=1) -> IPv6Address:
+        return IPv6Address("fd00::") + (ns << 16) + index
+
+    @classmethod
+    def get_addr(cls, address_family=socket.AF_INET, ns=1, index=1):
+        if address_family == socket.AF_INET:
+            return cls.get_ipv4_addr(ns, index)
+        elif address_family == socket.AF_INET6:
+            return cls.get_ipv6_addr(ns, index)
+        else:
+            raise ValueError(f"Bad address_family={address_family}")
+
+    # 02:* means "locally administered"
+    server_mac_addr = "02:00:00:00:00:01"
+    client_mac_addr = "02:00:00:00:00:02"
+
+    ipv4_prefix_len = 16
+    ipv6_prefix_len = 64
+
+    @classmethod
+    def get_prefix_length(cls, address_family) -> int:
+        return {
+            socket.AF_INET: cls.ipv4_prefix_len,
+            socket.AF_INET6: cls.ipv6_prefix_len,
+        }[address_family]
+
+    def __init__(self, **kw):
+        raise_skip_no_netns()
+        for k, v in kw.items():
+            setattr(self, k, v)
+
+    def __enter__(self):
+        self._del_netns()
+        script = f"""
+set -e
+ip netns add {self.server_netns_name}
+ip netns add {self.client_netns_name}
+ip link add veth0 netns {self.server_netns_name} type veth peer name veth0 netns {self.client_netns_name}
+ip netns exec {self.server_netns_name} ip link set veth0 up addr {self.server_mac_addr}
+ip netns exec {self.client_netns_name} ip link set veth0 up addr {self.client_mac_addr}
+"""
+        for index in [1, 2, 3]:
+            script += f"ip -n {self.server_netns_name} addr add {self.get_ipv4_addr(1, index)}/16 dev veth0\n"
+            script += f"ip -n {self.client_netns_name} addr add {self.get_ipv4_addr(2, index)}/16 dev veth0\n"
+            script += f"ip -n {self.server_netns_name} addr add {self.get_ipv6_addr(1, index)}/64 dev veth0 nodad\n"
+            script += f"ip -n {self.client_netns_name} addr add {self.get_ipv6_addr(2, index)}/64 dev veth0 nodad\n"
+        subprocess.run(script, shell=True, check=True)
+        return self
+
+    def _del_netns(self):
+        script = f"""\
+set -e
+if ip netns list | grep -q {self.server_netns_name}; then
+    ip netns del {self.server_netns_name}
+fi
+if ip netns list | grep -q {self.client_netns_name}; then
+    ip netns del {self.client_netns_name}
+fi
+"""
+        subprocess.run(script, shell=True, check=True)
+
+    def __exit__(self, *a):
+        self._del_netns()
diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/server.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/server.py
new file mode 100644
index 000000000000..4cad2d61093b
--- /dev/null
+++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/server.py
@@ -0,0 +1,124 @@
+# SPDX-License-Identifier: GPL-2.0
+import logging
+import os
+import selectors
+import socket
+import typing
+from contextlib import ExitStack
+from threading import Thread
+
+logger = logging.getLogger(__name__)
+
+
+class SimpleServerThread(Thread):
+    """Simple server thread for testing TCP sockets
+
+    All data is read in 1000 bytes chunks and either echoed back or discarded.
+
+    :ivar _listen_socket_list: List of listen sockets, not for direct manipulation.
+    :ivar server_socket: List of accepted sockets.
+    :ivar keep_half_open: do not close in response to remote close.
+    """
+
+    DEFAULT_BUFSIZE = 1000
+    _listen_socket_list: typing.List[socket.socket]
+    server_socket: typing.List[socket.socket]
+    sel: typing.Optional[selectors.BaseSelector]
+
+    def __init__(
+        self,
+        sockarg: typing.Union[None, socket.socket, typing.List[socket.socket]] = None,
+        mode="recv",
+        bufsize=DEFAULT_BUFSIZE,
+        keep_half_open=False,
+    ):
+        if isinstance(sockarg, socket.socket):
+            self._listen_socket_list = [sockarg]
+        elif isinstance(sockarg, list):
+            self._listen_socket_list = sockarg
+        elif sockarg is None:
+            self._listen_socket_list = []
+        else:
+            raise TypeError(f"Bad sockarg={sockarg!r}")
+        self.server_socket = []
+        self.bufsize = bufsize
+        self.keep_half_open = keep_half_open
+        self.mode = mode
+        self.sel = None
+        super().__init__()
+
+    def _read(self, conn, events):
+        # logger.debug("events=%r", events)
+        try:
+            data = conn.recv(self.bufsize)
+        except ConnectionResetError:
+            # logger.info("reset %r", conn)
+            conn.close()
+            self.sel.unregister(conn)
+            return
+        # logger.debug("len(data)=%r", len(data))
+        if len(data) == 0:
+            if not self.keep_half_open:
+                # logger.info("closing %r", conn)
+                conn.close()
+                self.sel.unregister(conn)
+        else:
+            if self.mode == "echo":
+                conn.sendall(data)
+            elif self.mode == "recv":
+                pass
+            else:
+                raise ValueError(f"Unknown mode {self.mode}")
+
+    def _stop_pipe_read(self, conn, events):
+        self.should_loop = False
+
+    def start(self) -> None:
+        self.exit_stack = ExitStack()
+        self._stop_pipe_rfd, self._stop_pipe_wfd = os.pipe()
+        self.exit_stack.callback(lambda: os.close(self._stop_pipe_rfd))
+        self.exit_stack.callback(lambda: os.close(self._stop_pipe_wfd))
+        self.sel = self.exit_stack.enter_context(selectors.DefaultSelector())
+        self.sel.register(
+            self._stop_pipe_rfd,
+            selectors.EVENT_READ,
+            self._stop_pipe_read,
+        )
+        for sock in self._listen_socket_list:
+            self.sel.register(sock, selectors.EVENT_READ, self._accept)
+        self.should_loop = True
+        return super().start()
+
+    def _accept(self, sock, events):
+        # logger.info("accept on %r", sock)
+        conn, _addr = sock.accept()
+        conn = self.exit_stack.enter_context(conn)
+        conn.setblocking(False)
+        self.sel.register(conn, selectors.EVENT_READ, self._read)
+        self.server_socket.append(conn)
+
+    def add_listen_socket(self, sock):
+        self._listen_socket_list.append(sock)
+        if self.sel:
+            self.sel.register(sock, selectors.EVENT_READ, self._accept)
+
+    def run(self):
+        # logger.debug("loop init")
+        while self.should_loop:
+            for key, events in self.sel.select(timeout=1):
+                callback = key.data
+                callback(key.fileobj, events)
+        # logger.debug("loop done")
+
+    def stop(self):
+        """Try to stop nicely"""
+        os.write(self._stop_pipe_wfd, b"Q")
+        self.join()
+        self.exit_stack.close()
+
+    def __enter__(self):
+        self.start()
+        return self
+
+    def __exit__(self, *args):
+        self.stop()
diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_bind.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_bind.py
new file mode 100644
index 000000000000..73074535c9ca
--- /dev/null
+++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_bind.py
@@ -0,0 +1,155 @@
+# SPDX-License-Identifier: GPL-2.0
+"""Test TCP-AO keys can be bound to specific remote addresses"""
+import socket
+from contextlib import ExitStack
+
+import pytest
+
+from .conftest import skipif_missing_tcp_authopt
+from .linux_tcp_authopt import (
+    TCP_AUTHOPT_ALG,
+    TCP_AUTHOPT_FLAG,
+    TCP_AUTHOPT_KEY_FLAG,
+    set_tcp_authopt,
+    set_tcp_authopt_key,
+    tcp_authopt,
+    tcp_authopt_key,
+)
+from .netns_fixture import NamespaceFixture
+from .server import SimpleServerThread
+from .utils import (
+    DEFAULT_TCP_SERVER_PORT,
+    check_socket_echo,
+    create_listen_socket,
+    netns_context,
+)
+
+pytestmark = skipif_missing_tcp_authopt
+
+
+@pytest.mark.parametrize("address_family", [socket.AF_INET, socket.AF_INET6])
+def test_addr_server_bind(exit_stack: ExitStack, address_family):
+    """Server only accepts client2, check client1 fails"""
+    nsfixture = exit_stack.enter_context(NamespaceFixture())
+    server_addr = str(nsfixture.get_addr(address_family, 1, 1))
+    client_addr = str(nsfixture.get_addr(address_family, 2, 1))
+    client_addr2 = str(nsfixture.get_addr(address_family, 2, 2))
+
+    # create server:
+    listen_socket = exit_stack.push(
+        create_listen_socket(family=address_family, ns=nsfixture.server_netns_name)
+    )
+    exit_stack.enter_context(SimpleServerThread(listen_socket, mode="echo"))
+
+    # set keys:
+    server_key = tcp_authopt_key(
+        alg=TCP_AUTHOPT_ALG.HMAC_SHA_1_96,
+        key="hello",
+        flags=TCP_AUTHOPT_KEY_FLAG.BIND_ADDR,
+        addr=client_addr2,
+    )
+    set_tcp_authopt(
+        listen_socket,
+        tcp_authopt(flags=TCP_AUTHOPT_FLAG.REJECT_UNEXPECTED),
+    )
+    set_tcp_authopt_key(listen_socket, server_key)
+
+    # create client socket:
+    def create_client_socket():
+        with netns_context(nsfixture.client_netns_name):
+            client_socket = socket.socket(address_family, socket.SOCK_STREAM)
+            client_key = tcp_authopt_key(
+                alg=TCP_AUTHOPT_ALG.HMAC_SHA_1_96,
+                key="hello",
+            )
+            set_tcp_authopt_key(client_socket, client_key)
+            return client_socket
+
+    # addr match:
+    # with create_client_socket() as client_socket2:
+    #     client_socket2.bind((client_addr2, 0))
+    #     client_socket2.settimeout(1.0)
+    #     client_socket2.connect((server_addr, DEFAULT_TCP_SERVER_PORT))
+
+    # addr mismatch:
+    with create_client_socket() as client_socket1:
+        client_socket1.bind((client_addr, 0))
+        with pytest.raises(socket.timeout):
+            client_socket1.settimeout(1.0)
+            client_socket1.connect((server_addr, DEFAULT_TCP_SERVER_PORT))
+
+
+@pytest.mark.parametrize("address_family", [socket.AF_INET, socket.AF_INET6])
+def test_addr_client_bind(exit_stack: ExitStack, address_family):
+    """Client configures different keys with same id but different addresses"""
+    nsfixture = exit_stack.enter_context(NamespaceFixture())
+    server_addr1 = str(nsfixture.get_addr(address_family, 1, 1))
+    server_addr2 = str(nsfixture.get_addr(address_family, 1, 2))
+    client_addr = str(nsfixture.get_addr(address_family, 2, 1))
+
+    # create servers:
+    listen_socket1 = exit_stack.enter_context(
+        create_listen_socket(
+            family=address_family,
+            ns=nsfixture.server_netns_name,
+            bind_addr=server_addr1,
+        )
+    )
+    listen_socket2 = exit_stack.enter_context(
+        create_listen_socket(
+            family=address_family,
+            ns=nsfixture.server_netns_name,
+            bind_addr=server_addr2,
+        )
+    )
+    exit_stack.enter_context(SimpleServerThread(listen_socket1, mode="echo"))
+    exit_stack.enter_context(SimpleServerThread(listen_socket2, mode="echo"))
+
+    # set keys:
+    set_tcp_authopt_key(
+        listen_socket1,
+        tcp_authopt_key(
+            alg=TCP_AUTHOPT_ALG.HMAC_SHA_1_96,
+            key="11111",
+        ),
+    )
+    set_tcp_authopt_key(
+        listen_socket2,
+        tcp_authopt_key(
+            alg=TCP_AUTHOPT_ALG.HMAC_SHA_1_96,
+            key="22222",
+        ),
+    )
+
+    # create client socket:
+    def create_client_socket():
+        with netns_context(nsfixture.client_netns_name):
+            client_socket = socket.socket(address_family, socket.SOCK_STREAM)
+            set_tcp_authopt_key(
+                client_socket,
+                tcp_authopt_key(
+                    alg=TCP_AUTHOPT_ALG.HMAC_SHA_1_96,
+                    key="11111",
+                    flags=TCP_AUTHOPT_KEY_FLAG.BIND_ADDR,
+                    addr=server_addr1,
+                ),
+            )
+            set_tcp_authopt_key(
+                client_socket,
+                tcp_authopt_key(
+                    alg=TCP_AUTHOPT_ALG.HMAC_SHA_1_96,
+                    key="22222",
+                    flags=TCP_AUTHOPT_KEY_FLAG.BIND_ADDR,
+                    addr=server_addr2,
+                ),
+            )
+            client_socket.settimeout(1.0)
+            client_socket.bind((client_addr, 0))
+            return client_socket
+
+    with create_client_socket() as client_socket1:
+        client_socket1.connect((server_addr1, DEFAULT_TCP_SERVER_PORT))
+        check_socket_echo(client_socket1)
+    with create_client_socket() as client_socket2:
+        client_socket2.connect((server_addr2, DEFAULT_TCP_SERVER_PORT))
+        check_socket_echo(client_socket2)
diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/utils.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/utils.py
new file mode 100644
index 000000000000..473e8e954d92
--- /dev/null
+++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/utils.py
@@ -0,0 +1,114 @@
+# SPDX-License-Identifier: GPL-2.0
+import json
+import random
+import socket
+import subprocess
+import typing
+from contextlib import nullcontext
+
+from nsenter import Namespace
+
+# TCP port does not impact Authentication Option so define a single default
+DEFAULT_TCP_SERVER_PORT = 17971
+
+
+def recvall(sock, todo):
+    """Receive exactly todo bytes unless EOF"""
+    data = bytes()
+    while True:
+        chunk = sock.recv(todo)
+        if not len(chunk):
+            return data
+        data += chunk
+        todo -= len(chunk)
+        if todo == 0:
+            return data
+        assert todo > 0
+
+
+def randbytes(count) -> bytes:
+    """Return a random byte array"""
+    return bytes([random.randint(0, 255) for index in range(count)])
+
+
+def check_socket_echo(sock: socket.socket, size=1000):
+    """Send random bytes and check they are received
+
+    The default size is equal to `SimpleServerThread.DEFAULT_BUFSIZE` which
+    means that a single pair of packets will be sent at the TCP level.
+    """
+    send_buf = randbytes(size)
+    sock.sendall(send_buf)
+    recv_buf = recvall(sock, size)
+    assert send_buf == recv_buf
+
+
+def nstat_json(command_prefix: str = "", namespace=None):
+    """Parse nstat output into a python dict"""
+    if namespace is not None:
+        command_prefix += f"ip netns exec {namespace} "
+    runres = subprocess.run(
+        f"{command_prefix}nstat -a --zeros --json",
+        shell=True,
+        check=True,
+        stdout=subprocess.PIPE,
+        encoding="utf-8",
+    )
+    return json.loads(runres.stdout)["kernel"]
+
+
+def netns_context(ns: str = ""):
+    """Create context manager for a certain optional netns
+
+    If the ns argument is empty then just return a `nullcontext`
+    """
+    if ns:
+        return Namespace("/var/run/netns/" + ns, "net")
+    else:
+        return nullcontext()
+
+
+def socket_set_bindtodevice(sock, dev: str):
+    """Set SO_BINDTODEVICE"""
+    opt = dev.encode("utf-8") + b"\0"
+    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, opt)
+
+
+def create_listen_socket(
+    ns: str = "",
+    family=socket.AF_INET,
+    reuseaddr=True,
+    listen_depth=10,
+    bind_addr="",
+    bind_port=DEFAULT_TCP_SERVER_PORT,
+    bind_device: typing.Optional[str] = None,
+):
+    with netns_context(ns):
+        listen_socket = socket.socket(family, socket.SOCK_STREAM)
+    if reuseaddr:
+        listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
+    if bind_device:
+        socket_set_bindtodevice(listen_socket, bind_device)
+    listen_socket.bind((str(bind_addr), bind_port))
+    listen_socket.listen(listen_depth)
+    return listen_socket
+
+
+def create_client_socket(
+    ns: str = "", family=socket.AF_INET, bind_addr="", bind_port=0, timeout=1.0
+):
+    with netns_context(ns):
+        client_socket = socket.socket(family, socket.SOCK_STREAM)
+    if bind_addr or bind_port:
+        client_socket.bind((str(bind_addr), bind_port))
+    if timeout is not None:
+        client_socket.settimeout(timeout)
+    return client_socket
+
+
+def socket_set_linger(sock, onoff, value):
+    import struct
+
+    sock.setsockopt(
+        socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", int(onoff), int(value))
+    )
Add a compute_sne function which finds the value of SNE for a certain SEQ given an already known "recent" SNE/SEQ. This is implemented using the standard tcp before/after macro and will work for SEQ values that are within 2^31 of the SEQ for which we know the SNE.
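The SNE computation described above can be modelled in Python, for example to reason about selftest expectations. This is a sketch, not kernel code. `before()` here is a hypothetical re-implementation of the kernel macro as a signed 32-bit comparison. `compute_sne()` mirrors the C function from this patch, with u32 wraparound emulated by masking.

```python
U32_MASK = 0xFFFFFFFF


def before(seq1: int, seq2: int) -> bool:
    """Emulate the kernel's before(): true if (s32)(seq1 - seq2) < 0."""
    return ((seq1 - seq2) & U32_MASK) >= 0x80000000


def compute_sne(sne: int, prev_seq: int, seq: int) -> int:
    """Return the SNE for seq, given that prev_seq is known to have SNE sne."""
    if before(seq, prev_seq):
        # seq is logically older; a numerically larger value means it
        # lies before the 2^32 boundary that prev_seq already crossed.
        if seq > prev_seq:
            sne -= 1
    else:
        # seq is logically newer; a numerically smaller value means the
        # 32-bit sequence number wrapped forwards.
        if seq < prev_seq:
            sne += 1
    return sne & U32_MASK  # u32 arithmetic, as in the C version


# Forward wrap: 0xfffffff0 -> 0x10 moves into the next SNE epoch
print(compute_sne(0, 0xFFFFFFF0, 0x10))  # 1
# A slightly older SEQ from before the wrap maps back to the previous epoch
print(compute_sne(1, 0x10, 0xFFFFFFF0))  # 0
```

A SEQ more than 2^31 away from `prev_seq` is attributed to the wrong epoch. That is the limitation stated in the commit message.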
For updating we advance the value for rcv_sne at the same time as rcv_nxt and for snd_sne at the same time as snd_nxt. We could track other values (for example snd_una) but this is good enough and works easily for timewait sockets.
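The lockstep update rule can be sketched with a small hypothetical `SneTracker` class. `advance()` plays the role of `tcp_rcv_nxt_update()`: it computes the new SNE against the old `rcv_nxt`, then overwrites `rcv_nxt`. `packet_sne()` plays the role of computing the SNE for an arbitrary received SEQ against the current pair. It is a Python model only, not the kernel implementation.

```python
U32_MASK = 0xFFFFFFFF


def before(seq1, seq2):
    # Kernel-style before(): (s32)(seq1 - seq2) < 0
    return ((seq1 - seq2) & U32_MASK) >= 0x80000000


def compute_sne(sne, prev_seq, seq):
    if before(seq, prev_seq):
        if seq > prev_seq:
            sne -= 1
    elif seq < prev_seq:
        sne += 1
    return sne & U32_MASK


class SneTracker:
    """Hypothetical model of rcv_sne tracking rcv_nxt (or snd_sne/snd_nxt)."""

    def __init__(self, nxt, sne=0):
        self.nxt = nxt
        self.sne = sne

    def advance(self, seq):
        # SNE must be updated *before* nxt is overwritten, because the
        # computation needs the old nxt as its reference point.
        self.sne = compute_sne(self.sne, self.nxt, seq)
        self.nxt = seq

    def packet_sne(self, seq):
        # SNE for a packet, relative to the current <sne, nxt> pair
        return compute_sne(self.sne, self.nxt, seq)


t = SneTracker(nxt=0xC0000000)
t.advance(0xF0000000)
t.advance(0x20000000)  # wraps past 2^32
print(t.sne)  # 1
print(t.packet_sne(0xFFFFFF00))  # 0: retransmission from before the wrap
```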
This implementation is different from RFC suggestions and doesn't require additional flags. It does pass tests from this draft: https://datatracker.ietf.org/doc/draft-touch-sne/
Signed-off-by: Leonard Crestez cdleonard@gmail.com
---
 include/net/tcp_authopt.h | 36 ++++++++++++++++++
 net/ipv4/tcp_authopt.c    | 80 ++++++++++++++++++++++++++++++++++++++-
 net/ipv4/tcp_input.c      |  1 +
 net/ipv4/tcp_output.c     |  1 +
 4 files changed, 116 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp_authopt.h b/include/net/tcp_authopt.h
index a505db1dd67b..7360bda20f97 100644
--- a/include/net/tcp_authopt.h
+++ b/include/net/tcp_authopt.h
@@ -62,10 +62,14 @@ struct tcp_authopt_info {
 	u32 flags;
 	/** @src_isn: Local Initial Sequence Number */
 	u32 src_isn;
 	/** @dst_isn: Remote Initial Sequence Number */
 	u32 dst_isn;
+	/** @rcv_sne: Recv-side Sequence Number Extension tracking tcp_sock.rcv_nxt */
+	u32 rcv_sne;
+	/** @snd_sne: Send-side Sequence Number Extension tracking tcp_sock.snd_nxt */
+	u32 snd_sne;
 };

 #ifdef CONFIG_TCP_AUTHOPT
 extern int sysctl_tcp_authopt;
 DECLARE_STATIC_KEY_FALSE(tcp_authopt_needed);
@@ -143,10 +147,36 @@ static inline int tcp_authopt_inbound_check(struct sock *sk, struct sk_buff *skb
 		return __tcp_authopt_inbound_check(sk, skb, info);
 	}

 	return 0;
 }
+void __tcp_authopt_update_rcv_sne(struct tcp_sock *tp, struct tcp_authopt_info *info, u32 seq);
+static inline void tcp_authopt_update_rcv_sne(struct tcp_sock *tp, u32 seq)
+{
+	struct tcp_authopt_info *info;
+
+	if (static_branch_unlikely(&tcp_authopt_needed)) {
+		rcu_read_lock();
+		info = rcu_dereference(tp->authopt_info);
+		if (info)
+			__tcp_authopt_update_rcv_sne(tp, info, seq);
+		rcu_read_unlock();
+	}
+}
+void __tcp_authopt_update_snd_sne(struct tcp_sock *tp, struct tcp_authopt_info *info, u32 seq);
+static inline void tcp_authopt_update_snd_sne(struct tcp_sock *tp, u32 seq)
+{
+	struct tcp_authopt_info *info;
+
+	if (static_branch_unlikely(&tcp_authopt_needed)) {
+		rcu_read_lock();
+		info = rcu_dereference(tp->authopt_info);
+		if (info)
+			__tcp_authopt_update_snd_sne(tp, info, seq);
+		rcu_read_unlock();
+	}
+}
 #else
 static inline int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen)
 {
 	return -ENOPROTOOPT;
 }
@@ -185,8 +215,14 @@ static inline void tcp_authopt_time_wait(
 }
 static inline int tcp_authopt_inbound_check(struct sock *sk, struct sk_buff *skb)
 {
 	return 0;
 }
+static inline void tcp_authopt_update_rcv_sne(struct tcp_sock *tp, u32 seq)
+{
+}
+static inline void tcp_authopt_update_snd_sne(struct tcp_sock *tp, u32 seq)
+{
+}
 #endif

 #endif /* _LINUX_TCP_AUTHOPT_H */
diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c
index 7c49dcce7d24..a48b741c83e4 100644
--- a/net/ipv4/tcp_authopt.c
+++ b/net/ipv4/tcp_authopt.c
@@ -968,10 +968,84 @@ static int skb_shash_frags(struct shash_desc *desc,
 	}

 	return 0;
 }

+/* compute_sne - Calculate Sequence Number Extension
+ *
+ * Give old upper/lower 32bit values and a new lower 32bit value determine the
+ * new value of the upper 32 bit. The new sequence number can be 2^31 before or
+ * after prev_seq but TCP window scaling should limit this further.
+ *
+ * For correct accounting the stored SNE value should be only updated together
+ * with the SEQ.
+ */
+static u32 compute_sne(u32 sne, u32 prev_seq, u32 seq)
+{
+	if (before(seq, prev_seq)) {
+		if (seq > prev_seq)
+			--sne;
+	} else {
+		if (seq < prev_seq)
+			++sne;
+	}
+
+	return sne;
+}
+
+/* Update rcv_sne, must be called immediately before rcv_nxt update */
+void __tcp_authopt_update_rcv_sne(struct tcp_sock *tp,
+				  struct tcp_authopt_info *info, u32 seq)
+{
+	info->rcv_sne = compute_sne(info->rcv_sne, tp->rcv_nxt, seq);
+}
+
+/* Update snd_sne, must be called immediately before snd_nxt update */
+void __tcp_authopt_update_snd_sne(struct tcp_sock *tp,
+				  struct tcp_authopt_info *info, u32 seq)
+{
+	info->snd_sne = compute_sne(info->snd_sne, tp->snd_nxt, seq);
+}
+
+/* Compute SNE for a specific packet (by seq). */
+static int compute_packet_sne(struct sock *sk, struct tcp_authopt_info *info,
+			      u32 seq, bool input, __be32 *sne)
+{
+	u32 rcv_nxt, snd_nxt;
+
+	// We can't use normal SNE computation before reaching TCP_ESTABLISHED
+	// For TCP_SYN_SENT the dst_isn field is initialized only after we
+	// validate the remote SYN/ACK
+	// For TCP_NEW_SYN_RECV there is no tcp_authopt_info at all
+	if (sk->sk_state == TCP_SYN_SENT ||
+	    sk->sk_state == TCP_NEW_SYN_RECV ||
+	    sk->sk_state == TCP_LISTEN)
+		return 0;
+
+	if (sk->sk_state == TCP_TIME_WAIT) {
+		rcv_nxt = tcp_twsk(sk)->tw_rcv_nxt;
+		snd_nxt = tcp_twsk(sk)->tw_snd_nxt;
+	} else {
+		if (WARN_ONCE(!sk_fullsock(sk),
+			      "unexpected minisock sk=%p state=%d", sk,
+			      sk->sk_state))
+			return -EINVAL;
+		rcv_nxt = tcp_sk(sk)->rcv_nxt;
+		snd_nxt = tcp_sk(sk)->snd_nxt;
+	}
+
+	if (WARN_ONCE(!info, "unexpected missing info for sk=%p sk_state=%d", sk, sk->sk_state))
+		return -EINVAL;
+
+	if (input)
+		*sne = htonl(compute_sne(info->rcv_sne, rcv_nxt, seq));
+	else
+		*sne = htonl(compute_sne(info->snd_sne, snd_nxt, seq));
+
+	return 0;
+}
+
 static int tcp_authopt_hash_packet(struct crypto_shash *tfm,
 				   struct sock *sk,
 				   struct sk_buff *skb,
 				   struct tcp_authopt_info *info,
 				   bool input,
@@ -979,14 +1053,16 @@ static int tcp_authopt_hash_packet(struct crypto_shash *tfm,
 				   bool include_options,
 				   u8 *macbuf)
 {
 	struct tcphdr *th = tcp_hdr(skb);
 	SHASH_DESC_ON_STACK(desc, tfm);
+	__be32 sne = 0;
 	int err;

-	/* NOTE: SNE unimplemented */
-	__be32 sne = 0;
+	err = compute_packet_sne(sk, info, ntohl(th->seq), input, &sne);
+	if (err)
+		return err;

 	desc->tfm = tfm;
 	err = crypto_shash_init(desc);
 	if (err)
 		return err;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 5dcde6e74bfc..0ac74e621b4e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3517,10 +3517,11 @@ static void tcp_snd_una_update(struct tcp_sock *tp, u32 ack)
 static void tcp_rcv_nxt_update(struct tcp_sock *tp, u32 seq)
 {
 	u32 delta = seq - tp->rcv_nxt;

 	sock_owned_by_me((struct sock *)tp);
+	tcp_authopt_update_rcv_sne(tp, seq);
 	tp->bytes_received += delta;
 	WRITE_ONCE(tp->rcv_nxt, seq);
 }

 /* Update our send window.
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 1e5acc5a38cf..ea53c24747b9 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -67,10 +67,11 @@ static void tcp_event_new_data_sent(struct sock *sk, struct sk_buff *skb)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
 	unsigned int prior_packets = tp->packets_out;

+	tcp_authopt_update_snd_sne(tp, TCP_SKB_CB(skb)->end_seq);
 	WRITE_ONCE(tp->snd_nxt, TCP_SKB_CB(skb)->end_seq);

 	__skb_unlink(skb, &sk->sk_write_queue);
 	tcp_rbtree_insert(&sk->tcp_rtx_queue, skb);
+/* Compute SNE for a specific packet (by seq). */
+static int compute_packet_sne(struct sock *sk, struct tcp_authopt_info *info,
+			      u32 seq, bool input, __be32 *sne)
+{
+	u32 rcv_nxt, snd_nxt;
+
+	// We can't use normal SNE computation before reaching TCP_ESTABLISHED
+	// For TCP_SYN_SENT the dst_isn field is initialized only after we
+	// validate the remote SYN/ACK
+	// For TCP_NEW_SYN_RECV there is no tcp_authopt_info at all
+	if (sk->sk_state == TCP_SYN_SENT ||
+	    sk->sk_state == TCP_NEW_SYN_RECV ||
+	    sk->sk_state == TCP_LISTEN)
+		return 0;
In case of TCP_NEW_SYN_RECV, if our SYNACK had sequence number 0xffffffff, we will receive an ACK sequence number of 0, which should have sne = 1.
In a somewhat similar corner case, when we receive a SYNACK to our SYN in tcp_rcv_synsent_state_process, if the SYNACK has sequence number 0xffffffff, we set tp->rcv_nxt to 0, and we should set sne to 1.
There may be more similar corner cases related to a wraparound during the handshake.
Since as you pointed out all we need is "recent" valid <sne, seq> pairs as reference, rather than relying on rcv_sne being paired with tp->rcv_nxt (and similarly for snd_sne and tp->snd_nxt), would it be easier to maintain reference <sne, seq> pairs for send and receive in tcp_authopt_info, appropriately handle the different handshake cases and initialize the pairs, and only then track them in tcp_rcv_nxt_update and tcp_rcv_snd_update?
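The receive-side corner case raised above can be reproduced with a standalone Python model of `compute_sne()`, re-implemented here rather than taken from the kernel. If `rcv_sne` starts at 0 while `rcv_nxt` has already wrapped to 0 during the handshake, the first data segment gets SNE 0 instead of 1. The sequence numbers are illustrative.

```python
U32_MASK = 0xFFFFFFFF


def before(seq1, seq2):
    # Kernel-style before(): (s32)(seq1 - seq2) < 0
    return ((seq1 - seq2) & U32_MASK) >= 0x80000000


def compute_sne(sne, prev_seq, seq):
    if before(seq, prev_seq):
        if seq > prev_seq:
            sne -= 1
    elif seq < prev_seq:
        sne += 1
    return sne & U32_MASK


peer_isn = 0xFFFFFFFF                # SEQ of the received SYNACK
rcv_nxt = (peer_isn + 1) & U32_MASK  # the SYN consumes one SEQ, so this is 0
rcv_sne = 0                          # what info->rcv_sne starts as

first_data_seq = rcv_nxt
naive_sne = compute_sne(rcv_sne, rcv_nxt, first_data_seq)
# The ISN has SNE 0 by definition, so it is a valid reference point
expected_sne = compute_sne(0, peer_isn, first_data_seq)
print(naive_sne, expected_sne)  # 0 1
```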
 static void tcp_rcv_nxt_update(struct tcp_sock *tp, u32 seq)
 {
 	u32 delta = seq - tp->rcv_nxt;

 	sock_owned_by_me((struct sock *)tp);
+	tcp_authopt_update_rcv_sne(tp, seq);
 	tp->bytes_received += delta;
 	WRITE_ONCE(tp->rcv_nxt, seq);
 }
Since rcv_sne and tp->rcv_nxt are not updated atomically, could there ever be a case where a reader might use the new sne with the old rcv_nxt?
Francesco
On 11/1/21 9:22 PM, Francesco Ruggeri wrote:
In case of TCP_NEW_SYN_RECV, if our SYNACK had sequence number 0xffffffff, we will receive an ACK sequence number of 0, which should have sne = 1.
In a somewhat similar corner case, when we receive a SYNACK to our SYN in tcp_rcv_synsent_state_process, if the SYNACK has sequence number 0xffffffff, we set tp->rcv_nxt to 0, and we should set sne to 1.
There may be more similar corner cases related to a wraparound during the handshake.
Since as you pointed out all we need is "recent" valid <sne, seq> pairs as reference, rather than relying on rcv_sne being paired with tp->rcv_nxt (and similarly for snd_sne and tp->snd_nxt), would it be easier to maintain reference <sne, seq> pairs for send and receive in tcp_authopt_info, appropriately handle the different handshake cases and initialize the pairs, and only then track them in tcp_rcv_nxt_update and tcp_rcv_snd_update?
For TCP_NEW_SYN_RECV there is no struct tcp_authopt_info, only a request minisock. I think those are deliberately kept small to save resources on SYN floods, so I'd rather not increase their size.
For all the handshake cases we can just rely on SNE=0 for ISN and we already need to keep track of ISNs because they're part of the signature.
I'll need to test handshake seq 0xFFFFFFFF deliberately, you're right that it can fail.
 static void tcp_rcv_nxt_update(struct tcp_sock *tp, u32 seq)
 {
 	u32 delta = seq - tp->rcv_nxt;

 	sock_owned_by_me((struct sock *)tp);
+	tcp_authopt_update_rcv_sne(tp, seq);
 	tp->bytes_received += delta;
 	WRITE_ONCE(tp->rcv_nxt, seq);
 }
Since rcv_sne and tp->rcv_nxt are not updated atomically, could there ever be a case where a reader might use the new sne with the old rcv_nxt?
As far as I understand, if all of the reads and writes to SNE happen under the socket lock it should be fine. I don't know why WRITE_ONCE is used here; maybe somebody else wants to read rcv_nxt outside the socket lock? That doesn't matter for SNE.
I think the only case would be sending ipv4 RSTs outside the socket.
-- Regards, Leonard
On Tue, Nov 2, 2021 at 3:03 AM Leonard Crestez cdleonard@gmail.com wrote:
For TCP_NEW_SYN_RECV there is no struct tcp_authopt_info, only a request minisock. I think those are deliberately kept small to save resources on SYN floods, so I'd rather not increase their size.
For all the handshake cases we can just rely on SNE=0 for ISN and we already need to keep track of ISNs because they're part of the signature.
Exactly. But the current code, when setting rcv_sne and snd_sne, always compares the sequence number with the <info->rcv_sne, tp->rcv_nxt> (or <info->snd_sne, tp->snd_nxt>) pair, where info->rcv_sne and info->snd_sne are initialized to 0 at the time of info creation. In other words, the code assumes that rcv_sne always corresponds to tp->rcv_nxt, and snd_sne to tp->snd_nxt. But that may not be true when info is created, on account of rollovers during a handshake. So it is not just a matter of what to use for SNE before info is created and used, but also how SNEs are initialized in info. That is why I was suggesting saving valid <sne, seq> pairs (initialized with <0, ISN>) in tcp_authopt_info rather than just SNEs, and then always comparing seq to those pairs if info is available. The pairs could then be updated in tcp_rcv_nxt_update and tcp_snd_una_update.
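Francesco's alternative stores an explicit reference `<sne, seq>` pair initialized to `<0, ISN>`. It could look roughly like the following. `SneRef` is a hypothetical Python model, not code from the series, and `compute_sne()` is re-implemented locally.

```python
from dataclasses import dataclass

U32_MASK = 0xFFFFFFFF


def before(seq1, seq2):
    # Kernel-style before(): (s32)(seq1 - seq2) < 0
    return ((seq1 - seq2) & U32_MASK) >= 0x80000000


def compute_sne(sne, prev_seq, seq):
    if before(seq, prev_seq):
        if seq > prev_seq:
            sne -= 1
    elif seq < prev_seq:
        sne += 1
    return sne & U32_MASK


@dataclass
class SneRef:
    """A known-good <sne, seq> reference pair."""

    sne: int
    seq: int

    @classmethod
    def from_isn(cls, isn):
        # An initial sequence number has SNE 0 by definition
        return cls(sne=0, seq=isn)

    def sne_for(self, seq):
        return compute_sne(self.sne, self.seq, seq)

    def advance(self, seq):
        # Only call this for SEQ values already validated by TCP
        # (e.g. from tcp_rcv_nxt_update()), never for arbitrary input.
        self.sne = self.sne_for(seq)
        self.seq = seq


ref = SneRef.from_isn(0xFFFFFFFF)
print(ref.sne_for(0))  # 1, correct even though the wrap happened in the handshake
ref.advance(0x10000000)
print(ref)  # SneRef(sne=1, seq=268435456)
```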
Regards, Francesco
On 11/2/21 9:21 PM, Francesco Ruggeri wrote:
Exactly. But the current code, when setting rcv_sne and snd_sne, always compares the sequence number with the <info->rcv_sne, tp->rcv_nxt> (or <info->snd_sne, tp->snd_nxt>) pair, where info->rcv_sne and info->snd_sne are initialized to 0 at the time of info creation. In other words, the code assumes that rcv_sne always corresponds to tp->rcv_nxt, and snd_sne to tp->snd_nxt. But that may not be true when info is created, on account of rollovers during a handshake. So it is not just a matter of what to use for SNE before info is created and used, but also how SNEs are initialized in info. That is why I was suggesting saving valid <sne, seq> pairs (initialized with <0, ISN>) in tcp_authopt_info rather than just SNEs, and then always comparing seq to those pairs if info is available. The pairs could then be updated in tcp_rcv_nxt_update and tcp_snd_una_update.
You are correct that SNE will be initialized incorrectly if a rollover happens during the handshake. I think this can be solved by initializing SNE at the same time as ISN like this:
rcv_sne = compute_sne(0, disn, rcv_nxt);
snd_sne = compute_sne(0, sisn, snd_nxt);
This relies on initial sequence numbers having an extension of zero by definition. The actual implementation is a bit more complicated but it only needs to be done when transitioning into ESTABLISHED. I think this would even work for FASTOPEN where non-zero payload is present in handshake packets.
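The proposed initialization can be checked with a Python model. `compute_sne()` is re-implemented locally and `init_sne()` is a hypothetical helper name. Deriving the SNE from the ISN at the transition into ESTABLISHED gives the right answer for a handshake that wraps. It also works when FASTOPEN-style SYN payload has advanced `snd_nxt` past the ISN.

```python
U32_MASK = 0xFFFFFFFF


def before(seq1, seq2):
    # Kernel-style before(): (s32)(seq1 - seq2) < 0
    return ((seq1 - seq2) & U32_MASK) >= 0x80000000


def compute_sne(sne, prev_seq, seq):
    if before(seq, prev_seq):
        if seq > prev_seq:
            sne -= 1
    elif seq < prev_seq:
        sne += 1
    return sne & U32_MASK


def init_sne(isn, nxt):
    """SNE to pair with nxt, given that isn has SNE 0 by definition."""
    return compute_sne(0, isn, nxt)


# Client side: the server's SYNACK used SEQ 0xffffffff, so rcv_nxt wrapped to 0
print(init_sne(0xFFFFFFFF, 0))  # 1

# No wrap during the handshake: SNE stays 0
sisn = 0x12345678
print(init_sne(sisn, (sisn + 1) & U32_MASK))  # 0

# SYN carrying 0x200 bytes of payload across the wrap
sisn = 0xFFFFFF00
snd_nxt = (sisn + 1 + 0x200) & U32_MASK
print(init_sne(sisn, snd_nxt))  # 1
```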
The SYN_SENT and SYN_RECV sockets are still special cases, but they are special anyway because they have to determine ISNs from the packet itself. Since those sockets only compute signatures for packets with the SYN bit set and where SEQ=ISN, SNE is again zero by definition.
I will write tests with client and server-side SEQs equal to 0xFFFFFFFF to verify because this relies on actual initialization order details.
I think snd_nxt and rcv_nxt are good choices for SNE tracking because the rest of the TCP state machine controls their advancement. In theory it's possible to use any received SEQ value but then a very old or perhaps malicious packet could cause incorrect updates to SNE.
If separate fields were used to track rcv_sne_seq and snd_sne_seq then you would still need to only advance them for SEQ values which are known to be valid. Doing so in lockstep with snd_nxt and rcv_nxt would still make sense.
-- Regards, Leonard
On 11/1/21 9:34 AM, Leonard Crestez wrote:
Add a compute_sne function which finds the value of SNE for a certain SEQ given an already known "recent" SNE/SEQ. This is implemented using the standard tcp before/after macro and will work for SEQ values that are within 2^31 of the SEQ for which we know the SNE.
 }
+void __tcp_authopt_update_rcv_sne(struct tcp_sock *tp, struct tcp_authopt_info *info, u32 seq);
+static inline void tcp_authopt_update_rcv_sne(struct tcp_sock *tp, u32 seq)
+{
+	struct tcp_authopt_info *info;
+
+	if (static_branch_unlikely(&tcp_authopt_needed)) {
+		rcu_read_lock();
+		info = rcu_dereference(tp->authopt_info);
+		if (info)
+			__tcp_authopt_update_rcv_sne(tp, info, seq);
+		rcu_read_unlock();
+	}
+}
+void __tcp_authopt_update_snd_sne(struct tcp_sock *tp, struct tcp_authopt_info *info, u32 seq);
+static inline void tcp_authopt_update_snd_sne(struct tcp_sock *tp, u32 seq)
+{
+	struct tcp_authopt_info *info;
+
+	if (static_branch_unlikely(&tcp_authopt_needed)) {
+		rcu_read_lock();
+		info = rcu_dereference(tp->authopt_info);
+		if (info)
+			__tcp_authopt_update_snd_sne(tp, info, seq);
+		rcu_read_unlock();
+	}
+}
I would think callers of these helpers own the socket lock, so no rcu_read_lock()/unlock() should be needed.
Perhaps instead rcu_dereference_protected(tp->authopt_info, lockdep_sock_is_held(sk));
On 11/1/21 10:54 PM, Eric Dumazet wrote:
I would think callers of these helpers own the socket lock, so no rcu_read_lock()/unlock() should be needed.
Perhaps instead rcu_dereference_protected(tp->authopt_info, lockdep_sock_is_held(sk));
Yes, all the callers hold the socket lock and replacing rcu_read_lock doesn't trigger any RCU warnings.
-- Regards, Leonard
This is a special code path for acks and resets outside of normal connection establishment and closing.
Signed-off-by: Leonard Crestez cdleonard@gmail.com
---
 net/ipv4/tcp_authopt.c |  2 ++
 net/ipv6/tcp_ipv6.c    | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)
diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c
index a48b741c83e4..c9d201d8f7a7 100644
--- a/net/ipv4/tcp_authopt.c
+++ b/net/ipv4/tcp_authopt.c
@@ -296,10 +296,11 @@ struct tcp_authopt_key_info *__tcp_authopt_select_key(const struct sock *sk,
 						      const struct sock *addr_sk,
 						      u8 *rnextkeyid)
 {
 	return tcp_authopt_lookup_send(info, addr_sk, -1);
 }
+EXPORT_SYMBOL(__tcp_authopt_select_key);

 static struct tcp_authopt_info *__tcp_authopt_info_get_or_create(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct tcp_authopt_info *info;
@@ -1202,10 +1203,11 @@ int tcp_authopt_hash(char *hash_location,
 	 * try to make it obvious inside the packet.
 	 */
 	memset(hash_location, 0, TCP_AUTHOPT_MACLEN);
 	return err;
 }
+EXPORT_SYMBOL(tcp_authopt_hash);

 static struct tcp_authopt_key_info *tcp_authopt_lookup_recv(struct sock *sk,
 							    struct sk_buff *skb,
 							    struct tcp_authopt_info *info,
 							    int recv_id)
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 96a29caf56c7..68f9545e4347 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -902,13 +902,37 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
 	struct sock *ctl_sk = net->ipv6.tcp_sk;
 	unsigned int tot_len = sizeof(struct tcphdr);
 	__be32 mrst = 0, *topt;
 	struct dst_entry *dst;
 	__u32 mark = 0;
+#ifdef CONFIG_TCP_AUTHOPT
+	struct tcp_authopt_info *authopt_info = NULL;
+	struct tcp_authopt_key_info *authopt_key_info = NULL;
+	u8 authopt_rnextkeyid;
+#endif

 	if (tsecr)
 		tot_len += TCPOLEN_TSTAMP_ALIGNED;
+#ifdef CONFIG_TCP_AUTHOPT
+	/* Key lookup before SKB allocation */
+	if (static_branch_unlikely(&tcp_authopt_needed) && sk) {
+		if (sk->sk_state == TCP_TIME_WAIT)
+			authopt_info = tcp_twsk(sk)->tw_authopt_info;
+		else
+			authopt_info = rcu_dereference(tcp_sk(sk)->authopt_info);
+
+		if (authopt_info) {
+			authopt_key_info = __tcp_authopt_select_key(sk, authopt_info, sk,
+								    &authopt_rnextkeyid);
+			if (authopt_key_info) {
+				tot_len += TCPOLEN_AUTHOPT_OUTPUT;
+				/* Don't use MD5 */
+				key = NULL;
+			}
+		}
+	}
+#endif
 #ifdef CONFIG_TCP_MD5SIG
 	if (key)
 		tot_len += TCPOLEN_MD5SIG_ALIGNED;
 #endif

@@ -961,10 +985,24 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
 		tcp_v6_md5_hash_hdr((__u8 *)topt, key,
 				    &ipv6_hdr(skb)->saddr,
 				    &ipv6_hdr(skb)->daddr, t1);
 	}
 #endif
+#ifdef CONFIG_TCP_AUTHOPT
+	/* Compute the TCP-AO mac. Unlike in the ipv4 case we have a real SKB */
+	if (static_branch_unlikely(&tcp_authopt_needed) && authopt_key_info) {
+		*topt++ = htonl((TCPOPT_AUTHOPT << 24) |
+				(TCPOLEN_AUTHOPT_OUTPUT << 16) |
+				(authopt_key_info->send_id << 8) |
+				(authopt_rnextkeyid));
+		tcp_authopt_hash((char *)topt,
+				 authopt_key_info,
+				 authopt_info,
+				 (struct sock *)sk,
+				 buff);
+	}
+#endif

 	memset(&fl6, 0, sizeof(fl6));
 	fl6.daddr = ipv6_hdr(skb)->saddr;
 	fl6.saddr = ipv6_hdr(skb)->daddr;
 	fl6.flowlabel = label;
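The first 32-bit word of the TCP-AO option written by `tcp_v6_send_response()` packs kind, length, KeyID and RNextKeyID into one big-endian value. Below is a small Python equivalent. It assumes option kind 29, the TCP-AO kind from RFC 5925. It also assumes `TCPOLEN_AUTHOPT_OUTPUT` is 16, that is, 4 header bytes plus a 12-byte (96-bit) MAC. That value is not defined in this excerpt, so treat it as an assumption.

```python
import struct

TCPOPT_AUTHOPT = 29  # TCP-AO option kind (RFC 5925)
MAC_LEN = 12  # assumption: 96-bit MAC, as for HMAC-SHA-1-96 and AES-128-CMAC-96
TCPOLEN_AUTHOPT_OUTPUT = 4 + MAC_LEN  # assumption: kind, len, KeyID, RNextKeyID + MAC


def authopt_header_word(send_id: int, rnextkeyid: int) -> bytes:
    """Pack the same value as the htonl() expression in the patch."""
    value = (
        (TCPOPT_AUTHOPT << 24)
        | (TCPOLEN_AUTHOPT_OUTPUT << 16)
        | (send_id << 8)
        | rnextkeyid
    )
    # "!" selects network (big-endian) byte order, like htonl()
    return struct.pack("!I", value)


print(authopt_header_word(send_id=1, rnextkeyid=2).hex())  # 1d100102
```

The MAC itself follows immediately after this word, which is why `topt` is post-incremented before `tcp_authopt_hash()` writes into it.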
On 11/1/21 10:34 AM, Leonard Crestez wrote:
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 96a29caf56c7..68f9545e4347 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -902,13 +902,37 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
 	struct sock *ctl_sk = net->ipv6.tcp_sk;
 	unsigned int tot_len = sizeof(struct tcphdr);
 	__be32 mrst = 0, *topt;
 	struct dst_entry *dst;
 	__u32 mark = 0;
+#ifdef CONFIG_TCP_AUTHOPT
+	struct tcp_authopt_info *authopt_info = NULL;
+	struct tcp_authopt_key_info *authopt_key_info = NULL;
+	u8 authopt_rnextkeyid;
+#endif

 	if (tsecr)
 		tot_len += TCPOLEN_TSTAMP_ALIGNED;
+#ifdef CONFIG_TCP_AUTHOPT
I realize MD5 is done this way, but new code can always strive to be better. Put this and the one below in helpers such that this logic is in the authopt.h file and the intrusion here is a one liner that either compiles in or out based on the config setting.
+	/* Key lookup before SKB allocation */
+	if (static_branch_unlikely(&tcp_authopt_needed) && sk) {
+		if (sk->sk_state == TCP_TIME_WAIT)
+			authopt_info = tcp_twsk(sk)->tw_authopt_info;
+		else
+			authopt_info = rcu_dereference(tcp_sk(sk)->authopt_info);
+
+		if (authopt_info) {
+			authopt_key_info = __tcp_authopt_select_key(sk, authopt_info, sk,
+								    &authopt_rnextkeyid);
+			if (authopt_key_info) {
+				tot_len += TCPOLEN_AUTHOPT_OUTPUT;
+				/* Don't use MD5 */
+				key = NULL;
+			}
+		}
+	}
+#endif
 #ifdef CONFIG_TCP_MD5SIG
 	if (key)
 		tot_len += TCPOLEN_MD5SIG_ALIGNED;
 #endif
@@ -961,10 +985,24 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
 		tcp_v6_md5_hash_hdr((__u8 *)topt, key,
 				    &ipv6_hdr(skb)->saddr,
 				    &ipv6_hdr(skb)->daddr, t1);
 	}
 #endif
+#ifdef CONFIG_TCP_AUTHOPT
+	/* Compute the TCP-AO mac. Unlike in the ipv4 case we have a real SKB */
+	if (static_branch_unlikely(&tcp_authopt_needed) && authopt_key_info) {
+		*topt++ = htonl((TCPOPT_AUTHOPT << 24) |
+				(TCPOLEN_AUTHOPT_OUTPUT << 16) |
+				(authopt_key_info->send_id << 8) |
+				(authopt_rnextkeyid));
+		tcp_authopt_hash((char *)topt,
+				 authopt_key_info,
+				 authopt_info,
+				 (struct sock *)sk,
+				 buff);
+	}
+#endif
 	memset(&fl6, 0, sizeof(fl6));
 	fl6.daddr = ipv6_hdr(skb)->saddr;
 	fl6.saddr = ipv6_hdr(skb)->daddr;
 	fl6.flowlabel = label;
On 11/3/21 4:44 AM, David Ahern wrote:
I realize MD5 is done this way, but new code can always strive to be better. Put this and the one below in helpers such that this logic is in the authopt.h file and the intrusion here is a one liner that either compiles in or out based on the config setting.
It's not very easy to separate the AO-specific parts here. Key lookup determines the packet allocation length and whether MD5 should also be attempted (the RFC claims adding both is invalid). The result of the key lookup is then used later to sign bits of the packet.
The IPv4 equivalent is even worse because no explicit reply SKB is allocated.
I can try to split this into tcp_authopt_pick_key_for_response_v6 and tcp_authopt_sign_response_v6.
This is required because tcp ipv4 sometimes sends replies without allocating a full skb that can be signed by tcp authopt.
Handle this with additional code in tcp authopt.
Signed-off-by: Leonard Crestez cdleonard@gmail.com
---
 include/net/tcp_authopt.h |   7 ++
 net/ipv4/tcp_authopt.c    | 147 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 154 insertions(+)
diff --git a/include/net/tcp_authopt.h b/include/net/tcp_authopt.h index 7360bda20f97..ae7d6a1eab8d 100644 --- a/include/net/tcp_authopt.h +++ b/include/net/tcp_authopt.h @@ -101,10 +101,17 @@ static inline struct tcp_authopt_key_info *tcp_authopt_select_key( int tcp_authopt_hash( char *hash_location, struct tcp_authopt_key_info *key, struct tcp_authopt_info *info, struct sock *sk, struct sk_buff *skb); +int tcp_v4_authopt_hash_reply( + char *hash_location, + struct tcp_authopt_info *info, + struct tcp_authopt_key_info *key, + __be32 saddr, + __be32 daddr, + struct tcphdr *th); int __tcp_authopt_openreq(struct sock *newsk, const struct sock *oldsk, struct request_sock *req); static inline int tcp_authopt_openreq( struct sock *newsk, const struct sock *oldsk, struct request_sock *req) diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c index c9d201d8f7a7..aef63e35b56f 100644 --- a/net/ipv4/tcp_authopt.c +++ b/net/ipv4/tcp_authopt.c @@ -799,10 +799,74 @@ static int tcp_authopt_get_traffic_key(struct sock *sk, out: tcp_authopt_put_kdf_shash(key, kdf_tfm); return err; }
+struct tcp_v4_authopt_context_data { + __be32 saddr; + __be32 daddr; + __be16 sport; + __be16 dport; + __be32 sisn; + __be32 disn; + __be16 digestbits; +} __packed; + +static int tcp_v4_authopt_get_traffic_key_noskb(struct tcp_authopt_key_info *key, + __be32 saddr, + __be32 daddr, + __be16 sport, + __be16 dport, + __be32 sisn, + __be32 disn, + u8 *traffic_key) +{ + int err; + struct crypto_shash *kdf_tfm; + SHASH_DESC_ON_STACK(desc, kdf_tfm); + struct tcp_v4_authopt_context_data data; + + BUILD_BUG_ON(sizeof(data) != 22); + + kdf_tfm = tcp_authopt_get_kdf_shash(key); + if (IS_ERR(kdf_tfm)) + return PTR_ERR(kdf_tfm); + + err = tcp_authopt_setkey(kdf_tfm, key); + if (err) + goto out; + + desc->tfm = kdf_tfm; + err = crypto_shash_init(desc); + if (err) + goto out; + + // RFC5926 section 3.1.1.1 + // Separate to keep alignment semi-sane + err = crypto_shash_update(desc, "\x01TCP-AO", 7); + if (err) + goto out; + data.saddr = saddr; + data.daddr = daddr; + data.sport = sport; + data.dport = dport; + data.sisn = sisn; + data.disn = disn; + data.digestbits = htons(crypto_shash_digestsize(desc->tfm) * 8); + + err = crypto_shash_update(desc, (u8 *)&data, sizeof(data)); + if (err) + goto out; + err = crypto_shash_final(desc, traffic_key); + if (err) + goto out; + +out: + tcp_authopt_put_kdf_shash(key, kdf_tfm); + return err; +} + static int crypto_shash_update_zero(struct shash_desc *desc, int len) { u8 zero = 0; int i, err;
@@ -1205,10 +1269,93 @@ int tcp_authopt_hash(char *hash_location, memset(hash_location, 0, TCP_AUTHOPT_MACLEN); return err; } EXPORT_SYMBOL(tcp_authopt_hash);
+/** + * tcp_v4_authopt_hash_reply - Hash tcp+ipv4 header without SKB + * + * @hash_location: output buffer + * @info: sending socket's tcp_authopt_info + * @key: signing key, from tcp_authopt_select_key. + * @saddr: source address + * @daddr: destination address + * @th: Pointer to TCP header and options + */ +int tcp_v4_authopt_hash_reply(char *hash_location, + struct tcp_authopt_info *info, + struct tcp_authopt_key_info *key, + __be32 saddr, + __be32 daddr, + struct tcphdr *th) +{ + struct crypto_shash *mac_tfm; + u8 macbuf[TCP_AUTHOPT_MAXMACBUF]; + u8 traffic_key[TCP_AUTHOPT_MAX_TRAFFIC_KEY_LEN]; + SHASH_DESC_ON_STACK(desc, mac_tfm); + __be32 sne = 0; + int err; + + /* Call special code path for computing traffic key without skb + * This can be called from tcp_v4_reqsk_send_ack so caching would be + * difficult here. + */ + err = tcp_v4_authopt_get_traffic_key_noskb(key, saddr, daddr, + th->source, th->dest, + htonl(info->src_isn), htonl(info->dst_isn), + traffic_key); + if (err) + goto out_err_traffic_key; + + /* Init mac shash */ + mac_tfm = tcp_authopt_get_mac_shash(key); + if (IS_ERR(mac_tfm)) + return PTR_ERR(mac_tfm); + err = crypto_shash_setkey(mac_tfm, traffic_key, key->alg->traffic_key_len); + if (err) + goto out_err; + + desc->tfm = mac_tfm; + err = crypto_shash_init(desc); + if (err) + goto out_err; + + err = crypto_shash_update(desc, (u8 *)&sne, 4); + if (err) + goto out_err; + + err = tcp_authopt_hash_tcp4_pseudoheader(desc, saddr, daddr, th->doff * 4); + if (err) + goto out_err; + + // TCP header with checksum set to zero. Caller ensures this.
+ if (WARN_ON_ONCE(th->check != 0)) + goto out_err; + err = crypto_shash_update(desc, (u8 *)th, sizeof(*th)); + if (err) + goto out_err; + + // TCP options + err = tcp_authopt_hash_opts(desc, th, !(key->flags & TCP_AUTHOPT_KEY_EXCLUDE_OPTS)); + if (err) + goto out_err; + + err = crypto_shash_final(desc, macbuf); + if (err) + goto out_err; + memcpy(hash_location, macbuf, TCP_AUTHOPT_MACLEN); + + tcp_authopt_put_mac_shash(key, mac_tfm); + return 0; + +out_err: + tcp_authopt_put_mac_shash(key, mac_tfm); +out_err_traffic_key: + memset(hash_location, 0, TCP_AUTHOPT_MACLEN); + return err; +} + static struct tcp_authopt_key_info *tcp_authopt_lookup_recv(struct sock *sk, struct sk_buff *skb, struct tcp_authopt_info *info, int recv_id) {
The code in tcp_v4_send_ack and tcp_v4_send_reset does not allocate a full skb, so TCP-AO signing needs special handling there.
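The option layout written by `tcp_v4_authopt_handle_reply` can be modelled in a few lines of Python. This is a sketch under stated assumptions: `TCPOLEN_AUTHOPT_OUTPUT` is taken to be 16 (a 4-byte option header plus a 12-byte MAC, matching the `1d 10` kind/length bytes seen in the IETF vectors), and the MAC is left as zeros because it is computed afterwards. The function names are hypothetical.

```python
import struct

TCPOPT_AUTHOPT = 29
# Assumption: 4-byte option header + 12-byte MAC (both RFC5926 algorithms emit 96 bits)
TCPOLEN_AUTHOPT_OUTPUT = 16
TCPOPT_NOP, TCPOPT_TIMESTAMP, TCPOLEN_TIMESTAMP = 1, 8, 10


def authopt_option(send_id: int, rnextkeyid: int, mac: bytes = bytes(12)) -> bytes:
    """Same word as htonl((kind << 24) | (len << 16) | (send_id << 8) | rnextkeyid), then the MAC."""
    word = ((TCPOPT_AUTHOPT << 24) | (TCPOLEN_AUTHOPT_OUTPUT << 16)
            | (send_id << 8) | rnextkeyid)
    return struct.pack("!I", word) + mac


def reply_options(send_id: int, rnextkeyid: int, tsval: int = 0, tsecr: int = 0) -> bytes:
    """Option area of a signed skb-less ACK: NOP, NOP, TS (3 words) if tsecr != 0, then TCP-AO."""
    opts = b""
    if tsecr:
        opts += struct.pack("!BBBBII", TCPOPT_NOP, TCPOPT_NOP, TCPOPT_TIMESTAMP,
                            TCPOLEN_TIMESTAMP, tsval, tsecr)
    return opts + authopt_option(send_id, rnextkeyid)


def data_offset(options: bytes) -> int:
    """TCP data offset in 32-bit words; TCP-AO adds TCPOLEN_AUTHOPT_OUTPUT / 4 words."""
    if len(options) % 4:
        raise ValueError("options must be padded to a multiple of 4 bytes")
    return 5 + len(options) // 4
```

The timestamp block occupies three 32-bit words, which is why `tcp_v4_send_ack` writes the AO option at `rep.opt[3]` when `tsecr` is set. `doff` must already include the AO option before the MAC is computed, because the header length is part of the signed data.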
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 net/ipv4/tcp_ipv4.c | 82 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 79 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index da43567c3753..21971f5fa40e 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -644,10 +644,50 @@ void tcp_v4_send_check(struct sock *sk, struct sk_buff *skb)
__tcp_v4_send_check(skb, inet->inet_saddr, inet->inet_daddr); } EXPORT_SYMBOL(tcp_v4_send_check);
+#ifdef CONFIG_TCP_AUTHOPT +/** tcp_v4_authopt_handle_reply - Insert TCPOPT_AUTHOPT if required + * + * returns number of bytes (always aligned to 4) or zero + */ +static int tcp_v4_authopt_handle_reply(const struct sock *sk, + struct sk_buff *skb, + __be32 *optptr, + struct tcphdr *th) +{ + struct tcp_authopt_info *info; + struct tcp_authopt_key_info *key_info; + u8 rnextkeyid; + + if (sk->sk_state == TCP_TIME_WAIT) + info = tcp_twsk(sk)->tw_authopt_info; + else + info = tcp_sk(sk)->authopt_info; + if (!info) + return 0; + key_info = __tcp_authopt_select_key(sk, info, sk, &rnextkeyid); + if (!key_info) + return 0; + *optptr = htonl((TCPOPT_AUTHOPT << 24) | + (TCPOLEN_AUTHOPT_OUTPUT << 16) | + (key_info->send_id << 8) | + (rnextkeyid)); + /* must update doff before signature computation */ + th->doff += TCPOLEN_AUTHOPT_OUTPUT / 4; + tcp_v4_authopt_hash_reply((char *)(optptr + 1), + info, + key_info, + ip_hdr(skb)->daddr, + ip_hdr(skb)->saddr, + th); + + return TCPOLEN_AUTHOPT_OUTPUT; +} +#endif + /* * This routine will send an RST to the other tcp. * * Someone asks: why I NEVER use socket parameters (TOS, TTL etc.) * for reset. @@ -659,10 +699,12 @@ EXPORT_SYMBOL(tcp_v4_send_check); * Exception: precedence violation. We do not implement it in any case. */
#ifdef CONFIG_TCP_MD5SIG #define OPTION_BYTES TCPOLEN_MD5SIG_ALIGNED +#elif defined(CONFIG_TCP_AUTHOPT) +#define OPTION_BYTES TCPOLEN_AUTHOPT_OUTPUT #else #define OPTION_BYTES sizeof(__be32) #endif
static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb) @@ -712,12 +754,29 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb) memset(&arg, 0, sizeof(arg)); arg.iov[0].iov_base = (unsigned char *)&rep; arg.iov[0].iov_len = sizeof(rep.th);
net = sk ? sock_net(sk) : dev_net(skb_dst(skb)->dev); -#ifdef CONFIG_TCP_MD5SIG +#if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AUTHOPT) rcu_read_lock(); +#endif +#ifdef CONFIG_TCP_AUTHOPT + /* Unlike TCP-MD5 the signatures for TCP-AO depend on initial sequence + * numbers so we can only handle established and time-wait sockets. + */ + if (static_branch_unlikely(&tcp_authopt_needed) && sk && + sk->sk_state != TCP_NEW_SYN_RECV && + sk->sk_state != TCP_LISTEN) { + int tcp_authopt_ret = tcp_v4_authopt_handle_reply(sk, skb, rep.opt, &rep.th); + + if (tcp_authopt_ret) { + arg.iov[0].iov_len += tcp_authopt_ret; + goto skip_md5sig; + } + } +#endif +#ifdef CONFIG_TCP_MD5SIG hash_location = tcp_parse_md5sig_option(th); if (sk && sk_fullsock(sk)) { const union tcp_md5_addr *addr; int l3index;
@@ -755,11 +814,10 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb) addr = (union tcp_md5_addr *)&ip_hdr(skb)->saddr; key = tcp_md5_do_lookup(sk1, l3index, addr, AF_INET); if (!key) goto out;
- genhash = tcp_v4_md5_hash_skb(newhash, key, NULL, skb); if (genhash || memcmp(hash_location, newhash, 16) != 0) goto out;
} @@ -775,10 +833,13 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb)
tcp_v4_md5_hash_hdr((__u8 *) &rep.opt[1], key, ip_hdr(skb)->saddr, ip_hdr(skb)->daddr, &rep.th); } +#endif +#ifdef CONFIG_TCP_AUTHOPT +skip_md5sig: #endif /* Can't co-exist with TCPMD5, hence check rep.opt[0] */ if (rep.opt[0] == 0) { __be32 mrst = mptcp_reset_option(skb);
@@ -828,12 +889,14 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb) ctl_sk->sk_mark = 0; __TCP_INC_STATS(net, TCP_MIB_OUTSEGS); __TCP_INC_STATS(net, TCP_MIB_OUTRSTS); local_bh_enable();
-#ifdef CONFIG_TCP_MD5SIG +#if defined(CONFIG_TCP_MD5SIG) out: +#endif +#if defined(CONFIG_TCP_MD5SIG) || defined(CONFIG_TCP_AUTHOPT) rcu_read_unlock(); #endif }
/* The code following below sending ACKs in SYN-RECV and TIME-WAIT states @@ -850,10 +913,12 @@ static void tcp_v4_send_ack(const struct sock *sk, struct { struct tcphdr th; __be32 opt[(TCPOLEN_TSTAMP_ALIGNED >> 2) #ifdef CONFIG_TCP_MD5SIG + (TCPOLEN_MD5SIG_ALIGNED >> 2) +#elif defined(CONFIG_TCP_AUTHOPT) + + (TCPOLEN_AUTHOPT_OUTPUT >> 2) #endif ]; } rep; struct net *net = sock_net(sk); struct ip_reply_arg arg; @@ -881,10 +946,21 @@ static void tcp_v4_send_ack(const struct sock *sk, rep.th.seq = htonl(seq); rep.th.ack_seq = htonl(ack); rep.th.ack = 1; rep.th.window = htons(win);
+#ifdef CONFIG_TCP_AUTHOPT + if (static_branch_unlikely(&tcp_authopt_needed)) { + int aoret, offset = (tsecr) ? 3 : 0; + + aoret = tcp_v4_authopt_handle_reply(sk, skb, &rep.opt[offset], &rep.th); + if (aoret) { + arg.iov[0].iov_len += aoret; + key = NULL; + } + } +#endif #ifdef CONFIG_TCP_MD5SIG if (key) { int offset = (tsecr) ? 3 : 0;
rep.opt[offset++] = htonl((TCPOPT_NOP << 24) |
Add implementation and tests for Sequence Number Extension.
One implementation is based on an IETF draft: https://datatracker.ietf.org/doc/draft-touch-sne/
The Linux implementation is simpler and requires no additional flags; it relies only on the standard before()/after() sequence-number macros.
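A compact sketch of the flag-free approach, assuming `before()` follows the kernel definition `(s32)(seq1 - seq2) < 0`. `SneTracker` is a hypothetical name; it follows the same logic as `SequenceNumberExtenderLinux` rather than replacing it. Note the `>=` in `seq_before`: a difference of exactly 2^31 counts as "before", as it does for the kernel macro.

```python
MASK32 = 0xFFFFFFFF


def seq_before(a: int, b: int) -> bool:
    """Like the kernel's before(): (s32)(a - b) < 0."""
    return ((a - b) & MASK32) >= 0x80000000


def seq_after(a: int, b: int) -> bool:
    return seq_before(b, a)


class SneTracker:
    """Hypothetical flag-free SNE tracker built only on before()/after()."""

    def __init__(self, seq: int = 0, sne: int = 0):
        self.prev_seq = seq  # most recent (highest) SEQ seen
        self.sne = sne       # SNE that goes with prev_seq

    def calc(self, seq: int) -> int:
        if seq_before(seq, self.prev_seq):
            # Older segment: if it is numerically larger it predates the last wrap
            return self.sne - 1 if seq > self.prev_seq else self.sne
        # Newer (or equal) segment: numerically smaller means SEQ wrapped past zero
        sne = self.sne + 1 if seq < self.prev_seq else self.sne
        self.prev_seq, self.sne = seq, sne
        return sne
```

Fed the `SNE_DATA_EASY` sequence from the test file, the tracker returns the listed SNE for each packet: a late segment from before the wrap gets `sne - 1` without moving the saved state.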
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 .../tcp_authopt/tcp_authopt_test/sne_alg.py | 111 ++++++++++++++++++
 .../tcp_authopt_test/test_sne_alg.py        |  96 +++++++++++++++
 2 files changed, 207 insertions(+)
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/sne_alg.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_sne_alg.py
diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/sne_alg.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/sne_alg.py new file mode 100644 index 000000000000..252356dc87a4 --- /dev/null +++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/sne_alg.py @@ -0,0 +1,111 @@ +# SPDX-License-Identifier: GPL-2.0 +"""Python implementation of SNE algorithms""" + + +def distance(x, y): + if x < y: + return y - x + else: + return x - y + + +class SequenceNumberExtender: + """Based on https://datatracker.ietf.org/doc/draft-touch-sne/""" + + sne: int = 0 + sne_flag: int = 1 + prev_seq: int = 0 + + def calc(self, seq): + """Update internal state and return SNE for certain SEQ""" + # use current SNE to start + result = self.sne + + # both in same SNE range? + if distance(seq, self.prev_seq) < 0x80000000: + # jumps fwd over N/2? + if seq >= 0x80000000 and self.prev_seq < 0x80000000: + self.sne_flag = 0 + # move prev forward if needed + self.prev_seq = max(seq, self.prev_seq) + # both in diff SNE ranges? + else: + # jumps forward over zero? + if seq < 0x80000000: + # update prev + self.prev_seq = seq + # first jump over zero? (wrap) + if self.sne_flag == 0: + # set flag so we increment once + self.sne_flag = 1 + # increment window + self.sne = self.sne + 1 + # use updated SNE value + result = self.sne + # jump backward over zero? 
+ else: + # use pre-rollover SNE value + result = self.sne - 1 + + return result + + +class SequenceNumberExtenderRFC: + """Based on sample code in original RFC5925 document""" + + sne: int = 0 + sne_flag: int = 1 + prev_seq: int = 0 + + def calc(self, seq): + """Update internal state and return SNE for certain SEQ""" + # set the flag when the SEG.SEQ first rolls over + if self.sne_flag == 0 and self.prev_seq > 0x7FFFFFFF and seq < 0x7FFFFFFF: + self.sne = self.sne + 1 + self.sne_flag = 1 + # decide which SNE to use after incremented + if self.sne_flag and seq > 0x7FFFFFFF: + # use the pre-increment value + sne = self.sne - 1 + else: + # use the current value + sne = self.sne + # reset the flag in the *middle* of the window + if self.prev_seq < 0x7FFFFFFF and seq > 0x7FFFFFFF: + self.sne_flag = 0 + # save the current SEQ for the next time through the code + self.prev_seq = seq + + return sne + + +def tcp_seq_before(a, b) -> bool: + return ((a - b) & 0xFFFFFFFF) > 0x80000000 + + +def tcp_seq_after(a, b) -> bool: + return tcp_seq_before(b, a) + + +class SequenceNumberExtenderLinux: + """Based on linux implementation and with no extra flags""" + + sne: int = 0 + prev_seq: int = 0 + + def reset(self, seq, sne=0): + self.prev_seq = seq + self.sne = sne + + def calc(self, seq, update=True): + sne = self.sne + if tcp_seq_before(seq, self.prev_seq): + if seq > self.prev_seq: + sne -= 1 + else: + if seq < self.prev_seq: + sne += 1 + if update and tcp_seq_before(self.prev_seq, seq): + self.prev_seq = seq + self.sne = sne + return sne diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_sne_alg.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_sne_alg.py new file mode 100644 index 000000000000..9b74873cff4a --- /dev/null +++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_sne_alg.py @@ -0,0 +1,96 @@ +# SPDX-License-Identifier: GPL-2.0 +"""Test SNE algorithm implementations""" + +import logging + +import pytest + +from .sne_alg import
( + SequenceNumberExtender, + SequenceNumberExtenderLinux, + SequenceNumberExtenderRFC, +) + +logger = logging.getLogger(__name__) + + +# Data from https://datatracker.ietf.org/doc/draft-touch-sne/ +_SNE_TEST_DATA = [ + (0x00000000, 0x00000000), + (0x00000000, 0x30000000), + (0x00000000, 0x90000000), + (0x00000000, 0x70000000), + (0x00000000, 0xA0000000), + (0x00000001, 0x00000001), + (0x00000000, 0xE0000000), + (0x00000001, 0x00000000), + (0x00000001, 0x7FFFFFFF), + (0x00000001, 0x00000000), + (0x00000001, 0x50000000), + (0x00000001, 0x80000000), + (0x00000001, 0x00000001), + (0x00000001, 0x40000000), + (0x00000001, 0x90000000), + (0x00000001, 0xB0000000), + (0x00000002, 0x0FFFFFFF), + (0x00000002, 0x20000000), + (0x00000002, 0x90000000), + (0x00000002, 0x70000000), + (0x00000002, 0xA0000000), + (0x00000003, 0x00004000), + (0x00000002, 0xD0000000), + (0x00000003, 0x20000000), + (0x00000003, 0x90000000), + (0x00000003, 0x70000000), + (0x00000003, 0xA0000000), + (0x00000004, 0x00004000), + (0x00000003, 0xD0000000), +] + + +# Easier test data with small jumps <= 0x30000000 +SNE_DATA_EASY = [ + (0x00000000, 0x00000000), + (0x00000000, 0x30000000), + (0x00000000, 0x60000000), + (0x00000000, 0x80000000), + (0x00000000, 0x90000000), + (0x00000000, 0xC0000000), + (0x00000000, 0xF0000000), + (0x00000001, 0x10000000), + (0x00000000, 0xF0030000), + (0x00000001, 0x00030000), + (0x00000001, 0x10030000), +] + + +def check_sne_alg(alg, data): + for sne, seq in data: + observed_sne = alg.calc(seq) + logger.info( + "seq %08x expected sne %08x observed sne %08x", seq, sne, observed_sne + ) + assert observed_sne == sne + + +def test_sne_alg(): + check_sne_alg(SequenceNumberExtender(), _SNE_TEST_DATA) + + +def test_sne_alg_easy(): + check_sne_alg(SequenceNumberExtender(), SNE_DATA_EASY) + + +@pytest.mark.xfail +def test_sne_alg_rfc(): + check_sne_alg(SequenceNumberExtenderRFC(), _SNE_TEST_DATA) + + +@pytest.mark.xfail +def test_sne_alg_rfc_easy(): + 
check_sne_alg(SequenceNumberExtenderRFC(), SNE_DATA_EASY) + + +def test_sne_alg_linux(): + check_sne_alg(SequenceNumberExtenderLinux(), _SNE_TEST_DATA) + check_sne_alg(SequenceNumberExtenderLinux(), SNE_DATA_EASY)
Tools like tcpdump and Wireshark can parse the TCP Authentication Option but cannot yet verify its signatures.
This patch implements TCP-AO signature verification using scapy and the Python cryptography package.
The Python code is itself verified against a subset of the IETF test vectors from this page: https://datatracker.ietf.org/doc/html/draft-touch-tcpm-ao-test-vectors-02
The code in this commit is not Linux-specific.
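For readers without scapy, the same verification idea can be sketched with only the Python standard library. This sketch is limited to IPv4 and HMAC-SHA-1-96, and assumes a well-formed packet whose IPv4 total-length field is correct. `ao_message_ipv4` and `verify_hmac_sha1_96` are hypothetical names and are not part of the selftests.

```python
import hashlib
import hmac
import struct
import typing

TCPOPT_AUTHOPT = 29


def ao_message_ipv4(packet: bytes, sne: int = 0,
                    include_options: bool = True) -> typing.Tuple[bytes, bytes]:
    """Return (RFC5925 section 5.1 MAC input, received MAC) for a raw IPv4 packet."""
    ihl = (packet[0] & 0x0F) * 4
    total_len = struct.unpack("!H", packet[2:4])[0]
    tcp = packet[ihl:total_len]
    doff = (tcp[12] >> 4) * 4
    # SNE, then the IPv4 pseudo-header: saddr, daddr, zero, protocol 6, TCP length
    msg = struct.pack("!I", sne) + packet[12:20] + struct.pack("!BBH", 0, 6, len(tcp))
    # Fixed 20-byte TCP header with the checksum field zeroed
    msg += tcp[:16] + b"\x00\x00" + tcp[18:20]
    received_mac = b""
    pos = 20
    while pos < doff:
        kind = tcp[pos]
        if kind in (0, 1):  # EOL and NOP are single-byte options
            if include_options:
                msg += bytes([kind])
            pos += 1
            continue
        length = tcp[pos + 1]
        if length < 2 or pos + length > doff:
            raise ValueError("malformed TCP option")
        if kind == TCPOPT_AUTHOPT:
            # TCP-AO itself is always covered, with its MAC field zeroed
            received_mac = tcp[pos + 4:pos + length]
            msg += tcp[pos:pos + 4] + bytes(length - 4)
        elif include_options:
            msg += tcp[pos:pos + length]
        pos += length
    return msg + tcp[doff:], received_mac


def verify_hmac_sha1_96(packet: bytes, master_key: bytes, sisn: int, disn: int,
                        sne: int = 0, include_options: bool = True) -> bool:
    """Check an HMAC-SHA-1-96 TCP-AO signature on a raw IPv4 packet."""
    ihl = (packet[0] & 0x0F) * 4
    sport, dport = struct.unpack("!HH", packet[ihl:ihl + 4])
    # RFC5926 KDF input: i=1 || "TCP-AO" || context || output length (160 bits)
    context = packet[12:20] + struct.pack("!HHII", sport, dport, sisn, disn)
    traffic_key = hmac.new(master_key, b"\x01TCP-AO" + context + b"\x00\xa0",
                           hashlib.sha1).digest()
    message, received_mac = ao_message_ipv4(packet, sne, include_options)
    mac = hmac.new(traffic_key, message, hashlib.sha1).digest()[:12]
    return bool(received_mac) and hmac.compare_digest(mac, received_mac)
```

Run on the raw bytes of IETF vector 4.1.1 with the client ISN and a zero destination ISN, this should report a valid signature, while flipping any MAC byte (as `break_tcp_authopt_signature` does) should make it fail.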
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 .../tcp_authopt_test/scapy_tcp_authopt.py     | 220 +++++++++++
 .../tcp_authopt_test/scapy_utils.py           | 177 +++++++++
 .../tcp_authopt_test/test_vectors.py          | 365 ++++++++++++++++++
 .../tcp_authopt/tcp_authopt_test/validator.py | 138 +++++++
 4 files changed, 900 insertions(+)
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/scapy_tcp_authopt.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/scapy_utils.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_vectors.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/validator.py
diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/scapy_tcp_authopt.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/scapy_tcp_authopt.py new file mode 100644 index 000000000000..ce36321b803a --- /dev/null +++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/scapy_tcp_authopt.py @@ -0,0 +1,220 @@ +# SPDX-License-Identifier: GPL-2.0 +"""Packet-processing utilities implementing RFC5925 and RFC5926""" + +import hmac +import logging +import struct + +from scapy.layers.inet import TCP +from scapy.packet import Packet + +from .scapy_utils import ( + TCPOPT_AUTHOPT, + IPvXAddress, + get_packet_ipvx_dst, + get_packet_ipvx_src, + get_tcp_doff, + get_tcp_pseudoheader, +) + +logger = logging.getLogger(__name__) + + +def _cmac_aes_digest(key: bytes, msg: bytes) -> bytes: + from cryptography.hazmat.backends import default_backend + from cryptography.hazmat.primitives import cmac + from cryptography.hazmat.primitives.ciphers import algorithms + + backend = default_backend() + c = cmac.CMAC(algorithms.AES(key), backend=backend) + c.update(bytes(msg)) + return c.finalize() + + +class TcpAuthOptAlg: + @classmethod + def kdf(cls, master_key: bytes, context: bytes) -> bytes: + raise NotImplementedError() + + @classmethod + def mac(cls, traffic_key: bytes, message: bytes) -> bytes: + raise NotImplementedError() + + maclen = -1 + + +class TcpAuthOptAlg_HMAC_SHA1(TcpAuthOptAlg): + @classmethod + def kdf(cls, master_key: bytes, context: bytes) -> bytes: + input = b"\x01" + b"TCP-AO" + context + b"\x00\xa0" + return hmac.digest(master_key, input, "SHA1") + + @classmethod + def mac(cls, traffic_key: bytes, message: bytes) -> bytes: + return hmac.digest(traffic_key, message, "SHA1")[:12] + + maclen = 12 + + +class TcpAuthOptAlg_CMAC_AES(TcpAuthOptAlg): + @classmethod + def kdf(self, master_key: bytes, context: bytes) -> bytes: + if len(master_key) == 16: + key = master_key + else: + key = _cmac_aes_digest(b"\x00" * 16, master_key) + return
_cmac_aes_digest(key, b"\x01" + b"TCP-AO" + context + b"\x00\x80") + + @classmethod + def mac(self, traffic_key: bytes, message: bytes) -> bytes: + return _cmac_aes_digest(traffic_key, message)[:12] + + maclen = 12 + + +def get_alg(name: str) -> TcpAuthOptAlg: + if name.upper() == "HMAC-SHA-1-96": + return TcpAuthOptAlg_HMAC_SHA1() + elif name.upper() == "AES-128-CMAC-96": + return TcpAuthOptAlg_CMAC_AES() + else: + raise ValueError(f"Bad TCP AuthOpt algorithms {name}") + + +def build_context( + saddr: IPvXAddress, daddr: IPvXAddress, sport, dport, src_isn, dst_isn +) -> bytes: + """Build context bytes as specified by RFC5925 section 5.2""" + return ( + saddr.packed + + daddr.packed + + struct.pack( + "!HHII", + sport, + dport, + src_isn, + dst_isn, + ) + ) + + +def build_context_from_packet(p: Packet, src_isn: int, dst_isn: int) -> bytes: + """Build context based on a scapy Packet and src/dst initial-sequence numbers""" + return build_context( + get_packet_ipvx_src(p), + get_packet_ipvx_dst(p), + p[TCP].sport, + p[TCP].dport, + src_isn, + dst_isn, + ) + + +def build_message_from_packet(p: Packet, include_options=True, sne=0) -> bytearray: + """Build message bytes as described by RFC5925 section 5.1""" + result = bytearray() + result += struct.pack("!I", sne) + th = p[TCP] + + # ip pseudo-header: + result += get_tcp_pseudoheader(th) + + # tcp header with checksum set to zero + th_bytes = bytes(p[TCP]) + result += th_bytes[:16] + result += b"\x00\x00" + result += th_bytes[18:20] + + # Even if include_options=False the TCP-AO option itself is still included + # with the MAC set to all-zeros. This means we need to parse TCP options. 
+ pos = 20 + tcphdr_optend = get_tcp_doff(th) * 4 + # logger.info("th_bytes: %s", th_bytes.hex(' ')) + assert len(th_bytes) >= tcphdr_optend + while pos < tcphdr_optend: + optnum = th_bytes[pos] + pos += 1 + if optnum == 0 or optnum == 1: + if include_options: + result += bytes([optnum]) + continue + + optlen = th_bytes[pos] + pos += 1 + if pos + optlen - 2 > tcphdr_optend: + logger.info( + "bad tcp option %d optlen %d beyond end-of-header", optnum, optlen + ) + break + if optlen < 2: + logger.info("bad tcp option %d optlen %d less than two", optnum, optlen) + break + if optnum == TCPOPT_AUTHOPT: + if optlen < 4: + logger.info("bad tcp option %d optlen %d", optnum, optlen) + break + result += bytes([optnum, optlen]) + result += th_bytes[pos : pos + 2] + result += (optlen - 4) * b"\x00" + elif include_options: + result += bytes([optnum, optlen]) + result += th_bytes[pos : pos + optlen - 2] + pos += optlen - 2 + result += bytes(p[TCP].payload) + return result + + +def check_tcp_authopt_signature( + p: Packet, alg: TcpAuthOptAlg, master_key, sisn, disn, include_options=True, sne=0 +): + from .scapy_utils import scapy_tcp_get_authopt_val + + ao = scapy_tcp_get_authopt_val(p[TCP]) + if ao is None: + return None + + context_bytes = build_context_from_packet(p, sisn, disn) + traffic_key = alg.kdf(master_key, context_bytes) + message_bytes = build_message_from_packet( + p, include_options=include_options, sne=sne + ) + mac = alg.mac(traffic_key, message_bytes) + return mac == ao.mac + + +def add_tcp_authopt_signature( + p: Packet, + alg: TcpAuthOptAlg, + master_key, + sisn, + disn, + keyid=0, + rnextkeyid=0, + include_options=True, + sne=0, +): + """Sign a packet""" + th = p[TCP] + keyids = struct.pack("BB", keyid, rnextkeyid) + th.options = th.options + [(TCPOPT_AUTHOPT, keyids + alg.maclen * b"\x00")] + + context_bytes = build_context_from_packet(p, sisn, disn) + traffic_key = alg.kdf(master_key, context_bytes) + message_bytes = build_message_from_packet( + p, 
include_options=include_options, sne=sne + ) + mac = alg.mac(traffic_key, message_bytes) + th.options[-1] = (TCPOPT_AUTHOPT, keyids + mac) + + +def break_tcp_authopt_signature(packet: Packet): + """Invalidate TCP-AO signature inside a packet + + The packet must already be signed and it gets modified in-place. + """ + opt = packet[TCP].options[-1] + if opt[0] != TCPOPT_AUTHOPT: + raise ValueError("TCP option list must end with TCP_AUTHOPT") + opt_mac = bytearray(opt[1]) + opt_mac[-1] ^= 0xFF + packet[TCP].options[-1] = (opt[0], bytes(opt_mac)) diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/scapy_utils.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/scapy_utils.py new file mode 100644 index 000000000000..03c843b8378e --- /dev/null +++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/scapy_utils.py @@ -0,0 +1,177 @@ +# SPDX-License-Identifier: GPL-2.0 +import socket +import struct +import threading +import typing +from dataclasses import dataclass +from ipaddress import IPv4Address, IPv6Address + +from scapy.config import conf as scapy_conf +from scapy.layers.inet import IP, TCP +from scapy.layers.inet6 import IPv6 +from scapy.packet import Packet +from scapy.sendrecv import AsyncSniffer + +from .utils import netns_context + +# TCPOPT numbers are apparently not available in scapy +TCPOPT_MD5SIG = 19 +TCPOPT_AUTHOPT = 29 + +# Easy generic handling of IPv4/IPv6Address +IPvXAddress = typing.Union[IPv4Address, IPv6Address] + + +def get_packet_ipvx_src(p: Packet) -> IPvXAddress: + if IP in p: + return IPv4Address(p[IP].src) + elif IPv6 in p: + return IPv6Address(p[IPv6].src) + else: + raise Exception("Neither IP nor IPv6 found on packet") + + +def get_packet_ipvx_dst(p: Packet) -> IPvXAddress: + if IP in p: + return IPv4Address(p[IP].dst) + elif IPv6 in p: + return IPv6Address(p[IPv6].dst) + else: + raise Exception("Neither IP nor IPv6 found on packet") + + +def get_tcp_doff(th: TCP): + """Get the TCP data offset, even if packet is not 
yet built""" + doff = th.dataofs + if doff is None: + opt_len = len(th.get_field("options").i2m(th, th.options)) + doff = 5 + ((opt_len + 3) // 4) + return doff + + +def get_tcp_v4_pseudoheader(tcp_packet: TCP) -> bytes: + iph = tcp_packet.underlayer + return struct.pack( + "!4s4sHH", + IPv4Address(iph.src).packed, + IPv4Address(iph.dst).packed, + socket.IPPROTO_TCP, + get_tcp_doff(tcp_packet) * 4 + len(tcp_packet.payload), + ) + + +def get_tcp_v6_pseudoheader(tcp_packet: TCP) -> bytes: + ipv6 = tcp_packet.underlayer + return struct.pack( + "!16s16sII", + IPv6Address(ipv6.src).packed, + IPv6Address(ipv6.dst).packed, + get_tcp_doff(tcp_packet) * 4 + len(tcp_packet.payload), + socket.IPPROTO_TCP, + ) + + +def get_tcp_pseudoheader(tcp_packet: TCP): + if isinstance(tcp_packet.underlayer, IP): + return get_tcp_v4_pseudoheader(tcp_packet) + if isinstance(tcp_packet.underlayer, IPv6): + return get_tcp_v6_pseudoheader(tcp_packet) + raise ValueError("TCP underlayer is neither IP nor IPv6") + + +def tcp_seq_wrap(seq): + return seq & 0xFFFFFFFF + + +@dataclass +class tcphdr_authopt: + """Representation of a TCP auth option as it appears in a TCP packet""" + + keyid: int + rnextkeyid: int + mac: bytes + + @classmethod + def unpack(cls, buf) -> "tcphdr_authopt": + return cls(buf[0], buf[1], buf[2:]) + + def __repr__(self): + return f"tcphdr_authopt({self.keyid}, {self.rnextkeyid}, bytes.fromhex({self.mac.hex(' ')!r})" + + +def scapy_tcp_get_authopt_val(tcp) -> typing.Optional[tcphdr_authopt]: + for optnum, optval in tcp.options: + if optnum == TCPOPT_AUTHOPT: + return tcphdr_authopt.unpack(optval) + return None + + +def scapy_tcp_get_md5_sig(tcp) -> typing.Optional[bytes]: + """Return the MD5 signature (as bytes) or None""" + for optnum, optval in tcp.options: + if optnum == TCPOPT_MD5SIG: + return optval + return None + + +def calc_tcp_md5_hash(p, key: bytes) -> bytes: + """Calculate TCP-MD5 hash from packet and return a 16-byte string""" + import hashlib + + h = hashlib.md5() 
+ tp = p[TCP] + th_bytes = bytes(p[TCP]) + h.update(get_tcp_pseudoheader(tp)) + h.update(th_bytes[:16]) + h.update(b"\x00\x00") + h.update(th_bytes[18:20]) + h.update(bytes(tp.payload)) + h.update(key) + + return h.digest() + + +def create_l2socket(ns: str = "", **kw): + """Create a scapy L2socket inside a namespace""" + + with netns_context(ns): + return scapy_conf.L2socket(**kw) + + +def create_capture_socket(ns: str = "", **kw): + """Create a scapy L2listen socket inside a namespace""" + from scapy.config import conf as scapy_conf + + with netns_context(ns): + return scapy_conf.L2listen(**kw) + + +def scapy_sniffer_start_block(sniffer: AsyncSniffer, timeout=1): + """Like AsyncSniffer.start except block until sniffing starts + + This ensures no lost packets and no delays + """ + if sniffer.kwargs.get("started_callback"): + raise ValueError("sniffer must not already have a started_callback") + + e = threading.Event() + sniffer.kwargs["started_callback"] = e.set + sniffer.start() + e.wait(timeout=timeout) + if not e.is_set(): + raise TimeoutError("Timed out waiting for sniffer to start") + + +def scapy_sniffer_stop(sniffer: AsyncSniffer): + """Like AsyncSniffer.stop except no error is raised if not running""" + if sniffer is not None and sniffer.running: + sniffer.stop() + + +class AsyncSnifferContext(AsyncSniffer): + def __enter__(self): + scapy_sniffer_start_block(self) + return self + + def __exit__(self, *a): + scapy_sniffer_stop(self) diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_vectors.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_vectors.py new file mode 100644 index 000000000000..e0fcde04629c --- /dev/null +++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_vectors.py @@ -0,0 +1,365 @@ +# SPDX-License-Identifier: GPL-2.0 +import logging +import socket +from ipaddress import IPv4Address, IPv6Address + +from scapy.layers.inet import IP, TCP +from scapy.layers.inet6 import IPv6 + +from .scapy_tcp_authopt
import ( + build_context_from_packet, + build_message_from_packet, + get_alg, +) +from .scapy_utils import scapy_tcp_get_authopt_val + +logger = logging.getLogger(__name__) + + +class TestIETFVectors: + """Test python implementation of TCP-AO algorithms + + Data is a subset of IETF test vectors: + https://datatracker.ietf.org/doc/html/draft-touch-tcpm-ao-test-vectors-02 + """ + + master_key = b"testvector" + client_keyid = 61 + server_keyid = 84 + client_ipv4 = IPv4Address("10.11.12.13") + client_ipv6 = IPv6Address("FD00::1") + server_ipv4 = IPv4Address("172.27.28.29") + server_ipv6 = IPv6Address("FD00::2") + + client_isn_41x = 0xFBFBAB5A + server_isn_41x = 0x11C14261 + client_isn_42x = 0xCB0EFBEE + server_isn_42x = 0xACD5B5E1 + client_isn_61x = 0x176A833F + server_isn_61x = 0x3F51994B + client_isn_62x = 0x020C1E69 + server_isn_62x = 0xEBA3734D + + def check( + self, + packet_hex: str, + traffic_key_hex: str, + mac_hex: str, + src_isn, + dst_isn, + include_options=True, + alg_name="HMAC-SHA-1-96", + sne=0, + ): + packet_bytes = bytes.fromhex(packet_hex) + + # sanity check for ip version + ipv = packet_bytes[0] >> 4 + if ipv == 4: + p = IP(bytes.fromhex(packet_hex)) + assert p[IP].proto == socket.IPPROTO_TCP + elif ipv == 6: + p = IPv6(bytes.fromhex(packet_hex)) + assert p[IPv6].nh == socket.IPPROTO_TCP + else: + raise ValueError(f"bad ipv={ipv}") + + # sanity check for seq/ack in SYN/ACK packets + if p[TCP].flags.S and p[TCP].flags.A is False: + assert p[TCP].seq == src_isn + assert p[TCP].ack == 0 + if p[TCP].flags.S and p[TCP].flags.A: + assert p[TCP].seq == src_isn + assert p[TCP].ack == dst_isn + 1 + + # check traffic key + alg = get_alg(alg_name) + context_bytes = build_context_from_packet(p, src_isn, dst_isn) + traffic_key = alg.kdf(self.master_key, context_bytes) + assert traffic_key.hex(" ") == traffic_key_hex + + # check mac + message_bytes = build_message_from_packet( + p, include_options=include_options, sne=sne + ) + mac = alg.mac(traffic_key, 
message_bytes) + assert mac.hex(" ") == mac_hex + + # check option bytes in header + opt = scapy_tcp_get_authopt_val(p[TCP]) + assert opt is not None + assert opt.keyid in [self.client_keyid, self.server_keyid] + assert opt.rnextkeyid in [self.client_keyid, self.server_keyid] + assert opt.mac.hex(" ") == mac_hex + + def test_4_1_1(self): + self.check( + """ + 45 e0 00 4c dd 0f 40 00 ff 06 bf 6b 0a 0b 0c 0d + ac 1b 1c 1d e9 d7 00 b3 fb fb ab 5a 00 00 00 00 + e0 02 ff ff ca c4 00 00 02 04 05 b4 01 03 03 08 + 04 02 08 0a 00 15 5a b7 00 00 00 00 1d 10 3d 54 + 2e e4 37 c6 f8 ed e6 d7 c4 d6 02 e7 + """, + "6d 63 ef 1b 02 fe 15 09 d4 b1 40 27 07 fd 7b 04 16 ab b7 4f", + "2e e4 37 c6 f8 ed e6 d7 c4 d6 02 e7", + self.client_isn_41x, + 0, + ) + + def test_4_1_2(self): + self.check( + """ + 45 e0 00 4c 65 06 40 00 ff 06 37 75 ac 1b 1c 1d + 0a 0b 0c 0d 00 b3 e9 d7 11 c1 42 61 fb fb ab 5b + e0 12 ff ff 37 76 00 00 02 04 05 b4 01 03 03 08 + 04 02 08 0a 84 a5 0b eb 00 15 5a b7 1d 10 54 3d + ee ab 0f e2 4c 30 10 81 51 16 b3 be + """, + "d9 e2 17 e4 83 4a 80 ca 2f 3f d8 de 2e 41 b8 e6 79 7f ea 96", + "ee ab 0f e2 4c 30 10 81 51 16 b3 be", + self.server_isn_41x, + self.client_isn_41x, + ) + + def test_4_1_3(self): + self.check( + """ + 45 e0 00 87 36 a1 40 00 ff 06 65 9f 0a 0b 0c 0d + ac 1b 1c 1d e9 d7 00 b3 fb fb ab 5b 11 c1 42 62 + c0 18 01 04 a1 62 00 00 01 01 08 0a 00 15 5a c1 + 84 a5 0b eb 1d 10 3d 54 70 64 cf 99 8c c6 c3 15 + c2 c2 e2 bf ff ff ff ff ff ff ff ff ff ff ff ff + ff ff ff ff 00 43 01 04 da bf 00 b4 0a 0b 0c 0d + 26 02 06 01 04 00 01 00 01 02 02 80 00 02 02 02 + 00 02 02 42 00 02 06 41 04 00 00 da bf 02 08 40 + 06 00 64 00 01 01 00 + """, + "d2 e5 9c 65 ff c7 b1 a3 93 47 65 64 63 b7 0e dc 24 a1 3d 71", + "70 64 cf 99 8c c6 c3 15 c2 c2 e2 bf", + self.client_isn_41x, + self.server_isn_41x, + ) + + def test_4_1_4(self): + self.check( + """ + 45 e0 00 87 1f a9 40 00 ff 06 7c 97 ac 1b 1c 1d + 0a 0b 0c 0d 00 b3 e9 d7 11 c1 42 62 fb fb ab 9e + c0 18 01 00 40 0c 00 00 01 01 
08 0a 84 a5 0b f5 + 00 15 5a c1 1d 10 54 3d a6 3f 0e cb bb 2e 63 5c + 95 4d ea c7 ff ff ff ff ff ff ff ff ff ff ff ff + ff ff ff ff 00 43 01 04 da c0 00 b4 ac 1b 1c 1d + 26 02 06 01 04 00 01 00 01 02 02 80 00 02 02 02 + 00 02 02 42 00 02 06 41 04 00 00 da c0 02 08 40 + 06 00 64 00 01 01 00 + """, + "d9 e2 17 e4 83 4a 80 ca 2f 3f d8 de 2e 41 b8 e6 79 7f ea 96", + "a6 3f 0e cb bb 2e 63 5c 95 4d ea c7", + self.server_isn_41x, + self.client_isn_41x, + ) + + def test_4_2_1(self): + self.check( + """ + 45 e0 00 4c 53 99 40 00 ff 06 48 e2 0a 0b 0c 0d + ac 1b 1c 1d ff 12 00 b3 cb 0e fb ee 00 00 00 00 + e0 02 ff ff 54 1f 00 00 02 04 05 b4 01 03 03 08 + 04 02 08 0a 00 02 4c ce 00 00 00 00 1d 10 3d 54 + 80 af 3c fe b8 53 68 93 7b 8f 9e c2 + """, + "30 ea a1 56 0c f0 be 57 da b5 c0 45 22 9f b1 0a 42 3c d7 ea", + "80 af 3c fe b8 53 68 93 7b 8f 9e c2", + self.client_isn_42x, + 0, + include_options=False, + ) + + def test_4_2_2(self): + self.check( + """ + 45 e0 00 4c 32 84 40 00 ff 06 69 f7 ac 1b 1c 1d + 0a 0b 0c 0d 00 b3 ff 12 ac d5 b5 e1 cb 0e fb ef + e0 12 ff ff 38 8e 00 00 02 04 05 b4 01 03 03 08 + 04 02 08 0a 57 67 72 f3 00 02 4c ce 1d 10 54 3d + 09 30 6f 9a ce a6 3a 8c 68 cb 9a 70 + """, + "b5 b2 89 6b b3 66 4e 81 76 b0 ed c6 e7 99 52 41 01 a8 30 7f", + "09 30 6f 9a ce a6 3a 8c 68 cb 9a 70", + self.server_isn_42x, + self.client_isn_42x, + include_options=False, + ) + + def test_4_2_3(self): + self.check( + """ + 45 e0 00 87 a8 f5 40 00 ff 06 f3 4a 0a 0b 0c 0d + ac 1b 1c 1d ff 12 00 b3 cb 0e fb ef ac d5 b5 e2 + c0 18 01 04 6c 45 00 00 01 01 08 0a 00 02 4c ce + 57 67 72 f3 1d 10 3d 54 71 06 08 cc 69 6c 03 a2 + 71 c9 3a a5 ff ff ff ff ff ff ff ff ff ff ff ff + ff ff ff ff 00 43 01 04 da bf 00 b4 0a 0b 0c 0d + 26 02 06 01 04 00 01 00 01 02 02 80 00 02 02 02 + 00 02 02 42 00 02 06 41 04 00 00 da bf 02 08 40 + 06 00 64 00 01 01 00 + """, + "f3 db 17 93 d7 91 0e cd 80 6c 34 f1 55 ea 1f 00 34 59 53 e3", + "71 06 08 cc 69 6c 03 a2 71 c9 3a a5", + self.client_isn_42x, + 
self.server_isn_42x, + include_options=False, + ) + + def test_4_2_4(self): + self.check( + """ + 45 e0 00 87 54 37 40 00 ff 06 48 09 ac 1b 1c 1d + 0a 0b 0c 0d 00 b3 ff 12 ac d5 b5 e2 cb 0e fc 32 + c0 18 01 00 46 b6 00 00 01 01 08 0a 57 67 72 f3 + 00 02 4c ce 1d 10 54 3d 97 76 6e 48 ac 26 2d e9 + ae 61 b4 f9 ff ff ff ff ff ff ff ff ff ff ff ff + ff ff ff ff 00 43 01 04 da c0 00 b4 ac 1b 1c 1d + 26 02 06 01 04 00 01 00 01 02 02 80 00 02 02 02 + 00 02 02 42 00 02 06 41 04 00 00 da c0 02 08 40 + 06 00 64 00 01 01 00 + """, + "b5 b2 89 6b b3 66 4e 81 76 b0 ed c6 e7 99 52 41 01 a8 30 7f", + "97 76 6e 48 ac 26 2d e9 ae 61 b4 f9", + self.server_isn_42x, + self.client_isn_42x, + include_options=False, + ) + + def test_5_1_1(self): + self.check( + """ + 45 e0 00 4c 7b 9f 40 00 ff 06 20 dc 0a 0b 0c 0d + ac 1b 1c 1d c4 fa 00 b3 78 7a 1d df 00 00 00 00 + e0 02 ff ff 5a 0f 00 00 02 04 05 b4 01 03 03 08 + 04 02 08 0a 00 01 7e d0 00 00 00 00 1d 10 3d 54 + e4 77 e9 9c 80 40 76 54 98 e5 50 91 + """, + "f5 b8 b3 d5 f3 4f db b6 eb 8d 4a b9 66 0e 60 e3", + "e4 77 e9 9c 80 40 76 54 98 e5 50 91", + 0x787A1DDF, + 0, + include_options=True, + alg_name="AES-128-CMAC-96", + ) + + def test_6_1_1(self): + self.check( + """ + 6e 08 91 dc 00 38 06 40 fd 00 00 00 00 00 00 00 + 00 00 00 00 00 00 00 01 fd 00 00 00 00 00 00 00 + 00 00 00 00 00 00 00 02 f7 e4 00 b3 17 6a 83 3f + 00 00 00 00 e0 02 ff ff 47 21 00 00 02 04 05 a0 + 01 03 03 08 04 02 08 0a 00 41 d0 87 00 00 00 00 + 1d 10 3d 54 90 33 ec 3d 73 34 b6 4c 5e dd 03 9f + """, + "62 5e c0 9d 57 58 36 ed c9 b6 42 84 18 bb f0 69 89 a3 61 bb", + "90 33 ec 3d 73 34 b6 4c 5e dd 03 9f", + self.client_isn_61x, + 0, + include_options=True, + ) + + def test_6_1_2(self): + self.check( + """ + 6e 01 00 9e 00 38 06 40 fd 00 00 00 00 00 00 00 + 00 00 00 00 00 00 00 02 fd 00 00 00 00 00 00 00 + 00 00 00 00 00 00 00 01 00 b3 f7 e4 3f 51 99 4b + 17 6a 83 40 e0 12 ff ff bf ec 00 00 02 04 05 a0 + 01 03 03 08 04 02 08 0a bd 33 12 9b 00 41 d0 87 + 1d 10 54 3d f1 cb 
a3 46 c3 52 61 63 f7 1f 1f 55 + """, + "e4 a3 7a da 2a 0a fc a8 71 14 34 91 3f e1 38 c7 71 eb cb 4a", + "f1 cb a3 46 c3 52 61 63 f7 1f 1f 55", + self.server_isn_61x, + self.client_isn_61x, + include_options=True, + ) + + def test_6_2_2(self): + self.check( + """ + 6e 0a 7e 1f 00 38 06 40 fd 00 00 00 00 00 00 00 + 00 00 00 00 00 00 00 02 fd 00 00 00 00 00 00 00 + 00 00 00 00 00 00 00 01 00 b3 c6 cd eb a3 73 4d + 02 0c 1e 6a e0 12 ff ff 77 4d 00 00 02 04 05 a0 + 01 03 03 08 04 02 08 0a 5e c9 9b 70 00 9d b9 5b + 1d 10 54 3d 3c 54 6b ad 97 43 f1 2d f8 b8 01 0d + """, + "40 51 08 94 7f 99 65 75 e7 bd bc 26 d4 02 16 a2 c7 fa 91 bd", + "3c 54 6b ad 97 43 f1 2d f8 b8 01 0d", + self.server_isn_62x, + self.client_isn_62x, + include_options=False, + ) + + def test_6_2_4(self): + self.check( + """ + 6e 0a 7e 1f 00 73 06 40 fd 00 00 00 00 00 00 00 + 00 00 00 00 00 00 00 02 fd 00 00 00 00 00 00 00 + 00 00 00 00 00 00 00 01 00 b3 c6 cd eb a3 73 4e + 02 0c 1e ad c0 18 01 00 71 6a 00 00 01 01 08 0a + 5e c9 9b 7a 00 9d b9 65 1d 10 54 3d 55 9a 81 94 + 45 b4 fd e9 8d 9e 13 17 ff ff ff ff ff ff ff ff + ff ff ff ff ff ff ff ff 00 43 01 04 fd e8 00 b4 + 01 01 01 7a 26 02 06 01 04 00 01 00 01 02 02 80 + 00 02 02 02 00 02 02 42 00 02 06 41 04 00 00 fd + e8 02 08 40 06 00 64 00 01 01 00 + """, + "40 51 08 94 7f 99 65 75 e7 bd bc 26 d4 02 16 a2 c7 fa 91 bd", + "55 9a 81 94 45 b4 fd e9 8d 9e 13 17", + self.server_isn_62x, + self.client_isn_62x, + include_options=False, + ) + + server_isn_71x = 0xA6744ECB + client_isn_71x = 0x193CCCEC + + def test_7_1_2(self): + self.check( + """ + 6e 06 15 20 00 38 06 40 fd 00 00 00 00 00 00 00 + 00 00 00 00 00 00 00 02 fd 00 00 00 00 00 00 00 + 00 00 00 00 00 00 00 01 00 b3 f8 5a a6 74 4e cb + 19 3c cc ed e0 12 ff ff ea bb 00 00 02 04 05 a0 + 01 03 03 08 04 02 08 0a 71 da ab c8 13 e4 ab 99 + 1d 10 54 3d dc 28 43 a8 4e 78 a6 bc fd c5 ed 80 + """, + "cf 1b 1e 22 5e 06 a6 36 16 76 4a 06 7b 46 f4 b1", + "dc 28 43 a8 4e 78 a6 bc fd c5 ed 80", + 
self.server_isn_71x, + self.client_isn_71x, + alg_name="AES-128-CMAC-96", + include_options=True, + ) + + def test_7_1_4(self): + self.check( + """ + 6e 06 15 20 00 73 06 40 fd 00 00 00 00 00 00 00 + 00 00 00 00 00 00 00 02 fd 00 00 00 00 00 00 00 + 00 00 00 00 00 00 00 01 00 b3 f8 5a a6 74 4e cc + 19 3c cd 30 c0 18 01 00 52 f4 00 00 01 01 08 0a + 71 da ab d3 13 e4 ab a3 1d 10 54 3d c1 06 9b 7d + fd 3d 69 3a 6d f3 f2 89 ff ff ff ff ff ff ff ff + ff ff ff ff ff ff ff ff 00 43 01 04 fd e8 00 b4 + 01 01 01 7a 26 02 06 01 04 00 01 00 01 02 02 80 + 00 02 02 02 00 02 02 42 00 02 06 41 04 00 00 fd + e8 02 08 40 06 00 64 00 01 01 00 + """, + "cf 1b 1e 22 5e 06 a6 36 16 76 4a 06 7b 46 f4 b1", + "c1 06 9b 7d fd 3d 69 3a 6d f3 f2 89", + self.server_isn_71x, + self.client_isn_71x, + alg_name="AES-128-CMAC-96", + include_options=True, + ) diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/validator.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/validator.py new file mode 100644 index 000000000000..295220e3964d --- /dev/null +++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/validator.py @@ -0,0 +1,138 @@ +# SPDX-License-Identifier: GPL-2.0 +import logging +import typing +from dataclasses import dataclass + +from scapy.layers.inet import TCP +from scapy.packet import Packet + +from . import scapy_tcp_authopt +from .scapy_conntrack import TCPConnectionTracker, get_packet_tcp_connection_key +from .scapy_utils import scapy_tcp_get_authopt_val + +logger = logging.getLogger(__name__) + + +@dataclass +class TcpAuthValidatorKey: + """Representation of a TCP Authentication Option key for the validator + + The matching rules are independent. 
+ """ + + key: bytes + alg_name: str + include_options: bool = True + keyid: typing.Optional[int] = None + sport: typing.Optional[int] = None + dport: typing.Optional[int] = None + + def match_packet(self, p: Packet) -> bool: + """Determine if this key matches a specific packet""" + if not TCP in p: + return False + authopt = scapy_tcp_get_authopt_val(p[TCP]) + if authopt is None: + return False + if self.keyid is not None and authopt.keyid != self.keyid: + return False + if self.sport is not None and p[TCP].sport != self.sport: + return False + if self.dport is not None and p[TCP].dport != self.dport: + return False + return True + + def get_alg_imp(self): + return scapy_tcp_authopt.get_alg(self.alg_name) + + +class TcpAuthValidator: + """Validate TCP Authentication Option signatures inside a capture + + This can track multiple connections, determine their initial sequence numbers + and verify their signatures independently. + + Keys are provided as a collection of `.TcpAuthValidatorKey` + """ + + keys: typing.List[TcpAuthValidatorKey] + tracker: TCPConnectionTracker + any_incomplete: bool = False + any_unsigned: bool = False + any_fail: bool = False + debug_sne: bool = False + + def __init__(self, keys=None): + self.keys = keys or [] + self.tracker = TCPConnectionTracker() + self.conn_dict = {} + + def get_key_for_packet(self, p): + for k in self.keys: + if k.match_packet(p): + return k + return None + + def handle_packet(self, p: Packet): + if not TCP in p: + return + self.tracker.handle_packet(p) + authopt = scapy_tcp_get_authopt_val(p[TCP]) + if not authopt: + self.any_unsigned = True + logger.debug("skip packet without tcp authopt: %r", p) + return + key = self.get_key_for_packet(p) + if not key: + self.any_unsigned = True + logger.debug("skip packet not matching any known keys: %r", p) + return + tcp_track_key = get_packet_tcp_connection_key(p) + conn = self.tracker.get(tcp_track_key) + + if conn is None: + raise ValueError( + 
"TCPConnectionTracker.handle_packet should have initialized TCPConnectionInfo" + ) + + if not conn.found_syn: + logger.warning("missing SYN for packet %s", p.summary()) + self.any_incomplete = True + return + if not conn.found_synack and not p[TCP].flags.S: + logger.warning("missing SYNACK for packet %s", p.summary()) + self.any_incomplete = True + return + + alg = key.get_alg_imp() + context_bytes = scapy_tcp_authopt.build_context_from_packet( + p, conn.sisn or 0, conn.disn or 0 + ) + traffic_key = alg.kdf(key.key, context_bytes) + sne = conn.snd_sne.calc(p[TCP].seq, update=False) + if self.debug_sne: + logger.debug("sne %08x seq %08x for %s", sne, p[TCP].seq, p[TCP].summary()) + message_bytes = scapy_tcp_authopt.build_message_from_packet( + p, + include_options=key.include_options, + sne=sne, + ) + computed_mac = alg.mac(traffic_key, message_bytes) + captured_mac = authopt.mac + if computed_mac == captured_mac: + logger.debug("ok - mac %s", computed_mac.hex()) + else: + self.any_fail = True + logger.error( + "not ok - captured %s computed %s", + captured_mac.hex(), + computed_mac.hex(), + ) + + def raise_errors(self, allow_unsigned=False, allow_incomplete=False): + if self.any_fail: + raise Exception("Found failed signatures") + if self.any_incomplete and not allow_incomplete: + raise Exception("Incomplete capture missing SYN/ACK") + if self.any_unsigned and not allow_unsigned: + raise Exception("Found unsigned packets")
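Before building the MAC input, the validator computes a Sequence Number Extension (SNE) for each packet with `conn.snd_sne.calc(p[TCP].seq, update=False)`. The SNE counts how many times the 32-bit TCP sequence number has wrapped (RFC 5925 section 6.2), and it is part of the data covered by the MAC. The `SequenceNumberExtenderLinux` class that the validator actually uses lives in `sne_alg.py`, which is not part of this excerpt. As a rough illustration only, here is a minimal sketch of such a tracker based on RFC 1982-style serial-number comparison. The class name `SimpleSNE` is hypothetical, and this is not the algorithm from `sne_alg.py`.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers are 32-bit
HALF = 1 << 31     # serial-number comparison window (RFC 1982 style)


class SimpleSNE:
    """Hypothetical Sequence Number Extension tracker (not sne_alg.py)."""

    def __init__(self, isn: int, sne: int = 0):
        self.sne = sne       # high-order 32 bits of the extended sequence number
        self.last_seq = isn  # most recent in-order sequence number seen

    def calc(self, seq: int, update: bool = True) -> int:
        """Return the SNE for `seq`; state only advances for in-order segments."""
        if (seq - self.last_seq) % SEQ_MOD < HALF:
            # seq is at or after last_seq; a numerically smaller value means a wrap
            sne = (self.sne + 1) % SEQ_MOD if seq < self.last_seq else self.sne
            if update:
                self.sne = sne
                self.last_seq = seq
            return sne
        # seq is before last_seq (e.g. a retransmission); if it is numerically
        # larger, it was sent before the most recent wrap
        return (self.sne - 1) % SEQ_MOD if seq > self.last_seq else self.sne
```

The validator passes `update=False`, as in `calc(seq, update=False)`. The tracker has already advanced its state for the packet, so the validator only needs to read the SNE, not modify it.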
This patch uses scapy to validate that the TCP-AO signatures generated by Linux are correct for all algorithm combinations.
It also tests that TCP-AO behaves correctly in a number of corner cases, such as:
* reset handling
* timewait
* syn-recv
* ipv4-mapped ipv6
* interaction with tcp-md5
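Validating a signature with HMAC-SHA-1-96 takes two steps. First, a traffic key is derived from the master key and a per-connection context (addresses, ports, ISNs). Second, a MAC is computed over the segment and truncated to 96 bits. The following standard-library sketch shows both steps as described in RFC 5925 (context) and RFC 5926 (KDF and MAC). The function names are hypothetical and do not correspond to the `scapy_tcp_authopt` helpers in this series. The sketch omits the MAC message itself (pseudo-header, TCP header with the MAC field zeroed, SNE) and AES-128-CMAC-96, which needs a CMAC implementation outside the standard library.

```python
import hashlib
import hmac
import ipaddress
import struct


def build_context(saddr: str, daddr: str, sport: int, dport: int,
                  sisn: int, disn: int) -> bytes:
    """Traffic-key context (RFC 5925 section 5.2).

    Source/destination address (4 bytes each for IPv4, 16 for IPv6),
    source/destination port, then source/destination ISN.
    For the initial SYN the destination ISN is 0.
    """
    src = ipaddress.ip_address(saddr).packed
    dst = ipaddress.ip_address(daddr).packed
    return src + dst + struct.pack("!HHII", sport, dport, sisn, disn)


def kdf_hmac_sha1(master_key: bytes, context: bytes) -> bytes:
    """KDF_HMAC_SHA1 (RFC 5926 section 3.1.1): a single 160-bit iteration."""
    # i = 0x01 || Label = "TCP-AO" || Context || Output_Length in bits (2 octets)
    data = b"\x01" + b"TCP-AO" + context + struct.pack("!H", 160)
    return hmac.new(master_key, data, hashlib.sha1).digest()


def mac_hmac_sha1_96(traffic_key: bytes, message: bytes) -> bytes:
    """HMAC-SHA-1-96: HMAC-SHA1 truncated to its first 96 bits (12 bytes)."""
    return hmac.new(traffic_key, message, hashlib.sha1).digest()[:12]


# Context for the SYN in test_4_1_1: 10.11.12.13:59863 -> 172.27.28.29:179
ctx = build_context("10.11.12.13", "172.27.28.29", 0xE9D7, 0xB3, 0xFBFBAB5A, 0)
# Hypothetical master key, for illustration only
traffic_key = kdf_hmac_sha1(b"example-master-key", ctx)
```

The resulting 20-byte traffic key and 12-byte MAC have the same sizes as the HMAC-SHA-1-96 traffic keys and MACs in the test vectors.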
This reverts commit 297a301a4f1c3abe41d554a9f6df192257a017b8.
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 .../full_tcp_sniff_session.py                 |  91 +++
 .../tcp_authopt_test/linux_tcp_md5sig.py      | 110 ++++
 .../tcp_authopt_test/scapy_conntrack.py       | 173 ++++++
 .../tcp_connection_fixture.py                 | 276 +++++++++
 .../tcp_authopt_test/test_verify_capture.py   | 559 ++++++++++++++++++
 5 files changed, 1209 insertions(+)
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/full_tcp_sniff_session.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_md5sig.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/scapy_conntrack.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/tcp_connection_fixture.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_verify_capture.py
diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/full_tcp_sniff_session.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/full_tcp_sniff_session.py new file mode 100644 index 000000000000..d37f7d947bcd --- /dev/null +++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/full_tcp_sniff_session.py @@ -0,0 +1,91 @@ +# SPDX-License-Identifier: GPL-2.0 +import logging +import threading +import typing + +import scapy.sessions +from scapy.packet import Packet + +from .scapy_conntrack import TCPConnectionInfo, TCPConnectionTracker + +logger = logging.getLogger(__name__) + + +class FullTCPSniffSession(scapy.sessions.DefaultSession): + """Implementation of a scapy sniff session that can wait for a full TCP capture + + Allows another thread to wait for a complete FIN handshake without polling or sleep. + """ + + #: Server port used to identify client and server + server_port: int + #: Connection tracker + tracker: TCPConnectionTracker + + def __init__(self, server_port, **kw): + super().__init__(**kw) + self.server_port = server_port + self.tracker = TCPConnectionTracker() + self._close_event = threading.Event() + self._init_isn_event = threading.Event() + self._client_info = None + self._server_info = None + + @property + def client_info(self) -> TCPConnectionInfo: + if not self._client_info: + self._client_info = self.tracker.match_one(dport=self.server_port) + return self._client_info + + @property + def server_info(self) -> TCPConnectionInfo: + if not self._server_info: + self._server_info = self.tracker.match_one(sport=self.server_port) + return self._server_info + + @property + def client_isn(self): + return self.client_info.sisn + + @property + def server_isn(self): + return self.server_info.sisn + + def on_packet_received(self, p: Packet): + super().on_packet_received(p) + self.tracker.handle_packet(p) + + # check events: + if self.client_info.sisn is not None and self.client_info.disn is not None: + assert ( + self.client_info.sisn == 
self.server_info.disn + and self.server_info.sisn == self.client_info.disn + ) + self._init_isn_event.set() + if self.client_info.found_recv_finack and self.server_info.found_recv_finack: + self._close_event.set() + + def reset(self): + """Reset known information about client/server""" + self.tracker.reset() + self._server_info = None + self._client_info = None + self._close_event.clear() + self._init_isn_event.clear() + + def wait_close(self, timeout=10): + """Wait for a graceful close with FINs acked by both sides""" + self._close_event.wait(timeout=timeout) + if not self._close_event.is_set(): + raise TimeoutError("Timed out waiting for graceful close") + + def wait_init_isn(self, timeout=10): + """Wait for both client_isn and server_isn to be determined""" + self._init_isn_event.wait(timeout=timeout) + if not self._init_isn_event.is_set(): + raise TimeoutError("Timed out waiting for Initial Sequence Numbers") + + def get_client_server_isn(self, timeout=10) -> typing.Tuple[int, int]: + """Return client/server ISN, blocking until they are captured""" + self.wait_init_isn(timeout=timeout) + return self.client_isn, self.server_isn diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_md5sig.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_md5sig.py new file mode 100644 index 000000000000..5cfd0428672a --- /dev/null +++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_md5sig.py @@ -0,0 +1,110 @@ +# SPDX-License-Identifier: GPL-2.0 +"""Python wrapper around linux TCP_MD5SIG ABI""" + +import socket +import struct +import typing +from dataclasses import dataclass +from enum import IntFlag + +from .sockaddr import sockaddr_convert, sockaddr_unpack + +TCP_MD5SIG = 14 +TCP_MD5SIG_EXT = 32 +TCP_MD5SIG_MAXKEYLEN = 80 + + +class TCP_MD5SIG_FLAG(IntFlag): + PREFIX = 0x1 + IFINDEX = 0x2 + + +@dataclass +class tcp_md5sig: + """Like linux struct tcp_md5sig""" + + addr: typing.Any + flags: typing.Optional[int] + prefixlen:
typing.Optional[int] + keylen: typing.Optional[int] + ifindex: typing.Optional[int] + key: bytes + + sizeof = 128 + 88 + + def __init__( + self, addr=None, flags=None, prefixlen=None, keylen=None, ifindex=0, key=bytes() + ): + self.addr = addr + self.flags = flags + self.prefixlen = prefixlen + self.ifindex = ifindex + self.key = key + self.keylen = keylen + + def get_auto_flags(self): + return (TCP_MD5SIG_FLAG.PREFIX if self.prefixlen is not None else 0) | ( + TCP_MD5SIG_FLAG.IFINDEX if self.ifindex else 0 + ) + + def get_real_flags(self): + if self.flags is None: + return self.get_auto_flags() + else: + return self.flags + + def get_addr_bytes(self) -> bytes: + if self.addr is None: + return b"\0" * 128 + if isinstance(self.addr, bytes): + assert len(self.addr) == 128 + return self.addr + return sockaddr_convert(self.addr).pack() + + def pack(self) -> bytes: + return struct.pack( + "128sBBHi80s", + self.get_addr_bytes(), + self.get_real_flags(), + self.prefixlen if self.prefixlen is not None else 0, + self.keylen if self.keylen is not None else len(self.key), + self.ifindex if self.ifindex is not None else 0, + self.key, + ) + + def __bytes__(self): + return self.pack() + + @classmethod + def unpack(cls, buffer: bytes) -> "tcp_md5sig": + tup = struct.unpack("128sBBHi80s", buffer) + addr = sockaddr_unpack(tup[0]) + return cls(addr, *tup[1:]) + + def set_ipv4_addr_all(self): + from .sockaddr import sockaddr_in + + self.addr = sockaddr_in() + self.prefixlen = 0 + + def set_ipv6_addr_all(self): + from .sockaddr import sockaddr_in6 + + self.addr = sockaddr_in6() + self.prefixlen = 0 + + +def setsockopt_md5sig(sock, opt: tcp_md5sig): + if opt.flags != 0: + optname = TCP_MD5SIG_EXT + else: + optname = TCP_MD5SIG + return sock.setsockopt(socket.SOL_TCP, optname, bytes(opt)) + + +def setsockopt_md5sig_kwargs(sock, opt: tcp_md5sig = None, **kw): + if opt is None: + opt = tcp_md5sig() + for k, v in kw.items(): + setattr(opt, k, v) + return setsockopt_md5sig(sock, opt) diff --git
a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/scapy_conntrack.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/scapy_conntrack.py new file mode 100644 index 000000000000..0f7cba70a917 --- /dev/null +++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/scapy_conntrack.py @@ -0,0 +1,173 @@ +# SPDX-License-Identifier: GPL-2.0 +"""Identify TCP connections inside a capture and collect per-connection information""" +import typing +from dataclasses import dataclass + +from scapy.layers.inet import TCP +from scapy.packet import Packet + +from .scapy_utils import IPvXAddress, get_packet_ipvx_dst, get_packet_ipvx_src +from .sne_alg import SequenceNumberExtenderLinux + + +@dataclass(frozen=True) +class TCPConnectionKey: + """TCP connection identification key: standard 4-tuple""" + + saddr: typing.Optional[IPvXAddress] = None + daddr: typing.Optional[IPvXAddress] = None + sport: int = 0 + dport: int = 0 + + def rev(self) -> "TCPConnectionKey": + return TCPConnectionKey(self.daddr, self.saddr, self.dport, self.sport) + + +def get_packet_tcp_connection_key(p: Packet) -> TCPConnectionKey: + th = p[TCP] + return TCPConnectionKey( + get_packet_ipvx_src(p), get_packet_ipvx_dst(p), th.sport, th.dport + ) + + +class TCPConnectionInfo: + saddr: typing.Optional[IPvXAddress] = None + daddr: typing.Optional[IPvXAddress] = None + sport: int = 0 + dport: int = 0 + sisn: typing.Optional[int] = None + disn: typing.Optional[int] = None + rcv_sne: SequenceNumberExtenderLinux + snd_sne: SequenceNumberExtenderLinux + + found_syn = False + found_synack = False + + found_send_fin = False + found_send_finack = False + found_recv_fin = False + found_recv_finack = False + + def __init__(self): + self.rcv_sne = SequenceNumberExtenderLinux() + self.snd_sne = SequenceNumberExtenderLinux() + + def get_key(self): + return TCPConnectionKey(self.saddr, self.daddr, self.sport, self.dport) + + @classmethod + def from_key(cls, key: TCPConnectionKey) -> "TCPConnectionInfo": + obj = cls() + 
obj.saddr = key.saddr + obj.daddr = key.daddr + obj.sport = key.sport + obj.dport = key.dport + return obj + + def handle_send(self, p: Packet): + th = p[TCP] + if self.get_key() != get_packet_tcp_connection_key(p): + raise ValueError("Packet not for this connection") + + if th.flags.S and not th.flags.A: + assert th.ack == 0 + self.found_syn = True + self.sisn = th.seq + self.snd_sne.reset(th.seq) + elif th.flags.S and th.flags.A: + self.found_synack = True + self.sisn = th.seq + self.snd_sne.reset(th.seq) + assert self.disn == th.ack - 1 + + # Should track seq numbers instead + if th.flags.F: + self.found_send_fin = True + if th.flags.A and self.found_recv_fin: + self.found_send_finack = True + + # Should only take valid packets into account + self.snd_sne.calc(th.seq) + + def handle_recv(self, p: Packet): + th = p[TCP] + if self.get_key().rev() != get_packet_tcp_connection_key(p): + raise ValueError("Packet not for this connection") + + if th.flags.S and not th.flags.A: + assert th.ack == 0 + self.found_syn = True + self.disn = th.seq + self.rcv_sne.reset(th.seq) + elif th.flags.S and th.flags.A: + self.found_synack = True + self.disn = th.seq + self.rcv_sne.reset(th.seq) + assert self.sisn == th.ack - 1 + + # Should track seq numbers instead + if th.flags.F: + self.found_recv_fin = True + if th.flags.A and self.found_send_fin: + self.found_recv_finack = True + + # Should only take valid packets into account + self.rcv_sne.calc(th.seq) + + +class TCPConnectionTracker: + table: typing.Dict[TCPConnectionKey, TCPConnectionInfo] + + def __init__(self): + self.table = {} + + def reset(self): + """Forget known connections""" + self.table = {} + + def get_or_create(self, key: TCPConnectionKey) -> TCPConnectionInfo: + info = self.table.get(key, None) + if info is None: + info = TCPConnectionInfo.from_key(key) + self.table[key] = info + return info + + def get(self, key: TCPConnectionKey) -> typing.Optional[TCPConnectionInfo]: + return self.table.get(key, None) + + def 
handle_packet(self, p: Packet): + if not p or not TCP in p: + return + key = get_packet_tcp_connection_key(p) + info = self.get_or_create(key) + info.handle_send(p) + rkey = key.rev() + rinfo = self.get_or_create(rkey) + rinfo.handle_recv(p) + + def iter_match(self, saddr=None, daddr=None, sport=None, dport=None): + def attr_optional_match(obj, name, val) -> bool: + if val is None: + return True + else: + return getattr(obj, name) == val + + for key, info in self.table.items(): + if ( + attr_optional_match(key, "saddr", saddr) + and attr_optional_match(key, "daddr", daddr) + and attr_optional_match(key, "sport", sport) + and attr_optional_match(key, "dport", dport) + ): + yield info + + def match_one( + self, saddr=None, daddr=None, sport=None, dport=None + ) -> typing.Optional[TCPConnectionInfo]: + res = list(self.iter_match(saddr, daddr, sport, dport)) + if len(res) == 1: + return res[0] + elif len(res) == 0: + return None + else: + raise ValueError("Multiple connection matches") diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/tcp_connection_fixture.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/tcp_connection_fixture.py new file mode 100644 index 000000000000..0d14f343c282 --- /dev/null +++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/tcp_connection_fixture.py @@ -0,0 +1,276 @@ +# SPDX-License-Identifier: GPL-2.0 +import logging +import socket +import subprocess +from contextlib import ExitStack + +import pytest +from scapy.data import ETH_P_IP, ETH_P_IPV6 +from scapy.layers.inet import IP, TCP +from scapy.layers.inet6 import IPv6 +from scapy.layers.l2 import Ether +from scapy.packet import Packet + +from . 
import linux_tcp_authopt +from .full_tcp_sniff_session import FullTCPSniffSession +from .linux_tcp_authopt import set_tcp_authopt_key, tcp_authopt_key +from .netns_fixture import NamespaceFixture +from .scapy_utils import ( + AsyncSnifferContext, + create_capture_socket, + create_l2socket, + scapy_tcp_get_authopt_val, + scapy_tcp_get_md5_sig, +) +from .server import SimpleServerThread +from .utils import ( + DEFAULT_TCP_SERVER_PORT, + create_client_socket, + create_listen_socket, + netns_context, + nstat_json, +) + +logger = logging.getLogger(__name__) + + +class TCPConnectionFixture: + """Test fixture with an instrumented TCP connection + + Includes: + * pair of network namespaces + * one listen socket + * server thread with echo protocol + * one client socket + * one async sniffer on the server interface + * A `FullTCPSniffSession` examining TCP packets + * l2socket allowing packet injection from client + + :ivar tcp_md5_key: Secret key for md5 (addr is implicit) + """ + + sniffer_session: FullTCPSniffSession + + def __init__( + self, + address_family=socket.AF_INET, + sniffer_kwargs=None, + tcp_authopt_key: tcp_authopt_key = None, + server_thread_kwargs=None, + tcp_md5_key=None, + ): + self.address_family = address_family + self.server_port = DEFAULT_TCP_SERVER_PORT + self.client_port = 27972 + self.sniffer_session = FullTCPSniffSession(DEFAULT_TCP_SERVER_PORT) + if sniffer_kwargs is None: + sniffer_kwargs = {} + self.sniffer_kwargs = sniffer_kwargs + self.tcp_authopt_key = tcp_authopt_key + self.server_thread = SimpleServerThread( + mode="echo", **(server_thread_kwargs or {}) + ) + self.tcp_md5_key = tcp_md5_key + + def _set_tcp_md5(self): + from . 
import linux_tcp_md5sig + from .sockaddr import sockaddr_convert + + linux_tcp_md5sig.setsockopt_md5sig( + self.listen_socket, + linux_tcp_md5sig.tcp_md5sig( + key=self.tcp_md5_key, addr=sockaddr_convert(self.client_addr) + ), + ) + linux_tcp_md5sig.setsockopt_md5sig( + self.client_socket, + linux_tcp_md5sig.tcp_md5sig( + key=self.tcp_md5_key, addr=sockaddr_convert(self.server_addr) + ), + ) + + def create_client_socket(self, bind_port=0): + return create_client_socket( + ns=self.nsfixture.client_netns_name, + family=self.address_family, + bind_addr=self.client_addr, + bind_port=bind_port, + ) + + def __enter__(self): + if self.tcp_authopt_key and not linux_tcp_authopt.has_tcp_authopt(): + pytest.skip("Need TCP_AUTHOPT") + + self.exit_stack = ExitStack() + self.exit_stack.__enter__() + + self.nsfixture = self.exit_stack.enter_context(NamespaceFixture()) + self.server_addr = self.nsfixture.get_addr(self.address_family, 1) + self.client_addr = self.nsfixture.get_addr(self.address_family, 2) + + self.listen_socket = create_listen_socket( + ns=self.nsfixture.server_netns_name, + family=self.address_family, + bind_addr=self.server_addr, + bind_port=self.server_port, + ) + self.exit_stack.enter_context(self.listen_socket) + self.client_socket = self.create_client_socket(bind_port=self.client_port) + self.exit_stack.enter_context(self.client_socket) + self.server_thread.add_listen_socket(self.listen_socket) + self.exit_stack.enter_context(self.server_thread) + + if self.tcp_authopt_key: + set_tcp_authopt_key(self.listen_socket, self.tcp_authopt_key) + set_tcp_authopt_key(self.client_socket, self.tcp_authopt_key) + + if self.tcp_md5_key: + self._set_tcp_md5() + + capture_filter = f"tcp port {self.server_port}" + self.capture_socket = create_capture_socket( + ns=self.nsfixture.server_netns_name, iface="veth0", filter=capture_filter + ) + self.exit_stack.enter_context(self.capture_socket) + + self.sniffer = AsyncSnifferContext( + opened_socket=self.capture_socket, + 
session=self.sniffer_session, + prn=log_tcp_authopt_packet, + **self.sniffer_kwargs, + ) + self.exit_stack.enter_context(self.sniffer) + + self.client_l2socket = create_l2socket( + ns=self.nsfixture.client_netns_name, iface="veth0" + ) + self.exit_stack.enter_context(self.client_l2socket) + self.server_l2socket = create_l2socket( + ns=self.nsfixture.server_netns_name, iface="veth0" + ) + self.exit_stack.enter_context(self.server_l2socket) + + def __exit__(self, *args): + self.exit_stack.__exit__(*args) + + @property + def ethertype(self): + if self.address_family == socket.AF_INET: + return ETH_P_IP + elif self.address_family == socket.AF_INET6: + return ETH_P_IPV6 + else: + raise ValueError(f"bad address_family={self.address_family}") + + def scapy_iplayer(self): + if self.address_family == socket.AF_INET: + return IP + elif self.address_family == socket.AF_INET6: + return IPv6 + else: + raise ValueError(f"bad address_family={self.address_family}") + + def create_client2server_packet(self) -> Packet: + return ( + Ether( + type=self.ethertype, + src=self.nsfixture.client_mac_addr, + dst=self.nsfixture.server_mac_addr, + ) + / self.scapy_iplayer()(src=str(self.client_addr), dst=str(self.server_addr)) + / TCP(sport=self.client_port, dport=self.server_port) + ) + + def create_server2client_packet(self) -> Packet: + return ( + Ether( + type=self.ethertype, + src=self.nsfixture.server_mac_addr, + dst=self.nsfixture.client_mac_addr, + ) + / self.scapy_iplayer()(src=str(self.server_addr), dst=str(self.client_addr)) + / TCP(sport=self.server_port, dport=self.client_port) + ) + + @property + def server_addr_port(self): + return (str(self.server_addr), self.server_port) + + @property + def server_netns_name(self): + return self.nsfixture.server_netns_name + + @property + def client_netns_name(self): + return self.nsfixture.client_netns_name + + def client_nstat_json(self): + with netns_context(self.client_netns_name): + return nstat_json() + + def server_nstat_json(self): +
with netns_context(self.server_netns_name): + return nstat_json() + + def assert_no_snmp_output_failures(self): + client_nstat_dict = self.client_nstat_json() + assert client_nstat_dict["TcpExtTCPAuthOptFailure"] == 0 + server_nstat_dict = self.server_nstat_json() + assert server_nstat_dict["TcpExtTCPAuthOptFailure"] == 0 + + def _get_state_via_ss(self, command_prefix: str): + # Every namespace should have at most one socket + # the "state connected" filter includes TIME-WAIT but not LISTEN + cmd = command_prefix + "ss --numeric --no-header --tcp state connected" + out = subprocess.check_output(cmd, text=True, shell=True) + lines = out.splitlines() + # No socket found usually means "CLOSED". It is distinct from "TIME-WAIT" + if len(lines) == 0: + return None + if len(lines) > 1: + raise ValueError("At most one line expected") + return lines[0].split()[0] + + def get_client_tcp_state(self): + return self._get_state_via_ss(f"ip netns exec {self.client_netns_name} ") + + def get_server_tcp_state(self): + return self._get_state_via_ss(f"ip netns exec {self.server_netns_name} ") + + +def format_tcp_authopt_packet( + p: Packet, include_ethernet=False, include_seq=False, include_md5=True +) -> str: + """Format a TCP packet in a way that is useful for TCP-AO testing""" + if not TCP in p: + return p.summary() + th = p[TCP] + if isinstance(th.underlayer, IP): + result = p.sprintf(r"%IP.src%:%TCP.sport% > %IP.dst%:%TCP.dport%") + elif isinstance(th.underlayer, IPv6): + result = p.sprintf(r"%IPv6.src%:%TCP.sport% > %IPv6.dst%:%TCP.dport%") + else: + raise ValueError(f"Unknown TCP underlayer {th.underlayer}") + result += p.sprintf(r" Flags %-2s,TCP.flags%") + if include_ethernet: + result = p.sprintf(r"ethertype %Ether.type% ") + result + result = p.sprintf(r"%Ether.src% > %Ether.dst% ") + result + if include_seq: + result += p.sprintf(r" seq %TCP.seq% ack %TCP.ack%") + result += f" len {len(p[TCP].payload)}" + authopt = scapy_tcp_get_authopt_val(p[TCP]) + if authopt: + result 
+= f" AO keyid={authopt.keyid} rnextkeyid={authopt.rnextkeyid} mac={authopt.mac.hex()}" + else: + result += " no AO" + if include_md5: + md5sig = scapy_tcp_get_md5_sig(p[TCP]) + if md5sig: + result += f" MD5 {md5sig.hex()}" + else: + result += " no MD5" + return result + + +def log_tcp_authopt_packet(p): + logger.info("sniff %s", format_tcp_authopt_packet(p, include_seq=True)) diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_verify_capture.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_verify_capture.py new file mode 100644 index 000000000000..68d002139974 --- /dev/null +++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_verify_capture.py @@ -0,0 +1,559 @@ +# SPDX-License-Identifier: GPL-2.0 +"""Capture packets with TCP-AO and verify signatures""" + +import logging +import os +import socket +import subprocess +from contextlib import ExitStack, nullcontext + +import pytest +import waiting +from scapy.layers.inet import TCP + +from .conftest import ( + raises_optional_exception, + skipif_cant_capture, + skipif_missing_tcp_authopt, +) +from .full_tcp_sniff_session import FullTCPSniffSession +from .linux_tcp_authopt import ( + TCP_AUTHOPT_ALG, + TCP_AUTHOPT_KEY_FLAG, + set_tcp_authopt_key, + tcp_authopt_key, +) +from .netns_fixture import NamespaceFixture +from .scapy_tcp_authopt import ( + TcpAuthOptAlg_HMAC_SHA1, + add_tcp_authopt_signature, + break_tcp_authopt_signature, +) +from .scapy_utils import ( + AsyncSnifferContext, + scapy_sniffer_stop, + scapy_tcp_get_authopt_val, + scapy_tcp_get_md5_sig, + tcp_seq_wrap, +) +from .server import SimpleServerThread +from .tcp_connection_fixture import TCPConnectionFixture +from .utils import ( + DEFAULT_TCP_SERVER_PORT, + check_socket_echo, + create_client_socket, + create_listen_socket, + nstat_json, + socket_set_linger, +) +from .validator import TcpAuthValidator, TcpAuthValidatorKey + +logger = logging.getLogger(__name__) +pytestmark = [skipif_missing_tcp_authopt, 
skipif_cant_capture] +DEFAULT_TCP_AUTHOPT_KEY = tcp_authopt_key( + alg=TCP_AUTHOPT_ALG.HMAC_SHA_1_96, + key=b"hello", +) + + +def get_alg_id(alg_name) -> int: + if alg_name == "HMAC-SHA-1-96": + return TCP_AUTHOPT_ALG.HMAC_SHA_1_96 + elif alg_name == "AES-128-CMAC-96": + return TCP_AUTHOPT_ALG.AES_128_CMAC_96 + else: + raise ValueError() + + +@pytest.mark.parametrize( + "address_family,alg_name,include_options,transfer_data", + [ + (socket.AF_INET, "HMAC-SHA-1-96", True, True), + (socket.AF_INET, "AES-128-CMAC-96", True, True), + (socket.AF_INET, "AES-128-CMAC-96", False, True), + (socket.AF_INET6, "HMAC-SHA-1-96", True, True), + (socket.AF_INET6, "HMAC-SHA-1-96", False, True), + (socket.AF_INET6, "AES-128-CMAC-96", True, True), + (socket.AF_INET, "HMAC-SHA-1-96", True, False), + (socket.AF_INET6, "AES-128-CMAC-96", False, False), + ], +) +def test_verify_capture( + exit_stack, address_family, alg_name, include_options, transfer_data +): + master_key = b"testvector" + alg_id = get_alg_id(alg_name) + + session = FullTCPSniffSession(server_port=DEFAULT_TCP_SERVER_PORT) + sniffer = exit_stack.enter_context( + AsyncSnifferContext( + filter=f"inbound and tcp port {DEFAULT_TCP_SERVER_PORT}", + iface="lo", + session=session, + ) + ) + + listen_socket = create_listen_socket(family=address_family) + listen_socket = exit_stack.enter_context(listen_socket) + exit_stack.enter_context(SimpleServerThread(listen_socket, mode="echo")) + + client_socket = socket.socket(address_family, socket.SOCK_STREAM) + client_socket = exit_stack.push(client_socket) + + key = tcp_authopt_key(alg=alg_id, key=master_key, include_options=include_options) + set_tcp_authopt_key(listen_socket, key) + set_tcp_authopt_key(client_socket, key) + + # even if one signature is incorrect keep processing the capture + old_nstat = nstat_json() + valkey = TcpAuthValidatorKey( + key=master_key, alg_name=alg_name, include_options=include_options + ) + validator = TcpAuthValidator(keys=[valkey]) + + try: + 
client_socket.settimeout(1.0) + client_socket.connect(("localhost", DEFAULT_TCP_SERVER_PORT)) + if transfer_data: + for _ in range(5): + check_socket_echo(client_socket) + client_socket.close() + session.wait_close() + except socket.timeout: + # If invalid packets are sent let the validator run + logger.warning("socket timeout", exc_info=True) + pass + + sniffer.stop() + + logger.info("capture: %r", sniffer.results) + for p in sniffer.results: + validator.handle_packet(p) + validator.raise_errors() + + new_nstat = nstat_json() + assert old_nstat["TcpExtTCPAuthOptFailure"] == new_nstat["TcpExtTCPAuthOptFailure"] + + +@pytest.mark.parametrize( + "address_family,use_tcp_authopt,use_tcp_md5sig", + [ + (socket.AF_INET, 0, 0), + (socket.AF_INET, 1, 0), + (socket.AF_INET, 0, 1), + (socket.AF_INET6, 0, 0), + (socket.AF_INET6, 1, 0), + (socket.AF_INET6, 0, 1), + (socket.AF_INET, 1, 1), + (socket.AF_INET6, 1, 1), + ], +) +def test_both_authopt_md5(exit_stack, address_family, use_tcp_authopt, use_tcp_md5sig): + """Basic test for interaction between TCP_AUTHOPT and TCP_MD5SIG + + Configuring both on the same socket is allowed but RFC5925 doesn't allow both on the + same packet or the same connection. + + The naive handling of inserting or validating both options is incorrect.
+ """ + con = TCPConnectionFixture(address_family=address_family) + if use_tcp_authopt: + con.tcp_authopt_key = DEFAULT_TCP_AUTHOPT_KEY + if use_tcp_md5sig: + con.tcp_md5_key = b"hello" + exit_stack.enter_context(con) + + con.client_socket.connect(con.server_addr_port) + check_socket_echo(con.client_socket) + check_socket_echo(con.client_socket) + check_socket_echo(con.client_socket) + con.client_socket.close() + + scapy_sniffer_stop(con.sniffer) + fail = False + for p in con.sniffer.results: + has_tcp_authopt = scapy_tcp_get_authopt_val(p[TCP]) is not None + has_tcp_md5sig = scapy_tcp_get_md5_sig(p[TCP]) is not None + + if has_tcp_authopt and has_tcp_md5sig: + logger.error("Packet has both AO and MD5: %r", p) + fail = True + + if use_tcp_authopt: + if not has_tcp_authopt: + logger.error("missing AO: %r", p) + fail = True + elif use_tcp_md5sig: + if not has_tcp_md5sig: + logger.error("missing MD5: %r", p) + fail = True + else: + if has_tcp_md5sig or has_tcp_authopt: + logger.error("unexpected MD5 or AO: %r", p) + fail = True + + assert not fail + + +@pytest.mark.parametrize("mode", ["none", "ao", "ao-addrbind", "md5"]) +def test_v4mapv6(exit_stack, mode: str): + """Test IPv4 client and IPv6 server with and without TCP-AO + + By default any IPv6 server will also receive packets from IPv4 clients. This + is not currently supported by TCP_AUTHOPT but it should fail in an orderly + manner.
+ """ + nsfixture = NamespaceFixture() + exit_stack.enter_context(nsfixture) + server_ipv4_addr = nsfixture.get_addr(socket.AF_INET, 1) + + listen_socket = create_listen_socket( + ns=nsfixture.server_netns_name, family=socket.AF_INET6 + ) + listen_socket = exit_stack.enter_context(listen_socket) + + server_thread = SimpleServerThread(listen_socket, mode="echo") + exit_stack.enter_context(server_thread) + + client_socket = create_client_socket( + ns=nsfixture.client_netns_name, + family=socket.AF_INET, + ) + client_socket = exit_stack.push(client_socket) + + if mode == "ao": + alg = TCP_AUTHOPT_ALG.HMAC_SHA_1_96 + key = tcp_authopt_key(alg=alg, key="hello") + set_tcp_authopt_key(listen_socket, key) + set_tcp_authopt_key(client_socket, key) + + if mode == "ao-addrbind": + alg = TCP_AUTHOPT_ALG.HMAC_SHA_1_96 + client_ipv6_addr = nsfixture.get_addr(socket.AF_INET6, 2) + server_key = tcp_authopt_key(alg=alg, key="hello", addr=client_ipv6_addr) + server_key.flags = TCP_AUTHOPT_KEY_FLAG.BIND_ADDR + set_tcp_authopt_key(listen_socket, server_key) + + client_key = tcp_authopt_key(alg=alg, key="hello") + set_tcp_authopt_key(client_socket, client_key) + + if mode == "md5": + from . 
import linux_tcp_md5sig + + server_md5key = linux_tcp_md5sig.tcp_md5sig(key=b"hello") + server_md5key.set_ipv6_addr_all() + linux_tcp_md5sig.setsockopt_md5sig(listen_socket, server_md5key) + client_md5key = linux_tcp_md5sig.tcp_md5sig(key=b"hellx") + client_md5key.set_ipv4_addr_all() + linux_tcp_md5sig.setsockopt_md5sig(client_socket, client_md5key) + + with raises_optional_exception(socket.timeout if mode != "none" else None): + client_socket.connect((str(server_ipv4_addr), DEFAULT_TCP_SERVER_PORT)) + check_socket_echo(client_socket) + client_socket.close() + + +@pytest.mark.parametrize( + "address_family,signed", + [ + (socket.AF_INET, True), + (socket.AF_INET, False), + (socket.AF_INET6, True), + (socket.AF_INET6, False), + ], +) +def test_rst(exit_stack: ExitStack, address_family, signed: bool): + """Check that an unsigned RST breaks a normal connection but not one protected by TCP-AO""" + + con = TCPConnectionFixture(address_family=address_family) + if signed: + con.tcp_authopt_key = DEFAULT_TCP_AUTHOPT_KEY + exit_stack.enter_context(con) + + # connect + con.client_socket.connect(con.server_addr_port) + check_socket_echo(con.client_socket) + + client_isn, server_isn = con.sniffer_session.get_client_server_isn() + p = con.create_client2server_packet() + p[TCP].flags = "R" + p[TCP].seq = tcp_seq_wrap(client_isn + 1001) + p[TCP].ack = tcp_seq_wrap(server_isn + 1001) + con.client_l2socket.send(p) + + if signed: + # When protected by TCP-AO unsigned RSTs are ignored. + check_socket_echo(con.client_socket) + else: + # By default an RST that guesses seq can kill the connection. 
+ with pytest.raises(ConnectionResetError): + check_socket_echo(con.client_socket) + + +@pytest.mark.parametrize("address_family", [socket.AF_INET, socket.AF_INET6]) +def test_rst_signed_manually(exit_stack: ExitStack, address_family): + """Check that a manually signed RST breaks a connection protected by TCP-AO""" + + con = TCPConnectionFixture(address_family=address_family) + con.tcp_authopt_key = key = DEFAULT_TCP_AUTHOPT_KEY + exit_stack.enter_context(con) + + # connect + con.client_socket.connect(con.server_addr_port) + check_socket_echo(con.client_socket) + + client_isn, server_isn = con.sniffer_session.get_client_server_isn() + p = con.create_client2server_packet() + p[TCP].flags = "R" + p[TCP].seq = tcp_seq_wrap(client_isn + 1001) + p[TCP].ack = tcp_seq_wrap(server_isn + 1001) + + add_tcp_authopt_signature( + p, TcpAuthOptAlg_HMAC_SHA1(), key.key, client_isn, server_isn + ) + con.client_l2socket.send(p) + + # The server socket will close in response to the RST without entering TIME-WAIT. + # Attempting to send additional packets will result in a timeout because + # the signature can't be validated.
+ with pytest.raises(socket.timeout): + check_socket_echo(con.client_socket) + + +@pytest.mark.parametrize("address_family", [socket.AF_INET, socket.AF_INET6]) +def test_tw_ack(exit_stack: ExitStack, address_family): + """Manually send a duplicate FIN/ACK after close and check that the TWSK signs replies correctly + + The kernel has a custom code path for this. + """ + + con = TCPConnectionFixture(address_family=address_family) + con.tcp_authopt_key = key = DEFAULT_TCP_AUTHOPT_KEY + exit_stack.enter_context(con) + + # connect and close nicely + con.client_socket.connect(con.server_addr_port) + check_socket_echo(con.client_socket) + assert con.get_client_tcp_state() == "ESTAB" + assert con.get_server_tcp_state() == "ESTAB" + con.client_socket.close() + con.sniffer_session.wait_close() + + assert con.get_client_tcp_state() == "TIME-WAIT" + assert con.get_server_tcp_state() is None + + # Send a duplicate FIN/ACK + client_isn, server_isn = con.sniffer_session.get_client_server_isn() + p = con.create_server2client_packet() + p[TCP].flags = "FA" + p[TCP].seq = tcp_seq_wrap(server_isn + 1001) + p[TCP].ack = tcp_seq_wrap(client_isn + 1002) + add_tcp_authopt_signature( + p, TcpAuthOptAlg_HMAC_SHA1(), key.key, server_isn, client_isn + ) + pr = con.server_l2socket.sr1(p) + assert pr[TCP].ack == tcp_seq_wrap(server_isn + 1001) + assert pr[TCP].seq == tcp_seq_wrap(client_isn + 1001) + assert pr[TCP].flags == "A" + + scapy_sniffer_stop(con.sniffer) + + val = TcpAuthValidator() + val.keys.append(TcpAuthValidatorKey(key=b"hello", alg_name="HMAC-SHA-1-96")) + for p in con.sniffer.results: + val.handle_packet(p) + val.raise_errors() + + # The server does not have enough state to validate the ACK from TIME-WAIT + # so it reports a failure.
+ assert con.server_nstat_json()["TcpExtTCPAuthOptFailure"] == 1 + assert con.client_nstat_json()["TcpExtTCPAuthOptFailure"] == 0 + + +@pytest.mark.parametrize("address_family", [socket.AF_INET, socket.AF_INET6]) +def test_tw_rst(exit_stack: ExitStack, address_family): + """Manually send a signed invalid packet after FIN and check that the TWSK signs the RST correctly + + The kernel has a custom code path for this. + """ + key = DEFAULT_TCP_AUTHOPT_KEY + con = TCPConnectionFixture( + address_family=address_family, + tcp_authopt_key=key, + ) + con.server_thread.keep_half_open = True + exit_stack.enter_context(con) + + # connect, transfer data and close client nicely + con.client_socket.connect(con.server_addr_port) + check_socket_echo(con.client_socket) + con.client_socket.close() + + # since server keeps connection open client goes to FIN-WAIT-2 + def check_socket_states(): + client_tcp_state_name = con.get_client_tcp_state() + server_tcp_state_name = con.get_server_tcp_state() + logger.info("%s %s", client_tcp_state_name, server_tcp_state_name) + return ( + client_tcp_state_name == "FIN-WAIT-2" + and server_tcp_state_name == "CLOSE-WAIT" + ) + + waiting.wait(check_socket_states) + + # sending a FIN-ACK with incorrect seq makes + # tcp_timewait_state_process return a TCP_TW_RST + client_isn, server_isn = con.sniffer_session.get_client_server_isn() + p = con.create_server2client_packet() + p[TCP].flags = "FA" + p[TCP].seq = tcp_seq_wrap(server_isn + 1001 + 1) + p[TCP].ack = tcp_seq_wrap(client_isn + 1002) + add_tcp_authopt_signature( + p, TcpAuthOptAlg_HMAC_SHA1(), key.key, server_isn, client_isn + ) + con.server_l2socket.send(p) + + # TODO: could the fixed delay below be removed with a scapy trick?
+ import time + + time.sleep(1) + scapy_sniffer_stop(con.sniffer) + + # Check client socket moved from FIN-WAIT-2 to CLOSED + assert con.get_client_tcp_state() is None + + # Check that some RST was seen + def is_tcp_rst(p): + return TCP in p and p[TCP].flags.R + + assert any(is_tcp_rst(p) for p in con.sniffer.results) + + # Check everything was valid + val = TcpAuthValidator() + val.keys.append(TcpAuthValidatorKey(key=b"hello", alg_name="HMAC-SHA-1-96")) + for p in con.sniffer.results: + val.handle_packet(p) + val.raise_errors() + + # Check no snmp failures + con.assert_no_snmp_output_failures() + + +def test_rst_linger(exit_stack: ExitStack): + """Test RST sent deliberately via SO_LINGER is valid""" + con = TCPConnectionFixture( + sniffer_kwargs=dict(count=8), tcp_authopt_key=DEFAULT_TCP_AUTHOPT_KEY + ) + exit_stack.enter_context(con) + + con.client_socket.connect(con.server_addr_port) + check_socket_echo(con.client_socket) + socket_set_linger(con.client_socket, 1, 0) + con.client_socket.close() + + con.sniffer.join(timeout=3) + + val = TcpAuthValidator() + val.keys.append(TcpAuthValidatorKey(key=b"hello", alg_name="HMAC-SHA-1-96")) + for p in con.sniffer.results: + val.handle_packet(p) + val.raise_errors() + + def is_tcp_rst(p): + return TCP in p and p[TCP].flags.R + + assert any(is_tcp_rst(p) for p in con.sniffer.results) + + +@pytest.mark.parametrize( + "address_family,mode", + [ + (socket.AF_INET, "goodsign"), + (socket.AF_INET, "fakesign"), + (socket.AF_INET, "unsigned"), + (socket.AF_INET6, "goodsign"), + (socket.AF_INET6, "fakesign"), + (socket.AF_INET6, "unsigned"), + ], +) +def test_badack_to_synack(exit_stack, address_family, mode: str): + """Test a bad ACK sent in response to the server's SYN/ACK.
+ + This is handled by a minisocket in the TCP_SYN_RECV state on a separate code path + """ + con = TCPConnectionFixture(address_family=address_family) + if mode != "unsigned": + con.tcp_authopt_key = tcp_authopt_key( + alg=TCP_AUTHOPT_ALG.HMAC_SHA_1_96, + key=b"hello", + ) + exit_stack.enter_context(con) + + client_l2socket = con.client_l2socket + client_isn = 1000 + server_isn = 0 + + def sign(packet): + if mode == "unsigned": + return + add_tcp_authopt_signature( + packet, + TcpAuthOptAlg_HMAC_SHA1(), + con.tcp_authopt_key.key, + client_isn, + server_isn, + ) + + # Prevent TCP in client namespace from sending RST + # Do this by removing the client address and inserting a static neighbor entry on the server side + client_prefix_length = con.nsfixture.get_prefix_length(address_family) + subprocess.run( + f"""\ +set -e +ip netns exec {con.nsfixture.client_netns_name} ip addr del {con.client_addr}/{client_prefix_length} dev veth0 +ip netns exec {con.nsfixture.server_netns_name} ip neigh add {con.client_addr} lladdr {con.nsfixture.client_mac_addr} dev veth0 +""", + shell=True, + check=True, + ) + + p1 = con.create_client2server_packet() + p1[TCP].flags = "S" + p1[TCP].seq = client_isn + p1[TCP].ack = 0 + sign(p1) + + p2 = client_l2socket.sr1(p1, timeout=1) + server_isn = p2[TCP].seq + assert p2[TCP].ack == client_isn + 1 + assert p2[TCP].flags == "SA" + + p3 = con.create_client2server_packet() + p3[TCP].flags = "A" + p3[TCP].seq = client_isn + 1 + p3[TCP].ack = tcp_seq_wrap(server_isn + 1) + sign(p3) + if mode == "fakesign": + break_tcp_authopt_signature(p3) + + assert con.server_nstat_json()["TcpExtTCPAuthOptFailure"] == 0 + client_l2socket.send(p3) + + def confirm_good(): + return len(con.server_thread.server_socket) > 0 + + def confirm_fail(): + return con.server_nstat_json()["TcpExtTCPAuthOptFailure"] == 1 + + def wait_good(): + assert not confirm_fail() + return confirm_good() + + def wait_fail(): + assert not confirm_good() + return confirm_fail() + + if mode == "fakesign": +
waiting.wait(wait_fail, timeout_seconds=5, sleep_seconds=0.1) + else: + waiting.wait(wait_good, timeout_seconds=5, sleep_seconds=0.1)
To trigger a seq or ack rollover, create many connections in a loop until one with a "high" seq/ack value is found, then transfer enough traffic to make it wrap.
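For readers less familiar with the Sequence Number Extension, the following is a minimal Python sketch of what the test exercises. It assumes the RFC 5925 model in which the SNE is the high-order 32 bits of an extended 64-bit sequence number and increments each time the 32-bit sequence number wraps. `seq_before`, `wraps_after` and `SneTracker` are hypothetical helpers for illustration only; they are neither the kernel implementation nor the selftest's `scapy_conntrack` code.

```python
SEQ_MASK = 0xFFFFFFFF


def seq_before(a: int, b: int) -> bool:
    """True if 32-bit sequence number a comes strictly before b (modular compare)."""
    return ((a - b) & SEQ_MASK) >= 0x80000000


def wraps_after(seq: int, nbytes: int) -> bool:
    """True if sending nbytes starting at seq makes the 32-bit sequence number wrap."""
    return seq + nbytes > SEQ_MASK


class SneTracker:
    """Hypothetical SNE tracker for one direction of a connection."""

    def __init__(self, isn: int):
        self.sne = 0
        self.last_seq = isn & SEQ_MASK

    def update(self, seq: int) -> int:
        """Return the SNE to use for a segment carrying seq."""
        seq &= SEQ_MASK
        if seq_before(self.last_seq, seq):
            # Segment advances the sequence space; a numeric decrease means a wrap.
            if seq < self.last_seq:
                self.sne = (self.sne + 1) & SEQ_MASK
            self.last_seq = seq
            return self.sne
        # Old or duplicate segment: if it is numerically above last_seq it was
        # sent before the most recent wrap and belongs to the previous SNE.
        if seq > self.last_seq:
            return (self.sne - 1) & SEQ_MASK
        return self.sne
```

A TCP-AO endpoint needs one such counter per direction, which is what the `snd_sne` and `rcv_sne` fields used by the test's connection tracker correspond to.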
This relies on both TCP_REPAIR and TCP_REPAIR_AUTHOPT, making it unfit for upstream.
Signed-off-by: Leonard Crestez cdleonard@gmail.com --- .../tcp_authopt_test/linux_tcp_repair.py | 67 ++++++ .../tcp_authopt/tcp_authopt_test/test_sne.py | 202 ++++++++++++++++++ 2 files changed, 269 insertions(+) create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_repair.py create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_sne.py
diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_repair.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_repair.py new file mode 100644 index 000000000000..68f111a207e6 --- /dev/null +++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_repair.py @@ -0,0 +1,67 @@ +# SPDX-License-Identifier: GPL-2.0 +import socket +import struct +from contextlib import contextmanager +from enum import IntEnum + +# Extra sockopts not present in python stdlib +TCP_REPAIR = 19 +TCP_REPAIR_QUEUE = 20 +TCP_QUEUE_SEQ = 21 +TCP_REPAIR_OPTIONS = 22 +TCP_REPAIR_WINDOW = 29 + + +class TCP_REPAIR_VAL(IntEnum): + OFF = 0 + ON = 1 + OFF_NO_WP = -1 + + +def get_tcp_repair(sock) -> TCP_REPAIR_VAL: + return TCP_REPAIR_VAL(sock.getsockopt(socket.SOL_TCP, TCP_REPAIR)) + + +def set_tcp_repair(sock, val: TCP_REPAIR_VAL) -> None: + return sock.setsockopt(socket.SOL_TCP, TCP_REPAIR, int(val)) + + +class TCP_REPAIR_QUEUE_ID(IntEnum): + NO_QUEUE = 0 + RECV_QUEUE = 1 + SEND_QUEUE = 2 + + +def get_tcp_repair_queue(sock) -> TCP_REPAIR_QUEUE_ID: + return TCP_REPAIR_QUEUE_ID(sock.getsockopt(socket.SOL_TCP, TCP_REPAIR_QUEUE)) + + +def set_tcp_repair_queue(sock, val: TCP_REPAIR_QUEUE_ID) -> None: + return sock.setsockopt(socket.SOL_TCP, TCP_REPAIR_QUEUE, int(val)) + + +def get_tcp_queue_seq(sock) -> int: + return struct.unpack("I", sock.getsockopt(socket.SOL_TCP, TCP_QUEUE_SEQ, 4))[0] + + +def set_tcp_queue_seq(sock, val: int) -> None: + # Pack as u32: a plain int argument would overflow for seq values >= 2**31 + sock.setsockopt(socket.SOL_TCP, TCP_QUEUE_SEQ, struct.pack("I", val)) + + +@contextmanager +def tcp_repair_toggle(sock, off_val=TCP_REPAIR_VAL.OFF_NO_WP): + """Set TCP_REPAIR on/off as a context""" + try: + set_tcp_repair(sock, TCP_REPAIR_VAL.ON) + yield + finally: + set_tcp_repair(sock, off_val) + + +def get_tcp_repair_recv_send_queue_seq(sock): + with tcp_repair_toggle(sock): + set_tcp_repair_queue(sock, TCP_REPAIR_QUEUE_ID.RECV_QUEUE) + recv_seq = get_tcp_queue_seq(sock) + set_tcp_repair_queue(sock, 
TCP_REPAIR_QUEUE_ID.SEND_QUEUE) + send_seq = get_tcp_queue_seq(sock) + return (recv_seq, send_seq) diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_sne.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_sne.py new file mode 100644 index 000000000000..180d7cdbd5f3 --- /dev/null +++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_sne.py @@ -0,0 +1,202 @@ +# SPDX-License-Identifier: GPL-2.0 +"""Validate SNE implementation for TCP-AO""" + +import logging +import socket +from contextlib import ExitStack +from ipaddress import ip_address + +import pytest + +from .linux_tcp_authopt import set_tcp_authopt_key_kwargs +from .linux_tcp_repair import get_tcp_repair_recv_send_queue_seq, tcp_repair_toggle +from .netns_fixture import NamespaceFixture +from .scapy_conntrack import TCPConnectionKey, TCPConnectionTracker +from .scapy_utils import AsyncSnifferContext, create_capture_socket, tcp_seq_wrap +from .server import SimpleServerThread +from .utils import ( + DEFAULT_TCP_SERVER_PORT, + check_socket_echo, + create_client_socket, + create_listen_socket, + socket_set_linger, +) +from .validator import TcpAuthValidator, TcpAuthValidatorKey + +logger = logging.getLogger(__name__) + + +def add_connection_info( + tracker: TCPConnectionTracker, + saddr, + daddr, + sport, + dport, + sisn, + disn, +): + client2server_key = TCPConnectionKey( + saddr=saddr, + daddr=daddr, + sport=sport, + dport=dport, + ) + client2server_conn = tracker.get_or_create(client2server_key) + client2server_conn.sisn = sisn + client2server_conn.disn = disn + client2server_conn.snd_sne.reset(sisn) + client2server_conn.rcv_sne.reset(disn) + client2server_conn.found_syn = True + client2server_conn.found_synack = True + server2client_conn = tracker.get_or_create(client2server_key.rev()) + server2client_conn.sisn = disn + server2client_conn.disn = sisn + server2client_conn.snd_sne.reset(disn) + server2client_conn.rcv_sne.reset(sisn) + server2client_conn.found_syn = True + 
server2client_conn.found_synack = True + + +@pytest.mark.parametrize("signed", [False, True]) +def test_high_seq_rollover(exit_stack: ExitStack, signed: bool): + """Test SNE by rolling over from a high seq/ack value + + Create many connections until a very high seq/ack is found and then transfer + enough for those values to roll over. + + A side effect of this approach is that this stresses connection + establishment. + """ + overflow = 0x200000 + bufsize = 0x10000 + secret_key = b"12345" + mode = "echo" + validator_enabled = True + + nsfixture = exit_stack.enter_context(NamespaceFixture()) + server_addr = nsfixture.get_addr(socket.AF_INET, 1) + client_addr = nsfixture.get_addr(socket.AF_INET, 2) + server_addr_port = (str(server_addr), DEFAULT_TCP_SERVER_PORT) + listen_socket = create_listen_socket( + ns=nsfixture.server_netns_name, + bind_addr=server_addr, + listen_depth=1024, + ) + exit_stack.enter_context(listen_socket) + if signed: + set_tcp_authopt_key_kwargs(listen_socket, key=secret_key) + server_thread = SimpleServerThread(listen_socket, mode=mode, bufsize=bufsize) + exit_stack.enter_context(server_thread) + + found = False + client_socket = None + for iternum in range(50000): + try: + # Manually assign increasing client ports + # + # Sometimes linux kills timewait sockets (TCPTimeWaitOverflow) and + # then attempts to reuse the port. The stricter validation + # requirements of TCP-AO mean the other side of the socket survives + # and rejects packets coming from the reused port. + # + # This issue is not related to SNE so a workaround is acceptable. 
+ client_socket = create_client_socket( + ns=nsfixture.client_netns_name, + bind_addr=client_addr, + bind_port=10000 + iternum, + ) + if signed: + set_tcp_authopt_key_kwargs(client_socket, key=secret_key) + try: + client_socket.connect(server_addr_port) + except: + logger.error("failed connect on iteration %d", iternum, exc_info=True) + raise + + recv_seq, send_seq = get_tcp_repair_recv_send_queue_seq(client_socket) + if (recv_seq + overflow > 0x100000000 and mode == "echo") or ( + send_seq + overflow > 0x100000000 + ): + found = True + break + # Wait for graceful close to avoid swamping server listen queue. + # This makes the test work even with a server listen_depth=1 but set + # a very high value anyway. + socket_set_linger(client_socket, 1, 1) + client_socket.close() + client_socket = None + finally: + if not found and client_socket: + client_socket.close() + assert found + assert client_socket is not None + + logger.debug("setup recv_seq %08x send_seq %08x", recv_seq, send_seq) + + # Init validator + if signed and validator_enabled: + capture_filter = f"tcp port {DEFAULT_TCP_SERVER_PORT}" + capture_socket = create_capture_socket( + ns=nsfixture.client_netns_name, + iface="veth0", + filter=capture_filter, + ) + sniffer = exit_stack.enter_context( + AsyncSnifferContext(opened_socket=capture_socket) + ) + validator = TcpAuthValidator() + validator.keys.append( + TcpAuthValidatorKey(key=secret_key, alg_name="HMAC-SHA-1-96") + ) + + # SYN+SYNACK is not captured so initialize connection info manually + add_connection_info( + validator.tracker, + saddr=ip_address(client_addr), + daddr=ip_address(server_addr), + dport=client_socket.getpeername()[1], + sport=client_socket.getsockname()[1], + sisn=tcp_seq_wrap(send_seq - 1), + disn=tcp_seq_wrap(recv_seq - 1), + ) + + logger.info("transfer %d bytes", 2 * overflow) + fail_transfer = False + for iternum in range(2 * overflow // bufsize): + try: + if mode == "recv": + from .utils import randbytes + + send_buf = 
randbytes(bufsize) + client_socket.sendall(send_buf) + else: + check_socket_echo(client_socket, bufsize) + except: + logger.error("failed traffic on iteration %d", iternum, exc_info=True) + fail_transfer = True + break + + new_recv_seq, new_send_seq = get_tcp_repair_recv_send_queue_seq(client_socket) + logger.debug("final recv_seq %08x send_seq %08x", new_recv_seq, new_send_seq) + assert new_recv_seq < recv_seq or new_send_seq < send_seq + + # Validate capture + if signed and validator_enabled: + sniffer.stop() + for p in sniffer.results: + validator.handle_packet(p) + # Allow incomplete connections from FIN/ACK of connections dropped + # because of low seq/ack + validator.raise_errors(allow_incomplete=True) + client_scappy_key = TCPConnectionKey( + saddr=ip_address(client_addr), + daddr=ip_address(server_addr), + dport=client_socket.getpeername()[1], + sport=client_socket.getsockname()[1], + ) + client_scappy_conn = validator.tracker.get(client_scappy_key) + snd_sne_rollover = client_scappy_conn.snd_sne.sne != 0 + rcv_sne_rollover = client_scappy_conn.rcv_sne.sne != 0 + assert snd_sne_rollover or rcv_sne_rollover + + assert not fail_transfer
The RFC requires that TCP can report the keyid and rnextkeyid values being sent or received; implement this via getsockopt.
The RFC also requires that the user can select the sending key and that the sending key is automatically switched based on rnextkeyid. These requirements can conflict, so implement both and add a flag which specifies whether the user or the peer request takes priority.
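The resulting selection rules can be modelled in a few lines of Python. This is a simplified sketch of the non-listen, socket-locked path of `__tcp_authopt_select_key` below, under stated assumptions: `lookup_send` is a hypothetical stand-in for `tcp_authopt_lookup_send` that returns the first key whose `send_id` matches (or any key for -1), and address binding is ignored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Flag values as defined in include/uapi/linux/tcp.h by this patch
TCP_AUTHOPT_FLAG_LOCK_KEYID = 1 << 0
TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID = 1 << 1


@dataclass(frozen=True)
class Key:
    send_id: int
    recv_id: int


@dataclass
class AuthoptInfo:
    flags: int = 0
    send_keyid: int = 0
    send_rnextkeyid: int = 0
    recv_rnextkeyid: int = 0
    send_key: Optional[Key] = None


def lookup_send(keys: List[Key], send_id: int) -> Optional[Key]:
    """Hypothetical stand-in for tcp_authopt_lookup_send: -1 matches any key."""
    for key in keys:
        if send_id == -1 or key.send_id == send_id:
            return key
    return None


def select_key(info: AuthoptInfo, keys: List[Key]) -> Tuple[Optional[Key], Optional[int]]:
    """Return (sending key, rnextkeyid to put on the wire)."""
    key = info.send_key
    new_key = None
    # A user lock on keyid overrides the peer's rnextkeyid request.
    if info.flags & TCP_AUTHOPT_FLAG_LOCK_KEYID:
        wanted = info.send_keyid
    else:
        wanted = info.recv_rnextkeyid
    if key is None or key.send_id != wanted:
        new_key = lookup_send(keys, wanted)
    # No cached key and no exact match: fall back to any key.
    if key is None and new_key is None:
        new_key = lookup_send(keys, -1)
    if new_key is not None and new_key != key:
        key = new_key
        info.send_key = key  # cached, like info->send_key under the socket lock
    if key is None:
        return None, None
    if not (info.flags & TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID):
        info.send_rnextkeyid = key.recv_id
    return key, info.send_rnextkeyid
```

Note that a peer request for an unknown keyid leaves the cached key in place: as in the C code, the search for "any key" only happens when nothing is cached yet.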
Also add an option to control rnextkeyid explicitly from userspace.
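From userspace, the new fields can be filled in with Python's `struct` module. This sketch assumes the `struct tcp_authopt` layout from the uapi hunk of this patch (a `__u32` flags word followed by four `__u8` fields, 8 bytes total). The buffer would be passed to `setsockopt(SOL_TCP, TCP_AUTHOPT, buf)` or decoded from `getsockopt`; the `TCP_AUTHOPT` option number comes from the patched header and is not shown here. `TcpAuthopt`, `pack_tcp_authopt` and `unpack_tcp_authopt` are hypothetical helpers.

```python
import struct
from typing import NamedTuple

TCP_AUTHOPT_FLAG_LOCK_KEYID = 1 << 0
TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID = 1 << 1

# struct tcp_authopt { __u32 flags; __u8 send_keyid; __u8 send_rnextkeyid;
#                      __u8 recv_keyid; __u8 recv_rnextkeyid; }
# "=" keeps native byte order but disables padding, giving exactly 8 bytes.
TCP_AUTHOPT_FORMAT = "=IBBBB"


class TcpAuthopt(NamedTuple):
    flags: int = 0
    send_keyid: int = 0
    send_rnextkeyid: int = 0
    recv_keyid: int = 0
    recv_rnextkeyid: int = 0


def pack_tcp_authopt(opt: TcpAuthopt) -> bytes:
    return struct.pack(TCP_AUTHOPT_FORMAT, *opt)


def unpack_tcp_authopt(buf: bytes) -> TcpAuthopt:
    return TcpAuthopt(*struct.unpack(TCP_AUTHOPT_FORMAT, buf))


# Smooth rollover: advertise the recv_id of the next key (assumed to be 2 here)
# as rnextkeyid so the peer switches its sending key once that key is available.
rollover = TcpAuthopt(flags=TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID, send_rnextkeyid=2)
buf = pack_tcp_authopt(rollover)
```

On getsockopt, the same layout can be decoded with `unpack_tcp_authopt` to read back `recv_keyid` and `recv_rnextkeyid`.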
Signed-off-by: Leonard Crestez cdleonard@gmail.com --- Documentation/networking/tcp_authopt.rst | 25 ++++++ include/net/tcp_authopt.h | 38 ++++++++- include/uapi/linux/tcp.h | 31 ++++++++ net/ipv4/tcp_authopt.c | 98 +++++++++++++++++++++++- net/ipv4/tcp_ipv4.c | 2 +- net/ipv6/tcp_ipv6.c | 3 +- 6 files changed, 190 insertions(+), 7 deletions(-)
diff --git a/Documentation/networking/tcp_authopt.rst b/Documentation/networking/tcp_authopt.rst index 484f66f41ad5..cded87a70d05 100644 --- a/Documentation/networking/tcp_authopt.rst +++ b/Documentation/networking/tcp_authopt.rst @@ -35,10 +35,35 @@ Keys can be bound to remote addresses in a way that is similar to TCP_MD5.
RFC5925 requires that key ids do not overlap when tcp identifiers (addr/port) overlap. This is not enforced by linux, configuring ambiguous keys will result in packet drops and lost connections.
+Key selection +------------- + +On getsockopt(TCP_AUTHOPT) information is provided about the keyid/rnextkeyid in +the last sent packet and about the keyid/rnextkeyid in the last valid received +packet. + +By default the sending keyid is selected to match the "rnextkeyid" value sent +by the remote side. If that keyid is not available (or for new connections) a +random matching key is selected. + +If the `TCP_AUTHOPT_FLAG_LOCK_KEYID` flag is set then the sending key is selected by the +`tcp_authopt.send_keyid` field and rnextkeyid is ignored. If no key with +send_id == send_keyid is configured then a random matching key is +selected. + +The current sending key is cached in the socket and will not change unless +requested by the remote rnextkeyid or by setsockopt. + +The rnextkeyid value sent on the wire is usually the recv_id of the current +key used for sending. If the `TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID` flag is set in +`tcp_authopt.flags` the value of `tcp_authopt.send_rnextkeyid` is sent +instead. This can be used to implement smooth rollover: the peer will switch +its keyid to the received rnextkeyid when it is available. + ABI Reference =============
.. kernel-doc:: include/uapi/linux/tcp.h :identifiers: tcp_authopt tcp_authopt_flag tcp_authopt_key tcp_authopt_key_flag tcp_authopt_alg diff --git a/include/net/tcp_authopt.h b/include/net/tcp_authopt.h index ae7d6a1eab8d..9341e10ef542 100644 --- a/include/net/tcp_authopt.h +++ b/include/net/tcp_authopt.h @@ -66,10 +66,43 @@ struct tcp_authopt_info { u32 dst_isn; /** @rcv_sne: Recv-side Sequence Number Extension tracking tcp_sock.rcv_nxt */ u32 rcv_sne; /** @snd_sne: Send-side Sequence Number Extension tracking tcp_sock.snd_nxt */ u32 snd_sne; + + /** + * @send_keyid: keyid currently being sent + * + * This is controlled by userspace if TCP_AUTHOPT_FLAG_LOCK_KEYID is + * set, otherwise we try to match recv_rnextkeyid + */ + u8 send_keyid; + /** + * @send_rnextkeyid: rnextkeyid currently being sent + * + * This is controlled by userspace if TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID is set + */ + u8 send_rnextkeyid; + /** + * @recv_keyid: last keyid received from remote + * + * This is reported to userspace but has no other special behavior attached. + */ + u8 recv_keyid; + /** + * @recv_rnextkeyid: last rnextkeyid received from remote + * + * Linux tries to honor this unless TCP_AUTHOPT_FLAG_LOCK_KEYID is set + */ + u8 recv_rnextkeyid; + + /** + * @send_key: Current key used for sending, cached. + * + * Once a key is found it only changes by user or remote request. + */ + struct tcp_authopt_key_info *send_key; };
#ifdef CONFIG_TCP_AUTHOPT extern int sysctl_tcp_authopt; DECLARE_STATIC_KEY_FALSE(tcp_authopt_needed); @@ -81,22 +114,23 @@ int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *key); int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen); struct tcp_authopt_key_info *__tcp_authopt_select_key( const struct sock *sk, struct tcp_authopt_info *info, const struct sock *addr_sk, - u8 *rnextkeyid); + u8 *rnextkeyid, + bool locked); static inline struct tcp_authopt_key_info *tcp_authopt_select_key( const struct sock *sk, const struct sock *addr_sk, struct tcp_authopt_info **info, u8 *rnextkeyid) { if (static_branch_unlikely(&tcp_authopt_needed)) { *info = rcu_dereference(tcp_sk(sk)->authopt_info);
if (*info) - return __tcp_authopt_select_key(sk, *info, addr_sk, rnextkeyid); + return __tcp_authopt_select_key(sk, *info, addr_sk, rnextkeyid, true); } return NULL; } int tcp_authopt_hash( char *hash_location, diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h index 76d7be6b27f4..e02176390519 100644 --- a/include/uapi/linux/tcp.h +++ b/include/uapi/linux/tcp.h @@ -346,10 +346,24 @@ struct tcp_diag_md5sig {
/** * enum tcp_authopt_flag - flags for `tcp_authopt.flags` */ enum tcp_authopt_flag { + /** + * @TCP_AUTHOPT_FLAG_LOCK_KEYID: keyid controlled by sockopt + * + * If this is set `tcp_authopt.send_keyid` is used to determine the sending + * key. Otherwise a key with send_id == recv_rnextkeyid is preferred. + */ + TCP_AUTHOPT_FLAG_LOCK_KEYID = (1 << 0), + /** + * @TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID: Override rnextkeyid from userspace + * + * If this is set then `tcp_authopt.send_rnextkeyid` is sent on outbound + * packets. Otherwise the recv_id of the current sending key is sent. + */ + TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID = (1 << 1), /** * @TCP_AUTHOPT_FLAG_REJECT_UNEXPECTED: * Configure behavior of segments with TCP-AO coming from hosts for which no * key is configured. The default recommended by RFC is to silently accept * such connections. @@ -361,10 +375,27 @@ enum tcp_authopt_flag { * struct tcp_authopt - Per-socket options related to TCP Authentication Option */ struct tcp_authopt { /** @flags: Combination of &enum tcp_authopt_flag */ __u32 flags; + /** + * @send_keyid: `tcp_authopt_key.send_id` of preferred send key + * + * This is only used if `TCP_AUTHOPT_FLAG_LOCK_KEYID` is set. + */ + __u8 send_keyid; + /** + * @send_rnextkeyid: The rnextkeyid to send in packets + * + * This is controlled by the user iff TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID is + * set. Otherwise rnextkeyid is the recv_id of the current key. + */ + __u8 send_rnextkeyid; + /** @recv_keyid: A recently-received keyid value. Only for getsockopt. */ + __u8 recv_keyid; + /** @recv_rnextkeyid: A recently-received rnextkeyid value. Only for getsockopt. */ + __u8 recv_rnextkeyid; };
 /**
  * enum tcp_authopt_key_flag - flags for `tcp_authopt.flags`
  *
diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c
index aef63e35b56f..a02fe0d14b63 100644
--- a/net/ipv4/tcp_authopt.c
+++ b/net/ipv4/tcp_authopt.c
@@ -285,20 +285,76 @@ static struct tcp_authopt_key_info *tcp_authopt_lookup_send(struct tcp_authopt_i
  *
  * @sk: socket
  * @info: socket's tcp_authopt_info
  * @addr_sk: socket used for address lookup. Same as sk except for synack case
  * @rnextkeyid: value of rnextkeyid caller should write in packet
+ * @locked: If we're holding the socket lock. This is false for some timewait and reset cases
  *
  * Result is protected by RCU and can't be stored, it may only be passed to
  * tcp_authopt_hash and only under a single rcu_read_lock.
  */
 struct tcp_authopt_key_info *__tcp_authopt_select_key(const struct sock *sk,
 						       struct tcp_authopt_info *info,
 						       const struct sock *addr_sk,
-						       u8 *rnextkeyid)
+						       u8 *rnextkeyid,
+						       bool locked)
 {
-	return tcp_authopt_lookup_send(info, addr_sk, -1);
+	struct tcp_authopt_key_info *key, *new_key = NULL;
+
+	/* Listen sockets don't refer to any specific connection so we don't try
+	 * to keep using the same key and ignore any received keyids.
+	 */
+	if (sk->sk_state == TCP_LISTEN) {
+		int send_keyid = -1;
+
+		if (info->flags & TCP_AUTHOPT_FLAG_LOCK_KEYID)
+			send_keyid = info->send_keyid;
+		key = tcp_authopt_lookup_send(info, addr_sk, send_keyid);
+		if (key)
+			*rnextkeyid = key->recv_id;
+
+		return key;
+	}
+
+	if (locked)
+		key = rcu_dereference_protected(info->send_key, lockdep_sock_is_held(sk));
+	else
+		key = rcu_dereference(info->send_key);
+
+	/* Try to keep the same sending key unless user or peer requires a different key.
+	 * User request (via TCP_AUTHOPT_FLAG_LOCK_KEYID) always overrides peer request.
+	 */
+	if (info->flags & TCP_AUTHOPT_FLAG_LOCK_KEYID) {
+		int send_keyid = info->send_keyid;
+
+		if (!key || key->send_id != send_keyid)
+			new_key = tcp_authopt_lookup_send(info, addr_sk, send_keyid);
+	} else {
+		if (!key || key->send_id != info->recv_rnextkeyid)
+			new_key = tcp_authopt_lookup_send(info, addr_sk, info->recv_rnextkeyid);
+	}
+	/* If no key found with specific send_id try anything else. */
+	if (!key && !new_key)
+		new_key = tcp_authopt_lookup_send(info, addr_sk, -1);
+
+	/* Update current key only if we hold the socket lock, otherwise we might
+	 * store a pointer that goes stale
+	 */
+	if (new_key && key != new_key) {
+		key = new_key;
+		if (locked)
+			rcu_assign_pointer(info->send_key, key);
+	}
+
+	if (key) {
+		if (info->flags & TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID)
+			*rnextkeyid = info->send_rnextkeyid;
+		else
+			*rnextkeyid = info->send_rnextkeyid = key->recv_id;
+	}
+
+	return key;
 }
 EXPORT_SYMBOL(__tcp_authopt_select_key);
 static struct tcp_authopt_info *__tcp_authopt_info_get_or_create(struct sock *sk)
 {
@@ -321,10 +377,12 @@ static struct tcp_authopt_info *__tcp_authopt_info_get_or_create(struct sock *sk
 	return info;
 }

 #define TCP_AUTHOPT_KNOWN_FLAGS ( \
+	TCP_AUTHOPT_FLAG_LOCK_KEYID | \
+	TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID | \
 	TCP_AUTHOPT_FLAG_REJECT_UNEXPECTED)

 /* Like copy_from_sockopt except tolerate different optlen for compatibility reasons
  *
  * If the src is shorter then it's from an old userspace and the rest of dst is
@@ -381,18 +439,23 @@ int tcp_set_authopt(struct sock *sk, sockptr_t optval, unsigned int optlen)
 	info = __tcp_authopt_info_get_or_create(sk);
 	if (IS_ERR(info))
 		return PTR_ERR(info);

 	info->flags = opt.flags & TCP_AUTHOPT_KNOWN_FLAGS;
+	if (opt.flags & TCP_AUTHOPT_FLAG_LOCK_KEYID)
+		info->send_keyid = opt.send_keyid;
+	if (opt.flags & TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID)
+		info->send_rnextkeyid = opt.send_rnextkeyid;

 	return 0;
 }
 int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *opt)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct tcp_authopt_info *info;
+	struct tcp_authopt_key_info *send_key;

 	memset(opt, 0, sizeof(*opt));
 	sock_owned_by_me(sk);
 	if (!sysctl_tcp_authopt)
 		return -EPERM;
@@ -400,10 +463,22 @@ int tcp_get_authopt_val(struct sock *sk, struct tcp_authopt *opt)
 	info = rcu_dereference_check(tp->authopt_info, lockdep_sock_is_held(sk));
 	if (!info)
 		return -ENOENT;

 	opt->flags = info->flags & TCP_AUTHOPT_KNOWN_FLAGS;
+	/* These keyids might be undefined, for example before connect.
+	 * Reporting zero is not strictly correct because there are no reserved
+	 * values.
+	 */
+	send_key = rcu_dereference_check(info->send_key, lockdep_sock_is_held(sk));
+	if (send_key)
+		opt->send_keyid = send_key->send_id;
+	else
+		opt->send_keyid = 0;
+	opt->send_rnextkeyid = info->send_rnextkeyid;
+	opt->recv_keyid = info->recv_keyid;
+	opt->recv_rnextkeyid = info->recv_rnextkeyid;

 	return 0;
 }

 /* Free key nicely, for living sockets */
@@ -411,10 +486,12 @@ static void tcp_authopt_key_del(struct sock *sk,
 				struct tcp_authopt_info *info,
 				struct tcp_authopt_key_info *key)
 {
 	sock_owned_by_me(sk);
 	hlist_del_rcu(&key->node);
+	if (rcu_dereference_protected(info->send_key, lockdep_sock_is_held(sk)) == key)
+		rcu_assign_pointer(info->send_key, NULL);
 	atomic_sub(sizeof(*key), &sk->sk_omem_alloc);
 	kfree_rcu(key, rcu);
 }
 /* Free info and keys.
@@ -1439,11 +1516,11 @@ int __tcp_authopt_inbound_check(struct sock *sk, struct sk_buff *skb, struct tcp
 			NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPAUTHOPTFAILURE);
 			print_tcpao_notice("TCP Authentication Unexpected: Rejected", skb);
 			return -EINVAL;
 		}
 		print_tcpao_notice("TCP Authentication Unexpected: Accepted", skb);
-		return 0;
+		goto accept;
 	}

 	/* bad inbound key len */
 	if (opt->len != TCPOLEN_AUTHOPT_OUTPUT)
 		return -EINVAL;
@@ -1456,8 +1533,23 @@ int __tcp_authopt_inbound_check(struct sock *sk, struct sk_buff *skb, struct tcp
 		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPAUTHOPTFAILURE);
 		print_tcpao_notice("TCP Authentication Failed", skb);
 		return -EINVAL;
 	}

+accept:
+	/* Doing this for all valid packets will result in keyids temporarily
+	 * flipping back and forth if packets are reordered or retransmitted
+	 * but keys should eventually stabilize.
+	 *
+	 * This is connection-specific so don't store for listen sockets.
+	 *
+	 * We could store rnextkeyid from SYN in a request sock and use it for
+	 * the SYNACK but we don't.
+	 */
+	if (sk->sk_state != TCP_LISTEN) {
+		info->recv_keyid = opt->keyid;
+		info->recv_rnextkeyid = opt->rnextkeyid;
+	}
+
 	return 1;
 }
 EXPORT_SYMBOL(__tcp_authopt_inbound_check);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 21971f5fa40e..2165b95ff7ed 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -664,11 +664,11 @@ static int tcp_v4_authopt_handle_reply(const struct sock *sk,
 		info = tcp_twsk(sk)->tw_authopt_info;
 	else
 		info = tcp_sk(sk)->authopt_info;
 	if (!info)
 		return 0;
-	key_info = __tcp_authopt_select_key(sk, info, sk, &rnextkeyid);
+	key_info = __tcp_authopt_select_key(sk, info, sk, &rnextkeyid, false);
 	if (!key_info)
 		return 0;
 	*optptr = htonl((TCPOPT_AUTHOPT << 24) |
 			(TCPOLEN_AUTHOPT_OUTPUT << 16) |
 			(key_info->send_id << 8) |
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 68f9545e4347..bb21f11f4246 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -920,11 +920,12 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
 	else
 		authopt_info = rcu_dereference(tcp_sk(sk)->authopt_info);

 	if (authopt_info) {
 		authopt_key_info = __tcp_authopt_select_key(sk, authopt_info, sk,
-							    &authopt_rnextkeyid);
+							    &authopt_rnextkeyid,
+							    false);
 		if (authopt_key_info) {
 			tot_len += TCPOLEN_AUTHOPT_OUTPUT;
 			/* Don't use MD5 */
 			key = NULL;
 		}
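The uapi change above grows `struct tcp_authopt` from a single `__u32 flags` to flags plus four `__u8` keyid fields, which is why the selftest helper in the next patch switches its struct format from `"I"` to `"IBBBB"` and its `sizeof` from 4 to 8. The sketch below illustrates that layout with Python's `struct` module. As an assumption based on the comment on the kernel's copy helper ("If the src is shorter then it's from an old userspace..."), it models an old 4-byte buffer being zero-extended. `pack_tcp_authopt` and `unpack_tolerant` are hypothetical helpers, not part of the selftests.

```python
import struct

# Layout of the extended struct tcp_authopt (native byte order/alignment):
# __u32 flags; __u8 send_keyid, send_rnextkeyid, recv_keyid, recv_rnextkeyid
TCP_AUTHOPT_FMT = "IBBBB"
TCP_AUTHOPT_SIZE = struct.calcsize(TCP_AUTHOPT_FMT)  # 8 bytes

TCP_AUTHOPT_FLAG_LOCK_KEYID = 1 << 0
TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID = 1 << 1
TCP_AUTHOPT_FLAG_REJECT_UNEXPECTED = 1 << 2


def pack_tcp_authopt(flags=0, send_keyid=0, send_rnextkeyid=0,
                     recv_keyid=0, recv_rnextkeyid=0) -> bytes:
    """Build a sockopt buffer for the extended struct (hypothetical helper)."""
    return struct.pack(TCP_AUTHOPT_FMT, flags, send_keyid, send_rnextkeyid,
                       recv_keyid, recv_rnextkeyid)


def unpack_tolerant(buf: bytes) -> tuple:
    """Decode a buffer that may come from older userspace.

    Assumption: a shorter buffer is zero-extended, so an old 4-byte struct
    reads as "no keyids set". Longer buffers are rejected here only to keep
    the sketch small; the kernel's handling of that case is not modelled.
    """
    if len(buf) > TCP_AUTHOPT_SIZE:
        raise ValueError("buffer longer than struct tcp_authopt")
    return struct.unpack(TCP_AUTHOPT_FMT, buf.ljust(TCP_AUTHOPT_SIZE, b"\x00"))


# An old binary only knows about the 4-byte flags field.
old_buf = struct.pack("I", TCP_AUTHOPT_FLAG_REJECT_UNEXPECTED)
print(unpack_tolerant(old_buf))  # (4, 0, 0, 0, 0)
```

Appending the new fields after `flags` is what keeps this kind of optlen tolerance possible: an old caller's bytes still land on `flags` and every new field reads as zero.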
RFC5925 requires that the user can examine or control the keys being used. This is implemented in Linux via fields on the TCP_AUTHOPT sockopt.
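The non-listen path of `__tcp_authopt_select_key` can be modelled in a few lines of Python to make its precedence explicit. A user-locked `send_keyid` wins. Otherwise the peer's `recv_rnextkeyid` is honoured if a matching key exists. Only when there is no current key at all does it fall back to any key. This is a simplified sketch, not kernel code: `Key`, `AuthoptInfo`, `lookup_send` and `select_key` are hypothetical, address binding, VRF matching, RCU and the listen-socket branch are left out, and `lookup_send` simply returns the first match.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

LOCK_KEYID = 1 << 0        # TCP_AUTHOPT_FLAG_LOCK_KEYID
LOCK_RNEXTKEYID = 1 << 1   # TCP_AUTHOPT_FLAG_LOCK_RNEXTKEYID


@dataclass
class Key:
    send_id: int
    recv_id: int


@dataclass
class AuthoptInfo:
    keys: List[Key] = field(default_factory=list)
    flags: int = 0
    send_keyid: int = 0
    send_rnextkeyid: int = 0
    recv_rnextkeyid: int = 0          # last rnextkeyid received from the peer
    send_key: Optional[Key] = None    # currently cached sending key


def lookup_send(info: AuthoptInfo, send_id: int) -> Optional[Key]:
    """Return the first key with this send_id, or any key if send_id < 0."""
    for key in info.keys:
        if send_id < 0 or key.send_id == send_id:
            return key
    return None


def select_key(info: AuthoptInfo, locked: bool = True) -> Tuple[Optional[Key], Optional[int]]:
    """Return (sending key, rnextkeyid to put in the packet)."""
    key = info.send_key
    new_key = None
    if info.flags & LOCK_KEYID:
        # User request always overrides the peer's rnextkeyid.
        if key is None or key.send_id != info.send_keyid:
            new_key = lookup_send(info, info.send_keyid)
    elif key is None or key.send_id != info.recv_rnextkeyid:
        new_key = lookup_send(info, info.recv_rnextkeyid)
    # No key with the requested send_id and nothing cached: take anything.
    if key is None and new_key is None:
        new_key = lookup_send(info, -1)
    if new_key is not None and new_key is not key:
        key = new_key
        if locked:  # only cache the choice while holding the socket lock
            info.send_key = key
    rnextkeyid = None
    if key is not None:
        if info.flags & LOCK_RNEXTKEYID:
            rnextkeyid = info.send_rnextkeyid
        else:
            info.send_rnextkeyid = rnextkeyid = key.recv_id
    return key, rnextkeyid


sk1, sk2 = Key(send_id=11, recv_id=12), Key(send_id=21, recv_id=22)
info = AuthoptInfo(keys=[sk1], recv_rnextkeyid=21)  # peer asks for send_id 21
print(select_key(info))  # (Key(send_id=11, recv_id=12), 12): no key 21 yet
info.keys.append(sk2)
print(select_key(info))  # (Key(send_id=21, recv_id=22), 22): switched
```

This mirrors the scenario exercised by `test_rollover_rnextkeyid`: the server keeps its current key until a key matching the peer's rnextkeyid is added, then switches.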
Add socket-level tests for adjusting keyids on live connections and checking that they are reflected on the peer.
Also check smooth transitions via rnextkeyid.
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 .../tcp_authopt_test/linux_tcp_authopt.py |  16 +-
 .../tcp_authopt_test/test_rollover.py     | 181 ++++++++++++++++++
 2 files changed, 194 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_rollover.py
diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_authopt.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_authopt.py
index b9dc9decda07..75cf5f993ccb 100644
--- a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_authopt.py
+++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_authopt.py
@@ -29,10 +29,12 @@ TCP_AUTHOPT_KEY = 39

 TCP_AUTHOPT_MAXKEYLEN = 80


 class TCP_AUTHOPT_FLAG(IntFlag):
+    LOCK_KEYID = BIT(0)
+    LOCK_RNEXTKEYID = BIT(1)
     REJECT_UNEXPECTED = BIT(2)


 class TCP_AUTHOPT_KEY_FLAG(IntFlag):
     DEL = BIT(0)
@@ -48,24 +50,32 @@ class TCP_AUTHOPT_ALG(IntEnum):
 @dataclass
 class tcp_authopt:
     """Like linux struct tcp_authopt"""

     flags: int = 0
-    sizeof = 4
+    send_keyid: int = 0
+    send_rnextkeyid: int = 0
+    recv_keyid: int = 0
+    recv_rnextkeyid: int = 0
+    sizeof = 8

     def pack(self) -> bytes:
         return struct.pack(
-            "I",
+            "IBBBB",
             self.flags,
+            self.send_keyid,
+            self.send_rnextkeyid,
+            self.recv_keyid,
+            self.recv_rnextkeyid,
         )

     def __bytes__(self):
         return self.pack()

     @classmethod
     def unpack(cls, b: bytes):
-        tup = struct.unpack("I", b)
+        tup = struct.unpack("IBBBB", b)
         return cls(*tup)

 def set_tcp_authopt(sock, opt: tcp_authopt):
     return sock.setsockopt(socket.SOL_TCP, TCP_AUTHOPT, bytes(opt))
diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_rollover.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_rollover.py
new file mode 100644
index 000000000000..2f48706a90e5
--- /dev/null
+++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_rollover.py
@@ -0,0 +1,181 @@
+# SPDX-License-Identifier: GPL-2.0
+import socket
+import typing
+from contextlib import ExitStack, contextmanager
+
+from .conftest import skipif_missing_tcp_authopt
+from .linux_tcp_authopt import (
+    TCP_AUTHOPT_FLAG,
+    get_tcp_authopt,
+    set_tcp_authopt,
+    set_tcp_authopt_key,
+    tcp_authopt,
+    tcp_authopt_key,
+)
+from .server import SimpleServerThread
+from .utils import DEFAULT_TCP_SERVER_PORT, check_socket_echo, create_listen_socket
+
+pytestmark = skipif_missing_tcp_authopt
+
+
+@contextmanager
+def make_tcp_authopt_socket_pair(
+    server_addr="127.0.0.1",
+    server_authopt: tcp_authopt = None,
+    server_key_list: typing.Iterable[tcp_authopt_key] = [],
+    client_authopt: tcp_authopt = None,
+    client_key_list: typing.Iterable[tcp_authopt_key] = [],
+) -> typing.Iterator[typing.Tuple[socket.socket, socket.socket]]:
+    """Make a pair of connected sockets for key switching tests
+
+    Server runs in a background thread implementing echo protocol"""
+    with ExitStack() as exit_stack:
+        listen_socket = exit_stack.enter_context(
+            create_listen_socket(bind_addr=server_addr)
+        )
+        server_thread = exit_stack.enter_context(
+            SimpleServerThread(listen_socket, mode="echo")
+        )
+        client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+        client_socket.settimeout(1.0)
+
+        if server_authopt:
+            set_tcp_authopt(listen_socket, server_authopt)
+        for k in server_key_list:
+            set_tcp_authopt_key(listen_socket, k)
+        if client_authopt:
+            set_tcp_authopt(client_socket, client_authopt)
+        for k in client_key_list:
+            set_tcp_authopt_key(client_socket, k)
+
+        client_socket.connect((server_addr, DEFAULT_TCP_SERVER_PORT))
+        check_socket_echo(client_socket)
+        server_socket = server_thread.server_socket[0]
+
+        yield client_socket, server_socket
+
+
+def test_get_keyids(exit_stack: ExitStack):
+    """Check reading key ids"""
+    sk1 = tcp_authopt_key(send_id=11, recv_id=12, key="111")
+    sk2 = tcp_authopt_key(send_id=21, recv_id=22, key="222")
+    ck1 = tcp_authopt_key(send_id=12, recv_id=11, key="111")
+    client_socket, server_socket = exit_stack.enter_context(
+        make_tcp_authopt_socket_pair(
+            server_key_list=[sk1, sk2],
+            client_key_list=[ck1],
+        )
+    )
+
+    check_socket_echo(client_socket)
+    client_tcp_authopt = get_tcp_authopt(client_socket)
+    server_tcp_authopt = get_tcp_authopt(server_socket)
+    assert server_tcp_authopt.send_keyid == 11
+    assert server_tcp_authopt.send_rnextkeyid == 12
+    assert server_tcp_authopt.recv_keyid == 12
+    assert server_tcp_authopt.recv_rnextkeyid == 11
+    assert client_tcp_authopt.send_keyid == 12
+    assert client_tcp_authopt.send_rnextkeyid == 11
+    assert client_tcp_authopt.recv_keyid == 11
+    assert client_tcp_authopt.recv_rnextkeyid == 12
+
+
+def test_rollover_send_keyid(exit_stack: ExitStack):
+    """Check switching the send key via TCP_AUTHOPT_FLAG_LOCK_KEYID"""
+    sk1 = tcp_authopt_key(send_id=11, recv_id=12, key="111")
+    sk2 = tcp_authopt_key(send_id=21, recv_id=22, key="222")
+    ck1 = tcp_authopt_key(send_id=12, recv_id=11, key="111")
+    ck2 = tcp_authopt_key(send_id=22, recv_id=21, key="222")
+    client_socket, server_socket = exit_stack.enter_context(
+        make_tcp_authopt_socket_pair(
+            server_key_list=[sk1, sk2],
+            client_key_list=[ck1, ck2],
+            client_authopt=tcp_authopt(
+                send_keyid=12, flags=TCP_AUTHOPT_FLAG.LOCK_KEYID
+            ),
+        )
+    )
+
+    check_socket_echo(client_socket)
+    assert get_tcp_authopt(client_socket).recv_keyid == 11
+    assert get_tcp_authopt(server_socket).recv_keyid == 12
+
+    # Explicit request for key2
+    set_tcp_authopt(
+        client_socket, tcp_authopt(send_keyid=22, flags=TCP_AUTHOPT_FLAG.LOCK_KEYID)
+    )
+    check_socket_echo(client_socket)
+    assert get_tcp_authopt(client_socket).recv_keyid == 21
+    assert get_tcp_authopt(server_socket).recv_keyid == 22
+
+
+def test_rollover_rnextkeyid(exit_stack: ExitStack):
+    """Check switching the peer's send key via rnextkeyid"""
+    sk1 = tcp_authopt_key(send_id=11, recv_id=12, key="111")
+    sk2 = tcp_authopt_key(send_id=21, recv_id=22, key="222")
+    ck1 = tcp_authopt_key(send_id=12, recv_id=11, key="111")
+    ck2 = tcp_authopt_key(send_id=22, recv_id=21, key="222")
+    client_socket, server_socket = exit_stack.enter_context(
+        make_tcp_authopt_socket_pair(
+            server_key_list=[sk1],
+            client_key_list=[ck1, ck2],
+            client_authopt=tcp_authopt(
+                send_keyid=12, flags=TCP_AUTHOPT_FLAG.LOCK_KEYID
+            ),
+        )
+    )
+
+    check_socket_echo(client_socket)
+    assert get_tcp_authopt(server_socket).recv_rnextkeyid == 11
+
+    # request rnextkeyid=21 but server does not have it
+    set_tcp_authopt(
+        client_socket,
+        tcp_authopt(send_rnextkeyid=21, flags=TCP_AUTHOPT_FLAG.LOCK_RNEXTKEYID),
+    )
+    check_socket_echo(client_socket)
+    check_socket_echo(client_socket)
+    assert get_tcp_authopt(server_socket).recv_rnextkeyid == 21
+    assert get_tcp_authopt(server_socket).send_keyid == 11
+
+    # after adding sk2 on server the key is switched
+    set_tcp_authopt_key(server_socket, sk2)
+    check_socket_echo(client_socket)
+    check_socket_echo(client_socket)
+    assert get_tcp_authopt(server_socket).send_keyid == 21
+
+
+def test_rollover_delkey(exit_stack: ExitStack):
+    sk1 = tcp_authopt_key(send_id=11, recv_id=12, key="111")
+    sk2 = tcp_authopt_key(send_id=21, recv_id=22, key="222")
+    ck1 = tcp_authopt_key(send_id=12, recv_id=11, key="111")
+    ck2 = tcp_authopt_key(send_id=22, recv_id=21, key="222")
+    client_socket, server_socket = exit_stack.enter_context(
+        make_tcp_authopt_socket_pair(
+            server_key_list=[sk1, sk2],
+            client_key_list=[ck1, ck2],
+            client_authopt=tcp_authopt(
+                send_keyid=12, flags=TCP_AUTHOPT_FLAG.LOCK_KEYID
+            ),
+        )
+    )
+
+    check_socket_echo(client_socket)
+    assert get_tcp_authopt(server_socket).recv_keyid == 12
+
+    # invalid send_keyid is just ignored
+    set_tcp_authopt(client_socket, tcp_authopt(send_keyid=7))
+    check_socket_echo(client_socket)
+    assert get_tcp_authopt(client_socket).send_keyid == 12
+    assert get_tcp_authopt(server_socket).recv_keyid == 12
+    assert get_tcp_authopt(client_socket).recv_keyid == 11
+
+    # If a key is removed it is replaced by anything that matches
+    ck1.delete_flag = True
+    set_tcp_authopt_key(client_socket, ck1)
+    check_socket_echo(client_socket)
+    check_socket_echo(client_socket)
+    assert get_tcp_authopt(client_socket).send_keyid == 22
+    assert get_tcp_authopt(server_socket).send_keyid == 21
+    assert get_tcp_authopt(server_socket).recv_keyid == 22
+    assert get_tcp_authopt(client_socket).recv_keyid == 21
This is a parallel feature to tcp_md5sig.tcpm_ifindex support and allows applications to serve multiple VRFs with a single socket.

The ifindex argument must be the ifindex of a VRF device and must match exactly; keys with ifindex == 0 (outside of VRF) will not match for connections inside a VRF.

Keys without the TCP_AUTHOPT_KEY_IFINDEX flag will ignore ifindex and match both inside and outside VRF.
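The matching rules above, combined with `better_key_match()` in the patch below, can be sketched in Python to show why key order does not matter. A key with the IFINDEX flag only matches its exact VRF (an l3index of 0 meaning "outside any VRF"). A key without the flag matches everywhere. When both match, the VRF-bound key wins. This is a hypothetical model: `Key` and `lookup` are illustrative names, the connection's l3index is passed in directly rather than derived from `sk_bound_dev_if` or `inet_sdif()`, and the ambiguity warning is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Key:
    name: str
    ifindex_flag: bool = False  # TCP_AUTHOPT_KEY_IFINDEX set?
    l3index: int = 0            # VRF master ifindex, 0 when not bound to a VRF


def better_key_match(old: Optional[Key], new: Key) -> bool:
    """Mirror of the C helper: a VRF-bound key beats an unbound one."""
    if old is None:
        return True
    if old.l3index and new.l3index == 0:
        return False
    if old.l3index == 0 and new.l3index:
        return True
    return False


def lookup(keys: List[Key], conn_l3index: int) -> Optional[Key]:
    """Pick a key for a connection whose VRF master is conn_l3index (0 = no VRF)."""
    result = None
    for key in keys:
        # Keys with the IFINDEX flag only match their exact VRF.
        if key.ifindex_flag and key.l3index != conn_l3index:
            continue
        if better_key_match(result, key):
            result = key
        # The C code logs an "ambiguous keys" warning here; omitted.
    return result


key0 = Key("key0")                                 # unbound
key1 = Key("key1", ifindex_flag=True, l3index=10)  # bound to VRF ifindex 10
print(lookup([key0, key1], 10).name)  # key1
```

Note that `better_key_match()` returns false both for ties and for candidates that are strictly worse, so as written it appears the C loop also emits the "ambiguous" warning when an unbound key is visited after a VRF-bound one.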
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 include/net/tcp_authopt.h |  2 ++
 include/uapi/linux/tcp.h  | 11 ++++++
 net/ipv4/tcp_authopt.c    | 76 +++++++++++++++++++++++++++++++++++----
 3 files changed, 82 insertions(+), 7 deletions(-)
diff --git a/include/net/tcp_authopt.h b/include/net/tcp_authopt.h
index 9341e10ef542..072d5383f14b 100644
--- a/include/net/tcp_authopt.h
+++ b/include/net/tcp_authopt.h
@@ -39,10 +39,12 @@ struct tcp_authopt_key_info {
 	u8 alg_id;
 	/** @keylen: Same as &tcp_authopt_key.keylen */
 	u8 keylen;
 	/** @key: Same as &tcp_authopt_key.key */
 	u8 key[TCP_AUTHOPT_MAXKEYLEN];
+	/** @l3index: Same as &tcp_authopt_key.ifindex */
+	int l3index;
 	/** @addr: Same as &tcp_authopt_key.addr */
 	struct sockaddr_storage addr;
 	/** @alg: Algorithm implementation matching alg_id */
 	struct tcp_authopt_alg_imp *alg;
 };
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index e02176390519..a7f5f918ed5a 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -400,15 +400,17 @@ struct tcp_authopt {
  * enum tcp_authopt_key_flag - flags for `tcp_authopt.flags`
  *
  * @TCP_AUTHOPT_KEY_DEL: Delete the key and ignore non-id fields
  * @TCP_AUTHOPT_KEY_EXCLUDE_OPTS: Exclude TCP options from signature
  * @TCP_AUTHOPT_KEY_ADDR_BIND: Key only valid for `tcp_authopt.addr`
+ * @TCP_AUTHOPT_KEY_IFINDEX: Key only valid for `tcp_authopt_key.ifindex`
  */
 enum tcp_authopt_key_flag {
 	TCP_AUTHOPT_KEY_DEL = (1 << 0),
 	TCP_AUTHOPT_KEY_EXCLUDE_OPTS = (1 << 1),
 	TCP_AUTHOPT_KEY_ADDR_BIND = (1 << 2),
+	TCP_AUTHOPT_KEY_IFINDEX = (1 << 3),
 };

 /**
  * enum tcp_authopt_alg - Algorithms for TCP Authentication Option
  */
@@ -450,10 +452,19 @@ struct tcp_authopt_key {
 	 * @addr: Key is only valid for this address
 	 *
 	 * Ignored unless TCP_AUTHOPT_KEY_ADDR_BIND flag is set
 	 */
 	struct __kernel_sockaddr_storage addr;
+	/**
+	 * @ifindex: ifindex of vrf (l3mdev_master) interface
+	 *
+	 * If the TCP_AUTHOPT_KEY_IFINDEX flag is set then key only applies for
+	 * connections through this interface. Interface must be a vrf master.
+	 *
+	 * This is similar to `tcp_md5sig.tcpm_ifindex`
+	 */
+	int ifindex;
 };
 /* setsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) */

 #define TCP_RECEIVE_ZEROCOPY_FLAG_TLB_CLEAN_HINT 0x1
diff --git a/net/ipv4/tcp_authopt.c b/net/ipv4/tcp_authopt.c
index a02fe0d14b63..f497537ce16c 100644
--- a/net/ipv4/tcp_authopt.c
+++ b/net/ipv4/tcp_authopt.c
@@ -1,7 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
+#include <linux/net.h>
 #include <linux/kernel.h>
 #include <net/tcp.h>
 #include <net/tcp_authopt.h>
 #include <crypto/hash.h>
@@ -190,10 +191,14 @@ static bool tcp_authopt_key_match_exact(struct tcp_authopt_key_info *info,
 {
 	if (info->send_id != key->send_id)
 		return false;
 	if (info->recv_id != key->recv_id)
 		return false;
+	if ((info->flags & TCP_AUTHOPT_KEY_IFINDEX) != (key->flags & TCP_AUTHOPT_KEY_IFINDEX))
+		return false;
+	if ((info->flags & TCP_AUTHOPT_KEY_IFINDEX) && info->l3index != key->ifindex)
+		return false;
 	if ((info->flags & TCP_AUTHOPT_KEY_ADDR_BIND) != (key->flags & TCP_AUTHOPT_KEY_ADDR_BIND))
 		return false;
 	if (info->flags & TCP_AUTHOPT_KEY_ADDR_BIND)
 		if (!ipvx_addr_match(&info->addr, &key->addr))
 			return false;
@@ -257,26 +262,49 @@ static struct tcp_authopt_key_info *tcp_authopt_key_lookup_exact(const struct so
 			return key_info;

 	return NULL;
 }

+static bool better_key_match(struct tcp_authopt_key_info *old, struct tcp_authopt_key_info *new)
+{
+	if (!old)
+		return true;
+
+	/* l3index always overrides non-l3index */
+	if (old->l3index && new->l3index == 0)
+		return false;
+	if (old->l3index == 0 && new->l3index)
+		return true;
+
+	return false;
+}
+
 static struct tcp_authopt_key_info *tcp_authopt_lookup_send(struct tcp_authopt_info *info,
 							     const struct sock *addr_sk,
 							     int send_id)
 {
 	struct tcp_authopt_key_info *result = NULL;
 	struct tcp_authopt_key_info *key;
+	int l3index = -1;

 	hlist_for_each_entry_rcu(key, &info->head, node, 0) {
 		if (send_id >= 0 && key->send_id != send_id)
 			continue;
 		if (key->flags & TCP_AUTHOPT_KEY_ADDR_BIND)
 			if (!tcp_authopt_key_match_sk_addr(key, addr_sk))
 				continue;
-		if (result && net_ratelimit())
-			pr_warn("ambiguous tcp authentication keys configured for send\n");
-		result = key;
+		if (key->flags & TCP_AUTHOPT_KEY_IFINDEX) {
+			if (l3index < 0)
+				l3index = l3mdev_master_ifindex_by_index(sock_net(addr_sk),
+									 addr_sk->sk_bound_dev_if);
+			if (l3index != key->l3index)
+				continue;
+		}
+		if (better_key_match(result, key))
+			result = key;
+		else if (result)
+			net_warn_ratelimited("ambiguous tcp authentication keys configured for send\n");
 	}

 	return result;
 }
@@ -527,18 +555,20 @@ void tcp_authopt_clear(struct sock *sk)
 }

 #define TCP_AUTHOPT_KEY_KNOWN_FLAGS ( \
 	TCP_AUTHOPT_KEY_DEL | \
 	TCP_AUTHOPT_KEY_EXCLUDE_OPTS | \
-	TCP_AUTHOPT_KEY_ADDR_BIND)
+	TCP_AUTHOPT_KEY_ADDR_BIND | \
+	TCP_AUTHOPT_KEY_IFINDEX)

 int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen)
 {
 	struct tcp_authopt_key opt;
 	struct tcp_authopt_info *info;
 	struct tcp_authopt_key_info *key_info, *old_key_info;
 	struct tcp_authopt_alg_imp *alg;
+	int l3index = 0;
 	int err;

 	sock_owned_by_me(sk);
 	if (!sysctl_tcp_authopt)
 		return -EPERM;
@@ -584,10 +614,24 @@ int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen)
 		return -EINVAL;
 	err = tcp_authopt_alg_require(alg);
 	if (err)
 		return err;

+	/* check ifindex is valid (zero is always valid) */
+	if (opt.flags & TCP_AUTHOPT_KEY_IFINDEX && opt.ifindex) {
+		struct net_device *dev;
+
+		rcu_read_lock();
+		dev = dev_get_by_index_rcu(sock_net(sk), opt.ifindex);
+		if (dev && netif_is_l3_master(dev))
+			l3index = dev->ifindex;
+		rcu_read_unlock();
+
+		if (!l3index)
+			return -EINVAL;
+	}
+
 	key_info = sock_kmalloc(sk, sizeof(*key_info), GFP_KERNEL | __GFP_ZERO);
 	if (!key_info)
 		return -ENOMEM;
 	/* If an old key exists with exact ID then remove and replace.
 	 * RCU-protected readers might observe both and pick any.
@@ -601,10 +645,11 @@ int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen)
 	key_info->alg_id = opt.alg;
 	key_info->alg = alg;
 	key_info->keylen = opt.keylen;
 	memcpy(key_info->key, opt.key, opt.keylen);
 	memcpy(&key_info->addr, &opt.addr, sizeof(key_info->addr));
+	key_info->l3index = l3index;
 	hlist_add_head_rcu(&key_info->node, &info->head);

 	return 0;
 }

@@ -1436,21 +1481,38 @@ static struct tcp_authopt_key_info *tcp_authopt_lookup_recv(struct sock *sk,
 							     struct tcp_authopt_info *info,
 							     int recv_id)
 {
 	struct tcp_authopt_key_info *result = NULL;
 	struct tcp_authopt_key_info *key;
+	int l3index = -1;

 	/* multiple matches will cause occasional failures */
 	hlist_for_each_entry_rcu(key, &info->head, node, 0) {
 		if (recv_id >= 0 && key->recv_id != recv_id)
 			continue;
 		if (key->flags & TCP_AUTHOPT_KEY_ADDR_BIND &&
 		    !tcp_authopt_key_match_skb_addr(key, skb))
 			continue;
-		if (result && net_ratelimit())
-			pr_warn("ambiguous tcp authentication keys configured for receive\n");
-		result = key;
+		if (key->flags & TCP_AUTHOPT_KEY_IFINDEX) {
+			if (l3index < 0) {
+				if (skb->protocol == htons(ETH_P_IP)) {
+					l3index = inet_sdif(skb) ? inet_iif(skb) : 0;
+				} else if (skb->protocol == htons(ETH_P_IPV6)) {
+					l3index = inet6_sdif(skb) ? inet6_iif(skb) : 0;
+				} else {
+					WARN_ONCE(1, "unexpected skb->protocol=%x", skb->protocol);
+					continue;
+				}
+			}
+
+			if (l3index != key->l3index)
+				continue;
+		}
+		if (better_key_match(result, key))
+			result = key;
+		else if (result)
+			net_warn_ratelimited("ambiguous tcp authentication keys configured for receive\n");
 	}

 	return result;
 }
On 11/1/21 10:34 AM, Leonard Crestez wrote:
> @@ -584,10 +614,24 @@ int tcp_set_authopt_key(struct sock *sk, sockptr_t optval, unsigned int optlen)
>  		return -EINVAL;
>  	err = tcp_authopt_alg_require(alg);
>  	if (err)
>  		return err;
>
> +	/* check ifindex is valid (zero is always valid) */
> +	if (opt.flags & TCP_AUTHOPT_KEY_IFINDEX && opt.ifindex) {
> +		struct net_device *dev;
> +
> +		rcu_read_lock();
> +		dev = dev_get_by_index_rcu(sock_net(sk), opt.ifindex);
> +		if (dev && netif_is_l3_master(dev))
> +			l3index = dev->ifindex;
> +		rcu_read_unlock();
rcu_read_lock()... rcu_read_unlock() can be replaced with netif_index_is_l3_master(...)
On 11/3/21 5:06 AM, David Ahern wrote:
> rcu_read_lock()... rcu_read_unlock() can be replaced with netif_index_is_l3_master(...)
Yes, this makes the code shorter.
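The validation rule under discussion (zero is always accepted, a non-zero `ifindex` must name an L3 master device, anything else is `-EINVAL`) can be stated as a short Python sketch. `resolve_key_l3index` and the `devices` table are hypothetical stand-ins for `dev_get_by_index_rcu()`/`netif_is_l3_master()` or the suggested `netif_index_is_l3_master()`. Nothing here touches a real network namespace.

```python
import errno
from typing import Dict

TCP_AUTHOPT_KEY_IFINDEX = 1 << 3


def resolve_key_l3index(flags: int, ifindex: int, devices: Dict[int, bool]) -> int:
    """Return the l3index to store in the key, or raise OSError(EINVAL).

    devices maps ifindex -> "is this device an L3 master (VRF)?".
    """
    if not (flags & TCP_AUTHOPT_KEY_IFINDEX) or ifindex == 0:
        return 0  # key not bound to a VRF, or explicitly bound to "no VRF"
    if devices.get(ifindex, False):
        return ifindex
    # Unknown ifindex, or a device that is not a VRF master.
    raise OSError(errno.EINVAL, "ifindex is not a VRF master")


devs = {1: False, 10: True}  # e.g. 1 = a plain netdev, 10 = a VRF device
print(resolve_key_l3index(TCP_AUTHOPT_KEY_IFINDEX, 10, devs))  # 10
```

Either way of checking only needs a yes/no answer per ifindex, which is why a single helper call can replace the explicit RCU section.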
These tests also verify functionality in TCP-MD5 and unsigned traffic modes. They were used to find the issue fixed in commit 86f1e3a8489f ("tcp: md5: Fix overlap between vrf and non-vrf keys")
Signed-off-by: Leonard Crestez <cdleonard@gmail.com>
---
 .../tcp_authopt_test/linux_tcp_authopt.py |   9 +
 .../tcp_authopt_test/test_vrf_bind.py     | 492 ++++++++++++++++++
 .../tcp_authopt_test/vrf_netns_fixture.py | 127 +++++
 3 files changed, 628 insertions(+)
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_vrf_bind.py
 create mode 100644 tools/testing/selftests/tcp_authopt/tcp_authopt_test/vrf_netns_fixture.py
diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_authopt.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_authopt.py
index 75cf5f993ccb..2a720d49cba2 100644
--- a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_authopt.py
+++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/linux_tcp_authopt.py
@@ -38,10 +38,11 @@ class TCP_AUTHOPT_FLAG(IntFlag):


 class TCP_AUTHOPT_KEY_FLAG(IntFlag):
     DEL = BIT(0)
     EXCLUDE_OPTS = BIT(1)
     BIND_ADDR = BIT(2)
+    IFINDEX = BIT(3)


 class TCP_AUTHOPT_ALG(IntEnum):
     HMAC_SHA_1_96 = 1
     AES_128_CMAC_96 = 2
@@ -102,25 +103,31 @@ class tcp_authopt_key:
         recv_id: int = 0,
         alg=TCP_AUTHOPT_ALG.HMAC_SHA_1_96,
         key: KeyArgType = b"",
         addr: AddrArgType = None,
         auto_flags: bool = True,
+        ifindex: typing.Optional[int] = None,
         include_options=None,
     ):
         self.flags = flags
         self.send_id = send_id
         self.recv_id = recv_id
         self.alg = alg
         self.key = key
+        self.ifindex = ifindex
         self.addr = addr
         self.auto_flags = auto_flags
         if include_options is not None:
             self.include_options = include_options

     def get_real_flags(self) -> TCP_AUTHOPT_KEY_FLAG:
         result = self.flags
         if self.auto_flags:
+            if self.ifindex is not None:
+                result |= TCP_AUTHOPT_KEY_FLAG.IFINDEX
+            else:
+                result &= ~TCP_AUTHOPT_KEY_FLAG.IFINDEX
             if self.addr is not None:
                 result |= TCP_AUTHOPT_KEY_FLAG.BIND_ADDR
             else:
                 result &= ~TCP_AUTHOPT_KEY_FLAG.BIND_ADDR
         return result
@@ -136,10 +143,12 @@ class tcp_authopt_key:
             self.alg,
             len(self.key),
             self.key,
         )
         data += bytes(self.addrbuf.ljust(sockaddr_storage.sizeof, b"\x00"))
+        if self.ifindex is not None:
+            data += bytes(struct.pack("I", self.ifindex))
         return data

     def __bytes__(self):
         return self.pack()

diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_vrf_bind.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_vrf_bind.py
new file mode 100644
index 000000000000..da43ac8842e5
--- /dev/null
+++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/test_vrf_bind.py
@@ -0,0 +1,492 @@
+# SPDX-License-Identifier: GPL-2.0
+"""Test VRF overlap behavior
+
+With tcp_l3mdev_accept a single server should be able to differentiate multiple
+clients with the same IP coming from different VRFs.
+"""
+import errno
+import logging
+import socket
+from contextlib import ExitStack
+
+import pytest
+
+from . import linux_tcp_md5sig
+from .conftest import parametrize_product, skipif_missing_tcp_authopt
+from .linux_tcp_authopt import (
+    set_tcp_authopt_key,
+    set_tcp_authopt_key_kwargs,
+    tcp_authopt_key,
+)
+from .server import SimpleServerThread
+from .utils import (
+    DEFAULT_TCP_SERVER_PORT,
+    check_socket_echo,
+    create_client_socket,
+    create_listen_socket,
+)
+from .vrf_netns_fixture import VrfNamespaceFixture
+
+logger = logging.getLogger(__name__)
+
+
+class VrfFixture:
+    """Fixture for VRF testing
+
+    A single server has two interfaces with the same IP addr: one inside VRF and
+    one outside. Two clients in two namespaces have the same client IP, one
+    connected to VRF and one outside.
+    """
+
+    def __init__(
+        self,
+        address_family=socket.AF_INET,
+        tcp_l3mdev_accept=1,
+        init_default_listen_socket=True,
+    ):
+        self.address_family = address_family
+        self.tcp_l3mdev_accept = tcp_l3mdev_accept
+        self.init_default_listen_socket = init_default_listen_socket
+
+    @property
+    def server_addr(self):
+        if self.address_family == socket.AF_INET:
+            return self.nsfixture.server_ipv4_addr
+        else:
+            return self.nsfixture.server_ipv6_addr
+
+    @property
+    def client_addr(self):
+        if self.address_family == socket.AF_INET:
+            return self.nsfixture.client_ipv4_addr
+        else:
+            return self.nsfixture.client_ipv6_addr
+
+    @property
+    def server_addr_port(self):
+        return (str(self.server_addr), DEFAULT_TCP_SERVER_PORT)
+
+    @property
+    def vrf1_ifindex(self):
+        return self.nsfixture.server_vrf1_ifindex
+
+    @property
+    def vrf2_ifindex(self):
+        return self.nsfixture.server_vrf2_ifindex
+
+    def create_listen_socket(self, **kw):
+        result = create_listen_socket(
+            family=self.address_family,
+            ns=self.nsfixture.server_netns_name,
+            bind_addr=self.server_addr,
+            **kw
+        )
+        self.exit_stack.enter_context(result)
+        return result
+
+    def create_client_socket(self, ns):
+        result = create_client_socket(
+            ns=ns, family=self.address_family, bind_addr=self.client_addr
+        )
+        self.exit_stack.enter_context(result)
+        return result
+
+    def __enter__(self):
+        self.exit_stack = ExitStack()
+        self.exit_stack.__enter__()
+        self.nsfixture = self.exit_stack.enter_context(
+            VrfNamespaceFixture(tcp_l3mdev_accept=self.tcp_l3mdev_accept)
+        )
+
+        self.server_thread = SimpleServerThread(mode="echo")
+        if self.init_default_listen_socket:
+            self.listen_socket = self.create_listen_socket()
+            self.server_thread.add_listen_socket(self.listen_socket)
+        self.exit_stack.enter_context(self.server_thread)
+        return self
+
+    def __exit__(self, *args):
+        self.exit_stack.__exit__(*args)
+
+
+@pytest.mark.parametrize("address_family", [socket.AF_INET, socket.AF_INET6])
+def test_vrf_overlap_unsigned(exit_stack: ExitStack, address_family):
+    """Test without any signature support"""
+    fix = VrfFixture(address_family)
+    exit_stack.enter_context(fix)
+
+    client_socket0 = fix.create_client_socket(fix.nsfixture.client1_netns_name)
+    client_socket1 = fix.create_client_socket(fix.nsfixture.client1_netns_name)
+    client_socket2 = fix.create_client_socket(fix.nsfixture.client2_netns_name)
+
+    client_socket2.connect(fix.server_addr_port)
+    client_socket1.connect(fix.server_addr_port)
+    client_socket0.connect(fix.server_addr_port)
+    check_socket_echo(client_socket1)
+    check_socket_echo(client_socket2)
+    check_socket_echo(client_socket0)
+    check_socket_echo(client_socket1)
+    check_socket_echo(client_socket2)
+    check_socket_echo(client_socket0)
+    check_socket_echo(client_socket2)
+
+
+KEY0 = b"00000"
+KEY1 = b"1"
+KEY2 = b"22"
+
+
+def set_server_md5(fix, key=KEY0, **kw):
+    linux_tcp_md5sig.setsockopt_md5sig_kwargs(
+        fix.listen_socket, key=key, addr=fix.client_addr, **kw
+    )
+
+
+def set_server_md5_key0(fix, key=KEY0):
+    return set_server_md5(fix, key=key)
+
+
+def set_server_md5_key1(fix, key=KEY1):
+    return set_server_md5(fix, key=key, ifindex=fix.vrf1_ifindex)
+
+
+def set_server_md5_key2(fix, key=KEY2):
+    return set_server_md5(fix, key=key, ifindex=fix.vrf2_ifindex)
+
+
+def set_client_md5_key(fix, client_socket, key):
+    linux_tcp_md5sig.setsockopt_md5sig_kwargs(
+        client_socket, key=key, addr=fix.server_addr
+    )
+
+
+@pytest.mark.parametrize("address_family", [socket.AF_INET, socket.AF_INET6])
+def test_vrf_overlap_md5_samekey(exit_stack: ExitStack, address_family):
+    """Test overlapping keys that are identical"""
+    fix = VrfFixture(address_family)
+    exit_stack.enter_context(fix)
+    set_server_md5_key0(fix, b"same")
+    set_server_md5_key1(fix, b"same")
+    set_server_md5_key2(fix, b"same")
+    client_socket0 = fix.create_client_socket(fix.nsfixture.client0_netns_name)
+    client_socket1 = fix.create_client_socket(fix.nsfixture.client1_netns_name)
+    client_socket2 = fix.create_client_socket(fix.nsfixture.client2_netns_name)
+    set_client_md5_key(fix, client_socket0, b"same")
+    set_client_md5_key(fix, client_socket1, b"same")
+    set_client_md5_key(fix, client_socket2, b"same")
+    client_socket0.connect(fix.server_addr_port)
+    client_socket1.connect(fix.server_addr_port)
+    client_socket2.connect(fix.server_addr_port)
+    check_socket_echo(client_socket1)
+    check_socket_echo(client_socket2)
+    check_socket_echo(client_socket0)
+
+
+@pytest.mark.parametrize("address_family", [socket.AF_INET, socket.AF_INET6])
+def test_vrf_overlap12_md5(exit_stack: ExitStack, address_family):
+    """Test overlapping keys between vrfs"""
+    fix = VrfFixture(address_family)
+    exit_stack.enter_context(fix)
+    set_server_md5_key1(fix)
+    set_server_md5_key2(fix)
+    client_socket1 = fix.create_client_socket(fix.nsfixture.client1_netns_name)
+    client_socket2 = fix.create_client_socket(fix.nsfixture.client2_netns_name)
+    set_client_md5_key(fix, client_socket1, KEY1)
+    set_client_md5_key(fix, client_socket2, KEY2)
+    client_socket1.connect(fix.server_addr_port)
+    client_socket2.connect(fix.server_addr_port)
+    check_socket_echo(client_socket1)
+    check_socket_echo(client_socket2)
+
+
+@pytest.mark.parametrize("address_family", [socket.AF_INET, socket.AF_INET6])
+def test_vrf_overlap01_md5(exit_stack: ExitStack, address_family):
+    """Test overlapping keys inside and outside vrf, VRF key added second"""
+    fix = VrfFixture(address_family)
+    exit_stack.enter_context(fix)
+    set_server_md5_key0(fix)
+    set_server_md5_key1(fix)
+    client_socket0 = fix.create_client_socket(fix.nsfixture.client0_netns_name)
+    client_socket1 = fix.create_client_socket(fix.nsfixture.client1_netns_name)
+    set_client_md5_key(fix, client_socket0, KEY0)
+    set_client_md5_key(fix, client_socket1, KEY1)
+    client_socket1.connect(fix.server_addr_port)
+    client_socket0.connect(fix.server_addr_port)
+    check_socket_echo(client_socket0)
+    check_socket_echo(client_socket1)
+
+
+@pytest.mark.parametrize("address_family", [socket.AF_INET, socket.AF_INET6])
+def test_vrf_overlap10_md5(exit_stack: ExitStack, address_family):
+    """Test overlapping keys inside and outside vrf, VRF key added first"""
+    fix = VrfFixture(address_family)
+    exit_stack.enter_context(fix)
+    set_server_md5_key1(fix)
+    set_server_md5_key0(fix)
+    client_socket0 = fix.create_client_socket(fix.nsfixture.client0_netns_name)
+    client_socket1 = fix.create_client_socket(fix.nsfixture.client1_netns_name)
+    set_client_md5_key(fix, client_socket0, KEY0)
+    set_client_md5_key(fix, client_socket1, KEY1)
+    client_socket1.connect(fix.server_addr_port)
+    client_socket0.connect(fix.server_addr_port)
+    check_socket_echo(client_socket0)
+    check_socket_echo(client_socket1)
+
+
+@pytest.mark.parametrize("address_family", [socket.AF_INET])
+def test_vrf_overlap_md5_prefix(exit_stack: ExitStack, address_family):
+    """VRF keys should take precedence even if prefixlen is low"""
+    fix = VrfFixture(address_family)
+    exit_stack.enter_context(fix)
+    set_server_md5(fix, key=b"fail", prefixlen=16)
+    set_server_md5(
+        fix, key=b"pass", ifindex=fix.nsfixture.server_vrf1_ifindex, prefixlen=1
+    )
+    set_server_md5(fix, key=b"fail", prefixlen=24)
+
+    # connect via VRF
+    client_socket = fix.create_client_socket(fix.nsfixture.client1_netns_name)
+    set_client_md5_key(fix, client_socket, b"pass")
+    client_socket.connect(fix.server_addr_port)
+
+
+class TestVRFOverlapAOBoundKeyPrecedence:
+    """Keys bound to VRF should take precedence over unbound keys.
+
+    KEY0 is unbound (accepts all vrfs)
+    KEY1 is bound to vrf1
+    """
+
+    fix: VrfFixture
+
+    @pytest.fixture(
+        autouse=True,
+        scope="class",
+        params=[socket.AF_INET, socket.AF_INET6],
+    )
+    def init(self, request: pytest.FixtureRequest):
+        address_family = request.param
+        logger.info("init address_family=%s", address_family)
+        with ExitStack() as exit_stack:
+            fix = exit_stack.enter_context(VrfFixture(address_family))
+            set_tcp_authopt_key_kwargs(
+                fix.listen_socket,
+                key=KEY0,
+                ifindex=None,
+            )
+            set_tcp_authopt_key_kwargs(
+                fix.listen_socket,
+                key=KEY1,
+                ifindex=fix.vrf1_ifindex,
+            )
+            self.__class__.fix = fix
+            yield
+        logger.info("done address_family=%s", address_family)
+
+    def test_vrf1_key0(self):
+        client_socket = self.fix.create_client_socket(
+            self.fix.nsfixture.client1_netns_name
+        )
+        set_tcp_authopt_key_kwargs(client_socket, key=KEY0)
+        with pytest.raises(socket.timeout):
+            client_socket.connect(self.fix.server_addr_port)
+
+    def test_vrf1_key1(self):
+        client_socket = self.fix.create_client_socket(
+            self.fix.nsfixture.client1_netns_name
+        )
+        set_tcp_authopt_key_kwargs(client_socket, key=KEY1)
+        client_socket.connect(self.fix.server_addr_port)
+
+    def test_vrf2_key0(self):
+        client_socket = self.fix.create_client_socket(
+            self.fix.nsfixture.client2_netns_name
+        )
+        set_tcp_authopt_key_kwargs(client_socket, key=KEY0)
+        client_socket.connect(self.fix.server_addr_port)
+
+    def test_vrf2_key1(self):
+        client_socket = self.fix.create_client_socket(
+            self.fix.nsfixture.client2_netns_name
+        )
+        set_tcp_authopt_key_kwargs(client_socket, key=KEY1)
+        with pytest.raises(socket.timeout):
+            client_socket.connect(self.fix.server_addr_port)
+
+
+def assert_raises_enoent(func):
+    with pytest.raises(OSError) as e:
+        func()
+    assert e.value.errno == errno.ENOENT
+
+
+def test_vrf_overlap_md5_del_0110():
+    """Removing keys should not raise ENOENT because they are distinct"""
+    with VrfFixture() as fix:
+        set_server_md5(fix, key=KEY0)
+        set_server_md5(fix,
key=KEY1, ifindex=fix.vrf1_ifindex) + set_server_md5(fix, key=b"", ifindex=fix.vrf1_ifindex) + set_server_md5(fix, key=b"") + assert_raises_enoent(lambda: set_server_md5(fix, key=b"")) + + +def test_vrf_overlap_md5_del_1001(): + """Removing keys should not raise ENOENT because they are distinct""" + with VrfFixture() as fix: + set_server_md5(fix, key=KEY1, ifindex=fix.vrf1_ifindex) + set_server_md5(fix, key=KEY0) + set_server_md5(fix, key=b"") + set_server_md5(fix, key=b"", ifindex=fix.vrf1_ifindex) + assert_raises_enoent(lambda: set_server_md5(fix, key=b"")) + + +def test_vrf_overlap_md5_del_1010(): + """Removing keys should not raise ENOENT because they are distinct""" + with VrfFixture() as fix: + set_server_md5(fix, key=KEY1, ifindex=fix.vrf1_ifindex) + set_server_md5(fix, key=KEY0) + set_server_md5(fix, key=b"", ifindex=fix.vrf1_ifindex) + set_server_md5(fix, key=b"") + assert_raises_enoent(lambda: set_server_md5(fix, key=b"")) + + +@skipif_missing_tcp_authopt +@pytest.mark.parametrize("address_family", [socket.AF_INET, socket.AF_INET6]) +def test_vrf_overlap_ao_samekey(exit_stack: ExitStack, address_family): + """Single server serving both VRF and non-VRF client with same password. + + This requires no special support from TCP-AO. 
+ """ + fix = VrfFixture(address_family) + exit_stack.enter_context(fix) + set_tcp_authopt_key(fix.listen_socket, tcp_authopt_key(key="11111")) + + client_socket1 = fix.create_client_socket(fix.nsfixture.client1_netns_name) + client_socket2 = fix.create_client_socket(fix.nsfixture.client2_netns_name) + + set_tcp_authopt_key(client_socket1, tcp_authopt_key(key="11111")) + set_tcp_authopt_key(client_socket2, tcp_authopt_key(key="11111")) + client_socket1.connect(fix.server_addr_port) + client_socket2.connect(fix.server_addr_port) + check_socket_echo(client_socket1) + check_socket_echo(client_socket2) + check_socket_echo(client_socket1) + check_socket_echo(client_socket2) + + +@skipif_missing_tcp_authopt +@pytest.mark.parametrize("address_family", [socket.AF_INET, socket.AF_INET6]) +def test_vrf_overlap_ao(exit_stack: ExitStack, address_family): + """Single server serving both VRF and non-VRF client with different passwords + + This requires kernel to handle ifindex + """ + fix = VrfFixture(address_family) + exit_stack.enter_context(fix) + set_tcp_authopt_key( + fix.listen_socket, + tcp_authopt_key(key=KEY0, ifindex=0), + ) + set_tcp_authopt_key( + fix.listen_socket, + tcp_authopt_key(key=KEY1, ifindex=fix.vrf1_ifindex), + ) + set_tcp_authopt_key( + fix.listen_socket, + tcp_authopt_key(key=KEY2, ifindex=fix.vrf2_ifindex), + ) + + client_socket0 = fix.create_client_socket(fix.nsfixture.client0_netns_name) + client_socket1 = fix.create_client_socket(fix.nsfixture.client1_netns_name) + client_socket2 = fix.create_client_socket(fix.nsfixture.client2_netns_name) + set_tcp_authopt_key(client_socket0, tcp_authopt_key(key=KEY0)) + set_tcp_authopt_key(client_socket1, tcp_authopt_key(key=KEY1)) + set_tcp_authopt_key(client_socket2, tcp_authopt_key(key=KEY2)) + client_socket0.connect(fix.server_addr_port) + client_socket1.connect(fix.server_addr_port) + client_socket2.connect(fix.server_addr_port) + check_socket_echo(client_socket0) + check_socket_echo(client_socket1) + 
check_socket_echo(client_socket2) + check_socket_echo(client_socket0) + check_socket_echo(client_socket1) + check_socket_echo(client_socket2) + + +@parametrize_product( + address_family=(socket.AF_INET, socket.AF_INET6), + tcp_l3mdev_accept=(0, 1), + bind_key_to_vrf=(0, 1), +) +def test_md5_pervrf( + exit_stack: ExitStack, address_family, tcp_l3mdev_accept, bind_key_to_vrf +): + """Test one VRF-bound socket. + + Since the socket is already bound to the VRF, binding the key should not be required. + """ + fix = VrfFixture( + address_family, + tcp_l3mdev_accept=tcp_l3mdev_accept, + init_default_listen_socket=False, + ) + exit_stack.enter_context(fix) + listen_socket1 = fix.create_listen_socket(bind_device="veth1") + linux_tcp_md5sig.setsockopt_md5sig_kwargs( + listen_socket1, + key=KEY1, + addr=fix.client_addr, + ifindex=fix.vrf1_ifindex if bind_key_to_vrf else None, + ) + fix.server_thread.add_listen_socket(listen_socket1) + client_socket1 = fix.create_client_socket(fix.nsfixture.client1_netns_name) + set_client_md5_key(fix, client_socket1, KEY1) + client_socket1.connect(fix.server_addr_port) + check_socket_echo(client_socket1) + + +@pytest.mark.parametrize( + "address_family", + (socket.AF_INET, socket.AF_INET6), +) +def test_vrf_overlap_md5_pervrf(exit_stack: ExitStack, address_family): + """Test overlapping via per-VRF sockets""" + fix = VrfFixture( + address_family, + tcp_l3mdev_accept=0, + init_default_listen_socket=False, + ) + exit_stack.enter_context(fix) + listen_socket0 = fix.create_listen_socket() + listen_socket1 = fix.create_listen_socket(bind_device="veth1") + listen_socket2 = fix.create_listen_socket(bind_device="veth2") + linux_tcp_md5sig.setsockopt_md5sig_kwargs( + listen_socket0, + key=KEY0, + addr=fix.client_addr, + ) + linux_tcp_md5sig.setsockopt_md5sig_kwargs( + listen_socket1, + key=KEY1, + addr=fix.client_addr, + ) + linux_tcp_md5sig.setsockopt_md5sig_kwargs( + listen_socket2, + key=KEY2, + addr=fix.client_addr, + ) + 
fix.server_thread.add_listen_socket(listen_socket0) + fix.server_thread.add_listen_socket(listen_socket1) + fix.server_thread.add_listen_socket(listen_socket2) + client_socket0 = fix.create_client_socket(fix.nsfixture.client0_netns_name) + client_socket1 = fix.create_client_socket(fix.nsfixture.client1_netns_name) + client_socket2 = fix.create_client_socket(fix.nsfixture.client2_netns_name) + set_client_md5_key(fix, client_socket0, KEY0) + set_client_md5_key(fix, client_socket1, KEY1) + set_client_md5_key(fix, client_socket2, KEY2) + client_socket0.connect(fix.server_addr_port) + client_socket1.connect(fix.server_addr_port) + client_socket2.connect(fix.server_addr_port) + check_socket_echo(client_socket1) + check_socket_echo(client_socket2) + check_socket_echo(client_socket0) diff --git a/tools/testing/selftests/tcp_authopt/tcp_authopt_test/vrf_netns_fixture.py b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/vrf_netns_fixture.py new file mode 100644 index 000000000000..ff9c0959a268 --- /dev/null +++ b/tools/testing/selftests/tcp_authopt/tcp_authopt_test/vrf_netns_fixture.py @@ -0,0 +1,127 @@ +# SPDX-License-Identifier: GPL-2.0 +import subprocess +from ipaddress import IPv4Address, IPv6Address + + +def ip_link_get_ifindex(dev: str, prefix: str = "") -> int: + out = subprocess.check_output( + f"{prefix}ip -o link show {dev}", text=True, shell=True + ) + return int(out.split(":", 1)[0]) + + +def get_ipv4_addr(ns=1, index=1) -> IPv4Address: + return IPv4Address("10.10.0.0") + (ns << 8) + index + + +def get_ipv6_addr(ns=1, index=1) -> IPv6Address: + return IPv6Address("fd00::") + (ns << 16) + index + + +class VrfNamespaceFixture: + """Namespace fixture for VRF testing. + + A single server has three interfaces with the same IP address: veth0 outside + any VRF, veth1 in vrf1 and veth2 in vrf2. + + Three clients in separate namespaces share the same client IP, each + connected to one of the server interfaces. 
+ """ + + tcp_l3mdev_accept = 1 + + server_netns_name = "tcp_authopt_test_server" + client0_netns_name = "tcp_authopt_test_client0" + client1_netns_name = "tcp_authopt_test_client1" + client2_netns_name = "tcp_authopt_test_client2" + + # 02:* means "locally administered" + server_veth0_mac_addr = "02:00:00:01:00:00" + server_veth1_mac_addr = "02:00:00:01:00:01" + server_veth2_mac_addr = "02:00:00:01:00:02" + client0_mac_addr = "02:00:00:02:00:00" + client1_mac_addr = "02:00:00:02:01:00" + client2_mac_addr = "02:00:00:02:02:00" + + server_ipv4_addr = get_ipv4_addr(1, 1) + server_ipv6_addr = get_ipv6_addr(1, 1) + client_ipv4_addr = get_ipv4_addr(2, 1) + client_ipv6_addr = get_ipv6_addr(2, 1) + + def __init__(self, **kw): + import os + + import pytest + + from .conftest import raise_skip_no_netns + + raise_skip_no_netns() + if not os.path.exists("/proc/sys/net/ipv4/tcp_l3mdev_accept"): + pytest.skip( + "missing tcp_l3mdev_accept, is CONFIG_NET_L3_MASTER_DEV enabled?" + ) + for k, v in kw.items(): + setattr(self, k, v) + + def get_server_ifindex(self, dev): + return ip_link_get_ifindex(dev, f"ip netns exec {self.server_netns_name} ") + + def __enter__(self): + self._del_netns() + script = f""" +set -e +ip netns add {self.server_netns_name} +ip netns add {self.client0_netns_name} +ip netns add {self.client1_netns_name} +ip netns add {self.client2_netns_name} +# Set tcp_l3mdev_accept on the server from the fixture attribute (default 1) +ip netns exec {self.server_netns_name} sysctl -q net.ipv4.tcp_l3mdev_accept={int(self.tcp_l3mdev_accept)} +ip link add veth0 netns {self.server_netns_name} type veth peer name veth0 netns {self.client0_netns_name} +ip link add veth1 netns {self.server_netns_name} type veth peer name veth0 netns {self.client1_netns_name} +ip link add veth2 netns {self.server_netns_name} type veth peer name veth0 netns {self.client2_netns_name} +ip link add vrf1 netns {self.server_netns_name} type vrf table 1000 +ip link add vrf2 netns {self.server_netns_name} type vrf table 2000 +ip -n 
{self.server_netns_name} link set vrf1 up +ip -n {self.server_netns_name} link set vrf2 up +ip -n {self.server_netns_name} link set veth1 vrf vrf1 +ip -n {self.server_netns_name} link set veth2 vrf vrf2 +ip -n {self.server_netns_name} link set veth0 up addr {self.server_veth0_mac_addr} +ip -n {self.server_netns_name} link set veth1 up addr {self.server_veth1_mac_addr} +ip -n {self.server_netns_name} link set veth2 up addr {self.server_veth2_mac_addr} +ip -n {self.server_netns_name} addr add {self.server_ipv4_addr}/16 dev veth0 +ip -n {self.server_netns_name} addr add {self.server_ipv6_addr}/64 dev veth0 nodad +ip -n {self.server_netns_name} addr add {self.server_ipv4_addr}/16 dev veth1 +ip -n {self.server_netns_name} addr add {self.server_ipv6_addr}/64 dev veth1 nodad +ip -n {self.server_netns_name} addr add {self.server_ipv4_addr}/16 dev veth2 +ip -n {self.server_netns_name} addr add {self.server_ipv6_addr}/64 dev veth2 nodad +ip -n {self.client0_netns_name} link set veth0 up addr {self.client0_mac_addr} +ip -n {self.client0_netns_name} addr add {self.client_ipv4_addr}/16 dev veth0 +ip -n {self.client0_netns_name} addr add {self.client_ipv6_addr}/64 dev veth0 nodad +ip -n {self.client1_netns_name} link set veth0 up addr {self.client1_mac_addr} +ip -n {self.client1_netns_name} addr add {self.client_ipv4_addr}/16 dev veth0 +ip -n {self.client1_netns_name} addr add {self.client_ipv6_addr}/64 dev veth0 nodad +ip -n {self.client2_netns_name} link set veth0 up addr {self.client2_mac_addr} +ip -n {self.client2_netns_name} addr add {self.client_ipv4_addr}/16 dev veth0 +ip -n {self.client2_netns_name} addr add {self.client_ipv6_addr}/64 dev veth0 nodad +""" + subprocess.run(script, shell=True, check=True) + self.server_veth0_ifindex = self.get_server_ifindex("veth0") + self.server_veth1_ifindex = self.get_server_ifindex("veth1") + self.server_veth2_ifindex = self.get_server_ifindex("veth2") + self.server_vrf1_ifindex = self.get_server_ifindex("vrf1") + 
self.server_vrf2_ifindex = self.get_server_ifindex("vrf2") + return self + + def _del_netns(self): + script = f"""\ +set -e +for ns in {self.server_netns_name} {self.client0_netns_name} {self.client1_netns_name} {self.client2_netns_name}; do + if ip netns list | grep -q "$ns"; then + ip netns del "$ns" + fi +done +""" + subprocess.run(script, shell=True, check=True) + + def __exit__(self, *a): + self._del_netns()
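The key-selection behaviour expected by `TestVRFOverlapAOBoundKeyPrecedence`, `test_vrf_overlap_ao` and the `--force-bind-key-ifindex`/`--no-bind-key-ifindex` cases in fcnal-test.sh can be summarized by a small model: a key bound to the connection's L3 domain wins, otherwise an unbound key is used, and keys bound to another ifindex never match. This is a hypothetical sketch inferred from those test expectations, not the kernel's lookup code. `Key`, `select_key` and `handshake_ok` are illustrative names, and the model ignores key IDs, peer addresses and whether a listener accepts the connection at all (`tcp_l3mdev_accept`).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Key:
    """Hypothetical server-side key: a secret plus an optional ifindex binding."""

    secret: bytes
    # None: not bound (matches every L3 domain).
    # 0: bound to the default VRF; N > 0: bound to the VRF with ifindex N.
    ifindex: Optional[int] = None


def select_key(keys: Sequence[Key], l3index: int) -> Optional[Key]:
    """Pick the key for a connection arriving in l3index (0 = no VRF).

    Keys bound to exactly this L3 domain take precedence over unbound keys;
    keys bound to any other ifindex are never considered.
    """
    for key in keys:
        if key.ifindex is not None and key.ifindex == l3index:
            return key
    for key in keys:
        if key.ifindex is None:
            return key
    return None


def handshake_ok(keys: Sequence[Key], l3index: int, client_secret: bytes) -> bool:
    """True if the client's secret matches the key the server would select."""
    key = select_key(keys, l3index)
    return key is not None and key.secret == client_secret


# Mirrors TestVRFOverlapAOBoundKeyPrecedence: KEY0 unbound, KEY1 bound to vrf1.
VRF1, VRF2 = 10, 20  # arbitrary example ifindex values
keys = [Key(b"00000"), Key(b"1", ifindex=VRF1)]
for l3index, secret in [(VRF1, b"00000"), (VRF1, b"1"), (VRF2, b"00000"), (VRF2, b"1")]:
    print(l3index, secret, handshake_ok(keys, l3index, secret))
```

In this model the two combinations that print `False` (vrf1 with KEY0, vrf2 with KEY1) correspond to the two test cases that expect `socket.timeout`.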
This is in preparation for reusing the same option for TCP-AO
Signed-off-by: Leonard Crestez cdleonard@gmail.com --- tools/testing/selftests/net/nettest.c | 50 +++++++++++++-------------- 1 file changed, 25 insertions(+), 25 deletions(-)
diff --git a/tools/testing/selftests/net/nettest.c b/tools/testing/selftests/net/nettest.c index b599003eb5ba..525a439ce3b3 100644 --- a/tools/testing/selftests/net/nettest.c +++ b/tools/testing/selftests/net/nettest.c @@ -93,17 +93,17 @@ struct sock_args { const char *clientns; const char *serverns;
const char *password; const char *client_pw; - /* prefix for MD5 password */ - const char *md5_prefix_str; + /* prefix for MD5/AO */ + const char *key_addr_prefix_str; union { struct sockaddr_in v4; struct sockaddr_in6 v6; - } md5_prefix; - unsigned int prefix_len; + } key_addr; + unsigned int key_addr_prefix_len; /* 0: default, -1: force off, +1: force on */ int bind_key_ifindex;
/* expected addresses and device index for connection */ const char *expected_dev; @@ -263,16 +263,16 @@ static int tcp_md5sig(int sd, void *addr, socklen_t alen, struct sock_args *args int rc;
md5sig.tcpm_keylen = keylen; memcpy(md5sig.tcpm_key, args->password, keylen);
- if (args->prefix_len) { + if (args->key_addr_prefix_len) { opt = TCP_MD5SIG_EXT; md5sig.tcpm_flags |= TCP_MD5SIG_FLAG_PREFIX;
- md5sig.tcpm_prefixlen = args->prefix_len; - addr = &args->md5_prefix; + md5sig.tcpm_prefixlen = args->key_addr_prefix_len; + addr = &args->key_addr; } memcpy(&md5sig.tcpm_addr, addr, alen);
if ((args->ifindex && args->bind_key_ifindex >= 0) || args->bind_key_ifindex >= 1) { opt = TCP_MD5SIG_EXT; @@ -308,17 +308,17 @@ static int tcp_md5_remote(int sd, struct sock_args *args) int alen;
switch (args->version) { case AF_INET: sin.sin_port = htons(args->port); - sin.sin_addr = args->md5_prefix.v4.sin_addr; + sin.sin_addr = args->key_addr.v4.sin_addr; addr = &sin; alen = sizeof(sin); break; case AF_INET6: sin6.sin6_port = htons(args->port); - sin6.sin6_addr = args->md5_prefix.v6.sin6_addr; + sin6.sin6_addr = args->key_addr.v6.sin6_addr; addr = &sin6; alen = sizeof(sin6); break; default: log_error("unknown address family\n"); @@ -681,11 +681,11 @@ enum addr_type { ADDR_TYPE_LOCAL, ADDR_TYPE_REMOTE, ADDR_TYPE_MCAST, ADDR_TYPE_EXPECTED_LOCAL, ADDR_TYPE_EXPECTED_REMOTE, - ADDR_TYPE_MD5_PREFIX, + ADDR_TYPE_KEY_PREFIX, };
static int convert_addr(struct sock_args *args, const char *_str, enum addr_type atype) { @@ -721,32 +721,32 @@ static int convert_addr(struct sock_args *args, const char *_str, break; case ADDR_TYPE_EXPECTED_REMOTE: desc = "expected remote"; addr = &args->expected_raddr; break; - case ADDR_TYPE_MD5_PREFIX: - desc = "md5 prefix"; + case ADDR_TYPE_KEY_PREFIX: + desc = "key addr prefix"; if (family == AF_INET) { - args->md5_prefix.v4.sin_family = AF_INET; - addr = &args->md5_prefix.v4.sin_addr; + args->key_addr.v4.sin_family = AF_INET; + addr = &args->key_addr.v4.sin_addr; } else if (family == AF_INET6) { - args->md5_prefix.v6.sin6_family = AF_INET6; - addr = &args->md5_prefix.v6.sin6_addr; + args->key_addr.v6.sin6_family = AF_INET6; + addr = &args->key_addr.v6.sin6_addr; } else return 1;
sep = strchr(str, '/'); if (sep) { *sep = '\0'; sep++; if (str_to_uint(sep, 1, pfx_len_max, - &args->prefix_len) != 0) { - fprintf(stderr, "Invalid port\n"); + &args->key_addr_prefix_len) != 0) { + fprintf(stderr, "Invalid prefix\n"); return 1; } } else { - args->prefix_len = 0; + args->key_addr_prefix_len = 0; } break; default: log_error("unknown address type\n"); exit(1); @@ -811,13 +811,13 @@ static int validate_addresses(struct sock_args *args)
if (args->remote_addr_str && convert_addr(args, args->remote_addr_str, ADDR_TYPE_REMOTE) < 0) return 1;
- if (args->md5_prefix_str && - convert_addr(args, args->md5_prefix_str, - ADDR_TYPE_MD5_PREFIX) < 0) + if (args->key_addr_prefix_str && + convert_addr(args, args->key_addr_prefix_str, + ADDR_TYPE_KEY_PREFIX) < 0) return 1;
if (args->expected_laddr_str && convert_addr(args, args->expected_laddr_str, ADDR_TYPE_EXPECTED_LOCAL)) @@ -1992,11 +1992,11 @@ int main(int argc, char *argv[]) break; case 'X': args.client_pw = optarg; break; case 'm': - args.md5_prefix_str = optarg; + args.key_addr_prefix_str = optarg; break; case 'S': args.use_setsockopt = 1; break; case 'C': @@ -2048,17 +2048,17 @@ int main(int argc, char *argv[]) return 1; } }
if (args.password && - ((!args.has_remote_ip && !args.md5_prefix_str) || + ((!args.has_remote_ip && !args.key_addr_prefix_str) || args.type != SOCK_STREAM)) { log_error("MD5 passwords apply to TCP only and require a remote ip for the password\n"); return 1; }
- if (args.md5_prefix_str && !args.password) { + if (args.key_addr_prefix_str && !args.password) { log_error("Prefix range for MD5 protection specified without a password\n"); return 1; }
if (iter == 0) {
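For reference, the `-m addr[/len]` handling in `convert_addr()` for `ADDR_TYPE_KEY_PREFIX` amounts to: split at `/`, parse the address, and accept a prefix length from 1 up to the family maximum, with no `/` meaning length 0. A rough Python equivalent follows; it is a sketch, not nettest code. `parse_key_addr_prefix` is a hypothetical name, and it assumes the family is taken from the address itself, whereas nettest uses `args->version`.

```python
import ipaddress
from typing import Tuple, Union

Addr = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def parse_key_addr_prefix(text: str) -> Tuple[Addr, int]:
    """Split "addr[/len]" into (address, prefix length).

    A missing "/len" yields prefix length 0, meaning "match the full address".
    Raises ValueError for a bad address or a length outside 1..max_prefixlen.
    """
    addr_str, sep, len_str = text.partition("/")
    addr = ipaddress.ip_address(addr_str)
    if not sep:
        return addr, 0
    # 32 for IPv4, 128 for IPv6 (like pfx_len_max in nettest)
    max_len = addr.max_prefixlen
    if not len_str.isdigit() or not 1 <= int(len_str) <= max_len:
        raise ValueError(f"Invalid prefix: {len_str!r}")
    return addr, int(len_str)


print(parse_key_addr_prefix("172.16.1.2/24"))
print(parse_key_addr_prefix("2001:db8::1"))
```

Because the renamed fields are shared, the same parsed address later feeds both `TCP_MD5SIG_EXT` and TCP-AO key configuration.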
On 11/1/21 10:34 AM, Leonard Crestez wrote:
This is in preparation for reusing the same option for TCP-AO
Signed-off-by: Leonard Crestez cdleonard@gmail.com
tools/testing/selftests/net/nettest.c | 50 +++++++++++++-------------- 1 file changed, 25 insertions(+), 25 deletions(-)
Reviewed-by: David Ahern dsahern@kernel.org
Add support for configuring TCP Authentication Option. Only a single key is supported with default options.
Signed-off-by: Leonard Crestez cdleonard@gmail.com --- tools/testing/selftests/net/nettest.c | 75 ++++++++++++++++++++++++--- 1 file changed, 69 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/net/nettest.c b/tools/testing/selftests/net/nettest.c index 525a439ce3b3..837a45921845 100644 --- a/tools/testing/selftests/net/nettest.c +++ b/tools/testing/selftests/net/nettest.c @@ -103,10 +103,12 @@ struct sock_args { } key_addr; unsigned int key_addr_prefix_len; /* 0: default, -1: force off, +1: force on */ int bind_key_ifindex;
+ const char *authopt_password; + /* expected addresses and device index for connection */ const char *expected_dev; const char *expected_server_dev; int expected_ifindex;
@@ -253,10 +255,54 @@ static int switch_ns(const char *ns) close(fd);
return ret; }
+static int tcp_set_authopt(int sd, struct sock_args *args) +{ + struct tcp_authopt_key key; + int rc; + + memset(&key, 0, sizeof(key)); + strcpy((char *)key.key, args->authopt_password); + key.keylen = strlen(args->authopt_password); + key.alg = TCP_AUTHOPT_ALG_HMAC_SHA_1_96; + + if (args->key_addr_prefix_str) { + key.flags |= TCP_AUTHOPT_KEY_ADDR_BIND; + switch (args->version) { + case AF_INET: + memcpy(&key.addr, &args->key_addr.v4, sizeof(args->key_addr.v4)); + break; + case AF_INET6: + memcpy(&key.addr, &args->key_addr.v6, sizeof(args->key_addr.v6)); + break; + default: + log_error("unknown address family\n"); + exit(1); + } + if (args->key_addr_prefix_len) { + log_error("TCP_AUTHOPT does not support prefix length\n"); + exit(1); + } + } + + if ((args->ifindex && args->bind_key_ifindex >= 0) || args->bind_key_ifindex >= 1) { + key.flags |= TCP_AUTHOPT_KEY_IFINDEX; + key.ifindex = args->ifindex; + log_msg("TCP_AUTHOPT_KEY_IFINDEX set ifindex=%d\n", key.ifindex); + } else { + log_msg("TCP_AUTHOPT_KEY_IFINDEX off\n"); + } + + rc = setsockopt(sd, IPPROTO_TCP, TCP_AUTHOPT_KEY, &key, sizeof(key)); + if (rc < 0) + log_err_errno("setsockopt(TCP_AUTHOPT_KEY)"); + + return rc; +} + static int tcp_md5sig(int sd, void *addr, socklen_t alen, struct sock_args *args) { int keylen = strlen(args->password); struct tcp_md5sig md5sig = {}; int opt = TCP_MD5SIG; @@ -1514,10 +1560,15 @@ static int do_server(struct sock_args *args, int ipc_fd) if (args->password && tcp_md5_remote(lsd, args)) { close(lsd); goto err_exit; }
+ if (args->authopt_password && tcp_set_authopt(lsd, args)) { + close(lsd); + goto err_exit; + } + ipc_write(ipc_fd, 1); while (1) { log_msg("waiting for client connection.\n"); FD_ZERO(&rfds); FD_SET(lsd, &rfds); @@ -1636,10 +1687,13 @@ static int connectsock(void *addr, socklen_t alen, struct sock_args *args) goto out;
if (args->password && tcp_md5sig(sd, addr, alen, args)) goto err;
+ if (args->authopt_password && tcp_set_authopt(sd, args)) + goto err; + if (args->bind_test_only) goto out;
if (connect(sd, addr, alen) < 0) { if (errno != EINPROGRESS) { @@ -1825,11 +1879,11 @@ static int ipc_parent(int cpid, int fd, struct sock_args *args)
wait(&status); return client_status; }
-#define GETOPT_STR "sr:l:c:p:t:g:P:DRn:M:X:m:d:I:BN:O:SCi6xL:0:1:2:3:Fbq" +#define GETOPT_STR "sr:l:c:p:t:g:P:DRn:M:X:m:A:d:I:BN:O:SCi6xL:0:1:2:3:Fbq" #define OPT_FORCE_BIND_KEY_IFINDEX 1001 #define OPT_NO_BIND_KEY_IFINDEX 1002
static struct option long_opts[] = { {"force-bind-key-ifindex", 0, 0, OPT_FORCE_BIND_KEY_IFINDEX}, @@ -1869,14 +1923,15 @@ static void print_usage(char *prog) " -L len send random message of given length\n" " -n num number of times to send message\n" "\n" " -M password use MD5 sum protection\n" " -X password MD5 password for client mode\n" - " -m prefix/len prefix and length to use for MD5 key\n" - " --no-bind-key-ifindex: Force TCP_MD5SIG_FLAG_IFINDEX off\n" - " --force-bind-key-ifindex: Force TCP_MD5SIG_FLAG_IFINDEX on\n" + " -m prefix/len prefix and length to use for MD5/AO key\n" + " --no-bind-key-ifindex: Force disable binding key to ifindex\n" + " --force-bind-key-ifindex: Force enable binding key to ifindex\n" " (default: only if -I is passed)\n" + " -A password use RFC5925 TCP Authentication Option with password\n" "\n" " -g grp multicast group (e.g., 239.1.1.1)\n" " -i interactive mode (default is echo and terminate)\n" "\n" " -0 addr Expected local address\n" @@ -1994,10 +2049,13 @@ int main(int argc, char *argv[]) args.client_pw = optarg; break; case 'm': args.key_addr_prefix_str = optarg; break; + case 'A': + args.authopt_password = optarg; + break; case 'S': args.use_setsockopt = 1; break; case 'C': args.use_cmsg = 1; @@ -2054,12 +2112,17 @@ int main(int argc, char *argv[]) args.type != SOCK_STREAM)) { log_error("MD5 passwords apply to TCP only and require a remote ip for the password\n"); return 1; }
- if (args.key_addr_prefix_str && !args.password) { - log_error("Prefix range for MD5 protection specified without a password\n"); + if (args.key_addr_prefix_str && !args.password && !args.authopt_password) { + log_error("Prefix range for authentication requires -M or -A\n"); + return 1; + } + + if (args.key_addr_prefix_len && args.authopt_password) { + log_error("TCP-AO does not support prefix match, only full address\n"); return 1; }
if (iter == 0) { fprintf(stderr, "Invalid number of messages to send\n");
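The decision in `tcp_set_authopt()` (the same condition already used by `tcp_md5sig()`) about whether to set `TCP_AUTHOPT_KEY_IFINDEX` is a tri-state on `bind_key_ifindex`. The sketch below restates that condition in Python; `key_ifindex_binding` is a hypothetical helper that returns the ifindex the key would be bound to, or `None` when the flag is left clear.

```python
from typing import Optional


def key_ifindex_binding(ifindex: int, bind_key_ifindex: int) -> Optional[int]:
    """Restate nettest's rule for binding an MD5/AO key to an interface.

    ifindex: device index from -I (0 when -I is not given).
    bind_key_ifindex: 0 = default (bind only if -I was given),
                      -1 = --no-bind-key-ifindex,
                      +1 = --force-bind-key-ifindex.
    Returns the ifindex the key is bound to, or None if the flag stays clear.
    """
    if (ifindex and bind_key_ifindex >= 0) or bind_key_ifindex >= 1:
        return ifindex
    return None


for args in [(0, 0), (7, 0), (7, -1), (0, 1), (7, 1)]:
    print(args, key_ifindex_binding(*args))
```

One consequence worth noting: `--force-bind-key-ifindex` without `-I` binds the key to ifindex 0, i.e. the default VRF, which is what the "key bound to ifindex=0" cases in fcnal-test.sh rely on.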
On 11/1/21 10:34 AM, Leonard Crestez wrote:
Add support for configuring TCP Authentication Option. Only a single key is supported with default options.
Signed-off-by: Leonard Crestez cdleonard@gmail.com
tools/testing/selftests/net/nettest.c | 75 ++++++++++++++++++++++++--- 1 file changed, 69 insertions(+), 6 deletions(-)
Reviewed-by: David Ahern dsahern@kernel.org
Tests are mostly copied from tcp_md5 with minor changes.
It covers VRF support, but only by binding multiple servers, not by binding multiple keys to different interfaces.
Also add a -t tcp_authopt option to run only these tests.
Signed-off-by: Leonard Crestez cdleonard@gmail.com --- tools/testing/selftests/net/fcnal-test.sh | 249 ++++++++++++++++++++++ 1 file changed, 249 insertions(+)
diff --git a/tools/testing/selftests/net/fcnal-test.sh b/tools/testing/selftests/net/fcnal-test.sh index 3313566ce906..d7afd9f40848 100755 --- a/tools/testing/selftests/net/fcnal-test.sh +++ b/tools/testing/selftests/net/fcnal-test.sh @@ -800,10 +800,252 @@ ipv4_ping() }
################################################################################ # IPv4 TCP
+# +# TCP Authentication Option Tests +# + +# try to enable tcp_authopt sysctl +enable_tcp_authopt() +{ + if [[ -e /proc/sys/net/ipv4/tcp_authopt ]]; then + sysctl -w net.ipv4.tcp_authopt=1 + fi +} + +# check if tcp_authopt is compiled in, using a client-side bind test +has_tcp_authopt() +{ + run_cmd_nsb nettest -b -A ${MD5_PW} -r ${NSA_IP} +} + +ipv4_tcp_authopt_novrf() +{ + enable_tcp_authopt + if ! has_tcp_authopt; then + echo "TCP-AO appears to be missing, skip" + return 0 + fi + + log_start + run_cmd nettest -s -A ${MD5_PW} -m ${NSB_IP} & + sleep 1 + run_cmd_nsb nettest -r ${NSA_IP} -A ${MD5_PW} + log_test $? 0 "AO: Single address config" + + log_start + run_cmd nettest -s & + sleep 1 + run_cmd_nsb nettest -r ${NSA_IP} -A ${MD5_PW} + log_test $? 2 "AO: Server no config, client uses password" + + log_start + run_cmd nettest -s -A ${MD5_PW} -m ${NSB_IP} & + sleep 1 + run_cmd_nsb nettest -r ${NSA_IP} -A ${MD5_WRONG_PW} + log_test $? 2 "AO: Client uses wrong password" + + log_start + run_cmd nettest -s -A ${MD5_PW} -m ${NSB_LO_IP} & + sleep 1 + run_cmd_nsb nettest -r ${NSA_IP} -A ${MD5_PW} + log_test $? 2 "AO: Client address does not match address configured on server" + + # no prefixlen for AO yet +} + +ipv6_tcp_authopt_novrf() +{ + enable_tcp_authopt + if ! has_tcp_authopt; then + echo "TCP-AO appears to be missing, skip" + return 0 + fi + + log_start + run_cmd nettest -6 -s -A ${MD5_PW} & + sleep 1 + run_cmd_nsb nettest -6 -r ${NSA_IP6} -A ${MD5_PW} + log_test $? 0 "AO: Simple correct config" + + log_start + run_cmd nettest -6 -s & + sleep 1 + run_cmd_nsb nettest -6 -r ${NSA_IP6} -A ${MD5_PW} + log_test $? 2 "AO: Server no config, client uses password" + + log_start + run_cmd nettest -6 -s -A ${MD5_PW} -m ${NSB_IP6} & + sleep 1 + run_cmd_nsb nettest -6 -r ${NSA_IP6} -A ${MD5_WRONG_PW} + log_test $? 
2 "AO: Client uses wrong password" + + log_start + run_cmd nettest -6 -s -A ${MD5_PW} -m ${NSB_LO_IP6} & + sleep 1 + run_cmd_nsb nettest -6 -r ${NSA_IP6} -A ${MD5_PW} + log_test $? 2 "AO: Client address does not match address configured on server" + + # no prefixlen for AO yet +} + +ipv4_tcp_authopt_vrf() +{ + enable_tcp_authopt + if ! has_tcp_authopt; then + echo "TCP-AO appears to be missing, skip" + return 0 + fi + + log_start + run_cmd nettest -s -I ${VRF} -A ${MD5_PW} & + sleep 1 + run_cmd_nsb nettest -r ${NSA_IP} -A ${MD5_PW} + log_test $? 0 "AO: VRF: Simple config" + + # + # duplicate config between default VRF and a VRF + # + + log_start + run_cmd nettest -s -I ${VRF} -A ${MD5_PW} -m ${NSB_IP} & + run_cmd nettest -s -A ${MD5_WRONG_PW} -m ${NSB_IP} & + sleep 1 + run_cmd_nsb nettest -r ${NSA_IP} -A ${MD5_PW} + log_test $? 0 "AO: VRF: Servers in default-VRF and VRF, client in VRF" + + log_start + run_cmd nettest -s -I ${VRF} -A ${MD5_PW} -m ${NSB_IP} & + run_cmd nettest -s -A ${MD5_WRONG_PW} -m ${NSB_IP} & + sleep 1 + run_cmd_nsc nettest -r ${NSA_IP} -A ${MD5_WRONG_PW} + log_test $? 0 "AO: VRF: Servers in default-VRF and VRF, client in default-VRF" + + log_start + show_hint "Should timeout since client in default VRF uses VRF password" + run_cmd nettest -s -I ${VRF} -A ${MD5_PW} -m ${NSB_IP} & + run_cmd nettest -s -A ${MD5_WRONG_PW} -m ${NSB_IP} & + sleep 1 + run_cmd_nsc nettest -r ${NSA_IP} -A ${MD5_PW} + log_test $? 2 "AO: VRF: Servers in default VRF and VRF, conn in default-VRF with VRF pw" + + log_start + show_hint "Should timeout since client in VRF uses default VRF password" + run_cmd nettest -s -I ${VRF} -A ${MD5_PW} -m ${NSB_IP} & + run_cmd nettest -s -A ${MD5_WRONG_PW} -m ${NSB_IP} & + sleep 1 + run_cmd_nsb nettest -r ${NSA_IP} -A ${MD5_WRONG_PW} + log_test $? 
2 "AO: VRF: Servers in default VRF and VRF, conn in VRF with default-VRF pw" + + test_ipv4_tcp_authopt_vrf__global_server__bind_ifindex0 +} + +test_ipv4_tcp_authopt_vrf__global_server__bind_ifindex0() +{ + # This particular test needs tcp_l3mdev_accept=1 for Global server to accept VRF connections + local old_tcp_l3mdev_accept + old_tcp_l3mdev_accept=$(get_sysctl net.ipv4.tcp_l3mdev_accept) + set_sysctl net.ipv4.tcp_l3mdev_accept=1 + + log_start + run_cmd nettest -s -A ${MD5_PW} --force-bind-key-ifindex & + sleep 1 + run_cmd_nsb nettest -r ${NSA_IP} -A ${MD5_PW} + log_test $? 2 "AO: VRF: Global server, key bound to ifindex=0 rejects VRF connection" + + log_start + run_cmd nettest -s -A ${MD5_PW} --force-bind-key-ifindex & + sleep 1 + run_cmd_nsc nettest -r ${NSA_IP} -A ${MD5_PW} + log_test $? 0 "AO: VRF: Global server, key bound to ifindex=0 accepts non-VRF connection" + + log_start + run_cmd nettest -s -A ${MD5_PW} --no-bind-key-ifindex & + sleep 1 + run_cmd_nsb nettest -r ${NSA_IP} -A ${MD5_PW} + log_test $? 0 "AO: VRF: Global server, key not bound to ifindex accepts VRF connection" + + log_start + run_cmd nettest -s -A ${MD5_PW} --no-bind-key-ifindex & + sleep 1 + run_cmd_nsc nettest -r ${NSA_IP} -A ${MD5_PW} + log_test $? 0 "AO: VRF: Global server, key not bound to ifindex accepts non-VRF connection" + + # restore value + set_sysctl net.ipv4.tcp_l3mdev_accept="$old_tcp_l3mdev_accept" +} + +ipv6_tcp_authopt_vrf() +{ + enable_tcp_authopt + if ! has_tcp_authopt; then + echo "TCP-AO appears to be missing, skip" + return 0 + fi + + log_start + run_cmd nettest -6 -s -I ${VRF} -A ${MD5_PW} & + sleep 1 + run_cmd_nsb nettest -6 -r ${NSA_IP6} -A ${MD5_PW} + log_test $? 0 "AO: VRF: Simple config" + + # + # duplicate config between default VRF and a VRF + # + + log_start + run_cmd nettest -6 -s -I ${VRF} -A ${MD5_PW} -m ${NSB_IP6} & + run_cmd nettest -6 -s -A ${MD5_WRONG_PW} -m ${NSB_IP6} & + sleep 1 + run_cmd_nsb nettest -6 -r ${NSA_IP6} -A ${MD5_PW} + log_test $? 
0 "AO: VRF: Servers in default-VRF and VRF, client in VRF" + + log_start + run_cmd nettest -6 -s -I ${VRF} -A ${MD5_PW} -m ${NSB_IP6} & + run_cmd nettest -6 -s -A ${MD5_WRONG_PW} -m ${NSB_IP6} & + sleep 1 + run_cmd_nsc nettest -6 -r ${NSA_IP6} -A ${MD5_WRONG_PW} + log_test $? 0 "AO: VRF: Servers in default-VRF and VRF, client in default-VRF" + + log_start + show_hint "Should timeout since client in default VRF uses VRF password" + run_cmd nettest -6 -s -I ${VRF} -A ${MD5_PW} -m ${NSB_IP6} & + run_cmd nettest -6 -s -A ${MD5_WRONG_PW} -m ${NSB_IP6} & + sleep 1 + run_cmd_nsc nettest -6 -r ${NSA_IP6} -A ${MD5_PW} + log_test $? 2 "AO: VRF: Servers in default VRF and VRF, conn in default-VRF with VRF pw" + + log_start + show_hint "Should timeout since client in VRF uses default VRF password" + run_cmd nettest -6 -s -I ${VRF} -A ${MD5_PW} -m ${NSB_IP6} & + run_cmd nettest -6 -s -A ${MD5_WRONG_PW} -m ${NSB_IP6} & + sleep 1 + run_cmd_nsb nettest -6 -r ${NSA_IP6} -A ${MD5_WRONG_PW} + log_test $? 2 "AO: VRF: Servers in default VRF and VRF, conn in VRF with default-VRF pw" +} + +only_tcp_authopt() +{ + log_section "TCP Authentication Option" + + setup + set_sysctl net.ipv4.tcp_l3mdev_accept=0 + log_subsection "TCP-AO IPv4 no VRF" + ipv4_tcp_authopt_novrf + log_subsection "TCP-AO IPv6 no VRF" + ipv6_tcp_authopt_novrf + + setup "yes" + set_sysctl net.ipv4.tcp_l3mdev_accept=0 + log_subsection "TCP-AO IPv4 VRF" + ipv4_tcp_authopt_vrf + log_subsection "TCP-AO IPv6 VRF" + ipv6_tcp_authopt_vrf +} + # # MD5 tests without VRF # ipv4_tcp_md5_novrf() { @@ -1185,10 +1427,11 @@ ipv4_tcp_novrf() show_hint "Should fail 'Connection refused'" run_cmd nettest -d ${NSA_DEV} -r ${a} log_test_addr ${a} $? 1 "No server, device client, local conn"
 	ipv4_tcp_md5_novrf
+	ipv4_tcp_authopt_novrf
 }
 ipv4_tcp_vrf()
 {
 	local a
@@ -1239,10 +1482,12 @@ ipv4_tcp_vrf()
 		run_cmd nettest -r ${a} -d ${NSA_DEV}
 		log_test_addr ${a} $? 1 "Global server, local connection"
 	# run MD5 tests
 	ipv4_tcp_md5
+	# run AO tests
+	ipv4_tcp_authopt_vrf
 	#
 	# enable VRF global server
 	#
 	log_subsection "VRF Global server enabled"
@@ -2648,10 +2893,11 @@ ipv6_tcp_novrf()
 		run_cmd nettest -6 -d ${NSA_DEV} -r ${a}
 		log_test_addr ${a} $? 1 "No server, device client, local conn"
 	done
 	ipv6_tcp_md5_novrf
+	ipv6_tcp_authopt_novrf
 }
 ipv6_tcp_vrf()
 {
 	local a
@@ -2718,10 +2964,12 @@ ipv6_tcp_vrf()
 		run_cmd nettest -6 -r ${a} -d ${NSA_DEV}
 		log_test_addr ${a} $? 1 "Global server, local connection"
 	# run MD5 tests
 	ipv6_tcp_md5
+	# run AO tests
+	ipv6_tcp_authopt_vrf
 	#
 	# enable VRF global server
 	#
 	log_subsection "VRF Global server enabled"
@@ -4062,10 +4310,11 @@ do
 	ipv6_bind|bind6)	ipv6_addr_bind;;
 	ipv6_runtime)	ipv6_runtime;;
 	ipv6_netfilter)	ipv6_netfilter;;
 	use_cases)	use_cases;;
+	tcp_authopt)	only_tcp_authopt;;
 	# setup namespaces and config, but do not run any tests
 	setup)		setup; exit 0;;
 	vrf_setup)	setup "yes"; exit 0;;
On 11/1/21 10:34 AM, Leonard Crestez wrote:
This is similar to TCP MD5 in functionality but it's sufficiently different that wire formats are incompatible. Compared to TCP-MD5 more algorithms are supported and multiple keys can be used on the same connection but there is still no negotiation mechanism.
The expected use case is protecting long-duration BGP/LDP connections between routers using pre-shared keys. The goal of this series is to allow routers using the Linux TCP stack to interoperate with vendors such as Cisco and Juniper.
Both algorithms described in RFC5926 are implemented, but the code is not easily extensible beyond that. In particular, several code paths make stack allocations sized for the RFC5926 maximum; those would have to be increased.
This version implements SNE and l3mdev awareness and adds more tests. Here are some known flaws and limitations:
- Interaction with TCP-MD5 not tested in all corners
- Interaction with FASTOPEN not tested and unlikely to work because of
sequence number assumptions for SYN/ACK.
- Not clear if crypto_shash_setkey might sleep. If some implementations
do, they could perhaps be excluded through alloc flags.
- Traffic key is not cached (reducing performance)
- User is responsible for ensuring keys do not overlap.
- There is no useful way to list keys, making userspace debug difficult.
- There is no prefixlen support equivalent to md5. This is used in
some complex FRR configs.
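Since keeping keys from overlapping is left to the user, a userspace tool could run a check like the one below before installing keys. This is a minimal sketch under stated assumptions: the AOKey record, its fields and find_overlaps() are hypothetical (not the uAPI of this series), and "overlap" is taken to mean two keys that can match the same peer while sharing a SendID or RecvID, assuming (as RFC 5925 intends) that KeyIDs must be unambiguous among the keys usable on one connection.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class AOKey:
    """Hypothetical userspace record for one TCP-AO key (MKT)."""
    send_id: int                # KeyID put in outgoing segments (0..255)
    recv_id: int                # KeyID expected in incoming segments (0..255)
    addr: Optional[str] = None  # peer address; None means "any peer"


def may_match_same_peer(a: Optional[str], b: Optional[str]) -> bool:
    # A wildcard key can match any peer, so it can collide with anything.
    if a is None or b is None:
        return True
    return ip_address(a) == ip_address(b)


def find_overlaps(keys: Sequence[AOKey]) -> List[Tuple[AOKey, AOKey]]:
    """Return pairs of keys that can match the same peer and share a
    SendID or RecvID, which would make the KeyID on the wire ambiguous."""
    overlaps = []
    for k1, k2 in combinations(keys, 2):
        if not may_match_same_peer(k1.addr, k2.addr):
            continue
        if k1.send_id == k2.send_id or k1.recv_id == k2.recv_id:
            overlaps.append((k1, k2))
    return overlaps
```

The pairwise scan is quadratic, which is fine for the handful of keys a BGP/LDP session typically carries.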
A test suite was added to tools/selftests/tcp_authopt. Tests are written in Python using pytest and scapy; they check the API in some detail and validate packet captures. Python is already used in Linux and in kselftests, but virtualenvs are not used much; this particular test suite uses `pip` to create a private virtualenv and hide dependencies.
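The captures are checked with scapy in the suite itself. As a dependency-free illustration of what such a check looks for, the sketch below scans a raw TCP header for the TCP-AO option, which RFC 5925 lays out as kind 29, length, KeyID, RNextKeyID, then the MAC. The function and type names are hypothetical and not taken from the test suite.

```python
from typing import NamedTuple, Optional

TCPOPT_EOL = 0
TCPOPT_NOP = 1
TCPOPT_AO = 29  # TCP-AO option kind from RFC 5925


class AOOption(NamedTuple):
    keyid: int
    rnextkeyid: int
    mac: bytes


def find_ao_option(tcp_header: bytes) -> Optional[AOOption]:
    """Scan the options area of a raw TCP header for a TCP-AO option."""
    if len(tcp_header) < 20:
        raise ValueError("truncated TCP header")
    data_offset = (tcp_header[12] >> 4) * 4  # header length in bytes
    opts = tcp_header[20:data_offset]
    i = 0
    while i < len(opts):
        kind = opts[i]
        if kind == TCPOPT_EOL:
            break
        if kind == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= len(opts):
            raise ValueError("truncated option")
        length = opts[i + 1]
        if length < 2 or i + length > len(opts):
            raise ValueError("bad option length")
        if kind == TCPOPT_AO:
            if length < 4:
                raise ValueError("TCP-AO option too short")
            return AOOption(opts[i + 2], opts[i + 3], bytes(opts[i + 4:i + length]))
        i += length
    return None
```

Both RFC5926 algorithms truncate the MAC to 96 bits, so a well-formed option from this series should carry 12 MAC bytes (option length 16).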
This actually forms the bulk of the series by raw line count. Since there is a lot of code, it was mostly split by "functional area", so most files are only touched by a single patch. Many of these tests are also relevant to TCP-MD5, so perhaps it would help to split them into a separate series?
Some testing support is included in nettest and fcnal-test.sh, similar to the current level of tcp-md5 testing.
SNE was tested by creating connections in a loop until a large SEQ was randomly selected and then making it roll over. The "connect in a loop" step ran into timewait overflow and connection failures on port reuse. After spending some time on this issue, my conclusion is that AO makes it impossible to kill the remains of old connections the way it can be done for unsigned or md5sig connections, because AO signatures depend on the ISNs. This means that if a timewait socket is closed improperly, the information required to RST the peer is lost.
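SNE extends the 32-bit sequence space so that segments sent after a rollover are signed differently from earlier ones. Below is a minimal sketch of per-direction SNE tracking in the spirit of RFC 5925; it is not the kernel implementation, and SneTracker and seq_before() are hypothetical names. It bumps the SNE when a segment that is newer (in modulo 2^32 terms) is numerically smaller than the last one seen, and it uses the previous SNE for retransmissions sent before the wrap.

```python
MOD = 1 << 32
HALF = 1 << 31


def seq_before(a: int, b: int) -> bool:
    """True if 32-bit sequence number a precedes b (modular comparison)."""
    return a != b and (b - a) % MOD < HALF


class SneTracker:
    """Track the Sequence Number Extension (SNE) for one direction."""

    def __init__(self, initial_seq: int):
        self.sne = 0
        self.last_seq = initial_seq

    def sne_for(self, seq: int) -> int:
        """Return the SNE to use when signing/verifying a segment with seq."""
        if seq_before(self.last_seq, seq):
            # Newer segment: numerically smaller means SEQ rolled over.
            if seq < self.last_seq:
                self.sne = (self.sne + 1) % MOD
            self.last_seq = seq
            return self.sne
        # Older or duplicate segment: if it is numerically larger than the
        # newest one, it was sent before the rollover and uses the old SNE.
        if seq > self.last_seq:
            return (self.sne - 1) % MOD
        return self.sne
```

For example, starting at 0xFFFFFF00, a segment with SEQ 0x10 gets SNE 1, while a late retransmission of 0xFFFFFFF0 still gets SNE 0. The sketch assumes segments far outside the window were already rejected.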
The fact that AO completely breaks all connection-less RSTs is acknowledged in the RFC and the workaround of "respect timewait" seems acceptable.
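The RST limitation follows from how traffic keys are derived. The sketch below assumes the KDF_HMAC_SHA1 input layout from RFC 5926 (counter, the "TCP-AO" label, the connection context, and the output length in bits) and an IPv4 context of addresses, ports and both ISNs as described in RFC 5925; the function names are hypothetical. Because the ISNs are part of the context, a host that has forgotten them cannot compute the key needed to sign a RST. The functools.lru_cache decorator only hints at one way to address the "traffic key is not cached" limitation; the kernel would need its own per-connection cache.

```python
import hashlib
import hmac
import socket
import struct
from functools import lru_cache


def ao_context_ipv4(saddr: str, daddr: str, sport: int, dport: int,
                    sisn: int, disn: int) -> bytes:
    """IPv4 connection context: addresses, ports, then both ISNs.

    For SYN segments the peer ISN is not known yet and is passed as 0.
    """
    return (socket.inet_aton(saddr) + socket.inet_aton(daddr)
            + struct.pack("!HHII", sport, dport, sisn, disn))


@lru_cache(maxsize=1024)
def traffic_key_hmac_sha1(master_key: bytes, context: bytes) -> bytes:
    """160-bit traffic key: HMAC-SHA1(K, i || "TCP-AO" || Context || L)."""
    data = b"\x01" + b"TCP-AO" + context + struct.pack("!H", 160)
    return hmac.new(master_key, data, hashlib.sha1).digest()
```

Two connections with the same addresses and ports but different ISNs end up with unrelated traffic keys, which is why "respect timewait" is the practical workaround.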
Changes for FRR (old): https://github.com/FRRouting/frr/pull/9442
That PR was made early for ABI feedback and has many issues.
overall looks ok to me. I did not wade through the protocol details.
I did see the comment about no prefixlen support in the tests. There are a lot of patches to absorb, so perhaps I missed it: does AuthOpt support prefixes? If not, you should consider adding that as a quick follow-on (within the same dev cycle). MD5 added prefix support for scalability; it seems like AO should be concerned about the same.
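To illustrate what md5-style prefixlen support would mean for AO key lookup, the sketch below picks, for a given peer, the key whose prefix matches most specifically. The (prefix, secret) representation and select_key() are hypothetical, and preferring the longest prefix is an assumed policy rather than something stated in this thread.

```python
from ipaddress import ip_address, ip_network
from typing import Optional, Sequence, Tuple


def select_key(keys: Sequence[Tuple[str, bytes]], peer: str) -> Optional[bytes]:
    """Pick the key whose prefix matches `peer` most specifically.

    `keys` holds (prefix, secret) pairs such as ("10.0.0.0/24", b"pw").
    """
    addr = ip_address(peer)
    best = None
    best_len = -1
    for prefix, secret in keys:
        net = ip_network(prefix)
        # Only compare within one address family; prefer longer prefixes.
        if addr.version == net.version and addr in net and net.prefixlen > best_len:
            best, best_len = secret, net.prefixlen
    return best
```

A single /16 entry can then cover a whole block of BGP peers, which is the scalability argument made for MD5.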
On 11/3/21 5:18 AM, David Ahern wrote:
I did see the comment about no prefixlen support in the tests. There are a lot of patches to absorb, so perhaps I missed it: does AuthOpt support prefixes? If not, you should consider adding that as a quick follow-on (within the same dev cycle). MD5 added prefix support for scalability; it seems like AO should be concerned about the same.
I just skipped it because it's not required for core functionality.
It's very straightforward, so I will add it to the next version.
-- Regards, Leonard
linux-kselftest-mirror@lists.linaro.org