This adds support for receiving KeyUpdate messages (RFC 8446, 4.6.3 [1]). A sender transmits a KeyUpdate message and then changes its TX key. The receiver should react by updating its RX key before processing the next message.
This patchset implements key updates by:
 1. pausing decryption when a KeyUpdate message is received, to avoid
    attempting to use the old key to decrypt a record encrypted with
    the new key
 2. returning -EKEYEXPIRED to syscalls that cannot receive the
    KeyUpdate message, until the rekey has been performed by userspace
 3. passing the KeyUpdate message to userspace as a control message
 4. allowing updates of the crypto_info via the TLS_TX/TLS_RX
    setsockopts
This API has been tested with gnutls to make sure that it allows userspace libraries to implement key updates [2]. Thanks to Frantisek Krenzelok <fkrenzel@redhat.com> for providing the implementation in gnutls and testing the kernel patches.
Note: in a future series, I'll clean up tls_set_sw_offload and eliminate the per-cipher copy-paste using tls_cipher_size_desc.
[1] https://www.rfc-editor.org/rfc/rfc8446#section-4.6.3
[2] https://gitlab.com/gnutls/gnutls/-/merge_requests/1625
Changes in v2:
 - use reverse xmas tree ordering in tls_set_sw_offload and
   do_tls_setsockopt_conf
 - turn the alt_crypto_info into an else if
 - selftests: add rekey_fail test
Vadim suggested simplifying tls_set_sw_offload by copying the new crypto_info into the context in do_tls_setsockopt_conf, and then detecting the rekey in tls_set_sw_offload based on whether the iv was already set. But I don't think we can have a common error path in that case (otherwise we'd free the aead etc on rekey failure), so I decided instead to reorganize tls_set_sw_offload so that the context is left unmodified until we know the rekey cannot fail. Some fields will be touched during the rekey, but they will be set to the same value they had before the rekey (prot->rec_seq_size, etc).
Apoorv suggested to name the struct tls_crypto_info_keys "tls13" rather than "tls12". Since we're using the same crypto_info data for TLS1.3 as for 1.2, even if the tests only run for TLS1.3, I'd rather keep the "tls12" name, in case we end up adding a "tls13_crypto_info_aes_gcm_128" type in the future.
Kuniyuki and Apoorv also suggested preventing rekeys on RX when we haven't received a matching KeyUpdate message, but I'd rather let userspace handle this and have a symmetric API between TX and RX on the kernel side. It's a bit of a foot-gun, but we can't really stop a broken userspace from rolling back the rec_seq on an existing crypto_info either, and that seems like a worse possible breakage.
Sabrina Dubroca (5):
  tls: remove tls_context argument from tls_set_sw_offload
  tls: block decryption when a rekey is pending
  tls: implement rekey for TLS1.3
  selftests: tls: add key_generation argument to tls_crypto_info_init
  selftests: tls: add rekey tests
 include/net/tls.h                 |   4 +
 net/tls/tls.h                     |   3 +-
 net/tls/tls_device.c              |   2 +-
 net/tls/tls_main.c                |  37 +++-
 net/tls/tls_sw.c                  | 189 +++++++++++++----
 tools/testing/selftests/net/tls.c | 336 +++++++++++++++++++++++++++++-
 6 files changed, 511 insertions(+), 60 deletions(-)
It's not really needed since we end up refetching it as tls_ctx. We can also remove the NULL check, since we have already dereferenced ctx in do_tls_setsockopt_conf.
v2: reverse xmas tree
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
 net/tls/tls.h        |  2 +-
 net/tls/tls_device.c |  2 +-
 net/tls/tls_main.c   |  4 ++--
 net/tls/tls_sw.c     | 17 +++++++----------
 4 files changed, 11 insertions(+), 14 deletions(-)
diff --git a/net/tls/tls.h b/net/tls/tls.h
index 0e840a0c3437..34d0fe814600 100644
--- a/net/tls/tls.h
+++ b/net/tls/tls.h
@@ -90,7 +90,7 @@ int tls_sk_attach(struct sock *sk, int optname, char __user *optval,
 		  unsigned int optlen);
 void tls_err_abort(struct sock *sk, int err);
 
-int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx);
+int tls_set_sw_offload(struct sock *sk, int tx);
 void tls_update_rx_zc_capable(struct tls_context *tls_ctx);
 void tls_sw_strparser_arm(struct sock *sk, struct tls_context *ctx);
 void tls_sw_strparser_done(struct tls_context *tls_ctx);
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 6c593788dc25..c149f36b42ee 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -1291,7 +1291,7 @@ int tls_set_device_offload_rx(struct sock *sk, struct tls_context *ctx)
 	context->resync_nh_reset = 1;
 
 	ctx->priv_ctx_rx = context;
-	rc = tls_set_sw_offload(sk, ctx, 0);
+	rc = tls_set_sw_offload(sk, 0);
 	if (rc)
 		goto release_ctx;
 
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 3735cb00905d..fb1da1780f50 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -772,7 +772,7 @@ static int do_tls_setsockopt_conf(struct sock *sk, sockptr_t optval,
 			TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSTXDEVICE);
 			TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSCURRTXDEVICE);
 		} else {
-			rc = tls_set_sw_offload(sk, ctx, 1);
+			rc = tls_set_sw_offload(sk, 1);
 			if (rc)
 				goto err_crypto_info;
 			TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSTXSW);
@@ -786,7 +786,7 @@ static int do_tls_setsockopt_conf(struct sock *sk, sockptr_t optval,
 			TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSRXDEVICE);
 			TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSCURRRXDEVICE);
 		} else {
-			rc = tls_set_sw_offload(sk, ctx, 0);
+			rc = tls_set_sw_offload(sk, 0);
 			if (rc)
 				goto err_crypto_info;
 			TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSRXSW);
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 6d0a534b7baa..238bd18c5eb6 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2465,25 +2465,22 @@ void tls_update_rx_zc_capable(struct tls_context *tls_ctx)
 		tls_ctx->prot_info.version != TLS_1_3_VERSION;
 }
 
-int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx)
+int tls_set_sw_offload(struct sock *sk, int tx)
 {
-	struct tls_context *tls_ctx = tls_get_ctx(sk);
-	struct tls_prot_info *prot = &tls_ctx->prot_info;
-	struct tls_crypto_info *crypto_info;
+	u16 nonce_size, tag_size, iv_size, rec_seq_size, salt_size;
+	char *iv, *rec_seq, *key, *salt, *cipher_name;
 	struct tls_sw_context_tx *sw_ctx_tx = NULL;
 	struct tls_sw_context_rx *sw_ctx_rx = NULL;
+	struct tls_context *ctx = tls_get_ctx(sk);
+	struct tls_crypto_info *crypto_info;
 	struct cipher_context *cctx;
+	struct tls_prot_info *prot;
 	struct crypto_aead **aead;
-	u16 nonce_size, tag_size, iv_size, rec_seq_size, salt_size;
 	struct crypto_tfm *tfm;
-	char *iv, *rec_seq, *key, *salt, *cipher_name;
 	size_t keysize;
 	int rc = 0;
 
-	if (!ctx) {
-		rc = -EINVAL;
-		goto out;
-	}
+	prot = &ctx->prot_info;
 
 	if (tx) {
 		if (!ctx->priv_ctx_tx) {
When a TLS handshake record carrying a KeyUpdate message is received, all subsequent records will be encrypted with a new key. We need to stop decrypting incoming records with the old key, and wait until userspace provides a new key.
Make a note of this in the RX context just after decrypting that record, and stop recvmsg/splice calls with EKEYEXPIRED until the new key is available.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
 include/net/tls.h |  4 ++++
 net/tls/tls_sw.c  | 44 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)
diff --git a/include/net/tls.h b/include/net/tls.h
index 154949c7b0c8..297732f23804 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -69,8 +69,11 @@ extern const struct tls_cipher_size_desc tls_cipher_size_desc[];
 
 #define TLS_CRYPTO_INFO_READY(info)	((info)->cipher_type)
 
+#define TLS_RECORD_TYPE_HANDSHAKE	0x16
 #define TLS_RECORD_TYPE_DATA		0x17
 
+#define TLS_HANDSHAKE_KEYUPDATE		24	/* rfc8446 B.3: Key update */
+
 #define TLS_AAD_SPACE_SIZE		13
 
 #define MAX_IV_SIZE			16
@@ -145,6 +148,7 @@ struct tls_sw_context_rx {
 
 	struct tls_strparser strp;
 
+	bool key_update_pending;
 	atomic_t decrypt_pending;
 	/* protect crypto_wait with decrypt_pending*/
 	spinlock_t decrypt_compl_lock;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 238bd18c5eb6..149a39d9a56a 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1687,6 +1687,33 @@ tls_decrypt_device(struct sock *sk, struct msghdr *msg,
 	return 1;
 }
 
+static int tls_check_pending_rekey(struct sock *sk, struct sk_buff *skb)
+{
+	const struct tls_msg *tlm = tls_msg(skb);
+	const struct strp_msg *rxm = strp_msg(skb);
+
+	if (tlm->control == TLS_RECORD_TYPE_HANDSHAKE) {
+		char hs_type;
+		int err;
+
+		if (rxm->full_len < 1)
+			return -EINVAL;
+
+		err = skb_copy_bits(skb, rxm->offset, &hs_type, 1);
+		if (err < 0)
+			return err;
+
+		if (hs_type == TLS_HANDSHAKE_KEYUPDATE) {
+			struct tls_context *ctx = tls_get_ctx(sk);
+			struct tls_sw_context_rx *rx_ctx = ctx->priv_ctx_rx;
+
+			rx_ctx->key_update_pending = true;
+		}
+	}
+
+	return 0;
+}
+
 static int tls_rx_one_record(struct sock *sk, struct msghdr *msg,
 			     struct tls_decrypt_arg *darg)
 {
@@ -1706,6 +1733,10 @@ static int tls_rx_one_record(struct sock *sk, struct msghdr *msg,
 	rxm->full_len -= prot->overhead_size;
 	tls_advance_record_sn(sk, prot, &tls_ctx->rx);
 
+	err = tls_check_pending_rekey(sk, darg->skb);
+	if (err < 0)
+		return err;
+
 	return 0;
 }
 
@@ -1957,6 +1988,12 @@ int tls_sw_recvmsg(struct sock *sk,
 		struct tls_decrypt_arg darg;
 		int to_decrypt, chunk;
 
+		/* a rekey is pending, let userspace deal with it */
+		if (unlikely(ctx->key_update_pending)) {
+			err = -EKEYEXPIRED;
+			break;
+		}
+
 		err = tls_rx_rec_wait(sk, psock, flags & MSG_DONTWAIT,
 				      released);
 		if (err <= 0) {
@@ -2141,6 +2178,12 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
 	if (err < 0)
 		return err;
 
+	/* a rekey is pending, let userspace deal with it */
+	if (unlikely(ctx->key_update_pending)) {
+		err = -EKEYEXPIRED;
+		goto splice_read_end;
+	}
+
 	if (!skb_queue_empty(&ctx->rx_list)) {
 		skb = __skb_dequeue(&ctx->rx_list);
 	} else {
@@ -2526,6 +2569,7 @@ int tls_set_sw_offload(struct sock *sk, int tx)
 		skb_queue_head_init(&sw_ctx_rx->rx_list);
 		skb_queue_head_init(&sw_ctx_rx->async_hold);
 		aead = &sw_ctx_rx->aead_recv;
+		sw_ctx_rx->key_update_pending = false;
 	}
 
 	switch (crypto_info->cipher_type) {
On Tue, 14 Feb 2023 12:17:39 +0100 Sabrina Dubroca wrote:
> @@ -2141,6 +2178,12 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
> 	if (err < 0)
> 		return err;
> 
> +	/* a rekey is pending, let userspace deal with it */
> +	if (unlikely(ctx->key_update_pending)) {
> +		err = -EKEYEXPIRED;
> +		goto splice_read_end;
> +	}

This will prevent splicing peek()'ed data. Just put the check in
tls_rx_rec_wait().
2023-02-14, 21:09:25 -0800, Jakub Kicinski wrote:
> On Tue, 14 Feb 2023 12:17:39 +0100 Sabrina Dubroca wrote:
> > @@ -2141,6 +2178,12 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
> > 	if (err < 0)
> > 		return err;
> > 
> > +	/* a rekey is pending, let userspace deal with it */
> > +	if (unlikely(ctx->key_update_pending)) {
> > +		err = -EKEYEXPIRED;
> > +		goto splice_read_end;
> > +	}
> 
> This will prevent splicing peek()'ed data. Just put the check in
> tls_rx_rec_wait().

Ok, I'll do that and add a selftest for this sequence of syscalls.
This adds the possibility to change the key and IV when using TLS1.3. Changing the cipher or TLS version is not supported.
Once we have updated the RX key, we can unblock the receive side. If the rekey fails, the context is unmodified and userspace is free to retry the update or close the socket.
This change only affects tls_sw, since 1.3 offload isn't supported.
v2:
 - reverse xmas tree
 - turn the alt_crypto_info into an else if
 - don't modify the context when rekey fails
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
 net/tls/tls.h        |   3 +-
 net/tls/tls_device.c |   2 +-
 net/tls/tls_main.c   |  37 +++++++++---
 net/tls/tls_sw.c     | 134 +++++++++++++++++++++++++++++++------------
 4 files changed, 129 insertions(+), 47 deletions(-)
diff --git a/net/tls/tls.h b/net/tls/tls.h
index 34d0fe814600..6f9c85eaa9c5 100644
--- a/net/tls/tls.h
+++ b/net/tls/tls.h
@@ -90,7 +90,8 @@ int tls_sk_attach(struct sock *sk, int optname, char __user *optval,
 		  unsigned int optlen);
 void tls_err_abort(struct sock *sk, int err);
 
-int tls_set_sw_offload(struct sock *sk, int tx);
+int tls_set_sw_offload(struct sock *sk, int tx,
+		       struct tls_crypto_info *new_crypto_info);
 void tls_update_rx_zc_capable(struct tls_context *tls_ctx);
 void tls_sw_strparser_arm(struct sock *sk, struct tls_context *ctx);
 void tls_sw_strparser_done(struct tls_context *tls_ctx);
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index c149f36b42ee..1ad50c253dfe 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -1291,7 +1291,7 @@ int tls_set_device_offload_rx(struct sock *sk, struct tls_context *ctx)
 	context->resync_nh_reset = 1;
 
 	ctx->priv_ctx_rx = context;
-	rc = tls_set_sw_offload(sk, 0);
+	rc = tls_set_sw_offload(sk, 0, NULL);
 	if (rc)
 		goto release_ctx;
 
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index fb1da1780f50..24a4bdb54a53 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -669,9 +669,11 @@ static int tls_getsockopt(struct sock *sk, int level, int optname,
 static int do_tls_setsockopt_conf(struct sock *sk, sockptr_t optval,
 				  unsigned int optlen, int tx)
 {
-	struct tls_crypto_info *crypto_info;
-	struct tls_crypto_info *alt_crypto_info;
+	struct tls_crypto_info *crypto_info, *alt_crypto_info;
+	struct tls_crypto_info *old_crypto_info = NULL;
 	struct tls_context *ctx = tls_get_ctx(sk);
+	union tls_crypto_context tmp = {};
+	bool update = false;
 	size_t optsize;
 	int rc = 0;
 	int conf;
@@ -687,9 +689,17 @@ static int do_tls_setsockopt_conf(struct sock *sk, sockptr_t optval,
 		alt_crypto_info = &ctx->crypto_send.info;
 	}
 
-	/* Currently we don't support set crypto info more than one time */
-	if (TLS_CRYPTO_INFO_READY(crypto_info))
-		return -EBUSY;
+	if (TLS_CRYPTO_INFO_READY(crypto_info)) {
+		/* Currently we only support setting crypto info more
+		 * than one time for TLS 1.3
+		 */
+		if (crypto_info->version != TLS_1_3_VERSION)
+			return -EBUSY;
+
+		update = true;
+		old_crypto_info = crypto_info;
+		crypto_info = &tmp.info;
+	}
 
 	rc = copy_from_sockptr(crypto_info, optval, sizeof(*crypto_info));
 	if (rc) {
@@ -704,8 +714,15 @@ static int do_tls_setsockopt_conf(struct sock *sk, sockptr_t optval,
 		goto err_crypto_info;
 	}
 
-	/* Ensure that TLS version and ciphers are same in both directions */
-	if (TLS_CRYPTO_INFO_READY(alt_crypto_info)) {
+	if (update) {
+		/* Ensure that TLS version and ciphers are not modified */
+		if (crypto_info->version != old_crypto_info->version ||
+		    crypto_info->cipher_type != old_crypto_info->cipher_type) {
+			rc = -EINVAL;
+			goto err_crypto_info;
+		}
+	} else if (TLS_CRYPTO_INFO_READY(alt_crypto_info)) {
+		/* Ensure that TLS version and ciphers are same in both directions */
 		if (alt_crypto_info->version != crypto_info->version ||
 		    alt_crypto_info->cipher_type != crypto_info->cipher_type) {
 			rc = -EINVAL;
@@ -772,7 +789,8 @@ static int do_tls_setsockopt_conf(struct sock *sk, sockptr_t optval,
 			TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSTXDEVICE);
 			TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSCURRTXDEVICE);
 		} else {
-			rc = tls_set_sw_offload(sk, 1);
+			rc = tls_set_sw_offload(sk, 1,
+						update ? crypto_info : NULL);
 			if (rc)
 				goto err_crypto_info;
 			TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSTXSW);
@@ -786,7 +804,8 @@ static int do_tls_setsockopt_conf(struct sock *sk, sockptr_t optval,
 			TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSRXDEVICE);
 			TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSCURRRXDEVICE);
 		} else {
-			rc = tls_set_sw_offload(sk, 0);
+			rc = tls_set_sw_offload(sk, 0,
+						update ? crypto_info : NULL);
 			if (rc)
 				goto err_crypto_info;
 			TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSRXSW);
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 149a39d9a56a..a35b3bfe5b47 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2508,23 +2508,50 @@ void tls_update_rx_zc_capable(struct tls_context *tls_ctx)
 		tls_ctx->prot_info.version != TLS_1_3_VERSION;
 }
 
-int tls_set_sw_offload(struct sock *sk, int tx)
+static void tls_finish_key_update(struct tls_context *tls_ctx)
+{
+	struct tls_sw_context_rx *ctx = tls_ctx->priv_ctx_rx;
+
+	ctx->key_update_pending = false;
+}
+
+int tls_set_sw_offload(struct sock *sk, int tx,
+		       struct tls_crypto_info *new_crypto_info)
 {
 	u16 nonce_size, tag_size, iv_size, rec_seq_size, salt_size;
+	struct tls_crypto_info *crypto_info, *src_crypto_info;
 	char *iv, *rec_seq, *key, *salt, *cipher_name;
 	struct tls_sw_context_tx *sw_ctx_tx = NULL;
 	struct tls_sw_context_rx *sw_ctx_rx = NULL;
 	struct tls_context *ctx = tls_get_ctx(sk);
-	struct tls_crypto_info *crypto_info;
+	size_t keysize, crypto_info_size;
 	struct cipher_context *cctx;
 	struct tls_prot_info *prot;
 	struct crypto_aead **aead;
 	struct crypto_tfm *tfm;
-	size_t keysize;
 	int rc = 0;
 
 	prot = &ctx->prot_info;
 
+	if (new_crypto_info) {
+		/* non-NULL new_crypto_info means rekey */
+		src_crypto_info = new_crypto_info;
+		if (tx) {
+			sw_ctx_tx = ctx->priv_ctx_tx;
+			crypto_info = &ctx->crypto_send.info;
+			cctx = &ctx->tx;
+			aead = &sw_ctx_tx->aead_send;
+			sw_ctx_tx = NULL;
+		} else {
+			sw_ctx_rx = ctx->priv_ctx_rx;
+			crypto_info = &ctx->crypto_recv.info;
+			cctx = &ctx->rx;
+			aead = &sw_ctx_rx->aead_recv;
+			sw_ctx_rx = NULL;
+		}
+		goto skip_init;
+	}
+
 	if (tx) {
 		if (!ctx->priv_ctx_tx) {
 			sw_ctx_tx = kzalloc(sizeof(*sw_ctx_tx), GFP_KERNEL);
@@ -2571,12 +2598,15 @@ int tls_set_sw_offload(struct sock *sk, int tx)
 		aead = &sw_ctx_rx->aead_recv;
 		sw_ctx_rx->key_update_pending = false;
 	}
+	src_crypto_info = crypto_info;
 
+skip_init:
 	switch (crypto_info->cipher_type) {
 	case TLS_CIPHER_AES_GCM_128: {
 		struct tls12_crypto_info_aes_gcm_128 *gcm_128_info;
 
-		gcm_128_info = (void *)crypto_info;
+		crypto_info_size = sizeof(struct tls12_crypto_info_aes_gcm_128);
+		gcm_128_info = (void *)src_crypto_info;
 		nonce_size = TLS_CIPHER_AES_GCM_128_IV_SIZE;
 		tag_size = TLS_CIPHER_AES_GCM_128_TAG_SIZE;
 		iv_size = TLS_CIPHER_AES_GCM_128_IV_SIZE;
@@ -2593,7 +2623,8 @@ int tls_set_sw_offload(struct sock *sk, int tx)
 	case TLS_CIPHER_AES_GCM_256: {
 		struct tls12_crypto_info_aes_gcm_256 *gcm_256_info;
 
-		gcm_256_info = (void *)crypto_info;
+		crypto_info_size = sizeof(struct tls12_crypto_info_aes_gcm_256);
+		gcm_256_info = (void *)src_crypto_info;
 		nonce_size = TLS_CIPHER_AES_GCM_256_IV_SIZE;
 		tag_size = TLS_CIPHER_AES_GCM_256_TAG_SIZE;
 		iv_size = TLS_CIPHER_AES_GCM_256_IV_SIZE;
@@ -2610,7 +2641,8 @@ int tls_set_sw_offload(struct sock *sk, int tx)
 	case TLS_CIPHER_AES_CCM_128: {
 		struct tls12_crypto_info_aes_ccm_128 *ccm_128_info;
 
-		ccm_128_info = (void *)crypto_info;
+		crypto_info_size = sizeof(struct tls12_crypto_info_aes_ccm_128);
+		ccm_128_info = (void *)src_crypto_info;
 		nonce_size = TLS_CIPHER_AES_CCM_128_IV_SIZE;
 		tag_size = TLS_CIPHER_AES_CCM_128_TAG_SIZE;
 		iv_size = TLS_CIPHER_AES_CCM_128_IV_SIZE;
@@ -2627,7 +2659,8 @@ int tls_set_sw_offload(struct sock *sk, int tx)
 	case TLS_CIPHER_CHACHA20_POLY1305: {
 		struct tls12_crypto_info_chacha20_poly1305 *chacha20_poly1305_info;
 
-		chacha20_poly1305_info = (void *)crypto_info;
+		crypto_info_size = sizeof(struct tls12_crypto_info_chacha20_poly1305);
+		chacha20_poly1305_info = (void *)src_crypto_info;
 		nonce_size = 0;
 		tag_size = TLS_CIPHER_CHACHA20_POLY1305_TAG_SIZE;
 		iv_size = TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE;
@@ -2644,7 +2677,8 @@ int tls_set_sw_offload(struct sock *sk, int tx)
 	case TLS_CIPHER_SM4_GCM: {
 		struct tls12_crypto_info_sm4_gcm *sm4_gcm_info;
 
-		sm4_gcm_info = (void *)crypto_info;
+		crypto_info_size = sizeof(struct tls12_crypto_info_sm4_gcm);
+		sm4_gcm_info = (void *)src_crypto_info;
 		nonce_size = TLS_CIPHER_SM4_GCM_IV_SIZE;
 		tag_size = TLS_CIPHER_SM4_GCM_TAG_SIZE;
 		iv_size = TLS_CIPHER_SM4_GCM_IV_SIZE;
@@ -2661,7 +2695,8 @@ int tls_set_sw_offload(struct sock *sk, int tx)
 	case TLS_CIPHER_SM4_CCM: {
 		struct tls12_crypto_info_sm4_ccm *sm4_ccm_info;
 
-		sm4_ccm_info = (void *)crypto_info;
+		crypto_info_size = sizeof(struct tls12_crypto_info_sm4_ccm);
+		sm4_ccm_info = (void *)src_crypto_info;
 		nonce_size = TLS_CIPHER_SM4_CCM_IV_SIZE;
 		tag_size = TLS_CIPHER_SM4_CCM_TAG_SIZE;
 		iv_size = TLS_CIPHER_SM4_CCM_IV_SIZE;
@@ -2678,7 +2713,8 @@ int tls_set_sw_offload(struct sock *sk, int tx)
 	case TLS_CIPHER_ARIA_GCM_128: {
 		struct tls12_crypto_info_aria_gcm_128 *aria_gcm_128_info;
 
-		aria_gcm_128_info = (void *)crypto_info;
+		crypto_info_size = sizeof(struct tls12_crypto_info_aria_gcm_128);
+		aria_gcm_128_info = (void *)src_crypto_info;
 		nonce_size = TLS_CIPHER_ARIA_GCM_128_IV_SIZE;
 		tag_size = TLS_CIPHER_ARIA_GCM_128_TAG_SIZE;
 		iv_size = TLS_CIPHER_ARIA_GCM_128_IV_SIZE;
@@ -2695,7 +2731,8 @@ int tls_set_sw_offload(struct sock *sk, int tx)
 	case TLS_CIPHER_ARIA_GCM_256: {
 		struct tls12_crypto_info_aria_gcm_256 *gcm_256_info;
 
-		gcm_256_info = (void *)crypto_info;
+		crypto_info_size = sizeof(struct tls12_crypto_info_aria_gcm_256);
+		gcm_256_info = (void *)src_crypto_info;
 		nonce_size = TLS_CIPHER_ARIA_GCM_256_IV_SIZE;
 		tag_size = TLS_CIPHER_ARIA_GCM_256_TAG_SIZE;
 		iv_size = TLS_CIPHER_ARIA_GCM_256_IV_SIZE;
@@ -2739,19 +2776,18 @@ int tls_set_sw_offload(struct sock *sk, int tx)
 		prot->tag_size + prot->tail_size;
 	prot->iv_size = iv_size;
 	prot->salt_size = salt_size;
-	cctx->iv = kmalloc(iv_size + salt_size, GFP_KERNEL);
-	if (!cctx->iv) {
-		rc = -ENOMEM;
-		goto free_priv;
-	}
-	/* Note: 128 & 256 bit salt are the same size */
-	prot->rec_seq_size = rec_seq_size;
-	memcpy(cctx->iv, salt, salt_size);
-	memcpy(cctx->iv + salt_size, iv, iv_size);
-	cctx->rec_seq = kmemdup(rec_seq, rec_seq_size, GFP_KERNEL);
-	if (!cctx->rec_seq) {
-		rc = -ENOMEM;
-		goto free_iv;
+	if (!new_crypto_info) {
+		cctx->iv = kmalloc(iv_size + salt_size, GFP_KERNEL);
+		if (!cctx->iv) {
+			rc = -ENOMEM;
+			goto free_priv;
+		}
+
+		cctx->rec_seq = kmemdup(rec_seq, rec_seq_size, GFP_KERNEL);
+		if (!cctx->rec_seq) {
+			rc = -ENOMEM;
+			goto free_iv;
+		}
 	}
 
 	if (!*aead) {
@@ -2765,14 +2801,24 @@ int tls_set_sw_offload(struct sock *sk, int tx)
 
 	ctx->push_pending_record = tls_sw_push_pending_record;
 
+	/* setkey is the last operation that could fail during a
+	 * rekey. if it succeeds, we can start modifying the
+	 * context.
+	 */
 	rc = crypto_aead_setkey(*aead, key, keysize);
+	if (rc) {
+		if (new_crypto_info)
+			goto out;
+		else
+			goto free_aead;
+	}
 
-	if (rc)
-		goto free_aead;
-
-	rc = crypto_aead_setauthsize(*aead, prot->tag_size);
-	if (rc)
-		goto free_aead;
+	if (!new_crypto_info) {
+		rc = crypto_aead_setauthsize(*aead, prot->tag_size);
+		if (rc) {
+			goto free_aead;
+		}
+	}
 
 	if (sw_ctx_rx) {
 		tfm = crypto_aead_tfm(sw_ctx_rx->aead_recv);
@@ -2787,6 +2833,20 @@ int tls_set_sw_offload(struct sock *sk, int tx)
 			goto free_aead;
 	}
 
+	/* Note: 128 & 256 bit salt are the same size */
+	prot->rec_seq_size = rec_seq_size;
+	memcpy(cctx->iv, salt, salt_size);
+	memcpy(cctx->iv + salt_size, iv, iv_size);
+
+	if (new_crypto_info) {
+		memcpy(cctx->rec_seq, rec_seq, rec_seq_size);
+
+		memcpy(crypto_info, new_crypto_info, crypto_info_size);
+		memzero_explicit(new_crypto_info, crypto_info_size);
+		if (!tx)
+			tls_finish_key_update(ctx);
+	}
+
 	goto out;
 
 free_aead:
@@ -2799,12 +2859,14 @@ int tls_set_sw_offload(struct sock *sk, int tx)
 	kfree(cctx->iv);
 	cctx->iv = NULL;
 free_priv:
-	if (tx) {
-		kfree(ctx->priv_ctx_tx);
-		ctx->priv_ctx_tx = NULL;
-	} else {
-		kfree(ctx->priv_ctx_rx);
-		ctx->priv_ctx_rx = NULL;
+	if (!new_crypto_info) {
+		if (tx) {
+			kfree(ctx->priv_ctx_tx);
+			ctx->priv_ctx_tx = NULL;
+		} else {
+			kfree(ctx->priv_ctx_rx);
+			ctx->priv_ctx_rx = NULL;
+		}
 	}
 out:
 	return rc;
This allows us to generate different keys, so that we can test that rekey is using the correct one.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
 tools/testing/selftests/net/tls.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index 2cbb12736596..5f3adb28fee1 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -38,9 +38,11 @@ struct tls_crypto_info_keys {
 };
 
 static void tls_crypto_info_init(uint16_t tls_version, uint16_t cipher_type,
-				 struct tls_crypto_info_keys *tls12)
+				 struct tls_crypto_info_keys *tls12,
+				 char key_generation)
 {
-	memset(tls12, 0, sizeof(*tls12));
+	memset(tls12, key_generation, sizeof(*tls12));
+	memset(tls12, 0, sizeof(struct tls_crypto_info));
 
 	switch (cipher_type) {
 	case TLS_CIPHER_CHACHA20_POLY1305:
@@ -312,7 +314,7 @@ FIXTURE_SETUP(tls)
 	int ret;
 
 	tls_crypto_info_init(variant->tls_version, variant->cipher_type,
-			     &tls12);
+			     &tls12, 0);
 
 	ulp_sock_pair(_metadata, &self->fd, &self->cfd, &self->notls);
 
@@ -1071,7 +1073,7 @@ TEST_F(tls, bidir)
 		struct tls_crypto_info_keys tls12;
 
 		tls_crypto_info_init(variant->tls_version, variant->cipher_type,
-				     &tls12);
+				     &tls12, 0);
 
 		ret = setsockopt(self->fd, SOL_TLS, TLS_RX, &tls12, tls12.len);
@@ -1479,7 +1481,7 @@ FIXTURE_SETUP(tls_err)
 	int ret;
 
 	tls_crypto_info_init(variant->tls_version, TLS_CIPHER_AES_GCM_128,
-			     &tls12);
+			     &tls12, 0);
 
 	ulp_sock_pair(_metadata, &self->fd, &self->cfd, &self->notls);
 	ulp_sock_pair(_metadata, &self->fd2, &self->cfd2, &self->notls);
@@ -1771,7 +1773,7 @@ TEST(tls_v6ops) {
 	int sfd, ret, fd;
 	socklen_t len, len2;
 
-	tls_crypto_info_init(TLS_1_2_VERSION, TLS_CIPHER_AES_GCM_128, &tls12);
+	tls_crypto_info_init(TLS_1_2_VERSION, TLS_CIPHER_AES_GCM_128, &tls12, 0);
 
 	addr.sin6_family = AF_INET6;
 	addr.sin6_addr = in6addr_any;
v2: add rekey_fail test (reject changing the version/cipher)
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
 tools/testing/selftests/net/tls.c | 322 ++++++++++++++++++++++++++++++
 1 file changed, 322 insertions(+)
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index 5f3adb28fee1..098d52859521 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -1453,6 +1453,328 @@ TEST_F(tls, shutdown_reuse)
 	EXPECT_EQ(errno, EISCONN);
 }
 
+#define TLS_RECORD_TYPE_HANDSHAKE	0x16
+/* key_update, length 1, update_not_requested */
+static const char key_update_msg[] = "\x18\x00\x00\x01\x00";
+static void tls_send_keyupdate(struct __test_metadata *_metadata, int fd)
+{
+	size_t len = sizeof(key_update_msg);
+
+	EXPECT_EQ(tls_send_cmsg(fd, TLS_RECORD_TYPE_HANDSHAKE,
+				(char *)key_update_msg, len, 0),
+		  len);
+}
+
+static void tls_recv_keyupdate(struct __test_metadata *_metadata, int fd, int flags)
+{
+	char buf[100];
+
+	EXPECT_EQ(tls_recv_cmsg(_metadata, fd, TLS_RECORD_TYPE_HANDSHAKE, buf, sizeof(buf), flags),
+		  sizeof(key_update_msg));
+	EXPECT_EQ(memcmp(buf, key_update_msg, sizeof(key_update_msg)), 0);
+}
+
+/* set the key to 0 then 1 for RX, immediately to 1 for TX */
+TEST_F(tls_basic, rekey_rx)
+{
+	struct tls_crypto_info_keys tls12_0, tls12_1;
+	char const *test_str = "test_message";
+	int send_len = strlen(test_str) + 1;
+	char buf[20];
+	int ret;
+
+	if (self->notls)
+		return;
+
+	tls_crypto_info_init(TLS_1_3_VERSION, TLS_CIPHER_AES_GCM_128,
+			     &tls12_0, 0);
+	tls_crypto_info_init(TLS_1_3_VERSION, TLS_CIPHER_AES_GCM_128,
+			     &tls12_1, 1);
+
+	ret = setsockopt(self->fd, SOL_TLS, TLS_TX, &tls12_1, tls12_1.len);
+	ASSERT_EQ(ret, 0);
+
+	ret = setsockopt(self->cfd, SOL_TLS, TLS_RX, &tls12_0, tls12_0.len);
+	ASSERT_EQ(ret, 0);
+
+	ret = setsockopt(self->cfd, SOL_TLS, TLS_RX, &tls12_1, tls12_1.len);
+	EXPECT_EQ(ret, 0);
+
+	EXPECT_EQ(send(self->fd, test_str, send_len, 0), send_len);
+	EXPECT_EQ(recv(self->cfd, buf, send_len, 0), send_len);
+	EXPECT_EQ(memcmp(buf, test_str, send_len), 0);
+}
+
+/* set the key to 0 then 1 for TX, immediately to 1 for RX */
+TEST_F(tls_basic, rekey_tx)
+{
+	struct tls_crypto_info_keys tls12_0, tls12_1;
+	char const *test_str = "test_message";
+	int send_len = strlen(test_str) + 1;
+	char buf[20];
+	int ret;
+
+	if (self->notls)
+		return;
+
+	tls_crypto_info_init(TLS_1_3_VERSION, TLS_CIPHER_AES_GCM_128,
+			     &tls12_0, 0);
+	tls_crypto_info_init(TLS_1_3_VERSION, TLS_CIPHER_AES_GCM_128,
+			     &tls12_1, 1);
+
+	ret = setsockopt(self->fd, SOL_TLS, TLS_TX, &tls12_0, tls12_0.len);
+	ASSERT_EQ(ret, 0);
+
+	ret = setsockopt(self->cfd, SOL_TLS, TLS_RX, &tls12_1, tls12_1.len);
+	ASSERT_EQ(ret, 0);
+
+	ret = setsockopt(self->fd, SOL_TLS, TLS_TX, &tls12_1, tls12_1.len);
+	EXPECT_EQ(ret, 0);
+
+	EXPECT_EQ(send(self->fd, test_str, send_len, 0), send_len);
+	EXPECT_EQ(recv(self->cfd, buf, send_len, 0), send_len);
+	EXPECT_EQ(memcmp(buf, test_str, send_len), 0);
+}
+
+TEST_F(tls, rekey)
+{
+	char const *test_str_1 = "test_message_before_rekey";
+	char const *test_str_2 = "test_message_after_rekey";
+	struct tls_crypto_info_keys tls12;
+	int send_len;
+	char buf[100];
+
+	if (variant->tls_version != TLS_1_3_VERSION)
+		return;
+
+	/* initial send/recv */
+	send_len = strlen(test_str_1) + 1;
+	EXPECT_EQ(send(self->fd, test_str_1, send_len, 0), send_len);
+	EXPECT_EQ(recv(self->cfd, buf, send_len, 0), send_len);
+	EXPECT_EQ(memcmp(buf, test_str_1, send_len), 0);
+
+	/* update TX key */
+	tls_send_keyupdate(_metadata, self->fd);
+	tls_crypto_info_init(variant->tls_version, variant->cipher_type, &tls12, 1);
+	EXPECT_EQ(setsockopt(self->fd, SOL_TLS, TLS_TX, &tls12, tls12.len), 0);
+
+	/* send after rekey */
+	send_len = strlen(test_str_2) + 1;
+	EXPECT_EQ(send(self->fd, test_str_2, send_len, 0), send_len);
+
+	/* can't receive the KeyUpdate without a control message */
+	EXPECT_EQ(recv(self->cfd, buf, send_len, 0), -1);
+
+	/* get KeyUpdate */
+	tls_recv_keyupdate(_metadata, self->cfd, 0);
+
+	/* recv blocking -> -EKEYEXPIRED */
+	EXPECT_EQ(recv(self->cfd, buf, sizeof(buf), 0), -1);
+	EXPECT_EQ(errno, EKEYEXPIRED);
+
+	/* recv non-blocking -> -EKEYEXPIRED */
+	EXPECT_EQ(recv(self->cfd, buf, sizeof(buf), MSG_DONTWAIT), -1);
+	EXPECT_EQ(errno, EKEYEXPIRED);
+
+	/* update RX key */
+	EXPECT_EQ(setsockopt(self->cfd, SOL_TLS, TLS_RX, &tls12, tls12.len), 0);
+
+	/* recv after rekey */
+	EXPECT_NE(recv(self->cfd, buf, send_len, 0), -1);
+	EXPECT_EQ(memcmp(buf, test_str_2, send_len), 0);
+}
+
+TEST_F(tls, rekey_fail)
+{
+	char const *test_str_1 = "test_message_before_rekey";
+	char const *test_str_2 = "test_message_after_rekey";
+	struct tls_crypto_info_keys tls12;
+	int send_len;
+	char buf[100];
+
+	if (variant->tls_version != TLS_1_3_VERSION)
+		return;
+
+	/* initial send/recv */
+	send_len = strlen(test_str_1) + 1;
+	EXPECT_EQ(send(self->fd, test_str_1, send_len, 0), send_len);
+	EXPECT_EQ(recv(self->cfd, buf, send_len, 0), send_len);
+	EXPECT_EQ(memcmp(buf, test_str_1, send_len), 0);
+
+	/* update TX key */
+	tls_send_keyupdate(_metadata, self->fd);
+
+	/* successful update */
+	tls_crypto_info_init(variant->tls_version, variant->cipher_type, &tls12, 1);
+	EXPECT_EQ(setsockopt(self->fd, SOL_TLS, TLS_TX, &tls12, tls12.len), 0);
+
+	/* invalid update: change of version */
+	tls_crypto_info_init(TLS_1_2_VERSION, variant->cipher_type, &tls12, 1);
+	EXPECT_EQ(setsockopt(self->fd, SOL_TLS, TLS_TX, &tls12, tls12.len), -1);
+	EXPECT_EQ(errno, EINVAL);
+
+	/* invalid update: change of cipher */
+	if (variant->cipher_type == TLS_CIPHER_AES_GCM_256)
+		tls_crypto_info_init(variant->tls_version, TLS_CIPHER_CHACHA20_POLY1305, &tls12, 1);
+	else
+		tls_crypto_info_init(variant->tls_version, TLS_CIPHER_AES_GCM_256, &tls12, 1);
+	EXPECT_EQ(setsockopt(self->fd, SOL_TLS, TLS_TX, &tls12, tls12.len), -1);
+	EXPECT_EQ(errno, EINVAL);
+
+	/* send after rekey, the invalid updates shouldn't have an effect */
+	send_len = strlen(test_str_2) + 1;
+	EXPECT_EQ(send(self->fd, test_str_2, send_len, 0), send_len);
+
+	/* can't receive the KeyUpdate without a control message */
+	EXPECT_EQ(recv(self->cfd, buf, send_len, 0), -1);
+
+	/* get KeyUpdate */
+	tls_recv_keyupdate(_metadata, self->cfd, 0);
+
+	/* recv blocking -> -EKEYEXPIRED */
+	EXPECT_EQ(recv(self->cfd, buf, sizeof(buf), 0), -1);
+	EXPECT_EQ(errno, EKEYEXPIRED);
+
+	/* recv non-blocking -> -EKEYEXPIRED */
+	EXPECT_EQ(recv(self->cfd, buf, sizeof(buf), MSG_DONTWAIT), -1);
+	EXPECT_EQ(errno, EKEYEXPIRED);
+
+	/* update RX key */
+	tls_crypto_info_init(variant->tls_version, variant->cipher_type, &tls12, 1);
+	EXPECT_EQ(setsockopt(self->cfd, SOL_TLS, TLS_RX, &tls12, tls12.len), 0);
+
+	/* recv after rekey */
+	EXPECT_NE(recv(self->cfd, buf, send_len, 0), -1);
+	EXPECT_EQ(memcmp(buf, test_str_2, send_len), 0);
+}
+
+TEST_F(tls, rekey_peek)
+{
+	char const *test_str_1 = "test_message_before_rekey";
+	struct tls_crypto_info_keys tls12;
+	int send_len;
+	char buf[100];
+
+	if (variant->tls_version != TLS_1_3_VERSION)
+		return;
+
+	send_len = strlen(test_str_1) + 1;
+	EXPECT_EQ(send(self->fd, test_str_1, send_len, 0), send_len);
+
+	/* update TX key */
+	tls_send_keyupdate(_metadata, self->fd);
+	tls_crypto_info_init(variant->tls_version, variant->cipher_type, &tls12, 1);
+	EXPECT_EQ(setsockopt(self->fd, SOL_TLS, TLS_TX, &tls12, tls12.len), 0);
+
+	EXPECT_EQ(recv(self->cfd, buf, sizeof(buf), MSG_PEEK), send_len);
+	EXPECT_EQ(memcmp(buf, test_str_1, send_len), 0);
+
+	EXPECT_EQ(recv(self->cfd, buf, send_len, 0), send_len);
+	EXPECT_EQ(memcmp(buf, test_str_1, send_len), 0);
+
+	/* can't receive the KeyUpdate without a control message */
+	EXPECT_EQ(recv(self->cfd, buf, send_len, MSG_PEEK), -1);
+
+	/* peek KeyUpdate */
+	tls_recv_keyupdate(_metadata, self->cfd, MSG_PEEK);
+
+	/* get KeyUpdate */
+	tls_recv_keyupdate(_metadata, self->cfd, 0);
+
+	/* update RX key */
+	EXPECT_EQ(setsockopt(self->cfd, SOL_TLS, TLS_RX, &tls12, tls12.len), 0);
+}
+
+TEST_F(tls, splice_rekey)
+{
+	int send_len = TLS_PAYLOAD_MAX_LEN / 2;
+	char mem_send[TLS_PAYLOAD_MAX_LEN];
+	char mem_recv[TLS_PAYLOAD_MAX_LEN];
+	struct tls_crypto_info_keys tls12;
+	int p[2];
+
+	if (variant->tls_version != TLS_1_3_VERSION)
+		return;
+
+	memrnd(mem_send, sizeof(mem_send));
+
+	ASSERT_GE(pipe(p), 0);
+	EXPECT_EQ(send(self->fd, mem_send, send_len, 0), send_len);
+
+	/* update TX key */
+	tls_send_keyupdate(_metadata, self->fd);
+	tls_crypto_info_init(variant->tls_version, variant->cipher_type, &tls12, 1);
+	EXPECT_EQ(setsockopt(self->fd, SOL_TLS, TLS_TX, &tls12, tls12.len), 0);
+
+	EXPECT_EQ(send(self->fd, mem_send, send_len, 0), send_len);
+
+	EXPECT_EQ(splice(self->cfd, NULL, p[1], NULL, TLS_PAYLOAD_MAX_LEN, 0), send_len);
+	EXPECT_EQ(read(p[0], mem_recv, send_len), send_len);
+	EXPECT_EQ(memcmp(mem_send, mem_recv, send_len), 0);
+
+	/* can't splice the KeyUpdate */
+	EXPECT_EQ(splice(self->cfd, NULL, p[1], NULL, TLS_PAYLOAD_MAX_LEN, 0), -1);
+	EXPECT_EQ(errno, EINVAL);
+
+	/* peek KeyUpdate */
+	tls_recv_keyupdate(_metadata, self->cfd, MSG_PEEK);
+
+	/* get KeyUpdate */
+	tls_recv_keyupdate(_metadata, self->cfd, 0);
+
+	/* can't splice before updating the key */
+	EXPECT_EQ(splice(self->cfd, NULL, p[1], NULL, TLS_PAYLOAD_MAX_LEN, 0), -1);
+	EXPECT_EQ(errno, EKEYEXPIRED);
+
+	/* update RX key */
+	EXPECT_EQ(setsockopt(self->cfd, SOL_TLS, TLS_RX, &tls12, tls12.len), 0);
+
+	EXPECT_EQ(splice(self->cfd, NULL, p[1], NULL, TLS_PAYLOAD_MAX_LEN, 0), send_len);
+	EXPECT_EQ(read(p[0], mem_recv, send_len), send_len);
+	EXPECT_EQ(memcmp(mem_send, mem_recv, send_len), 0);
+}
+
+TEST_F(tls, rekey_getsockopt)
+{
+	struct tls_crypto_info_keys tls12;
+	struct tls_crypto_info_keys tls12_get;
+	socklen_t len;
+
+	tls_crypto_info_init(variant->tls_version, variant->cipher_type, &tls12, 0);
+
+	len = tls12.len;
+	EXPECT_EQ(getsockopt(self->fd, SOL_TLS, TLS_TX, &tls12_get, &len), 0);
+	EXPECT_EQ(len, tls12.len);
+	EXPECT_EQ(memcmp(&tls12_get, &tls12, tls12.len), 0);
+
+	len = tls12.len;
+	EXPECT_EQ(getsockopt(self->cfd, SOL_TLS, TLS_RX, &tls12_get, &len), 0);
+	EXPECT_EQ(len, tls12.len);
+	EXPECT_EQ(memcmp(&tls12_get, &tls12, tls12.len), 0);
+
+	if (variant->tls_version != TLS_1_3_VERSION)
+		return;
+
+	tls_send_keyupdate(_metadata, self->fd);
+	tls_crypto_info_init(variant->tls_version, variant->cipher_type, &tls12, 1);
+	EXPECT_EQ(setsockopt(self->fd, SOL_TLS, TLS_TX, &tls12, tls12.len), 0);
+
+	tls_recv_keyupdate(_metadata, self->cfd, 0);
+	EXPECT_EQ(setsockopt(self->cfd, SOL_TLS, TLS_RX, &tls12, tls12.len), 0);
+
+	len = tls12.len;
+	EXPECT_EQ(getsockopt(self->fd, SOL_TLS, TLS_TX, &tls12_get, &len), 0);
+	EXPECT_EQ(len, tls12.len);
+	EXPECT_EQ(memcmp(&tls12_get, &tls12, tls12.len), 0);
+
+	len = tls12.len;
+	EXPECT_EQ(getsockopt(self->cfd, SOL_TLS, TLS_RX, &tls12_get, &len), 0);
+	EXPECT_EQ(len, tls12.len);
+	EXPECT_EQ(memcmp(&tls12_get, &tls12, tls12.len), 0);
+}
+
 FIXTURE(tls_err)
 {
 	int fd, cfd;
On Tue, 14 Feb 2023 12:17:37 +0100 Sabrina Dubroca wrote:
> Changes in v2 use reverse xmas tree ordering in tls_set_sw_offload and do_tls_setsockopt_conf turn the alt_crypto_info into an else if selftests: add rekey_fail test
> Vadim suggested simplifying tls_set_sw_offload by copying the new crypto_info in the context in do_tls_setsockopt_conf, and then detecting the rekey in tls_set_sw_offload based on whether the iv was already set, but I don't think we can have a common error path (otherwise we'd free the aead etc on rekey failure). I decided instead to reorganize tls_set_sw_offload so that the context is unmodified until we know the rekey cannot fail. Some fields will be touched during the rekey, but they will be set to the same value they had before the rekey (prot->rec_seq_size, etc).
> Apoorv suggested to name the struct tls_crypto_info_keys "tls13" rather than "tls12". Since we're using the same crypto_info data for TLS1.3 as for 1.2, even if the tests only run for TLS1.3, I'd rather keep the "tls12" name, in case we end up adding a "tls13_crypto_info_aes_gcm_128" type in the future.
> Kuniyuki and Apoorv also suggested preventing rekeys on RX when we haven't received a matching KeyUpdate message, but I'd rather let userspace handle this and have a symmetric API between TX and RX on the kernel side. It's a bit of a foot-gun, but we can't really stop a broken userspace from rolling back the rec_seq on an existing crypto_info either, and that seems like a worse possible breakage.
And how will we handle re-keying in offload?
> include/net/tls.h | 4 + net/tls/tls.h | 3 +- net/tls/tls_device.c | 2 +- net/tls/tls_main.c | 37 +++- net/tls/tls_sw.c | 189 +++++++++++++---- tools/testing/selftests/net/tls.c | 336 +++++++++++++++++++++++++++++-
Documentation please.
2023-02-14, 21:08:11 -0800, Jakub Kicinski wrote:
> On Tue, 14 Feb 2023 12:17:37 +0100 Sabrina Dubroca wrote:
> > Changes in v2 use reverse xmas tree ordering in tls_set_sw_offload and do_tls_setsockopt_conf turn the alt_crypto_info into an else if selftests: add rekey_fail test
> > Vadim suggested simplifying tls_set_sw_offload by copying the new crypto_info in the context in do_tls_setsockopt_conf, and then detecting the rekey in tls_set_sw_offload based on whether the iv was already set, but I don't think we can have a common error path (otherwise we'd free the aead etc on rekey failure). I decided instead to reorganize tls_set_sw_offload so that the context is unmodified until we know the rekey cannot fail. Some fields will be touched during the rekey, but they will be set to the same value they had before the rekey (prot->rec_seq_size, etc).
> > Apoorv suggested to name the struct tls_crypto_info_keys "tls13" rather than "tls12". Since we're using the same crypto_info data for TLS1.3 as for 1.2, even if the tests only run for TLS1.3, I'd rather keep the "tls12" name, in case we end up adding a "tls13_crypto_info_aes_gcm_128" type in the future.
> > Kuniyuki and Apoorv also suggested preventing rekeys on RX when we haven't received a matching KeyUpdate message, but I'd rather let userspace handle this and have a symmetric API between TX and RX on the kernel side. It's a bit of a foot-gun, but we can't really stop a broken userspace from rolling back the rec_seq on an existing crypto_info either, and that seems like a worse possible breakage.
> And how will we handle re-keying in offload?
Sorry for the stupid question... do you mean that I need to solve that problem before this series can progress, or that the cover letter should summarize the state of the discussion?
> > include/net/tls.h | 4 + net/tls/tls.h | 3 +- net/tls/tls_device.c | 2 +- net/tls/tls_main.c | 37 +++- net/tls/tls_sw.c | 189 +++++++++++++---- tools/testing/selftests/net/tls.c | 336 +++++++++++++++++++++++++++++-
> Documentation please.
Ok, I'll add a paragraph to Documentation/networking/tls.rst in the next version.
On Wed, 15 Feb 2023 18:29:50 +0100 Sabrina Dubroca wrote:
> > And how will we handle re-keying in offload?
> Sorry for the stupid question... do you mean that I need to solve that problem before this series can progress, or that the cover letter should summarize the state of the discussion?
I maintain that 1.3 offload is much more important than rekeying. Offloads being available for 1.2 may be stalling adoption of 1.3 (just a guess, I ran across this article mentioning 1.2 being used in Oracle cloud for instance: https://blogs.oracle.com/cloudsecurity/post/how-oci-helps-you-protect-data-w... could be because of MITM requirements, or maybe they have HW which can only do 1.2? Dunno).
But I'm willing to compromise, we just need a solid plan of how to handle the inevitable. I'm worried that how this will play out is:
- you don't care about offload and add rekey
- vendors don't care about rekey and add 1.3
... time passes ...
- both you and the vendors have moved on
- users run into issues, waste their time debugging and eventually report the problem upstream
- it's on me to fix?
:(
2023-02-15, 11:10:20 -0800, Jakub Kicinski wrote:
> On Wed, 15 Feb 2023 18:29:50 +0100 Sabrina Dubroca wrote:
> > > And how will we handle re-keying in offload?
> > Sorry for the stupid question... do you mean that I need to solve that problem before this series can progress, or that the cover letter should summarize the state of the discussion?
> I maintain that 1.3 offload is much more important than rekeying.
Sure, it'd be great if we had 1.3 offload (and it should also support rekeying, because you can't force the peer to not rekey). But I can't implement offload.
> Offloads being available for 1.2 may be stalling adoption of 1.3 (just a guess, I ran across this article mentioning 1.2 being used in Oracle cloud for instance: https://blogs.oracle.com/cloudsecurity/post/how-oci-helps-you-protect-data-w... could be because of MITM requirements, or maybe they have HW which can only do 1.2? Dunno).
> But I'm willing to compromise, we just need a solid plan of how to handle the inevitable. I'm worried that how this will play out is:
> - you don't care about offload and add rekey
I think that's a bit unfair. Not having to deal with offload at all would make things easier for me, sure, but I'm open to the discussion, even if I don't have a good understanding of the offloading side.
I'd just like to avoid holding this feature (arguably a bug fix) until the vendors finally decide that they care about 1.3, if possible. If not, so be it.
I wasn't trying to force you to accept this series. Sorry if that's what it sounded like. I really wanted to understand what you were asking for, because your question wasn't clear to me. Now it makes sense.
> - vendors don't care about rekey and add 1.3
> ... time passes ...
> - both you and the vendors have moved on
> - users run into issues, waste their time debugging and eventually report the problem upstream
> - it's on me to fix?
> :(
Yeah, I see. If the rekey already exists in SW, I think it'll be a bit harder for them to just not care about it, but maybe I'm being optimistic.
I'm not sure we can come up with the correct uAPI/rekey design without trying to implement rekey with offload and seeing how that blows up (and possibly in different ways with different devices).
Picking up from where the discussion died off in the previous thread:
On transmit, I think the software fallback for retransmits will be needed, whether we can keep two generations of keys on the device or just one. We could have 2 consecutive rekeys, without even worrying about a broken peer spamming key updates for both sides (or the local user's library doing that). If devices can juggle 3 generations of keys, then maybe we don't have to worry too much about software fallback, but we'll need to define an API to set the extra keys ahead of time and then advance to the next one. Will all devices support installing 2 or 3 keys?
On receive, we also have the problem of more than one rekey arriving, so if we can't feed enough keys to the device in advance, we'll have to decrypt some records in software. The host will have to survive the burst of software decryption while we wait until the device config catches up.
One option might be to do the key derivation in the kernel following section 7.2 of the RFC [1]. I don't know how happy crypto/security people would be with that. We'd have to introduce new crypto_info structs, and new cipher types (or a flag in the upper bits of the cipher type) to go with them. Then the kernel processes incoming key update messages on its own, and emits its own key update messages when its current key is expiring. On transmit we also need to inject a Finished message before the KeyUpdate [2]. That's bringing a lot of TLS logic in the kernel. At that point we might as well do the whole handshake... but I really hope it doesn't come to that.
It's hard, but I don't think "let's just kill the connection" is a nice solution. It's definitely easier for the kernel.
[1] https://www.rfc-editor.org/rfc/rfc8446#section-7.2 [2] https://www.rfc-editor.org/rfc/rfc8446#section-4.6.3
On Thu, 16 Feb 2023 00:23:11 +0100 Sabrina Dubroca wrote:
> > Offloads being available for 1.2 may be stalling adoption of 1.3 (just a guess, I ran across this article mentioning 1.2 being used in Oracle cloud for instance: https://blogs.oracle.com/cloudsecurity/post/how-oci-helps-you-protect-data-w... could be because of MITM requirements, or maybe they have HW which can only do 1.2? Dunno).
> > But I'm willing to compromise, we just need a solid plan of how to handle the inevitable. I'm worried that how this will play out is:
> > - you don't care about offload and add rekey
> I think that's a bit unfair. Not having to deal with offload at all would make things easier for me, sure, but I'm open to the discussion, even if I don't have a good understanding of the offloading side.
> I'd just like to avoid holding this feature (arguably a bug fix) until the vendors finally decide that they care about 1.3, if possible. If not, so be it.
> I wasn't trying to force you to accept this series. Sorry if that's what it sounded like. I really wanted to understand what you were asking for, because your question wasn't clear to me. Now it makes sense.
> > - vendors don't care about rekey and add 1.3
> > ... time passes ...
> > - both you and the vendors have moved on
> > - users run into issues, waste their time debugging and eventually report the problem upstream
> > - it's on me to fix?
> > :(
> Yeah, I see. If the rekey already exists in SW, I think it'll be a bit harder for them to just not care about it, but maybe I'm being optimistic.
True, they may try to weasel out / require some pushing and support. Depends on which vendor gets to it first, I guess.
> I'm not sure we can come up with the correct uAPI/rekey design without trying to implement rekey with offload and seeing how that blows up (and possibly in different ways with different devices).
Yes, best we can do now is have a plan in place... and your promise of future help? :) (incl. being on the lookout for when the patches come because I'll probably forget)
> Picking up from where the discussion died off in the previous thread:
> On transmit, I think the software fallback for retransmits will be needed, whether we can keep two generations of keys on the device or just one. We could have 2 consecutive rekeys, without even worrying about a broken peer spamming key updates for both sides (or the local user's library doing that). If devices can juggle 3 generations of keys, then maybe we don't have to worry too much about software fallback, but we'll need to define an API to set the extra keys ahead of time and then advance to the next one. Will all devices support installing 2 or 3 keys?
I think we could try to switch to SW crypto on Tx until all data using old key is ACK'ed, drivers can look at skb->decrypted to skip touching the transitional skbs. Then remove old key, install new one, resume offload.
We may need special care to make sure we don't try to encrypt the same packet with both keys. In case a rtx gets stuck somewhere and comes to the NIC after it's already acked (happens surprisingly often).
Multiple keys on the device would probably mean the device needs some intelligence to know when to use which - not my first choice.
> On receive, we also have the problem of more than one rekey arriving, so if we can't feed enough keys to the device in advance, we'll have to decrypt some records in software. The host will have to survive the burst of software decryption while we wait until the device config catches up.
I think receive is easier. The fallback is quite effective and already in place. Here too we may want to enforce some transitional SW-only mode to avoid the (highly unlikely?) case that NIC will decrypt successfully a packet with the old key, even tho new key should be used. Carrying "key ID" with the skb is probably an overkill.
> One option might be to do the key derivation in the kernel following section 7.2 of the RFC [1]. I don't know how happy crypto/security people would be with that. We'd have to introduce new crypto_info structs, and new cipher types (or a flag in the upper bits of the cipher type) to go with them. Then the kernel processes incoming key update messages on its own, and emits its own key update messages when its current key is expiring. On transmit we also need to inject a Finished message before the KeyUpdate [2]. That's bringing a lot of TLS logic in the kernel. At that point we might as well do the whole handshake... but I really hope it doesn't come to that.
I think it's mostly a device vs host state sharing problem, so TLS ULP or user space - not a big difference, both are on the host.
2023-02-15, 19:57:48 -0800, Jakub Kicinski wrote:
> On Thu, 16 Feb 2023 00:23:11 +0100 Sabrina Dubroca wrote:
> > > Offloads being available for 1.2 may be stalling adoption of 1.3 (just a guess, I ran across this article mentioning 1.2 being used in Oracle cloud for instance: https://blogs.oracle.com/cloudsecurity/post/how-oci-helps-you-protect-data-w... could be because of MITM requirements, or maybe they have HW which can only do 1.2? Dunno).
> > > But I'm willing to compromise, we just need a solid plan of how to handle the inevitable. I'm worried that how this will play out is:
> > > - you don't care about offload and add rekey
> > I think that's a bit unfair. Not having to deal with offload at all would make things easier for me, sure, but I'm open to the discussion, even if I don't have a good understanding of the offloading side.
> > I'd just like to avoid holding this feature (arguably a bug fix) until the vendors finally decide that they care about 1.3, if possible. If not, so be it.
> > I wasn't trying to force you to accept this series. Sorry if that's what it sounded like. I really wanted to understand what you were asking for, because your question wasn't clear to me. Now it makes sense.
> > > - vendors don't care about rekey and add 1.3
> > > ... time passes ...
> > > - both you and the vendors have moved on
> > > - users run into issues, waste their time debugging and eventually report the problem upstream
> > > - it's on me to fix?
> > > :(
> > Yeah, I see. If the rekey already exists in SW, I think it'll be a bit harder for them to just not care about it, but maybe I'm being optimistic.
> True, they may try to weasel out / require some pushing and support. Depends on which vendor gets to it first, I guess.
> > I'm not sure we can come up with the correct uAPI/rekey design without trying to implement rekey with offload and seeing how that blows up (and possibly in different ways with different devices).
> Yes, best we can do now is have a plan in place... and your promise of future help? :) (incl. being on the lookout for when the patches come because I'll probably forget)
I'll try to help. None of us can guarantee that we'll be around :)
> > Picking up from where the discussion died off in the previous thread:
> > On transmit, I think the software fallback for retransmits will be needed, whether we can keep two generations of keys on the device or just one. We could have 2 consecutive rekeys, without even worrying about a broken peer spamming key updates for both sides (or the local user's library doing that). If devices can juggle 3 generations of keys, then maybe we don't have to worry too much about software fallback, but we'll need to define an API to set the extra keys ahead of time and then advance to the next one. Will all devices support installing 2 or 3 keys?
> I think we could try to switch to SW crypto on Tx until all data using old key is ACK'ed, drivers can look at skb->decrypted to skip touching the transitional skbs. Then remove old key, install new one, resume offload.
"all data using the old key" needs to be one list of record per old key, since we can have multiple rekeys.
Could we install the new key in HW a bit earlier? Keep the old key as SW fallback for rtx, but the driver installs the new key when the corresponding KeyUpdate record has gone through and tells the stack to stop doing SW crypto? I'm not sure that'd be a significant improvement in the standard case, though.
> We may need special care to make sure we don't try to encrypt the same packet with both keys. In case a rtx gets stuck somewhere and comes to the NIC after it's already acked (happens surprisingly often).
Don't we have that already? If there's a retransmit while we're setting the TX key in HW, data that was queued on the socket before (and shouldn't be encrypted at all) would also be encrypted otherwise. Or is it different with rekey?
> Multiple keys on the device would probably mean the device needs some intelligence to know when to use which - not my first choice.
I only mentioned that because Boris talked about it in the previous thread. It's also not my first choice because I doubt all devices will support it the same way.
> > On receive, we also have the problem of more than one rekey arriving, so if we can't feed enough keys to the device in advance, we'll have to decrypt some records in software. The host will have to survive the burst of software decryption while we wait until the device config catches up.
> I think receive is easier. The fallback is quite effective and already in place.
I was thinking about a peer sending at line rate just after the rekey, and we have to handle that in software for a short while. If the peer is also dropping to a software fallback, I guess there's no issue, both sides will slow down.
> Here too we may want to enforce some transitional SW-only mode to avoid the (highly unlikely?) case that NIC will decrypt successfully a packet with the old key, even tho new key should be used.
Not a crypto expert, but I don't think that's a realistic thing to worry about.
If the NIC uses the old key to decrypt a record that was encrypted with the new key and fails, we end up seeing the encrypted record and decrypting it in SW, right? (and then we can pick the correct key in the SW fallback) We don't just tear down the connection?
> Carrying "key ID" with the skb is probably an overkill.
I think we can retrieve it from the sequence of records. When we see a KeyUpdate, we advance the "key ID" after we're done with the message.
This makes me wonder again if we should have fake offloads on veth (still calling the kernel's crypto library to simulate a device doing the encryption and/or decryption), to make it easy to play with the software bits, without requiring actual hardware that can offload TLS/IPsec/MACsec. But maybe it's too complex to be useful and we'd just waste our time fixing bugs in the fake offload rather than improving the stack.
> > One option might be to do the key derivation in the kernel following section 7.2 of the RFC [1]. I don't know how happy crypto/security people would be with that. We'd have to introduce new crypto_info structs, and new cipher types (or a flag in the upper bits of the cipher type) to go with them. Then the kernel processes incoming key update messages on its own, and emits its own key update messages when its current key is expiring. On transmit we also need to inject a Finished message before the KeyUpdate [2]. That's bringing a lot of TLS logic in the kernel. At that point we might as well do the whole handshake... but I really hope it doesn't come to that.
> I think it's mostly a device vs host state sharing problem, so TLS ULP or user space - not a big difference, both are on the host.
Maybe. But that's one more implementation of TLS (or large parts of it) that admins and security teams have to worry about. Maintained by networking people. I really want to avoid going down that route.
I'll try to dig more into what you're proposing and to poke some holes into it over the next few days.
Sorry for the delay, long weekend + merge window.
On Thu, 16 Feb 2023 17:23:19 +0100 Sabrina Dubroca wrote:
> 2023-02-15, 19:57:48 -0800, Jakub Kicinski wrote:
> > On Thu, 16 Feb 2023 00:23:11 +0100 Sabrina Dubroca wrote:
> > > I'm not sure we can come up with the correct uAPI/rekey design without trying to implement rekey with offload and seeing how that blows up (and possibly in different ways with different devices).
> > Yes, best we can do now is have a plan in place... and your promise of future help? :) (incl. being on the lookout for when the patches come because I'll probably forget)
I'll try to help. None of us can guarantee that we'll be around :)
Right.
> > I think we could try to switch to SW crypto on Tx until all data using old key is ACK'ed, drivers can look at skb->decrypted to skip touching the transitional skbs. Then remove old key, install new one, resume offload.
> "all data using the old key" needs to be one list of record per old key, since we can have multiple rekeys.
Not fully parsing this bit.
> Could we install the new key in HW a bit earlier? Keep the old key as SW fallback for rtx, but the driver installs the new key when the corresponding KeyUpdate record has gone through and tells the stack to stop doing SW crypto? I'm not sure that'd be a significant improvement in the standard case, though.
Important consideration is making the non-rekey path as fast as possible (given rekeying is extremely rare). Looking at skb->decrypted should be very fast but we can possibly fit some other indication of "are we rekeying" into another already referenced cache line. We definitely don't want to have to look up the record to know what state we're in.
The fallback can't use AES-NI (it's in sirq context) so it's slower than SW encrypt before queuing to TCP. Hence my first thought is using SW crypto for new key and let the traffic we already queued with old key drain leveraging HW crypto. But as I said the impact on performance when not rekeying is more important, and so is driver simplicity.
> > We may need special care to make sure we don't try to encrypt the same packet with both keys. In case a rtx gets stuck somewhere and comes to the NIC after it's already acked (happens surprisingly often).
> Don't we have that already? If there's a retransmit while we're setting the TX key in HW, data that was queued on the socket before (and shouldn't be encrypted at all) would also be encrypted otherwise. Or is it different with rekey?
We have a "start marker" record which is supposed to indicate that anything before it has already been encrypted. The driver is programmed with the start seq no, when it sees a packet from before this seq no it checks if a record exists, finds it's before the start marker and sends the data as is.
> > Multiple keys on the device would probably mean the device needs some intelligence to know when to use which - not my first choice.
> I only mentioned that because Boris talked about it in the previous thread. It's also not my first choice because I doubt all devices will support it the same way.
> > > On receive, we also have the problem of more than one rekey arriving, so if we can't feed enough keys to the device in advance, we'll have to decrypt some records in software. The host will have to survive the burst of software decryption while we wait until the device config catches up.
> > I think receive is easier. The fallback is quite effective and already in place.
> I was thinking about a peer sending at line rate just after the rekey, and we have to handle that in software for a short while. If the peer is also dropping to a software fallback, I guess there's no issue, both sides will slow down.
> > Here too we may want to enforce some transitional SW-only mode to avoid the (highly unlikely?) case that NIC will decrypt successfully a packet with the old key, even tho new key should be used.
> Not a crypto expert, but I don't think that's a realistic thing to worry about.
> If the NIC uses the old key to decrypt a record that was encrypted with the new key and fails, we end up seeing the encrypted record and decrypting it in SW, right? (and then we can pick the correct key in the SW fallback) We don't just tear down the connection?
Right, in the common case it should just work. We're basically up against the probability of a hash collision on the auth tag (match with both old and new key).
> > Carrying "key ID" with the skb is probably an overkill.
> I think we can retrieve it from the sequence of records. When we see a KeyUpdate, we advance the "key ID" after we're done with the message.
> This makes me wonder again if we should have fake offloads on veth (still calling the kernel's crypto library to simulate a device doing the encryption and/or decryption), to make it easy to play with the software bits, without requiring actual hardware that can offload TLS/IPsec/MACsec. But maybe it's too complex to be useful and we'd just waste our time fixing bugs in the fake offload rather than improving the stack.
It should be quite useful. I also usually just hack up veth, but I reckon adding support to netdevsim would be a better fit. We just need a way to tell two netdevsim ports to "connect to each other".
> > > One option might be to do the key derivation in the kernel following section 7.2 of the RFC [1]. I don't know how happy crypto/security people would be with that. We'd have to introduce new crypto_info structs, and new cipher types (or a flag in the upper bits of the cipher type) to go with them. Then the kernel processes incoming key update messages on its own, and emits its own key update messages when its current key is expiring. On transmit we also need to inject a Finished message before the KeyUpdate [2]. That's bringing a lot of TLS logic in the kernel. At that point we might as well do the whole handshake... but I really hope it doesn't come to that.
> > I think it's mostly a device vs host state sharing problem, so TLS ULP or user space - not a big difference, both are on the host.
> Maybe. But that's one more implementation of TLS (or large parts of it) that admins and security teams have to worry about. Maintained by networking people. I really want to avoid going down that route.
> I'll try to dig more into what you're proposing and to poke some holes into it over the next few days.
2023-02-21, 19:19:44 -0800, Jakub Kicinski wrote:
> Sorry for the delay, long weekend + merge window.
No worries, I wasn't expecting much activity on this from you during the merge window.
> On Thu, 16 Feb 2023 17:23:19 +0100 Sabrina Dubroca wrote:
> > 2023-02-15, 19:57:48 -0800, Jakub Kicinski wrote:
> > > I think we could try to switch to SW crypto on Tx until all data using old key is ACK'ed, drivers can look at skb->decrypted to skip touching the transitional skbs. Then remove old key, install new one, resume offload.
> > "all data using the old key" needs to be one list of record per old key, since we can have multiple rekeys.
> Not fully parsing this bit.
We can have multiple rekeys in the time it takes for the first KeyUpdate message to be ACK'ed. I'm not sure why I talked about a "list of records".
But we could have this sequence of records:

recN(k1, hwenc)
KeyUpdate(k1, hwenc)  // switch to k2 and sw crypto
rec0(k2, swenc)
rec1(k2, swenc)
KeyUpdate(k2, swenc)
rec0(k3, swenc)       // receive ACK for KU1, don't enable HW offload for k2 or k3 because we've already switched off k2
rec1(k3, swenc)       // receive ACK for KU2, now enable HW offload for k3
rec2(k3, hwenc)
So we'll need to record the most recent TX rekey, and wait until the corresponding KU record is ACK'ed, before we resume offload using the most recent key (and skip possible intermediate keys).
Installing the key in HW and re-enabling the offload will need to happen via the icsk_clean_acked callback. We'll need a workqueue so that we don't actually talk to the driver from softirq.
Then, we have to handle a failure to install the key. Since we're not installing it in HW immediately during setsockopt, notifying userspace of a rekey failure is more complicated. Maybe we can do a rekey_prepare during the setsockopt, and then the actual rekey is an operation that cannot fail?
> > Could we install the new key in HW a bit earlier? Keep the old key as SW fallback for rtx, but the driver installs the new key when the corresponding KeyUpdate record has gone through and tells the stack to stop doing SW crypto? I'm not sure that'd be a significant improvement in the standard case, though.
> Important consideration is making the non-rekey path as fast as possible (given rekeying is extremely rare). Looking at skb->decrypted should be very fast but we can possibly fit some other indication of "are we rekeying" into another already referenced cache line. We definitely don't want to have to look up the record to know what state we're in.
> The fallback can't use AES-NI (it's in sirq context) so it's slower than SW encrypt before queuing to TCP. Hence my first thought is using SW crypto for new key and let the traffic we already queued with old key drain leveraging HW crypto. But as I said the impact on performance when not rekeying is more important, and so is driver simplicity.
Right, sorry, full tls_sw path and not the existing fallback.
Changing the socket ops back and forth between the HW and SW variants worries me, because we only lock the socket once we have entered tls_{device,sw}_sendmsg. So I think we have to stay on the _device ops even during the SW crypto phase of the rekey, and let that call into the SW variant after locking the socket and making sure we're in a rekey.
We may need special care to make sure we don't try to encrypt the same packet with both keys. In case a rtx gets stuck somewhere and comes to the NIC after it's already acked (happens surprisingly often).
Don't we have that already? If there's a retransmit while we're setting the TX key in HW, data that was queued on the socket before (and shouldn't be encrypted at all) would also be encrypted otherwise. Or is it different with rekey?
We have a "start marker" record which is supposed to indicate that anything before it has already been encrypted. The driver is programmed with the start seq no; when it sees a packet from before this seq no, it checks if a record exists, finds it's before the start marker, and sends the data as is.
Yes, I was looking into that earlier this week. I think we could reuse a similar mechanism for rekeying. tls_dev_add takes tcp_sk->write_seq, we could have a tls_dev_rekey op passing the new key and new write_seq to the driver. I think we can also reuse the ->eor trick from tls_set_device_offload, and we wouldn't have to look at skb->decrypted. Close and push the current SW record, mark ->eor, pass write_seq to the driver along with the key. Also pretty close to what tls_device_resync_tx does.
[...]
This makes me wonder again if we should have fake offloads on veth (still calling the kernel's crypto library to simulate a device doing the encryption and/or decryption), to make it easy to play with the software bits, without requiring actual hardware that can offload TLS/IPsec/MACsec. But maybe it's too complex to be useful and we'd just waste our time fixing bugs in the fake offload rather than improving the stack.
It should be quite useful. I also usually just hack up veth, but I reckon adding support to netdevsim would be a better fit. We just need a way to tell two netdevsim ports to "connect to each other".
Oh, nice idea. I'll add that to my todo list.
On Thu, 23 Feb 2023 17:27:40 +0100 Sabrina Dubroca wrote:
2023-02-21, 19:19:44 -0800, Jakub Kicinski wrote:
"all data using the old key" needs to be one list of record per old key, since we can have multiple rekeys.
Not fully parsing this bit.
We can have multiple rekeys in the time it takes for the first KeyUpdate message to be ACK'ed. I'm not sure why I talked about a "list of records".
But we could have this sequence of records:
recN(k1,hwenc) KeyUpdate(k1,hwenc) // switch to k2 and sw crypto
rec0(k2,swenc) rec1(k2,swenc) KeyUpdate(k2,swenc) rec0(k3,swenc) // receive ACK for KU1, don't enable HW offload for k2 or k3 because we've already moved off k2
rec1(k3,swenc) // receive ACK for KU2, now enable HW offload for k3
rec2(k3,hwenc)
So we'll need to record the most recent TX rekey, and wait until the corresponding KU record is ACK'ed, before we resume offload using the most recent key (and skip possible intermediate keys).
Installing the key in HW and re-enabling the offload will need to happen via the icsk_clean_acked callback. We'll need a workqueue so that we don't actually talk to the driver from softirq.
Installing from icsk_clean_acked won't win us anything, right? We'll only need the key once the next sendmsg() comes, what's pushed to TCP with swenc is already out of our hands.
Then, we have to handle a failure to install the key. Since we're not installing it in HW immediately during setsockopt, notifying userspace of a rekey failure is more complicated. Maybe we can do a rekey_prepare during the setsockopt, and then the actual rekey is an operation that cannot fail?
TLS offload silently falls back to SW on any errors. So that's fine. Just bump a counter. User/infra must be tracking error counters in our current design already.
Important consideration is making the non-rekey path as fast as possible (given rekeying is extremely rare). Looking at skb->decrypted should be very fast but we can possibly fit some other indication of "are we rekeying" into another already referenced cache line. We definitely don't want to have to look up the record to know what state we're in.
The fallback can't use AES-NI (it's in sirq context) so it's slower than SW encrypt before queuing to TCP. Hence my first thought is using SW crypto for new key and let the traffic we already queued with old key drain leveraging HW crypto. But as I said the impact on performance when not rekeying is more important, and so is driver simplicity.
Right, sorry, full tls_sw path and not the existing fallback.
Changing the socket ops back and forth between the HW and SW variants worries me, because we only lock the socket once we have entered tls_{device,sw}_sendmsg. So I think we have to stay on the _device ops even during the SW crypto phase of the rekey, and let that call into the SW variant after locking the socket and making sure we're in a rekey.
Fair point :S
Don't we have that already? If there's a retransmit while we're setting the TX key in HW, data that was queued on the socket before (and shouldn't be encrypted at all) would also be encrypted otherwise. Or is it different with rekey?
We have a "start marker" record which is supposed to indicate that anything before it has already been encrypted. The driver is programmed with the start seq no; when it sees a packet from before this seq no, it checks if a record exists, finds it's before the start marker, and sends the data as is.
Yes, I was looking into that earlier this week. I think we could reuse a similar mechanism for rekeying. tls_dev_add takes tcp_sk->write_seq, we could have a tls_dev_rekey op passing the new key and new write_seq to the driver. I think we can also reuse the ->eor trick from tls_set_device_offload, and we wouldn't have to look at skb->decrypted. Close and push the current SW record, mark ->eor, pass write_seq to the driver along with the key. Also pretty close to what tls_device_resync_tx does.
That sounds like you'd expose the rekeying logic to the drivers? New op, having to track seq#...
2023-02-23, 09:29:45 -0800, Jakub Kicinski wrote:
On Thu, 23 Feb 2023 17:27:40 +0100 Sabrina Dubroca wrote:
Installing the key in HW and re-enabling the offload will need to happen via the icsk_clean_acked callback. We'll need a workqueue so that we don't actually talk to the driver from softirq.
Installing from icsk_clean_acked won't win us anything, right? We'll only need the key once the next sendmsg() comes, what's pushed to TCP with swenc is already out of our hands.
Avoiding an unpredictable slowdown on the sendmsg() call? We can deal with that later if it turns out to be an issue. I simply didn't think of deferring to the next sendmsg().
Then, we have to handle a failure to install the key. Since we're not installing it in HW immediately during setsockopt, notifying userspace of a rekey failure is more complicated. Maybe we can do a rekey_prepare during the setsockopt, and then the actual rekey is an operation that cannot fail?
TLS offload silently falls back to SW on any errors. So that's fine. Just bump a counter. User/infra must be tracking error counters in our current design already.
True. User might be a bit surprised by "well it was offloaded and now it's not", but ok.
Important consideration is making the non-rekey path as fast as possible (given rekeying is extremely rare). Looking at skb->decrypted should be very fast but we can possibly fit some other indication of "are we rekeying" into another already referenced cache line. We definitely don't want to have to look up the record to know what state we're in.
The fallback can't use AES-NI (it's in sirq context) so it's slower than SW encrypt before queuing to TCP. Hence my first thought is using SW crypto for new key and let the traffic we already queued with old key drain leveraging HW crypto. But as I said the impact on performance when not rekeying is more important, and so is driver simplicity.
Right, sorry, full tls_sw path and not the existing fallback.
Changing the socket ops back and forth between the HW and SW variants worries me, because we only lock the socket once we have entered tls_{device,sw}_sendmsg. So I think we have to stay on the _device ops even during the SW crypto phase of the rekey, and let that call into the SW variant after locking the socket and making sure we're in a rekey.
Fair point :S
Don't we have that already? If there's a retransmit while we're setting the TX key in HW, data that was queued on the socket before (and shouldn't be encrypted at all) would also be encrypted otherwise. Or is it different with rekey?
We have a "start marker" record which is supposed to indicate that anything before it has already been encrypted. The driver is programmed with the start seq no; when it sees a packet from before this seq no, it checks if a record exists, finds it's before the start marker, and sends the data as is.
Yes, I was looking into that earlier this week. I think we could reuse a similar mechanism for rekeying. tls_dev_add takes tcp_sk->write_seq, we could have a tls_dev_rekey op passing the new key and new write_seq to the driver. I think we can also reuse the ->eor trick from tls_set_device_offload, and we wouldn't have to look at skb->decrypted. Close and push the current SW record, mark ->eor, pass write_seq to the driver along with the key. Also pretty close to what tls_device_resync_tx does.
That sounds like you'd expose the rekeying logic to the drivers? New op, having to track seq#...
Well, we have to call into the drivers to install the key, whether that's a new rekey op, or adding an update argument to ->tls_dev_add, or letting the driver guess that it's a rekey (or ignore that and just install the key if rekey vs initial key isn't a meaningful distinction).
We already feed drivers the seq# with ->tls_dev_add, so passing it for rekeys as well is not a big change.
Does that seem problematic? Adding a rekey op seemed more natural to me than simply using the existing _del + _add ops, but maybe we can get away with just using those two ops.
On Mon, 13 Mar 2023 16:41:36 +0100 Sabrina Dubroca wrote:
Yes, I was looking into that earlier this week. I think we could reuse a similar mechanism for rekeying. tls_dev_add takes tcp_sk->write_seq, we could have a tls_dev_rekey op passing the new key and new write_seq to the driver. I think we can also reuse the ->eor trick from tls_set_device_offload, and we wouldn't have to look at skb->decrypted. Close and push the current SW record, mark ->eor, pass write_seq to the driver along with the key. Also pretty close to what tls_device_resync_tx does.
That sounds like you'd expose the rekeying logic to the drivers? New op, having to track seq#...
Well, we have to call into the drivers to install the key, whether that's a new rekey op, or adding an update argument to ->tls_dev_add, or letting the driver guess that it's a rekey (or ignore that and just install the key if rekey vs initial key isn't a meaningful distinction).
We already feed drivers the seq# with ->tls_dev_add, so passing it for rekeys as well is not a big change.
Does that seem problematic? Adding a rekey op seemed more natural to me than simply using the existing _del + _add ops, but maybe we can get away with just using those two ops.
Theoretically a rekey op is nicer and cleaner. Practically the quality of the driver implementations will vary wildly*, and it's a significant time investment to review all of them. So for non-technical reasons my intuition is that we'd deliver a better overall user experience if we handled the rekey entirely in the core.
Wait for old key to no longer be needed, _del + _add, start using the offload again.
* One vendor submitted a driver claiming support for TLS 1.3, when TLS 1.3 offload was rejected by the core. So this is the level of testing and diligence we're working with :(
2023-03-13, 11:35:10 -0700, Jakub Kicinski wrote:
On Mon, 13 Mar 2023 16:41:36 +0100 Sabrina Dubroca wrote:
Yes, I was looking into that earlier this week. I think we could reuse a similar mechanism for rekeying. tls_dev_add takes tcp_sk->write_seq, we could have a tls_dev_rekey op passing the new key and new write_seq to the driver. I think we can also reuse the ->eor trick from tls_set_device_offload, and we wouldn't have to look at skb->decrypted. Close and push the current SW record, mark ->eor, pass write_seq to the driver along with the key. Also pretty close to what tls_device_resync_tx does.
That sounds like you'd expose the rekeying logic to the drivers? New op, having to track seq#...
Well, we have to call into the drivers to install the key, whether that's a new rekey op, or adding an update argument to ->tls_dev_add, or letting the driver guess that it's a rekey (or ignore that and just install the key if rekey vs initial key isn't a meaningful distinction).
We already feed drivers the seq# with ->tls_dev_add, so passing it for rekeys as well is not a big change.
Does that seem problematic? Adding a rekey op seemed more natural to me than simply using the existing _del + _add ops, but maybe we can get away with just using those two ops.
Theoretically a rekey op is nicer and cleaner. Practically the quality of the driver implementations will vary wildly*, and it's a significant time investment to review all of them. So for non-technical reasons my intuition is that we'd deliver a better overall user experience if we handled the rekey entirely in the core.
Wait for old key to no longer be needed, _del + _add, start using the offload again.
* One vendor submitted a driver claiming support for TLS 1.3, when TLS 1.3 offload was rejected by the core. So this is the level of testing and diligence we're working with :(
:(
Ok, _del + _add then.
I went over the thread to summarize what we've come up with so far:
RX
- The existing SW path will handle all records between the KeyUpdate message signaling the change of key and the new key becoming known to the kernel -- those will be queued encrypted, and decrypted in SW as they are read by userspace (once the key is provided, ie same as this patchset)
- Call ->tls_dev_del + ->tls_dev_add immediately during setsockopt(TLS_RX)
TX
- After setsockopt(TLS_TX), switch to the existing SW path (not the current device_fallback) until we're able to re-enable HW offload
- tls_device_{sendmsg,sendpage} will call into tls_sw_{sendmsg,sendpage} under lock_sock to avoid changing socket ops during the rekey while another thread might be waiting on the lock
- We only re-enable HW offload (call ->tls_dev_add to install the new key in HW) once all records sent with the old key have been ACKed. At this point, all unacked records are SW-encrypted with the new key, and the old key is unused by both HW and retransmissions.
- If there are no unacked records when userspace does setsockopt(TLS_TX), we can (try to) install the new key in HW immediately.
- If yet another key has been provided via setsockopt(TLS_TX), we don't install intermediate keys, only the latest.
- TCP notifies ktls of ACKs via the icsk_clean_acked callback. In case of a rekey, tls_icsk_clean_acked will record when all data sent with the most recent past key has been ACKed. The next call to sendmsg/sendpage will install the new key in HW.
- We close and push the current SW record before reenabling offload.
If ->tls_dev_add fails to install the new key in HW, we stay in SW mode. We can add a counter to keep track of this.
In addition:
Because we can't change socket ops during a rekey, we'll also have to modify do_tls_setsockopt_conf to check ctx->tx_conf and only call either tls_set_device_offload or tls_set_sw_offload. RX already uses the same ops for both TLS_HW and TLS_SW, so we could switch between HW and SW mode on rekey.
An alternative would be to have a common sendmsg/sendpage which locks the socket and then calls the correct implementation. We'll need that anyway for the offload under rekey case, so that would only add a test to the SW path's ops (compared to the current code). That should allow us to make build_protos a lot simpler.
On Wed, 22 Mar 2023 17:10:25 +0100 Sabrina Dubroca wrote:
Theoretically a rekey op is nicer and cleaner. Practically the quality of the driver implementations will vary wildly*, and it's a significant time investment to review all of them. So for non-technical reasons my intuition is that we'd deliver a better overall user experience if we handled the rekey entirely in the core.
Wait for old key to no longer be needed, _del + _add, start using the offload again.
* One vendor submitted a driver claiming support for TLS 1.3, when TLS 1.3 offload was rejected by the core. So this is the level of testing and diligence we're working with :(
:(
Ok, _del + _add then.
I went over the thread to summarize what we've come up with so far:
RX
- The existing SW path will handle all records between the KeyUpdate message signaling the change of key and the new key becoming known to the kernel -- those will be queued encrypted, and decrypted in SW as they are read by userspace (once the key is provided, ie same as this patchset)
- Call ->tls_dev_del + ->tls_dev_add immediately during setsockopt(TLS_RX)
TX
- After setsockopt(TLS_TX), switch to the existing SW path (not the current device_fallback) until we're able to re-enable HW offload
- tls_device_{sendmsg,sendpage} will call into tls_sw_{sendmsg,sendpage} under lock_sock to avoid changing socket ops during the rekey while another thread might be waiting on the lock
- We only re-enable HW offload (call ->tls_dev_add to install the new key in HW) once all records sent with the old key have been ACKed. At this point, all unacked records are SW-encrypted with the new key, and the old key is unused by both HW and retransmissions.
- If there are no unacked records when userspace does setsockopt(TLS_TX), we can (try to) install the new key in HW immediately.
- If yet another key has been provided via setsockopt(TLS_TX), we don't install intermediate keys, only the latest.
- TCP notifies ktls of ACKs via the icsk_clean_acked callback. In case of a rekey, tls_icsk_clean_acked will record when all data sent with the most recent past key has been ACKed. The next call to sendmsg/sendpage will install the new key in HW.
- We close and push the current SW record before reenabling offload.
If ->tls_dev_add fails to install the new key in HW, we stay in SW mode. We can add a counter to keep track of this.
SG!
In addition:
Because we can't change socket ops during a rekey, we'll also have to modify do_tls_setsockopt_conf to check ctx->tx_conf and only call either tls_set_device_offload or tls_set_sw_offload. RX already uses the same ops for both TLS_HW and TLS_SW, so we could switch between HW and SW mode on rekey.
An alternative would be to have a common sendmsg/sendpage which locks the socket and then calls the correct implementation. We'll need that anyway for the offload under rekey case, so that would only add a test to the SW path's ops (compared to the current code). That should allow us to make build_protos a lot simpler.
No preference assuming perf is the same.