 
            From: Wilfred Mallawa wilfred.mallawa@wdc.com
During a handshake, an endpoint may specify a maximum record size limit. Currently, the kernel defaults to TLS_MAX_PAYLOAD_SIZE (16KB) for the maximum record size. This means that outgoing records from the kernel can exceed a lower limit negotiated during the handshake. In such a case, the receiving TLS endpoint must send a fatal "record_overflow" alert [1], so the record is discarded and, because the alert is fatal, the connection is closed.
Upcoming Western Digital NVMe-TCP hardware controllers implement TLS support. For these devices, supporting TLS record size negotiation is necessary because the maximum TLS record size supported by the controller is less than the default 16KB currently used by the kernel.
This patch adds support for retrieving the negotiated record size limit during a handshake, and enforcing it at the TLS layer such that outgoing records are no larger than the size negotiated. This patch depends on the respective userspace support in tlshd and GnuTLS [2].
[1] https://www.rfc-editor.org/rfc/rfc8449
[2] https://gitlab.com/gnutls/gnutls/-/merge_requests/2005
Signed-off-by: Wilfred Mallawa wilfred.mallawa@wdc.com
---
Changes V3 -> V4:
 * Added record_size_limit RFC reference to documentation
 * Always export the record size limit in tls_get_info()
 * Disallow user space from changing the record_size_limit from under us
   if an open record is pending.
 * Added record_size_limit minimum size check as per RFC
 * Allow space for the ContentType byte for TLS 1.3. The expected
   behaviour is that userspace directly uses the negotiated
   record_size_limit; the kernel will limit the plaintext buffer size
   appropriately.
 * New patch to add self-tests.
---
 Documentation/networking/tls.rst | 12 +++++
 include/net/tls.h                |  5 +++
 include/uapi/linux/tls.h         |  2 +
 net/tls/tls_device.c             |  2 +-
 net/tls/tls_main.c               | 75 ++++++++++++++++++++++++++++++++
 net/tls/tls_sw.c                 |  2 +-
 6 files changed, 96 insertions(+), 2 deletions(-)
diff --git a/Documentation/networking/tls.rst b/Documentation/networking/tls.rst
index 36cc7afc2527..d24bf8911bb8 100644
--- a/Documentation/networking/tls.rst
+++ b/Documentation/networking/tls.rst
@@ -280,6 +280,18 @@ If the record decrypted turns out to had been padded or is not a data record
 it will be decrypted again into a kernel buffer without zero copy.
 Such events are counted in the ``TlsDecryptRetry`` statistic.
 
+TLS_TX_RECORD_SIZE_LIM
+~~~~~~~~~~~~~~~~~~~~~~
+
+Sets the maximum size for the plaintext of a protected record.
+
+The provided value should correspond to the limit negotiated during the TLS
+handshake via the `record_size_limit` extension (RFC 8449)[1]. When this
+option is set, the kernel enforces this limit on all transmitted TLS records,
+ensuring no plaintext fragment exceeds the specified size.
+
+[1] https://datatracker.ietf.org/doc/html/rfc8449
+
 Statistics
 ==========
 
diff --git a/include/net/tls.h b/include/net/tls.h
index 857340338b69..32f053770ec4 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -53,6 +53,8 @@ struct tls_rec;
 
 /* Maximum data size carried in a TLS record */
 #define TLS_MAX_PAYLOAD_SIZE		((size_t)1 << 14)
+/* Minimum record size limit as per RFC8449 */
+#define TLS_MIN_RECORD_SIZE_LIM		((size_t)1 << 6)
 
 #define TLS_HEADER_SIZE			5
 #define TLS_NONCE_OFFSET		TLS_HEADER_SIZE
@@ -226,6 +228,9 @@ struct tls_context {
 	u8 rx_conf:3;
 	u8 zerocopy_sendfile:1;
 	u8 rx_no_pad:1;
+	u16 tx_record_size_limit; /* Max plaintext fragment size. For TLS 1.3,
+				   * this excludes the ContentType.
+				   */
 
 	int (*push_pending_record)(struct sock *sk, int flags);
 	void (*sk_write_space)(struct sock *sk);
diff --git a/include/uapi/linux/tls.h b/include/uapi/linux/tls.h
index b66a800389cc..3add266d5916 100644
--- a/include/uapi/linux/tls.h
+++ b/include/uapi/linux/tls.h
@@ -41,6 +41,7 @@
 #define TLS_RX			2	/* Set receive parameters */
 #define TLS_TX_ZEROCOPY_RO	3	/* TX zerocopy (only sendfile now) */
 #define TLS_RX_EXPECT_NO_PAD	4	/* Attempt opportunistic zero-copy */
+#define TLS_TX_RECORD_SIZE_LIM	5	/* Maximum record size */
 
 /* Supported versions */
 #define TLS_VERSION_MINOR(ver)	((ver) & 0xFF)
@@ -194,6 +195,7 @@ enum {
 	TLS_INFO_RXCONF,
 	TLS_INFO_ZC_RO_TX,
 	TLS_INFO_RX_NO_PAD,
+	TLS_INFO_TX_RECORD_SIZE_LIM,
 	__TLS_INFO_MAX,
 };
 #define TLS_INFO_MAX (__TLS_INFO_MAX - 1)
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index f672a62a9a52..bf16ceb41dde 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -459,7 +459,7 @@ static int tls_push_data(struct sock *sk,
 	/* TLS_HEADER_SIZE is not counted as part of the TLS record, and
 	 * we need to leave room for an authentication tag.
 	 */
-	max_open_record_len = TLS_MAX_PAYLOAD_SIZE +
+	max_open_record_len = tls_ctx->tx_record_size_limit +
 			      prot->prepend_size;
 	do {
 		rc = tls_do_allocation(sk, ctx, pfrag, prot->prepend_size);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index a3ccb3135e51..09883d9c6c96 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -544,6 +544,31 @@ static int do_tls_getsockopt_no_pad(struct sock *sk, char __user *optval,
 	return 0;
 }
 
+static int do_tls_getsockopt_tx_record_size(struct sock *sk, char __user *optval,
+					    int __user *optlen)
+{
+	struct tls_context *ctx = tls_get_ctx(sk);
+	int len;
+	/* TLS 1.3: Record length contains ContentType */
+	u16 record_size_limit = ctx->prot_info.version == TLS_1_3_VERSION ?
+				ctx->tx_record_size_limit + 1 :
+				ctx->tx_record_size_limit;
+
+	if (get_user(len, optlen))
+		return -EFAULT;
+
+	if (len < sizeof(record_size_limit))
+		return -EINVAL;
+
+	if (put_user(sizeof(record_size_limit), optlen))
+		return -EFAULT;
+
+	if (copy_to_user(optval, &record_size_limit, sizeof(record_size_limit)))
+		return -EFAULT;
+
+	return 0;
+}
+
 static int do_tls_getsockopt(struct sock *sk, int optname,
 			     char __user *optval, int __user *optlen)
 {
@@ -563,6 +588,9 @@ static int do_tls_getsockopt(struct sock *sk, int optname,
 	case TLS_RX_EXPECT_NO_PAD:
 		rc = do_tls_getsockopt_no_pad(sk, optval, optlen);
 		break;
+	case TLS_TX_RECORD_SIZE_LIM:
+		rc = do_tls_getsockopt_tx_record_size(sk, optval, optlen);
+		break;
 	default:
 		rc = -ENOPROTOOPT;
 		break;
@@ -812,6 +840,43 @@ static int do_tls_setsockopt_no_pad(struct sock *sk, sockptr_t optval,
 	return rc;
 }
 
+static int do_tls_setsockopt_tx_record_size(struct sock *sk, sockptr_t optval,
+					    unsigned int optlen)
+{
+	struct tls_context *ctx = tls_get_ctx(sk);
+	struct tls_sw_context_tx *sw_ctx = tls_sw_ctx_tx(ctx);
+	u16 value;
+
+	if (sw_ctx->open_rec)
+		return -EBUSY;
+
+	if (sockptr_is_null(optval) || optlen != sizeof(value))
+		return -EINVAL;
+
+	if (copy_from_sockptr(&value, optval, sizeof(value)))
+		return -EFAULT;
+
+	if (value < TLS_MIN_RECORD_SIZE_LIM)
+		return -EINVAL;
+
+	if (ctx->prot_info.version == TLS_1_2_VERSION &&
+	    value > TLS_MAX_PAYLOAD_SIZE)
+		return -EINVAL;
+
+	if (ctx->prot_info.version == TLS_1_3_VERSION &&
+	    value - 1 > TLS_MAX_PAYLOAD_SIZE)
+		return -EINVAL;
+
+	/*
+	 * For TLS 1.3: 'value' includes one byte for the appended ContentType.
+	 * Adjust the kernel's internal plaintext limit accordingly.
+	 */
+	ctx->tx_record_size_limit = ctx->prot_info.version == TLS_1_3_VERSION ?
+				    value - 1 : value;
+
+	return 0;
+}
+
 static int do_tls_setsockopt(struct sock *sk, int optname, sockptr_t optval,
 			     unsigned int optlen)
 {
@@ -833,6 +898,9 @@ static int do_tls_setsockopt(struct sock *sk, int optname, sockptr_t optval,
 	case TLS_RX_EXPECT_NO_PAD:
 		rc = do_tls_setsockopt_no_pad(sk, optval, optlen);
 		break;
+	case TLS_TX_RECORD_SIZE_LIM:
+		rc = do_tls_setsockopt_tx_record_size(sk, optval, optlen);
+		break;
 	default:
 		rc = -ENOPROTOOPT;
 		break;
@@ -1022,6 +1090,7 @@ static int tls_init(struct sock *sk)
 
 	ctx->tx_conf = TLS_BASE;
 	ctx->rx_conf = TLS_BASE;
+	ctx->tx_record_size_limit = TLS_MAX_PAYLOAD_SIZE;
 	update_sk_prot(sk, ctx);
 out:
 	write_unlock_bh(&sk->sk_callback_lock);
@@ -1111,6 +1180,11 @@ static int tls_get_info(struct sock *sk, struct sk_buff *skb, bool net_admin)
 			goto nla_failure;
 	}
 
+	err = nla_put_u16(skb, TLS_INFO_TX_RECORD_SIZE_LIM,
+			  ctx->tx_record_size_limit);
+	if (err)
+		goto nla_failure;
+
 	rcu_read_unlock();
 	nla_nest_end(skb, start);
 	return 0;
@@ -1132,6 +1206,7 @@ static size_t tls_get_info_size(const struct sock *sk, bool net_admin)
 		nla_total_size(sizeof(u16)) +	/* TLS_INFO_TXCONF */
 		nla_total_size(0) +		/* TLS_INFO_ZC_RO_TX */
 		nla_total_size(0) +		/* TLS_INFO_RX_NO_PAD */
+		nla_total_size(sizeof(u16)) +	/* TLS_INFO_TX_RECORD_SIZE_LIM */
 		0;
 
 	return size;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index bac65d0d4e3e..28fb796573d1 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1079,7 +1079,7 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
 		orig_size = msg_pl->sg.size;
 		full_record = false;
 		try_to_copy = msg_data_left(msg);
-		record_room = TLS_MAX_PAYLOAD_SIZE - msg_pl->sg.size;
+		record_room = tls_ctx->tx_record_size_limit - msg_pl->sg.size;
 		if (try_to_copy >= record_room) {
 			try_to_copy = record_room;
 			full_record = true;
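The functional core of the patch is that the open record is now closed once it holds `tx_record_size_limit` bytes of plaintext instead of TLS_MAX_PAYLOAD_SIZE. Below is a hedged, userspace-only model of that filling logic; every name in it (`struct tx_model`, `tx_send()`, `tx_set_limit()`, ...) is hypothetical and not a kernel API. It ignores encryption, memory allocation and the finer MSG_MORE/MSG_EOR handling (it simply closes the open record when no more data is expected), and folds in the `-EBUSY` rule from `do_tls_setsockopt_tx_record_size()` together with the TLS 1.2 bounds check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Values mirrored from include/net/tls.h in the patch. */
#define TLS_MAX_PAYLOAD_SIZE ((size_t)1 << 14)
#define TLS_MIN_RECORD_SIZE_LIM ((size_t)1 << 6)
#define MAX_RECS 64

/* Hypothetical model of the TX state; not a kernel structure. */
struct tx_model {
	size_t limit;		/* like ctx->tx_record_size_limit */
	size_t open_len;	/* plaintext bytes in the open record, 0 = none */
	size_t recs[MAX_RECS];	/* plaintext size of each closed record */
	size_t nrecs;
};

static void tx_init(struct tx_model *m)
{
	m->limit = TLS_MAX_PAYLOAD_SIZE;	/* default set in tls_init() */
	m->open_len = 0;
	m->nrecs = 0;
}

/* Close (i.e. "encrypt and push") the open record, if there is one. */
static void tx_close_record(struct tx_model *m)
{
	if (m->open_len) {
		assert(m->nrecs < MAX_RECS);
		m->recs[m->nrecs++] = m->open_len;
	}
	m->open_len = 0;
}

/*
 * Copy @len bytes into records: record_room = limit - open_len, and a
 * record is closed as soon as it is full. With @more set (like MSG_MORE),
 * a partially filled record stays open for the next call.
 */
static void tx_send(struct tx_model *m, size_t len, int more)
{
	assert(m->limit > 0);
	while (len) {
		size_t room = m->limit - m->open_len;
		size_t chunk = len < room ? len : room;

		m->open_len += chunk;
		len -= chunk;
		if (m->open_len == m->limit)
			tx_close_record(m);
	}
	if (!more)
		tx_close_record(m);
}

/* TLS 1.2-style validation plus the -EBUSY rule for a pending record. */
static int tx_set_limit(struct tx_model *m, size_t limit)
{
	if (m->open_len)
		return -EBUSY;
	if (limit < TLS_MIN_RECORD_SIZE_LIM || limit > TLS_MAX_PAYLOAD_SIZE)
		return -EINVAL;
	m->limit = limit;
	return 0;
}
```

Under these assumptions, the `tx_record_size` scenario (1024 bytes, limit 128) yields eight 128-byte records. The `tx_record_size_open_rec` scenario (limit 512, 256 bytes sent with MSG_MORE, then 768 more) rejects the change to 128 with `-EBUSY` and yields two 512-byte records.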
 
            From: Wilfred Mallawa wilfred.mallawa@wdc.com
Test that outgoing plaintext records respect the TLS record_size_limit set using setsockopt(). The record size limit is set to 128, so the plaintext in every received record must not exceed this amount.
Also test that setting a new record size limit whilst a pending open record exists is handled correctly by discarding the request.
Suggested-by: Sabrina Dubroca sd@queasysnail.net
Signed-off-by: Wilfred Mallawa wilfred.mallawa@wdc.com
---
 tools/testing/selftests/net/tls.c | 149 ++++++++++++++++++++++++++++++
 1 file changed, 149 insertions(+)
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index 0f5640d8dc7f..c5bd431d5af3 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -24,6 +24,7 @@
 #include "../kselftest_harness.h"
 
 #define TLS_PAYLOAD_MAX_LEN 16384
+#define TLS_TX_RECORD_SIZE_LIM 5
 #define SOL_TLS 282
 
 static int fips_enabled;
@@ -2770,6 +2771,154 @@ TEST_F(tls_err, poll_partial_rec_async)
 	}
 }
 
+/*
+ * Parse a stream of TLS records and ensure that each record respects
+ * the specified @record_size_limit.
+ */
+static size_t parse_tls_records(struct __test_metadata *_metadata,
+				const __u8 *rx_buf, int rx_len, int overhead,
+				__u16 record_size_limit)
+{
+	const __u8 *rec = rx_buf;
+	size_t total_plaintext_rx = 0;
+	const __u8 rec_header_len = 5;
+
+	while (rec < rx_buf + rx_len) {
+		__u16 record_payload_len;
+		__u16 plaintext_len;
+
+		/* Sanity check that it's a TLS header for application data */
+		ASSERT_EQ(rec[0], 23);
+		ASSERT_EQ(rec[1], 0x3);
+		ASSERT_EQ(rec[2], 0x3);
+
+		memcpy(&record_payload_len, rec + 3, 2);
+		record_payload_len = ntohs(record_payload_len);
+		ASSERT_GE(record_payload_len, overhead);
+
+		plaintext_len = record_payload_len - overhead;
+		total_plaintext_rx += plaintext_len;
+
+		/* Plaintext must not exceed the specified limit */
+		ASSERT_LE(plaintext_len, record_size_limit);
+		rec += rec_header_len + record_payload_len;
+	}
+
+	return total_plaintext_rx;
+}
+
+TEST(tx_record_size)
+{
+	struct tls_crypto_info_keys tls12;
+	int cfd, ret, fd, rx_len, overhead;
+	size_t total_plaintext_rx = 0;
+	__u8 tx[1024], rx[2000];
+	__u8 *rec;
+	__u16 limit = 128;
+	__u16 opt = 0;
+	__u8 rec_header_len = 5;
+	unsigned int optlen = sizeof(opt);
+	bool notls;
+
+	tls_crypto_info_init(TLS_1_2_VERSION, TLS_CIPHER_AES_CCM_128,
+			     &tls12, 0);
+
+	ulp_sock_pair(_metadata, &fd, &cfd, &notls);
+
+	if (notls)
+		exit(KSFT_SKIP);
+
+	/* Don't install keys on fd, we'll parse raw records */
+	ret = setsockopt(cfd, SOL_TLS, TLS_TX, &tls12, tls12.len);
+	ASSERT_EQ(ret, 0);
+
+	ret = setsockopt(cfd, SOL_TLS, TLS_TX_RECORD_SIZE_LIM, &limit, sizeof(limit));
+	ASSERT_EQ(ret, 0);
+
+	ret = getsockopt(cfd, SOL_TLS, TLS_TX_RECORD_SIZE_LIM, &opt, &optlen);
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(limit, opt);
+	ASSERT_EQ(optlen, sizeof(limit));
+
+	memset(tx, 0, sizeof(tx));
+	EXPECT_EQ(send(cfd, tx, sizeof(tx), 0), sizeof(tx));
+	close(cfd);
+
+	ret = recv(fd, rx, sizeof(rx), 0);
+	memcpy(&rx_len, rx + 3, 2);
+	rx_len = htons(rx_len);
+
+	/*
+	 * 16B tag + 8B IV -- record header (5B) is not counted but we'll
+	 * need it to walk the record stream
+	 */
+	overhead = 16 + 8;
+	total_plaintext_rx = parse_tls_records(_metadata, rx, ret, overhead,
+					       limit);
+
+	ASSERT_EQ(total_plaintext_rx, sizeof(tx));
+	close(fd);
+}
+
+TEST(tx_record_size_open_rec)
+{
+	struct tls_crypto_info_keys tls12;
+	int cfd, ret, fd, rx_len, overhead;
+	size_t total_plaintext_rx = 0;
+	__u8 tx[1024], rx[2000];
+	__u16 tx_partial = 256;
+	__u8 *rec;
+	__u16 og_limit = 512, limit = 128;
+	__u8 rec_header_len = 5;
+	bool notls;
+
+	tls_crypto_info_init(TLS_1_2_VERSION, TLS_CIPHER_AES_CCM_128,
+			     &tls12, 0);
+
+	ulp_sock_pair(_metadata, &fd, &cfd, &notls);
+
+	if (notls)
+		exit(KSFT_SKIP);
+
+	/* Don't install keys on fd, we'll parse raw records */
+	ret = setsockopt(cfd, SOL_TLS, TLS_TX, &tls12, tls12.len);
+	ASSERT_EQ(ret, 0);
+
+	ret = setsockopt(cfd, SOL_TLS, TLS_TX_RECORD_SIZE_LIM, &og_limit,
+			 sizeof(og_limit));
+	ASSERT_EQ(ret, 0);
+
+	memset(tx, 0, sizeof(tx));
+	EXPECT_EQ(send(cfd, tx, tx_partial, MSG_MORE), tx_partial);
+
+	/*
+	 * Changing the record size limit with a pending open record should
+	 * not be allowed.
+	 */
+	ret = setsockopt(cfd, SOL_TLS, TLS_TX_RECORD_SIZE_LIM, &limit,
+			 sizeof(limit));
+	ASSERT_EQ(ret, -1);
+	ASSERT_EQ(errno, EBUSY);
+
+	EXPECT_EQ(send(cfd, tx + tx_partial, sizeof(tx) - tx_partial, MSG_EOR),
+		  sizeof(tx) - tx_partial);
+	close(cfd);
+
+	ret = recv(fd, rx, sizeof(rx), 0);
+	memcpy(&rx_len, rx + 3, 2);
+	rx_len = htons(rx_len);
+
+	/*
+	 * 16B tag + 8B IV -- record header (5B) is not counted but we'll
+	 * need it to walk the record stream
+	 */
+	overhead = 16 + 8;
+	total_plaintext_rx = parse_tls_records(_metadata, rx, ret, overhead,
+					       og_limit);
+	ASSERT_EQ(total_plaintext_rx, sizeof(tx));
+	close(fd);
+}
+
 TEST(non_established)
 {
 	struct tls12_crypto_info_aes_gcm_256 tls12;
 	struct sockaddr_in addr;
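`parse_tls_records()` depends on the kselftest harness. For experimenting outside the harness, the same record walk can be written as a plain function. The sketch below (the name `walk_tls_records()` is hypothetical) returns -1 instead of asserting. Unlike the selftest helper, it also rejects a record whose declared length runs past the end of the buffer. It assumes TLS 1.2 with AES-CCM-128, where each record carries an 8-byte explicit nonce and a 16-byte tag, matching the `overhead = 16 + 8` used in the test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLS_HDR_LEN 5			/* type(1) + version(2) + length(2) */
#define TLS_CT_APPLICATION_DATA 23

/*
 * Hypothetical standalone variant of parse_tls_records(): walk a buffer of
 * TLS records and return the total plaintext carried, or -1 if a record
 * is malformed, truncated, or carries more than @limit bytes of plaintext.
 * @overhead is the non-plaintext payload per record (explicit nonce + tag).
 */
static long walk_tls_records(const uint8_t *buf, size_t len, size_t overhead,
			     size_t limit)
{
	size_t off = 0;
	long total = 0;

	while (off < len) {
		const uint8_t *rec = buf + off;
		size_t payload, plaintext;

		if (len - off < TLS_HDR_LEN)
			return -1;		/* truncated header */
		if (rec[0] != TLS_CT_APPLICATION_DATA || rec[1] != 3 || rec[2] != 3)
			return -1;		/* not an application_data header */
		payload = ((size_t)rec[3] << 8) | rec[4];	/* big-endian */
		if (payload < overhead || len - off - TLS_HDR_LEN < payload)
			return -1;		/* too short, or runs past the buffer */
		plaintext = payload - overhead;
		if (plaintext > limit)
			return -1;		/* record_size_limit violated */
		total += (long)plaintext;
		off += TLS_HDR_LEN + payload;
	}
	return total;
}
```

As a sanity check on the buffer sizes in `tx_record_size`: 1024 bytes at a 128-byte limit gives 8 records of 5 + 8 + 128 + 16 = 157 bytes each, 1256 bytes in total, which fits in the test's 2000-byte `rx` buffer.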
 
            [got a bit distracted while writing this so Simon got to the process stuff before me, but I'll leave it in:]
BTW, a few details about process: since this is a new feature, the subject prefix should be [PATCH net-next v4 n/m] (new stuff targets the net-next tree), and the patches should be based on the net-next tree [1] (I'm not sure what you based this on, git am complained on both net and net-next for this patch). More info about this in the docs [2].
[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/
[2] https://docs.kernel.org/process/maintainer-netdev.html
(in case you're not aware: also note the bits about "merge window" which will quite likely become relevant in a few days)
2025-09-23, 15:32:07 +1000, Wilfred Mallawa wrote:
From: Wilfred Mallawa wilfred.mallawa@wdc.com
Test that outgoing plaintext records respect the TLS record_size_limit set using setsockopt(). The record size limit is set to 128, so the plaintext in every received record must not exceed this amount.
Also test that setting a new record size limit whilst a pending open record exists is handled correctly by discarding the request.
Suggested-by: Sabrina Dubroca sd@queasysnail.net
Signed-off-by: Wilfred Mallawa wilfred.mallawa@wdc.com
Thanks for adding this patch. (and for the tag :))
 tools/testing/selftests/net/tls.c | 149 ++++++++++++++++++++++++++++++
 1 file changed, 149 insertions(+)
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index 0f5640d8dc7f..c5bd431d5af3 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -24,6 +24,7 @@
 #include "../kselftest_harness.h"
 
 #define TLS_PAYLOAD_MAX_LEN 16384
+#define TLS_TX_RECORD_SIZE_LIM 5
nit: That should not be needed if you run `make headers_install` before compiling the selftest:
make -s headers_install ; make -C tools/testing/selftests/net tls
make: Entering directory '/home/sab/linux/net/tools/testing/selftests/net'
gcc -Wall -Wl,--no-as-needed -O2 -g -I../../../../usr/include/ -isystem /home/sab/linux/net/tools/testing/selftests/../../../usr/include -I../ -D_GNU_SOURCE= tls.c -o tls
and that will find the new constant defined in the previous patch using the headers from the current kernel tree, instead of those in the system.
[...]
+TEST(tx_record_size)
+{
+	struct tls_crypto_info_keys tls12;
+	int cfd, ret, fd, rx_len, overhead;
+	size_t total_plaintext_rx = 0;
+	__u8 tx[1024], rx[2000];
+	__u8 *rec;
+	__u16 limit = 128;
+	__u16 opt = 0;
+	__u8 rec_header_len = 5;
gcc complains about unused variables, I guess leftovers from extracting parse_tls_records:
tls.c: In function ‘tx_record_size’:
tls.c:2840:14: warning: unused variable ‘rec_header_len’ [-Wunused-variable]
 2840 |         __u8 rec_header_len = 5;
      |              ^~~~~~~~~~~~~~
tls.c:2837:15: warning: unused variable ‘rec’ [-Wunused-variable]
 2837 |         __u8 *rec;
      |               ^~~
tls.c: In function ‘tx_record_size_open_rec’:
tls.c:2893:14: warning: unused variable ‘rec_header_len’ [-Wunused-variable]
 2893 |         __u8 rec_header_len = 5;
      |              ^~~~~~~~~~~~~~
tls.c:2891:15: warning: unused variable ‘rec’ [-Wunused-variable]
 2891 |         __u8 *rec;
      |               ^~~
+	unsigned int optlen = sizeof(opt);
+	bool notls;
+
+	tls_crypto_info_init(TLS_1_2_VERSION, TLS_CIPHER_AES_CCM_128,
+			     &tls12, 0);
+
+	ulp_sock_pair(_metadata, &fd, &cfd, &notls);
+
+	if (notls)
+		exit(KSFT_SKIP);
+
+	/* Don't install keys on fd, we'll parse raw records */
+	ret = setsockopt(cfd, SOL_TLS, TLS_TX, &tls12, tls12.len);
+	ASSERT_EQ(ret, 0);
+
+	ret = setsockopt(cfd, SOL_TLS, TLS_TX_RECORD_SIZE_LIM, &limit, sizeof(limit));
+	ASSERT_EQ(ret, 0);
+
+	ret = getsockopt(cfd, SOL_TLS, TLS_TX_RECORD_SIZE_LIM, &opt, &optlen);
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(limit, opt);
+	ASSERT_EQ(optlen, sizeof(limit));
nit: Maybe a few of those should be EXPECT_EQ? (ASSERT_* stops the test, EXPECT_* will run the rest of the test)
Getting the wrong value back from this getsockopt is worth noting but there's value in running the traffic through anyway?
+	memset(tx, 0, sizeof(tx));
+	EXPECT_EQ(send(cfd, tx, sizeof(tx), 0), sizeof(tx));
But this one should maybe be an ASSERT because trying to parse records from whatever data we managed to send (if any) may not make much sense?
(just some thoughts, this is not a "strict requirement" to change anything in the patch)
+	close(cfd);
+
+	ret = recv(fd, rx, sizeof(rx), 0);
+	memcpy(&rx_len, rx + 3, 2);
+	rx_len = htons(rx_len);
nit: set but not used (also in tx_record_size_open_rec)
 
            On Wed, 2025-09-24 at 19:50 +0200, Sabrina Dubroca wrote:
[got a bit distracted while writing this so Simon got to the process stuff before me, but I'll leave it in:]
BTW, a few details about process: since this is a new feature, the subject prefix should be [PATCH net-next v4 n/m] (new stuff targets the net-next tree), and the patches should be based on the net-next tree [1] (I'm not sure what you based this on, git am complained on both net and net-next for this patch). More info about this in the docs [2].
[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/
[2] https://docs.kernel.org/process/maintainer-netdev.html
(in case you're not aware: also note the bits about "merge window" which will quite likely become relevant in a few days)
Thanks! I will rebase this on [1] for V5 with the changes you specified.
2025-09-23, 15:32:07 +1000, Wilfred Mallawa wrote:
From: Wilfred Mallawa wilfred.mallawa@wdc.com
Test that outgoing plaintext records respect the TLS record_size_limit set using setsockopt(). The record size limit is set to 128, so the plaintext in every received record must not exceed this amount.
Also test that setting a new record size limit whilst a pending open record exists is handled correctly by discarding the request.
Suggested-by: Sabrina Dubroca sd@queasysnail.net
Signed-off-by: Wilfred Mallawa wilfred.mallawa@wdc.com
Thanks for adding this patch. (and for the tag :))
 tools/testing/selftests/net/tls.c | 149 ++++++++++++++++++++++++++++++
 1 file changed, 149 insertions(+)
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index 0f5640d8dc7f..c5bd431d5af3 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -24,6 +24,7 @@
 #include "../kselftest_harness.h"
 
 #define TLS_PAYLOAD_MAX_LEN 16384
+#define TLS_TX_RECORD_SIZE_LIM 5
nit: That should not be needed if you run `make headers_install` before compiling the selftest:
make -s headers_install ; make -C tools/testing/selftests/net tls
make: Entering directory '/home/sab/linux/net/tools/testing/selftests/net'
gcc -Wall -Wl,--no-as-needed -O2 -g -I../../../../usr/include/ -isystem /home/sab/linux/net/tools/testing/selftests/../../../usr/include -I../ -D_GNU_SOURCE= tls.c -o tls
and that will find the new constant defined in the previous patch using the headers from the current kernel tree, instead of those in the system.
Thanks!
[...]
+TEST(tx_record_size)
+{
+	struct tls_crypto_info_keys tls12;
+	int cfd, ret, fd, rx_len, overhead;
+	size_t total_plaintext_rx = 0;
+	__u8 tx[1024], rx[2000];
+	__u8 *rec;
+	__u16 limit = 128;
+	__u16 opt = 0;
+	__u8 rec_header_len = 5;
gcc complains about unused variables, I guess leftovers from extracting parse_tls_records:
tls.c: In function ‘tx_record_size’:
tls.c:2840:14: warning: unused variable ‘rec_header_len’ [-Wunused-variable]
 2840 |         __u8 rec_header_len = 5;
      |              ^~~~~~~~~~~~~~
tls.c:2837:15: warning: unused variable ‘rec’ [-Wunused-variable]
 2837 |         __u8 *rec;
      |               ^~~
tls.c: In function ‘tx_record_size_open_rec’:
tls.c:2893:14: warning: unused variable ‘rec_header_len’ [-Wunused-variable]
 2893 |         __u8 rec_header_len = 5;
      |              ^~~~~~~~~~~~~~
tls.c:2891:15: warning: unused variable ‘rec’ [-Wunused-variable]
 2891 |         __u8 *rec;
      |               ^~~
+	unsigned int optlen = sizeof(opt);
+	bool notls;
+
+	tls_crypto_info_init(TLS_1_2_VERSION, TLS_CIPHER_AES_CCM_128,
+			     &tls12, 0);
+
+	ulp_sock_pair(_metadata, &fd, &cfd, &notls);
+
+	if (notls)
+		exit(KSFT_SKIP);
+
+	/* Don't install keys on fd, we'll parse raw records */
+	ret = setsockopt(cfd, SOL_TLS, TLS_TX, &tls12, tls12.len);
+	ASSERT_EQ(ret, 0);
+
+	ret = setsockopt(cfd, SOL_TLS, TLS_TX_RECORD_SIZE_LIM, &limit, sizeof(limit));
+	ASSERT_EQ(ret, 0);
+
+	ret = getsockopt(cfd, SOL_TLS, TLS_TX_RECORD_SIZE_LIM, &opt, &optlen);
+	ASSERT_EQ(ret, 0);
+	ASSERT_EQ(limit, opt);
+	ASSERT_EQ(optlen, sizeof(limit));
nit: Maybe a few of those should be EXPECT_EQ? (ASSERT_* stops the test, EXPECT_* will run the rest of the test)
Getting the wrong value back from this getsockopt is worth noting but there's value in running the traffic through anyway?
+	memset(tx, 0, sizeof(tx));
+	EXPECT_EQ(send(cfd, tx, sizeof(tx), 0), sizeof(tx));
But this one should maybe be an ASSERT because trying to parse records from whatever data we managed to send (if any) may not make much sense?
(just some thoughts, this is not a "strict requirement" to change anything in the patch)
Good points, I think that makes more sense.
Regards, Wilfred
 
            On Tue, Sep 23, 2025 at 03:32:06PM +1000, Wilfred Mallawa wrote:
From: Wilfred Mallawa wilfred.mallawa@wdc.com
During a handshake, an endpoint may specify a maximum record size limit. Currently, the kernel defaults to TLS_MAX_PAYLOAD_SIZE (16KB) for the maximum record size. This means that outgoing records from the kernel can exceed a lower limit negotiated during the handshake. In such a case, the receiving TLS endpoint must send a fatal "record_overflow" alert [1], so the record is discarded and, because the alert is fatal, the connection is closed.
Upcoming Western Digital NVMe-TCP hardware controllers implement TLS support. For these devices, supporting TLS record size negotiation is necessary because the maximum TLS record size supported by the controller is less than the default 16KB currently used by the kernel.
This patch adds support for retrieving the negotiated record size limit during a handshake, and enforcing it at the TLS layer such that outgoing records are no larger than the size negotiated. This patch depends on the respective userspace support in tlshd and GnuTLS [2].
[1] https://www.rfc-editor.org/rfc/rfc8449
[2] https://gitlab.com/gnutls/gnutls/-/merge_requests/2005
Signed-off-by: Wilfred Mallawa wilfred.mallawa@wdc.com
Changes V3 -> V4:
 * Added record_size_limit RFC reference to documentation
 * Always export the record size limit in tls_get_info()
 * Disallow user space from changing the record_size_limit from under us
   if an open record is pending.
 * Added record_size_limit minimum size check as per RFC
 * Allow space for the ContentType byte for TLS 1.3. The expected
   behaviour is that userspace directly uses the negotiated
   record_size_limit; the kernel will limit the plaintext buffer size
   appropriately.
 * New patch to add self-tests.
Hi Wilfred,
Unfortunately this series doesn't apply cleanly against current net-next. So you will need to rebase and repost after waiting for some more meaningful review from others.
Also, please include net-next in the subject, assuming that is the target tree.
Subject: [PATCH net-next v5 1/2] ...
See: https://docs.kernel.org/process/maintainer-netdev.html
Thanks!
...
 
            On Wed, 2025-09-24 at 18:03 +0100, Simon Horman wrote:
[...]
Hi Wilfred,
Unfortunately this series doesn't apply cleanly against current net-next. So you will need to rebase and repost after waiting for some more meaningful review from others.
Also, please include net-next in the subject, assuming that is the target tree.
Subject: [PATCH net-next v5 1/2] ...
Hey Simon,
Indeed, I incorrectly did not base this on [1], will fixup for V5. Thanks!
[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/
Regards, Wilfred
Thanks!
...
 
            2025-09-23, 15:32:06 +1000, Wilfred Mallawa wrote:
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index a3ccb3135e51..09883d9c6c96 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -544,6 +544,31 @@ static int do_tls_getsockopt_no_pad(struct sock *sk, char __user *optval,
 	return 0;
 }
 
+static int do_tls_getsockopt_tx_record_size(struct sock *sk, char __user *optval,
+					    int __user *optlen)
+{
+	struct tls_context *ctx = tls_get_ctx(sk);
+	int len;
+	/* TLS 1.3: Record length contains ContentType */
+	u16 record_size_limit = ctx->prot_info.version == TLS_1_3_VERSION ?
+				ctx->tx_record_size_limit + 1 :
+				ctx->tx_record_size_limit;
nit: reverse xmas tree
[...]
+static int do_tls_setsockopt_tx_record_size(struct sock *sk, sockptr_t optval,
+					    unsigned int optlen)
+{
+	struct tls_context *ctx = tls_get_ctx(sk);
+	struct tls_sw_context_tx *sw_ctx = tls_sw_ctx_tx(ctx);
+	u16 value;
+
+	if (sw_ctx->open_rec)
+		return -EBUSY;
+
+	if (sockptr_is_null(optval) || optlen != sizeof(value))
+		return -EINVAL;
+
+	if (copy_from_sockptr(&value, optval, sizeof(value)))
+		return -EFAULT;
+
+	if (value < TLS_MIN_RECORD_SIZE_LIM)
+		return -EINVAL;
+
+	if (ctx->prot_info.version == TLS_1_2_VERSION &&
+	    value > TLS_MAX_PAYLOAD_SIZE)
+		return -EINVAL;
+
+	if (ctx->prot_info.version == TLS_1_3_VERSION &&
+	    value - 1 > TLS_MAX_PAYLOAD_SIZE)
+		return -EINVAL;
+
+	/*
+	 * For TLS 1.3: 'value' includes one byte for the appended ContentType.
+	 * Adjust the kernel's internal plaintext limit accordingly.
+	 */
+	ctx->tx_record_size_limit = ctx->prot_info.version == TLS_1_3_VERSION ?
+				    value - 1 : value;
+
+	return 0;
+}
+
 static int do_tls_setsockopt(struct sock *sk, int optname, sockptr_t optval,
 			     unsigned int optlen)
 {
@@ -833,6 +898,9 @@ static int do_tls_setsockopt(struct sock *sk, int optname, sockptr_t optval,
 	case TLS_RX_EXPECT_NO_PAD:
 		rc = do_tls_setsockopt_no_pad(sk, optval, optlen);
 		break;
+	case TLS_TX_RECORD_SIZE_LIM:
+		rc = do_tls_setsockopt_tx_record_size(sk, optval, optlen);
I think we want to lock the socket here, to avoid any concurrent send()? Especially now with the ->open_rec check.
@@ -1111,6 +1180,11 @@ static int tls_get_info(struct sock *sk, struct sk_buff *skb, bool net_admin)
 			goto nla_failure;
 	}
 
+	err = nla_put_u16(skb, TLS_INFO_TX_RECORD_SIZE_LIM,
+			  ctx->tx_record_size_limit);
I'm not sure here: if we do the +1 adjustment we'd be consistent with the value reported by getsockopt, but OTOH users may get confused about seeing a value larger than TLS_MAX_PAYLOAD_SIZE.
 
            On Wed, 2025-09-24 at 19:50 +0200, Sabrina Dubroca wrote:
[...]
 static int do_tls_setsockopt(struct sock *sk, int optname, sockptr_t optval,
 			     unsigned int optlen)
 {
@@ -833,6 +898,9 @@ static int do_tls_setsockopt(struct sock *sk, int optname, sockptr_t optval,
 	case TLS_RX_EXPECT_NO_PAD:
 		rc = do_tls_setsockopt_no_pad(sk, optval, optlen);
 		break;
+	case TLS_TX_RECORD_SIZE_LIM:
+		rc = do_tls_setsockopt_tx_record_size(sk, optval, optlen);
I think we want to lock the socket here, to avoid any concurrent send()? Especially now with the ->open_rec check.
Yeah that's a good point, will fixup!
@@ -1111,6 +1180,11 @@ static int tls_get_info(struct sock *sk, struct sk_buff *skb, bool net_admin)
 			goto nla_failure;
 	}
 
+	err = nla_put_u16(skb, TLS_INFO_TX_RECORD_SIZE_LIM,
+			  ctx->tx_record_size_limit);
I'm not sure here: if we do the +1 adjustment we'd be consistent with the value reported by getsockopt, but OTOH users may get confused about seeing a value larger than TLS_MAX_PAYLOAD_SIZE.
Makes sense to keep the behaviour the same as getsockopt() right? So add the +1 changes here based on version (same as getsockopt()). In which case, it should never exceed TLS_MAX_PAYLOAD_SIZE.
Regards, Wilfred
 
            2025-09-25, 05:39:14 +0000, Wilfred Mallawa wrote:
On Wed, 2025-09-24 at 19:50 +0200, Sabrina Dubroca wrote:
@@ -1111,6 +1180,11 @@ static int tls_get_info(struct sock *sk, struct sk_buff *skb, bool net_admin)
 			goto nla_failure;
 	}
 
+	err = nla_put_u16(skb, TLS_INFO_TX_RECORD_SIZE_LIM,
+			  ctx->tx_record_size_limit);
I'm not sure here: if we do the +1 adjustment we'd be consistent with the value reported by getsockopt, but OTOH users may get confused about seeing a value larger than TLS_MAX_PAYLOAD_SIZE.
Makes sense to keep the behaviour the same as getsockopt() right? So add the +1 changes here based on version (same as getsockopt()). In which case, it should never exceed TLS_MAX_PAYLOAD_SIZE.
The max value for 1.3 is TLS_MAX_PAYLOAD_SIZE+1 (after adjustment), since it's the max value that will be accepted by setsockopt (after passing the "value - 1 > TLS_MAX_PAYLOAD_SIZE" check). And it's the value most users will see since it's the default.
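The point about the reported maximum can be checked with a small model of the v4 conversion rules. The function names below are hypothetical. The version and size constants mirror include/uapi/linux/tls.h and the patch's include/net/tls.h. This is a sketch of the arithmetic only, not of the socket plumbing.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Constants as in include/uapi/linux/tls.h and the patch's include/net/tls.h. */
#define TLS_1_2_VERSION 0x0303
#define TLS_1_3_VERSION 0x0304
#define TLS_MAX_PAYLOAD_SIZE ((uint32_t)1 << 14)
#define TLS_MIN_RECORD_SIZE_LIM ((uint32_t)1 << 6)

/*
 * Model of the v4 setsockopt() path: convert the RFC 8449 value supplied
 * by userspace into the kernel's internal plaintext limit.
 */
static int record_limit_to_internal(uint16_t version, uint16_t value,
				    uint16_t *internal)
{
	if (value < TLS_MIN_RECORD_SIZE_LIM)
		return -EINVAL;
	if (version == TLS_1_2_VERSION && value > TLS_MAX_PAYLOAD_SIZE)
		return -EINVAL;
	/* TLS 1.3: the value also covers the 1-byte inner ContentType */
	if (version == TLS_1_3_VERSION &&
	    (uint32_t)value - 1 > TLS_MAX_PAYLOAD_SIZE)
		return -EINVAL;
	*internal = version == TLS_1_3_VERSION ? value - 1 : value;
	return 0;
}

/* Model of the v4 getsockopt() path: internal limit back to the RFC value. */
static uint16_t internal_to_record_limit(uint16_t version, uint16_t internal)
{
	return version == TLS_1_3_VERSION ? internal + 1 : internal;
}
```

With the default internal limit of TLS_MAX_PAYLOAD_SIZE (16384), a TLS 1.3 socket therefore reports 16385 through getsockopt(). It would report the same value through tls_get_info() if the +1 adjustment were applied there too.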
 
            On Thu, 2025-09-25 at 23:29 +0200, Sabrina Dubroca wrote:
2025-09-25, 05:39:14 +0000, Wilfred Mallawa wrote:
On Wed, 2025-09-24 at 19:50 +0200, Sabrina Dubroca wrote:
@@ -1111,6 +1180,11 @@ static int tls_get_info(struct sock *sk, struct sk_buff *skb, bool net_admin)
 			goto nla_failure;
 	}
 
+	err = nla_put_u16(skb, TLS_INFO_TX_RECORD_SIZE_LIM,
+			  ctx->tx_record_size_limit);
I'm not sure here: if we do the +1 adjustment we'd be consistent with the value reported by getsockopt, but OTOH users may get confused about seeing a value larger than TLS_MAX_PAYLOAD_SIZE.
Makes sense to keep the behaviour the same as getsockopt() right? So add the +1 changes here based on version (same as getsockopt()). In which case, it should never exceed TLS_MAX_PAYLOAD_SIZE.
The max value for 1.3 is TLS_MAX_PAYLOAD_SIZE+1 (after adjustment), since it's the max value that will be accepted by setsockopt (after passing the "value - 1 > TLS_MAX_PAYLOAD_SIZE" check). And it's the value most users will see since it's the default.
Ah I see what you mean. In regards to "but OTOH users may get confused about seeing a value larger than TLS_MAX_PAYLOAD_SIZE.", do you think it's sufficient to document TLS_MAX_PAYLOAD_SIZE and specify that for TLS 1.3 this doesn't include the ContentType byte?
Wilfred
 
            2025-09-25, 23:37:09 +0000, Wilfred Mallawa wrote:
On Thu, 2025-09-25 at 23:29 +0200, Sabrina Dubroca wrote:
2025-09-25, 05:39:14 +0000, Wilfred Mallawa wrote:
On Wed, 2025-09-24 at 19:50 +0200, Sabrina Dubroca wrote:
@@ -1111,6 +1180,11 @@ static int tls_get_info(struct sock *sk, struct sk_buff *skb, bool net_admin)
 			goto nla_failure;
 	}
 
+	err = nla_put_u16(skb, TLS_INFO_TX_RECORD_SIZE_LIM,
+			  ctx->tx_record_size_limit);
I'm not sure here: if we do the +1 adjustment we'd be consistent with the value reported by getsockopt, but OTOH users may get confused about seeing a value larger than TLS_MAX_PAYLOAD_SIZE.
Makes sense to keep the behaviour the same as getsockopt() right? So add the +1 changes here based on version (same as getsockopt()). In which case, it should never exceed TLS_MAX_PAYLOAD_SIZE.
The max value for 1.3 is TLS_MAX_PAYLOAD_SIZE+1 (after adjustment), since it's the max value that will be accepted by setsockopt (after passing the "value - 1 > TLS_MAX_PAYLOAD_SIZE" check). And it's the value most users will see since it's the default.
Ah I see what you mean. In regards to "but OTOH users may get confused about seeing a value larger than TLS_MAX_PAYLOAD_SIZE.", do you think it's sufficient to document TLS_MAX_PAYLOAD_SIZE and specify that for TLS 1.3 this doesn't include the ContentType byte?
I guess it will have to do. Otherwise, unless someone has another idea, we're back to the discussion on v3 (ie setting the actual payload size instead of the record limit).
linux-kselftest-mirror@lists.linaro.org



