Re: [PATCH net-next v2 0/5] tls: implement key updates for TLS1.3

22 Feb 2023


      Sorry for the delay, long weekend + merge window.
On Thu, 16 Feb 2023 17:23:19 +0100 Sabrina Dubroca wrote:
...
2023-02-15, 19:57:48 -0800, Jakub Kicinski wrote:
...
On Thu, 16 Feb 2023 00:23:11 +0100 Sabrina Dubroca wrote:
...
I'm not sure we can come up with the correct uAPI/rekey design without
trying to implement rekey with offload and seeing how that blows up
(and possibly in different ways with different devices).
Yes, best we can do now is have a plan in place... and your promise 
of future help? :) (incl. being on the lookout for when the patches 
come because I'll probably forget)
I'll try to help. None of us can guarantee that we'll be around :)
Right.
...
...
I think we could try to switch to SW crypto on Tx until all data using
old key is ACK'ed, drivers can look at skb->decrypted to skip touching
the transitional skbs. Then remove old key, install new one, resume
offload.
"all data using the old key" needs to be one list of record per old
key, since we can have multiple rekeys.
No fully parsing this bit.
...
Could we install the new key in HW a bit earlier? Keep the old key as
SW fallback for rtx, but the driver installs the new key when the
corresponding KeyUpdate record has gone through and tells the stack to
stop doing SW crypto? I'm not sure that'd be a significant improvement
in the standard case, though.
Important consideration is making the non-rekey path as fast as
possible (given rekeying is extremely rare). Looking at skb->decrypted
should be very fast but we can possibly fit some other indication of
"are we rekeying" into another already referenced cache line.
We definitely don't want to have to look up the record to know what
state we're in.
The fallback can't use AES-NI (it's in sirq context) so it's slower 
than SW encrypt before queuing to TCP. Hence my first thought is using
SW crypto for new key and let the traffic we already queued with old
key drain leveraging HW crypto. But as I said the impact on performance
when not rekeying is more important, and so is driver simplicity.
...
...
We may need special care to make sure we don't try to encrypt the same
packet with both keys. In case a rtx gets stuck somewhere and comes to
the NIC after it's already acked (happens surprisingly often).
Don't we have that already? If there's a retransmit while we're
setting the TX key in HW, data that was queued on the socket before
(and shouldn't be encrypted at all) would also be encrypted
otherwise. Or is it different with rekey?
We have a "start marker" record which is supposed to indicate that
anything before it has already been encrypted. The driver is programmed
with the start seq no, when it sees a packet from before this seq no
it checks if a record exists, finds its before the start marker and
sends the data as is.
...
...
Multiple keys on the device would probably mean the device needs some
intelligence to know when to use which - not my first choice.
I only mentioned that because Boris talked about it in the previous
thread.  It's also not my first choice because I doubt all devices
will support it the same way.
...
...
On receive, we also have the problem of more than one rekey arriving,
so if we can't feed enough keys to the device in advance, we'll have
to decrypt some records in software. The host will have to survive the
burst of software decryption while we wait until the device config
catches up.
I think receive is easier. The fallback is quite effective and already
in place.
I was thinking about a peer sending at line rate just after the rekey,
and we have to handle that in software for a short while. If the peer
is also dropping to a software fallback, I guess there's no issue,
both sides will slow down.
...
Here too we may want to enforce some transitional SW-only
mode to avoid the (highly unlikely?) case that NIC will decrypt
successfully a packet with the old key, even tho new key should be used.
Not a crypto expert, but I don't think that's a realistic thing to
worry about.
If the NIC uses the old key to decrypt a record that was encrypted
with the new key and fails, we end up seeing the encrypted record and
decrypting it in SW, right? (and then we can pick the correct key in
the SW fallback) We don't just tear down the connection?
Right, in the common case it should just work. We're basically up
against the probability of a hash collision on the auth tag (match 
with both old and new key).
...
...
Carrying "key ID" with the skb is probably an overkill.
I think we can retrieve it from the sequence of records. When we see a
KeyUpdate, we advance the "key ID" after we're done with the message.
This makes me wonder again if we should have fake offloads on veth
(still calling the kernel's crypto library to simulate a device doing
the encryption and/or decryption), to make it easy to play with the
software bits, without requiring actual hardware that can offload
TLS/IPsec/MACsec. But maybe it's too complex to be useful and we'd
just waste our time fixing bugs in the fake offload rather than
improving the stack.
It should be quite useful. I also usually just hack up veth, but 
I reckon adding support to netdevsim would be a better fit.
We just need a way to tell two netdevsim ports to "connect to each
other".
...
...
...
One option might be to do the key derivation in the kernel following
section 7.2 of the RFC [1]. I don't know how happy crypto/security
people would be with that. We'd have to introduce new crypto_info
structs, and new cipher types (or a flag in the upper bits of the
cipher type) to go with them. Then the kernel processes incoming key
update messages on its own, and emits its own key update messages when
its current key is expiring. On transmit we also need to inject a
Finished message before the KeyUpdate [2]. That's bringing a lot of
TLS logic in the kernel. At that point we might as well do the whole
handshake... but I really hope it doesn't come to that.
I think it's mostly a device vs host state sharing problem, so TLS ULP
or user space - not a big difference, both are on the host.
Maybe. But that's one more implementation of TLS (or large parts of
it) that admins and security teams have to worry about. Maintained by
networking people. I really want to avoid going down that route.
I'll try to dig more into what you're proposing and to poke some holes
into it over the next few days.

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH net-next v2 0/5] tls: implement key updates for TLS1.3