2023-03-13, 11:35:10 -0700, Jakub Kicinski wrote:
On Mon, 13 Mar 2023 16:41:36 +0100 Sabrina Dubroca wrote:
Yes, I was looking into that earlier this week. I think we could reuse a similar mechanism for rekeying. tls_dev_add takes tcp_sk->write_seq, we could have a tls_dev_rekey op passing the new key and new write_seq to the driver. I think we can also reuse the ->eor trick from tls_set_device_offload, and we wouldn't have to look at skb->decrypted. Close and push the current SW record, mark ->eor, pass write_seq to the driver along with the key. Also pretty close to what tls_device_resync_tx does.
That sounds like you'd expose the rekeying logic to the drivers? New op, having to track seq#...
Well, we have to call into the drivers to install the key, whether that's a new rekey op, or adding an update argument to ->tls_dev_add, or letting the driver guess that it's a rekey (or ignore that and just install the key if rekey vs initial key isn't a meaningful distinction).
We already feed drivers the seq# with ->tls_dev_add, so passing it for rekeys as well is not a big change.
Does that seem problematic? Adding a rekey op seemed more natural to me than simply using the existing _del + _add ops, but maybe we can get away with just using those two ops.
Theoretically a rekey op is nicer and cleaner. Practically the quality of the driver implementations will vary wildly*, and it's a significant time investment to review all of them. So for non-technical reasons my intuition is that we'd deliver a better overall user experience if we handled the rekey entirely in the core.
Wait for old key to no longer be needed, _del + _add, start using the offload again.
- One vendor submitted a driver claiming support for TLS 1.3, when TLS 1.3 offload was rejected by the core. So this is the level of testing and diligence we're working with :(
:(
Ok, _del + _add then.
I went over the thread to summarize what we've come up with so far:
RX - The existing SW path will handle all records between the KeyUpdate message signaling the change of key and the new key becoming known to the kernel -- those will be queued encrypted, and decrypted in SW as they are read by userspace (once the key is provided, ie same as this patchset) - Call ->tls_dev_del + ->tls_dev_add immediately during setsockopt(TLS_RX)
TX - After setsockopt(TLS_TX), switch to the existing SW path (not the current device_fallback) until we're able to re-enable HW offload - tls_device_{sendmsg,sendpage} will call into tls_sw_{sendmsg,sendpage} under lock_sock to avoid changing socket ops during the rekey while another thread might be waiting on the lock - We only re-enable HW offload (call ->tls_dev_add to install the new key in HW) once all records sent with the old key have been ACKed. At this point, all unacked records are SW-encrypted with the new key, and the old key is unused by both HW and retransmissions. - If there are no unacked records when userspace does setsockopt(TLS_TX), we can (try to) install the new key in HW immediately. - If yet another key has been provided via setsockopt(TLS_TX), we don't install intermediate keys, only the latest. - TCP notifies ktls of ACKs via the icsk_clean_acked callback. In case of a rekey, tls_icsk_clean_acked will record when all data sent with the most recent past key has been sent. The next call to sendmsg/sendpage will install the new key in HW. - We close and push the current SW record before reenabling offload.
If ->tls_dev_add fails to install the new key in HW, we stay in SW mode. We can add a counter to keep track of this.
In addition:
Because we can't change socket ops during a rekey, we'll also have to modify do_tls_setsockopt_conf to check ctx->tx_conf and only call either tls_set_device_offload or tls_set_sw_offload. RX already uses the same ops for both TLS_HW and TLS_SW, so we could switch between HW and SW mode on rekey.
An alternative would be to have a common sendmsg/sendpage which locks the socket and then calls the correct implementation. We'll need that anyway for the offload under rekey case, so that would only add a test to the SW path's ops (compared to the current code). That should allow us to make build_protos a lot simpler.