A BPF application, e.g., a TCP congestion control, might benefit from or even require precise (=hardware) packet timestamps. These timestamps are already available through __sk_buff.hwtstamp and bpf_sock_ops.skb_hwtstamp, but could not be requested: BPF programs were not allowed to set SO_TIMESTAMPING* on sockets.
Enable BPF programs to actively request the generation of timestamps from a stream socket. The also required ioctl(SIOCSHWTSTAMP) on the network device must still be done separately, in user space.
This patch had previously been submitted in a two-part series (first link below). The second patch has been independently applied in commit 7f6ca95d16b9 ("net: Implement missing getsockopt(SO_TIMESTAMPING_NEW)") (second link below).
On the earlier submission, there was the open question whether to only allow, thus enforce, SO_TIMESTAMPING_NEW in this patch:
For a BPF program, this won't make a difference: A timestamp, when accessed through the fields mentioned above, is directly read from skb_shared_info.hwtstamps, independent of the places where NEW/OLD is relevant. See bpf_convert_ctx_access() besides others.
I am unsure, though, when it comes to the interconnection of user space and BPF "space", when both are interested in the timestamps. I think it would cause an unsolvable conflict when user space is bound to use SO_TIMESTAMPING_OLD with a BPF program only allowed to set SO_TIMESTAMPING_NEW *on the same socket*? Please correct me if I'm mistaken.
Link: https://lore.kernel.org/lkml/20230703175048.151683-1-jthinz@mailbox.tu-berli... Link: https://lore.kernel.org/all/20231221231901.67003-1-jthinz@mailbox.tu-berlin.... Cc: Arnd Bergmann arnd@arndb.de Cc: Deepa Dinamani deepa.kernel@gmail.com Cc: Willem de Bruijn willemdebruijn.kernel@gmail.com Signed-off-by: Jörn-Thorben Hinz j-t.hinz@alumni.tu-berlin.de --- include/uapi/linux/bpf.h | 3 ++- net/core/filter.c | 2 ++ tools/include/uapi/linux/bpf.h | 3 ++- tools/testing/selftests/bpf/progs/bpf_tracing_net.h | 2 ++ tools/testing/selftests/bpf/progs/setget_sockopt.c | 4 ++++ 5 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 754e68ca8744..8825d0648efe 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -2734,7 +2734,8 @@ union bpf_attr { * **SO_RCVBUF**, **SO_SNDBUF**, **SO_MAX_PACING_RATE**, * **SO_PRIORITY**, **SO_RCVLOWAT**, **SO_MARK**, * **SO_BINDTODEVICE**, **SO_KEEPALIVE**, **SO_REUSEADDR**, - * **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**. + * **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**, + * **SO_TIMESTAMPING_NEW**, **SO_TIMESTAMPING_OLD**. * * **IPPROTO_TCP**, which supports the following *optname*\ s: * **TCP_CONGESTION**, **TCP_BPF_IW**, * **TCP_BPF_SNDCWND_CLAMP**, **TCP_SAVE_SYN**, diff --git a/net/core/filter.c b/net/core/filter.c index 8c9f67c81e22..4f5280874fd8 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5144,6 +5144,8 @@ static int sol_socket_sockopt(struct sock *sk, int optname, case SO_MAX_PACING_RATE: case SO_BINDTOIFINDEX: case SO_TXREHASH: + case SO_TIMESTAMPING_NEW: + case SO_TIMESTAMPING_OLD: if (*optlen != sizeof(int)) return -EINVAL; break; diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 7f24d898efbb..09eaafa6ab43 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -2734,7 +2734,8 @@ union bpf_attr { * **SO_RCVBUF**, **SO_SNDBUF**, **SO_MAX_PACING_RATE**, * **SO_PRIORITY**, **SO_RCVLOWAT**, **SO_MARK**, * **SO_BINDTODEVICE**, **SO_KEEPALIVE**, **SO_REUSEADDR**, - * **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**. + * **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**, + * **SO_TIMESTAMPING_NEW**, **SO_TIMESTAMPING_OLD**. * * **IPPROTO_TCP**, which supports the following *optname*\ s: * **TCP_CONGESTION**, **TCP_BPF_IW**, * **TCP_BPF_SNDCWND_CLAMP**, **TCP_SAVE_SYN**, diff --git a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h index 1bdc680b0e0e..95f5f169819e 100644 --- a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h +++ b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h @@ -15,8 +15,10 @@ #define SO_RCVLOWAT 18 #define SO_BINDTODEVICE 25 #define SO_MARK 36 +#define SO_TIMESTAMPING_OLD 37 #define SO_MAX_PACING_RATE 47 #define SO_BINDTOIFINDEX 62 +#define SO_TIMESTAMPING_NEW 65 #define SO_TXREHASH 74 #define __SO_ACCEPTCON (1 << 16)
diff --git a/tools/testing/selftests/bpf/progs/setget_sockopt.c b/tools/testing/selftests/bpf/progs/setget_sockopt.c index 7a438600ae98..54205d10793c 100644 --- a/tools/testing/selftests/bpf/progs/setget_sockopt.c +++ b/tools/testing/selftests/bpf/progs/setget_sockopt.c @@ -48,6 +48,10 @@ static const struct sockopt_test sol_socket_tests[] = { { .opt = SO_MARK, .new = 0xeb9f, .expected = 0xeb9f, }, { .opt = SO_MAX_PACING_RATE, .new = 0xeb9f, .expected = 0xeb9f, }, { .opt = SO_TXREHASH, .flip = 1, }, + { .opt = SO_TIMESTAMPING_NEW, .new = SOF_TIMESTAMPING_RX_HARDWARE, + .expected = SOF_TIMESTAMPING_RX_HARDWARE, }, + { .opt = SO_TIMESTAMPING_OLD, .new = SOF_TIMESTAMPING_RX_HARDWARE, + .expected = SOF_TIMESTAMPING_RX_HARDWARE, }, { .opt = 0, }, };
Jörn-Thorben Hinz wrote:
A BPF application, e.g., a TCP congestion control, might benefit from or even require precise (=hardware) packet timestamps. These timestamps are already available through __sk_buff.hwtstamp and bpf_sock_ops.skb_hwtstamp, but could not be requested: BPF programs were not allowed to set SO_TIMESTAMPING* on sockets.
Enable BPF programs to actively request the generation of timestamps from a stream socket. The also required ioctl(SIOCSHWTSTAMP) on the network device must still be done separately, in user space.
This patch had previously been submitted in a two-part series (first link below). The second patch has been independently applied in commit 7f6ca95d16b9 ("net: Implement missing getsockopt(SO_TIMESTAMPING_NEW)") (second link below).
On the earlier submission, there was the open question whether to only allow, thus enforce, SO_TIMESTAMPING_NEW in this patch:
For a BPF program, this won't make a difference: A timestamp, when accessed through the fields mentioned above, is directly read from skb_shared_info.hwtstamps, independent of the places where NEW/OLD is relevant. See bpf_convert_ctx_access() besides others.
I am unsure, though, when it comes to the interconnection of user space and BPF "space", when both are interested in the timestamps. I think it would cause an unsolvable conflict when user space is bound to use SO_TIMESTAMPING_OLD with a BPF program only allowed to set SO_TIMESTAMPING_NEW *on the same socket*? Please correct me if I'm mistaken.
The difference between OLD and NEW only affects the system calls. It is not reflected in how the data is stored in the skb, or how BPF can read the data. A process setting SO_TIMESTAMPING_OLD will still allow BPF to read data using SO_TIMESTAMPING_NEW.
But, he one place where I see a conflict is in setting sock_flag SOCK_TSTAMP_NEW. That affects what getsockopt returns and which cmsg is written:
if (sock_flag(sk, SOCK_TSTAMP_NEW)) put_cmsg_scm_timestamping64(msg, tss); else put_cmsg_scm_timestamping(msg, tss);
So a process could issue setsockopt SO_TIMESTAMPING_OLD followed by a BPF program that issues setsockopt SO_TIMESTAMPING_NEW and this would flip SOCK_TSTAMP_NEW.
Just allowing BPF to set SO_TIMESTAMPING_OLD does not fix it, as it just adds the inverse case.
A related problem is how does the BPF program know which of the two variants to set. The BPF program is usually compiled and loaded independently of the running process.
Perhaps one option is to fail the setsockop if it would flip sock_flag SOCK_TSTAMP_NEW. But only if called from BPF, as else it changes existing ABI.
Then a BPF program can attempt to set SO_TIMESTAMPING NEW, be prepared to handle a particular errno, and retry with SO_TIMESTAMPING_OLD.
Link: https://lore.kernel.org/lkml/20230703175048.151683-1-jthinz@mailbox.tu-berli... Link: https://lore.kernel.org/all/20231221231901.67003-1-jthinz@mailbox.tu-berlin.... Cc: Arnd Bergmann arnd@arndb.de Cc: Deepa Dinamani deepa.kernel@gmail.com Cc: Willem de Bruijn willemdebruijn.kernel@gmail.com Signed-off-by: Jörn-Thorben Hinz j-t.hinz@alumni.tu-berlin.de
On 1/16/24 7:17 AM, Willem de Bruijn wrote:
Jörn-Thorben Hinz wrote:
A BPF application, e.g., a TCP congestion control, might benefit from or even require precise (=hardware) packet timestamps. These timestamps are already available through __sk_buff.hwtstamp and bpf_sock_ops.skb_hwtstamp, but could not be requested: BPF programs were not allowed to set SO_TIMESTAMPING* on sockets.
This patch only uses the SOF_TIMESTAMPING_RX_HARDWARE in the selftest. How about others? e.g. the SOF_TIMESTAMPING_TX_* that will affect the sk->sk_error_queue which seems not good. If rx tstamp is useful, tx tstamp should be useful also?
Enable BPF programs to actively request the generation of timestamps from a stream socket. The also required ioctl(SIOCSHWTSTAMP) on the network device must still be done separately, in user space.
hmm... so both ioctl(SIOCSHWTSTAMP) of the netdevice and the SOF_TIMESTAMPING_RX_HARDWARE of the sk must be done?
I likely miss something. When skb is created in the driver rx path, the sk is not known yet though. How the SOF_TIMESTAMPING_RX_HARDWARE of the sk affects the skb_shinfo(skb)->hwtstamps?
Martin KaFai Lau wrote:
On 1/16/24 7:17 AM, Willem de Bruijn wrote:
Jörn-Thorben Hinz wrote:
A BPF application, e.g., a TCP congestion control, might benefit from or even require precise (=hardware) packet timestamps. These timestamps are already available through __sk_buff.hwtstamp and bpf_sock_ops.skb_hwtstamp, but could not be requested: BPF programs were not allowed to set SO_TIMESTAMPING* on sockets.
This patch only uses the SOF_TIMESTAMPING_RX_HARDWARE in the selftest. How about others? e.g. the SOF_TIMESTAMPING_TX_* that will affect the sk->sk_error_queue which seems not good. If rx tstamp is useful, tx tstamp should be useful also?
Good point. Or should not be allowed to be set from BPF.
That significantly changes process behavior, e.g., by returning POLLERR.
Enable BPF programs to actively request the generation of timestamps from a stream socket. The also required ioctl(SIOCSHWTSTAMP) on the network device must still be done separately, in user space.
hmm... so both ioctl(SIOCSHWTSTAMP) of the netdevice and the SOF_TIMESTAMPING_RX_HARDWARE of the sk must be done?
I likely miss something. When skb is created in the driver rx path, the sk is not known yet though. How the SOF_TIMESTAMPING_RX_HARDWARE of the sk affects the skb_shinfo(skb)->hwtstamps?
Indeed it does not seem to do anything in the datapath.
Requesting SOF_TIMESTAMPING_RX_SOFTWARE will call net_enable_timestamp to start timestamping packets.
But SOF_TIMESTAMPING_RX_HARDWARE does not so thing.
Drivers do use it in ethtool get_ts_info to signal hardware capabilities. But those must be configured using the ioctl.
It is there more for consistency with the other timestamp recording options, I suppose.
On 1/17/24 7:55 AM, Willem de Bruijn wrote:
Martin KaFai Lau wrote:
On 1/16/24 7:17 AM, Willem de Bruijn wrote:
Jörn-Thorben Hinz wrote:
A BPF application, e.g., a TCP congestion control, might benefit from or even require precise (=hardware) packet timestamps. These timestamps are already available through __sk_buff.hwtstamp and bpf_sock_ops.skb_hwtstamp, but could not be requested: BPF programs were not allowed to set SO_TIMESTAMPING* on sockets.
This patch only uses the SOF_TIMESTAMPING_RX_HARDWARE in the selftest. How about others? e.g. the SOF_TIMESTAMPING_TX_* that will affect the sk->sk_error_queue which seems not good. If rx tstamp is useful, tx tstamp should be useful also?
Good point. Or should not be allowed to be set from BPF.
That significantly changes process behavior, e.g., by returning POLLERR.
Enable BPF programs to actively request the generation of timestamps from a stream socket. The also required ioctl(SIOCSHWTSTAMP) on the network device must still be done separately, in user space.
hmm... so both ioctl(SIOCSHWTSTAMP) of the netdevice and the SOF_TIMESTAMPING_RX_HARDWARE of the sk must be done?
I likely miss something. When skb is created in the driver rx path, the sk is not known yet though. How the SOF_TIMESTAMPING_RX_HARDWARE of the sk affects the skb_shinfo(skb)->hwtstamps?
Indeed it does not seem to do anything in the datapath.
Requesting SOF_TIMESTAMPING_RX_SOFTWARE will call net_enable_timestamp to start timestamping packets.
But SOF_TIMESTAMPING_RX_HARDWARE does not so thing.
Drivers do use it in ethtool get_ts_info to signal hardware capabilities. But those must be configured using the ioctl.
It is there more for consistency with the other timestamp recording options, I suppose.
Thanks for the explanation on the SOF_TIMESTAMPING_RX_{HARDWARE,SOFTWARE}.
__sk_buff.hwtstamp should have the NIC rx timestamp then as long as the NIC is ioctl configured.
Jorn, do you need RX_SOFTWARE? From looking at net_timestamp_set(), any socket requested RX_SOFTWARE should be enough to get a skb->tstamp for all skbs. A workaround is to manually create a socket and turn on RX_SOFTWARE.
It will still be nice to get proper bpf_setsockopt() support for RX_SOFTWARE but it should be considered together with how SO_TIMESTAMPING_TX_* should work in bpf prog considering the TX tstamping does not have a workaround solution like RX_SOFTWARE.
It is probably cleaner to have a separate bit in sk->sk_tsflags for bpf such that the bpf prog won't be affected by the userspace turning it on/off and it won't change the userspace's expectation also (e.g. sk_error_queue and POLLERR).
The part that needs more thoughts in the tx tstamp is how to notify the bpf prog to consume it. Potentially the kernel can involve a bpf prog to collect the tx timestamp when the bpf bit in sk->sk_tsflags is set. An example on how TCP-CC is using it will help to think of the approach here.
Hmm, after taking a new look at it today, I think my patch can be disregarded---at least for having a BPF program access *RX* *hardware* timestamps. (Sorry about the noise then.)
When I looked into this a few months ago, I half-blindly followed Documentation/networking/timestamping.rst, afterwards assuming bpf_setsockopt(SO_TIMESTAMPING*) will be necessary for my use case (see about it at the end).
Looking at it again today, it seems the ioctl(SIOCSHWTSTAMP) is sufficient here: It enables the hardware timestamping on the device, which are placed in skb's/skb_shared_info's hwtstamps field. This hwtstamps is where the values of __sk_buff.hwtstamp and bpf_sock_ops.skb_hwtstamp are coming from. No further timestamp processing is involved when a BPF program reads the these two fields. Meaning bpf_setsockopt(SOF_TIMESTAMPING_RX_HARDWARE) would be a no-op from the view of a BPF program.
I started this message before coming to the above understanding but I've left my replies in below.
With bpf_setsockopt(SOF_TIMESTAMPING_RX_HARDWARE) being unnecessary, and bpf_setsockopt(SOF_TIMESTAMPING_RX_SOFTWARE), as I understand, having a number of possibly unwanted implications---should we leave it at that here?
On Wed, 2024-01-17 at 13:23 -0800, Martin KaFai Lau wrote:
On 1/17/24 7:55 AM, Willem de Bruijn wrote:
Martin KaFai Lau wrote:
On 1/16/24 7:17 AM, Willem de Bruijn wrote: > > Jörn-Thorben Hinz wrote: > > > > A BPF application, e.g., a TCP congestion control, > > > > might > > > > benefit from or > > > > even require precise (=hardware) packet timestamps. > > > > These > > > > timestamps are > > > > already available through __sk_buff.hwtstamp and > > > > bpf_sock_ops.skb_hwtstamp, but could not be > > > > requested: BPF > > > > programs were > > > > not allowed to set SO_TIMESTAMPING* on sockets.
This patch only uses the SOF_TIMESTAMPING_RX_HARDWARE in the selftest. How about others? e.g. the SOF_TIMESTAMPING_TX_* that will affect the sk->sk_error_queue which seems not good. If rx tstamp is useful, tx tstamp should be useful also?
I admit I only ever looked at enabling and using SOF_TIMESTAMPING_RX_HARDWARE for my/our use case. With that, I was not aware that _SOFTWARE has more, possibly complicating implications.
Good point. Or should not be allowed to be set from BPF.
That significantly changes process behavior, e.g., by returning POLLERR.
> > > > > > > > Enable BPF programs to actively request the > > > > generation of > > > > timestamps > > > > from a stream socket. The also required > > > > ioctl(SIOCSHWTSTAMP) > > > > on the > > > > network device must still be done separately, in > > > > user space.
hmm... so both ioctl(SIOCSHWTSTAMP) of the netdevice and the SOF_TIMESTAMPING_RX_HARDWARE of the sk must be done?
I likely miss something. When skb is created in the driver rx path, the sk is not known yet though. How the SOF_TIMESTAMPING_RX_HARDWARE of the sk affects the skb_shinfo(skb)->hwtstamps?
I mostly followed Documentation/networking/timestamping.rst (section 3) to understand how the hardware timestamps are to be setup and used.
From my understanding, the ioctl(SIOCSHWTSTAMP) makes a persistent setting for the device/driver, independent of the lifetime of any socket or skb.
I used a simplified program[1] when trying out this patch a few months ago.
Indeed it does not seem to do anything in the datapath.
Requesting SOF_TIMESTAMPING_RX_SOFTWARE will call net_enable_timestamp to start timestamping packets.
But SOF_TIMESTAMPING_RX_HARDWARE does not so thing.
Drivers do use it in ethtool get_ts_info to signal hardware capabilities. But those must be configured using the ioctl.
It is there more for consistency with the other timestamp recording options, I suppose.
Thanks for the explanation on the SOF_TIMESTAMPING_RX_{HARDWARE,SOFTWARE}.
__sk_buff.hwtstamp should have the NIC rx timestamp then as long as the NIC is ioctl configured.
Jorn, do you need RX_SOFTWARE? From looking at net_timestamp_set(), any socket requested RX_SOFTWARE should be enough to get a skb->tstamp for all skbs. A workaround is to manually create a socket and turn on RX_SOFTWARE.
No, my use case was only for the RX hardware timestamps, as close to the packet reception time point as possible.
It will still be nice to get proper bpf_setsockopt() support for RX_SOFTWARE but it should be considered together with how SO_TIMESTAMPING_TX_* should work in bpf prog considering the TX tstamping does not have a workaround solution like RX_SOFTWARE.
It is probably cleaner to have a separate bit in sk->sk_tsflags for bpf such that the bpf prog won't be affected by the userspace turning it on/off and it won't change the userspace's expectation also (e.g. sk_error_queue and POLLERR).
The part that needs more thoughts in the tx tstamp is how to notify the bpf prog to consume it. Potentially the kernel can involve a bpf prog to collect the tx timestamp when the bpf bit in sk->sk_tsflags is set. An example on how TCP-CC is using it will help to think of the approach here.
My (academic) application was an implementation[2,3] of PowerTCP[4], a CC that (in its simplified variant) profits from precise timestamping. Only the RX timestamps would be of use there.
As mentioned above, I used[1] a while ago when I looked into timestamp usage. It shows how I imagine the timestamps could be accessed and used (similarly implemented in [2]).
[1] https://github.com/jtdor/bpf_hwtstamps [2] https://github.com/inet-tub/powertcp-linux [3] https://schmiste.github.io/ebpf23.pdf [4] https://schmiste.github.io/nsdi22powertcp.pdf
On Tue, 2024-01-16 at 10:17 -0500, Willem de Bruijn wrote:
Jörn-Thorben Hinz wrote:
A BPF application, e.g., a TCP congestion control, might benefit from or even require precise (=hardware) packet timestamps. These timestamps are already available through __sk_buff.hwtstamp and bpf_sock_ops.skb_hwtstamp, but could not be requested: BPF programs were not allowed to set SO_TIMESTAMPING* on sockets.
Enable BPF programs to actively request the generation of timestamps from a stream socket. The also required ioctl(SIOCSHWTSTAMP) on the network device must still be done separately, in user space.
This patch had previously been submitted in a two-part series (first link below). The second patch has been independently applied in commit 7f6ca95d16b9 ("net: Implement missing getsockopt(SO_TIMESTAMPING_NEW)") (second link below).
On the earlier submission, there was the open question whether to only allow, thus enforce, SO_TIMESTAMPING_NEW in this patch:
For a BPF program, this won't make a difference: A timestamp, when accessed through the fields mentioned above, is directly read from skb_shared_info.hwtstamps, independent of the places where NEW/OLD is relevant. See bpf_convert_ctx_access() besides others.
I am unsure, though, when it comes to the interconnection of user space and BPF "space", when both are interested in the timestamps. I think it would cause an unsolvable conflict when user space is bound to use SO_TIMESTAMPING_OLD with a BPF program only allowed to set SO_TIMESTAMPING_NEW *on the same socket*? Please correct me if I'm mistaken.
The difference between OLD and NEW only affects the system calls. It is not reflected in how the data is stored in the skb, or how BPF can read the data. A process setting SO_TIMESTAMPING_OLD will still allow BPF to read data using SO_TIMESTAMPING_NEW.
But, he one place where I see a conflict is in setting sock_flag SOCK_TSTAMP_NEW. That affects what getsockopt returns and which cmsg is written:
if (sock_flag(sk, SOCK_TSTAMP_NEW)) put_cmsg_scm_timestamping64(msg, tss); else put_cmsg_scm_timestamping(msg, tss);
So a process could issue setsockopt SO_TIMESTAMPING_OLD followed by a BPF program that issues setsockopt SO_TIMESTAMPING_NEW and this would flip SOCK_TSTAMP_NEW.
Just allowing BPF to set SO_TIMESTAMPING_OLD does not fix it, as it just adds the inverse case.
Thanks for elaborating on this. I see I only thought of half the possible conflicting situations.
A related problem is how does the BPF program know which of the two variants to set. The BPF program is usually compiled and loaded independently of the running process.
True, that is an additional challenge. And with respect to CO-RE, I think a really portable BPF program could (or at least should) not even decide on NEW or OLD at compile time.
Perhaps one option is to fail the setsockop if it would flip sock_flag SOCK_TSTAMP_NEW. But only if called from BPF, as else it changes existing ABI.
Then a BPF program can attempt to set SO_TIMESTAMPING NEW, be prepared to handle a particular errno, and retry with SO_TIMESTAMPING_OLD.
Hmm, would be possible, yes. But sounds like a weird and unexpected special-case behavior to the occasional BPF user.
Link: https://lore.kernel.org/lkml/20230703175048.151683-1-jthinz@mailbox.tu-berli... Link: https://lore.kernel.org/all/20231221231901.67003-1-jthinz@mailbox.tu-berlin.... Cc: Arnd Bergmann arnd@arndb.de Cc: Deepa Dinamani deepa.kernel@gmail.com Cc: Willem de Bruijn willemdebruijn.kernel@gmail.com Signed-off-by: Jörn-Thorben Hinz j-t.hinz@alumni.tu-berlin.de
Jörn-Thorben Hinz wrote:
On Tue, 2024-01-16 at 10:17 -0500, Willem de Bruijn wrote:
Jörn-Thorben Hinz wrote:
A BPF application, e.g., a TCP congestion control, might benefit from or even require precise (=hardware) packet timestamps. These timestamps are already available through __sk_buff.hwtstamp and bpf_sock_ops.skb_hwtstamp, but could not be requested: BPF programs were not allowed to set SO_TIMESTAMPING* on sockets.
Enable BPF programs to actively request the generation of timestamps from a stream socket. The also required ioctl(SIOCSHWTSTAMP) on the network device must still be done separately, in user space.
This patch had previously been submitted in a two-part series (first link below). The second patch has been independently applied in commit 7f6ca95d16b9 ("net: Implement missing getsockopt(SO_TIMESTAMPING_NEW)") (second link below).
On the earlier submission, there was the open question whether to only allow, thus enforce, SO_TIMESTAMPING_NEW in this patch:
For a BPF program, this won't make a difference: A timestamp, when accessed through the fields mentioned above, is directly read from skb_shared_info.hwtstamps, independent of the places where NEW/OLD is relevant. See bpf_convert_ctx_access() besides others.
I am unsure, though, when it comes to the interconnection of user space and BPF "space", when both are interested in the timestamps. I think it would cause an unsolvable conflict when user space is bound to use SO_TIMESTAMPING_OLD with a BPF program only allowed to set SO_TIMESTAMPING_NEW *on the same socket*? Please correct me if I'm mistaken.
The difference between OLD and NEW only affects the system calls. It is not reflected in how the data is stored in the skb, or how BPF can read the data. A process setting SO_TIMESTAMPING_OLD will still allow BPF to read data using SO_TIMESTAMPING_NEW.
But, he one place where I see a conflict is in setting sock_flag SOCK_TSTAMP_NEW. That affects what getsockopt returns and which cmsg is written:
if (sock_flag(sk, SOCK_TSTAMP_NEW)) put_cmsg_scm_timestamping64(msg, tss); else put_cmsg_scm_timestamping(msg, tss);
So a process could issue setsockopt SO_TIMESTAMPING_OLD followed by a BPF program that issues setsockopt SO_TIMESTAMPING_NEW and this would flip SOCK_TSTAMP_NEW.
Just allowing BPF to set SO_TIMESTAMPING_OLD does not fix it, as it just adds the inverse case.
Thanks for elaborating on this. I see I only thought of half the possible conflicting situations.
A related problem is how does the BPF program know which of the two variants to set. The BPF program is usually compiled and loaded independently of the running process.
True, that is an additional challenge. And with respect to CO-RE, I think a really portable BPF program could (or at least should) not even decide on NEW or OLD at compile time.
Perhaps one option is to fail the setsockop if it would flip sock_flag SOCK_TSTAMP_NEW. But only if called from BPF, as else it changes existing ABI.
Then a BPF program can attempt to set SO_TIMESTAMPING NEW, be prepared to handle a particular errno, and retry with SO_TIMESTAMPING_OLD.
Hmm, would be possible, yes. But sounds like a weird and unexpected special-case behavior to the occasional BPF user.
Agreed. So perhaps we're back to where we say: this is a new feature for BPF, only support it on modern environments that use SO_TIMESTAMPING_NEW?
linux-kselftest-mirror@lists.linaro.org