devmem test fails on NIPA. Most likely we get skb(s) with readable frags (why?) but the failure manifests as an OOM. The OOM happens because ncdevmem spams the following message:
recvmsg ret=-1 recvmsg: Bad address
As of today, ncdevmem can't deal with various reasons of EFAULT: - falling back to regular recvmsg for non-devmem skbs - increasing ctrl_data size (can't happen with ncdevmem's large buffer)
Exit (cleanly) with error when recvmsg returns EFAULT. This should at least cause the test to cleanup its state.
Signed-off-by: Stanislav Fomichev sdf@fomichev.me --- tools/testing/selftests/drivers/net/hw/ncdevmem.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/testing/selftests/drivers/net/hw/ncdevmem.c index 8dc9511d046f..c0a22938bed2 100644 --- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c +++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c @@ -945,6 +945,10 @@ static int do_server(struct memory_buffer *mem) continue; if (ret < 0) { perror("recvmsg"); + if (errno == EFAULT) { + pr_err("received EFAULT, won't recover"); + goto err_close_client; + } continue; } if (ret == 0) {
On Thu, Sep 4, 2025 at 11:27 AM Stanislav Fomichev sdf@fomichev.me wrote:
devmem test fails on NIPA. Most likely we get skb(s) with readable frags (why?)
I would expect if we get readable frags that the frags land in the host buffer we provide ncdevmem and we actually hit this error:
``` 1 if (!is_devmem) { 0 pr_err("flow steering error"); 1 goto err_close_client; 2 } ```
which as it says, should be root caused in a flow steering error. I don't know what would cause an EFAULT off the top of my head.
but the failure manifests as an OOM. The OOM happens because ncdevmem spams the following message:
recvmsg ret=-1 recvmsg: Bad address
As of today, ncdevmem can't deal with various reasons of EFAULT:
- falling back to regular recvmsg for non-devmem skbs
- increasing ctrl_data size (can't happen with ncdevmem's large buffer)
Exit (cleanly) with error when recvmsg returns EFAULT. This should at least cause the test to cleanup its state.
Signed-off-by: Stanislav Fomichev sdf@fomichev.me
Either way, change looks good to me.
Reviewed-by: Mina Almasry almasrymina@google.com
On 09/04, Mina Almasry wrote:
On Thu, Sep 4, 2025 at 11:27 AM Stanislav Fomichev sdf@fomichev.me wrote:
devmem test fails on NIPA. Most likely we get skb(s) with readable frags (why?)
I would expect if we get readable frags that the frags land in the host buffer we provide ncdevmem and we actually hit this error:
1 if (!is_devmem) { 0 pr_err("flow steering error"); 1 goto err_close_client; 2 }
which as it says, should be root caused in a flow steering error. I don't know what would cause an EFAULT off the top of my head.
Yea, I don't understand what happens :-( I'm thinking of doing the following as well:
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 40b774b4f587..0c18a8c7965f 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2820,7 +2820,7 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len, used); if (err <= 0) { if (!copied) - copied = -EFAULT; + copied = err;
break; }
Should give us more info for the devmem case... LMK if you don't like it. If I don't hear from you in a couple of days, I'll send it out..
On Fri, Sep 5, 2025 at 8:22 AM Stanislav Fomichev stfomichev@gmail.com wrote:
On 09/04, Mina Almasry wrote:
On Thu, Sep 4, 2025 at 11:27 AM Stanislav Fomichev sdf@fomichev.me wrote:
devmem test fails on NIPA. Most likely we get skb(s) with readable frags (why?)
I would expect if we get readable frags that the frags land in the host buffer we provide ncdevmem and we actually hit this error:
1 if (!is_devmem) { 0 pr_err("flow steering error"); 1 goto err_close_client; 2 }
which as it says, should be root caused in a flow steering error. I don't know what would cause an EFAULT off the top of my head.
Yea, I don't understand what happens :-( I'm thinking of doing the following as well:
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 40b774b4f587..0c18a8c7965f 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2820,7 +2820,7 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len, used); if (err <= 0) { if (!copied)
copied = -EFAULT;
copied = err; break; }
Should give us more info for the devmem case... LMK if you don't like it. If I don't hear from you in a couple of days, I'll send it out..
Hmm, the other code paths overwrite the error to EFAULT; I don't know if that's significant in some way. But seems fine to me, I don't see why not do this, other than maybe potentional confusion with recvmsg returning an error not documented here:
https://linux.die.net/man/2/recvmsg
But that seems a minor point.
On 09/05, Mina Almasry wrote:
On Fri, Sep 5, 2025 at 8:22 AM Stanislav Fomichev stfomichev@gmail.com wrote:
On 09/04, Mina Almasry wrote:
On Thu, Sep 4, 2025 at 11:27 AM Stanislav Fomichev sdf@fomichev.me wrote:
devmem test fails on NIPA. Most likely we get skb(s) with readable frags (why?)
I would expect if we get readable frags that the frags land in the host buffer we provide ncdevmem and we actually hit this error:
1 if (!is_devmem) { 0 pr_err("flow steering error"); 1 goto err_close_client; 2 }
which as it says, should be root caused in a flow steering error. I don't know what would cause an EFAULT off the top of my head.
Yea, I don't understand what happens :-( I'm thinking of doing the following as well:
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 40b774b4f587..0c18a8c7965f 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2820,7 +2820,7 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len, used); if (err <= 0) { if (!copied)
copied = -EFAULT;
copied = err; break; }
Should give us more info for the devmem case... LMK if you don't like it. If I don't hear from you in a couple of days, I'll send it out..
Hmm, the other code paths overwrite the error to EFAULT; I don't know if that's significant in some way. But seems fine to me, I don't see why not do this, other than maybe potentional confusion with recvmsg returning an error not documented here:
https://linux.die.net/man/2/recvmsg
But that seems a minor point.
SG! Exporting new errnos seems to be ok because we are dealing with a new devmem mode? I think the bug we have on the fbnic is that we don't properly set skb->unreadable, so it's completely unrelated to this, but still feels like it might help with future debugging..
Hello:
This patch was applied to netdev/net-next.git (main) by Jakub Kicinski kuba@kernel.org:
On Thu, 4 Sep 2025 11:27:10 -0700 you wrote:
devmem test fails on NIPA. Most likely we get skb(s) with readable frags (why?) but the failure manifests as an OOM. The OOM happens because ncdevmem spams the following message:
recvmsg ret=-1 recvmsg: Bad address
[...]
Here is the summary with links: - [net-next] selftests: ncdevmem: don't retry EFAULT https://git.kernel.org/netdev/net-next/c/8c0b9ed2401b
You are awesome, thank you!
linux-kselftest-mirror@lists.linaro.org