Fix the regression introduced in 9e30ecf23b1b whereby IPv4 broadcast packets were having their ethernet destination field mangled. This broke WOL magic packets and likely other IPv4 broadcast.
The regression can be observed by sending an IPv4 WOL packet using the wakeonlan program to any ethernet address:
wakeonlan 46:3b:ad:61:e0:5d
and capturing the packet with tcpdump:
tcpdump -i eth0 -w /tmp/bad.cap dst port 9
The ethernet destination MUST be ff:ff:ff:ff:ff:ff for broadcast, but is mangled with 9e30ecf23b1b applied.
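To see how the destination was mangled, rather than only that it differs, the first six octets of the captured frame can be printed. This is a minimal sketch. It assumes tcpdump wrote a classic pcap file (24-octet file header, then a 16-octet per-packet header) with an ethernet link type. eth_dst is a hypothetical helper name and is not part of this patch:

```shell
#!/bin/sh
# eth_dst FILE
# Print the ethernet destination of the first packet in a classic pcap file.
# Assumption: the frame starts at offset 40 (24-octet file header plus
# 16-octet per-packet header). Hypothetical helper, not part of the patch.
eth_dst() {
	# od prints the six octets as space-separated hex pairs;
	# word splitting turns them into the positional parameters
	set -- $(od -A n -t x1 -j 40 -N 6 "$1" 2>/dev/null)
	# fewer than six octets means the capture is empty or truncated
	[ $# -eq 6 ] || return 1
	printf '%s:%s:%s:%s:%s:%s\n' "$@"
}
```

For example, `eth_dst /tmp/bad.cap` prints the address in colon-separated hex, which should read ff:ff:ff:ff:ff:ff on an unaffected kernel.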
Revert the change made in 9e30ecf23b1b and ensure the MTU value for broadcast routes is retained by calling ip_dst_init_metrics() directly, avoiding the need to enter the main code block in rt_set_nexthop().
Simplify the code path taken for broadcast packets back to the original before the regression, adding only the call to ip_dst_init_metrics().
The broadcast_pmtu.sh selftest provided with the original patch still passes with this patch applied.
Fixes: 9e30ecf23b1b ("net: ipv4: fix incorrect MTU in broadcast routes")
Signed-off-by: Brett A C Sheffield <bacs@librecast.net>
---
 net/ipv4/route.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f639a2ae881a..eaf78e128aca 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2588,6 +2588,7 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 	do_cache = true;
 	if (type == RTN_BROADCAST) {
 		flags |= RTCF_BROADCAST | RTCF_LOCAL;
+		fi = NULL;
 	} else if (type == RTN_MULTICAST) {
 		flags |= RTCF_MULTICAST | RTCF_LOCAL;
 		if (!ip_check_mc_rcu(in_dev, fl4->daddr, fl4->saddr,
@@ -2657,8 +2658,12 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 			rth->dst.output = ip_mc_output;
 			RT_CACHE_STAT_INC(out_slow_mc);
 		}
+		if (type == RTN_BROADCAST) {
+			/* ensure MTU value for broadcast routes is retained */
+			ip_dst_init_metrics(&rth->dst, res->fi->fib_metrics);
+		}
 #ifdef CONFIG_IP_MROUTE
-		if (type == RTN_MULTICAST) {
+		else if (type == RTN_MULTICAST) {
 			if (IN_DEV_MFORWARD(in_dev) &&
 			    !ipv4_is_local_multicast(fl4->daddr)) {
 				rth->dst.input = ip_mr_input;
base-commit: 01b9128c5db1b470575d07b05b67ffa3cb02ebf1
Hi,
Thanks for your patch.
FYI: kernel test robot notices the stable kernel rule is not satisfied.
The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#opti...
Rule: add the tag "Cc: stable@vger.kernel.org" in the sign-off area to have the patch automatically included in the stable tree.
Subject: [REGRESSION][BISECTED][PATCH] net: ipv4: fix regression in broadcast routes
Link: https://lore.kernel.org/stable/20250822165231.4353-4-bacs%40librecast.net
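A quick local check for the tag before sending can be sketched as below. This only greps the patch file for the tag line and is not how the kernel test robot implements its check. has_stable_cc is a hypothetical helper name:

```shell
#!/bin/sh
# has_stable_cc PATCHFILE
# Succeed if the patch carries a "Cc: stable@vger.kernel.org" tag line.
# Hypothetical helper for illustration only; the robot's real check
# may look at more than this.
has_stable_cc() {
	# match "Cc:" at the start of a line, case-insensitively, and allow
	# the address to be wrapped in angle brackets
	grep -qi '^Cc:.*stable@vger\.kernel\.org' "$1"
}
```

Usage, with a placeholder file name: `has_stable_cc my.patch || echo "missing stable tag"`.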
Thanks for bisecting and fixing!
> The broadcast_pmtu.sh selftest provided with the original patch still passes with this patch applied.
Hm, yes, AFAICT we're losing PMTU discovery but perhaps original commit wasn't concerned with that. Hopefully Oscar can comment.
On Fri, 22 Aug 2025 16:50:51 +0000 Brett A C Sheffield wrote:
> if (type == RTN_BROADCAST) {
> 	/* ensure MTU value for broadcast routes is retained */
> 	ip_dst_init_metrics(&rth->dst, res->fi->fib_metrics);
You need to check if res->fi is actually set before using it.
Could you add a selftest / test case for the scenario we broke? selftests can be in C / bash / Python. If bash hopefully socat can be used to repro, cause it looks like wakeonlan is not very widely packaged.
Add test to check the broadcast ethernet destination field is set correctly.
This test uses the tcpdump and socat programs.
Send a UDP broadcast packet to UDP port 9 (DISCARD), capture this with tcpdump and ensure that all bits of the 6 octet ethernet destination are correctly set.
Cc: stable@vger.kernel.org
Signed-off-by: Brett A C Sheffield <bacs@librecast.net>
Link: https://lore.kernel.org/regressions/20250822165231.4353-4-bacs@librecast.net
---
 tools/testing/selftests/net/Makefile               |  1 +
 .../selftests/net/broadcast_ether_dst.sh           | 38 +++++++++++++++++++
 2 files changed, 39 insertions(+)
 create mode 100755 tools/testing/selftests/net/broadcast_ether_dst.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index b31a71f2b372..463642a78eea 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -116,6 +116,7 @@ TEST_GEN_FILES += skf_net_off
 TEST_GEN_FILES += tfo
 TEST_PROGS += tfo_passive.sh
 TEST_PROGS += broadcast_pmtu.sh
+TEST_PROGS += broadcast_ether_dst.sh
 TEST_PROGS += ipv6_force_forwarding.sh
 
 # YNL files, must be before "include ..lib.mk"
diff --git a/tools/testing/selftests/net/broadcast_ether_dst.sh b/tools/testing/selftests/net/broadcast_ether_dst.sh
new file mode 100755
index 000000000000..de6abe3513b6
--- /dev/null
+++ b/tools/testing/selftests/net/broadcast_ether_dst.sh
@@ -0,0 +1,38 @@
+#!/bin/bash -eu
+# SPDX-License-Identifier: GPL-2.0
+#
+# Author: Brett A C Sheffield <bacs@librecast.net>
+#
+# Ensure destination ethernet field is correctly set for
+# broadcast packets
+
+if ! which tcpdump > /dev/null 2>&1; then
+	echo "No tcpdump found. Required for this test."
+	exit $ERR
+fi
+
+CAPFILE=$(mktemp -u cap.XXXXXXXXXX)
+
+# start tcpdump listening on udp port 9
+# tcpdump will exit after receiving a single packet
+# timeout will kill tcpdump if it is still running after 2s
+timeout 2s tcpdump -c 1 -w ${CAPFILE} udp port 9 > /dev/null 2>&1 &
+PID=$!
+sleep 0.1 # let tcpdump wake up
+
+echo "Testing ethernet broadcast destination"
+
+# send broadcast UDP packet to port 9 (DISCARD)
+echo "Alonso is a good boy" | socat - udp-datagram:255.255.255.255:9,broadcast
+
+# wait for tcpdump for exit after receiving packet
+wait $PID
+
+# compare ethernet destination field to ff:ff:ff:ff:ff:ff
+# pcap has a 24 octet header + 16 octet header for each packet
+# ethernet destination is the first field in the packet
+printf '\xff\xff\xff\xff\xff\xff'| cmp -i40:0 -n6 ${CAPFILE} > /dev/null 2>&1
+RESULT=$?
+
+rm -f "${CAPFILE}"
+exit $RESULT
base-commit: 01b9128c5db1b470575d07b05b67ffa3cb02ebf1
Fix the regression introduced in 9e30ecf23b1b whereby IPv4 broadcast packets were having their ethernet destination field mangled. The problem was first observed with WOL magic packets but affects all UDP IPv4 broadcast packets.
The regression can be observed by sending an IPv4 WOL packet using the wakeonlan program to any ethernet address:
wakeonlan 46:3b:ad:61:e0:5d
and capturing the packet with tcpdump:
tcpdump -i eth0 -w /tmp/bad.cap dst port 9
The ethernet destination MUST be ff:ff:ff:ff:ff:ff for broadcast, but is mangled in affected kernels.
Revert the change made in 9e30ecf23b1b and ensure the MTU value for broadcast routes is retained by calling ip_dst_init_metrics() directly, avoiding the need to enter the main code block in rt_set_nexthop().
Simplify the code path taken for broadcast packets back to the original before the regression, adding only the call to ip_dst_init_metrics().
The broadcast_pmtu.sh selftest provided with the original patch still passes with this patch applied.
Cc: stable@vger.kernel.org
Fixes: 9e30ecf23b1b ("net: ipv4: fix incorrect MTU in broadcast routes")
Link: https://lore.kernel.org/regressions/20250822165231.4353-4-bacs@librecast.net
Signed-off-by: Brett A C Sheffield <bacs@librecast.net>
---
 net/ipv4/route.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f639a2ae881a..ab4d72a59c7b 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2588,6 +2588,7 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 	do_cache = true;
 	if (type == RTN_BROADCAST) {
 		flags |= RTCF_BROADCAST | RTCF_LOCAL;
+		fi = NULL;
 	} else if (type == RTN_MULTICAST) {
 		flags |= RTCF_MULTICAST | RTCF_LOCAL;
 		if (!ip_check_mc_rcu(in_dev, fl4->daddr, fl4->saddr,
@@ -2657,8 +2658,12 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 			rth->dst.output = ip_mc_output;
 			RT_CACHE_STAT_INC(out_slow_mc);
 		}
+		if (type == RTN_BROADCAST && res->fi) {
+			/* ensure MTU value for broadcast routes is retained */
+			ip_dst_init_metrics(&rth->dst, res->fi->fib_metrics);
+		}
 #ifdef CONFIG_IP_MROUTE
-		if (type == RTN_MULTICAST) {
+		else if (type == RTN_MULTICAST) {
 			if (IN_DEV_MFORWARD(in_dev) &&
 			    !ipv4_is_local_multicast(fl4->daddr)) {
 				rth->dst.input = ip_mr_input;
On 2025-08-22 18:32, Jakub Kicinski wrote:
> Thanks for bisecting and fixing!
> > The broadcast_pmtu.sh selftest provided with the original patch still passes with this patch applied.
> Hm, yes, AFAICT we're losing PMTU discovery but perhaps original commit wasn't concerned with that. Hopefully Oscar can comment.
Indeed. This takes it back to the previous behaviour.
> On Fri, 22 Aug 2025 16:50:51 +0000 Brett A C Sheffield wrote:
> > if (type == RTN_BROADCAST) {
> > 	/* ensure MTU value for broadcast routes is retained */
> > 	ip_dst_init_metrics(&rth->dst, res->fi->fib_metrics);
> You need to check if res->fi is actually set before using it.
Ah, yes. Fixed.
> Could you add a selftest / test case for the scenario we broke? selftests can be in C / bash / Python. If bash hopefully socat can be used to repro, cause it looks like wakeonlan is not very widely packaged.
Self-test added using socat as requested. If you want this wrapped in namespaces etc. let me know. I started doing that, but decided a simpler test without requiring root was better and cleaner.
Thanks for the review Jakub. v2 patches sent.
Cheers,
Brett