 
Guillaume Nault <gnault@redhat.com> writes:
On Mon, Jul 14, 2025 at 09:57:52PM +0200, Salvatore Bonaccorso wrote:
Hi,
Charles Bordet reported the following issue (full context in https://bugs.debian.org/1108860):
Dear Maintainer,
What led up to the situation?
We run a production environment using Debian 12 VMs, with a network topology involving VXLAN tunnels encapsulated inside WireGuard interfaces. This setup has worked reliably for over a year, with the MTU set to 1500 on all interfaces except the WireGuard interface (set to 1420). Fragmentation by the kernel on the WireGuard path allowed this configuration to work without issues, even though the effective path MTU is lower than 1500.
What exactly did you do (or not do) that was effective (or ineffective)?
We performed a routine system upgrade, updating all packages including the kernel. After the upgrade, we observed severe network issues (timeouts, very slow HTTP/HTTPS, and apt update failures) on all VMs behind the router. SSH and small-packet traffic continued to work.
To diagnose, we:
- Restored a backup (with the previous kernel): the problem disappeared.
- Repeated the upgrade, confirming the issue reappeared.
- Systematically tested each kernel version from 6.1.124-1 up to 6.1.140-1. The problem first appears with kernel 6.1.135-1; all earlier versions work as expected.
- The kernel from backports (6.12.32-1) did not resolve the problem either.
What was the outcome of this action?
- With kernel 6.1.135-1 or later, network timeouts occur for large-packet protocols (HTTP, apt, etc.), while SSH and small-packet protocols work.
- With kernel 6.1.133-1 or earlier, everything works as expected.
What outcome did you expect instead?
We expected the network to function as before, with WireGuard handling fragmentation transparently and no application-level timeouts, regardless of the kernel version.
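The MTU figures in the report can be checked with a small worked calculation, sketched here in Python. It rests on assumptions the report does not state: the VXLAN underlay is IPv4 without options, and the inner frames are untagged Ethernet. Only the 1500 and 1420 MTU values come from the report. Under these assumptions, a full-size packet on the VXLAN interface becomes a 1550-byte UDP packet, 130 bytes over the WireGuard MTU, while inner packets of up to 1370 bytes fit without fragmentation. That is consistent with SSH and other small-packet traffic still working.

```python
# Worked example. Assumptions (not stated in the report): IPv4 VXLAN underlay
# without IP options, untagged inner Ethernet frames. The two MTU values are
# the ones given in the report.

INNER_ETH_HDR = 14  # inner Ethernet header carried inside the VXLAN payload
VXLAN_HDR = 8       # VXLAN header
UDP_HDR = 8         # outer UDP header added by VXLAN
IPV4_HDR = 20       # outer IPv4 header (no options)
IPV6_HDR = 40       # outer IPv6 header, for comparison

VXLAN_MTU = 1500    # MTU of the VXLAN interface (from the report)
WG_MTU = 1420       # MTU of the WireGuard interface (from the report)


def vxlan_overhead(outer_ip_hdr: int) -> int:
    """Bytes VXLAN adds on top of the inner IP packet."""
    return INNER_ETH_HDR + VXLAN_HDR + UDP_HDR + outer_ip_hdr


overhead = vxlan_overhead(IPV4_HDR)        # 14 + 8 + 8 + 20 = 50 bytes
outer_len = VXLAN_MTU + overhead           # size of a full-size VXLAN packet
excess = outer_len - WG_MTU                # bytes above the WireGuard MTU
largest_unfragmented = WG_MTU - overhead   # biggest inner packet that still fits

print(f"VXLAN overhead (IPv4 underlay): {overhead} bytes")
print(f"full-size VXLAN packet: {outer_len} bytes vs. WireGuard MTU {WG_MTU}")
print(f"excess: {excess} bytes, so the packet needs fragmentation")
print(f"largest inner packet that fits unfragmented: {largest_unfragmented} bytes")
print(f"same limit with an IPv6 underlay: {WG_MTU - vxlan_overhead(IPV6_HDR)} bytes")
```

The calculation only shows which packets depend on fragmentation (or on path MTU handling) to cross the WireGuard hop. It does not show how the kernel handles those oversized packets, which is what changed between 6.1.133-1 and 6.1.135-1.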
While triaging the issue, we found that commit 8930424777e4 ("tunnels: Accept PACKET_HOST in skb_tunnel_check_pmtu().") introduced the issue, and Charles confirmed that it is also present in 6.12.35 and 6.15.4 (other, newer versions could be affected as well, but we wanted to check that it is not a 6.1.y-specific regression).
Reverting the commit fixes Charles' issue.
Does that ring a bell?
Regards, Salvatore
It doesn't ring a bell. Do you have more details on the setup that has the problem? Or, ideally, a self-contained reproducer?
+1 - I tested this patch with an OVS setup using vxlan and geneve tunnels. A reproducer or more details would help.