Hello,
On Fri, 2021-07-02 at 14:36 +0200, Matthias Treydte wrote:
And to answer Paolo's questions from his mail to the list (@Paolo: I'm not subscribed, please also send to me directly so I don't miss your mail)
(yup, that is what I did ?!?)
Could you please:
- tell how frequent is the pkt corruption, even a rough estimate of the
frequency.
# journalctl --since "5min ago" | grep "Packet corrupt" | wc -l
167
So there are 167 detected failures in 5 minutes, while the system is receiving at a moderate rate of about 900 pkts/s (according to Prometheus' node exporter at least, but seems about right)
Interesting. The relevant UDP GRO features are already off, and this happens infrequently. Something is happening on a per-packet basis, I can't guess what.
It looks like you should be able to collect more info WRT the packet corruption by enabling debug logging at the ffmpeg level, but I guess that will flood the logfile.
If you have the kernel debuginfo and the 'perf' tool available, could you please try:
perf probe -a 'udp_gro_receive sk sk->__sk_common.skc_dport'
perf probe -a 'udp_gro_receive_segment'
# need to wait until at least one pkt corruption happens, 10 seconds
# should be more than enough
perf record -a -e probe:udp_gro_receive -e probe:udp_gro_receive_segment sleep 10
perf script | gzip > perf_script.gz
and share the above? I fear it could be too big for the ML, feel free to send it directly to me.
Next I'll try to capture some broken packets and reply in a separate mail, I'll have to figure out a good way to do this first.
Looks like there is a corrupted packet every ~2K UDP ones. If you capture a few thousand consecutive ones, then wireshark should probably help find the suspicious ones.
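For reference, the rough arithmetic behind that estimate, as a minimal sketch using only the figures reported above (167 logged failures in 5 minutes, about 900 pkts/s). The variable names are illustrative, and it assumes each "Packet corrupt" log line corresponds to one corrupted packet, which may not hold exactly:

```python
# Back-of-the-envelope estimate of the corruption frequency.
# Assumption: one "Packet corrupt" log line == one corrupted packet.
failures = 167        # log lines counted by journalctl | grep | wc -l
window_s = 5 * 60     # the "5min ago" window, in seconds
rate_pps = 900        # approximate receive rate reported by node exporter

total_pkts = rate_pps * window_s
pkts_per_failure = total_pkts / failures

print(f"{total_pkts} packets in the window, "
      f"one corruption every ~{pkts_per_failure:.0f} packets")
```

That works out to somewhere between 1.5K and 2K packets per corruption, so a capture of a few thousand consecutive packets should contain at least one bad one most of the time.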
Thanks!
Paolo