On Tue, Sep 2, 2025 at 2:38 PM Matthieu Baerts matttbe@kernel.org wrote:
2 Sept 2025 23:18:56 Catalin Marinas catalin.marinas@arm.com:
On Tue, Sep 02, 2025 at 08:50:19PM +0200, Matthieu Baerts wrote:
Hi Catalin,
2 Sept 2025 20:25:19 Catalin Marinas catalin.marinas@arm.com:
On Tue, Sep 02, 2025 at 08:27:59AM -0700, Jakub Kicinski wrote:
On Tue, 2 Sep 2025 16:51:47 +0200 Matthieu Baerts wrote:
It is unclear why a second scan is needed and only the second one caught something. Was it the same with the strange issues you mentioned in driver tests? Do you think I should re-add the second scan + cat?
Not sure (cc: Catalin). From experience, a second scan often surfaces issues the first scan missed.
It's one of the kmemleak heuristics to reduce false positives. It computes a checksum of each object during scanning and only reports a leak if the checksum is the same in two consecutive scans.
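To illustrate the heuristic described above, here is a toy model in Python. It is only a sketch and not kmemleak's actual code: `ToyObject` and `scan` are hypothetical names, every object is assumed to be already unreferenced, and the 5-second minimum age is ignored.

```python
import zlib


class ToyObject:
    """Hypothetical stand-in for a tracked allocation (not a kernel structure)."""

    def __init__(self, data: bytes):
        self.data = data
        self.checksum = None  # nothing recorded before the first scan


def scan(objects):
    """Return the names of objects whose contents match the previous scan.

    Assumption: every object passed in is already unreferenced, so the
    checksum comparison is the only thing deciding whether it is reported.
    """
    reported = []
    for name, obj in objects.items():
        current = zlib.crc32(obj.data)
        if obj.checksum == current:
            reported.append(name)  # unchanged across two consecutive scans
        obj.checksum = current     # remember for the next scan
    return reported
```

In this model the first scan never reports anything, because no earlier checksum exists to compare against. That matches the observation that only the second scan caught the leak.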
Thank you for the explanation!
Does that mean a scan should be triggered at the end of the tests, then wait 5 seconds for the grace period, then trigger another scan and check the results?
Or wait 5 seconds, then trigger two consecutive scans?
The 5 seconds is the minimum age of an object before it gets reported as a leak. It's not related to the scanning process. So you could do two scans in succession and wait 5 seconds before checking for leaks.
However, I'd go with the first option - do a scan, wait 5 seconds and do another. That's mostly because at the end of the scan kmemleak prints a message if it found new unreferenced objects. It might not print the message if a leaked object is younger than 5 seconds. In practice, though, the scan may take longer, depending on how loaded your system is.
The second option works as well but waiting between them has a better chance of removing false positives if, say, some objects are moved between lists and two consecutive scans do not detect the list_head change (and update the object's checksum).
Thank you for this very nice reply, that's very clear!
I will then adapt our CI, which has CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF set, to do a manual scan at the very end, wait 5 seconds and do another.
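The sequence discussed above (scan, wait 5 seconds, scan again, then read the report) could be sketched roughly as follows. This is not the actual CI change: `scan_twice` and its parameters are hypothetical, and the code assumes debugfs is mounted at /sys/kernel/debug, that it runs as root, and that kmemleak is enabled.

```python
import time
from pathlib import Path

# Usual debugfs location of the kmemleak control file (assumes debugfs is
# mounted at /sys/kernel/debug and the caller is root).
KMEMLEAK = Path("/sys/kernel/debug/kmemleak")


def scan_twice(path=KMEMLEAK, wait=5.0, sleep=time.sleep):
    """Trigger a scan, wait, trigger a second scan, then return the report.

    `path`, `wait` and `sleep` are parameters only so the sequence can be
    exercised against an ordinary file in a dry run.
    """
    path.write_text("scan\n")  # same effect as: echo scan > .../kmemleak
    sleep(wait)                # let young objects pass the 5-second minimum age
    path.write_text("scan\n")  # second scan; checksums are compared to the first
    return path.read_text()    # same effect as: cat .../kmemleak
```

An empty string returned from the real kmemleak file would mean no leaks were reported. A harness could fail the run on any non-empty result.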
FWIW - I am able to pretty reliably reproduce the kmemleak. However, I also tried adding an inline kmemleak scan to the test harness (did it once with, once without a sleep). When I do that the kmemleak disappears :-)
(not saying that adding the scan isn't useful, just pointing out that this particular leak seems to be related to how quickly we iterate over the testcases)
Christoph