Add support for SuperH/"sh" to nolibc.
Only sh4 is tested for now.
This is only tested on QEMU so far.
Additional testing would be very welcome.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (3):
selftests/nolibc: fix EXTRACONFIG variables ordering
selftests/nolibc: use file driver for QEMU serial
tools/nolibc: add support for SuperH
tools/include/nolibc/arch-sh.h | 162 ++++++++++++++++++++++++++++
tools/include/nolibc/arch.h | 2 +
tools/testing/selftests/nolibc/Makefile | 15 ++-
tools/testing/selftests/nolibc/run-tests.sh | 3 +-
4 files changed, 177 insertions(+), 5 deletions(-)
---
base-commit: 6275a61db2f0586b8a5d651dfc7b4aacf9d0b2d6
change-id: 20250528-nolibc-sh-8b4e3bb8efcb
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Reading /proc/pid/maps requires read-locking mmap_lock which prevents any
other task from concurrently modifying the address space. This guarantees
coherent reporting of virtual address ranges, however it can block
important updates from happening. Oftentimes /proc/pid/maps readers are
low priority monitoring tasks and them blocking high priority tasks
results in priority inversion.
Locking the entire address space is required to present fully coherent
picture of the address space, however even current implementation does not
strictly guarantee that by outputting vmas in page-size chunks and
dropping mmap_lock in between each chunk. Address space modifications are
possible while mmap_lock is dropped and userspace reading the content is
expected to deal with possible concurrent address space modifications.
Considering these relaxed rules, holding mmap_lock is not strictly needed
as long as we can guarantee that a concurrently modified vma is reported
either in its original form or after it was modified.
This patchset switches from holding mmap_lock while reading /proc/pid/maps
to taking per-vma locks as we walk the vma tree. This reduces the
contention with tasks modifying the address space because they would have
to contend for the same vma as opposed to the entire address space. Same
is done for PROCMAP_QUERY ioctl which locks only the vma that fell into
the requested range instead of the entire address space. Previous version
of this patchset [1] tried to perform /proc/pid/maps reading under RCU,
however its implementation is quite complex and the results are worse than
the new version because it still relied on mmap_lock speculation which
retries if any part of the address space gets modified. New implementaion
is both simpler and results in less contention. Note that similar approach
would not work for /proc/pid/smaps reading as it also walks the page table
and that's not RCU-safe.
Paul McKenney's designed a test [2] to measure mmap/munmap latencies while
concurrently reading /proc/pid/maps. The test has a pair of processes
scanning /proc/PID/maps, and another process unmapping and remapping 4K
pages from a 128MB range of anonymous memory. At the end of each 10
second run, the latency of each mmap() or munmap() operation is measured,
and for each run the maximum and mean latency is printed. The map/unmap
process is started first, its PID is passed to the scanners, and then the
map/unmap process waits until both scanners are running before starting
its timed test. The scanners keep scanning until the specified
/proc/PID/maps file disappears. This test registered close to 10x
improvement in update latencies:
Before the change:
./run-proc-vs-map.sh --nsamples 100 --rawdata -- --busyduration 2
0.011 0.008 0.455
0.011 0.008 0.472
0.011 0.008 0.535
0.011 0.009 0.545
...
0.011 0.014 2.875
0.011 0.014 2.913
0.011 0.014 3.007
0.011 0.015 3.018
After the change:
./run-proc-vs-map.sh --nsamples 100 --rawdata -- --busyduration 2
0.006 0.005 0.036
0.006 0.005 0.039
0.006 0.005 0.039
0.006 0.005 0.039
...
0.006 0.006 0.403
0.006 0.006 0.474
0.006 0.006 0.479
0.006 0.006 0.498
The patchset also adds a number of tests to check for /proc/pid/maps data
coherency. They are designed to detect any unexpected data tearing while
performing some common address space modifications (vma split, resize and
remap). Even before these changes, reading /proc/pid/maps might have
inconsistent data because the file is read page-by-page with mmap_lock
being dropped between the pages. An example of user-visible inconsistency
can be that the same vma is printed twice: once before it was modified and
then after the modifications. For example if vma was extended, it might be
found and reported twice. What is not expected is to see a gap where there
should have been a vma both before and after modification. This patchset
increases the chances of such tearing, therefore it's even more important
now to test for unexpected inconsistencies.
In [3] Lorenzo identified the following possible vma merging/splitting
scenarios:
Merges with changes to existing vmas:
1 Merge both - mapping a vma over another one and between two vmas which
can be merged after this replacement;
2. Merge left full - mapping a vma at the end of an existing one and
completely over its right neighbor;
3. Merge left partial - mapping a vma at the end of an existing one and
partially over its right neighbor;
4. Merge right full - mapping a vma before the start of an existing one
and completely over its left neighbor;
5. Merge right partial - mapping a vma before the start of an existing one
and partially over its left neighbor;
Merges without changes to existing vmas:
6. Merge both - mapping a vma into a gap between two vmas which can be
merged after the insertion;
7. Merge left - mapping a vma at the end of an existing one;
8. Merge right - mapping a vma before the start end of an existing one;
Splits
9. Split with new vma at the lower address;
10. Split with new vma at the higher address;
If such merges or splits happen concurrently with the /proc/maps reading
we might report a vma twice, once before the modification and once after
it is modified:
Case 1 might report overwritten and previous vma along with the final
merged vma;
Case 2 might report previous and the final merged vma;
Case 3 might cause us to retry once we detect the temporary gap caused by
shrinking of the right neighbor;
Case 4 might report overritten and the final merged vma;
Case 5 might cause us to retry once we detect the temporary gap caused by
shrinking of the left neighbor;
Case 6 might report previous vma and the gap along with the final marged
vma;
Case 7 might report previous and the final merged vma;
Case 8 might report the original gap and the final merged vma covering the
gap;
Case 9 might cause us to retry once we detect the temporary gap caused by
shrinking of the original vma at the vma start;
Case 10 might cause us to retry once we detect the temporary gap caused by
shrinking of the original vma at the vma end;
In all these cases the retry mechanism prevents us from reporting possible
temporary gaps.
Changes from v4 [4]:
- refactored trylock_vma() and other locking parts into mmap_lock.c, per
Lorenzo
- renamed {lock|unlock}_content() into {lock|unlock}_vma_range(), per
Lorenzo
- added clarifying comments for sentinels, per Lorenzo
- introduced is_sentinel_pos() helper function
- fixed position reset logic when last_addr is a sentinel, per Lorenzo
- added Acked-by to the last patch, per Andrii Nakryiko
[1] https://lore.kernel.org/all/20250418174959.1431962-1-surenb@google.com/
[2] https://github.com/paulmckrcu/proc-mmap_sem-test
[3] https://lore.kernel.org/all/e1863f40-39ab-4e5b-984a-c48765ffde1c@lucifer.lo…
[4] https://lore.kernel.org/all/20250604231151.799834-1-surenb@google.com/
Suren Baghdasaryan (7):
selftests/proc: add /proc/pid/maps tearing from vma split test
selftests/proc: extend /proc/pid/maps tearing test to include vma
resizing
selftests/proc: extend /proc/pid/maps tearing test to include vma
remapping
selftests/proc: test PROCMAP_QUERY ioctl while vma is concurrently
modified
selftests/proc: add verbose more for tests to facilitate debugging
mm/maps: read proc/pid/maps under per-vma lock
mm/maps: execute PROCMAP_QUERY ioctl under per-vma locks
fs/proc/internal.h | 5 +
fs/proc/task_mmu.c | 179 ++++-
include/linux/mmap_lock.h | 11 +
mm/mmap_lock.c | 88 +++
tools/testing/selftests/proc/proc-pid-vm.c | 793 ++++++++++++++++++++-
5 files changed, 1053 insertions(+), 23 deletions(-)
base-commit: 0b2a863368fb0cf674b40925c55dc8898c5a33af
--
2.50.0.714.g196bf9f422-goog
eOn Tue, Jun 24, 2025 at 11:45:09AM +0530, Dev Jain wrote:
>
> On 23/06/25 11:02 pm, Donet Tom wrote:
> > On Mon, Jun 23, 2025 at 10:23:02AM +0530, Dev Jain wrote:
> > > On 21/06/25 11:25 pm, Donet Tom wrote:
> > > > On Fri, Jun 20, 2025 at 08:15:25PM +0530, Dev Jain wrote:
> > > > > On 19/06/25 1:53 pm, Donet Tom wrote:
> > > > > > On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
> > > > > > > On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
> > > > > > > > On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
> > > > > > > > > On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
> > > > > > > > > > On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
> > > > > > > > > > > On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
> > > > > > > > > > > > On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
> > > > > > > > > > > > Are you accounting for sys.max_map_count? If not, then you'll be hitting that
> > > > > > > > > > > > first.
> > > > > > > > > > > run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
> > > > > > > > > > Umm, what? You mean overcommit all mode, and that has no bearing on the max
> > > > > > > > > > mapping count check.
> > > > > > > > > >
> > > > > > > > > > In do_mmap():
> > > > > > > > > >
> > > > > > > > > > /* Too many mappings? */
> > > > > > > > > > if (mm->map_count > sysctl_max_map_count)
> > > > > > > > > > return -ENOMEM;
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > As well as numerous other checks in mm/vma.c.
> > > > > > > > > Ah sorry, didn't look at the code properly just assumed that overcommit_always meant overriding
> > > > > > > > > this.
> > > > > > > > No problem! It's hard to be aware of everything in mm :)
> > > > > > > >
> > > > > > > > > > I'm not sure why an overcommit toggle is even necessary when you could use
> > > > > > > > > > MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
> > > > > > > > > >
> > > > > > > > > > I'm pretty confused as to what this test is really achieving honestly. This
> > > > > > > > > > isn't a useful way of asserting mmap() behaviour as far as I can tell.
> > > > > > > > > Well, seems like a useful way to me at least : ) Not sure if you are in the mood
> > > > > > > > > to discuss that but if you'd like me to explain from start to end what the test
> > > > > > > > > is doing, I can do that : )
> > > > > > > > >
> > > > > > > > I just don't have time right now, I guess I'll have to come back to it
> > > > > > > > later... it's not the end of the world for it to be iffy in my view as long as
> > > > > > > > it passes, but it might just not be of great value.
> > > > > > > >
> > > > > > > > Philosophically I'd rather we didn't assert internal implementation details like
> > > > > > > > where we place mappings in userland memory. At no point do we promise to not
> > > > > > > > leave larger gaps if we feel like it :)
> > > > > > > You have a fair point. Anyhow a debate for another day.
> > > > > > >
> > > > > > > > I'm guessing, reading more, the _real_ test here is some mathematical assertion
> > > > > > > > about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.
> > > > > > > >
> > > > > > > > But again I'm not sure that achieves much and again also is asserting internal
> > > > > > > > implementation details.
> > > > > > > >
> > > > > > > > Correct behaviour of this kind of thing probably better belongs to tests in the
> > > > > > > > userland VMA testing I'd say.
> > > > > > > >
> > > > > > > > Sorry I don't mean to do down work you've done before, just giving an honest
> > > > > > > > technical appraisal!
> > > > > > > Nah, it will be rather hilarious to see it all go down the drain xD
> > > > > > >
> > > > > > > > Anyway don't let this block work to fix the test if it's failing. We can revisit
> > > > > > > > this later.
> > > > > > > Sure. @Aboorva and Donet, I still believe that the correct approach is to elide
> > > > > > > the gap check at the crossing boundary. What do you think?
> > > > > > >
> > > > > > One problem I am seeing with this approach is that, since the hint address
> > > > > > is generated randomly, the VMAs are also being created at randomly based on
> > > > > > the hint address.So, for the VMAs created at high addresses, we cannot guarantee
> > > > > > that the gaps between them will be aligned to MAP_CHUNK_SIZE.
> > > > > >
> > > > > > High address VMAs
> > > > > > -----------------
> > > > > > 1000000000000-1000040000000 r--p 00000000 00:00 0
> > > > > > 2000000000000-2000040000000 r--p 00000000 00:00 0
> > > > > > 4000000000000-4000040000000 r--p 00000000 00:00 0
> > > > > > 8000000000000-8000040000000 r--p 00000000 00:00 0
> > > > > > e80009d260000-fffff9d260000 r--p 00000000 00:00 0
> > > > > >
> > > > > > I have a different approach to solve this issue.
> > > > > It is really weird that such a large amount of VA space
> > > > > is left between the two VMAs yet mmap is failing.
> > > > >
> > > > >
> > > > >
> > > > > Can you please do the following:
> > > > > set /proc/sys/vm/max_map_count to the highest value possible.
> > > > > If running without run_vmtests.sh, set /proc/sys/vm/overcommit_memory to 1.
> > > > > In validate_complete_va_space:
> > > > >
> > > > > if (start_addr >= HIGH_ADDR_MARK && found == false) {
> > > > > found = true;
> > > > > continue;
> > > > > }
> > > > Thanks Dev for the suggestion. I set max_map_count and set overcommit
> > > > memory to 1, added this code change as well, and then tried. Still, the
> > > > test is failing
> > > >
> > > > > where found is initialized to false. This will skip the check
> > > > > for the boundary.
> > > > >
> > > > > After this can you tell whether the test is still failing.
> > > > >
> > > > > Also can you give me the complete output of proc/pid/maps
> > > > > after putting a sleep at the end of the test.
> > > > >
> > > > on powerpc support DEFAULT_MAP_WINDOW is 128TB and with
> > > > total address space size is 4PB With hint it can map upto
> > > > 4PB. Since the hint addres is random in this test random hing VMAs
> > > > are getting created. IIUC this is expected only.
> > > >
> > > >
> > > > 10000000-10010000 r-xp 00000000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> > > > 10010000-10020000 r--p 00000000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> > > > 10020000-10030000 rw-p 00010000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> > > > 30000000-10030000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
> > > > 10030770000-100307a0000 rw-p 00000000 00:00 0 [heap]
> > > > 1004f000000-7fff8f000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
> > > > 7fff8faf0000-7fff8fe00000 rw-p 00000000 00:00 0
> > > > 7fff8fe00000-7fff90030000 r-xp 00000000 fd:00 792355 /usr/lib64/libc.so.6
> > > > 7fff90030000-7fff90040000 r--p 00230000 fd:00 792355 /usr/lib64/libc.so.6
> > > > 7fff90040000-7fff90050000 rw-p 00240000 fd:00 792355 /usr/lib64/libc.so.6
> > > > 7fff90050000-7fff90130000 r-xp 00000000 fd:00 792358 /usr/lib64/libm.so.6
> > > > 7fff90130000-7fff90140000 r--p 000d0000 fd:00 792358 /usr/lib64/libm.so.6
> > > > 7fff90140000-7fff90150000 rw-p 000e0000 fd:00 792358 /usr/lib64/libm.so.6
> > > > 7fff90160000-7fff901a0000 r--p 00000000 00:00 0 [vvar]
> > > > 7fff901a0000-7fff901b0000 r-xp 00000000 00:00 0 [vdso]
> > > > 7fff901b0000-7fff90200000 r-xp 00000000 fd:00 792351 /usr/lib64/ld64.so.2
> > > > 7fff90200000-7fff90210000 r--p 00040000 fd:00 792351 /usr/lib64/ld64.so.2
> > > > 7fff90210000-7fff90220000 rw-p 00050000 fd:00 792351 /usr/lib64/ld64.so.2
> > > > 7fffc9770000-7fffc9880000 rw-p 00000000 00:00 0 [stack]
> > > > 1000000000000-1000040000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
> > > > 2000000000000-2000040000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
> > > > 4000000000000-4000040000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
> > > > 8000000000000-8000040000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
> > > > eb95410220000-fffff90220000 r--p 00000000 00:00 0 [anon:virtual_address_range]
> > > >
> > > >
> > > >
> > > >
> > > > If I give the hint address serially from 128TB then the address
> > > > space is contigous and gap is also MAP_SIZE, the test is passing.
> > > >
> > > > 10000000-10010000 r-xp 00000000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> > > > 10010000-10020000 r--p 00000000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> > > > 10020000-10030000 rw-p 00010000 fd:05 134226638 /home/donet/linux/tools/testing/selftests/mm/virtual_address_range
> > > > 33000000-10033000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
> > > > 10033380000-100333b0000 rw-p 00000000 00:00 0 [heap]
> > > > 1006f0f0000-10071000000 rw-p 00000000 00:00 0
> > > > 10071000000-7fffb1000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
> > > > 7fffb15d0000-7fffb1800000 r-xp 00000000 fd:00 792355 /usr/lib64/libc.so.6
> > > > 7fffb1800000-7fffb1810000 r--p 00230000 fd:00 792355 /usr/lib64/libc.so.6
> > > > 7fffb1810000-7fffb1820000 rw-p 00240000 fd:00 792355 /usr/lib64/libc.so.6
> > > > 7fffb1820000-7fffb1900000 r-xp 00000000 fd:00 792358 /usr/lib64/libm.so.6
> > > > 7fffb1900000-7fffb1910000 r--p 000d0000 fd:00 792358 /usr/lib64/libm.so.6
> > > > 7fffb1910000-7fffb1920000 rw-p 000e0000 fd:00 792358 /usr/lib64/libm.so.6
> > > > 7fffb1930000-7fffb1970000 r--p 00000000 00:00 0 [vvar]
> > > > 7fffb1970000-7fffb1980000 r-xp 00000000 00:00 0 [vdso]
> > > > 7fffb1980000-7fffb19d0000 r-xp 00000000 fd:00 792351 /usr/lib64/ld64.so.2
> > > > 7fffb19d0000-7fffb19e0000 r--p 00040000 fd:00 792351 /usr/lib64/ld64.so.2
> > > > 7fffb19e0000-7fffb19f0000 rw-p 00050000 fd:00 792351 /usr/lib64/ld64.so.2
> > > > 7fffc5470000-7fffc5580000 rw-p 00000000 00:00 0 [stack]
> > > > 800000000000-2aab000000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
> > > >
> > > >
> > > Thank you for this output. I can't wrap my head around why this behaviour changes
> > > when you generate the hint sequentially. The mmap() syscall is supposed to do the
> > > following (irrespective of high VA space or not) - if the allocation at the hint
> > Yes, it is working as expected. On PowerPC, the DEFAULT_MAP_WINDOW is
> > 128TB, and the system can map up to 4PB.
> >
> > In the test, the first mmap call maps memory up to 128TB without any
> > hint, so the VMAs are created below the 128TB boundary.
> >
> > In the second mmap call, we provide a hint starting from 256TB, and
> > the hint address is generated randomly above 256TB. The mappings are
> > correctly created at these hint addresses. Since the hint addresses
> > are random, the resulting VMAs are also created at random locations.
> >
> > So, what I tried is: mapping from 0 to 128TB without any hint, and
> > then for the second mmap, instead of starting the hint from 256TB, I
> > started from 128TB. Instead of using random hint addresses, I used
> > sequential hint addresses from 128TB up to 512TB. With this change,
> > the VMAs are created in order, and the test passes.
> >
> > 800000000000-2aab000000000 r--p 00000000 00:00 0 128TB to 512TB VMA
> >
> > I think we will see same behaviour on x86 with X86_FEATURE_LA57.
> >
> > I will send the updated patch in V2.
>
> Since you say it fails on both radix and hash, it means that the generic
> code path is failing. I see that on my system, when I run the test with
> LPA2 config, write() fails with errno set to -ENOMEM. Can you apply
> the following diff and check whether the test fails still. Doing this
> fixed it for arm64.
>
> diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
>
> index b380e102b22f..3032902d01f2 100644
>
> --- a/tools/testing/selftests/mm/virtual_address_range.c
>
> +++ b/tools/testing/selftests/mm/virtual_address_range.c
>
> @@ -173,10 +173,6 @@ static int validate_complete_va_space(void)
>
> */
>
> hop = 0;
>
> while (start_addr + hop < end_addr) {
>
> - if (write(fd, (void *)(start_addr + hop), 1) != 1)
>
> - return 1;
>
> - lseek(fd, 0, SEEK_SET);
>
> -
>
> if (is_marked_vma(vma_name))
>
> munmap((char *)(start_addr + hop), MAP_CHUNK_SIZE);
>
Even with this change, the test is still failing. In this case,
we are allocating physical memory and writing into it, but our
issue seems to be with the gap between VMAs, so I believe this
might not be directly related.
I will send the next revision where the test passes and no
issues are observed
Just curious — with LPA2, is the second mmap() call successful?
And are the VMAs being created at the hint address as expected?
> >
> > > addr succeeds, then all is well, otherwise, do a top-down search for a large
> > > enough gap. I am not aware of the nuances in powerpc but I really am suspecting
> > > a bug in powerpc mmap code. Can you try to do some tracing - which function
> > > eventually fails to find the empty gap?
> > >
> > > Through my limited code tracing - we should end up in slice_find_area_topdown,
> > > then we ask the generic code to find the gap using vm_unmapped_area. So I
> > > suspect something is happening between this, probably slice_scan_available().
> > >
> > > > > > From 0 to 128TB, we map memory directly without using any hint. For the range above
> > > > > > 256TB up to 512TB, we perform the mapping using hint addresses. In the current test,
> > > > > > we use random hint addresses, but I have modified it to generate hint addresses linearly
> > > > > > starting from 128TB.
> > > > > >
> > > > > > With this change:
> > > > > >
> > > > > > The 0–128TB range is mapped without hints and verified accordingly.
> > > > > >
> > > > > > The 128TB–512TB range is mapped using linear hint addresses and then verified.
> > > > > >
> > > > > > Below are the VMAs obtained with this approach:
> > > > > >
> > > > > > 10000000-10010000 r-xp 00000000 fd:05 135019531
> > > > > > 10010000-10020000 r--p 00000000 fd:05 135019531
> > > > > > 10020000-10030000 rw-p 00010000 fd:05 135019531
> > > > > > 20000000-10020000000 r--p 00000000 00:00 0
> > > > > > 10020800000-10020830000 rw-p 00000000 00:00 0
> > > > > > 1004bcf0000-1004c000000 rw-p 00000000 00:00 0
> > > > > > 1004c000000-7fff8c000000 r--p 00000000 00:00 0
> > > > > > 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
> > > > > > 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
> > > > > > 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
> > > > > > 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
> > > > > > 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
> > > > > > 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
> > > > > > 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
> > > > > > 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
> > > > > > 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
> > > > > > 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
> > > > > > 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
> > > > > > 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
> > > > > > 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
> > > > > > 800000000000-2000000000000 r--p 00000000 00:00 0 -> High Address (128TB to 512TB)
> > > > > >
> > > > > > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> > > > > > index 4c4c35eac15e..0be008cba4b0 100644
> > > > > > --- a/tools/testing/selftests/mm/virtual_address_range.c
> > > > > > +++ b/tools/testing/selftests/mm/virtual_address_range.c
> > > > > > @@ -56,21 +56,21 @@
> > > > > > #ifdef __aarch64__
> > > > > > #define HIGH_ADDR_MARK ADDR_MARK_256TB
> > > > > > -#define HIGH_ADDR_SHIFT 49
> > > > > > +#define HIGH_ADDR_SHIFT 48
> > > > > > #define NR_CHUNKS_LOW NR_CHUNKS_256TB
> > > > > > #define NR_CHUNKS_HIGH NR_CHUNKS_3840TB
> > > > > > #else
> > > > > > #define HIGH_ADDR_MARK ADDR_MARK_128TB
> > > > > > -#define HIGH_ADDR_SHIFT 48
> > > > > > +#define HIGH_ADDR_SHIFT 47
> > > > > > #define NR_CHUNKS_LOW NR_CHUNKS_128TB
> > > > > > #define NR_CHUNKS_HIGH NR_CHUNKS_384TB
> > > > > > #endif
> > > > > > -static char *hint_addr(void)
> > > > > > +static char *hint_addr(int hint)
> > > > > > {
> > > > > > - int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
> > > > > > + unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
> > > > > > - return (char *) (1UL << bits);
> > > > > > + return (char *) (addr);
> > > > > > }
> > > > > > static void validate_addr(char *ptr, int high_addr)
> > > > > > @@ -217,7 +217,7 @@ int main(int argc, char *argv[])
> > > > > > }
> > > > > > for (i = 0; i < NR_CHUNKS_HIGH; i++) {
> > > > > > - hint = hint_addr();
> > > > > > + hint = hint_addr(i);
> > > > > > hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
> > > > > > MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > > > > >
> > > > > >
> > > > > >
> > > > > > Can we fix it this way?
Add support for SuperH/"sh" to nolibc.
Only sh4 is tested for now.
This is only tested on QEMU so far.
Additional testing would be very welcome.
Test instructions:
$ cd tools/testings/selftests/nolibc/
$ make -f Makefile.nolibc ARCH=sh CROSS_COMPILE=sh4-linux- nolibc-test
$ file nolibc-test
nolibc-test: ELF 32-bit LSB executable, Renesas SH, version 1 (SYSV), statically linked, not stripped
$ ./nolibc-test
Running test 'startup'
0 argc = 1 [OK]
...
Total number of errors: 0
Exiting with status 0
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Changes in v2:
- Rebase onto latest nolibc-next
- Pick up Ack from Willy
- Provide some test instructions
- Link to v1: https://lore.kernel.org/r/20250609-nolibc-sh-v1-0-9dcdb1b66bb5@weissschuh.n…
---
Thomas Weißschuh (3):
selftests/nolibc: fix EXTRACONFIG variables ordering
selftests/nolibc: use file driver for QEMU serial
tools/nolibc: add support for SuperH
tools/include/nolibc/arch-sh.h | 162 +++++++++++++++++++++++++
tools/include/nolibc/arch.h | 2 +
tools/testing/selftests/nolibc/Makefile.nolibc | 15 ++-
tools/testing/selftests/nolibc/run-tests.sh | 3 +-
4 files changed, 177 insertions(+), 5 deletions(-)
---
base-commit: eb135311083100b6590a7545618cd9760d896a86
change-id: 20250528-nolibc-sh-8b4e3bb8efcb
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
This series creates a new PMU scheme on ARM, a partitioned PMU that
allows reserving a subset of counters for more direct guest access,
significantly reducing overhead. More details, including performance
benchmarks, can be read in the v1 cover letter linked below.
v2:
* Rebased on top of kvm/queue to pick up Sean's patch [1] that
reorganizes some of the same headers and would otherwise conflict.
* Changed the semantics of the command line parameters and the
ioctl. It was pointed out in the comments last time that it doesn't
work to repartition at runtime because the perf subsystem assumes
the number of counters it gets will not change after the PMU is
probed. Now the PMUv3 command line parameters are the sole thing
that divides up guest and host counters and the ioctl just toggles a
flag for whether a vcpu should use the partitioned PMU. I've also
moved from one to two parameters: partition_pmu=[y/n] and
reserved_guest_counters=[0-N]. This makes it possible to
unambiguously express configurations like a partitioned PMU with 0
general purpose counters exposed to the guest (which still exposes
the cycle counter.
* Moved the partitioning code into the PMUv3 driver itself so KVM code
isn't modifying fields that are otherwise internal to the driver.
* Define PMI{CNTR,FILTR} as undef_access since KVM isn't ready to
support that counter. It is, however, still handled in the
partitioning because the driver recognizes it.
* Take out the dependency on FEAT_FGT since it is not widely available
on hardware yet. Instead, define a fast path in switch.h for
handling accesses to the registers that would otherwise be
untrapped.
* During MDCR_EL2 setup for guests, ensure the computed HPMN value is
always below the number of guest counters allocated by the driver at
boot and always below the number of counters on the current
CPU. This accounts for the possibiliy of heterogeneous hardware
where I guest might be able to use the partitioned PMU on one CPU
but not another.
* The KVM PMU event filter API says that counters must not count while
the event is filtered. To ensure this, enforce the filter on every
vcpu_load into the guest.
* Settable PMCR_EL0.N with a partitioned PMU now works and the
vcpu_counter_access selftest changes reflect that.
v1:
https://lore.kernel.org/kvm/20250602192702.2125115-1-coltonlewis@google.com/
Colton Lewis (22):
arm64: cpufeature: Add cpucap for HPMN0
arm64: Generate sign macro for sysreg Enums
arm64: cpufeature: Add cpucap for PMICNTR
arm64: Define PMI{CNTR,FILTR}_EL0 as undef_access
KVM: arm64: Reorganize PMU functions
perf: arm_pmuv3: Introduce method to partition the PMU
perf: arm_pmuv3: Generalize counter bitmasks
perf: arm_pmuv3: Keep out of guest counter partition
KVM: arm64: Correct kvm_arm_pmu_get_max_counters()
KVM: arm64: Set up FGT for Partitioned PMU
KVM: arm64: Writethrough trapped PMEVTYPER register
KVM: arm64: Use physical PMSELR for PMXEVTYPER if partitioned
KVM: arm64: Writethrough trapped PMOVS register
KVM: arm64: Write fast path PMU register handlers
KVM: arm64: Setup MDCR_EL2 to handle a partitioned PMU
KVM: arm64: Account for partitioning in PMCR_EL0 access
KVM: arm64: Context swap Partitioned PMU guest registers
KVM: arm64: Enforce PMU event filter at vcpu_load()
perf: arm_pmuv3: Handle IRQs for Partitioned PMU guest counters
KVM: arm64: Inject recorded guest interrupts
KVM: arm64: Add ioctl to partition the PMU when supported
KVM: arm64: selftests: Add test case for partitioned PMU
Marc Zyngier (1):
KVM: arm64: Cleanup PMU includes
Documentation/virt/kvm/api.rst | 21 +
arch/arm/include/asm/arm_pmuv3.h | 34 +
arch/arm64/include/asm/arm_pmuv3.h | 61 +-
arch/arm64/include/asm/kvm_host.h | 20 +-
arch/arm64/include/asm/kvm_pmu.h | 61 ++
arch/arm64/kernel/cpufeature.c | 15 +
arch/arm64/kvm/Makefile | 2 +-
arch/arm64/kvm/arm.c | 22 +
arch/arm64/kvm/debug.c | 24 +-
arch/arm64/kvm/hyp/include/hyp/switch.h | 233 ++++++
arch/arm64/kvm/pmu-emul.c | 676 +----------------
arch/arm64/kvm/pmu-part.c | 359 +++++++++
arch/arm64/kvm/pmu.c | 687 ++++++++++++++++++
arch/arm64/kvm/sys_regs.c | 66 +-
arch/arm64/tools/cpucaps | 2 +
arch/arm64/tools/gen-sysreg.awk | 1 +
arch/arm64/tools/sysreg | 6 +-
drivers/perf/arm_pmuv3.c | 150 +++-
include/linux/perf/arm_pmu.h | 15 +-
include/linux/perf/arm_pmuv3.h | 14 +-
include/uapi/linux/kvm.h | 4 +
tools/include/uapi/linux/kvm.h | 2 +
.../selftests/kvm/arm64/vpmu_counter_access.c | 63 +-
virt/kvm/kvm_main.c | 1 +
24 files changed, 1791 insertions(+), 748 deletions(-)
create mode 100644 arch/arm64/kvm/pmu-part.c
base-commit: 79150772457f4d45e38b842d786240c36bb1f97f
--
2.50.0.714.g196bf9f422-goog
Corrected two instances of the misspelled word 'occurences' to
'occurrences' in comments explaining node invariants in sparsebit.c.
These comments describe core behavior of the data structure and
should be clear.
Signed-off-by: Rahul Kumar <rk0006818(a)gmail.com>
---
tools/testing/selftests/kvm/lib/sparsebit.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/sparsebit.c b/tools/testing/selftests/kvm/lib/sparsebit.c
index cfed9d26cc71..a99188f87a38 100644
--- a/tools/testing/selftests/kvm/lib/sparsebit.c
+++ b/tools/testing/selftests/kvm/lib/sparsebit.c
@@ -116,7 +116,7 @@
*
* + A node with all mask bits set only occurs when the last bit
* described by the previous node is not equal to this nodes
- * starting index - 1. All such occurences of this condition are
+ * starting index - 1. All such occurrences of this condition are
* avoided by moving the setting of the nodes mask bits into
* the previous nodes num_after setting.
*
@@ -592,7 +592,7 @@ static struct node *node_split(struct sparsebit *s, sparsebit_idx_t idx)
*
* + A node with all mask bits set only occurs when the last bit
* described by the previous node is not equal to this nodes
- * starting index - 1. All such occurences of this condition are
+ * starting index - 1. All such occurrences of this condition are
* avoided by moving the setting of the nodes mask bits into
* the previous nodes num_after setting.
*/
--
2.43.0
This patch fixes two misspellings of the word 'occurrences' in comments within sparsebit.c used by the KVM selftests.
Fixing the spelling improves readability and clarity of the documented behavior.
Only comment text has been changed — there are no modifications to the functional logic of the tests.
I would appreciate your review and any feedback you may have.
Thank you for your time and support.
Best regards,
Rahul Kumar
Non-KVM folks,
I am hoping to route this through the KVM tree (6.17 or later), as the non-KVM
changes should be glorified nops. Please holler if you object to that idea.
Hyper-V folks in particular, let me know if you want a stable topic branch/tag,
e.g. on the off chance you want to make similar changes to the Hyper-V code,
and I'll make sure that happens.
As for what this series actually does...
Rework KVM's irqfd registration to require that an eventfd is bound to at
most one irqfd throughout the entire system. KVM currently disallows
binding an eventfd to multiple irqfds for a single VM, but doesn't reject
attempts to bind an eventfd to multiple VMs.
This is obviously an ABI change, but I'm fairly confident that it won't
break userspace, because binding an eventfd to multiple irqfds hasn't
truly worked since commit e8dbf19508a1 ("kvm/eventfd: Use priority waitqueue
to catch events before userspace"). A somewhat undocumented, and perhaps
even unintentional, side effect of suppressing eventfd notifications for
userspace is that the priority+exclusive behavior also suppresses eventfd
notifications for any subsequent waiters, even if they are priority waiters.
I.e. only the first VM with an irqfd+eventfd binding will get notifications.
And for IRQ bypass, a.k.a. device posted interrupts, globally unique
bindings are a hard requirement (at least on x86; I assume other archs are
the same). KVM and the IRQ bypass manager kinda sorta handle this, but in
the absolute worst way possible (IMO). Instead of surfacing an error to
userspace, KVM silently ignores IRQ bypass registration errors.
The motivation for this series is to harden against userspace goofs. AFAIK,
we (Google) have never actually had a bug where userspace tries to assign
an eventfd to multiple VMs, but the possibility has come up in more than one
bug investigation (our intra-host, a.k.a. copyless, migration scheme
transfers eventfds from the old to the new VM when updating the host VMM).
v3:
- Retain WQ_FLAG_EXCLUSIVE in mshv_eventfd.c, which snuck in between v1
and v2. [Peter]
- Use EXPORT_SYMBOL_GPL. [Peter]
- Move WQ_FLAG_EXCLUSIVE out of add_wait_queue_priority() in a prep patch
so that the affected subsystems are more explicitly documented (and then
immediately drop the flag from drivers/xen/privcmd.c, which amusingly
hides that file from the diff stats).
v2:
- https://lore.kernel.org/all/20250519185514.2678456-1-seanjc@google.com
- Use guard(spinlock_irqsave). [Prateek]
v1: https://lore.kernel.org/all/20250401204425.904001-1-seanjc@google.com
Sean Christopherson (13):
KVM: Use a local struct to do the initial vfs_poll() on an irqfd
KVM: Acquire SCRU lock outside of irqfds.lock during assignment
KVM: Initialize irqfd waitqueue callback when adding to the queue
KVM: Add irqfd to KVM's list via the vfs_poll() callback
KVM: Add irqfd to eventfd's waitqueue while holding irqfds.lock
sched/wait: Drop WQ_FLAG_EXCLUSIVE from add_wait_queue_priority()
xen: privcmd: Don't mark eventfd waiter as EXCLUSIVE
sched/wait: Add a waitqueue helper for fully exclusive priority
waiters
KVM: Disallow binding multiple irqfds to an eventfd with a priority
waiter
KVM: Drop sanity check that per-VM list of irqfds is unique
KVM: selftests: Assert that eventfd() succeeds in Xen shinfo test
KVM: selftests: Add utilities to create eventfds and do KVM_IRQFD
KVM: selftests: Add a KVM_IRQFD test to verify uniqueness requirements
drivers/hv/mshv_eventfd.c | 8 ++
include/linux/kvm_irqfd.h | 1 -
include/linux/wait.h | 2 +
kernel/sched/wait.c | 22 ++-
tools/testing/selftests/kvm/Makefile.kvm | 1 +
tools/testing/selftests/kvm/arm64/vgic_irq.c | 12 +-
.../testing/selftests/kvm/include/kvm_util.h | 40 ++++++
tools/testing/selftests/kvm/irqfd_test.c | 130 ++++++++++++++++++
.../selftests/kvm/x86/xen_shinfo_test.c | 21 +--
virt/kvm/eventfd.c | 130 +++++++++++++-----
10 files changed, 302 insertions(+), 65 deletions(-)
create mode 100644 tools/testing/selftests/kvm/irqfd_test.c
base-commit: 45eb29140e68ffe8e93a5471006858a018480a45
--
2.49.0.1151.ga128411c76-goog
Add a basic selftest for the netpoll polling mechanism, specifically
targeting the netpoll poll() side.
The test creates a scenario where network transmission is running at
maximum speed, and netpoll needs to poll the NIC. This is achieved by:
1. Configuring a single RX/TX queue to create contention
2. Generating background traffic to saturate the interface
3. Sending netconsole messages to trigger netpoll polling
4. Using dynamic netconsole targets via configfs
5. Delete and create new netconsole targets after 5 iterations
The test validates a critical netpoll code path by monitoring traffic
flow and ensuring netpoll_poll_dev() is called when the normal TX path
is blocked. Perf probing confirms this test successfully triggers
netpoll_poll_dev() in typical test runs.
This addresses a gap in netpoll test coverage for a path that is
tricky for the network stack.
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changes since RFC:
- Toggle the netconsole interfaces up and down after 5 iterations.
- Moved the traffic check under DEBUG (Willem de Bruijn).
- Bumped the iterations to 20 given it runs faster now.
- Link to the RFC: https://lore.kernel.org/r/20250612-netpoll_test-v1-1-4774fd95933f@debian.org
---
tools/testing/selftests/drivers/net/Makefile | 1 +
.../testing/selftests/drivers/net/netpoll_basic.py | 231 +++++++++++++++++++++
2 files changed, 232 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/Makefile b/tools/testing/selftests/drivers/net/Makefile
index bd309b2d39095..9bd84d6b542e5 100644
--- a/tools/testing/selftests/drivers/net/Makefile
+++ b/tools/testing/selftests/drivers/net/Makefile
@@ -16,6 +16,7 @@ TEST_PROGS := \
netcons_fragmented_msg.sh \
netcons_overflow.sh \
netcons_sysdata.sh \
+ netpoll_basic.py \
ping.py \
queues.py \
stats.py \
diff --git a/tools/testing/selftests/drivers/net/netpoll_basic.py b/tools/testing/selftests/drivers/net/netpoll_basic.py
new file mode 100755
index 0000000000000..2a81926169262
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/netpoll_basic.py
@@ -0,0 +1,231 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+# This test aims to evaluate the netpoll polling mechanism (as in
+# netpoll_poll_dev()). It presents a complex scenario where the network
+# attempts to send a packet but fails, prompting it to poll the NIC from within
+# the netpoll TX side.
+#
+# This has been a crucial path in netpoll that was previously untested. Jakub
+# suggested using a single RX/TX queue, pushing traffic to the NIC, and then
+# sending netpoll messages (via netconsole) to trigger the poll. `perf` probing
+# of netpoll_poll_dev() showed that this test indeed triggers
+# netpoll_poll_dev() once or twice in 10 iterations.
+
+# Author: Breno Leitao <leitao(a)debian.org>
+
+import errno
+import os
+import random
+import string
+import time
+
+from lib.py import (
+ ethtool,
+ GenerateTraffic,
+ ksft_exit,
+ ksft_pr,
+ ksft_run,
+ KsftFailEx,
+ KsftSkipEx,
+ NetdevFamily,
+ NetDrvEpEnv,
+)
+
+NETCONSOLE_CONFIGFS_PATH = "/sys/kernel/config/netconsole"
+REMOTE_PORT = 6666
+LOCAL_PORT = 1514
+# Number of netcons messages to send. I usually see netpoll_poll_dev()
+# being called at least once in 10 iterations. Having 20 to have some buffers
+ITERATIONS = 20
+DEBUG = False
+
+
+def generate_random_netcons_name() -> str:
+ """Generate a random target name starting with 'netcons'"""
+ random_suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=8))
+ return f"netcons_{random_suffix}"
+
+
+def get_stats(cfg: NetDrvEpEnv, netdevnl: NetdevFamily) -> dict[str, int]:
+ """Get the statistics for the interface"""
+ return netdevnl.qstats_get({"ifindex": cfg.ifindex}, dump=True)[0]
+
+
+def set_single_rx_tx_queue(interface_name: str) -> None:
+ """Set the number of RX and TX queues to 1 using ethtool"""
+ try:
+ # This don't need to be reverted, since interfaces will be deleted after test
+ ethtool(f"-G {interface_name} rx 1 tx 1")
+ except Exception as e:
+ raise KsftSkipEx(
+ f"Failed to configure RX/TX queues: {e}. Ethtool not available?"
+ )
+
+
+def create_netconsole_target(
+ config_data: dict[str, str],
+ target_name: str,
+) -> None:
+ """Create a netconsole dynamic target against the interfaces"""
+ ksft_pr(f"Using netconsole name: {target_name}")
+ try:
+ os.makedirs(f"{NETCONSOLE_CONFIGFS_PATH}/{target_name}", exist_ok=True)
+ ksft_pr(f"Created target directory: {NETCONSOLE_CONFIGFS_PATH}/{target_name}")
+ except OSError as e:
+ if e.errno != errno.EEXIST:
+ raise KsftFailEx(f"Failed to create netconsole target directory: {e}")
+
+ try:
+ for key, value in config_data.items():
+ if DEBUG:
+ ksft_pr(f"Setting {key} to {value}")
+ with open(
+ f"{NETCONSOLE_CONFIGFS_PATH}/{target_name}/{key}",
+ "w",
+ encoding="utf-8",
+ ) as f:
+ # Always convert to string to write to file
+ f.write(str(value))
+ f.close()
+
+ if DEBUG:
+ # Read all configuration values for debugging
+ for debug_key in config_data.keys():
+ with open(
+ f"{NETCONSOLE_CONFIGFS_PATH}/{target_name}/{debug_key}",
+ "r",
+ encoding="utf-8",
+ ) as f:
+ content = f.read()
+ ksft_pr(
+ f"{NETCONSOLE_CONFIGFS_PATH}/{target_name}/{debug_key} {content}"
+ )
+
+ except Exception as e:
+ raise KsftFailEx(f"Failed to configure netconsole target: {e}")
+
+
+def set_netconsole(cfg: NetDrvEpEnv, interface_name: str, target_name: str) -> None:
+ """Configure netconsole on the interface with the given target name"""
+ config_data = {
+ "extended": "1",
+ "dev_name": interface_name,
+ "local_port": LOCAL_PORT,
+ "remote_port": REMOTE_PORT,
+ "local_ip": cfg.addr_v["4"] if cfg.addr_ipver == "4" else cfg.addr_v["6"],
+ "remote_ip": (
+ cfg.remote_addr_v["4"] if cfg.addr_ipver == "4" else cfg.remote_addr_v["6"]
+ ),
+ "remote_mac": "00:00:00:00:00:00", # Not important for this test
+ "enabled": "1",
+ }
+
+ create_netconsole_target(config_data, target_name)
+ ksft_pr(f"Created netconsole target: {target_name} on interface {interface_name}")
+
+
+def delete_netconsole_target(name: str) -> None:
+ """Delete a netconsole dynamic target"""
+ target_path = f"{NETCONSOLE_CONFIGFS_PATH}/{name}"
+ try:
+ if os.path.exists(target_path):
+ os.rmdir(target_path)
+ except OSError as e:
+ raise KsftFailEx(f"Failed to delete netconsole target: {e}")
+
+
+def check_traffic_flowing(cfg: NetDrvEpEnv, netdevnl: NetdevFamily) -> int:
+ """Check if traffic is flowing on the interface"""
+ stat1 = get_stats(cfg, netdevnl)
+ time.sleep(1)
+ stat2 = get_stats(cfg, netdevnl)
+ pkts_per_sec = stat2["rx-packets"] - stat1["rx-packets"]
+ # Just make sure this will not fail even in slow/debug kernels
+ if pkts_per_sec < 10:
+ raise KsftFailEx(f"Traffic seems low: {pkts_per_sec}")
+ if DEBUG:
+ ksft_pr(f"Traffic per second {pkts_per_sec}")
+
+ return pkts_per_sec
+
+
+def do_netpoll_flush(
+ cfg: NetDrvEpEnv, netdevnl: NetdevFamily, ifname: str, target_name: str
+) -> None:
+ """Print messages to the console, trying to trigger a netpoll poll"""
+
+ set_netconsole(cfg, ifname, target_name)
+ for i in range(int(ITERATIONS)):
+ msg = f"netcons test #{i}."
+
+ if DEBUG:
+ pkts_per_s = check_traffic_flowing(cfg, netdevnl)
+ msg += f" ({pkts_per_s} packets/s)"
+
+ with open("/dev/kmsg", "w", encoding="utf-8") as kmsg:
+ kmsg.write(msg)
+
+ if not i % 5:
+ # Every 5 iterations, toggle netconsole
+ delete_netconsole_target(target_name)
+ set_netconsole(cfg, ifname, target_name)
+
+
+def test_netpoll(cfg: NetDrvEpEnv, netdevnl: NetdevFamily) -> None:
+ """
+ Test netpoll by sending traffic to the interface and then sending
+ netconsole messages to trigger a poll
+ """
+
+ target_name = generate_random_netcons_name()
+ ifname = cfg.dev["ifname"]
+ traffic = None
+
+ try:
+ set_single_rx_tx_queue(ifname)
+ traffic = GenerateTraffic(cfg)
+ check_traffic_flowing(cfg, netdevnl)
+ do_netpoll_flush(cfg, netdevnl, ifname, target_name)
+ finally:
+ if traffic:
+ traffic.stop()
+ delete_netconsole_target(target_name)
+
+
+def check_dependencies() -> None:
+ """Check if the dependencies are met"""
+ if not os.path.exists(NETCONSOLE_CONFIGFS_PATH):
+ raise KsftSkipEx(
+ f"Directory {NETCONSOLE_CONFIGFS_PATH} does not exist. CONFIG_NETCONSOLE_DYNAMIC might not be set."
+ )
+
+
+def load_netconsole_module() -> None:
+ """Try to load the netconsole module"""
+ try:
+ os.system("modprobe netconsole")
+ except Exception:
+ # It is fine if we fail to load the module, it will fail later
+ # at check_dependencies()
+ pass
+
+
+def main() -> None:
+ """Main function to run the test"""
+ load_netconsole_module()
+ check_dependencies()
+ netdevnl = NetdevFamily()
+ with NetDrvEpEnv(__file__, nsim_test=True) as cfg:
+ ksft_run(
+ [test_netpoll],
+ args=(
+ cfg,
+ netdevnl,
+ ),
+ )
+ ksft_exit()
+
+
+if __name__ == "__main__":
+ main()
---
base-commit: 4f4040ea5d3e4bebebbef9379f88085c8b99221c
change-id: 20250612-netpoll_test-a1324d2057c8
Best regards,
--
Breno Leitao <leitao(a)debian.org>
The current implementation of test_unmerge_uffd_wp() explicitly sets
`uffdio_api.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP` before calling
UFFDIO_API. This can cause the ioctl() call to fail with EINVAL on kernels
that do not support UFFD-WP, leading the test to fail unnecessarily:
# ------------------------------
# running ./ksm_functional_tests
# ------------------------------
# TAP version 13
# 1..9
# # [RUN] test_unmerge
# ok 1 Pages were unmerged
# # [RUN] test_unmerge_zero_pages
# ok 2 KSM zero pages were unmerged
# # [RUN] test_unmerge_discarded
# ok 3 Pages were unmerged
# # [RUN] test_unmerge_uffd_wp
# not ok 4 UFFDIO_API failed <-----
# # [RUN] test_prot_none
# ok 5 Pages were unmerged
# # [RUN] test_prctl
# ok 6 Setting/clearing PR_SET_MEMORY_MERGE works
# # [RUN] test_prctl_fork
# # No pages got merged
# # [RUN] test_prctl_fork_exec
# ok 7 PR_SET_MEMORY_MERGE value is inherited
# # [RUN] test_prctl_unmerge
# ok 8 Pages were unmerged
# Bail out! 1 out of 8 tests failed
# # Planned tests != run tests (9 != 8)
# # Totals: pass:7 fail:1 xfail:0 xpass:0 skip:0 error:0
# [FAIL]
This patch improves compatibility and error handling by:
1. Changes the feature check to first query supported features (features=0)
rather than specifically requesting WP support.
2. Gracefully skipping the test if:
- UFFDIO_API fails with EINVAL (feature not supported), or
- UFFD_FEATURE_PAGEFAULT_FLAG_WP is not advertised by the kernel.
3. Providing better diagnostics by distinguishing expected failures (e.g.,
EINVAL) from unexpected ones and reporting them using strerror().
The updated logic makes the test more robust across different kernel versions
and configurations, while preserving existing behavior on systems that do
support UFFD-WP.
Signed-off-by: Li Wang <liwang(a)redhat.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna(a)oracle.com>
Cc: Bagas Sanjaya <bagasdotme(a)gmail.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Joey Gouly <joey.gouly(a)arm.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Keith Lucas <keith.lucas(a)oracle.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
---
tools/testing/selftests/mm/ksm_functional_tests.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/ksm_functional_tests.c b/tools/testing/selftests/mm/ksm_functional_tests.c
index b61803e36d1c..f3db257dc555 100644
--- a/tools/testing/selftests/mm/ksm_functional_tests.c
+++ b/tools/testing/selftests/mm/ksm_functional_tests.c
@@ -393,9 +393,13 @@ static void test_unmerge_uffd_wp(void)
/* See if UFFD-WP is around. */
uffdio_api.api = UFFD_API;
- uffdio_api.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP;
+ uffdio_api.features = 0;
if (ioctl(uffd, UFFDIO_API, &uffdio_api) < 0) {
- ksft_test_result_fail("UFFDIO_API failed\n");
+ if (errno == EINVAL)
+ ksft_test_result_skip("UFFDIO_API not supported (EINVAL)\n");
+ else
+ ksft_test_result_fail("UFFDIO_API failed: %s\n", strerror(errno));
+
goto close_uffd;
}
if (!(uffdio_api.features & UFFD_FEATURE_PAGEFAULT_FLAG_WP)) {
--
2.49.0
On GCC 15 the following warnings is emitted:
nolibc-test.c: In function ‘run_stdlib’:
nolibc-test.c:1416:32: warning: initializer-string for array of ‘char’ truncates NUL terminator but destination lacks ‘nonstring’ attribute (11 chars into 10 available) [-Wunterminated-string-initialization]
1416 | char buf[10] = "test123456";
| ^~~~~~~~~~~~
Increase the size of buf to avoid the warning.
It would also be possible to use __attribute__((nonstring)) but that
would require some ifdeffery to work with older compilers.
Fixes: 1063649cf531 ("selftests/nolibc: Add tests for strlcat() and strlcpy()")
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
tools/testing/selftests/nolibc/nolibc-test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/nolibc/nolibc-test.c b/tools/testing/selftests/nolibc/nolibc-test.c
index dbe13000fb1ac153e9a89f627492daeb584a05d4..52640d8ae402b9e34174ae798e74882ca750ec2b 100644
--- a/tools/testing/selftests/nolibc/nolibc-test.c
+++ b/tools/testing/selftests/nolibc/nolibc-test.c
@@ -1413,7 +1413,7 @@ int run_stdlib(int min, int max)
* Add some more chars after the \0, to test functions that overwrite the buffer set
* the \0 at the exact right position.
*/
- char buf[10] = "test123456";
+ char buf[11] = "test123456";
buf[4] = '\0';
---
base-commit: eb135311083100b6590a7545618cd9760d896a86
change-id: 20250623-nolibc-nonstring-7fe6974552b5
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Add a .gitignore for the test case build object.
Signed-off-by: Dylan Yudaken <dyudaken(a)gmail.com>
---
Hi,
I noticed this was causing some noise in my git checkout, but perhaps I was
doing something odd that it has not been noticed before?
Regards,
Dylan
tools/testing/selftests/kexec/.gitignore | 2 ++
1 file changed, 2 insertions(+)
create mode 100644 tools/testing/selftests/kexec/.gitignore
diff --git a/tools/testing/selftests/kexec/.gitignore b/tools/testing/selftests/kexec/.gitignore
new file mode 100644
index 000000000000..5f3d9e089ae8
--- /dev/null
+++ b/tools/testing/selftests/kexec/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+test_kexec_jump
base-commit: 86731a2a651e58953fc949573895f2fa6d456841
--
2.49.0
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Please find the v8 AccECN protocol patch series, which covers the core
functionality of Accurate ECN, AccECN negotiation, AccECN TCP options,
and AccECN failure handling. The Accurate ECN draft can be found in
https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-accurate-ecn-28
This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
v8 (10-Jun-2025)
- Add new helper function tcp_ecn_received_counters_payload() in #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Set opts->num_sack_blocks=0 to avoid potential undefined value in #8 (Paolo Abeni <pabeni(a)redhat.com>)
- Reset leftover_size to 2 once leftover_bytes is used in #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Add new helper function tcp_accecn_opt_demand_min() in #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Add new helper function tcp_accecn_saw_opt_fail_recv() in #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Update tcp_options_fit_accecn() to avoid using recursion in #14 (Paolo Abeni <pabeni(a)redhat.com>)
v7 (14-May-2025)
- Modify group sizes of tcp_sock_write_txrx and tcp_sock_write_rx in #3 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>)
- Fix the issue in #4 and #5 where the RFC3168 ECN behavior in tcp_ecn_send() is changed (Paolo Abeni <pabeni(a)redhat.com>)
- Modify group size of tcp_sock_write_txrx in #4 and #6 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>)
- Update commit message for #9 to explain the increase in tcp_sock_write_rx group size
- Modify group size of tcp_sock_write_tx in #10 based on pahole results
v6 (09-May-2025)
- Add #3 to utilize exisintg holes of tcp_sock_write_txrx group for later patches (#4, #9, #10) with new u8 members (Paolo Abeni <pabeni(a)redhat.com>)
- Add pahole outcomes before and after commit in #4, #5, #6, #9, #10, #15 (Paolo Abeni <pabeni(a)redhat.com>)
- Define new helper function tcp_send_ack_reflect_ect() for sending ACK with reflected ECT in #5 (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments for function tcp_ecn_rcv_synack() in #5 (Paolo Abeni <pabeni(a)redhat.com>)
- Add enum/define to be used by sysctl_tcp_ecn in #5, sysctl_tcp_ecn_option in #9, and sysctl_tcp_ecn_option_beacon in #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Move accecn_fail_mode and saw_accecn_opt in #5 and #11 to use exisintg holes of tcp_sock (Paolo Abeni <pabeni(a)redhat.com>)
- Change data type of new members of tcp_request_sock and move them to the end of struct in #5 and #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Move new members of tcp_info to the end of struct in #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Merge previous #7 into #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Mask ecnfield with INET_ECN_MASK to remove WARN_ONCE in #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Reduce the indentation levels for reabability in #9 and #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Move delivered_ecn_bytes to the RX group in #9, accecn_opt_tstamp to the TX group in #10, pkts_acked_ewma to the RX group in #15 (Paolo Abeni <pabeni(a)redhat.com>)
- Add changes in Documentation/networking/net_cachelines/tcp_sock.rst for new tcp_sock members in #3, #5, #6, #9, #10, #15
v5 (22-Apr-2025)
- Further fix for 32-bit ARM alignment in tcp.c (Simon Horman <horms(a)kernel.org>)
v4 (18-Apr-2025)
- Fix 32-bit ARM assertion for alignment requirement (Simon Horman <horms(a)kernel.org>)
v3 (14-Apr-2025)
- Fix patch apply issue in v2 (Jakub Kicinski <kuba(a)kernel.org>)
v2 (18-Mar-2025)
- Add one missing patch from the previous AccECN protocol preparation patch series to this patch series.
Best regards,
Chia-Yu
Chia-Yu Chang (3):
tcp: reorganize tcp_sock_write_txrx group for variables later
tcp: accecn: AccECN option failure handling
tcp: accecn: try to fit AccECN option with SACK
Ilpo Järvinen (12):
tcp: reorganize SYN ECN code
tcp: fast path functions later
tcp: AccECN core
tcp: accecn: AccECN negotiation
tcp: accecn: add AccECN rx byte counters
tcp: accecn: AccECN needs to know delivered bytes
tcp: sack option handling improvements
tcp: accecn: AccECN option
tcp: accecn: AccECN option send control
tcp: accecn: AccECN option ceb/cep heuristic
tcp: accecn: AccECN ACE field multi-wrap heuristic
tcp: try to avoid safer when ACKs are thinned
.../networking/net_cachelines/tcp_sock.rst | 14 +
include/linux/tcp.h | 34 +-
include/net/netns/ipv4.h | 2 +
include/net/tcp.h | 225 ++++++-
include/uapi/linux/tcp.h | 7 +
net/ipv4/syncookies.c | 3 +
net/ipv4/sysctl_net_ipv4.c | 19 +
net/ipv4/tcp.c | 30 +-
net/ipv4/tcp_input.c | 611 +++++++++++++++++-
net/ipv4/tcp_ipv4.c | 7 +-
net/ipv4/tcp_minisocks.c | 91 ++-
net/ipv4/tcp_output.c | 303 ++++++++-
net/ipv6/syncookies.c | 1 +
net/ipv6/tcp_ipv6.c | 1 +
14 files changed, 1250 insertions(+), 98 deletions(-)
--
2.34.1
Alex posted support for configuring pause frames in fbnic. This flipped
the pause stats test from xfail to fail. Because CI considered xfail as
pass it now flags the test as failing. This shouldn't happen. Also we
currently report pause and FEC tests as passing on virtio which doesn't
make sense.
Jakub Kicinski (2):
selftests: drv-net: stats: fix pylint issues
selftests: drv-net: stats: use skip instead of xfail for unsupported
features
tools/testing/selftests/drivers/net/stats.py | 45 +++++++++++++-------
1 file changed, 30 insertions(+), 15 deletions(-)
--
2.49.0
Remove `use core::ffi::c_void`, which shadows `kernel::ffi::c_void`
brought in via `use crate::prelude::*`, to maintain consistency and
centralize the abstraction.
Since `kernel::ffi::c_void` is a straightforward re-export of
`core::ffi::c_void`, both are functionally equivalent. However, using
`kernel::ffi::c_void` improves consistency across the kernel's Rust code
and provides a unified reference point in case the definition ever needs
to change, even if such a change is unlikely.
Reviewed-by: Benno Lossin <lossin(a)kernel.org>
Signed-off-by: Jesung Yang <y.j3ms.n(a)gmail.com>
Link: https://rust-for-linux.zulipchat.com/#narrow/channel/288089/topic/x/near/52…
---
Changes in v3:
- Rebase on a3b2347343e0
- Remove the explicit import of `kernel::ffi::c_void`
- Reword the commit message accordingly
- Link to v2: https://lore.kernel.org/rust-for-linux/20250528155147.2793921-1-y.j3ms.n@gm…
Changes in v2:
- Add "Link" tag to the related discussion on Zulip
- Reword the commit message to clarify `kernel::ffi::c_void` is a re-export
- Link to v1: https://lore.kernel.org/rust-for-linux/20250526162429.1114862-1-y.j3ms.n@gm…
---
rust/kernel/kunit.rs | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs
index 4b8cdcb21e77..603330f247c7 100644
--- a/rust/kernel/kunit.rs
+++ b/rust/kernel/kunit.rs
@@ -7,7 +7,7 @@
//! Reference: <https://docs.kernel.org/dev-tools/kunit/index.html>
use crate::prelude::*;
-use core::{ffi::c_void, fmt};
+use core::fmt;
/// Prints a KUnit error-level message.
///
base-commit: a3b2347343e077e81d3c169f32c9b2cb1364f4cc
--
2.39.5
Introduce support for the N32 and N64 ABIs. As preparation, the
entrypoint is first simplified significantly. Thanks to Maciej for all
the valuable information.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Changes in v3:
- Rebase onto latest nolibc-next
- Link to v2: https://lore.kernel.org/r/20250225-nolibc-mips-n32-v2-0-664b47d87fa0@weisss…
Changes in v2:
- Clean up entrypoint first
- Annotate #endifs
- Link to v1: https://lore.kernel.org/r/20250212-nolibc-mips-n32-v1-1-6892e58d1321@weisss…
---
Thomas Weißschuh (4):
tools/nolibc: MIPS: drop $gp setup
tools/nolibc: MIPS: drop manual stack pointer alignment
tools/nolibc: MIPS: drop noreorder option
tools/nolibc: MIPS: add support for N64 and N32 ABIs
tools/include/nolibc/arch-mips.h | 117 +++++++++++++++++++------
tools/testing/selftests/nolibc/Makefile.nolibc | 26 ++++++
tools/testing/selftests/nolibc/run-tests.sh | 2 +-
3 files changed, 117 insertions(+), 28 deletions(-)
---
base-commit: eb135311083100b6590a7545618cd9760d896a86
change-id: 20231105-nolibc-mips-n32-234901bd910d
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Add a basic selftest for the netpoll polling mechanism, specifically
targeting the netpoll poll() side.
The test creates a scenario where network transmission is running at
maximum sppend, and netpoll needs to poll the NIC. This is achieved by:
1. Configuring a single RX/TX queue to create contention
2. Generating background traffic to saturate the interface
3. Sending netconsole messages to trigger netpoll polling
4. Using dynamic netconsole targets via configfs
The test validates a critical netpoll code path by monitoring traffic
flow and ensuring netpoll_poll_dev() is called when the normal TX path
is blocked. Perf probing confirms this test successfully triggers
netpoll_poll_dev() in typical test runs.
This addresses a gap in netpoll test coverage for a path that is
tricky for the network stack.
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Sending as an RFC for your appreciation, but it dpends on [1] which is
stil under review. Once [1] lands, I will send this officially.
Link: https://lore.kernel.org/all/20250611-netdevsim_stat-v1-0-c11b657d96bf@debia… [1]
---
tools/testing/selftests/drivers/net/Makefile | 1 +
.../testing/selftests/drivers/net/netpoll_basic.py | 201 +++++++++++++++++++++
2 files changed, 202 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/Makefile b/tools/testing/selftests/drivers/net/Makefile
index be780bcb73a3b..70d6e3a920b7f 100644
--- a/tools/testing/selftests/drivers/net/Makefile
+++ b/tools/testing/selftests/drivers/net/Makefile
@@ -15,6 +15,7 @@ TEST_PROGS := \
netcons_fragmented_msg.sh \
netcons_overflow.sh \
netcons_sysdata.sh \
+ netpoll_basic.py \
ping.py \
queues.py \
stats.py \
diff --git a/tools/testing/selftests/drivers/net/netpoll_basic.py b/tools/testing/selftests/drivers/net/netpoll_basic.py
new file mode 100755
index 0000000000000..8abdfb2b1eb6e
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/netpoll_basic.py
@@ -0,0 +1,201 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+# This test aims to evaluate the netpoll polling mechanism (as in netpoll_poll_dev()).
+# It presents a complex scenario where the network attempts to send a packet but fails,
+# prompting it to poll the NIC from within the netpoll TX side.
+#
+# This has been a crucial path in netpoll that was previously untested. Jakub
+# suggested using a single RX/TX queue, pushing traffic to the NIC, and then sending
+# netpoll messages (via netconsole) to trigger the poll. `perf` probing of netpoll_poll_dev()
+# showed that this test indeed triggers netpoll_poll_dev() once or twice in 10 iterations.
+
+# Author: Breno Leitao <leitao(a)debian.org>
+
+import errno
+import os
+import random
+import string
+import time
+
+from lib.py import (
+ ethtool,
+ GenerateTraffic,
+ ksft_exit,
+ ksft_pr,
+ ksft_run,
+ KsftFailEx,
+ KsftSkipEx,
+ NetdevFamily,
+ NetDrvEpEnv,
+)
+
+NETCONSOLE_CONFIGFS_PATH = "/sys/kernel/config/netconsole"
+REMOTE_PORT = 6666
+LOCAL_PORT = 1514
+# Number of netcons messages to send. I usually see netpoll_poll_dev()
+# being called at least once in 10 iterations.
+ITERATIONS = 10
+DEBUG = False
+
+
+def generate_random_netcons_name() -> str:
+ """Generate a random name starting with 'netcons'"""
+ random_suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=8))
+ return f"netcons_{random_suffix}"
+
+
+def get_stats(cfg: NetDrvEpEnv, netdevnl: NetdevFamily) -> dict[str, int]:
+ """Get the statistics for the interface"""
+ return netdevnl.qstats_get({"ifindex": cfg.ifindex}, dump=True)[0]
+
+
+def set_single_rx_tx_queue(interface_name: str) -> None:
+ """Set the number of RX and TX queues to 1 using ethtool"""
+ try:
+ # This don't need to be reverted, since interfaces will be deleted after test
+ ethtool(f"-G {interface_name} rx 1 tx 1")
+ except Exception as e:
+ raise KsftSkipEx(
+ f"Failed to configure RX/TX queues: {e}. Ethtool not available?"
+ )
+
+
+def create_netconsole_target(
+ config_data: dict[str, str],
+ target_name: str,
+) -> None:
+ """Create a netconsole dynamic target against the interfaces"""
+ ksft_pr(f"Using netconsole name: {target_name}")
+ try:
+ ksft_pr(f"Created target directory: {NETCONSOLE_CONFIGFS_PATH}/{target_name}")
+ os.makedirs(f"{NETCONSOLE_CONFIGFS_PATH}/{target_name}", exist_ok=True)
+ except OSError as e:
+ if e.errno != errno.EEXIST:
+ raise KsftFailEx(f"Failed to create netconsole target directory: {e}")
+
+ try:
+ for key, value in config_data.items():
+ if DEBUG:
+ ksft_pr(f"Setting {key} to {value}")
+ with open(
+ f"{NETCONSOLE_CONFIGFS_PATH}/{target_name}/{key}",
+ "w",
+ encoding="utf-8",
+ ) as f:
+ # Always convert to string to write to file
+ f.write(str(value))
+ f.close()
+
+ if DEBUG:
+ # Read all configuration values for debugging
+ for debug_key in config_data.keys():
+ with open(
+ f"{NETCONSOLE_CONFIGFS_PATH}/{target_name}/{debug_key}",
+ "r",
+ encoding="utf-8",
+ ) as f:
+ content = f.read()
+ ksft_pr(
+ f"{NETCONSOLE_CONFIGFS_PATH}/{target_name}/{debug_key} {content}"
+ )
+
+ except Exception as e:
+ raise KsftFailEx(f"Failed to configure netconsole target: {e}")
+
+
+def set_netconsole(cfg: NetDrvEpEnv, interface_name: str, target_name: str) -> None:
+ """Configure netconsole on the interface with the given target name"""
+ config_data = {
+ "extended": "1",
+ "dev_name": interface_name,
+ "local_port": LOCAL_PORT,
+ "remote_port": REMOTE_PORT,
+ "local_ip": cfg.addr_v["4"] if cfg.addr_ipver == "4" else cfg.addr_v["6"],
+ "remote_ip": (
+ cfg.remote_addr_v["4"] if cfg.addr_ipver == "4" else cfg.remote_addr_v["6"]
+ ),
+ "remote_mac": "00:00:00:00:00:00", # Not important for this test
+ "enabled": "1",
+ }
+
+ create_netconsole_target(config_data, target_name)
+ ksft_pr(f"Created netconsole target: {target_name} on interface {interface_name}")
+
+
+def delete_netconsole_target(name: str) -> None:
+ """Delete a netconsole dynamic target"""
+ target_path = f"{NETCONSOLE_CONFIGFS_PATH}/{name}"
+ try:
+ if os.path.exists(target_path):
+ os.rmdir(target_path)
+ except OSError as e:
+ raise KsftFailEx(f"Failed to delete netconsole target: {e}")
+
+
+def check_traffic_flowing(cfg: NetDrvEpEnv, netdevnl: NetdevFamily) -> int:
+ """Check if traffic is flowing on the interface"""
+ stat1 = get_stats(cfg, netdevnl)
+ time.sleep(1)
+ stat2 = get_stats(cfg, netdevnl)
+ pkts_per_sec = stat2["rx-packets"] - stat1["rx-packets"]
+ # Just make sure this will not fail even in slow/debug kernels
+ if pkts_per_sec < 10:
+ raise KsftFailEx(f"Traffic seems low: {pkts_per_sec}")
+ if DEBUG:
+ ksft_pr(f"Traffic per second {pkts_per_sec} ", pkts_per_sec)
+
+ return pkts_per_sec
+
+
+def do_netpoll_flush(cfg: NetDrvEpEnv, netdevnl: NetdevFamily) -> None:
+ """Print messages to the console, trying to trigger a netpoll poll"""
+ for i in range(int(ITERATIONS)):
+ pkts_per_s = check_traffic_flowing(cfg, netdevnl)
+ with open("/dev/kmsg", "w", encoding="utf-8") as kmsg:
+ kmsg.write(f"netcons test #{i}: ({pkts_per_s} packets/s)\n")
+
+
+def test_netpoll(cfg: NetDrvEpEnv, netdevnl: NetdevFamily) -> None:
+ """Test netpoll by sending traffic to the interface and then sending netconsole messages to trigger a poll"""
+ target_name = generate_random_netcons_name()
+ ifname = cfg.dev["ifname"]
+ traffic = None
+
+ try:
+ set_single_rx_tx_queue(ifname)
+ traffic = GenerateTraffic(cfg)
+ check_traffic_flowing(cfg, netdevnl)
+ set_netconsole(cfg, ifname, target_name)
+ do_netpoll_flush(cfg, netdevnl)
+ finally:
+ if traffic:
+ traffic.stop()
+ delete_netconsole_target(target_name)
+
+
+def check_dependencies() -> None:
+ """Check if the dependencies are met"""
+ if not os.path.exists(NETCONSOLE_CONFIGFS_PATH):
+ raise KsftSkipEx(
+ f"Directory {NETCONSOLE_CONFIGFS_PATH} does not exist. CONFIG_NETCONSOLE_DYNAMIC might not be set."
+ )
+
+
+def main() -> None:
+ """Main function to run the test"""
+ check_dependencies()
+ netdevnl = NetdevFamily()
+ with NetDrvEpEnv(__file__, nsim_test=True) as cfg:
+ ksft_run(
+ [test_netpoll],
+ args=(
+ cfg,
+ netdevnl,
+ ),
+ )
+ ksft_exit()
+
+
+if __name__ == "__main__":
+ main()
---
base-commit: 5d6d67c4cb10a4b4d3ae35758d5eeed6239afdc8
change-id: 20250612-netpoll_test-a1324d2057c8
Best regards,
--
Breno Leitao <leitao(a)debian.org>
Add a script to test various scenarios where a bridge is involved
in the fastpath. It runs tests in the forward path, and also in
a bridged path.
The setup is similar to a basic home router with multiple lan ports.
It uses 3 pairs of veth-devices. Each or all pairs can be
replaced by a pair of real interfaces, interconnected by wire.
This is necessary to test the behavior when dealing with
dsa ports, foreign (dsa) ports and switchdev userports that support
SWITCHDEV_OBJ_ID_PORT_VLAN.
See the head of the script for a detailed description.
Run without arguments to perform all tests on veth-devices.
Signed-off-by: Eric Woudstra <ericwouds(a)gmail.com>
---
This test script is written first for the proposed bridge-fastpath
patch-sets, but it's use is more general and can easily be expanded.
Changes in v2:
- Moved test-series to functions
- Moved code to set_pair_link() up/down
- Added conntrack zone to bridged traffic
- Test bridge chain prerouting in test without fastpath
and bridge chain forward in tests with fastpath
Some example outputs of this last version of patches from different
hardware, without and with patches:
ALL VETH:
=========
./bridge_fastpath.sh -t
Setup:
CLIENT 0
veth0cl
|
veth0rt
WAN
ROUTER
LAN1 LAN2
veth1rt veth2rt
| |
veth1cl veth2cl
CLIENT 1 CLIENT 2
Without patches:
PASS: unaware bridge, without encaps, without fastpath
PASS: unaware bridge, with single vlan encap, without fastpath
ERROR: unaware bridge, with double q vlan encaps, without fastpath: ipv4/6: established bytes 0 < 4194304
ERROR: unaware bridge, with 802.1ad vlan encaps, without fastpath: ipv4/6: established bytes 0 < 4194304
PASS: aware bridge, without/without vlan encap, without fastpath
PASS: aware bridge, with/without vlan encap, without fastpath
PASS: aware bridge, with/with vlan encap, without fastpath
PASS: aware bridge, without/with vlan encap, without fastpath
PASS: forward, without vlan-device, without vlan encap, client1, without fastpath
PASS: forward, without vlan-device, without vlan encap, client1, with fastpath
PASS: forward, without vlan-device, with vlan encap, client1, without fastpath
ERROR: forward, without vlan-device, with vlan encap, client1, with fastpath: ipv4/6: tcp broken
PASS: forward, with vlan-device, with vlan encap, client1, without fastpath
PASS: forward, with vlan-device, with vlan encap, client1, with fastpath
PASS: forward, with vlan-device, without vlan encap, client1, without fastpath
PASS: forward, with vlan-device, without vlan encap, client1, with fastpath
ERROR: bridge fastpath test has failed
With patches:
PASS: unaware bridge, without encaps, without fastpath
PASS: unaware bridge, without encaps, with fastpath
PASS: unaware bridge, with single vlan encap, without fastpath
PASS: unaware bridge, with single vlan encap, with fastpath
PASS: unaware bridge, with double q vlan encaps, without fastpath
PASS: unaware bridge, with double q vlan encaps, with fastpath
PASS: unaware bridge, with 802.1ad vlan encaps, without fastpath
PASS: unaware bridge, with 802.1ad vlan encaps, with fastpath
PASS: aware bridge, without/without vlan encap, without fastpath
PASS: aware bridge, without/without vlan encap, with fastpath
PASS: aware bridge, with/without vlan encap, without fastpath
PASS: aware bridge, with/without vlan encap, with fastpath
PASS: aware bridge, with/with vlan encap, without fastpath
PASS: aware bridge, with/with vlan encap, with fastpath
PASS: aware bridge, without/with vlan encap, without fastpath
PASS: aware bridge, without/with vlan encap, with fastpath
PASS: forward, without vlan-device, without vlan encap, client1, without fastpath
PASS: forward, without vlan-device, without vlan encap, client1, with fastpath
PASS: forward, without vlan-device, with vlan encap, client1, without fastpath
PASS: forward, without vlan-device, with vlan encap, client1, with fastpath
PASS: forward, with vlan-device, with vlan encap, client1, without fastpath
PASS: forward, with vlan-device, with vlan encap, client1, with fastpath
PASS: forward, with vlan-device, without vlan encap, client1, without fastpath
PASS: forward, with vlan-device, without vlan encap, client1, with fastpath
PASS: all tests passed
BANANAPI-R3 (lan1 & lan2 are dsa):
============
Without patches:
./bridge_fastpath.sh -t -0 enu1u2,lan2 -1 enu1u1,lan1 -2 lan4,eth1
Setup:
CLIENT 0
enu1u2
|
lan2
WAN
ROUTER
LAN1 LAN2
lan1 eth1
| |
enu1u1 lan4
CLIENT 1 CLIENT 2
PASS: unaware bridge, without encaps, without fastpath
PASS: unaware bridge, with single vlan encap, without fastpath
PASS: aware bridge, without/without vlan encap, without fastpath
PASS: aware bridge, with/without vlan encap, without fastpath
PASS: aware bridge, with/with vlan encap, without fastpath
PASS: aware bridge, without/with vlan encap, without fastpath
PASS: forward, without vlan-device, without vlan encap, client1, without fastpath
ERROR: forward, without vlan-device, without vlan encap, client1, with fastpath: ipv4: counted bytes 2118540 > 2097152
ERROR: forward, without vlan-device, without vlan encap, client1, with fastpath: ipv6: counted bytes 2117904 > 2097152
PASS: forward, without vlan-device, without vlan encap, client2, without fastpath
PASS: forward, without vlan-device, without vlan encap, client2, with fastpath
PASS: forward, without vlan-device, without vlan encap, client2, with hw_fastpath
PASS: forward, without vlan-device, with vlan encap, client1, without fastpath
ERROR: forward, without vlan-device, with vlan encap, client1, with fastpath: ipv4/6: tcp broken
PASS: forward, without vlan-device, with vlan encap, client2, without fastpath
ERROR: forward, without vlan-device, with vlan encap, client2, with fastpath: ipv4/6: tcp broken
PASS: forward, with vlan-device, with vlan encap, client1, without fastpath
PASS: forward, with vlan-device, with vlan encap, client1, with fastpath
PASS: forward, with vlan-device, with vlan encap, client1, with hw_fastpath
PASS: forward, with vlan-device, with vlan encap, client2, without fastpath
PASS: forward, with vlan-device, with vlan encap, client2, with fastpath
PASS: forward, with vlan-device, with vlan encap, client2, with hw_fastpath
PASS: forward, with vlan-device, without vlan encap, client1, without fastpath
PASS: forward, with vlan-device, without vlan encap, client1, with fastpath
PASS: forward, with vlan-device, without vlan encap, client1, with hw_fastpath
PASS: forward, with vlan-device, without vlan encap, client2, without fastpath
ERROR: forward, with vlan-device, without vlan encap, client2, with fastpath: ipv4: counted bytes 2109596 > 2097152
ERROR: forward, with vlan-device, without vlan encap, client2, with fastpath: ipv6: counted bytes 2121432 > 2097152
ERROR: bridge fastpath test has failed
With patches:
PASS: unaware bridge, without encaps, without fastpath
PASS: unaware bridge, without encaps, with fastpath
PASS: unaware bridge, without encaps, with hw_fastpath
PASS: unaware bridge, with single vlan encap, without fastpath
PASS: unaware bridge, with single vlan encap, with fastpath
PASS: unaware bridge, with single vlan encap, with hw_fastpath
PASS: aware bridge, without/without vlan encap, without fastpath
PASS: aware bridge, without/without vlan encap, with fastpath
PASS: aware bridge, without/without vlan encap, with hw_fastpath
PASS: aware bridge, with/without vlan encap, without fastpath
PASS: aware bridge, with/without vlan encap, with fastpath
PASS: aware bridge, with/without vlan encap, with hw_fastpath
PASS: aware bridge, with/with vlan encap, without fastpath
PASS: aware bridge, with/with vlan encap, with fastpath
PASS: aware bridge, with/with vlan encap, with hw_fastpath
PASS: aware bridge, without/with vlan encap, without fastpath
PASS: aware bridge, without/with vlan encap, with fastpath
PASS: aware bridge, without/with vlan encap, with hw_fastpath
PASS: forward, without vlan-device, without vlan encap, client1, without fastpath
PASS: forward, without vlan-device, without vlan encap, client1, with fastpath
PASS: forward, without vlan-device, without vlan encap, client1, with hw_fastpath
PASS: forward, without vlan-device, without vlan encap, client2, without fastpath
PASS: forward, without vlan-device, without vlan encap, client2, with fastpath
PASS: forward, without vlan-device, without vlan encap, client2, with hw_fastpath
PASS: forward, without vlan-device, with vlan encap, client1, without fastpath
PASS: forward, without vlan-device, with vlan encap, client1, with fastpath
PASS: forward, without vlan-device, with vlan encap, client1, with hw_fastpath
PASS: forward, without vlan-device, with vlan encap, client2, without fastpath
PASS: forward, without vlan-device, with vlan encap, client2, with fastpath
PASS: forward, without vlan-device, with vlan encap, client2, with hw_fastpath
PASS: forward, with vlan-device, with vlan encap, client1, without fastpath
PASS: forward, with vlan-device, with vlan encap, client1, with fastpath
PASS: forward, with vlan-device, with vlan encap, client1, with hw_fastpath
PASS: forward, with vlan-device, with vlan encap, client2, without fastpath
PASS: forward, with vlan-device, with vlan encap, client2, with fastpath
PASS: forward, with vlan-device, with vlan encap, client2, with hw_fastpath
PASS: forward, with vlan-device, without vlan encap, client1, without fastpath
PASS: forward, with vlan-device, without vlan encap, client1, with fastpath
PASS: forward, with vlan-device, without vlan encap, client1, with hw_fastpath
PASS: forward, with vlan-device, without vlan encap, client2, without fastpath
PASS: forward, with vlan-device, without vlan encap, client2, with fastpath
PASS: forward, with vlan-device, without vlan encap, client2, with hw_fastpath
PASS: all tests passed
AM3359 (end1 supports SWITCHDEV_OBJ_ID_PORT_VLAN, ipv4 only for now):
=======
./bridge_fastpath.sh -t -a -4 -d -1 enu1u4c2,end1
Without patches:
Setup:
CLIENT 0
veth0cl
|
veth0rt
WAN
ROUTER
LAN1 LAN2
end1 veth2rt
| |
enu1u4c2 veth2cl
CLIENT 1 CLIENT 2
INFO: Skipping unaware bridge
PASS: aware bridge, without/without vlan encap, without fastpath
PASS: aware bridge, with/without vlan encap, without fastpath
PASS: aware bridge, with/with vlan encap, without fastpath
PASS: aware bridge, without/with vlan encap, without fastpath
PASS: forward, without vlan-device, without vlan encap, client1, without fastpath
ERROR: forward, without vlan-device, without vlan encap, client1, with fastpath: ipv4: counted bytes 2190092 > 2097152
PASS: forward, without vlan-device, without vlan encap, client2, without fastpath
PASS: forward, without vlan-device, without vlan encap, client2, with fastpath
PASS: forward, without vlan-device, with vlan encap, client1, without fastpath
ERROR: forward, without vlan-device, with vlan encap, client1, with fastpath: ipv4: tcp broken
PASS: forward, without vlan-device, with vlan encap, client2, without fastpath
ERROR: forward, without vlan-device, with vlan encap, client2, with fastpath: ipv4: tcp broken
PASS: forward, with vlan-device, with vlan encap, client1, without fastpath
PASS: forward, with vlan-device, with vlan encap, client1, with fastpath
PASS: forward, with vlan-device, with vlan encap, client2, without fastpath
PASS: forward, with vlan-device, with vlan encap, client2, with fastpath
PASS: forward, with vlan-device, without vlan encap, client1, without fastpath
PASS: forward, with vlan-device, without vlan encap, client1, with fastpath
PASS: forward, with vlan-device, without vlan encap, client2, without fastpath
PASS: forward, with vlan-device, without vlan encap, client2, with fastpath
ERROR: bridge fastpath test has failed
With patches:
INFO: Skipping unaware bridge
PASS: aware bridge, without/without vlan encap, without fastpath
PASS: aware bridge, without/without vlan encap, with fastpath
PASS: aware bridge, with/without vlan encap, without fastpath
PASS: aware bridge, with/without vlan encap, with fastpath
PASS: aware bridge, with/with vlan encap, without fastpath
PASS: aware bridge, with/with vlan encap, with fastpath
PASS: aware bridge, without/with vlan encap, without fastpath
PASS: aware bridge, without/with vlan encap, with fastpath
PASS: forward, without vlan-device, without vlan encap, client1, without fastpath
PASS: forward, without vlan-device, without vlan encap, client1, with fastpath
PASS: forward, without vlan-device, without vlan encap, client2, without fastpath
PASS: forward, without vlan-device, without vlan encap, client2, with fastpath
PASS: forward, without vlan-device, with vlan encap, client1, without fastpath
PASS: forward, without vlan-device, with vlan encap, client1, with fastpath
PASS: forward, without vlan-device, with vlan encap, client2, without fastpath
PASS: forward, without vlan-device, with vlan encap, client2, with fastpath
PASS: forward, with vlan-device, with vlan encap, client1, without fastpath
PASS: forward, with vlan-device, with vlan encap, client1, with fastpath
PASS: forward, with vlan-device, with vlan encap, client2, without fastpath
PASS: forward, with vlan-device, with vlan encap, client2, with fastpath
PASS: forward, with vlan-device, without vlan encap, client1, without fastpath
PASS: forward, with vlan-device, without vlan encap, client1, with fastpath
PASS: forward, with vlan-device, without vlan encap, client2, without fastpath
PASS: forward, with vlan-device, without vlan encap, client2, with fastpath
PASS: all tests passed
(Some problem still to figure out for my AM3359 hardware: On the second run
of the command the tcp traffic is ok on all tests ipv4. On the first run
the hardware is not setup correctly, some tests report broken tcp even
without fastpath. Also ipv6 tcp broken even on second run even without
fastpath. This may be a problem with my hardware or the test-script,
but anyway it shows the fastpath is functional)
.../testing/selftests/net/netfilter/Makefile | 1 +
.../net/netfilter/bridge_fastpath.sh | 1008 +++++++++++++++++
2 files changed, 1009 insertions(+)
create mode 100755 tools/testing/selftests/net/netfilter/bridge_fastpath.sh
diff --git a/tools/testing/selftests/net/netfilter/Makefile b/tools/testing/selftests/net/netfilter/Makefile
index 3bdcbbdba925..50afe91bc3e2 100644
--- a/tools/testing/selftests/net/netfilter/Makefile
+++ b/tools/testing/selftests/net/netfilter/Makefile
@@ -8,6 +8,7 @@ MNL_LDLIBS := $(shell $(HOSTPKG_CONFIG) --libs libmnl 2>/dev/null || echo -lmnl)
TEST_PROGS := br_netfilter.sh bridge_brouter.sh
TEST_PROGS += br_netfilter_queue.sh
+TEST_PROGS += bridge_fastpath.sh
TEST_PROGS += conntrack_dump_flush.sh
TEST_PROGS += conntrack_icmp_related.sh
TEST_PROGS += conntrack_ipip_mtu.sh
diff --git a/tools/testing/selftests/net/netfilter/bridge_fastpath.sh b/tools/testing/selftests/net/netfilter/bridge_fastpath.sh
new file mode 100755
index 000000000000..82f2ddc946b8
--- /dev/null
+++ b/tools/testing/selftests/net/netfilter/bridge_fastpath.sh
@@ -0,0 +1,1008 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Check if conntrack, nft chain and fastpath is functional in setups
+# where a bridge is in the fastpath.
+#
+# Commandline options make it possible to use real ethernet pairs
+# instead of veth-device pairs. Any, or all, pairs can be tested using
+# real hardware pairs. This is can be useful to test dsa-ports,
+# switchdev (dsa) foreign ports and switchdev ports supporting
+# SWITCHDEV_OBJ_ID_PORT_VLAN.
+#
+# First tcp is tested. Conntrack and nft chain are tested using a counter.
+# When there is a fastpath possible between the interfaces then the
+# fastpath is also tested.
+# When there is a hardware offloaded fastpath possible between the
+# interfaces then the hardware offloaded path is also tested.
+#
+# Setup is as a typical router:
+#
+# nsclientwan
+# |
+# nsrt
+# | |
+# nsclient1 nsclient2
+#
+# Masquerading for ipv4 only.
+#
+# First check if a bridge table forward chain can be setup, skip
+# these tests if this is not possible.
+# Then check if a inet table forward chain can be setup, skip
+# these tests if this is not possible.
+#
+# Different setups of paths are tested that involve a bridge in the
+# fastpath. This can be in the forward-fastpath or in the bridge-fastpath.
+#
+# The first series, in the bridge-fastpath, using a vlan-unaware bridge.
+# Traffic with the following vlan-tags is checked:
+# a. without vlan
+# b. single vlan
+# c. double q vlan (only on veth-devices)
+# d. 802.1ad vlan (only on veth-devices)
+# e. pppoe (when available)
+# f. pppoe-in-q (when available)
+#
+# (for items c to f fastpath can only work when a conntrack zone is set)
+# (double tag testing results in broken tcp traffic on most hardware,
+# in this test setup, use '-a' argument to test it anyway)
+# (pppoe testing takes place if pppd and pppoe-server are installed)
+#
+# The second series, in the bridge-fastpath, using a vlan-aware bridge.
+# Here we test all combinations of ingress/egress with or without single
+# vlan encaps.
+#
+# The third series, in the forward-fastpath, using a vlan-aware bridge,
+# without a vlan-device linked to the master port. We test the same combinations
+# of ingress/egress with or without single vlan encaps.
+#
+# The fourth series, in the forward-fastpath, using a vlan-aware bridge,
+# with a vlan-device linked to the master port. We test the same combinations
+# of ingress/egress with or without single vlan encaps.
+#
+# Note 1: Using dsa userports on both sides of eth-pairs client1 or client2
+# gives erratic and unpredictable results. Use, for example, an usb-eth device
+# on the client side to test a dsa-userport.
+#
+# Note 2: Testing the hardware offloaded fastpath, it is not checked if the
+# packets do not follow the software fastpath instead. A universal way to
+# check this should be added at some point.
+#
+# Note 3: Some interfaces to test on the router side, are netns immutable.
+# Use the -d or --defaultnsrouter option so that the interfaces of the router
+# do not have to change netns. The router is build up in the default netns.
+#
+
+source lib.sh
+
+checktool "nft --version" "run test without nft"
+checktool "socat -h" "run test without socat"
+checktool "bridge -V" "run test without bridge"
+
+NR_OF_TESTS=4
+VID1=100
+VID2=101
+BRWAN=brwan
+BRLAN=brlan
+BRCL=brcl
+LINKUP_TIMEOUT=10
+PING_TIMEOUT=10
+SOCAT_TIMEOUT=10
+filesize=2 # MiB
+
+filein=$(mktemp)
+file1out=$(mktemp)
+file2out=$(mktemp)
+pppoeserveroptions=$(mktemp)
+pppoeserverpid=$(mktemp)
+
+setup_ns nsclientwan nsclientlan1 nsclientlan2
+
+ WAN=0 ; LAN1=1 ; LAN2=2 ; ADWAN=3 ; ADLAN=4
+nsa=( $nsclientwan $nsclientlan1 $nsclientlan2 ) # $nsrt $nsrt
+AD4=( '192.168.1.1' '192.168.2.101' '192.168.2.102' '192.168.1.2' '192.168.2.1' )
+AD6=( 'dead:1::1' 'dead:2::101' 'dead:2::102' 'dead:1::2' 'dead:2::1' )
+
+tests_string=$(seq 1 $NR_OF_TESTS)
+
+while [ "${1:-}" != '' ]; do
+ case "$1" in
+ '-0' | '--pairwan')
+ shift
+ vethcl[$WAN]="${1%,*}"
+ vethrt[$WAN]="${1#*,}"
+ ;;
+ '-1' | '--pairlan1')
+ shift
+ vethcl[$LAN1]="${1%,*}"
+ vethrt[$LAN1]="${1#*,}"
+ ;;
+ '-2' | '--pairlan2')
+ shift
+ vethcl[$LAN2]="${1%,*}"
+ vethrt[$LAN2]="${1#*,}"
+ ;;
+ '-s' | '--filesize')
+ shift
+ filesize=$1
+ ;;
+ '-p' | '--parts')
+ shift
+ tests_string=$1
+ ;;
+ '-4' | '--ipv4')
+ do_ipv4=1
+ ;;
+ '-6' | '--ipv6')
+ do_ipv6=1
+ ;;
+ '-n' | '--noskip')
+ noskip=1
+ ;;
+ '-d' | '--defaultnsrouter')
+ defaultnsrouter=1
+ ;;
+ '-f' | '--fixmac')
+ fixmac=1
+ ;;
+ '-t' | '--showtree')
+ showtree=1
+ ;;
+ *)
+ cat <<-EOF
+ Usage: $(basename $0) [OPTION]...
+ -0 --pairwan eth0cl,eth0rt pair of real interfaces to use on wan side
+ -1 --pairlan1 eth1cl,eth1rt pair of real interfaces to use on lan1 side
+ -2 --pairlan2 eth2cl,eth2rt pair of real interfaces to use on lan2 side
+ -s --filesize filesize to use for testing
+ -p --parts partnumbers of tests to run, comma separated
+ -4|-6 --ipv4|--ipv6 test ipv4/6 only
+ -d --defaultnsrouter router in default network namespace, caution!
+ -f --fixmac change mac address when conflict found
+ -n --noskip also perform the normally skipped tests
+ -t --showtree show the tree of used interfaces
+ EOF
+ exit $ksft_skip
+ ;;
+ esac
+ shift
+done
+
+for i in ${tests_string//','/' '}; do
+ tests[$i]="yes"
+done
+
+if [ -n "$defaultnsrouter" ]; then
+ nsrt="nsrt-$(mktemp -u XXXXXX)"
+ touch /var/run/netns/$nsrt
+ mount --bind /proc/1/ns/net /var/run/netns/$nsrt
+else
+ setup_ns nsrt
+fi
+nsa+=($nsrt $nsrt)
+
+cleanup() {
+ if [ -n "$defaultnsrouter" ]; then
+ umount /var/run/netns/$nsrt
+ rm -f /var/run/netns/$nsrt
+ fi
+ cleanup_all_ns
+ rm -f "$filein" "$file1out" "$file2out" "$pppoeserveroptions" "$pppoeserverpid"
+}
+
+trap cleanup EXIT
+
+head -c $(($filesize * 1024 * 1024)) < /dev/urandom > "$filein"
+
+check_mac()
+{
+ local ns=$1
+ local dev=$2
+ local othermacs=$3
+ local mac
+
+ mac=$(ip -net "$ns" -br link show dev "$dev" | \
+ grep -o -E '([[:xdigit:]]{1,2}:){5}[[:xdigit:]]{1,2}')
+
+ if [[ ! "$othermacs" =~ "$mac" ]]; then
+ echo $mac
+ return 0
+ fi
+ echo "WARN: Conflicting mac address $dev $mac" 1>&2
+
+ [ -z "$fixmac" ] && return 1
+
+ for (( j = 0 ; j < 10 ; j++ )); do
+ mac="${mac::6}$(printf %02x:%02x:%02x:%02x $(($RANDOM%256)) \
+ $(($RANDOM%256)) $(($RANDOM%256)) $(($RANDOM%256)))"
+ [[ "$othermacs" =~ "$mac" ]] && continue
+ echo $mac
+ ip -net "$ns" link set dev "$dev" address "$mac" 1>&2
+ return $?
+ done
+ return 1
+}
+
+is_linkup()
+{
+ local ns=$1
+ local dev=$2
+
+ if [ -n "$(ip -net "$ns" link show dev "$dev" up 2>/dev/null | \
+ grep 'state UP')" ]; then
+ return 0
+ fi
+ return 1
+}
+
+set_pair_link()
+{
+ local arg=$1
+ local all="${@:2}"
+ local lret=0
+ local i j
+
+ for i in $WAN $LAN1 $LAN2; do
+ ns="${nsa[$i]}"
+ ip -net "$ns" link set "${vethcl[$i]}" $arg
+ lret=$(($lret | $?))
+ ip -net "$nsrt" link set "${vethrt[$i]}" $arg
+ lret=$(($lret | $?))
+ done
+ [ $lret -ne 0 ] && return 1
+
+ [[ "$arg" != "up" ]] && return 0
+
+ for j in $(seq 1 $(($LINKUP_TIMEOUT * 5 ))); do
+ lret=0
+ for i in $all; do
+ ns="${nsa[$i]}"
+ is_linkup $ns "${vethcl[$i]}"
+ lret=$(($lret | $?))
+ is_linkup $nsrt "${vethrt[$i]}"
+ lret=$(($lret | $?))
+ done
+ [ $lret -eq 0 ] && break
+ sleep 0.2
+ done
+ return $lret
+}
+
+wait_ping()
+{
+ local i1=$1
+ local i2=$2
+ local ns1=${nsa[$i1]}
+ local j
+
+ for j in $(seq 1 $(($PING_TIMEOUT * 5 ))); do
+ ip netns exec "$ns1" ping -c 1 -w $PING_TIMEOUT -i 0.2 \
+ -q "${AD4[$i2]}" >/dev/null 2>&1
+ [ $? -le 1 ] && return $?
+ sleep 0.2
+ done
+ return 1
+}
+
+add_addr()
+{
+ local i=$1
+ local dev=$2
+ local ns=${nsa[$i]}
+ local ad4=${AD4[$i]}
+ local ad6=${AD6[$i]}
+
+ ip -net "$ns" addr add "${ad4}/24" dev "$dev"
+ ip -net "$ns" addr add "${ad6}/64" dev "$dev" nodad
+ if [[ "$ns" == "nsclientlan"* ]]; then
+ ip -net "$ns" route add default via "${AD4[$ADLAN]}"
+ ip -net "$ns" route add default via "${AD6[$ADLAN]}"
+ elif [[ "$ns" == "nsclientwan"* ]]; then
+ ip -net "$ns" route add default via "${AD6[$ADWAN]}"
+ fi
+
+}
+
+del_addr()
+{
+ local i=$1
+ local dev=$2
+ local ns=${nsa[$i]}
+ local ad4=${AD4[$i]}
+ local ad6=${AD6[$i]}
+
+ if [[ "$ns" == "nsclientlan"* ]]; then
+ ip -net "$ns" route del default via "${AD6[$ADLAN]}"
+ ip -net "$ns" route del default via "${AD4[$ADLAN]}"
+ elif [[ "$ns" == "nsclientwan"* ]]; then
+ ip -net "$ns" route del default via "${AD6[$ADWAN]}"
+ fi
+ ip -net "$ns" addr del "${ad6}/64" dev "$dev" nodad
+ ip -net "$ns" addr del "${ad4}/24" dev "$dev"
+}
+
+set_client()
+{
+ local i=$1
+ local vlan=$2
+ local arg=$3
+ local ns=${nsa[$i]}
+ local vdev="${vethcl[$i]}"
+ local brdev="$BRCL"
+ local proto=""
+ local pvidslave=""
+
+ unset_client $i
+
+ if [[ "$vlan" == "qq" ]]; then
+ ip -net "$ns" link add link "$vdev" name "$vdev.$VID1" type vlan id $VID1
+ ip -net "$ns" link add link "$vdev.$VID1" name "$vdev.$VID1.$VID2" \
+ type vlan id $VID2
+ ip -net "$ns" link set "$vdev.$VID1" up
+ ip -net "$ns" link set "$vdev.$VID1.$VID2" up
+ add_addr $i "$vdev.$VID1.$VID2"
+ return
+ fi
+
+ [[ "$vlan" == "none" ]] && pvidslave="pvid untagged"
+ [[ "$vlan" == "ad" ]] && proto="vlan_protocol 802.1ad"
+
+ ip -net "$ns" link add "$brdev" type bridge vlan_filtering 1 vlan_default_pvid 0 $proto
+ ip -net "$ns" link set "$vdev" master "$brdev"
+ ip -net "$ns" link set "$brdev" up
+
+ bridge -net "$ns" vlan add dev "$brdev" vid $VID1 pvid untagged self
+ bridge -net "$ns" vlan add dev "$vdev" vid $VID1 $pvidslave
+
+ if [[ "$vlan" == "ad" ]]; then
+ ip -net "$ns" link add link "$brdev" name "$brdev.$VID2" type vlan id $VID2
+ brdev="$brdev.$VID2"
+ ip -net "$ns" link set "$brdev" up
+ fi
+
+ if [[ "$arg" != "noaddress" ]]; then
+ add_addr $i "$brdev"
+ fi
+}
+
+unset_client()
+{
+ local i=$1
+ local ns=${nsa[$i]}
+ local vdev="${vethcl[$i]}"
+ local brdev="$BRCL"
+
+ ip -net "$ns" link del "$brdev" type bridge 2>/dev/null
+ ip -net "$ns" link del "$vdev.$VID1" 2>/dev/null
+}
+
+add_pppoe()
+{
+ local i1=$1
+ local i2=$2
+ local dev1=$3
+ local dev2=$4
+ local desc=$5
+ local ns1=${nsa[$i1]}
+ local ns2=${nsa[$i2]}
+
+ ppp1=0
+ while [ -n "$(ip -net "$ns1" link show ppp$ppp1 2>/dev/null)" ]
+ do ((ppp1++)); done
+ echo "noauth defaultroute noipdefault unit $ppp1" >"$pppoeserveroptions"
+ ppp1="ppp$ppp1"
+
+ if ! ip netns exec "$ns1" pppoe-server -k -L "${AD4[$i1]}" -R "${AD4[$i2]}" \
+ -I $dev1 -X "$pppoeserverpid" -O "$pppoeserveroptions" >/dev/null; then
+ echo "ERROR: $desc: failed to setup pppoe server" 1>&2
+ return 1
+ fi
+
+ if ! ip netns exec "$ns2" pppd plugin pppoe.so nic-$dev2 persist holdoff 0 noauth \
+ defaultroute noipdefault noaccomp nodeflate noproxyarp nopcomp \
+ novj novjccomp linkname "selftest-$$" >/dev/null; then
+ echo "ERROR: $desc: failed to setup pppoe client" 1>&2
+ return 1
+ fi
+
+ if ! wait_ping $i1 $i2; then
+ echo "ERROR: $desc: failed to setup functional pppoe connection" 1>&2
+ return 1
+ fi
+
+ ppp2=$(cat "/run/pppd/ppp-selftest-$$.pid" | tail -n 1)
+
+ ip -net "$ns1" addr add "${AD6[$i1]}/64" dev "$ppp1" nodad
+ ip -net "$ns2" addr add "${AD6[$i2]}/64" dev "$ppp2" nodad
+
+ return 0
+}
+
+del_pppoe()
+{
+ local i1=$1
+ local i2=$2
+ local dev1=$3
+ local dev2=$4
+ local ns1=${nsa[$i1]}
+ local ns2=${nsa[$i2]}
+
+ [[ -n "$ppp1" ]] && ip -net "$ns1" addr del "${AD6[$i1]}/64" dev "$ppp1"
+ [[ -n "$ppp2" ]] && ip -net "$ns2" addr del "${AD6[$i2]}/64" dev "$ppp2"
+
+ kill -9 $(cat "/run/pppd/ppp-selftest-$$.pid" | head -n 1) \
+ $(cat "$pppoeserverpid" | head -n 1)
+}
+
+listener_ready()
+{
+ local ns=$1
+ local ipv=$2
+
+ ss -N "$ns" --ipv$ipv -lnt -o "sport = :8080" | grep -q 8080
+}
+
+test_tcp() {
+ local i1=$1
+ local i2=$2
+ local dofast=$3
+ local desc=$4
+ local ns1=${nsa[$i1]}
+ local ns2=${nsa[$i2]}
+ local i=-1
+ local lret=0
+ local ads=""
+ local ipv ad a lpid bytes limit error
+
+ if [ -n "$do_ipv4" ]; then ads="${AD4[$i2]}"
+ elif [ -n "$do_ipv6" ]; then ads="${AD6[$i2]}"
+ else ads="${AD4[$i2]} ${AD6[$i2]}"
+ fi
+ for ad in $ads; do
+ ((i++))
+ if [[ "$ad" =~ ":" ]]
+ then ipv="6"; a="[${ad}]"
+ else ipv="4"; a="${ad}"
+ fi
+
+ rm -f "$file1out" "$file2out"
+
+ # ip netns exec "$nsrt" nft reset counters >/dev/null
+ # But on some systems this results in 4GB values in packet and byte count, so:
+ (echo "flush ruleset"; ip netns exec "$nsrt" nft --stateless list ruleset) | \
+ ip netns exec "$nsrt" nft -f -
+
+ timeout "$SOCAT_TIMEOUT" ip netns exec "$ns2" socat TCP$ipv-LISTEN:8080,reuseaddr \
+ STDIO <"$filein" >"$file2out" 2>/dev/null &
+ lpid=$!
+ busywait 1000 listener_ready "$ns2" "$ipv"
+
+ timeout "$SOCAT_TIMEOUT" ip netns exec "$ns1" socat TCP$ipv:$a:8080 \
+ STDIO <"$filein" >"$file1out" 2>/dev/null
+ wait $lpid
+
+ if [ $? -ne 0 ]; then
+ error[$i]="ipv$ipv: tcp broken"
+ continue
+ fi
+ if ! cmp "$filein" "$file1out" >/dev/null 2>&1; then
+ error[$i]="ipv$ipv: file mismatch to ${ad}"
+ continue
+ fi
+ if ! cmp "$filein" "$file2out" >/dev/null 2>&1; then
+ error[$i]="ipv$ipv: file mismatch from ${ad}"
+ continue
+ fi
+
+ limit=$((2 * $filesize * 1024 * 1024))
+ bytes=$(ip netns exec "$nsrt" nft list counter $family filter "check" | \
+ grep "packets" | cut -d' ' -f4)
+ if [ -z "$dofast" ] && [ "$bytes" -lt "$limit" ]; then
+
+ error[$i]="ipv$ipv: established bytes $bytes < $limit"
+ continue
+ fi
+ if [ -n "$dofast" ] && [ "$bytes" -gt "$((limit/2))" ]; then
+ # Significant reduction of bytes expected
+ error[$i]="ipv$ipv: counted bytes $bytes > $((limit/2))"
+ continue
+ fi
+ done
+
+ if [ -n "${error[0]}" ]; then
+ if [[ "${error[0]#*:}" == "${error[1]#*:}" ]]; then
+ echo "ERROR: $desc: ipv4/6:${error[0]#*:}" 1>&2
+ return 1
+ fi
+ echo "ERROR: $desc: ${error[0]}" 1>&2
+ lret=1
+ fi
+ if [ -n "${error[1]}" ]; then
+ echo "ERROR: $desc: ${error[1]}" 1>&2
+ lret=1
+ fi
+ if [ $lret -eq 0 ]; then
+ echo "PASS: $desc"
+ fi
+ return $lret
+}
+
+test_paths() {
+ local i1=$1
+ local i2=$2
+ local desc=$3
+ local ns1=${nsa[$i1]}
+ local ns2=${nsa[$i2]}
+
+
+ if ! setup_nftables $i1 $i2; then
+ echo "ERROR: $desc: cannot setup nftables" 1>&2
+ return 1
+ fi
+ if ! test_tcp $i1 $i2 "" "$desc without fastpath"; then
+ return 1
+ fi
+
+ if ! setup_fastpath $i1 $i2 "" 2>/dev/null; then
+ return 0
+ fi
+ if ! test_tcp $i1 $i2 "fast" "$desc with fastpath"; then
+ return 1
+ fi
+
+ if ! setup_fastpath $i1 $i2 "hw" 2>/dev/null; then
+ return 0
+ fi
+ if ! test_tcp $i1 $i2 "fast" "$desc with hw_fastpath"; then
+ return 1
+ fi
+
+ return 0
+
+}
+
+add_masq()
+{
+ if [[ $family != "bridge" ]]; then
+ ip netns exec "$nsrt" nft -f - <<-EOF
+ table ip nat {
+ chain postrouting {
+ type nat hook postrouting priority 0;
+ oifname ${BRWAN} masquerade
+ }
+ }
+ EOF
+ else
+ return 0
+ fi
+}
+
+add_zone()
+{
+ local devs=$1
+
+ if [[ $family == "bridge" ]]; then
+ ip netns exec "$nsrt" nft -f - <<-EOF
+ table ${family} filter {
+ chain preroutingzones {
+ type filter hook prerouting priority -300;
+ iif ${devs} ct zone set 23
+ }
+ }
+ EOF
+ fi
+}
+
+setup_nftables()
+{
+ local devs="{ ${vethrt[$1]} , ${vethrt[$2]} }"
+ local i1=$1
+ local i2=$2
+
+ ip netns exec "$nsrt" nft flush ruleset
+
+ if ! add_masq; then
+ return 1
+ fi
+
+ add_zone "${devs}" 2>/dev/null
+
+ ip netns exec "$nsrt" nft -f - <<-EOF
+ table ${family} filter {
+ counter check { }
+ chain prerouting {
+ type filter hook prerouting priority 0; policy accept;
+ ct state established ip saddr ${AD4[$i1]} tcp dport 8080 counter name "check"
+ ct state established ip saddr ${AD4[$i2]} tcp sport 8080 counter name "check"
+ ct state established ip6 saddr ${AD6[$i1]} tcp dport 8080 counter name "check"
+ ct state established ip6 saddr ${AD6[$i2]} tcp sport 8080 counter name "check"
+ }
+ }
+ EOF
+}
+
+setup_fastpath()
+{
+ local devs="{ ${vethrt[$1]} , ${vethrt[$2]} }"
+ local arg=$3
+ local flags=""
+
+ [[ "$arg" == "hw" ]] && flags="flags offload"
+
+ ip netns exec "$nsrt" nft flush ruleset
+
+ if ! add_masq; then
+ return 1
+ fi
+
+ add_zone "${devs}" 2>/dev/null
+
+ ip netns exec "$nsrt" nft -f - <<-EOF
+ table ${family} filter {
+ counter check { }
+ flowtable f {
+ hook ingress priority filter
+ devices = ${devs}
+ ${flags}
+ }
+ chain forward {
+ type filter hook forward priority 0; policy accept;
+ counter name "check"
+ ct state established flow add @f
+ }
+ }
+ EOF
+}
+
+test_1_unaware_bridge()
+{
+ local lret=0
+ local i
+
+ for i in $LAN1 $LAN2; do
+ set_client $i none
+ done
+
+ test_paths $LAN1 $LAN2 "unaware bridge, without encaps, "
+ lret=$(($lret | $?))
+
+ for i in $LAN1 $LAN2; do
+ set_client $i q
+ done
+
+ test_paths $LAN1 $LAN2 "unaware bridge, with single vlan encap, "
+ lret=$(($lret | $?))
+
+ for i in $LAN1 $LAN2; do
+ set_client $i qq
+ done
+
+ # Skip testing double tagged packets on real hardware
+ if [ -n "$lan_all_veth" ] || [ -n "$noskip" ]; then
+
+ test_paths $LAN1 $LAN2 "unaware bridge, with double q vlan encaps,"
+ lret=$(($lret | $?))
+
+ for i in $LAN1 $LAN2; do
+ set_client $i ad
+ done
+
+ test_paths $LAN1 $LAN2 "unaware bridge, with 802.1ad vlan encaps, "
+ lret=$(($lret | $?))
+
+ fi
+ # End Skip testing double tagged packets
+
+ if [ -n "$(command -v pppd 2>/dev/null)" ] &&
+ [ -n "$(command -v pppoe-server 2>/dev/null)" ]; then
+ # Start pppoe
+
+ for i in $LAN1 $LAN2; do
+ set_client $i none noaddress
+ done
+
+ if add_pppoe $LAN1 $LAN2 "$BRCL" "$BRCL" "unaware bridge, with pppoe encap"; then
+ test_paths $LAN1 $LAN2 "unaware bridge, with pppoe encap, "
+ lret=$(($lret | $?))
+ fi
+
+ del_pppoe $LAN1 $LAN2 "$BRCL" "$BRCL"
+
+ for i in $LAN1 $LAN2; do
+ set_client $i q noaddress
+ done
+
+ if add_pppoe $LAN1 $LAN2 "$BRCL" "$BRCL" "unaware bridge, with pppoe-in-q encaps"; then
+ test_paths $LAN1 $LAN2 "unaware bridge, with pppoe-in-q encaps, "
+ lret=$(($lret | $?))
+ fi
+
+ del_pppoe $LAN1 $LAN2 "$BRCL" "$BRCL"
+
+ # End pppoe
+ fi
+
+ for i in $LAN1 $LAN2; do
+ unset_client $i
+ done
+ return $lret
+}
+
+test_2_aware_bridge()
+{
+ local lret=0
+ local i
+
+ for i in $LAN1 $LAN2; do
+ bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid $VID1 pvid untagged
+ set_client $i none
+ done
+ test_paths $LAN1 $LAN2 "aware bridge, without/without vlan encap,"
+ lret=$(($lret | $?))
+
+ i=$LAN1
+ bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 pvid untagged
+ bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid $VID1
+ set_client $i q
+
+ test_paths $LAN1 $LAN2 "aware bridge, with/without vlan encap, "
+ lret=$(($lret | $?))
+
+ i=$LAN2
+ bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 pvid untagged
+ bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid $VID1
+ set_client $i q
+
+ test_paths $LAN1 $LAN2 "aware bridge, with/with vlan encap, "
+ lret=$(($lret | $?))
+
+ i=$LAN1
+ bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1
+ bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid $VID1 pvid untagged
+ set_client $i none
+
+ test_paths $LAN1 $LAN2 "aware bridge, without/with vlan encap, "
+ lret=$(($lret | $?))
+
+ i=$LAN1
+ bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 pvid untagged
+ unset_client $i
+ i=$LAN2
+ bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1
+ unset_client $i
+
+ return $lret
+}
+
+test_3_forward_without_vlandev()
+{
+ local wo=$1
+ local lret=0
+ local i
+
+ [[ "$wo" == "" ]] && wo="without"
+
+ for i in $LAN1 $LAN2; do
+ bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid $VID1 pvid untagged
+ set_client $i none
+ done
+
+ test_paths $LAN1 $WAN "forward, $wo vlan-device, without vlan encap, client1,"
+ lret=$(($lret | $?))
+ if [ -z "$lan_all_veth" ] || [ -n "$noskip" ]; then
+ test_paths $LAN2 $WAN "forward, $wo vlan-device, without vlan encap, client2,"
+ lret=$(($lret | $?))
+ fi
+
+ for i in $LAN1 $LAN2; do
+ bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1 pvid untagged
+ bridge -net "$nsrt" vlan add dev "${vethrt[$i]}" vid $VID1
+ set_client $i q
+ done
+
+ test_paths $LAN1 $WAN "forward, $wo vlan-device, with vlan encap, client1,"
+ lret=$(($lret | $?))
+ if [ -z "$lan_all_veth" ] || [ -n "$noskip" ]; then
+ test_paths $LAN2 $WAN "forward, $wo vlan-device, with vlan encap, client2,"
+ lret=$(($lret | $?))
+ fi
+
+ for i in $LAN1 $LAN2; do
+ bridge -net "$nsrt" vlan del dev "${vethrt[$i]}" vid $VID1
+ unset_client $i
+ done
+ return $lret
+}
+
+test_4_forward_with_vlandev()
+{
+ test_3_forward_without_vlandev "with"
+ return $?
+}
+
+test_5_bond()
+{
+ local lret=0
+ local i
+
+ for i in $LAN1; do
+ unset_client $i
+ done
+ return $lret
+}
+
+ret=0
+### Start Initial Setup ###
+
+for i in 4 6; do
+ ip netns exec "$nsrt" sysctl -q net.ipv$i.conf.all.forwarding=1
+done
+
+### Use brwan to make sure software fastpath is ###
+### direct xmit in other direction also ###
+
+ip -net "$nsrt" link add $BRWAN type bridge
+ret=$(($ret | $?))
+ip -net "$nsrt" link set $BRWAN up
+ret=$(($ret | $?))
+if [ $ret -ne 0 ]; then
+ echo "SKIP: Can't create bridge"
+ exit $ksft_skip
+fi
+
+# If both lan clients are veth-devices, only test 1 in the forward path
+if [ -z "${vethcl[$LAN1]}" ] && [ -z "${vethcl[$LAN2]}" ]; then
+ lan_all_veth=1
+fi
+
+for i in $WAN $LAN1 $LAN2; do
+ ns="${nsa[$i]}"
+ if [ -z "${vethcl[$i]}" ]; then
+ vethcl[$i]="veth${i}cl"
+ vethrt[$i]="veth${i}rt"
+ ip link add "${vethcl[$i]}" netns "$ns" type veth \
+ peer name "${vethrt[$i]}" netns "$nsrt"
+ ret=$(($ret | $?))
+ else # Use pair of interconnected hardware interfaces
+ ip link set "${vethrt[$i]}" netns "$nsrt"
+ ret=$(($ret | $?))
+ ip link set "${vethcl[$i]}" netns "$ns"
+ ret=$(($ret | $?))
+ fi
+done
+if [ $ret -ne 0 ]; then
+ echo "SKIP: (v)eth pairs cannot be used"
+ exit $ksft_skip
+fi
+
+if [ -n "$showtree" ]; then
+ cat <<-EOF
+ Setup:
+ CLIENT 0
+ ${vethcl[$WAN]}
+ |
+ ${vethrt[$WAN]}
+ WAN
+ ROUTER
+ LAN1 LAN2
+ $(printf "%14.14s" ${vethrt[$LAN1]}) ${vethrt[$LAN2]}
+ | |
+ $(printf "%14.14s" ${vethcl[$LAN1]}) ${vethcl[$LAN2]}
+ CLIENT 1 CLIENT 2
+
+ EOF
+fi
+
+for n in nsclientwan nsclientlan; do
+ routerside=""; clientside=""
+ for i in $WAN $LAN1 $LAN2; do
+ ns="${nsa[$i]}"
+ [[ "$ns" != "$n"* ]] && continue
+ mac=$(check_mac $ns ${vethcl[$i]} "$routerside $clientside")
+ ret=$(($ret | $?))
+ clientside+=" $mac"
+ mac=$(check_mac $nsrt ${vethrt[$i]} "$clientside")
+ ret=$(($ret | $?))
+ routerside+=" $mac"
+ done
+done
+if [ $ret -ne 0 ]; then
+ echo "SKIP: conflicting mac address"
+ exit $ksft_skip
+fi
+
+set_pair_link up $WAN $LAN1 $LAN2
+ret=$(($ret | $?))
+if [ $ret -ne 0 ]; then
+ echo "SKIP: setting (v)eth pairs link up failed"
+ exit $ksft_skip
+fi
+
+i=$WAN
+ip -net "$nsrt" link set "${vethrt[$i]}" master $BRWAN
+set_client $i none
+add_addr $ADWAN "$BRWAN"
+
+family="bridge"
+setup_nftables $LAN1 $LAN2 2>/dev/null
+if [ $? -ne 0 ]; then
+ echo "INFO: Cannot add nftables table $family"
+ tests[1]=""; test[2]=""
+fi
+family="inet"
+if ! setup_nftables $WAN $LAN1 2>/dev/null; then
+ echo "INFO: Cannot add nftables table $family"
+ tests[3]=""; test[4]=""; tests[5]=""
+fi
+
+### End Initial Setup ###
+
+if [ -n "${tests[1]}" ]; then
+ # Setup brlan as vlan unaware bridge
+ family="bridge"
+ ip -net "$nsrt" link add $BRLAN type bridge
+ ip -net "$nsrt" link set $BRLAN up
+ for i in $LAN1 $LAN2; do
+ ip -net "$nsrt" link set "${vethrt[$i]}" master $BRLAN
+ done
+ test_1_unaware_bridge
+ ret=$(($ret | $?))
+ ip -net "$nsrt" link del $BRLAN type bridge
+fi
+
+if [ -n "${tests[2]}" ] || [ -n "${tests[3]}" ] || [ -n "${tests[4]}" ]; then
+ # Setup brlan as vlan aware bridge
+ family="bridge"
+
+ ip -net "$nsrt" link add $BRLAN type bridge vlan_filtering 1 vlan_default_pvid 0
+ ip -net "$nsrt" link set $BRLAN up
+ bridge -net "$nsrt" vlan add dev $BRLAN vid $VID1 pvid untagged self
+ add_addr $ADLAN "$BRLAN"
+ for i in $LAN1 $LAN2; do
+ ip -net "$nsrt" link set "${vethrt[$i]}" master $BRLAN
+ done
+
+ if [ -n "${tests[2]}" ]; then
+ test_2_aware_bridge
+ ret=$(($ret | $?))
+ fi
+
+ family="inet"
+
+ if [ -n "${tests[3]}" ]; then
+ test_3_forward_without_vlandev
+ ret=$(($ret | $?))
+ fi
+
+ if [ -n "${tests[4]}" ]; then
+ # Setup vlan-device linked to brlan master port
+ del_addr $ADLAN "$BRLAN"
+ ip -net "$nsrt" link set $BRLAN down
+ bridge -net "$nsrt" vlan del dev $BRLAN vid $VID1 pvid untagged self
+ bridge -net "$nsrt" vlan add dev $BRLAN vid $VID1 self
+ ip -net "$nsrt" link add link $BRLAN name $BRLAN.$VID1 type vlan id $VID1
+ ip -net "$nsrt" link set $BRLAN up
+ ip -net "$nsrt" link set "$BRLAN.$VID1" up
+ add_addr $ADLAN "$BRLAN.$VID1"
+ test_4_forward_with_vlandev
+ ret=$(($ret | $?))
+ fi
+
+ ip -net "$nsrt" link del $BRLAN type bridge
+fi
+
+### Finish tests ###
+
+ip -net "$nsrt" link del $BRWAN type bridge
+
+for i in $WAN $LAN1 $LAN2; do
+ unset_client $i
+done
+
+if [ $ret -eq 0 ]; then
+ echo "PASS: all tests passed"
+else
+ echo "ERROR: bridge fastpath test has failed"
+fi
+
+exit $ret
--
2.47.1
On Sat, Jun 21, 2025 at 05:08:18PM +0530, Dev Jain wrote:
>
> On 21/06/25 4:40 pm, wang lian wrote:
> > From cb505647eb5f418d1ff5e807361f4c3a337c251f Mon Sep 17 00:00:00 2001
> > From: Lian Wang <lianux.mm(a)gmail.com>
> > Date: Sat, 21 Jun 2025 18:51:49 +0800
> > Subject: [PATCH] selftests/mm: add test for (BATCH_PROCESS)MADV_DONTNEED
> >
> > Let's add a simple test for MADV_DONTNEED and PROCESS_MADV_DONTNEED,
> > and inspired by SeongJae Park's test at GitHub[1] add batch test
> > for PROCESS_MADV_DONTNEED,but for now it influence by workload and
> > need add some race conditions test.We can add it later.
> >
> > Signed-off-by: Lian Wang <lianux.mm(a)gmail.com>
> > References
> > ==========
> >
> > [1] https://github.com/sjp38/eval_proc_madvise
> >
> >
>
> Hello Lian,
>
>
> Thank you for your patch. Please configure your editor to take a TAB as 8
> spaces. And,
>
> your email client to sending plain text messages instead of HTML. Please
> resend the
>
> patch after making these changes.
Thanks for resending this, but please do put '[RESEND PATCH]' rather than
'[PATCH]' so we can differentiate!
Have reviewed the resent version.
Cheers, Lorenzo
validate_addr() function checks whether the address returned by mmap()
lies in the low or high VA space, according to whether a high addr hint
was passed or not. The fix commit mentioned below changed the code in
such a way that this function will always return failure when passed
high_addr == 1; addr will be >= HIGH_ADDR_MARK always, we will fall
down to "if (addr < HIGH_ADDR_MARK)" and return failure. Fix this.
Fixes: d1d86ce28d0f ("selftests/mm: virtual_address_range: conform to TAP format output")
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
---
tools/testing/selftests/mm/virtual_address_range.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index b380e102b22f..169dbd692bf5 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -77,8 +77,11 @@ static void validate_addr(char *ptr, int high_addr)
{
unsigned long addr = (unsigned long) ptr;
- if (high_addr && addr < HIGH_ADDR_MARK)
- ksft_exit_fail_msg("Bad address %lx\n", addr);
+ if (high_addr) {
+ if (addr < HIGH_ADDR_MARK)
+ ksft_exit_fail_msg("Bad address %lx\n", addr);
+ return;
+ }
if (addr > HIGH_ADDR_MARK)
ksft_exit_fail_msg("Bad address %lx\n", addr);
--
2.30.2
The coredump.socket_detect_userspace_client test occasionally fails:
# RUN coredump.socket_detect_userspace_client ...
# stackdump_test.c:500:socket_detect_userspace_client:Expected 0 (0) != WIFEXITED(status) (0)
# socket_detect_userspace_client: Test terminated by assertion
# FAIL coredump.socket_detect_userspace_client
not ok 3 coredump.socket_detect_userspace_client
because there is no guarantee that client's write() happens before server's
close(). The client gets terminated SIGPIPE, and thus the test fails.
Add a read() to server to make sure server's close() doesn't happen before
client's write().
Fixes: 7b6724fe9a6b ("selftests/coredump: add tests for AF_UNIX coredumps")
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
---
tools/testing/selftests/coredump/stackdump_test.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/coredump/stackdump_test.c b/tools/testing/selftests/coredump/stackdump_test.c
index 9984413be9f06..68f8e479ac368 100644
--- a/tools/testing/selftests/coredump/stackdump_test.c
+++ b/tools/testing/selftests/coredump/stackdump_test.c
@@ -461,10 +461,15 @@ TEST_F(coredump, socket_detect_userspace_client)
_exit(EXIT_FAILURE);
}
+ ret = read(fd_coredump, &c, 1);
+
close(fd_coredump);
close(fd_server);
close(fd_peer_pidfd);
close(fd_core_file);
+
+ if (ret < 1)
+ _exit(EXIT_FAILURE);
_exit(EXIT_SUCCESS);
}
self->pid_coredump_server = pid_coredump_server;
--
2.39.5
This started with a patch that enabled `clippy::ptr_as_ptr`. Benno
Lossin suggested I also look into `clippy::ptr_cast_constness` and I
discovered `clippy::as_ptr_cast_mut`. This series now enables all 3
lints. It also enables `clippy::as_underscore` which ensures other
pointer casts weren't missed.
As a later addition, `clippy::cast_lossless` and `clippy::ref_as_ptr`
are also enabled.
Signed-off-by: Tamir Duberstein <tamird(a)gmail.com>
---
Changes in v12:
- Remove stale mention of a dependency. (Miguel Ojeda)
- Apply to config, cpufreq, and nova. (Miguel Ojeda)
- Link to v11: https://lore.kernel.org/r/20250611-ptr-as-ptr-v11-0-ce5b41c6e9c6@gmail.com
Changes in v11:
- Rebase on v6.16-rc1.
- Replace some `as <integer>` with `as bindings::T` and others with `as
ffi::T`. (Miguel Ojeda)
- Revert explicit `ffi::c_void` import which is in the prelude. (Miguel Ojeda)
- Link to v10: https://lore.kernel.org/r/20250418-ptr-as-ptr-v10-0-3d63d27907aa@gmail.com
Changes in v10:
- Move fragment from "rust: enable `clippy::ptr_cast_constness` lint" to
"rust: enable `clippy::ptr_as_ptr` lint". (Boqun Feng)
- Replace `(...).into()` with `T::from(...)` where the destination type
isn't obvious in "rust: enable `clippy::cast_lossless` lint". (Boqun
Feng)
- Link to v9: https://lore.kernel.org/r/20250416-ptr-as-ptr-v9-0-18ec29b1b1f3@gmail.com
Changes in v9:
- Replace ref-to-ptr coercion using `let` bindings with
`core::ptr::from_{ref,mut}`. (Boqun Feng).
- Link to v8: https://lore.kernel.org/r/20250409-ptr-as-ptr-v8-0-3738061534ef@gmail.com
Changes in v8:
- Use coercion to go ref -> ptr.
- rustfmt.
- Rebase on v6.15-rc1.
- Extract first commit to its own series as it is shared with other
series.
- Link to v7: https://lore.kernel.org/r/20250325-ptr-as-ptr-v7-0-87ab452147b9@gmail.com
Changes in v7:
- Add patch to enable `clippy::ref_as_ptr`.
- Link to v6: https://lore.kernel.org/r/20250324-ptr-as-ptr-v6-0-49d1b7fd4290@gmail.com
Changes in v6:
- Drop strict provenance patch.
- Fix URLs in doc comments.
- Add patch to enable `clippy::cast_lossless`.
- Rebase on rust-next.
- Link to v5: https://lore.kernel.org/r/20250317-ptr-as-ptr-v5-0-5b5f21fa230a@gmail.com
Changes in v5:
- Use `pointer::addr` in OF. (Boqun Feng)
- Add documentation on stubs. (Benno Lossin)
- Mark stubs `#[inline]`.
- Pick up Alice's RB on a shared commit from
https://lore.kernel.org/all/Z9f-3Aj3_FWBZRrm@google.com/.
- Link to v4: https://lore.kernel.org/r/20250315-ptr-as-ptr-v4-0-b2d72c14dc26@gmail.com
Changes in v4:
- Add missing SoB. (Benno Lossin)
- Use `without_provenance_mut` in alloc. (Boqun Feng)
- Limit strict provenance lints to the `kernel` crate to avoid complex
logic in the build system. This can be revisited on MSRV >= 1.84.0.
- Rebase on rust-next.
- Link to v3: https://lore.kernel.org/r/20250314-ptr-as-ptr-v3-0-e7ba61048f4a@gmail.com
Changes in v3:
- Fixed clippy warning in rust/kernel/firmware.rs. (kernel test robot)
Link: https://lore.kernel.org/all/202503120332.YTCpFEvv-lkp@intel.com/
- s/as u64/as bindings::phys_addr_t/g. (Benno Lossin)
- Use strict provenance APIs and enable lints. (Benno Lossin)
- Link to v2: https://lore.kernel.org/r/20250309-ptr-as-ptr-v2-0-25d60ad922b7@gmail.com
Changes in v2:
- Fixed typo in first commit message.
- Added additional patches, converted to series.
- Link to v1: https://lore.kernel.org/r/20250307-ptr-as-ptr-v1-1-582d06514c98@gmail.com
---
Tamir Duberstein (6):
rust: enable `clippy::ptr_as_ptr` lint
rust: enable `clippy::ptr_cast_constness` lint
rust: enable `clippy::as_ptr_cast_mut` lint
rust: enable `clippy::as_underscore` lint
rust: enable `clippy::cast_lossless` lint
rust: enable `clippy::ref_as_ptr` lint
Makefile | 6 ++++
drivers/gpu/drm/drm_panic_qr.rs | 4 +--
drivers/gpu/nova-core/driver.rs | 2 +-
drivers/gpu/nova-core/regs.rs | 2 +-
drivers/gpu/nova-core/regs/macros.rs | 2 +-
rust/bindings/lib.rs | 3 ++
rust/kernel/alloc/allocator_test.rs | 2 +-
rust/kernel/alloc/kvec.rs | 4 +--
rust/kernel/block/mq/operations.rs | 2 +-
rust/kernel/block/mq/request.rs | 11 +++++--
rust/kernel/configfs.rs | 22 +++++---------
rust/kernel/cpufreq.rs | 2 +-
rust/kernel/device.rs | 4 +--
rust/kernel/device_id.rs | 4 +--
rust/kernel/devres.rs | 17 +++++------
rust/kernel/dma.rs | 6 ++--
rust/kernel/drm/device.rs | 6 ++--
rust/kernel/error.rs | 2 +-
rust/kernel/firmware.rs | 3 +-
rust/kernel/fs/file.rs | 2 +-
rust/kernel/io.rs | 18 ++++++------
rust/kernel/kunit.rs | 11 ++++---
rust/kernel/list/impl_list_item_mod.rs | 2 +-
rust/kernel/miscdevice.rs | 2 +-
rust/kernel/mm/virt.rs | 52 +++++++++++++++++-----------------
rust/kernel/net/phy.rs | 4 +--
rust/kernel/of.rs | 6 ++--
rust/kernel/pci.rs | 11 ++++---
rust/kernel/platform.rs | 4 ++-
rust/kernel/print.rs | 6 ++--
rust/kernel/seq_file.rs | 2 +-
rust/kernel/str.rs | 14 ++++-----
rust/kernel/sync/poll.rs | 2 +-
rust/kernel/time/hrtimer/pin.rs | 2 +-
rust/kernel/time/hrtimer/pin_mut.rs | 2 +-
rust/kernel/uaccess.rs | 4 +--
rust/kernel/workqueue.rs | 8 +++---
rust/uapi/lib.rs | 3 ++
38 files changed, 139 insertions(+), 120 deletions(-)
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250307-ptr-as-ptr-21b1867fc4d4
Best regards,
--
Tamir Duberstein <tamird(a)gmail.com>
DAMON sysfs interface is the bridge between the user space and the
kernel space for DAMON parameters. There is no good and simple test to
see if the parameters are set as expected. Existing DAMON selftests
therefore test end-to-end features. For example, damos_quota_goal.py
runs a DAMOS scheme with quota goal set against a test program running
an artificial access pattern, and see if the result is as expected.
Such tests cover only a few part of DAMON. Adding more tests is also
complicated. Finally, the reliability of the test itself on different
systems is bad.
'drgn' is a tool that can extract kernel internal data structures like
DAMON parameters. Add a test that passes specific DAMON parameters via
DAMON sysfs reusing _damon_sysfs.py, extract resulting DAMON parameters
via 'drgn', and compare those. Note that this test is not adding
exhaustive tests of all DAMON parameters and input combinations but very
basic things. Advancing the test infrastructure and adding more tests
are future works.
SeongJae Park (6):
selftests/damon: add drgn script for extracting damon status
selftests/damon/_damon_sysfs: set Kdamond.pid in start()
selftests/damon: add python and drgn-based DAMON sysfs test
selftests/damon/sysfs.py: test monitoring attribute parameters
selftests/damon/sysfs.py: test adaptive targets parameter
selftests/damon/sysfs.py: test DAMOS schemes parameters setup
tools/testing/selftests/damon/Makefile | 1 +
tools/testing/selftests/damon/_damon_sysfs.py | 3 +
.../selftests/damon/drgn_dump_damon_status.py | 161 ++++++++++++++++++
tools/testing/selftests/damon/sysfs.py | 115 +++++++++++++
4 files changed, 280 insertions(+)
create mode 100755 tools/testing/selftests/damon/drgn_dump_damon_status.py
create mode 100755 tools/testing/selftests/damon/sysfs.py
base-commit: 59f618c718d036132b59bcf997943d4f5520149f
--
2.39.5
This commit adds a new kernel selftest to verify RTNLGRP_IPV6_ACADDR
notifications. The test works by adding/removing a dummy interface,
enabling packet forwarding, and then confirming that user space can
correctly receive anycast notifications.
The test relies on the iproute2 version to be 6.13+.
Tested by the following command:
$ vng -v --user root --cpus 16 -- \
make -C tools/testing/selftests TARGETS=net
TEST_PROGS=rtnetlink_notification.sh \
TEST_GEN_PROGS="" run_tests
Cc: Maciej Żenczykowski <maze(a)google.com>
Cc: Lorenzo Colitti <lorenzo(a)google.com>
Signed-off-by: Yuyang Huang <yuyanghuang(a)google.com>
---
Changelog since v1:
- Remote unrelated clean up code.
.../selftests/net/rtnetlink_notification.sh | 44 ++++++++++++++++++-
1 file changed, 43 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/rtnetlink_notification.sh b/tools/testing/selftests/net/rtnetlink_notification.sh
index 39c1b815bbe4..3f9780232bd6 100755
--- a/tools/testing/selftests/net/rtnetlink_notification.sh
+++ b/tools/testing/selftests/net/rtnetlink_notification.sh
@@ -8,9 +8,11 @@
ALL_TESTS="
kci_test_mcast_addr_notification
+ kci_test_anycast_addr_notification
"
source lib.sh
+test_dev="test-dummy1"
kci_test_mcast_addr_notification()
{
@@ -18,7 +20,6 @@ kci_test_mcast_addr_notification()
local tmpfile
local monitor_pid
local match_result
- local test_dev="test-dummy1"
tmpfile=$(mktemp)
defer rm "$tmpfile"
@@ -56,6 +57,47 @@ kci_test_mcast_addr_notification()
return $RET
}
+kci_test_anycast_addr_notification()
+{
+ RET=0
+ local tmpfile
+ local monitor_pid
+ local match_result
+
+ tmpfile=$(mktemp)
+ defer rm "$tmpfile"
+
+ ip monitor acaddress > "$tmpfile" &
+ monitor_pid=$!
+ defer kill_process "$monitor_pid"
+ sleep 1
+
+ if [ ! -e "/proc/$monitor_pid" ]; then
+ RET=$ksft_skip
+ log_test "anycast addr notification: iproute2 too old"
+ return "$RET"
+ fi
+
+ ip link add name "$test_dev" type dummy
+ check_err $? "failed to add dummy interface"
+ ip link set "$test_dev" up
+ check_err $? "failed to set dummy interface up"
+ sysctl -qw net.ipv6.conf."$test_dev".forwarding=1
+ ip link del dev "$test_dev"
+ check_err $? "Failed to delete dummy interface"
+ sleep 1
+
+ # There should be 2 line matches as follows.
+ # 9: dummy2 inet6 any fe80:: scope global
+ # Deleted 9: dummy2 inet6 any fe80:: scope global
+ match_result=$(grep -cE "$test_dev.*(fe80::)" "$tmpfile")
+ if [ "$match_result" -ne 2 ]; then
+ RET=$ksft_fail
+ fi
+ log_test "anycast addr notification: Expected 2 matches, got $match_result"
+ return "$RET"
+}
+
#check for needed privileges
if [ "$(id -u)" -ne 0 ];then
RET=$ksft_skip
--
2.50.0.rc2.701.gf1e915cc24-goog
A task in the kernel (task_mm_cid_work) runs somewhat periodically to
compact the mm_cid for each process. Add a test to validate that it runs
correctly and timely.
The test spawns 1 thread pinned to each CPU, then each thread, including
the main one, runs in short bursts for some time. During this period, the
mm_cids should be spanning all numbers between 0 and nproc.
At the end of this phase, a thread with high enough mm_cid (>= nproc/2)
is selected to be the new leader, all other threads terminate.
After some time, the only running thread should see 0 as mm_cid, if that
doesn't happen, the compaction mechanism didn't work and the test fails.
The test never fails if only 1 core is available, in which case, we
cannot test anything as the only available mm_cid is 0.
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Signed-off-by: Gabriele Monaco <gmonaco(a)redhat.com>
---
tools/testing/selftests/rseq/.gitignore | 1 +
tools/testing/selftests/rseq/Makefile | 2 +-
.../selftests/rseq/mm_cid_compaction_test.c | 200 ++++++++++++++++++
3 files changed, 202 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c
diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
index 0fda241fa62b0..b3920c59bf401 100644
--- a/tools/testing/selftests/rseq/.gitignore
+++ b/tools/testing/selftests/rseq/.gitignore
@@ -3,6 +3,7 @@ basic_percpu_ops_test
basic_percpu_ops_mm_cid_test
basic_test
basic_rseq_op_test
+mm_cid_compaction_test
param_test
param_test_benchmark
param_test_compare_twice
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 0d0a5fae59547..bc4d940f66d40 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -17,7 +17,7 @@ OVERRIDE_TARGETS = 1
TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
param_test_benchmark param_test_compare_twice param_test_mm_cid \
param_test_mm_cid_benchmark param_test_mm_cid_compare_twice \
- syscall_errors_test
+ syscall_errors_test mm_cid_compaction_test
TEST_GEN_PROGS_EXTENDED = librseq.so
diff --git a/tools/testing/selftests/rseq/mm_cid_compaction_test.c b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
new file mode 100644
index 0000000000000..7ddde3b657dd6
--- /dev/null
+++ b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
@@ -0,0 +1,200 @@
+// SPDX-License-Identifier: LGPL-2.1
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stddef.h>
+
+#include "../kselftest.h"
+#include "rseq.h"
+
+#define VERBOSE 0
+#define printf_verbose(fmt, ...) \
+ do { \
+ if (VERBOSE) \
+ printf(fmt, ##__VA_ARGS__); \
+ } while (0)
+
+/* 0.5 s */
+#define RUNNER_PERIOD 500000
+/* Number of runs before we terminate or get the token */
+#define THREAD_RUNS 5
+
+/*
+ * Number of times we check that the mm_cid were compacted.
+ * Checks are repeated every RUNNER_PERIOD.
+ */
+#define MM_CID_COMPACT_TIMEOUT 10
+
+struct thread_args {
+ int cpu;
+ int num_cpus;
+ pthread_mutex_t *token;
+ pthread_barrier_t *barrier;
+ pthread_t *tinfo;
+ struct thread_args *args_head;
+};
+
+static void __noreturn *thread_runner(void *arg)
+{
+ struct thread_args *args = arg;
+ int i, ret, curr_mm_cid;
+ cpu_set_t cpumask;
+
+ CPU_ZERO(&cpumask);
+ CPU_SET(args->cpu, &cpumask);
+ ret = pthread_setaffinity_np(pthread_self(), sizeof(cpumask), &cpumask);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to set affinity");
+ abort();
+ }
+ pthread_barrier_wait(args->barrier);
+
+ for (i = 0; i < THREAD_RUNS; i++)
+ usleep(RUNNER_PERIOD);
+ curr_mm_cid = rseq_current_mm_cid();
+ /*
+ * We select one thread with high enough mm_cid to be the new leader.
+ * All other threads (including the main thread) will terminate.
+ * After some time, the mm_cid of the only remaining thread should
+ * converge to 0, if not, the test fails.
+ */
+ if (curr_mm_cid >= args->num_cpus / 2 &&
+ !pthread_mutex_trylock(args->token)) {
+ printf_verbose(
+ "cpu%d has mm_cid=%d and will be the new leader.\n",
+ sched_getcpu(), curr_mm_cid);
+ for (i = 0; i < args->num_cpus; i++) {
+ if (args->tinfo[i] == pthread_self())
+ continue;
+ ret = pthread_join(args->tinfo[i], NULL);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to join thread");
+ abort();
+ }
+ }
+ pthread_barrier_destroy(args->barrier);
+ free(args->tinfo);
+ free(args->token);
+ free(args->barrier);
+ free(args->args_head);
+
+ for (i = 0; i < MM_CID_COMPACT_TIMEOUT; i++) {
+ curr_mm_cid = rseq_current_mm_cid();
+ printf_verbose("run %d: mm_cid=%d on cpu%d.\n", i,
+ curr_mm_cid, sched_getcpu());
+ if (curr_mm_cid == 0)
+ exit(EXIT_SUCCESS);
+ usleep(RUNNER_PERIOD);
+ }
+ exit(EXIT_FAILURE);
+ }
+ printf_verbose("cpu%d has mm_cid=%d and is going to terminate.\n",
+ sched_getcpu(), curr_mm_cid);
+ pthread_exit(NULL);
+}
+
+int test_mm_cid_compaction(void)
+{
+ cpu_set_t affinity;
+ int i, j, ret = 0, num_threads;
+ pthread_t *tinfo;
+ pthread_mutex_t *token;
+ pthread_barrier_t *barrier;
+ struct thread_args *args;
+
+ sched_getaffinity(0, sizeof(affinity), &affinity);
+ num_threads = CPU_COUNT(&affinity);
+ tinfo = calloc(num_threads, sizeof(*tinfo));
+ if (!tinfo) {
+ perror("Error: failed to allocate tinfo");
+ return -1;
+ }
+ args = calloc(num_threads, sizeof(*args));
+ if (!args) {
+ perror("Error: failed to allocate args");
+ ret = -1;
+ goto out_free_tinfo;
+ }
+ token = malloc(sizeof(*token));
+ if (!token) {
+ perror("Error: failed to allocate token");
+ ret = -1;
+ goto out_free_args;
+ }
+ barrier = malloc(sizeof(*barrier));
+ if (!barrier) {
+ perror("Error: failed to allocate barrier");
+ ret = -1;
+ goto out_free_token;
+ }
+ if (num_threads == 1) {
+ fprintf(stderr, "Cannot test on a single cpu. "
+ "Skipping mm_cid_compaction test.\n");
+ /* only skipping the test, this is not a failure */
+ goto out_free_barrier;
+ }
+ pthread_mutex_init(token, NULL);
+ ret = pthread_barrier_init(barrier, NULL, num_threads);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to initialise barrier");
+ goto out_free_barrier;
+ }
+ for (i = 0, j = 0; i < CPU_SETSIZE && j < num_threads; i++) {
+ if (!CPU_ISSET(i, &affinity))
+ continue;
+ args[j].num_cpus = num_threads;
+ args[j].tinfo = tinfo;
+ args[j].token = token;
+ args[j].barrier = barrier;
+ args[j].cpu = i;
+ args[j].args_head = args;
+ if (!j) {
+ /* The first thread is the main one */
+ tinfo[0] = pthread_self();
+ ++j;
+ continue;
+ }
+ ret = pthread_create(&tinfo[j], NULL, thread_runner, &args[j]);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to create thread");
+ abort();
+ }
+ ++j;
+ }
+ printf_verbose("Started %d threads.\n", num_threads);
+
+ /* Also main thread will terminate if it is not selected as leader */
+ thread_runner(&args[0]);
+
+ /* only reached in case of errors */
+out_free_barrier:
+ free(barrier);
+out_free_token:
+ free(token);
+out_free_args:
+ free(args);
+out_free_tinfo:
+ free(tinfo);
+
+ return ret;
+}
+
+int main(int argc, char **argv)
+{
+ if (!rseq_mm_cid_available()) {
+ fprintf(stderr, "Error: rseq_mm_cid unavailable\n");
+ return -1;
+ }
+ if (test_mm_cid_compaction())
+ return -1;
+ return 0;
+}
--
2.49.0
This series is built on top of the Fuad's v7 "mapping guest_memfd backed
memory at the host" [1].
With James's KVM userfault [2], it is possible to handle stage-2 faults
in guest_memfd in userspace. However, KVM itself also triggers faults
in guest_memfd in some cases, for example: PV interfaces like kvmclock,
PV EOI and page table walking code when fetching the MMIO instruction on
x86. It was agreed in the guest_memfd upstream call on 23 Jan 2025 [3]
that KVM would be accessing those pages via userspace page tables. In
order for such faults to be handled in userspace, guest_memfd needs to
support userfaultfd.
Changes since v2 [4]:
- James: Fix sgp type when calling shmem_get_folio_gfp
- James: Improved vm_ops->fault() error handling
- James: Add and make use of the can_userfault() VMA operation
- James: Add UFFD_FEATURE_MINOR_GUEST_MEMFD feature flag
- James: Fix typos and add more checks in the test
Nikita
[1] https://lore.kernel.org/kvm/20250318161823.4005529-1-tabba@google.com/T/
[2] https://lore.kernel.org/kvm/20250109204929.1106563-1-jthoughton@google.com/…
[3] https://docs.google.com/document/d/1M6766BzdY1Lhk7LiR5IqVR8B8mG3cr-cxTxOrAo…
[4] https://lore.kernel.org/kvm/20250402160721.97596-1-kalyazin@amazon.com/T/
Nikita Kalyazin (6):
mm: userfaultfd: generic continue for non hugetlbfs
mm: provide can_userfault vma operation
mm: userfaultfd: use can_userfault vma operation
KVM: guest_memfd: add support for userfaultfd minor
mm: userfaultfd: add UFFD_FEATURE_MINOR_GUEST_MEMFD
KVM: selftests: test userfaultfd minor for guest_memfd
fs/userfaultfd.c | 3 +-
include/linux/mm.h | 5 +
include/linux/mm_types.h | 4 +
include/linux/userfaultfd_k.h | 10 +-
include/uapi/linux/userfaultfd.h | 8 +-
mm/hugetlb.c | 9 +-
mm/shmem.c | 17 +++-
mm/userfaultfd.c | 47 ++++++---
.../testing/selftests/kvm/guest_memfd_test.c | 99 +++++++++++++++++++
virt/kvm/guest_memfd.c | 10 ++
10 files changed, 188 insertions(+), 24 deletions(-)
base-commit: 3cc51efc17a2c41a480eed36b31c1773936717e0
--
2.47.1
Currently testing of userspace and in-kernel API use two different
frameworks. kselftests for the userspace ones and Kunit for the
in-kernel ones. Besides their different scopes, both have different
strengths and limitations:
Kunit:
* Tests are normal kernel code.
* They use the regular kernel toolchain.
* They can be packaged and distributed as modules conveniently.
Kselftests:
* Tests are normal userspace code
* They need a userspace toolchain.
A kernel cross toolchain is likely not enough.
* A fair amout of userland is required to run the tests,
which means a full distro or handcrafted rootfs.
* There is no way to conveniently package and run kselftests with a
given kernel image.
* The kselftests makefiles are not as powerful as regular kbuild.
For example they are missing proper header dependency tracking or more
complex compiler option modifications.
Therefore kunit is much easier to run against different kernel
configurations and architectures.
This series aims to combine kselftests and kunit, avoiding both their
limitations. It works by compiling the userspace kselftests as part of
the regular kernel build, embedding them into the kunit kernel or module
and executing them from there. If the kernel toolchain is not fit to
produce userspace because of a missing libc, the kernel's own nolibc can
be used instead.
The structured TAP output from the kselftest is integrated into the
kunit KTAP output transparently, the kunit parser can parse the combined
logs together.
Further room for improvements:
* Call each test in its completely dedicated namespace
* Handle additional test files besides the test executable through
archives. CPIO, cramfs, etc.
* Compatibility with kselftest_harness.h (in progress)
* Expose the blobs in debugfs
* Provide some convience wrappers around compat userprogs
* Figure out a migration path/coexistence solution for
kunit UAPI and tools/testing/selftests/
Output from the kunit example testcase, note the output of
"example_uapi_tests".
$ ./tools/testing/kunit/kunit.py run --kunitconfig lib/kunit example
...
Running tests with:
$ .kunit/linux kunit.filter_glob=example kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[11:53:53] ================== example (10 subtests) ===================
[11:53:53] [PASSED] example_simple_test
[11:53:53] [SKIPPED] example_skip_test
[11:53:53] [SKIPPED] example_mark_skipped_test
[11:53:53] [PASSED] example_all_expect_macros_test
[11:53:53] [PASSED] example_static_stub_test
[11:53:53] [PASSED] example_static_stub_using_fn_ptr_test
[11:53:53] [PASSED] example_priv_test
[11:53:53] =================== example_params_test ===================
[11:53:53] [SKIPPED] example value 3
[11:53:53] [PASSED] example value 2
[11:53:53] [PASSED] example value 1
[11:53:53] [SKIPPED] example value 0
[11:53:53] =============== [PASSED] example_params_test ===============
[11:53:53] [PASSED] example_slow_test
[11:53:53] ======================= (4 subtests) =======================
[11:53:53] [PASSED] procfs
[11:53:53] [PASSED] userspace test 2
[11:53:53] [SKIPPED] userspace test 3: some reason
[11:53:53] [PASSED] userspace test 4
[11:53:53] ================ [PASSED] example_uapi_test ================
[11:53:53] ===================== [PASSED] example =====================
[11:53:53] ============================================================
[11:53:53] Testing complete. Ran 16 tests: passed: 11, skipped: 5
[11:53:53] Elapsed time: 67.543s total, 1.823s configuring, 65.655s building, 0.058s running
Based on v6.15-rc1.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Changes in v3:
- Reintroduce CONFIG_CC_CAN_LINK_STATIC
- Enable CONFIG_ARCH_HAS_NOLIBC for m68k and SPARC
- Properly handle 'clean' target for userprogs
- Use ramfs over tmpfs to reduce dependencies
- Inherit userprogs byte order and ABI from kernel
- Drop now unnecessary "#ifndef NOLIBC"
- Pick up review tags
- Drop usage of __private in blob.h,
sparse complains and it is not really necessary
- Fix execution on loongarch when using clang
- Drop userprogs libgcc handling, it was ugly and is not yet necessary
- Link to v2: https://lore.kernel.org/r/20250407-kunit-kselftests-v2-0-454114e287fd@linut…
Changes in v2:
- Rebase onto v6.15-rc1
- Add documentation and kernel docs
- Resolve invalid kconfig breakages
- Drop already applied patch "kbuild: implement CONFIG_HEADERS_INSTALL for Usermode Linux"
- Drop userprogs CONFIG_WERROR integration, it doesn't need to be part of this series
- Replace patch prefix "kconfig" with "kbuild"
- Rename kunit_uapi_run_executable() to kunit_uapi_run_kselftest()
- Generate private, conflict-free symbols in the blob framework
- Handle kselftest exit codes
- Handle SIGABRT
- Forward output also to kunit debugfs log
- Install a fd=0 stdin filedescriptor
- Link to v1: https://lore.kernel.org/r/20250217-kunit-kselftests-v1-0-42b4524c3b0a@linut…
---
Thomas Weißschuh (16):
kbuild: userprogs: avoid duplicating of flags inherited from kernel
kbuild: userprogs: also inherit byte order and ABI from kernel
init: re-add CONFIG_CC_CAN_LINK_STATIC
kbuild: userprogs: add nolibc support
kbuild: introduce CONFIG_ARCH_HAS_NOLIBC
kbuild: doc: add label for userprogs section
kbuild: introduce blob framework
kunit: tool: Add test for nested test result reporting
kunit: tool: Don't overwrite test status based on subtest counts
kunit: tool: Parse skipped tests from kselftest.h
kunit: Always descend into kunit directory during build
kunit: qemu_configs: loongarch: Enable LSX/LSAX
kunit: Introduce UAPI testing framework
kunit: uapi: Add example for UAPI tests
kunit: uapi: Introduce preinit executable
kunit: uapi: Validate usability of /proc
Documentation/dev-tools/kunit/api/index.rst | 5 +
Documentation/dev-tools/kunit/api/uapi.rst | 12 +
Documentation/kbuild/makefiles.rst | 38 ++-
MAINTAINERS | 2 +
Makefile | 7 +-
include/kunit/uapi.h | 24 ++
include/linux/blob.h | 31 +++
init/Kconfig | 7 +
lib/Makefile | 4 -
lib/kunit/Kconfig | 10 +
lib/kunit/Makefile | 20 +-
lib/kunit/kunit-example-test.c | 15 ++
lib/kunit/kunit-example-uapi.c | 54 ++++
lib/kunit/uapi-preinit.c | 63 +++++
lib/kunit/uapi.c | 294 +++++++++++++++++++++
scripts/Makefile.blobs | 19 ++
scripts/Makefile.build | 6 +
scripts/Makefile.clean | 2 +-
scripts/Makefile.userprogs | 13 +-
scripts/blob-wrap.c | 27 ++
tools/include/nolibc/Kconfig.nolibc | 15 ++
tools/testing/kunit/kunit_parser.py | 13 +-
tools/testing/kunit/kunit_tool_test.py | 9 +
tools/testing/kunit/qemu_configs/loongarch.py | 2 +
.../test_is_test_passed-failure-nested.log | 10 +
.../test_data/test_is_test_passed-kselftest.log | 3 +-
26 files changed, 686 insertions(+), 19 deletions(-)
---
base-commit: f07a3558c4a5d76f3fea004075e5151c4516d055
change-id: 20241015-kunit-kselftests-56273bc40442
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
nolibc only supports symbol-based stackprotectors, based on the global
variable __stack_chk_guard. Support for this differs between
architectures and toolchains. Some use the symbol mode by default, some
require a flag to enable it and some don't support it at all.
Before the nolibc test Makefile required the availability of
"-mstack-protector-guard=global" to enable stackprotectors.
While this flag makes sure that the correct mode is available it doesn't
work where the correct mode is the only supported one and therefore the
flag is not implemented.
Switch to a more dynamic probing mechanism.
This correctly enables stack protectors for mips, loongarch and m68k.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
tools/testing/selftests/nolibc/Makefile | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/nolibc/Makefile b/tools/testing/selftests/nolibc/Makefile
index 94176ffe46463548cc9bc787933b6cefa83d6502..853f3a846d4c0fb187922d3063ec3d1a9a30ae46 100644
--- a/tools/testing/selftests/nolibc/Makefile
+++ b/tools/testing/selftests/nolibc/Makefile
@@ -195,7 +195,10 @@ CFLAGS_sparc32 = $(call cc-option,-m32)
ifeq ($(origin XARCH),command line)
CFLAGS_XARCH = $(CFLAGS_$(XARCH))
endif
-CFLAGS_STACKPROTECTOR ?= $(call cc-option,-mstack-protector-guard=global $(call cc-option,-fstack-protector-all))
+_CFLAGS_STACKPROTECTOR = $(call cc-option,-fstack-protector-all) $(call cc-option,-mstack-protector-guard=global)
+CFLAGS_STACKPROTECTOR ?= $(call try-run, \
+ echo 'void foo(void) {}' | $(CC) -x c - -o - -S $(_CFLAGS_STACKPROTECTOR) | grep -q __stack_chk_guard, \
+ $(_CFLAGS_STACKPROTECTOR))
CFLAGS_SANITIZER ?= $(call cc-option,-fsanitize=undefined -fsanitize-trap=all)
CFLAGS ?= -Os -fno-ident -fno-asynchronous-unwind-tables -std=c89 -W -Wall -Wextra \
$(call cc-option,-fno-stack-protector) $(call cc-option,-Wmissing-prototypes) \
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250530-nolibc-stackprotector-robust-77c9f55a3921
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
When running the khugepaged selftest for shmem (./khugepaged all:shmem),
I encountered the following test failures:
"
Run test: collapse_full (khugepaged:shmem)
Collapse multiple fully populated PTE table.... Fail
...
Run test: collapse_single_pte_entry (khugepaged:shmem)
Collapse PTE table with single PTE entry present.... Fail
...
Run test: collapse_full_of_compound (khugepaged:shmem)
Allocate huge page... OK
Split huge page leaving single PTE page table full of compound pages... OK
Collapse PTE table full of compound pages.... Fail
"
The reason for the failure is that, it will set MADV_NOHUGEPAGE to prevent
khugepaged from continuing to scan shmem VMA after khugepaged finishes
scanning in the wait_for_scan() function. Moreover, shmem requires a refault
to establish PMD mappings.
However, after commit 2b0f922323cc, PMD mappings are prevented if the VMA is
set with MADV_NOHUGEPAGE flag, so shmem cannot establish PMD mappings during
refault.
To fix this issue, we can set the MADV_NOHUGEPAGE flag after the shmem refault.
With this fix, the shmem test case passes.
Fixes: 2b0f922323cc ("mm: don't install PMD mappings when THPs are disabled by the hw/process/vma")
Signed-off-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
---
tools/testing/selftests/mm/khugepaged.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
index 8a4d34cce36b..d462f62d8116 100644
--- a/tools/testing/selftests/mm/khugepaged.c
+++ b/tools/testing/selftests/mm/khugepaged.c
@@ -561,8 +561,6 @@ static bool wait_for_scan(const char *msg, char *p, int nr_hpages,
usleep(TICK);
}
- madvise(p, nr_hpages * hpage_pmd_size, MADV_NOHUGEPAGE);
-
return timeout == -1;
}
@@ -585,6 +583,7 @@ static void khugepaged_collapse(const char *msg, char *p, int nr_hpages,
if (ops != &__anon_ops)
ops->fault(p, 0, nr_hpages * hpage_pmd_size);
+ madvise(p, nr_hpages * hpage_pmd_size, MADV_NOHUGEPAGE);
if (ops->check_huge(p, expect ? nr_hpages : 0))
success("OK");
else
--
2.43.5
When writing a test for fusectl, I referred to this Makefile as a
reference for creating a FUSE daemon in the selftests.
While doing so, I noticed that there is a minor issue in the Makefile.
The fuse_mnt.c file is not actually compiled into fuse_mnt.o,
and the code setting CFLAGS for it never takes effect.
The reason fuse_mnt compiles successfully is because CFLAGS is set
at the very beginning of the file.
Signed-off-by: Chen Linxuan <chenlinxuan(a)uniontech.com>
---
tools/testing/selftests/memfd/Makefile | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/tools/testing/selftests/memfd/Makefile b/tools/testing/selftests/memfd/Makefile
index 163b6f68631c4..e9b886c65153d 100644
--- a/tools/testing/selftests/memfd/Makefile
+++ b/tools/testing/selftests/memfd/Makefile
@@ -1,5 +1,4 @@
# SPDX-License-Identifier: GPL-2.0
-CFLAGS += -D_FILE_OFFSET_BITS=64
CFLAGS += $(KHDR_INCLUDES)
TEST_GEN_PROGS := memfd_test
@@ -16,10 +15,9 @@ ifeq ($(VAR_LDLIBS),)
VAR_LDLIBS := -lfuse -pthread
endif
-fuse_mnt.o: CFLAGS += $(VAR_CFLAGS)
-
include ../lib.mk
+$(OUTPUT)/fuse_mnt: CFLAGS += $(VAR_CFLAGS)
$(OUTPUT)/fuse_mnt: LDLIBS += $(VAR_LDLIBS)
$(OUTPUT)/memfd_test: memfd_test.c common.c
--
2.43.0
The netdevsim driver previously lacked RX statistics support, which
prevented its use with the GenerateTraffic() test framework, as this
framework verifies traffic flow by checking RX byte counts.
This patch migrates netdevsim from its custom statistics collection to
the NETDEV_PCPU_STAT_DSTATS framework, as suggested by Jakub. This
change not only standardizes the statistics handling but also adds the
necessary RX statistics support required by the test framework.
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changes in v4:
- Protect dev_dstats_rx_dropped_add() by disabling BH (Jakub)
- Link to v3: https://lore.kernel.org/r/20250617-netdevsim_stat-v3-0-afe4bdcbf237@debian.…
Changes in v3:
- Rely on netdev from caller instead of napi->dev in nsim_queue_free().
- Link to v2: https://lore.kernel.org/r/20250613-netdevsim_stat-v2-0-98fa38836c48@debian.…
Changes in v2:
- Changed the RX collection place from nsim_napi_rx() to nsim_rcv (Joe Damato)
- Collect RX dropped packets statistic in nsim_queue_free() (Jakub)
- Added a helper in dstat to add values to RX dropped packets
- Link to v1: https://lore.kernel.org/r/20250611-netdevsim_stat-v1-0-c11b657d96bf@debian.…
---
Breno Leitao (4):
netdevsim: migrate to dstats stats collection
netdevsim: collect statistics at RX side
net: add dev_dstats_rx_dropped_add() helper
netdevsim: account dropped packet length in stats on queue free
drivers/net/netdevsim/netdev.c | 56 ++++++++++++++++-----------------------
drivers/net/netdevsim/netdevsim.h | 5 ----
include/linux/netdevice.h | 10 +++++++
3 files changed, 33 insertions(+), 38 deletions(-)
---
base-commit: 3b5b1c428260152e47c9584bc176f358b87ca82d
change-id: 20250610-netdevsim_stat-95995921e03e
Best regards,
--
Breno Leitao <leitao(a)debian.org>
Users can leak memory by repeatedly writing a string to DAMOS sysfs
memcg_path file. Fix it (patch 1) and add a selftest (patch 2) to avoid
reoccurrance of the bug.
SeongJae Park (2):
mm/damon/sysfs-schemes: free old damon_sysfs_scheme_filter->memcg_path
on write
selftets/damon: add a test for memcg_path leak
mm/damon/sysfs-schemes.c | 1 +
tools/testing/selftests/damon/Makefile | 1 +
.../selftests/damon/sysfs_memcg_path_leak.sh | 43 +++++++++++++++++++
3 files changed, 45 insertions(+)
create mode 100755 tools/testing/selftests/damon/sysfs_memcg_path_leak.sh
base-commit: 05b89e828eb4f791f721cbdc65f36e1a8287a9d3
--
2.39.5