From: Roberto Sassu <roberto.sassu(a)huawei.com>
One of the desirable features in security is the ability to restrict import
of data to a given system based on data authenticity. If data import can be
restricted, it would be possible to enforce a system-wide policy based on
the signing keys the system owner trusts.
This feature is widely used in the kernel. For example, if the restriction
is enabled, kernel modules can be loaded only if they are signed with a
key whose public part is in the primary or secondary keyring.
This capability can be useful for eBPF as well, for example to
authenticate the data an eBPF program makes security decisions on.
After a discussion on the eBPF mailing list, it was decided that the stated
goal should be accomplished by introducing four new kfuncs:
bpf_lookup_user_key() and bpf_lookup_system_key(), for retrieving a keyring
with keys trusted for signature verification, respectively from its serial
and from a pre-determined ID; bpf_key_put(), to release the reference
obtained with the former two kfuncs; and bpf_verify_pkcs7_signature(), for
verifying PKCS#7 signatures.
In addition to the key serial, bpf_lookup_user_key() also accepts key
lookup flags that influence the behavior of the lookup. bpf_lookup_system_key()
accepts pre-determined IDs defined in include/linux/verification.h.
bpf_key_put() accepts the new bpf_key structure, introduced to record
whether its other member, a key pointer, is valid or not. The reason is
that verify_pkcs7_signature() also accepts invalid pointers, set to the
pre-determined ID, to select a system-defined keyring, and key_put() must
be called only for valid key pointers.
Since the two key lookup functions allocate memory and one increments a key
reference count, they must be used in conjunction with bpf_key_put(). The
latter must be called only if the lookup functions returned a non-NULL
pointer. The verifier denies the execution of eBPF programs that don't
respect this rule.
The two key lookup functions are alternatives; which one to use depends on
the use case. While bpf_lookup_user_key() provides great flexibility, it
seems suboptimal in terms of security guarantees: even if the eBPF program
is assumed to be trusted, the serial used to obtain the key pointer might
come from untrusted user space, which could choose a keyring the system
administrator does not approve of for enforcing a mandatory policy.
bpf_lookup_system_key() instead provides much stronger guarantees,
especially if the pre-determined ID is not passed by user space but is
hardcoded in the eBPF program, and that program is signed. In this case,
bpf_verify_pkcs7_signature() will always perform signature verification
with a key that the system administrator approves, i.e. the primary,
secondary or platform keyring.
Nevertheless, key permission checks need to be done accurately. Since
bpf_lookup_user_key() cannot determine how a key will be used by other
kfuncs, it has to defer the permission check to the actual kfunc using the
key. It does so by calling lookup_user_key() with KEY_DEFER_PERM_CHECK as
the needed permission. Later, bpf_verify_pkcs7_signature(), if called,
completes the permission check by calling key_validate(). It does not need
to call key_task_permission() with permission KEY_NEED_SEARCH, as it is
already done elsewhere by the key subsystem. Future kfuncs using the
bpf_key structure need to implement the proper checks as well.
Finally, the last kfunc, bpf_verify_pkcs7_signature(), accepts the data and
signature to verify as eBPF dynamic pointers, to minimize the number of
kfunc parameters, and the keyring with keys for signature verification as a
bpf_key structure, returned by one of the two key lookup functions.
bpf_lookup_user_key() and bpf_verify_pkcs7_signature() can be called only
from sleepable programs, because of memory allocation and crypto
operations. For example, the lsm.s/bpf attach point is suitable,
fexit/array_map_update_elem is not.
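To make the expected usage concrete, below is a rough sketch of a sleepable
LSM program using these kfuncs. It is not one of the selftests from this
series; the kfunc declarations follow the descriptions in this cover letter,
and names such as monitored_pid, data and sig are illustrative assumptions
(the selftests added at the end of the series are the authoritative
reference).

/* Rough sketch only; not taken from this series. The kfunc prototypes
 * follow the cover letter's description, and struct bpf_key is assumed
 * to be visible through vmlinux.h.
 */
#include "vmlinux.h"
#include <errno.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

#define MAX_DATA_SIZE 4096
#define MAX_SIG_SIZE  1024

extern struct bpf_key *bpf_lookup_user_key(__u32 serial, __u64 flags) __ksym;
extern void bpf_key_put(struct bpf_key *key) __ksym;
extern int bpf_verify_pkcs7_signature(struct bpf_dynptr *data_ptr,
				      struct bpf_dynptr *sig_ptr,
				      struct bpf_key *trusted_keyring) __ksym;

/* Filled in by user space before the program is exercised (illustrative). */
__u32 monitored_pid;
__u32 user_keyring_serial;

char data[MAX_DATA_SIZE];
__u32 data_len;
char sig[MAX_SIG_SIZE];
__u32 sig_len;

SEC("lsm.s/bpf")	/* sleepable attach point, as noted above */
int BPF_PROG(bpf, int cmd, union bpf_attr *attr, unsigned int size)
{
	struct bpf_dynptr data_ptr, sig_ptr;
	struct bpf_key *trusted_keyring;
	int ret;

	if (bpf_get_current_pid_tgid() >> 32 != monitored_pid)
		return 0;

	if (data_len > sizeof(data) || sig_len > sizeof(sig))
		return -EINVAL;

	bpf_dynptr_from_mem(data, data_len, 0, &data_ptr);
	bpf_dynptr_from_mem(sig, sig_len, 0, &sig_ptr);

	/* The serial comes from user space, so the caveat above applies. */
	trusted_keyring = bpf_lookup_user_key(user_keyring_serial, 0);
	if (!trusted_keyring)
		return -ENOENT;

	ret = bpf_verify_pkcs7_signature(&data_ptr, &sig_ptr, trusted_keyring);

	/* The reference must always be released, as the verifier enforces. */
	bpf_key_put(trusted_keyring);

	return ret;
}

char _license[] SEC("license") = "GPL";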
The correctness of the implementation of the new kfuncs, and of their
usage, is checked with the newly introduced tests.
The patch set includes a patch from another author (a dependency) for the
sake of completeness. It is organized as follows.
Patch 1 from KP Singh allows kfuncs to be used by LSM programs. Patch 2
allows dynamic pointers to be used as kfunc parameters. Patch 3 exports
bpf_dynptr_get_size(), to obtain the real size of data carried by a dynamic
pointer. Patch 4 makes some key-related definitions available to the new
eBPF kfuncs. Patch 5 introduces the bpf_lookup_*_key() and bpf_key_put()
kfuncs. Patch 6 introduces the bpf_verify_pkcs7_signature() kfunc. Patch 7
changes the testing kernel configuration to compile everything as built-in.
Finally, patches 8-10 introduce the tests.
Changelog
v11:
- Move stringify_struct() macro to include/linux/btf.h (suggested by
Daniel)
- Change kernel configuration options in
tools/testing/selftests/bpf/config* from =m to =y
v10:
- Introduce key_lookup_flags_check() and system_keyring_id_check() inline
functions to check parameters (suggested by KP)
- Fix descriptions and comment of key-related kfuncs (suggested by KP)
- Register kfunc set only once (suggested by Alexei)
- Move needed kernel options to the architecture-independent configuration
for testing
v9:
- Drop patch to introduce KF_SLEEPABLE kfunc flag (already merged)
- Rename valid_ptr member of bpf_key to has_ref (suggested by Daniel)
- Check dynamic pointers in kfunc definition with bpf_dynptr_kern struct
definition instead of string, to detect structure renames (suggested by
Daniel)
- Explicitly say that we permit initialized dynamic pointers in kfunc
definition (suggested by Daniel)
- Remove noinline __weak from kfuncs definition (reported by Daniel)
- Simplify key lookup flags check in bpf_lookup_user_key() (suggested by
Daniel)
- Explain the reason for deferring key permission check (suggested by
Daniel)
- Allocate memory with GFP_ATOMIC in bpf_lookup_system_key(), and remove
KF_SLEEPABLE kfunc flag from kfunc declaration (suggested by Daniel)
- Define only one kfunc set and remove the loop for registration
(suggested by Alexei)
v8:
- Define the new bpf_key structure to carry the key pointer and whether
that pointer is valid or not (suggested by Daniel)
- Drop patch to mark a kfunc parameter with the __maybe_null suffix
- Improve documentation of kfuncs
- Introduce bpf_lookup_system_key() to obtain a key pointer suitable for
verify_pkcs7_signature() (suggested by Daniel)
- Use the new kfunc registration API
- Drop patch to test the __maybe_null suffix
- Add tests for bpf_lookup_system_key()
v7:
- Add support for using dynamic and NULL pointers in kfunc (suggested by
Alexei)
- Add new kfunc-related tests
v6:
- Switch back to key lookup helpers + signature verification (until v5),
and defer permission check from bpf_lookup_user_key() to
bpf_verify_pkcs7_signature()
- Add additional key lookup test to illustrate the usage of the
KEY_LOOKUP_CREATE flag and validate the flags (suggested by Daniel)
- Make description of flags of bpf_lookup_user_key() more user-friendly
(suggested by Daniel)
- Fix validation of flags parameter in bpf_lookup_user_key() (reported by
Daniel)
- Rename bpf_verify_pkcs7_signature() keyring-related parameters to
user_keyring and system_keyring to make their purpose more clear
- Accept keyring-related parameters of bpf_verify_pkcs7_signature() as
alternatives (suggested by KP)
- Replace unsigned long type with u64 in helper declaration (suggested by
Daniel)
- Extend the bpf_verify_pkcs7_signature() test by calling the helper
without data, by ensuring that the helper enforces the keyring-related
parameters as alternatives, by ensuring that the helper rejects
inaccessible and expired keyrings, and by checking all system keyrings
- Move bpf_lookup_user_key() and bpf_key_put() usage tests to
ref_tracking.c (suggested by John)
- Call bpf_lookup_user_key() and bpf_key_put() only in sleepable programs
v5:
- Move KEY_LOOKUP_ to include/linux/key.h
for validation of bpf_verify_pkcs7_signature() parameter
- Remove bpf_lookup_user_key() and bpf_key_put() helpers, and the
corresponding tests
- Replace struct key parameter of bpf_verify_pkcs7_signature() with the
keyring serial and lookup flags
- Call lookup_user_key() and key_put() in bpf_verify_pkcs7_signature()
code, to ensure that the retrieved key is used according to the
permission requested at lookup time
- Clarified keyring precedence in the description of
bpf_verify_pkcs7_signature() (suggested by John)
- Remove newline in the second argument of ASSERT_
- Fix helper prototype regular expression in bpf_doc.py
v4:
- Remove bpf_request_key_by_id(), don't return an invalid pointer that
other helpers can use
- Pass the keyring ID (without ULONG_MAX, suggested by Alexei) to
bpf_verify_pkcs7_signature()
- Introduce bpf_lookup_user_key() and bpf_key_put() helpers (suggested by
Alexei)
- Add lookup_key_norelease test, to ensure that the verifier blocks eBPF
programs which don't decrement the key reference count
- Parse raw PKCS#7 signature instead of module-style signature in the
verify_pkcs7_signature test (suggested by Alexei)
- Parse kernel module in user space and pass raw PKCS#7 signature to the
eBPF program for signature verification
v3:
- Rename bpf_verify_signature() back to bpf_verify_pkcs7_signature() to
avoid managing different parameters for each signature verification
function in one helper (suggested by Daniel)
- Use dynamic pointers and export bpf_dynptr_get_size() (suggested by
Alexei)
- Introduce bpf_request_key_by_id() to give more flexibility to the caller
of bpf_verify_pkcs7_signature() to retrieve the appropriate keyring
(suggested by Alexei)
- Fix test by reordering the gcc command line, always compile sign-file
- Improve helper support check mechanism in the test
v2:
- Rename bpf_verify_pkcs7_signature() to a more generic
bpf_verify_signature() and pass the signature type (suggested by KP)
- Move the helper and prototype declaration under #ifdef so that user
space can probe for support for the helper (suggested by Daniel)
- Describe better the keyring types (suggested by Daniel)
- Include linux/bpf.h instead of vmlinux.h to avoid implicit or
redeclaration
- Make the test selfcontained (suggested by Alexei)
v1:
- Don't define new map flag but introduce simple wrapper of
verify_pkcs7_signature() (suggested by Alexei and KP)
KP Singh (1):
bpf: Allow kfuncs to be used in LSM programs
Roberto Sassu (9):
btf: Handle dynamic pointer parameter in kfuncs
bpf: Export bpf_dynptr_get_size()
KEYS: Move KEY_LOOKUP_ to include/linux/key.h
bpf: Add bpf_lookup_*_key() and bpf_key_put() kfuncs
bpf: Add bpf_verify_pkcs7_signature() kfunc
selftests/bpf: Compile kernel with everything as built-in
selftests/bpf: Add verifier tests for bpf_lookup_*_key() and
bpf_key_put()
selftests/bpf: Add additional tests for bpf_lookup_*_key()
selftests/bpf: Add test for bpf_verify_pkcs7_signature() kfunc
include/linux/bpf.h | 7 +
include/linux/bpf_verifier.h | 3 +
include/linux/btf.h | 9 +
include/linux/key.h | 11 +
include/linux/verification.h | 8 +
kernel/bpf/btf.c | 19 +
kernel/bpf/helpers.c | 2 +-
kernel/bpf/verifier.c | 4 +-
kernel/trace/bpf_trace.c | 180 ++++++++
security/keys/internal.h | 2 -
tools/testing/selftests/bpf/Makefile | 14 +-
tools/testing/selftests/bpf/config | 32 +-
tools/testing/selftests/bpf/config.x86_64 | 7 +-
.../selftests/bpf/prog_tests/lookup_key.c | 112 +++++
.../bpf/prog_tests/verify_pkcs7_sig.c | 399 ++++++++++++++++++
.../selftests/bpf/progs/test_lookup_key.c | 46 ++
.../bpf/progs/test_verify_pkcs7_sig.c | 100 +++++
tools/testing/selftests/bpf/test_verifier.c | 3 +-
.../selftests/bpf/verifier/ref_tracking.c | 139 ++++++
.../testing/selftests/bpf/verify_sig_setup.sh | 104 +++++
20 files changed, 1173 insertions(+), 28 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/lookup_key.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/verify_pkcs7_sig.c
create mode 100644 tools/testing/selftests/bpf/progs/test_lookup_key.c
create mode 100644 tools/testing/selftests/bpf/progs/test_verify_pkcs7_sig.c
create mode 100755 tools/testing/selftests/bpf/verify_sig_setup.sh
--
2.25.1
The primary reason to invoke the oom reaper from the exit_mmap path used
to be the prevention of excessive oom killing if the oom victim's exit
races with the oom reaper (see [1] for more details). The invocation has
moved around since then because of the interaction with the munlock
logic, but the underlying reason has remained the same (see [2]).
The munlock code is no longer a problem since [3], and there shouldn't be
any blocking operation before the memory is unmapped by exit_mmap, so the
oom reaper invocation can be dropped. The unmapping part can be done with
the non-exclusive mmap_lock; the exclusive one is only required when page
tables are freed.
Remove the oom_reaper from exit_mmap, which will make the code easier to
read. This is really unlikely to make any observable difference, although
some microbenchmarks could benefit from one less branch that needs to be
evaluated even though it is almost never true.
[1] 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
[2] 27ae357fa82b ("mm, oom: fix concurrent munlock and oom reaper unmap, v3")
[3] a213e5cf71cb ("mm/munlock: delete munlock_vma_pages_all(), allow oomreap")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
---
Notes:
- Rebased over git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
mm-unstable branch per Andrew's request but applies cleany to Linus' ToT
- Conflicts with maple-tree patchset. Resolving these was discussed in
https://lore.kernel.org/all/20220519223438.qx35hbpfnnfnpouw@revolver/
include/linux/oom.h | 2 --
mm/mmap.c | 31 ++++++++++++-------------------
mm/oom_kill.c | 2 +-
3 files changed, 13 insertions(+), 22 deletions(-)
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 02d1e7bbd8cd..6cdde62b078b 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -106,8 +106,6 @@ static inline vm_fault_t check_stable_address_space(struct mm_struct *mm)
return 0;
}
-bool __oom_reap_task_mm(struct mm_struct *mm);
-
long oom_badness(struct task_struct *p,
unsigned long totalpages);
diff --git a/mm/mmap.c b/mm/mmap.c
index 2b9305ed0dda..b7918e6bb0db 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3110,30 +3110,13 @@ void exit_mmap(struct mm_struct *mm)
/* mm's last user has gone, and its about to be pulled down */
mmu_notifier_release(mm);
- if (unlikely(mm_is_oom_victim(mm))) {
- /*
- * Manually reap the mm to free as much memory as possible.
- * Then, as the oom reaper does, set MMF_OOM_SKIP to disregard
- * this mm from further consideration. Taking mm->mmap_lock for
- * write after setting MMF_OOM_SKIP will guarantee that the oom
- * reaper will not run on this mm again after mmap_lock is
- * dropped.
- *
- * Nothing can be holding mm->mmap_lock here and the above call
- * to mmu_notifier_release(mm) ensures mmu notifier callbacks in
- * __oom_reap_task_mm() will not block.
- */
- (void)__oom_reap_task_mm(mm);
- set_bit(MMF_OOM_SKIP, &mm->flags);
- }
-
- mmap_write_lock(mm);
+ mmap_read_lock(mm);
arch_exit_mmap(mm);
vma = mm->mmap;
if (!vma) {
/* Can happen if dup_mmap() received an OOM */
- mmap_write_unlock(mm);
+ mmap_read_unlock(mm);
return;
}
@@ -3143,6 +3126,16 @@ void exit_mmap(struct mm_struct *mm)
/* update_hiwater_rss(mm) here? but nobody should be looking */
/* Use -1 here to ensure all VMAs in the mm are unmapped */
unmap_vmas(&tlb, vma, 0, -1);
+ mmap_read_unlock(mm);
+
+ /*
+ * Set MMF_OOM_SKIP to hide this task from the oom killer/reaper
+ * because the memory has been already freed. Do not bother checking
+ * mm_is_oom_victim because setting a bit unconditionally is cheaper.
+ */
+ set_bit(MMF_OOM_SKIP, &mm->flags);
+
+ mmap_write_lock(mm);
free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8a70bca67c94..98dca2b42357 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -538,7 +538,7 @@ static DECLARE_WAIT_QUEUE_HEAD(oom_reaper_wait);
static struct task_struct *oom_reaper_list;
static DEFINE_SPINLOCK(oom_reaper_lock);
-bool __oom_reap_task_mm(struct mm_struct *mm)
+static bool __oom_reap_task_mm(struct mm_struct *mm)
{
struct vm_area_struct *vma;
bool ret = true;
--
2.36.1.255.ge46751e96f-goog
I got the following error message when I ran
make kselftest summary=1 TARGETS=kvm
rseq_test.c: In function ‘main’:
rseq_test.c:230:33: warning: implicit declaration of function ‘gettid’;
did you mean ‘getgid’? [-Wimplicit-function-declaration]
(void *)(unsigned long)gettid());
^~~~~~
getgid
/tmp/ccNexT4G.o: In function `main':
linux_mainline/tools/testing/selftests/kvm/rseq_test.c:230:
undefined reference to `gettid'
collect2: error: ld returned 1 exit status
../lib.mk:136:
recipe for target 'linux_mainline/tools/testing/selftests/kvm/rseq_test' failed
As per a suggestion, I renamed gettid to getgid.
After rerunning the kselftest command,
the following selftest messages were returned:
not ok 7 selftests: kvm: hyperv_clock # exit=254
not ok 11 selftests: kvm: kvm_clock_test # exit=254
not ok 51 selftests: kvm: access_tracking_perf_test # exit=254
not ok 53 selftests: kvm: dirty_log_test # exit=254
not ok 58 selftests: kvm: max_guest_memory_test # TIMEOUT 120 seconds
not ok 60 selftests: kvm: memslot_perf_test # exit=142
Signed-off-by: Akhil Raj <lf32.dev(a)gmail.com>
---
tools/testing/selftests/kvm/rseq_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
index fac248a43666..aa83a0537f0c 100644
--- a/tools/testing/selftests/kvm/rseq_test.c
+++ b/tools/testing/selftests/kvm/rseq_test.c
@@ -227,7 +227,7 @@ int main(int argc, char *argv[])
ucall_init(vm, NULL);
pthread_create(&migration_thread, NULL, migration_worker,
- (void *)(unsigned long)gettid());
+ (void *)(unsigned long)getgid());
for (i = 0; !done; i++) {
vcpu_run(vcpu);
--
2.17.1
From: Roberto Sassu <roberto.sassu(a)huawei.com>
One of the desirable features in security is the ability to restrict import
of data to a given system based on data authenticity. If data import can be
restricted, it would be possible to enforce a system-wide policy based on
the signing keys the system owner trusts.
This feature is widely used in the kernel. For example, if the restriction
is enabled, kernel modules can be loaded only if they are signed with a
key whose public part is in the primary or secondary keyring.
This capability can be useful for eBPF as well, for example to
authenticate the data an eBPF program makes security decisions on.
After a discussion on the eBPF mailing list, it was decided that the stated
goal should be accomplished by introducing four new kfuncs:
bpf_lookup_user_key() and bpf_lookup_system_key(), for retrieving a keyring
with keys trusted for signature verification, respectively from its serial
and from a pre-determined ID; bpf_key_put(), to release the reference
obtained with the former two kfuncs; and bpf_verify_pkcs7_signature(), for
verifying PKCS#7 signatures.
In addition to the key serial, bpf_lookup_user_key() also accepts key
lookup flags that influence the behavior of the lookup. bpf_lookup_system_key()
accepts pre-determined IDs defined in include/linux/verification.h.
bpf_key_put() accepts the new bpf_key structure, introduced to record
whether its other member, a key pointer, is valid or not. The reason is
that verify_pkcs7_signature() also accepts invalid pointers, set to the
pre-determined ID, to select a system-defined keyring, and key_put() must
be called only for valid key pointers.
Since the two key lookup functions allocate memory and one increments a key
reference count, they must be used in conjunction with bpf_key_put(). The
latter must be called only if the lookup functions returned a non-NULL
pointer. The verifier denies the execution of eBPF programs that don't
respect this rule.
The two key lookup functions are alternatives; which one to use depends on
the use case. While bpf_lookup_user_key() provides great flexibility, it
seems suboptimal in terms of security guarantees: even if the eBPF program
is assumed to be trusted, the serial used to obtain the key pointer might
come from untrusted user space, which could choose a keyring the system
administrator does not approve of for enforcing a mandatory policy.
bpf_lookup_system_key() instead provides much stronger guarantees,
especially if the pre-determined ID is not passed by user space but is
hardcoded in the eBPF program, and that program is signed. In this case,
bpf_verify_pkcs7_signature() will always perform signature verification
with a key that the system administrator approves, i.e. the primary,
secondary or platform keyring.
Nevertheless, key permission checks need to be done accurately. Since
bpf_lookup_user_key() cannot determine how a key will be used by other
kfuncs, it has to defer the permission check to the actual kfunc using the
key. It does so by calling lookup_user_key() with KEY_DEFER_PERM_CHECK as
the needed permission. Later, bpf_verify_pkcs7_signature(), if called,
completes the permission check by calling key_validate(). It does not need
to call key_task_permission() with permission KEY_NEED_SEARCH, as it is
already done elsewhere by the key subsystem. Future kfuncs using the
bpf_key structure need to implement the proper checks as well.
Finally, the last kfunc, bpf_verify_pkcs7_signature(), accepts the data and
signature to verify as eBPF dynamic pointers, to minimize the number of
kfunc parameters, and the keyring with keys for signature verification as a
bpf_key structure, returned by one of the two key lookup functions.
bpf_lookup_user_key() and bpf_verify_pkcs7_signature() can be called only
from sleepable programs, because of memory allocation and crypto
operations. For example, the lsm.s/bpf attach point is suitable,
fexit/array_map_update_elem is not.
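As an illustration of the stronger variant described above, the following
rough sketch hardcodes the keyring ID instead of taking a serial from user
space. It is not taken from this series; the kfunc declarations follow the
descriptions in this cover letter, and the numeric keyring ID is an
assumption meant to mirror include/linux/verification.h (check that header
and the series' selftests for the real values).

/* Rough sketch only; not taken from this series. */
#include "vmlinux.h"
#include <errno.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

#define MAX_DATA_SIZE 4096
#define MAX_SIG_SIZE  1024

extern struct bpf_key *bpf_lookup_system_key(__u64 id) __ksym;
extern void bpf_key_put(struct bpf_key *key) __ksym;
extern int bpf_verify_pkcs7_signature(struct bpf_dynptr *data_ptr,
				      struct bpf_dynptr *sig_ptr,
				      struct bpf_key *trusted_keyring) __ksym;

/* Assumed to correspond to the secondary keyring ID defined in
 * include/linux/verification.h; verify against the actual header.
 */
#define ASSUMED_SECONDARY_KEYRING_ID 1

char data[MAX_DATA_SIZE];
__u32 data_len;
char sig[MAX_SIG_SIZE];
__u32 sig_len;

SEC("lsm.s/bpf")
int BPF_PROG(bpf, int cmd, union bpf_attr *attr, unsigned int size)
{
	struct bpf_dynptr data_ptr, sig_ptr;
	struct bpf_key *trusted_keyring;
	int ret;

	if (data_len > sizeof(data) || sig_len > sizeof(sig))
		return -EINVAL;

	bpf_dynptr_from_mem(data, data_len, 0, &data_ptr);
	bpf_dynptr_from_mem(sig, sig_len, 0, &sig_ptr);

	/* Hardcoded ID: no untrusted input decides which keyring is used. */
	trusted_keyring = bpf_lookup_system_key(ASSUMED_SECONDARY_KEYRING_ID);
	if (!trusted_keyring)
		return -ENOENT;

	ret = bpf_verify_pkcs7_signature(&data_ptr, &sig_ptr, trusted_keyring);

	bpf_key_put(trusted_keyring);

	return ret;
}

char _license[] SEC("license") = "GPL";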
The correctness of the implementation of the new kfuncs, and of their
usage, is checked with the newly introduced tests.
The patch set includes a patch from another author (a dependency) for the
sake of completeness. It is organized as follows.
Patch 1 from KP Singh allows kfuncs to be used by LSM programs. Patch 2
allows dynamic pointers to be used as kfunc parameters. Patch 3 exports
bpf_dynptr_get_size(), to obtain the real size of data carried by a dynamic
pointer. Patch 4 makes some key-related definitions available to the new
eBPF kfuncs. Patch 5 introduces the bpf_lookup_*_key() and bpf_key_put()
kfuncs. Patch 6 introduces the bpf_verify_pkcs7_signature() kfunc. Patch 7
changes the testing kernel configuration to compile everything as built-in.
Finally, patches 8-10 introduce the tests.
Changelog
v12:
- Blacklist lookup_key and verify_pkcs7_sig tests for s390x (the JIT does
  not support calling kernel functions)
v11:
- Move stringify_struct() macro to include/linux/btf.h (suggested by
Daniel)
- Change kernel configuration options in
tools/testing/selftests/bpf/config* from =m to =y
v10:
- Introduce key_lookup_flags_check() and system_keyring_id_check() inline
functions to check parameters (suggested by KP)
- Fix descriptions and comment of key-related kfuncs (suggested by KP)
- Register kfunc set only once (suggested by Alexei)
- Move needed kernel options to the architecture-independent configuration
for testing
v9:
- Drop patch to introduce KF_SLEEPABLE kfunc flag (already merged)
- Rename valid_ptr member of bpf_key to has_ref (suggested by Daniel)
- Check dynamic pointers in kfunc definition with bpf_dynptr_kern struct
definition instead of string, to detect structure renames (suggested by
Daniel)
- Explicitly say that we permit initialized dynamic pointers in kfunc
definition (suggested by Daniel)
- Remove noinline __weak from kfuncs definition (reported by Daniel)
- Simplify key lookup flags check in bpf_lookup_user_key() (suggested by
Daniel)
- Explain the reason for deferring key permission check (suggested by
Daniel)
- Allocate memory with GFP_ATOMIC in bpf_lookup_system_key(), and remove
KF_SLEEPABLE kfunc flag from kfunc declaration (suggested by Daniel)
- Define only one kfunc set and remove the loop for registration
(suggested by Alexei)
v8:
- Define the new bpf_key structure to carry the key pointer and whether
that pointer is valid or not (suggested by Daniel)
- Drop patch to mark a kfunc parameter with the __maybe_null suffix
- Improve documentation of kfuncs
- Introduce bpf_lookup_system_key() to obtain a key pointer suitable for
verify_pkcs7_signature() (suggested by Daniel)
- Use the new kfunc registration API
- Drop patch to test the __maybe_null suffix
- Add tests for bpf_lookup_system_key()
v7:
- Add support for using dynamic and NULL pointers in kfunc (suggested by
Alexei)
- Add new kfunc-related tests
v6:
- Switch back to key lookup helpers + signature verification (until v5),
and defer permission check from bpf_lookup_user_key() to
bpf_verify_pkcs7_signature()
- Add additional key lookup test to illustrate the usage of the
KEY_LOOKUP_CREATE flag and validate the flags (suggested by Daniel)
- Make description of flags of bpf_lookup_user_key() more user-friendly
(suggested by Daniel)
- Fix validation of flags parameter in bpf_lookup_user_key() (reported by
Daniel)
- Rename bpf_verify_pkcs7_signature() keyring-related parameters to
user_keyring and system_keyring to make their purpose more clear
- Accept keyring-related parameters of bpf_verify_pkcs7_signature() as
alternatives (suggested by KP)
- Replace unsigned long type with u64 in helper declaration (suggested by
Daniel)
- Extend the bpf_verify_pkcs7_signature() test by calling the helper
without data, by ensuring that the helper enforces the keyring-related
parameters as alternatives, by ensuring that the helper rejects
inaccessible and expired keyrings, and by checking all system keyrings
- Move bpf_lookup_user_key() and bpf_key_put() usage tests to
ref_tracking.c (suggested by John)
- Call bpf_lookup_user_key() and bpf_key_put() only in sleepable programs
v5:
- Move KEY_LOOKUP_ to include/linux/key.h
for validation of bpf_verify_pkcs7_signature() parameter
- Remove bpf_lookup_user_key() and bpf_key_put() helpers, and the
corresponding tests
- Replace struct key parameter of bpf_verify_pkcs7_signature() with the
keyring serial and lookup flags
- Call lookup_user_key() and key_put() in bpf_verify_pkcs7_signature()
code, to ensure that the retrieved key is used according to the
permission requested at lookup time
- Clarified keyring precedence in the description of
bpf_verify_pkcs7_signature() (suggested by John)
- Remove newline in the second argument of ASSERT_
- Fix helper prototype regular expression in bpf_doc.py
v4:
- Remove bpf_request_key_by_id(), don't return an invalid pointer that
other helpers can use
- Pass the keyring ID (without ULONG_MAX, suggested by Alexei) to
bpf_verify_pkcs7_signature()
- Introduce bpf_lookup_user_key() and bpf_key_put() helpers (suggested by
Alexei)
- Add lookup_key_norelease test, to ensure that the verifier blocks eBPF
programs which don't decrement the key reference count
- Parse raw PKCS#7 signature instead of module-style signature in the
verify_pkcs7_signature test (suggested by Alexei)
- Parse kernel module in user space and pass raw PKCS#7 signature to the
eBPF program for signature verification
v3:
- Rename bpf_verify_signature() back to bpf_verify_pkcs7_signature() to
avoid managing different parameters for each signature verification
function in one helper (suggested by Daniel)
- Use dynamic pointers and export bpf_dynptr_get_size() (suggested by
Alexei)
- Introduce bpf_request_key_by_id() to give more flexibility to the caller
of bpf_verify_pkcs7_signature() to retrieve the appropriate keyring
(suggested by Alexei)
- Fix test by reordering the gcc command line, always compile sign-file
- Improve helper support check mechanism in the test
v2:
- Rename bpf_verify_pkcs7_signature() to a more generic
bpf_verify_signature() and pass the signature type (suggested by KP)
- Move the helper and prototype declaration under #ifdef so that user
space can probe for support for the helper (suggested by Daniel)
- Describe better the keyring types (suggested by Daniel)
- Include linux/bpf.h instead of vmlinux.h to avoid implicit or
redeclaration
- Make the test selfcontained (suggested by Alexei)
v1:
- Don't define new map flag but introduce simple wrapper of
verify_pkcs7_signature() (suggested by Alexei and KP)
KP Singh (1):
bpf: Allow kfuncs to be used in LSM programs
Roberto Sassu (9):
btf: Handle dynamic pointer parameter in kfuncs
bpf: Export bpf_dynptr_get_size()
KEYS: Move KEY_LOOKUP_ to include/linux/key.h
bpf: Add bpf_lookup_*_key() and bpf_key_put() kfuncs
bpf: Add bpf_verify_pkcs7_signature() kfunc
selftests/bpf: Compile kernel with everything as built-in
selftests/bpf: Add verifier tests for bpf_lookup_*_key() and
bpf_key_put()
selftests/bpf: Add additional tests for bpf_lookup_*_key()
selftests/bpf: Add test for bpf_verify_pkcs7_signature() kfunc
include/linux/bpf.h | 7 +
include/linux/bpf_verifier.h | 3 +
include/linux/btf.h | 9 +
include/linux/key.h | 11 +
include/linux/verification.h | 8 +
kernel/bpf/btf.c | 19 +
kernel/bpf/helpers.c | 2 +-
kernel/bpf/verifier.c | 4 +-
kernel/trace/bpf_trace.c | 180 ++++++++
security/keys/internal.h | 2 -
tools/testing/selftests/bpf/DENYLIST.s390x | 2 +
tools/testing/selftests/bpf/Makefile | 14 +-
tools/testing/selftests/bpf/config | 32 +-
tools/testing/selftests/bpf/config.x86_64 | 7 +-
.../selftests/bpf/prog_tests/lookup_key.c | 112 +++++
.../bpf/prog_tests/verify_pkcs7_sig.c | 399 ++++++++++++++++++
.../selftests/bpf/progs/test_lookup_key.c | 46 ++
.../bpf/progs/test_verify_pkcs7_sig.c | 100 +++++
tools/testing/selftests/bpf/test_verifier.c | 3 +-
.../selftests/bpf/verifier/ref_tracking.c | 139 ++++++
.../testing/selftests/bpf/verify_sig_setup.sh | 104 +++++
21 files changed, 1175 insertions(+), 28 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/lookup_key.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/verify_pkcs7_sig.c
create mode 100644 tools/testing/selftests/bpf/progs/test_lookup_key.c
create mode 100644 tools/testing/selftests/bpf/progs/test_verify_pkcs7_sig.c
create mode 100755 tools/testing/selftests/bpf/verify_sig_setup.sh
--
2.25.1
Due to bpf_map_lookup_elem being declared static we need to also
declare subprog_noise as static.
Fixes the following error:
progs/tailcall_bpf2bpf4.c:26:9: error: 'bpf_map_lookup_elem' is static but used in inline function 'subprog_noise' which is not static [-Werror]
26 | bpf_map_lookup_elem(&nop_table, &key);
| ^~~~~~~~~~~~~~~~~~~
Signed-off-by: James Hilliard <james.hilliard1(a)gmail.com>
---
tools/testing/selftests/bpf/progs/tailcall_bpf2bpf4.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf4.c b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf4.c
index b67e8022d500..a017d6b2f1dd 100644
--- a/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf4.c
+++ b/tools/testing/selftests/bpf/progs/tailcall_bpf2bpf4.c
@@ -19,7 +19,7 @@ struct {
int count = 0;
int noise = 0;
-__always_inline int subprog_noise(void)
+static __always_inline int subprog_noise(void)
{
__u32 key = 0;
--
2.34.1
On 8/24/22 22:44, Stephen Rothwell wrote:
> Hi all,
>
> Changes since 20220824:
>
on x86_64:
ld: vmlinux.o: in function `kunit_kcalloc':
include/kunit/test.h:369: undefined reference to `kunit_kmalloc_array'
ld: vmlinux.o: in function `ne_misc_dev_test_merge_phys_contig_memory_regions':
drivers/virt/nitro_enclaves/ne_misc_dev_test.c:120: undefined reference to `kunit_do_failed_assertion'
ld: drivers/virt/nitro_enclaves/ne_misc_dev_test.c:128: undefined reference to `kunit_binary_assert_format'
ld: drivers/virt/nitro_enclaves/ne_misc_dev_test.c:128: undefined reference to `kunit_do_failed_assertion'
ld: drivers/virt/nitro_enclaves/ne_misc_dev_test.c:129: undefined reference to `kunit_binary_assert_format'
ld: drivers/virt/nitro_enclaves/ne_misc_dev_test.c:129: undefined reference to `kunit_do_failed_assertion'
ld: drivers/virt/nitro_enclaves/ne_misc_dev_test.c:135: undefined reference to `kunit_binary_assert_format'
ld: drivers/virt/nitro_enclaves/ne_misc_dev_test.c:135: undefined reference to `kunit_do_failed_assertion'
ld: drivers/virt/nitro_enclaves/ne_misc_dev_test.c:137: undefined reference to `kunit_binary_assert_format'
ld: drivers/virt/nitro_enclaves/ne_misc_dev_test.c:137: undefined reference to `kunit_do_failed_assertion'
ld: drivers/virt/nitro_enclaves/ne_misc_dev_test.c:141: undefined reference to `kunit_kfree'
ld: vmlinux.o: in function `kexec_purgatory_size':
(.rodata+0x209ee0): undefined reference to `kunit_unary_assert_format'
Full randconfig file is attached.
--
~Randy
Hi,
Continuing the documentation refactoring started by Harinder Singh[1],
this series removes kunit-tool.rst, whose information was rearranged into
run_wrapper.rst, and does further work on the index and the getting-started
guide.
This series was written on top of another[2] that hasn't been applied yet,
but the only dependency it has is the "kunit-on-qemu" anchor used in start.rst.
Changelog:
v1 -> v2:
- Update expected output for `kunit.py run` from "Generating .config ..." to
"Configuring KUnit Kernel ..."
- Update run_wrapper titles as suggested by Sadiya Kazi
- Remove confusing recommendation from start.rst intro, highlighted by Tim Bird
- Fix grammar nits pointed out by Maíra Canal and Sadiya Kazi
- Add some Reviewed-by tags
Thanks again for your feedback,
Tales
[1] https://lore.kernel.org/r/20211217044911.798817-1-sharinder@google.com/
[2] https://lore.kernel.org/r/20220813042055.136832-1-tales.aparecida@gmail.com/
Tales Aparecida (8):
Documentation: KUnit: remove duplicated docs for kunit_tool
Documentation: KUnit: avoid repeating "kunit.py run" in start.rst
Documentation: KUnit: add note about mrproper in start.rst
Documentation: KUnit: Reword start guide for selecting tests
Documentation: KUnit: add intro to the getting-started page
Documentation: KUnit: update links in the index page
lib: overflow: update reference to kunit-tool
lib: stackinit: update reference to kunit-tool
Documentation/dev-tools/kunit/index.rst | 16 +-
Documentation/dev-tools/kunit/kunit-tool.rst | 232 ------------------
Documentation/dev-tools/kunit/run_wrapper.rst | 34 +--
Documentation/dev-tools/kunit/start.rst | 136 ++++++----
lib/overflow_kunit.c | 2 +-
lib/stackinit_kunit.c | 2 +-
6 files changed, 117 insertions(+), 305 deletions(-)
delete mode 100644 Documentation/dev-tools/kunit/kunit-tool.rst
base-commit: 568035b01cfb107af8d2e4bd2fb9aea22cf5b868
prerequisite-patch-id: b794218cd939a6644aaf5fb2a73997c56a624c80
prerequisite-patch-id: ccd24491ae99152ebdc6dcb8ddb9499d3456a4a0
prerequisite-patch-id: cc17b80d42fd5f5049e144da5c04e922036a33eb
prerequisite-patch-id: ba7edd270c6f285352e0e17bfe65ff6119192113
--
2.37.2
Hi,
Continuing the documentation refactoring started by Harinder Singh[1],
this series removes kunit-tool.rst, whose information was rearranged into
run_wrapper.rst, and does further work on the index and the getting-started
guide.
This series was written on top of another[2] that hasn't been applied yet,
but the only dependency it has is the "kunit-on-qemu" anchor used in start.rst.
The patches touching start.rst might be squashable; feel free to suggest so if
you concur. Still on that file, I realize the note about "mrproper" was removed
in the recent refactoring, but having worked in the last months with people new
to KUnit, the issue it addresses showed itself to be a common occurrence, so I'm
suggesting restoring the note here.
Regarding the last two patches, I wasn't sure whether to rename run_wrapper to
kunit-tool to keep old references working, or to do as I did and update the
references I could find.
Thanks in advance for your feedback,
Tales
[1] https://lore.kernel.org/r/20211217044911.798817-1-sharinder@google.com/
[2] https://lore.kernel.org/r/20220813042055.136832-1-tales.aparecida@gmail.com/
Tales Aparecida (8):
Documentation: KUnit: remove duplicated docs for kunit_tool
Documentation: KUnit: avoid repeating "kunit.py run" in start.rst
Documentation: KUnit: restore note about mrproper in start.rst
Documentation: KUnit: Reword start guide for selecting tests
Documentation: KUnit: add intro to the getting-started page
Documentation: KUnit: update links in the index page
lib: overflow: update reference to kunit-tool
lib: stackinit: update reference to kunit-tool
Documentation/dev-tools/kunit/index.rst | 16 +-
Documentation/dev-tools/kunit/kunit-tool.rst | 232 ------------------
Documentation/dev-tools/kunit/run_wrapper.rst | 4 +-
Documentation/dev-tools/kunit/start.rst | 137 +++++++----
lib/overflow_kunit.c | 2 +-
lib/stackinit_kunit.c | 2 +-
6 files changed, 103 insertions(+), 290 deletions(-)
delete mode 100644 Documentation/dev-tools/kunit/kunit-tool.rst
base-commit: 568035b01cfb107af8d2e4bd2fb9aea22cf5b868
prerequisite-patch-id: b794218cd939a6644aaf5fb2a73997c56a624c80
prerequisite-patch-id: ccd24491ae99152ebdc6dcb8ddb9499d3456a4a0
prerequisite-patch-id: cc17b80d42fd5f5049e144da5c04e922036a33eb
prerequisite-patch-id: ba7edd270c6f285352e0e17bfe65ff6119192113
--
2.37.1
Updated the kunit_tool.rst guide to streamline it. The following changes
were made:
1. Updated headings
2. Reworded content across sections
3. Added a cross reference to full list of command-line args
Signed-off-by: Sadiya Kazi <sadiyakazi(a)google.com>
---
Documentation/dev-tools/kunit/kunit-tool.rst | 82 ++++++++++----------
1 file changed, 42 insertions(+), 40 deletions(-)
diff --git a/Documentation/dev-tools/kunit/kunit-tool.rst b/Documentation/dev-tools/kunit/kunit-tool.rst
index ae52e0f489f9..33186679f5de 100644
--- a/Documentation/dev-tools/kunit/kunit-tool.rst
+++ b/Documentation/dev-tools/kunit/kunit-tool.rst
@@ -1,8 +1,10 @@
.. SPDX-License-Identifier: GPL-2.0
-=================
-kunit_tool How-To
-=================
+========================
+Understanding kunit_tool
+========================
+
+This page introduces the kunit_tool and covers the concepts and working of this tool.
What is kunit_tool?
===================
@@ -10,39 +12,37 @@ What is kunit_tool?
kunit_tool is a script (``tools/testing/kunit/kunit.py``) that aids in building
the Linux kernel as UML (`User Mode Linux
<http://user-mode-linux.sourceforge.net/>`_), running KUnit tests, parsing
-the test results and displaying them in a user friendly manner.
+the test results and displaying them in a user-friendly manner.
kunit_tool addresses the problem of being able to run tests without needing a
virtual machine or actual hardware with User Mode Linux. User Mode Linux is a
Linux architecture, like ARM or x86; however, unlike other architectures it
-compiles the kernel as a standalone Linux executable that can be run like any
+compiles the kernel as a standalone Linux executable. This executable can be run like any
other program directly inside of a host operating system. To be clear, it does
not require any virtualization support: it is just a regular program.
-What is a .kunitconfig?
-=======================
+What is .kunitconfig?
+=====================
-It's just a defconfig that kunit_tool looks for in the build directory
-(``.kunit`` by default). kunit_tool uses it to generate a .config as you might
-expect. In addition, it verifies that the generated .config contains the CONFIG
-options in the .kunitconfig; the reason it does this is so that it is easy to
-be sure that a CONFIG that enables a test actually ends up in the .config.
+.kunitconfig is a default configuration file (defconfig) that kunit_tool looks
+for in the build directory (``.kunit``). The kunit_tool uses this file to
+generate a .config. Additionally, it also verifies that the generated .config contains the CONFIG options in the .kunitconfig file. This is done to make sure that a CONFIG that enables a test is actually part of the .config file.
-It's also possible to pass a separate .kunitconfig fragment to kunit_tool,
+It is also possible to pass a separate .kunitconfig fragment to kunit_tool,
which is useful if you have several different groups of tests you wish
-to run independently, or if you want to use pre-defined test configs for
+to run independently, or if you want to use pre-defined test configurations for
certain subsystems.
-Getting Started with kunit_tool
+Getting started with kunit_tool
===============================
-If a kunitconfig is present at the root directory, all you have to do is:
+If a kunitconfig is present at the root directory, run the following command:
.. code-block:: bash
./tools/testing/kunit/kunit.py run
-However, you most likely want to use it with the following options:
+However, most likely you may want to use it with the following options:
.. code-block:: bash
@@ -68,20 +68,20 @@ For a list of all the flags supported by kunit_tool, you can run:
./tools/testing/kunit/kunit.py run --help
-Configuring, Building, and Running Tests
+Configuring, building, and running tests
========================================
-It's also possible to run just parts of the KUnit build process independently,
-which is useful if you want to make manual changes to part of the process.
+It is also possible to run specific parts of the KUnit build process independently.
+This is useful if you want to make manual changes to part of the process.
-A .config can be generated from a .kunitconfig by using the ``config`` argument
+If you want to generate a .config from a .kunitconfig, you can use the ``config`` argument
when running kunit_tool:
.. code-block:: bash
./tools/testing/kunit/kunit.py config
-Similarly, if you just want to build a KUnit kernel from the current .config,
+Similarly, if you want to build a KUnit kernel from the current .config,
you can use the ``build`` argument:
.. code-block:: bash
@@ -95,33 +95,31 @@ run the kernel and display the test results with the ``exec`` argument:
./tools/testing/kunit/kunit.py exec
-The ``run`` command which is discussed above is equivalent to running all three
+The ``run`` command, discussed above is equivalent to running all three
of these in sequence.
All of these commands accept a number of optional command-line arguments. The
``--help`` flag will give a complete list of these, or keep reading this page
for a guide to some of the more useful ones.
-Parsing Test Results
+Parsing test results
====================
-KUnit tests output their results in TAP (Test Anything Protocol) format.
-kunit_tool will, when running tests, parse this output and print a summary
-which is much more pleasant to read. If you wish to look at the raw test
-results in TAP format, you can pass the ``--raw_output`` argument.
+The output of the KUnit test results are displayed in TAP (Test Anything Protocol) format.
+When running tests, the kunit_tool parses this output and prints a plaintext, human-readable summary. To view the raw test results in TAP format, you can use the ``--raw_output`` argument.
.. code-block:: bash
./tools/testing/kunit/kunit.py run --raw_output
The raw output from test runs may contain other, non-KUnit kernel log
-lines. You can see just KUnit output with ``--raw_output=kunit``:
+lines. To view only the KUnit output, you can use ``--raw_output=kunit``:
.. code-block:: bash
./tools/testing/kunit/kunit.py run --raw_output=kunit
-If you have KUnit results in their raw TAP format, you can parse them and print
+If you have KUnit results in the raw TAP format, you can parse them and print
the human-readable summary with the ``parse`` command for kunit_tool. This
accepts a filename for an argument, or will read from standard input.
@@ -135,11 +133,11 @@ accepts a filename for an argument, or will read from standard input.
This is very useful if you wish to run tests in a configuration not supported
by kunit_tool (such as on real hardware, or an unsupported architecture).
-Filtering Tests
+Filtering tests
===============
-It's possible to run only a subset of the tests built into a kernel by passing
-a filter to the ``exec`` or ``run`` commands. For example, if you only wanted
+It is possible to run only a subset of the tests built into a kernel by passing
+a filter to the ``exec`` or ``run`` commands. For example, if you want
to run KUnit resource tests, you could use:
.. code-block:: bash
@@ -148,15 +146,14 @@ to run KUnit resource tests, you could use:
This uses the standard glob format for wildcards.
-Running Tests on QEMU
+Running tests on QEMU
=====================
-kunit_tool supports running tests on QEMU as well as via UML (as mentioned
-elsewhere). The default way of running tests on QEMU requires two flags:
+kunit_tool supports running tests on QEMU as well as via UML. The default way of running tests on QEMU requires two flags:
``--arch``
Selects a collection of configs (Kconfig as well as QEMU configs
- options, etc) that allow KUnit tests to be run on the specified
+ options and so on) that allow KUnit tests to be run on the specified
architecture in a minimal way; this is usually not much slower than
using UML. The architecture argument is the same as the name of the
option passed to the ``ARCH`` variable used by Kbuild. Not all
@@ -196,8 +193,8 @@ look something like this:
--jobs=12 \
--qemu_config=./tools/testing/kunit/qemu_configs/x86_64.py
-Other Useful Options
-====================
+Other useful options
+======================
kunit_tool has a number of other command-line arguments which can be useful
when adapting it to fit your environment or needs.
@@ -228,5 +225,10 @@ Some of the more useful ones are:
dependencies by adding ``CONFIG_KUNIT_ALL_TESTS=1`` to your
.kunitconfig is preferable.
-There are several other options (and new ones are often added), so do check
+There are several other options (and new ones are often added), so do run
``--help`` if you're looking for something not mentioned here.
+For more information on these options, see `Command-line-arguments
+<https://www.kernel.org/doc/html/latest/dev-tools/kunit/run_wrapper.html#com…>`__
+
+
+.
--
2.37.1.595.g718a3a8f04-goog
From: Kyle Huey <me(a)kylehuey.com>
When management of the PKRU register was moved away from XSTATE, emulation
of PKRU's existence in XSTATE was added for APIs that read XSTATE, but not
for APIs that write XSTATE. This can be seen by running gdb and executing
`p $pkru`, `set $pkru = 42`, and `p $pkru`. On affected kernels (5.14+) the
write to the PKRU register (which gdb performs through ptrace) is ignored.
There are three relevant APIs: PTRACE_SETREGSET with NT_X86_XSTATE,
sigreturn, and KVM_SET_XSAVE. KVM_SET_XSAVE has its own special handling to
make PKRU writes take effect (in fpu_copy_uabi_to_guest_fpstate). Push that
down into copy_uabi_to_xstate and have PTRACE_SETREGSET with NT_X86_XSTATE
and sigreturn pass in pointers to the appropriate PKRU value.
This also adds code to initialize the PKRU value to the hardware init value
(namely 0) if the PKRU bit is not set in the XSTATE header to match XRSTOR.
This is a change to the current KVM_SET_XSAVE behavior.
Changelog since v4:
- Selftest additionally checks PKRU readbacks through ptrace.
- Selftest flips all PKRU bits (except the key used for PROT_EXEC).
Changelog since v3:
- The v3 patch is now part 1 of 2.
- Adds a selftest in part 2 of 2.
Changelog since v2:
- Removed now unused variables in fpu_copy_uabi_to_guest_fpstate
Changelog since v1:
- Handles the error case of copy_to_buffer().
Signed-off-by: Kyle Huey <me(a)kylehuey.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: kvm(a)vger.kernel.org # For edge case behavior of KVM_SET_XSAVE
Cc: stable(a)vger.kernel.org # 5.14+
Fixes: e84ba47e313d ("x86/fpu: Hook up PKRU into ptrace()")
---
arch/x86/kernel/fpu/core.c | 13 +------------
arch/x86/kernel/fpu/regset.c | 2 +-
arch/x86/kernel/fpu/signal.c | 2 +-
arch/x86/kernel/fpu/xstate.c | 28 +++++++++++++++++++++++-----
arch/x86/kernel/fpu/xstate.h | 4 ++--
5 files changed, 28 insertions(+), 21 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 3b28c5b25e12..46b935bc87c8 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -391,8 +391,6 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf,
{
struct fpstate *kstate = gfpu->fpstate;
const union fpregs_state *ustate = buf;
- struct pkru_state *xpkru;
- int ret;
if (!cpu_feature_enabled(X86_FEATURE_XSAVE)) {
if (ustate->xsave.header.xfeatures & ~XFEATURE_MASK_FPSSE)
@@ -406,16 +404,7 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf,
if (ustate->xsave.header.xfeatures & ~xcr0)
return -EINVAL;
- ret = copy_uabi_from_kernel_to_xstate(kstate, ustate);
- if (ret)
- return ret;
-
- /* Retrieve PKRU if not in init state */
- if (kstate->regs.xsave.header.xfeatures & XFEATURE_MASK_PKRU) {
- xpkru = get_xsave_addr(&kstate->regs.xsave, XFEATURE_PKRU);
- *vpkru = xpkru->pkru;
- }
- return 0;
+ return copy_uabi_from_kernel_to_xstate(kstate, ustate, vpkru);
}
EXPORT_SYMBOL_GPL(fpu_copy_uabi_to_guest_fpstate);
#endif /* CONFIG_KVM */
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 75ffaef8c299..6d056b68f4ed 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -167,7 +167,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
}
fpu_force_restore(fpu);
- ret = copy_uabi_from_kernel_to_xstate(fpu->fpstate, kbuf ?: tmpbuf);
+ ret = copy_uabi_from_kernel_to_xstate(fpu->fpstate, kbuf ?: tmpbuf, &target->thread.pkru);
out:
vfree(tmpbuf);
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 91d4b6de58ab..558076dbde5b 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -396,7 +396,7 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx,
fpregs = &fpu->fpstate->regs;
if (use_xsave() && !fx_only) {
- if (copy_sigframe_from_user_to_xstate(fpu->fpstate, buf_fx))
+ if (copy_sigframe_from_user_to_xstate(tsk, buf_fx))
return false;
} else {
if (__copy_from_user(&fpregs->fxsave, buf_fx,
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c8340156bfd2..e01d3514ae68 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1197,7 +1197,7 @@ static int copy_from_buffer(void *dst, unsigned int offset, unsigned int size,
static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
- const void __user *ubuf)
+ const void __user *ubuf, u32 *pkru)
{
struct xregs_state *xsave = &fpstate->regs.xsave;
unsigned int offset, size;
@@ -1235,6 +1235,24 @@ static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
for (i = 0; i < XFEATURE_MAX; i++) {
mask = BIT_ULL(i);
+ if (i == XFEATURE_PKRU) {
+ /*
+ * Retrieve PKRU if not in init state, otherwise
+ * initialize it.
+ */
+ if (hdr.xfeatures & mask) {
+ struct pkru_state xpkru = {0};
+
+ if (copy_from_buffer(&xpkru, xstate_offsets[i],
+ sizeof(xpkru), kbuf, ubuf))
+ return -EFAULT;
+
+ *pkru = xpkru.pkru;
+ } else {
+ *pkru = 0;
+ }
+ }
+
if (hdr.xfeatures & mask) {
void *dst = __raw_xsave_addr(xsave, i);
@@ -1264,9 +1282,9 @@ static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
* Convert from a ptrace standard-format kernel buffer to kernel XSAVE[S]
* format and copy to the target thread. Used by ptrace and KVM.
*/
-int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf)
+int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf, u32 *pkru)
{
- return copy_uabi_to_xstate(fpstate, kbuf, NULL);
+ return copy_uabi_to_xstate(fpstate, kbuf, NULL, pkru);
}
/*
@@ -1274,10 +1292,10 @@ int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf)
* XSAVE[S] format and copy to the target thread. This is called from the
* sigreturn() and rt_sigreturn() system calls.
*/
-int copy_sigframe_from_user_to_xstate(struct fpstate *fpstate,
+int copy_sigframe_from_user_to_xstate(struct task_struct *tsk,
const void __user *ubuf)
{
- return copy_uabi_to_xstate(fpstate, NULL, ubuf);
+ return copy_uabi_to_xstate(tsk->thread.fpu.fpstate, NULL, ubuf, &tsk->thread.pkru);
}
static bool validate_independent_components(u64 mask)
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 5ad47031383b..a4ecb04d8d64 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -46,8 +46,8 @@ extern void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
u32 pkru_val, enum xstate_copy_mode copy_mode);
extern void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
enum xstate_copy_mode mode);
-extern int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf);
-extern int copy_sigframe_from_user_to_xstate(struct fpstate *fpstate, const void __user *ubuf);
+extern int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf, u32 *pkru);
+extern int copy_sigframe_from_user_to_xstate(struct task_struct *tsk, const void __user *ubuf);
extern void fpu__init_cpu_xstate(void);
--
2.37.1
On 2022-08-11 13:28, Ido Schimmel wrote:
>> > I'm talking about roaming, not forwarding. Let's say you have a locked
>> > entry with MAC X pointing to port Y. Now you get a packet with SMAC X
>> > from port Z which is unlocked. Will the FDB entry roam to port Z? I
>> > think it should, but at least in current implementation it seems that
>> > the "locked" flag will not be reset and having locked entries pointing
>> > to an unlocked port looks like a bug.
>> >
>>
In general I have been thinking that the described setup is a network
configuration error, as I was arguing in an earlier conversation with
Vladimir. In this setup we must remember that SMAC X becomes DMAC X in
the return traffic on the open port. But the question arises: why would
MAC X be behind the locked port without getting authenticated while also
being behind an open port?
In a real life setup, I don't think you would want random hosts behind a
locked port in the MAB case, but only the hosts you will let through.
Other hosts should be regarded as intruders.
If we are talking about a station move, then the locked entry will age
out and MAC X will function normally on the open port after the timeout,
which was a case that was taken up in earlier discussions.
But I will anyhow do some testing with this 'edge case' (of being behind
both a locked and an unlocked port), if I may call it so, and see to it
that the offloaded and non-offloaded cases correspond to each other and
work satisfactorily.
I think it will be good to have a flag to enable the mac-auth/MAB
feature, and I suggest just calling the flag 'mab', as it is short.
Otherwise I don't see any major issues with the whole feature as it is.
Hello,
This patch series implements a new ioctl on the pagemap procfs file to
get, clear, or atomically get and clear the soft-dirty status of pages in
the specified range of memory.
The soft-dirty PTE bit of memory pages can be viewed by using the pagemap
procfs file, and the soft-dirty PTE bit for the whole memory range of a
process can be cleared by writing to the clear_refs file. This series
adds features that weren't present earlier (see the sketch of the existing
interface below):
- There is no atomic operation to get the soft-dirty PTE bit status and
  clear it.
- The soft-dirty PTE bit cannot be cleared for only a part of memory.
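For reference, here is a minimal sketch of the existing interface mentioned
above (illustration only, not part of this series): the soft-dirty state is
bit 55 of each 64-bit pagemap entry, and writing "4" to clear_refs clears it
for the whole process. Reading pagemap may require appropriate privileges.

/* Minimal sketch of the current, non-atomic interface; illustration only. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define PAGE_SIZE 4096UL		/* assumes 4 KiB pages */
#define PM_SOFT_DIRTY (1ULL << 55)	/* bit 55 of a pagemap entry */

static int is_soft_dirty(int pagemap_fd, unsigned long vaddr)
{
	uint64_t entry;

	if (pread(pagemap_fd, &entry, sizeof(entry),
		  (vaddr / PAGE_SIZE) * sizeof(entry)) != sizeof(entry))
		return -1;
	return !!(entry & PM_SOFT_DIRTY);
}

static void clear_all_soft_dirty(void)
{
	int fd = open("/proc/self/clear_refs", O_WRONLY);

	if (fd >= 0) {
		write(fd, "4", 1);	/* clears soft-dirty for the whole process */
		close(fd);
	}
}

int main(void)
{
	static char buf[PAGE_SIZE];
	int pagemap_fd = open("/proc/self/pagemap", O_RDONLY);

	if (pagemap_fd < 0)
		return 1;

	buf[0] = 1;		/* dirty one page */
	printf("before clear: %d\n", is_soft_dirty(pagemap_fd, (unsigned long)buf));

	clear_all_soft_dirty();	/* whole process only, not atomic with the read */
	printf("after clear:  %d\n", is_soft_dirty(pagemap_fd, (unsigned long)buf));

	close(pagemap_fd);
	return 0;
}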
Historically, soft-dirty PTE bit tracking has been used by the CRIU
project. The procfs interface is enough for that, as I think the process
is frozen there. We have a use case where we need to track the soft-dirty
PTE bit for running processes. We need this tracking and clearing
mechanism for a region of memory while the process is running, to emulate
the getWriteWatch() syscall of Windows. That syscall is used by games to
keep track of dirty pages and process only the dirty pages.
As the current kernel provides no way to clear the soft-dirty bits for only
a part of memory (instead of clearing them for the entire process), and the
get+clear operation cannot be performed atomically, the existing methods to
mimic this functionality entirely in userspace have poor performance (the
first is sketched below):
- The mprotect syscall and SIGSEGV handler for bookkeeping
- The userfaultfd syscall with the handler for bookkeeping
Some benchmarks can be seen [1].
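To illustrate why the first approach is slow, here is a rough sketch of the
mprotect()/SIGSEGV bookkeeping technique (illustration only, not code from
this series): every first write to a page after a "clear" costs a fault, a
signal delivery and an mprotect() call.

#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 100

static char *region;
static long page_size;
static unsigned char dirty[NPAGES];	/* bookkeeping: one byte per page */

static void segv_handler(int sig, siginfo_t *info, void *ctx)
{
	uintptr_t addr = (uintptr_t)info->si_addr & ~(page_size - 1);
	size_t idx = (addr - (uintptr_t)region) / page_size;

	if (idx < NPAGES)
		dirty[idx] = 1;
	/* Re-enable writes so the faulting store can retry and succeed. */
	mprotect((void *)addr, page_size, PROT_READ | PROT_WRITE);
}

int main(void)
{
	struct sigaction sa;

	page_size = sysconf(_SC_PAGESIZE);
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = segv_handler;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGSEGV, &sa, NULL);

	region = mmap(NULL, NPAGES * page_size, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	/* "Clear": forget old state and write-protect the whole region. */
	memset(dirty, 0, sizeof(dirty));
	mprotect(region, NPAGES * page_size, PROT_READ);

	region[0] = 1;			/* faults once, marks page 0 dirty */
	region[5 * page_size] = 1;	/* faults once, marks page 5 dirty */

	printf("page 0: %d, page 5: %d\n", dirty[0], dirty[5]);
	munmap(region, NPAGES * page_size);
	return 0;
}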
This ioctl can be used by the CRIU project and other applications which
require soft-dirty PTE bit information. The following operations are
supported in this ioctl:
- Get the pages that are soft-dirty.
- Clear the pages which are soft-dirty.
- The optional flag to ignore the VM_SOFTDIRTY and only track per page
soft-dirty PTE bit
Two decisions have been taken about how the output is returned from the
ioctl:
- Return offsets of the pages from the start of the range in the vec
- Stop execution when the vec is filled with dirty pages
These two choices don't follow the mincore() philosophy, where the
output array corresponds to the address range in a one-to-one fashion,
so no output buffer length is passed and only a flag is set if the page
is present. That makes mincore() easy to use but offers less control. We
pass the size of the output array and store the return data
consecutively as offsets of the dirty pages from the start of the range.
The user can easily convert these offsets back into dirty page
addresses. Suppose the user wants to get the first 10 dirty pages from a
total memory of 100 pages: they allocate an output buffer of size 10 and
the ioctl stops after finding the 10 pages. This behaviour is needed to
support Windows' getWriteWatch(). mincore()-like behaviour can be
achieved by passing an output buffer of size 100. This interface can
thus be used for any desired behaviour.
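To make the offset convention concrete, here is a minimal userspace sketch
of the conversion step described above. The vector contents are made up for
illustration; the actual ioctl name, request number and argument layout are
defined by the patches themselves and are not reproduced here:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical result of a get-soft-dirty call over a range starting at
 * base: the kernel filled vec with page offsets from the start of the
 * range and reported how many entries it wrote. */
static void print_dirty_addresses(uintptr_t base, size_t page_size,
				  const uint64_t *vec, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		printf("dirty page at %#lx\n",
		       (unsigned long)(base + vec[i] * page_size));
}

int main(void)
{
	/* Pretend the ioctl reported pages 3 and 7 of a 100-page range. */
	uint64_t vec[2] = { 3, 7 };

	print_dirty_addresses(0x7f0000000000UL, 4096, vec, 2);
	return 0;
}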
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
Regards,
Muhammad Usama Anjum
Cc: Gabriel Krisman Bertazi <krisman(a)collabora.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Peter Enderborg <peter.enderborg(a)sony.com>
Muhammad Usama Anjum (4):
fs/proc/task_mmu: update functions to clear the soft-dirty bit
fs/proc/task_mmu: Implement IOCTL to get and clear soft dirty PTE bit
selftests: vm: add pagemap ioctl tests
mm: add documentation of the new ioctl on pagemap
Documentation/admin-guide/mm/soft-dirty.rst | 42 +-
fs/proc/task_mmu.c | 337 ++++++++++-
include/uapi/linux/fs.h | 13 +
tools/include/uapi/linux/fs.h | 13 +
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 2 +
tools/testing/selftests/vm/pagemap_ioctl.c | 629 ++++++++++++++++++++
7 files changed, 1005 insertions(+), 32 deletions(-)
create mode 100644 tools/testing/selftests/vm/pagemap_ioctl.c
--
2.30.2
From: Florian Westphal <fw(a)strlen.de>
[ Upstream commit b71b7bfeac38c7a21c423ddafb29aa6258949df8 ]
"ns1" is a too generic name, use a random suffix to avoid
errors when such a netns exists. Also allows to run multiple
instances of the script in parallel.
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../selftests/netfilter/nft_flowtable.sh | 246 +++++++++---------
1 file changed, 128 insertions(+), 118 deletions(-)
diff --git a/tools/testing/selftests/netfilter/nft_flowtable.sh b/tools/testing/selftests/netfilter/nft_flowtable.sh
index d4ffebb989f8..c336e6c148d1 100755
--- a/tools/testing/selftests/netfilter/nft_flowtable.sh
+++ b/tools/testing/selftests/netfilter/nft_flowtable.sh
@@ -14,6 +14,11 @@
# nft_flowtable.sh -o8000 -l1500 -r2000
#
+sfx=$(mktemp -u "XXXXXXXX")
+ns1="ns1-$sfx"
+ns2="ns2-$sfx"
+nsr1="nsr1-$sfx"
+nsr2="nsr2-$sfx"
# Kselftest framework requirement - SKIP code is 4.
ksft_skip=4
@@ -36,18 +41,17 @@ checktool (){
checktool "nft --version" "run test without nft tool"
checktool "ip -Version" "run test without ip tool"
checktool "which nc" "run test without nc (netcat)"
-checktool "ip netns add nsr1" "create net namespace"
+checktool "ip netns add $nsr1" "create net namespace $nsr1"
-ip netns add ns1
-ip netns add ns2
-
-ip netns add nsr2
+ip netns add $ns1
+ip netns add $ns2
+ip netns add $nsr2
cleanup() {
- for i in 1 2; do
- ip netns del ns$i
- ip netns del nsr$i
- done
+ ip netns del $ns1
+ ip netns del $ns2
+ ip netns del $nsr1
+ ip netns del $nsr2
rm -f "$ns1in" "$ns1out"
rm -f "$ns2in" "$ns2out"
@@ -59,22 +63,21 @@ trap cleanup EXIT
sysctl -q net.netfilter.nf_log_all_netns=1
-ip link add veth0 netns nsr1 type veth peer name eth0 netns ns1
-ip link add veth1 netns nsr1 type veth peer name veth0 netns nsr2
+ip link add veth0 netns $nsr1 type veth peer name eth0 netns $ns1
+ip link add veth1 netns $nsr1 type veth peer name veth0 netns $nsr2
-ip link add veth1 netns nsr2 type veth peer name eth0 netns ns2
+ip link add veth1 netns $nsr2 type veth peer name eth0 netns $ns2
for dev in lo veth0 veth1; do
- for i in 1 2; do
- ip -net nsr$i link set $dev up
- done
+ ip -net $nsr1 link set $dev up
+ ip -net $nsr2 link set $dev up
done
-ip -net nsr1 addr add 10.0.1.1/24 dev veth0
-ip -net nsr1 addr add dead:1::1/64 dev veth0
+ip -net $nsr1 addr add 10.0.1.1/24 dev veth0
+ip -net $nsr1 addr add dead:1::1/64 dev veth0
-ip -net nsr2 addr add 10.0.2.1/24 dev veth1
-ip -net nsr2 addr add dead:2::1/64 dev veth1
+ip -net $nsr2 addr add 10.0.2.1/24 dev veth1
+ip -net $nsr2 addr add dead:2::1/64 dev veth1
# set different MTUs so we need to push packets coming from ns1 (large MTU)
# to ns2 (smaller MTU) to stack either to perform fragmentation (ip_no_pmtu_disc=1),
@@ -106,49 +109,56 @@ do
esac
done
-if ! ip -net nsr1 link set veth0 mtu $omtu; then
+if ! ip -net $nsr1 link set veth0 mtu $omtu; then
exit 1
fi
-ip -net ns1 link set eth0 mtu $omtu
+ip -net $ns1 link set eth0 mtu $omtu
-if ! ip -net nsr2 link set veth1 mtu $rmtu; then
+if ! ip -net $nsr2 link set veth1 mtu $rmtu; then
exit 1
fi
-ip -net ns2 link set eth0 mtu $rmtu
+ip -net $ns2 link set eth0 mtu $rmtu
# transfer-net between nsr1 and nsr2.
# these addresses are not used for connections.
-ip -net nsr1 addr add 192.168.10.1/24 dev veth1
-ip -net nsr1 addr add fee1:2::1/64 dev veth1
-
-ip -net nsr2 addr add 192.168.10.2/24 dev veth0
-ip -net nsr2 addr add fee1:2::2/64 dev veth0
-
-for i in 1 2; do
- ip netns exec nsr$i sysctl net.ipv4.conf.veth0.forwarding=1 > /dev/null
- ip netns exec nsr$i sysctl net.ipv4.conf.veth1.forwarding=1 > /dev/null
-
- ip -net ns$i link set lo up
- ip -net ns$i link set eth0 up
- ip -net ns$i addr add 10.0.$i.99/24 dev eth0
- ip -net ns$i route add default via 10.0.$i.1
- ip -net ns$i addr add dead:$i::99/64 dev eth0
- ip -net ns$i route add default via dead:$i::1
- if ! ip netns exec ns$i sysctl net.ipv4.tcp_no_metrics_save=1 > /dev/null; then
+ip -net $nsr1 addr add 192.168.10.1/24 dev veth1
+ip -net $nsr1 addr add fee1:2::1/64 dev veth1
+
+ip -net $nsr2 addr add 192.168.10.2/24 dev veth0
+ip -net $nsr2 addr add fee1:2::2/64 dev veth0
+
+for i in 0 1; do
+ ip netns exec $nsr1 sysctl net.ipv4.conf.veth$i.forwarding=1 > /dev/null
+ ip netns exec $nsr2 sysctl net.ipv4.conf.veth$i.forwarding=1 > /dev/null
+done
+
+for ns in $ns1 $ns2;do
+ ip -net $ns link set lo up
+ ip -net $ns link set eth0 up
+
+ if ! ip netns exec $ns sysctl net.ipv4.tcp_no_metrics_save=1 > /dev/null; then
echo "ERROR: Check Originator/Responder values (problem during address addition)"
exit 1
fi
-
# don't set ip DF bit for first two tests
- ip netns exec ns$i sysctl net.ipv4.ip_no_pmtu_disc=1 > /dev/null
+ ip netns exec $ns sysctl net.ipv4.ip_no_pmtu_disc=1 > /dev/null
done
-ip -net nsr1 route add default via 192.168.10.2
-ip -net nsr2 route add default via 192.168.10.1
+ip -net $ns1 addr add 10.0.1.99/24 dev eth0
+ip -net $ns2 addr add 10.0.2.99/24 dev eth0
+ip -net $ns1 route add default via 10.0.1.1
+ip -net $ns2 route add default via 10.0.2.1
+ip -net $ns1 addr add dead:1::99/64 dev eth0
+ip -net $ns2 addr add dead:2::99/64 dev eth0
+ip -net $ns1 route add default via dead:1::1
+ip -net $ns2 route add default via dead:2::1
+
+ip -net $nsr1 route add default via 192.168.10.2
+ip -net $nsr2 route add default via 192.168.10.1
-ip netns exec nsr1 nft -f - <<EOF
+ip netns exec $nsr1 nft -f - <<EOF
table inet filter {
flowtable f1 {
hook ingress priority 0
@@ -197,18 +207,18 @@ if [ $? -ne 0 ]; then
fi
# test basic connectivity
-if ! ip netns exec ns1 ping -c 1 -q 10.0.2.99 > /dev/null; then
- echo "ERROR: ns1 cannot reach ns2" 1>&2
+if ! ip netns exec $ns1 ping -c 1 -q 10.0.2.99 > /dev/null; then
+ echo "ERROR: $ns1 cannot reach ns2" 1>&2
exit 1
fi
-if ! ip netns exec ns2 ping -c 1 -q 10.0.1.99 > /dev/null; then
- echo "ERROR: ns2 cannot reach ns1" 1>&2
+if ! ip netns exec $ns2 ping -c 1 -q 10.0.1.99 > /dev/null; then
+ echo "ERROR: $ns2 cannot reach $ns1" 1>&2
exit 1
fi
if [ $ret -eq 0 ];then
- echo "PASS: netns routing/connectivity: ns1 can reach ns2"
+ echo "PASS: netns routing/connectivity: $ns1 can reach $ns2"
fi
ns1in=$(mktemp)
@@ -312,24 +322,24 @@ make_file "$ns2in"
# First test:
# No PMTU discovery, nsr1 is expected to fragment packets from ns1 to ns2 as needed.
-if test_tcp_forwarding ns1 ns2; then
+if test_tcp_forwarding $ns1 $ns2; then
echo "PASS: flow offloaded for ns1/ns2"
else
echo "FAIL: flow offload for ns1/ns2:" 1>&2
- ip netns exec nsr1 nft list ruleset
+ ip netns exec $nsr1 nft list ruleset
ret=1
fi
# delete default route, i.e. ns2 won't be able to reach ns1 and
# will depend on ns1 being masqueraded in nsr1.
# expect ns1 has nsr1 address.
-ip -net ns2 route del default via 10.0.2.1
-ip -net ns2 route del default via dead:2::1
-ip -net ns2 route add 192.168.10.1 via 10.0.2.1
+ip -net $ns2 route del default via 10.0.2.1
+ip -net $ns2 route del default via dead:2::1
+ip -net $ns2 route add 192.168.10.1 via 10.0.2.1
# Second test:
# Same, but with NAT enabled.
-ip netns exec nsr1 nft -f - <<EOF
+ip netns exec $nsr1 nft -f - <<EOF
table ip nat {
chain prerouting {
type nat hook prerouting priority 0; policy accept;
@@ -343,47 +353,47 @@ table ip nat {
}
EOF
-if test_tcp_forwarding_nat ns1 ns2; then
+if test_tcp_forwarding_nat $ns1 $ns2; then
echo "PASS: flow offloaded for ns1/ns2 with NAT"
else
echo "FAIL: flow offload for ns1/ns2 with NAT" 1>&2
- ip netns exec nsr1 nft list ruleset
+ ip netns exec $nsr1 nft list ruleset
ret=1
fi
# Third test:
# Same as second test, but with PMTU discovery enabled.
-handle=$(ip netns exec nsr1 nft -a list table inet filter | grep something-to-grep-for | cut -d \# -f 2)
+handle=$(ip netns exec $nsr1 nft -a list table inet filter | grep something-to-grep-for | cut -d \# -f 2)
-if ! ip netns exec nsr1 nft delete rule inet filter forward $handle; then
+if ! ip netns exec $nsr1 nft delete rule inet filter forward $handle; then
echo "FAIL: Could not delete large-packet accept rule"
exit 1
fi
-ip netns exec ns1 sysctl net.ipv4.ip_no_pmtu_disc=0 > /dev/null
-ip netns exec ns2 sysctl net.ipv4.ip_no_pmtu_disc=0 > /dev/null
+ip netns exec $ns1 sysctl net.ipv4.ip_no_pmtu_disc=0 > /dev/null
+ip netns exec $ns2 sysctl net.ipv4.ip_no_pmtu_disc=0 > /dev/null
-if test_tcp_forwarding_nat ns1 ns2; then
+if test_tcp_forwarding_nat $ns1 $ns2; then
echo "PASS: flow offloaded for ns1/ns2 with NAT and pmtu discovery"
else
echo "FAIL: flow offload for ns1/ns2 with NAT and pmtu discovery" 1>&2
- ip netns exec nsr1 nft list ruleset
+ ip netns exec $nsr1 nft list ruleset
fi
# Another test:
# Add bridge interface br0 to Router1, with NAT enabled.
-ip -net nsr1 link add name br0 type bridge
-ip -net nsr1 addr flush dev veth0
-ip -net nsr1 link set up dev veth0
-ip -net nsr1 link set veth0 master br0
-ip -net nsr1 addr add 10.0.1.1/24 dev br0
-ip -net nsr1 addr add dead:1::1/64 dev br0
-ip -net nsr1 link set up dev br0
+ip -net $nsr1 link add name br0 type bridge
+ip -net $nsr1 addr flush dev veth0
+ip -net $nsr1 link set up dev veth0
+ip -net $nsr1 link set veth0 master br0
+ip -net $nsr1 addr add 10.0.1.1/24 dev br0
+ip -net $nsr1 addr add dead:1::1/64 dev br0
+ip -net $nsr1 link set up dev br0
-ip netns exec nsr1 sysctl net.ipv4.conf.br0.forwarding=1 > /dev/null
+ip netns exec $nsr1 sysctl net.ipv4.conf.br0.forwarding=1 > /dev/null
# br0 with NAT enabled.
-ip netns exec nsr1 nft -f - <<EOF
+ip netns exec $nsr1 nft -f - <<EOF
flush table ip nat
table ip nat {
chain prerouting {
@@ -398,59 +408,59 @@ table ip nat {
}
EOF
-if test_tcp_forwarding_nat ns1 ns2; then
+if test_tcp_forwarding_nat $ns1 $ns2; then
echo "PASS: flow offloaded for ns1/ns2 with bridge NAT"
else
echo "FAIL: flow offload for ns1/ns2 with bridge NAT" 1>&2
- ip netns exec nsr1 nft list ruleset
+ ip netns exec $nsr1 nft list ruleset
ret=1
fi
# Another test:
# Add bridge interface br0 to Router1, with NAT and VLAN.
-ip -net nsr1 link set veth0 nomaster
-ip -net nsr1 link set down dev veth0
-ip -net nsr1 link add link veth0 name veth0.10 type vlan id 10
-ip -net nsr1 link set up dev veth0
-ip -net nsr1 link set up dev veth0.10
-ip -net nsr1 link set veth0.10 master br0
-
-ip -net ns1 addr flush dev eth0
-ip -net ns1 link add link eth0 name eth0.10 type vlan id 10
-ip -net ns1 link set eth0 up
-ip -net ns1 link set eth0.10 up
-ip -net ns1 addr add 10.0.1.99/24 dev eth0.10
-ip -net ns1 route add default via 10.0.1.1
-ip -net ns1 addr add dead:1::99/64 dev eth0.10
-
-if test_tcp_forwarding_nat ns1 ns2; then
+ip -net $nsr1 link set veth0 nomaster
+ip -net $nsr1 link set down dev veth0
+ip -net $nsr1 link add link veth0 name veth0.10 type vlan id 10
+ip -net $nsr1 link set up dev veth0
+ip -net $nsr1 link set up dev veth0.10
+ip -net $nsr1 link set veth0.10 master br0
+
+ip -net $ns1 addr flush dev eth0
+ip -net $ns1 link add link eth0 name eth0.10 type vlan id 10
+ip -net $ns1 link set eth0 up
+ip -net $ns1 link set eth0.10 up
+ip -net $ns1 addr add 10.0.1.99/24 dev eth0.10
+ip -net $ns1 route add default via 10.0.1.1
+ip -net $ns1 addr add dead:1::99/64 dev eth0.10
+
+if test_tcp_forwarding_nat $ns1 $ns2; then
echo "PASS: flow offloaded for ns1/ns2 with bridge NAT and VLAN"
else
echo "FAIL: flow offload for ns1/ns2 with bridge NAT and VLAN" 1>&2
- ip netns exec nsr1 nft list ruleset
+ ip netns exec $nsr1 nft list ruleset
ret=1
fi
# restore test topology (remove bridge and VLAN)
-ip -net nsr1 link set veth0 nomaster
-ip -net nsr1 link set veth0 down
-ip -net nsr1 link set veth0.10 down
-ip -net nsr1 link delete veth0.10 type vlan
-ip -net nsr1 link delete br0 type bridge
-ip -net ns1 addr flush dev eth0.10
-ip -net ns1 link set eth0.10 down
-ip -net ns1 link set eth0 down
-ip -net ns1 link delete eth0.10 type vlan
+ip -net $nsr1 link set veth0 nomaster
+ip -net $nsr1 link set veth0 down
+ip -net $nsr1 link set veth0.10 down
+ip -net $nsr1 link delete veth0.10 type vlan
+ip -net $nsr1 link delete br0 type bridge
+ip -net $ns1 addr flush dev eth0.10
+ip -net $ns1 link set eth0.10 down
+ip -net $ns1 link set eth0 down
+ip -net $ns1 link delete eth0.10 type vlan
# restore address in ns1 and nsr1
-ip -net ns1 link set eth0 up
-ip -net ns1 addr add 10.0.1.99/24 dev eth0
-ip -net ns1 route add default via 10.0.1.1
-ip -net ns1 addr add dead:1::99/64 dev eth0
-ip -net ns1 route add default via dead:1::1
-ip -net nsr1 addr add 10.0.1.1/24 dev veth0
-ip -net nsr1 addr add dead:1::1/64 dev veth0
-ip -net nsr1 link set up dev veth0
+ip -net $ns1 link set eth0 up
+ip -net $ns1 addr add 10.0.1.99/24 dev eth0
+ip -net $ns1 route add default via 10.0.1.1
+ip -net $ns1 addr add dead:1::99/64 dev eth0
+ip -net $ns1 route add default via dead:1::1
+ip -net $nsr1 addr add 10.0.1.1/24 dev veth0
+ip -net $nsr1 addr add dead:1::1/64 dev veth0
+ip -net $nsr1 link set up dev veth0
KEY_SHA="0x"$(ps -xaf | sha1sum | cut -d " " -f 1)
KEY_AES="0x"$(ps -xaf | md5sum | cut -d " " -f 1)
@@ -480,23 +490,23 @@ do_esp() {
}
-do_esp nsr1 192.168.10.1 192.168.10.2 10.0.1.0/24 10.0.2.0/24 $SPI1 $SPI2
+do_esp $nsr1 192.168.10.1 192.168.10.2 10.0.1.0/24 10.0.2.0/24 $SPI1 $SPI2
-do_esp nsr2 192.168.10.2 192.168.10.1 10.0.2.0/24 10.0.1.0/24 $SPI2 $SPI1
+do_esp $nsr2 192.168.10.2 192.168.10.1 10.0.2.0/24 10.0.1.0/24 $SPI2 $SPI1
-ip netns exec nsr1 nft delete table ip nat
+ip netns exec $nsr1 nft delete table ip nat
# restore default routes
-ip -net ns2 route del 192.168.10.1 via 10.0.2.1
-ip -net ns2 route add default via 10.0.2.1
-ip -net ns2 route add default via dead:2::1
+ip -net $ns2 route del 192.168.10.1 via 10.0.2.1
+ip -net $ns2 route add default via 10.0.2.1
+ip -net $ns2 route add default via dead:2::1
-if test_tcp_forwarding ns1 ns2; then
+if test_tcp_forwarding $ns1 $ns2; then
echo "PASS: ipsec tunnel mode for ns1/ns2"
else
echo "FAIL: ipsec tunnel mode for ns1/ns2"
- ip netns exec nsr1 nft list ruleset 1>&2
- ip netns exec nsr1 cat /proc/net/xfrm_stat 1>&2
+ ip netns exec $nsr1 nft list ruleset 1>&2
+ ip netns exec $nsr1 cat /proc/net/xfrm_stat 1>&2
fi
exit $ret
--
2.35.1
These patches improve the coverage of ZA signal contexts a bit, adding
some validation that the actual data is correct and covering the case
where ZA is not enabled.
Mark Brown (2):
kselftest/arm64: Tighten up validation of ZA signal context
kselftest/arm64: Add a test for signal frames with ZA disabled
.../arm64/signal/testcases/za_no_regs.c | 119 ++++++++++++++++++
.../arm64/signal/testcases/za_regs.c | 16 ++-
2 files changed, 134 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/arm64/signal/testcases/za_no_regs.c
base-commit: 568035b01cfb107af8d2e4bd2fb9aea22cf5b868
--
2.30.2
This series fixes the reported issues, and implements the suggested
improvements, for the version of the cpumask tests [1] that was merged
with commit c41e8866c28c ("lib/test: introduce cpumask KUnit test
suite").
These changes include fixes for the tests, and better alignment with the
KUnit style guidelines.
[1] https://lore.kernel.org/all/85217b5de6d62257313ad7fde3e1969421acad75.165907…
Changes since v2:
Link: https://lore.kernel.org/lkml/cover.1661007338.git.sander@svanheule.net/
- Update commit message of "lib/test_cpumask: drop cpu_possible_mask
full test"
- Use *_MSG() macros to only print mask contents on failure
Changes since v1:
Link: https://lore.kernel.org/lkml/cover.1660068429.git.sander@svanheule.net/
- Collect tags
- Rewrite commit message of "lib/test_cpumask: drop cpu_possible_mask
full test"
- Also CC KUnit mailing list
Sander Vanheule (5):
lib/test_cpumask: drop cpu_possible_mask full test
lib/test_cpumask: fix cpu_possible_mask last test
lib/test_cpumask: follow KUnit style guidelines
lib/cpumask_kunit: log mask contents
lib/cpumask_kunit: add tests file to MAINTAINERS
MAINTAINERS | 1 +
lib/Kconfig.debug | 7 +++-
lib/Makefile | 2 +-
lib/{test_cpumask.c => cpumask_kunit.c} | 52 +++++++++++++++----------
4 files changed, 38 insertions(+), 24 deletions(-)
rename lib/{test_cpumask.c => cpumask_kunit.c} (58%)
--
2.37.2
To set a socket to non-blocking mode, we need to use
> fcntl(fd, F_SETFL, O_NONBLOCK);
rather than:
> fcntl(fd, O_NONBLOCK);
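For reference, a minimal standalone sketch of the corrected pattern (not
part of the patch); the usual idiom also preserves any flags already set on
the descriptor by reading them back with F_GETFL first:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Put an existing fd into non-blocking mode. F_SETFL replaces the whole
 * file status flag set, so read the current flags and OR in O_NONBLOCK. */
static int set_nonblock(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

int main(void)
{
	if (set_nonblock(STDIN_FILENO) < 0)
		perror("fcntl");
	return 0;
}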
Signed-off-by: Qiao Ma <mqaio(a)linux.alibaba.com>
---
tools/testing/selftests/bpf/test_sockmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 0fbaccdc8861..b163b7cfd957 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -598,7 +598,7 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
struct timeval timeout;
fd_set w;
- fcntl(fd, fd_flags);
+ fcntl(fd, F_SETFL, fd_flags);
/* Account for pop bytes noting each iteration of apply will
* call msg_pop_data helper so we need to account for this
* by calculating the number of apply iterations. Note user
--
1.8.3.1
From: Andrew Delgadillo <adelg(a)google.com>
When testing a kernel, one of the earliest signals one can get is if a
kernel has become tainted. For example, an organization might be
interested in mass testing commits on their hardware. An obvious first
step would be to make sure every commit boots, and a next step would be
to make sure there are no warnings/crashes/lockups, hence the utility of
a taint test.
Signed-off-by: Andrew Delgadillo <adelg(a)google.com>
---
tools/testing/selftests/core/Makefile | 1 +
tools/testing/selftests/core/taint.sh | 28 +++++++++++++++++++++++++++
2 files changed, 29 insertions(+)
create mode 100755 tools/testing/selftests/core/taint.sh
diff --git a/tools/testing/selftests/core/Makefile b/tools/testing/selftests/core/Makefile
index f6f2d6f473c6a..695bdbfb02f90 100644
--- a/tools/testing/selftests/core/Makefile
+++ b/tools/testing/selftests/core/Makefile
@@ -2,6 +2,7 @@
CFLAGS += -g -I../../../../usr/include/
TEST_GEN_PROGS := close_range_test
+TEST_PROGS := taint.sh
include ../lib.mk
diff --git a/tools/testing/selftests/core/taint.sh b/tools/testing/selftests/core/taint.sh
new file mode 100755
index 0000000000000..661c2cb8cd9bf
--- /dev/null
+++ b/tools/testing/selftests/core/taint.sh
@@ -0,0 +1,28 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+set -oue pipefail
+
+# By default, we only want to check if our system has:
+# - seen an oops or bug
+# - a warning occurred
+# - a lockup occurred
+# The bit values for these, and more, can be found at
+# Documentation/admin-guide/tainted-kernels.html
+# This value can be overridden by passing a mask as the
+# first positional argument.
+taint_bitmask=$(( 128 + 512 + 16384 ))
+
+# If we have a positional argument, then override our
+# default bitmask.
+if [[ -n "${1-}" ]]; then
+ taint_bitmask=$1
+fi
+
+taint_bits=$(cat /proc/sys/kernel/tainted)
+
+result=$(( taint_bitmask & taint_bits ))
+if [[ "$result" -ne 0 ]]; then
+ exit 1
+fi
+
+exit 0
--
2.37.1.595.g718a3a8f04-goog
Hi Linus,
Please pull the following KUnit fixes update for Linux 6.0-rc3.
This KUnit fixes update for Linux 6.0-rc3 consists of fixes to mmc
test and fix to load .kunit_test_suites section when CONFIG_KUNIT=m,
and not just when KUnit is built-in.
Please note that this KUnit update touches mmc driver Kconfig and
kernel/module/main.c.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 568035b01cfb107af8d2e4bd2fb9aea22cf5b868:
Linux 6.0-rc1 (2022-08-14 15:50:18 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux-kselftest-kunit-fixes-6.0-rc3
for you to fetch changes up to 41a55567b9e31cb852670684404654ec4fd0d8d6:
module: kunit: Load .kunit_test_suites section when CONFIG_KUNIT=m (2022-08-15 13:51:07 -0600)
----------------------------------------------------------------
linux-kselftest-kunit-fixes-6.0-rc3
This KUnit fixes update for Linux 6.0-rc3 consists of fixes to mmc
test and fix to load .kunit_test_suites section when CONFIG_KUNIT=m,
and not just when KUnit is built-in.
----------------------------------------------------------------
David Gow (2):
mmc: sdhci-of-aspeed: test: Fix dependencies when KUNIT=m
module: kunit: Load .kunit_test_suites section when CONFIG_KUNIT=m
drivers/mmc/host/Kconfig | 1 +
kernel/module/main.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
----------------------------------------------------------------
Hi Linus,
Please pull the following Kselftest update for Linux 6.0-rc3.
This Kselftest fixes update for Linux 6.0-rc3 consists of fixes for
build failures and warnings in the vm and sgx tests.
The sgx test fails to build on new distros without this change.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 568035b01cfb107af8d2e4bd2fb9aea22cf5b868:
Linux 6.0-rc1 (2022-08-14 15:50:18 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux-kselftest-fixes-6.0-rc3
for you to fetch changes up to bdbf0617bbc3641af158d1aeffeebb1505f76263:
selftests/vm: fix inability to build any vm tests (2022-08-19 17:57:20 -0600)
----------------------------------------------------------------
linux-kselftest-fixes-6.0-rc3
This Kselftest fixes update for Linux 6.0-rc3 consists of fixes
and warnings to vm and sgx test builds.
----------------------------------------------------------------
Axel Rasmussen (1):
selftests/vm: fix inability to build any vm tests
Kristen Carlson Accardi (1):
selftests/sgx: Ignore OpenSSL 3.0 deprecated functions warning
tools/testing/selftests/lib.mk | 1 +
tools/testing/selftests/sgx/sigstruct.c | 6 ++++++
2 files changed, 7 insertions(+)
----------------------------------------------------------------
There are some use cases where the kernel binary is desired to be the same
for both production and testing. This poses a problem for users of KUnit,
as built-in tests will automatically run at startup and test modules can
still be loaded, leaving the kernel in an unsafe state. There is a "test"
taint flag that gets set if a test runs, but nothing prevents the tests
from running in the first place.
This patch adds the kunit.enable module parameter, which will need to be
set to true, in addition to KUNIT being enabled, for KUnit tests to run.
The default value is true, giving backwards compatibility. However, for
the production+testing use case the new config option
KUNIT_ENABLE_DEFAULT_DISABLED can be enabled to default kunit.enable to
false, requiring the tester to opt in by passing kunit.enable=1 to
the kernel.
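As a rough illustration of the mechanism being described (names and
placement here are placeholders, not the code added by this series), a
boolean module parameter gate of this kind is typically wired up like so:

/* Illustrative sketch only: a kunit.enable-style gate built on a boolean
 * module parameter. */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/printk.h>

/* Default mirrors the cover letter: true unless the "default disabled"
 * config option is selected. */
static bool enable = !IS_ENABLED(CONFIG_KUNIT_ENABLE_DEFAULT_DISABLED);
module_param(enable, bool, 0444);
MODULE_PARM_DESC(enable, "Enable running of KUnit tests");

static int __init gate_demo_init(void)
{
	if (!enable) {
		pr_info("kunit gate demo: tests disabled, boot with kunit.enable=1\n");
		return 0;
	}
	pr_info("kunit gate demo: tests would run here\n");
	return 0;
}
module_init(gate_demo_init);

static void __exit gate_demo_exit(void)
{
}
module_exit(gate_demo_exit);

MODULE_LICENSE("GPL");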
Joe Fradley (2):
kunit: add kunit.enable to enable/disable KUnit test
kunit: no longer call module_info(test, "Y") for kunit modules
.../admin-guide/kernel-parameters.txt | 6 ++++++
include/kunit/test.h | 1 -
lib/kunit/Kconfig | 8 ++++++++
lib/kunit/test.c | 20 +++++++++++++++++++
4 files changed, 34 insertions(+), 1 deletion(-)
--
2.37.1.595.g718a3a8f04-goog
Some recent commits added new test binaries, but forgot to add those to
.gitignore. Now, after one does "make -C tools/testing/selftests", one
ends up with some untracked files in the kernel tree.
Add the test binaries to .gitignore, to avoid this minor annoyance.
Fixes: d8b6171bd58a ("selftests/io_uring: test zerocopy send")
Fixes: 6342140db660 ("selftests/timens: add a test for vfork+exit")
Signed-off-by: Axel Rasmussen <axelrasmussen(a)google.com>
---
tools/testing/selftests/net/.gitignore | 3 ++-
tools/testing/selftests/timens/.gitignore | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index 0e5751af6247..02abf8fdfd3a 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -39,4 +39,5 @@ toeplitz
tun
cmsg_sender
unix_connect
-tap
\ No newline at end of file
+tap
+io_uring_zerocopy_tx
diff --git a/tools/testing/selftests/timens/.gitignore b/tools/testing/selftests/timens/.gitignore
index fe1eb8271b35..cae8dca0fbff 100644
--- a/tools/testing/selftests/timens/.gitignore
+++ b/tools/testing/selftests/timens/.gitignore
@@ -8,3 +8,4 @@ procfs
timens
timer
timerfd
+vfork_exec
--
2.37.1.595.g718a3a8f04-goog
In the "mode_filter_without_nnp" test in seccomp_bpf, there is currently
a TODO which asks to check the capability CAP_SYS_ADMIN instead of euid.
This patch adds a check of whether the calling process has the
CAP_SYS_ADMIN capability raised in its effective set (CAP_EFFECTIVE).
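For anyone wanting to poke at this outside the harness, a standalone
libcap check of the effective capability looks roughly like this (a sketch
mirroring the diff below, not part of the patch; build with -lcap):

#include <stdio.h>
#include <sys/capability.h>

int main(void)
{
	cap_t caps = cap_get_proc();	/* snapshot of this process's capabilities */
	cap_flag_value_t eff = CAP_CLEAR;

	if (!caps)
		return 1;
	if (cap_get_flag(caps, CAP_SYS_ADMIN, CAP_EFFECTIVE, &eff))
		eff = CAP_CLEAR;
	printf("CAP_SYS_ADMIN effective: %s\n", eff == CAP_SET ? "yes" : "no");
	cap_free(caps);			/* unlike the short-lived test, free the snapshot */
	return 0;
}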
Signed-off-by: Gautam Menghani <gautammenghani201(a)gmail.com>
---
tools/testing/selftests/seccomp/seccomp_bpf.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 136df5b76319..16b0edc520ef 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -392,6 +392,8 @@ TEST(mode_filter_without_nnp)
.filter = filter,
};
long ret;
+ cap_t cap = cap_get_proc();
+ cap_flag_value_t is_cap_sys_admin = 0;
ret = prctl(PR_GET_NO_NEW_PRIVS, 0, NULL, 0, 0);
ASSERT_LE(0, ret) {
@@ -400,8 +402,8 @@ TEST(mode_filter_without_nnp)
errno = 0;
ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0);
/* Succeeds with CAP_SYS_ADMIN, fails without */
- /* TODO(wad) check caps not euid */
- if (geteuid()) {
+ cap_get_flag(cap, CAP_SYS_ADMIN, CAP_EFFECTIVE, &is_cap_sys_admin);
+ if (!is_cap_sys_admin) {
EXPECT_EQ(-1, ret);
EXPECT_EQ(EACCES, errno);
} else {
--
2.34.1
This series fixes the reported issues, and implements the suggested
improvements, for the version of the cpumask tests [1] that was merged
with commit c41e8866c28c ("lib/test: introduce cpumask KUnit test
suite").
These changes include fixes for the tests, and better alignment with the
KUnit style guidelines.
[1] https://lore.kernel.org/all/85217b5de6d62257313ad7fde3e1969421acad75.165907…
Changes since v1:
Link: https://lore.kernel.org/lkml/cover.1660068429.git.sander@svanheule.net/
- Collect tags
- Rewrite commit message of "lib/test_cpumask: drop cpu_possible_mask
full test"
- Also CC KUnit mailing list
Sander Vanheule (5):
lib/test_cpumask: drop cpu_possible_mask full test
lib/test_cpumask: fix cpu_possible_mask last test
lib/test_cpumask: follow KUnit style guidelines
lib/cpumask_kunit: log mask contents
lib/cpumask_kunit: add tests file to MAINTAINERS
MAINTAINERS | 1 +
lib/Kconfig.debug | 7 +++++--
lib/Makefile | 2 +-
lib/{test_cpumask.c => cpumask_kunit.c} | 13 +++++++++++--
4 files changed, 18 insertions(+), 5 deletions(-)
rename lib/{test_cpumask.c => cpumask_kunit.c} (90%)
--
2.37.2
When replacing KUNIT_BINARY_LE_MSG_ASSERTION() with
KUNIT_BINARY_INT_ASSERTION() for KUNIT_EXPECT_LE_MSG(), the assert_type
parameter was changed from KUNIT_EXPECTATION to KUNIT_ASSERTION. This
causes KUNIT_EXPECT_LE_MSG() and KUNIT_ASSERT_LE_MSG() to behave the
same way, and tests after a failed KUNIT_EXPECT_LE_MSG() are not run.
Call KUNIT_BINARY_INT_ASSERTION() with KUNIT_EXPECTATION to again match
the documented behavior for the KUNIT_EXPECT_* macros.
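For reference, the distinction being restored here is the usual KUnit
semantics: a failed expectation marks the test as failed but keeps
executing it, while a failed assertion aborts the test case. A minimal
illustration (a hypothetical demo suite, not part of this patch):

#include <kunit/test.h>

static void expect_vs_assert_demo(struct kunit *test)
{
	/* Fails, but the test keeps running: it is only an expectation. */
	KUNIT_EXPECT_LE_MSG(test, 2, 1, "2 is not <= 1, test continues");

	/* Still executed after the failed expectation above. */
	KUNIT_EXPECT_EQ(test, 1, 1);

	/* A failed assertion, by contrast, would abort the test case here. */
	KUNIT_ASSERT_LE_MSG(test, 1, 2, "this one passes, so we reach the end");
}

static struct kunit_case demo_cases[] = {
	KUNIT_CASE(expect_vs_assert_demo),
	{}
};

static struct kunit_suite demo_suite = {
	.name = "expect-vs-assert-demo",
	.test_cases = demo_cases,
};
kunit_test_suite(demo_suite);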
Fixes: 40f39777ce4f ("kunit: decrease macro layering for integer asserts")
Signed-off-by: Sander Vanheule <sander(a)svanheule.net>
---
include/kunit/test.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/kunit/test.h b/include/kunit/test.h
index c958855681cc..617ec995671d 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -826,7 +826,7 @@ do { \
#define KUNIT_EXPECT_LE_MSG(test, left, right, fmt, ...) \
KUNIT_BINARY_INT_ASSERTION(test, \
- KUNIT_ASSERTION, \
+ KUNIT_EXPECTATION, \
left, <=, right, \
fmt, \
##__VA_ARGS__)
--
2.37.2
Hi Yury,
Replying back in plaintext, as you sent an HTML message.
On Sun, 2022-08-21 at 09:18 -0400, Yury Norov wrote:
>
>
> On Sun, Aug 21, 2022, 09:08 Sander Vanheule <sander(a)svanheule.net> wrote:
> > Hi Yury,
> >
> > On Sat, 2022-08-20 at 14:35 -0700, Yury Norov wrote:
> > > On Sat, Aug 20, 2022 at 05:03:09PM +0200, Sander Vanheule wrote:
> > > > When the number of CPUs that can possibly be brought online is known at
> > > > boot time, e.g. when HOTPLUG is disabled, nr_cpu_ids may be smaller than
> > > > NR_CPUS. In that case, cpu_possible_mask would not be completely filled,
> > > > and cpumask_full(cpu_possible_mask) can return false for valid system
> > > > configurations.
> > >
> > > It doesn't mean we can just give up. You can check validity of possible
> > > cpumask like this:
> > > KUNIT_EXPECT_EQ(test, nr_cpu_ids, cpumask_first_zero(&mask_all))
> > > KUNIT_EXPECT_EQ(test, NR_CPUS, cpumask_first(&mask_all))
> >
> > Did you mean cpu_possible_mask, or mask_all?
>
> cpu_possible_mask, of course.
>
> > For cpu_possible_mask, these tests are in test_cpumask_first(), albeit under
> > a
> > slightly different form. Together with the tests in test_cpumask_weight()
> > and
> > test_cpumask_last(), cpu_possible_mask is already one of the more
> > constrained
> > masks.
> >
> >
> > For mask_all, the mask is filled up with nr_cpumask_bits <= NR_CPUS. I could
> > add
> > cpumask_first(), cpumask_first_zero(), and cpumask_last() tests though.
> >
> > More tests could be also added for cpu_all_mask, since this does have all
> > NR_CPUS bits set, but I think that belongs in a separate patch.
> >
> > I think the extra mask_all and cpu_all_mask test are out of scope for this
> > patch, but they could be added in another patch (for 6.1).
>
> If you think that the possible mask is tested by other parts, then can you note
> that in the comments?
Sure, I'll update the commit message to note the other constraints on
cpu_possible_mask.
Best,
Sander
When we stopped using KSFT_KHDR_INSTALL, a side effect was that we also
changed the value of `top_srcdir`. This can be seen by looking at the
code removed by commit 49de12ba06ef
("selftests: drop KSFT_KHDR_INSTALL make target").
(Note that this commit didn't break things by itself; technically the
one before it did, since that is the one that stopped KSFT_KHDR_INSTALL
from being used, even though the code was still there.)
Previously lib.mk reconfigured `top_srcdir` when KSFT_KHDR_INSTALL was
being used. Now, that's no longer the case.
As a result, the path to gup_test.h in vm/Makefile was wrong, and
since it's a dependency of all of the vm binaries none of them could
be built. Instead, we'd get an "error" like:
make[1]: *** No rule to make target
'/[...]/tools/testing/selftests/vm/compaction_test', needed by
'all'. Stop.
So, modify lib.mk so it once again sets top_srcdir to the root of the
kernel tree.
Fixes: f2745dc0ba3d ("selftests: stop using KSFT_KHDR_INSTALL")
Signed-off-by: Axel Rasmussen <axelrasmussen(a)google.com>
---
tools/testing/selftests/lib.mk | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index 947fc72413e9..d44c72b3abe3 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -40,6 +40,7 @@ ifeq (0,$(MAKELEVEL))
endif
endif
selfdir = $(realpath $(dir $(filter %/lib.mk,$(MAKEFILE_LIST))))
+top_srcdir = $(selfdir)/../../..
# The following are built by lib.mk common compile rules.
# TEST_CUSTOM_PROGS should be used by tests that require
--
2.37.1.595.g718a3a8f04-goog
There are currently 3 IP tunnels that are capable of carrying
L2 traffic: gretap, vxlan and geneve.
All of them can inherit the TOS/TTL for the outer
IP header from the inner frame.
Add a test that verifies that these fields are correctly inherited.
These tests failed before the following commits:
b09ab9c92e50 ("ip6_tunnel: allow to inherit from VLAN encapsulated IP")
3f8a8447fd0b ("ip6_gre: use actual protocol to select xmit")
41337f52b967 ("ip6_gre: set DSCP for non-IP")
7ae29fd1be43 ("ip_tunnel: allow to inherit from VLAN encapsulated IP")
7074732c8fae ("ip_tunnels: allow VXLAN/GENEVE to inherit TOS/TTL from VLAN")
ca2bb69514a8 ("geneve: do not use RT_TOS for IPv6 flowlabel")
b4ab94d6adaa ("geneve: fix TOS inheriting for ipv4")
Signed-off-by: Matthias May <matthias.may(a)westermo.com>
---
tools/testing/selftests/net/Makefile | 1 +
.../selftests/net/l2_tos_ttl_inherit.sh | 390 ++++++++++++++++++
2 files changed, 391 insertions(+)
create mode 100755 tools/testing/selftests/net/l2_tos_ttl_inherit.sh
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index c0ee2955fe54..11a288b67e2f 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -42,6 +42,7 @@ TEST_PROGS += arp_ndisc_evict_nocarrier.sh
TEST_PROGS += ndisc_unsolicited_na_test.sh
TEST_PROGS += arp_ndisc_untracked_subnets.sh
TEST_PROGS += stress_reuseport_listen.sh
+TEST_PROGS += l2_tos_ttl_inherit.sh
TEST_PROGS_EXTENDED := in_netns.sh setup_loopback.sh setup_veth.sh
TEST_PROGS_EXTENDED += toeplitz_client.sh toeplitz.sh
TEST_GEN_FILES = socket nettest
diff --git a/tools/testing/selftests/net/l2_tos_ttl_inherit.sh b/tools/testing/selftests/net/l2_tos_ttl_inherit.sh
new file mode 100755
index 000000000000..dca1e6f777a8
--- /dev/null
+++ b/tools/testing/selftests/net/l2_tos_ttl_inherit.sh
@@ -0,0 +1,390 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+
+# Author: Matthias May <matthias.may(a)westermo.com>
+#
+# This script evaluates ip tunnels that are capable of carrying L2 traffic
+# if they inherit or set the inheritable fields.
+# Namely these tunnels are: 'gretap', 'vxlan' and 'geneve'.
+# Checked inheritable fields are: TOS and TTL.
+# The outer tunnel protocol of 'IPv4' or 'IPv6' is verified.
+# As payload, frames of type 'IPv4', 'IPv6' and 'other' (ARP) are verified.
+# In addition this script also checks if forcing a specific field in the
+# outer header is working.
+
+if [ "$(id -u)" != "0" ]; then
+ echo "Please run as root."
+ exit 0
+fi
+if ! which tcpdump > /dev/null 2>&1; then
+ echo "No tcpdump found. Required for this test."
+ exit 0
+fi
+
+expected_tos="0x00"
+expected_ttl="0"
+failed=false
+
+get_random_tos() {
+ # Get a random hex tos value between 0x00 and 0xfc, a multiple of 4
+ echo "0x$(tr -dc '0-9a-f' < /dev/urandom | head -c 1)\
+$(tr -dc '048c' < /dev/urandom | head -c 1)"
+}
+get_random_ttl() {
+ # Get a random dec value between 0 and 255
+ printf "%d" "0x$(tr -dc '0-9a-f' < /dev/urandom | head -c 2)"
+}
+get_field() {
+ # Expects to get the 'head -n 1' of a captured frame by tcpdump.
+ # Parses this first line and returns the specified field.
+ local field="$1"
+ local input="$2"
+ local found=false
+ input="$(echo "$input" | tr -d '(),')"
+ for input_field in $input; do
+ if $found; then
+ echo "$input_field"
+ return
+ fi
+ # The next field that we iterate over is the looked for value
+ if [ "$input_field" = "$field" ]; then
+ found=true
+ fi
+ done
+ echo "0"
+}
+setup() {
+ local type="$1"
+ local outer="$2"
+ local inner="$3"
+ local tos_ttl="$4"
+ local vlan="$5"
+ local test_tos="0x00"
+ local test_ttl="0"
+ local ns="ip netns exec testing"
+
+ # We don't want a test-tos of 0x00,
+ # because this is the value that we get when no tos is set.
+ expected_tos="$(get_random_tos)"
+ while [ "$expected_tos" = "0x00" ]; do
+ expected_tos="$(get_random_tos)"
+ done
+ if [ "$tos_ttl" = "random" ]; then
+ test_tos="$expected_tos"
+ tos="fixed $test_tos"
+ elif [ "$tos_ttl" = "inherit" ]; then
+ test_tos="$tos_ttl"
+ tos="inherit $expected_tos"
+ fi
+
+ # We don't want a test-ttl of 64 or 0,
+ # because 64 is when no ttl is set and 0 is not a valid ttl.
+ expected_ttl="$(get_random_ttl)"
+ while [ "$expected_ttl" = "64" ] || [ "$expected_ttl" = "0" ]; do
+ expected_ttl="$(get_random_ttl)"
+ done
+
+ if [ "$tos_ttl" = "random" ]; then
+ test_ttl="$expected_ttl"
+ ttl="fixed $test_ttl"
+ elif [ "$tos_ttl" = "inherit" ]; then
+ test_ttl="$tos_ttl"
+ ttl="inherit $expected_ttl"
+ fi
+ printf "│%7s │%6s │%6s │%13s │%13s │%6s │" \
+ "$type" "$outer" "$inner" "$tos" "$ttl" "$vlan"
+
+ # Create 'testing' netns, veth pair and connect main ns with testing ns
+ ip netns add testing
+ ip link add type veth
+ ip link set veth1 netns testing
+ ip link set veth0 up
+ $ns ip link set veth1 up
+ ip addr flush dev veth0
+ $ns ip addr flush dev veth1
+
+ local local_addr1=""
+ local local_addr2=""
+ if [ "$type" = "gre" ] || [ "$type" = "vxlan" ]; then
+ if [ "$outer" = "4" ]; then
+ local_addr1="local 198.18.0.1"
+ local_addr2="local 198.18.0.2"
+ elif [ "$outer" = "6" ]; then
+ local_addr1="local fdd1:ced0:5d88:3fce::1"
+ local_addr2="local fdd1:ced0:5d88:3fce::2"
+ fi
+ fi
+ local vxlan=""
+ if [ "$type" = "vxlan" ]; then
+ vxlan="vni 100 dstport 4789"
+ fi
+ local geneve=""
+ if [ "$type" = "geneve" ]; then
+ geneve="vni 100"
+ fi
+ # Create tunnel and assign outer IPv4/IPv6 addresses
+ if [ "$outer" = "4" ]; then
+ if [ "$type" = "gre" ]; then
+ type="gretap"
+ fi
+ ip addr add 198.18.0.1/24 dev veth0
+ $ns ip addr add 198.18.0.2/24 dev veth1
+ ip link add name tep0 type $type $local_addr1 remote \
+ 198.18.0.2 tos $test_tos ttl $test_ttl $vxlan $geneve
+ $ns ip link add name tep1 type $type $local_addr2 remote \
+ 198.18.0.1 tos $test_tos ttl $test_ttl $vxlan $geneve
+ elif [ "$outer" = "6" ]; then
+ if [ "$type" = "gre" ]; then
+ type="ip6gretap"
+ fi
+ ip addr add fdd1:ced0:5d88:3fce::1/64 dev veth0
+ $ns ip addr add fdd1:ced0:5d88:3fce::2/64 dev veth1
+ ip link add name tep0 type $type $local_addr1 \
+ remote fdd1:ced0:5d88:3fce::2 tos $test_tos ttl $test_ttl \
+ $vxlan $geneve
+ $ns ip link add name tep1 type $type $local_addr2 \
+ remote fdd1:ced0:5d88:3fce::1 tos $test_tos ttl $test_ttl \
+ $vxlan $geneve
+ fi
+
+ # Bring L2-tunnel link up and create VLAN on top
+ ip link set tep0 up
+ $ns ip link set tep1 up
+ ip addr flush dev tep0
+ $ns ip addr flush dev tep1
+ local parent
+ if $vlan; then
+ parent="vlan99-"
+ ip link add link tep0 name ${parent}0 type vlan id 99
+ $ns ip link add link tep1 name ${parent}1 type vlan id 99
+ ip link set ${parent}0 up
+ $ns ip link set ${parent}1 up
+ ip addr flush dev ${parent}0
+ $ns ip addr flush dev ${parent}1
+ else
+ parent="tep"
+ fi
+
+ # Assign inner IPv4/IPv6 addresses
+ if [ "$inner" = "4" ] || [ "$inner" = "other" ]; then
+ ip addr add 198.19.0.1/24 brd + dev ${parent}0
+ $ns ip addr add 198.19.0.2/24 brd + dev ${parent}1
+ elif [ "$inner" = "6" ]; then
+ ip addr add fdd4:96cf:4eae:443b::1/64 dev ${parent}0
+ $ns ip addr add fdd4:96cf:4eae:443b::2/64 dev ${parent}1
+ fi
+}
+
+verify() {
+ local outer="$1"
+ local inner="$2"
+ local tos_ttl="$3"
+ local vlan="$4"
+
+ local ping_pid out captured_tos captured_ttl result
+
+ local ping_dst
+ if [ "$inner" = "4" ]; then
+ ping_dst="198.19.0.2"
+ elif [ "$inner" = "6" ]; then
+ ping_dst="fdd4:96cf:4eae:443b::2"
+ elif [ "$inner" = "other" ]; then
+ ping_dst="198.19.0.3" # Generates ARPs which are not IPv4/IPv6
+ fi
+ if [ "$tos_ttl" = "inherit" ]; then
+ ping -i 0.1 $ping_dst -Q "$expected_tos" -t "$expected_ttl" \
+ 2>/dev/null 1>&2 & ping_pid="$!"
+ else
+ ping -i 0.1 $ping_dst 2>/dev/null 1>&2 & ping_pid="$!"
+ fi
+ local tunnel_type_offset tunnel_type_proto req_proto_offset req_offset
+ if [ "$type" = "gre" ]; then
+ tunnel_type_proto="0x2f"
+ elif [ "$type" = "vxlan" ] || [ "$type" = "geneve" ]; then
+ tunnel_type_proto="0x11"
+ fi
+ if [ "$outer" = "4" ]; then
+ tunnel_type_offset="9"
+ if [ "$inner" = "4" ]; then
+ req_proto_offset="47"
+ req_offset="58"
+ if [ "$type" = "vxlan" ] || [ "$type" = "geneve" ]; then
+ req_proto_offset="$((req_proto_offset + 12))"
+ req_offset="$((req_offset + 12))"
+ fi
+ if $vlan; then
+ req_proto_offset="$((req_proto_offset + 4))"
+ req_offset="$((req_offset + 4))"
+ fi
+ out="$(tcpdump --immediate-mode -p -c 1 -v -i veth0 -n \
+ ip[$tunnel_type_offset] = $tunnel_type_proto and \
+ ip[$req_proto_offset] = 0x01 and \
+ ip[$req_offset] = 0x08 2>/dev/null | head -n 1)"
+ elif [ "$inner" = "6" ]; then
+ req_proto_offset="44"
+ req_offset="78"
+ if [ "$type" = "vxlan" ] || [ "$type" = "geneve" ]; then
+ req_proto_offset="$((req_proto_offset + 12))"
+ req_offset="$((req_offset + 12))"
+ fi
+ if $vlan; then
+ req_proto_offset="$((req_proto_offset + 4))"
+ req_offset="$((req_offset + 4))"
+ fi
+ out="$(tcpdump --immediate-mode -p -c 1 -v -i veth0 -n \
+ ip[$tunnel_type_offset] = $tunnel_type_proto and \
+ ip[$req_proto_offset] = 0x3a and \
+ ip[$req_offset] = 0x80 2>/dev/null | head -n 1)"
+ elif [ "$inner" = "other" ]; then
+ req_proto_offset="36"
+ req_offset="45"
+ if [ "$type" = "vxlan" ] || [ "$type" = "geneve" ]; then
+ req_proto_offset="$((req_proto_offset + 12))"
+ req_offset="$((req_offset + 12))"
+ fi
+ if $vlan; then
+ req_proto_offset="$((req_proto_offset + 4))"
+ req_offset="$((req_offset + 4))"
+ fi
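+	# ARP carries no TOS/TTL to inherit, so expect the defaults
+	# (TOS 0x00, TTL 64) in the outer header.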
+ if [ "$tos_ttl" = "inherit" ]; then
+ expected_tos="0x00"
+ expected_ttl="64"
+ fi
+ out="$(tcpdump --immediate-mode -p -c 1 -v -i veth0 -n \
+ ip[$tunnel_type_offset] = $tunnel_type_proto and \
+ ip[$req_proto_offset] = 0x08 and \
+ ip[$((req_proto_offset + 1))] = 0x06 and \
+ ip[$req_offset] = 0x01 2>/dev/null | head -n 1)"
+ fi
+ elif [ "$outer" = "6" ]; then
+ if [ "$type" = "gre" ]; then
+ tunnel_type_offset="40"
+ elif [ "$type" = "vxlan" ] || [ "$type" = "geneve" ]; then
+ tunnel_type_offset="6"
+ fi
+ if [ "$inner" = "4" ]; then
+ local req_proto_offset="75"
+ local req_offset="86"
+ if [ "$type" = "vxlan" ] || [ "$type" = "geneve" ]; then
+ req_proto_offset="$((req_proto_offset + 4))"
+ req_offset="$((req_offset + 4))"
+ fi
+ if $vlan; then
+ req_proto_offset="$((req_proto_offset + 4))"
+ req_offset="$((req_offset + 4))"
+ fi
+ out="$(tcpdump --immediate-mode -p -c 1 -v -i veth0 -n \
+ ip6[$tunnel_type_offset] = $tunnel_type_proto and \
+ ip6[$req_proto_offset] = 0x01 and \
+ ip6[$req_offset] = 0x08 2>/dev/null | head -n 1)"
+ elif [ "$inner" = "6" ]; then
+ local req_proto_offset="72"
+ local req_offset="106"
+ if [ "$type" = "vxlan" ] || [ "$type" = "geneve" ]; then
+ req_proto_offset="$((req_proto_offset + 4))"
+ req_offset="$((req_offset + 4))"
+ fi
+ if $vlan; then
+ req_proto_offset="$((req_proto_offset + 4))"
+ req_offset="$((req_offset + 4))"
+ fi
+ out="$(tcpdump --immediate-mode -p -c 1 -v -i veth0 -n \
+ ip6[$tunnel_type_offset] = $tunnel_type_proto and \
+ ip6[$req_proto_offset] = 0x3a and \
+ ip6[$req_offset] = 0x80 2>/dev/null | head -n 1)"
+ elif [ "$inner" = "other" ]; then
+ local req_proto_offset="64"
+ local req_offset="73"
+ if [ "$type" = "vxlan" ] || [ "$type" = "geneve" ]; then
+ req_proto_offset="$((req_proto_offset + 4))"
+ req_offset="$((req_offset + 4))"
+ fi
+ if $vlan; then
+ req_proto_offset="$((req_proto_offset + 4))"
+ req_offset="$((req_offset + 4))"
+ fi
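+	# Non-IP payload again: nothing to inherit, so expect default traffic
+	# class 0x00 and hop limit 64 in the outer IPv6 header.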
+ if [ "$tos_ttl" = "inherit" ]; then
+ expected_tos="0x00"
+ expected_ttl="64"
+ fi
+ out="$(tcpdump --immediate-mode -p -c 1 -v -i veth0 -n \
+ ip6[$tunnel_type_offset] = $tunnel_type_proto and \
+ ip6[$req_proto_offset] = 0x08 and \
+ ip6[$((req_proto_offset + 1))] = 0x06 and \
+ ip6[$req_offset] = 0x01 2>/dev/null | head -n 1)"
+ fi
+ fi
+ kill -9 $ping_pid
+ wait $ping_pid 2>/dev/null
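+	# Parse the captured outer TOS/TTL (IPv4) or traffic class/hop limit
+	# (IPv6) from tcpdump's verbose output and compare with the expected
+	# values.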
+ result="FAIL"
+ if [ "$outer" = "4" ]; then
+ captured_ttl="$(get_field "ttl" "$out")"
+ captured_tos="$(printf "0x%02x" "$(get_field "tos" "$out")")"
+ if [ "$captured_tos" = "$expected_tos" ] &&
+ [ "$captured_ttl" = "$expected_ttl" ]; then
+ result="OK"
+ fi
+ elif [ "$outer" = "6" ]; then
+ captured_ttl="$(get_field "hlim" "$out")"
+ captured_tos="$(printf "0x%02x" "$(get_field "class" "$out")")"
+ if [ "$captured_tos" = "$expected_tos" ] &&
+ [ "$captured_ttl" = "$expected_ttl" ]; then
+ result="OK"
+ fi
+ fi
+
+ printf "%7s │\n" "$result"
+ if [ "$result" = "FAIL" ]; then
+ failed=true
+ if [ "$captured_tos" != "$expected_tos" ]; then
+ printf "│%43s%27s │\n" \
+ "Expected TOS value: $expected_tos" \
+ "Captured TOS value: $captured_tos"
+ fi
+ if [ "$captured_ttl" != "$expected_ttl" ]; then
+ printf "│%43s%27s │\n" \
+ "Expected TTL value: $expected_ttl" \
+ "Captured TTL value: $captured_ttl"
+ fi
+ printf "│%71s│\n" " "
+ fi
+}
+
+cleanup() {
+ ip link del veth0 2>/dev/null
+ ip netns del testing 2>/dev/null
+ ip link del tep0 2>/dev/null
+}
+
+printf "┌────────┬───────┬───────┬──────────────┬"
+printf "──────────────┬───────┬────────┐\n"
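+# Run the full matrix: {gre, vxlan, geneve} x outer {IPv4, IPv6} x inner
+# {IPv4, IPv6, ARP} x tos/ttl {inherit, random} x {no VLAN, VLAN}.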
+for type in gre vxlan geneve; do
+	if ! modprobe "$type" 2>/dev/null; then
+ continue
+ fi
+ for outer in 4 6; do
+ printf "├────────┼───────┼───────┼──────────────┼"
+ printf "──────────────┼───────┼────────┤\n"
+		printf "│  Type  │ outer │ inner │     tos      │"
+ printf " ttl │ vlan │ result │\n"
+ for inner in 4 6 other; do
+ printf "├────────┼───────┼───────┼──────────────┼"
+ printf "──────────────┼───────┼────────┤\n"
+ for tos_ttl in inherit random; do
+ for vlan in false true; do
+ setup "$type" "$outer" "$inner" \
+ "$tos_ttl" "$vlan"
+ verify "$outer" "$inner" "$tos_ttl" \
+ "$vlan"
+ cleanup
+ done
+ done
+ done
+ done
+done
+printf "└────────┴───────┴───────┴──────────────┴"
+printf "──────────────┴───────┴────────┘\n"
+
+if $failed; then
+ exit 1
+fi
--
2.35.1
This series is based on torvalds/master.
The series is split up like so:
- Patch 1 is a simple fixup which we should take in any case (even by itself).
- Patches 2-5 add the feature, configurable selftest support, and docs.
Why not ...?
============
- Why not /proc/[pid]/userfaultfd? Two main points (additional discussion [1]):
- /proc/[pid]/* files are all owned by the user/group of the process, and
they don't really support chmod/chown. So, without extending procfs it
doesn't solve the problem this series is trying to solve.
- The main argument *for* this was to support creating UFFDs for remote
processes. But, that use case clearly calls for CAP_SYS_PTRACE, so to
support this we could just use the UFFD syscall as-is.
- Why not use a syscall? Access to syscalls is generally controlled by
  capabilities. We don't have a capability which covers userfaultfd access
  without also granting broader permissions, and adding a new capability was
  rejected [2].
- It's possible an LSM could be used to control access instead, but I have
  some concerns. I don't think this approach would be as easy to use,
  particularly if we were to try to solve this with something heavyweight
  like SELinux. Maybe we could pursue adding a new LSM specifically for
  this use case, but it may be too narrow of a case to justify that. (A
  minimal usage sketch of the /dev/userfaultfd approach follows this list.)
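To make the intended usage concrete, here is a minimal, untested sketch of
how a program would obtain a userfaultfd from the device node. It assumes
the ioctl added in patch 2 is named USERFAULTFD_IOC_NEW and takes the same
flags as userfaultfd(2); with this model, granting access is just plain
chown/chmod on /dev/userfaultfd:

  /* Illustrative only; USERFAULTFD_IOC_NEW and UFFD_USER_MODE_ONLY are
   * assumed to be provided by <linux/userfaultfd.h> with this series.
   */
  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <unistd.h>
  #include <linux/userfaultfd.h>

  int main(void)
  {
          /* Opening the node is the access check. */
          int dev = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);

          if (dev < 0) {
                  perror("open /dev/userfaultfd");
                  return 1;
          }

          /* The ioctl argument carries the usual userfaultfd(2) flags
           * and the returned fd is an ordinary userfaultfd.
           */
          int uffd = ioctl(dev, USERFAULTFD_IOC_NEW,
                           O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);

          if (uffd < 0) {
                  perror("USERFAULTFD_IOC_NEW");
                  return 1;
          }
          close(dev);

          /* ... UFFDIO_API handshake and UFFDIO_REGISTER as usual ... */
          close(uffd);
          return 0;
  }

The selftest changes in patch 3 exercise this path alongside the plain
userfaultfd(2) syscall.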
Changelog
=========
v6->v7:
- Handle misc_register() failure properly by propagating the error instead of
  just WARN_ON-ing. [Greg]
- Remove no-op open function from file_operations, since its caller detects
the lack of an open implementation and proceeds normally anyway. [Greg]
v5->v6:
- Modified selftest to exit with KSFT_SKIP *only* when features are
unsupported, exiting with 1 in other error cases. [Mike]
- Improved wording in two spots in the documentation. [Mike]
- Picked up some Acked-by's.
v4->v5:
- Call userfaultfd_syscall_allowed() directly in the syscall, so we don't
have to plumb a flag into new_userfaultfd(). [Nadav]
- Refactored run_vmtests.sh to loop over UFFD test mods. [Nadav]
- Reworded cover letter.
- Picked up some Acked-by's.
v3->v4:
- Picked up an Acked-by on 5/5.
- Updated cover letter to cover "why not ...".
- Refactored userfaultfd_allowed() into userfaultfd_syscall_allowed(). [Peter]
- Removed obsolete comment from a previous version. [Peter]
- Refactored userfaultfd_open() in selftest. [Peter]
- Reworded admin-guide documentation. [Mike, Peter]
- Squashed 2 commits adding /dev/userfaultfd to selftest and making selftest
configurable. [Peter]
- Added "syscall" test modifier (the default behavior) to selftest. [Peter]
v2->v3:
- Rebased onto linux-next/akpm-base, in order to be based on top of the
run_vmtests.sh refactor which was merged previously.
- Picked up some Reviewed-by's.
- Fixed ioctl definition (_IO instead of _IOWR), and stopped using
compat_ptr_ioctl since it is unneeded for ioctls which don't take a pointer.
- Removed the "handle_kernel_faults" bool, simplifying the code. The result is
logically equivalent, but simpler.
- Fixed userfaultfd selftest so it returns KSFT_SKIP appropriately.
- Reworded documentation per Shuah's feedback on v2.
- Improved example usage for userfaultfd selftest.
v1->v2:
- Add documentation update.
- Test *both* userfaultfd(2) and /dev/userfaultfd via the selftest.
[1]: https://patchwork.kernel.org/project/linux-mm/cover/20220719195628.3415852-…
[2]: https://lore.kernel.org/lkml/686276b9-4530-2045-6bd8-170e5943abe4@schaufler…
Axel Rasmussen (5):
selftests: vm: add hugetlb_shared userfaultfd test to run_vmtests.sh
userfaultfd: add /dev/userfaultfd for fine grained access control
userfaultfd: selftests: modify selftest to use /dev/userfaultfd
userfaultfd: update documentation to describe /dev/userfaultfd
selftests: vm: add /dev/userfaultfd test cases to run_vmtests.sh
Documentation/admin-guide/mm/userfaultfd.rst | 41 ++++++++++-
Documentation/admin-guide/sysctl/vm.rst | 3 +
fs/userfaultfd.c | 71 +++++++++++++-----
include/uapi/linux/userfaultfd.h | 4 ++
tools/testing/selftests/vm/run_vmtests.sh | 15 ++--
tools/testing/selftests/vm/userfaultfd.c | 76 +++++++++++++++++---
6 files changed, 176 insertions(+), 34 deletions(-)
--
2.37.1.595.g718a3a8f04-goog
Fix the comment to accurately describe the test and recently added
SYSTEM_SUSPEND test case.
What was once psci_cpu_on_test was renamed and extended to squeeze in a
test case for PSCI SYSTEM_SUSPEND. Nonetheless, the author of those
changes (whoever they may be...) failed to update the file comment to
reflect what had changed.
Reported-by: Reiji Watanabe <reijiw(a)google.com>
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
---
tools/testing/selftests/kvm/aarch64/psci_test.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/aarch64/psci_test.c b/tools/testing/selftests/kvm/aarch64/psci_test.c
index f7621f6e938e..8a77bd06427c 100644
--- a/tools/testing/selftests/kvm/aarch64/psci_test.c
+++ b/tools/testing/selftests/kvm/aarch64/psci_test.c
@@ -1,12 +1,14 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
- * psci_cpu_on_test - Test that the observable state of a vCPU targeted by the
- * CPU_ON PSCI call matches what the caller requested.
+ * psci_test - Tests relating to KVM's PSCI implementation.
*
* Copyright (c) 2021 Google LLC.
*
- * This is a regression test for a race between KVM servicing the PSCI call and
- * userspace reading the vCPUs registers.
+ * This test includes:
+ * - A regression test for a race between KVM servicing the PSCI CPU_ON call
+ * and userspace reading the targeted vCPU's registers.
+ * - A test for KVM's handling of PSCI SYSTEM_SUSPEND and the associated
+ * KVM_EXIT_SYSTEM_SUSPEND UAPI.
*/
#define _GNU_SOURCE
base-commit: 568035b01cfb107af8d2e4bd2fb9aea22cf5b868
--
2.37.1.595.g718a3a8f04-goog
This creates a test collection in drivers/net/bonding for bonding
specific kernel selftests.
The first test is a reproducer that provisions a bond and, because of the
specific order in which the ip-link(8) commands are issued, the bond never
transmits an LACPDU frame on any of its slaves.
Signed-off-by: Jonathan Toppins <jtoppins(a)redhat.com>
---
Notes:
v2:
* fully integrated the test into the kselftests infrastructure
* moved the reproducer to under
tools/testing/selftests/drivers/net/bonding
* reduced the test to its minimal form and used ip-link(8) for
all bond interface configuration
v3:
* rebase to latest net/master
* remove `#set -x` requested by Hangbin
v4:
* no changes
v5:
* no changes
MAINTAINERS | 1 +
tools/testing/selftests/Makefile | 1 +
.../selftests/drivers/net/bonding/Makefile | 6 ++
.../net/bonding/bond-break-lacpdu-tx.sh | 81 +++++++++++++++++++
.../selftests/drivers/net/bonding/config | 1 +
.../selftests/drivers/net/bonding/settings | 1 +
6 files changed, 91 insertions(+)
create mode 100644 tools/testing/selftests/drivers/net/bonding/Makefile
create mode 100755 tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh
create mode 100644 tools/testing/selftests/drivers/net/bonding/config
create mode 100644 tools/testing/selftests/drivers/net/bonding/settings
diff --git a/MAINTAINERS b/MAINTAINERS
index f2d64020399b..e5fb14dc302d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3672,6 +3672,7 @@ F: Documentation/networking/bonding.rst
F: drivers/net/bonding/
F: include/net/bond*
F: include/uapi/linux/if_bonding.h
+F: tools/testing/selftests/drivers/net/bonding/
BOSCH SENSORTEC BMA400 ACCELEROMETER IIO DRIVER
M: Dan Robertson <dan(a)dlrobertson.com>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 10b34bb03bc1..c2064a35688b 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -12,6 +12,7 @@ TARGETS += cpu-hotplug
TARGETS += damon
TARGETS += drivers/dma-buf
TARGETS += drivers/s390x/uvdevice
+TARGETS += drivers/net/bonding
TARGETS += efivarfs
TARGETS += exec
TARGETS += filesystems
diff --git a/tools/testing/selftests/drivers/net/bonding/Makefile b/tools/testing/selftests/drivers/net/bonding/Makefile
new file mode 100644
index 000000000000..ab6c54b12098
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/bonding/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+# Makefile for net selftests
+
+TEST_PROGS := bond-break-lacpdu-tx.sh
+
+include ../../../lib.mk
diff --git a/tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh b/tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh
new file mode 100755
index 000000000000..47ab90596acb
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh
@@ -0,0 +1,81 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+
+# Regression Test:
+# Verify LACPDUs get transmitted after setting the MAC address of
+# the bond.
+#
+# https://bugzilla.redhat.com/show_bug.cgi?id=2020773
+#
+# +---------+
+# | fab-br0 |
+# +---------+
+# |
+# +---------+
+# | fbond |
+# +---------+
+# | |
+# +------+ +------+
+# |veth1 | |veth2 |
+# +------+ +------+
+#
+# We use veths instead of physical interfaces
+
+set -e
+tmp=$(mktemp -q dump.XXXXXX)
+cleanup() {
+ ip link del fab-br0 >/dev/null 2>&1 || :
+ ip link del fbond >/dev/null 2>&1 || :
+ ip link del veth1-bond >/dev/null 2>&1 || :
+ ip link del veth2-bond >/dev/null 2>&1 || :
+ modprobe -r bonding >/dev/null 2>&1 || :
+ rm -f -- ${tmp}
+}
+
+trap cleanup 0 1 2
+cleanup
+sleep 1
+
+# create the bridge
+ip link add fab-br0 address 52:54:00:3B:7C:A6 mtu 1500 type bridge \
+ forward_delay 15
+
+# create the bond
+ip link add fbond type bond mode 4 miimon 200 xmit_hash_policy 1 \
+ ad_actor_sys_prio 65535 lacp_rate fast
+
+# set bond address
+ip link set fbond address 52:54:00:3B:7C:A6
+ip link set fbond up
+
+# set again bond sysfs parameters
+ip link set fbond type bond ad_actor_sys_prio 65535
+
+# create veths
+ip link add name veth1-bond type veth peer name veth1-end
+ip link add name veth2-bond type veth peer name veth2-end
+
+# add ports
+ip link set fbond master fab-br0
+ip link set veth1-bond down master fbond
+ip link set veth2-bond down master fbond
+
+# bring up
+ip link set veth1-end up
+ip link set veth2-end up
+ip link set fab-br0 up
+ip link set fbond up
+ip addr add dev fab-br0 10.0.0.3
+
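+# LACPDUs are Slow Protocols frames (ethertype 0x8809); count how many
+# reach the peer end of one of the bond's slaves.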
+tcpdump -n -i veth1-end -e ether proto 0x8809 >${tmp} 2>&1 &
+sleep 15
+pkill tcpdump >/dev/null 2>&1
+rc=0
+num=$(grep "packets captured" ${tmp} | awk '{print $1}')
+if test "$num" -gt 0; then
+ echo "PASS, captured ${num}"
+else
+ echo "FAIL"
+ rc=1
+fi
+exit $rc
diff --git a/tools/testing/selftests/drivers/net/bonding/config b/tools/testing/selftests/drivers/net/bonding/config
new file mode 100644
index 000000000000..dc1c22de3c92
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/bonding/config
@@ -0,0 +1 @@
+CONFIG_BONDING=y
diff --git a/tools/testing/selftests/drivers/net/bonding/settings b/tools/testing/selftests/drivers/net/bonding/settings
new file mode 100644
index 000000000000..867e118223cd
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/bonding/settings
@@ -0,0 +1 @@
+timeout=60
--
2.31.1