Currently the test setup does not support running nolibc-test built with
LLVM in qemu-system. Enable this.
FYI, sparc32 on LLVM seems to be broken at the moment. To me this looks
like a LLVM regression, emitting invalid object code.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (3):
selftests/nolibc: deduplicate invocations of toplevel Makefile
selftests/nolibc: don't pass CC to toplevel Makefile
selftests/nolibc: always compile the kernel with GCC
tools/testing/selftests/nolibc/Makefile.nolibc | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
---
base-commit: b9e50363178a40c76bebaf2f00faa2b0b6baf8d1
change-id: 20250719-nolibc-llvm-system-311762b62829
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
This patch series refactors all futex selftests to use
kselftest_harness.h instead of futex's logging.h, as discussed here [1].
This allows to remove a lot of boilerplate code and to simplify some
parts of the test logic, mainly when the test needs to exit early. The
result of this is more than 500 lines removed from
tools/testing/selftests/futex/. Also, this enables new tests to use
kselftest.h features like ASSERT_s and such.
There are some caveats around this refactor:
- logging.h had verbosity levels, while kselftest_harness.h doesn't. I
created a new print function called ksft_print_dbg_msg() that prints
the message if the user uses the -d flag, so now there's an
equivalent of this feature.
- futex_requeue_pi test accepted command line arguments to be used as
test parameters (e.g. ./futex_requeue_pi -b -l -t 500000). This
doesn't work with kselftest_harness.h because there's no
straightforward way to send command line arguments to the test.
I used FIXTURE_VARIANT() to achieve the same result, but now the
parameters live inside of the test file, instead of on
functional/run.sh. This increased a little bit the number of test
cases for futex_requeue_pi, from 22 to 24.
- test_harness_run() calls mmap(MAP_SHARED) before running the test and
this has caused a side effect on test futex_numa_mpol.c. This test
also calls mmap() and then try to access an address out of
boundaries of this mapped memory for a "Memory out of range" subtest,
where the kernel should return -EACCESS. After the refactor, the test
address might be fall inside the first memory mapped region, thus
being a valid address and succeeding the syscall, making the test
fail. To fix that, I created a small "buffer zone" with
mmap(PROT_NONE) between both mmaps.
I have compared the results of run.sh before and after this patchset and
didn't find any regression from the test results.
Thanks,
André
[1] https://lore.kernel.org/lkml/87ecv6p364.ffs@tglx/
---
Changes in v2:
- Rebased on top of tip/master
- Dropped priv_hash global test variant now that this feature was
dropped
- Added include <stdbool.h> in the first patch
- Link to v1: https://lore.kernel.org/r/20250704-tonyk-robust_test_cleanup-v1-0-c0ff4f24c…
---
André Almeida (15):
selftests: kselftest: Create ksft_print_dbg_msg()
selftests/futex: Refactor futex_requeue_pi with kselftest_harness.h
selftests/futex: Refactor futex_requeue_pi_mismatched_ops with kselftest_harness.h
selftests/futex: Refactor futex_requeue_pi_signal_restart with kselftest_harness.h
selftests/futex: Refactor futex_wait_timeout with kselftest_harness.h
selftests/futex: Refactor futex_wait_wouldblock with kselftest_harness.h
selftests/futex: Refactor futex_wait_unitialized_heap with kselftest_harness.h
selftests/futex: Refactor futex_wait_private_mapped_file with kselftest_harness.h
selftests/futex: Refactor futex_wait with kselftest_harness.h
selftests/futex: Refactor futex_requeue with kselftest_harness.h
selftests/futex: Refactor futex_waitv with kselftest_harness.h
selftests/futex: Refactor futex_priv_hash with kselftest_harness.h
selftests/futex: Refactor futex_numa_mpol with kselftest_harness.h
selftests/futex: Drop logging.h include from futex_numa
selftests/futex: Remove logging.h file
tools/testing/selftests/futex/functional/Makefile | 3 +-
.../selftests/futex/functional/futex_numa.c | 3 +-
.../selftests/futex/functional/futex_numa_mpol.c | 57 ++---
.../selftests/futex/functional/futex_priv_hash.c | 49 +---
.../selftests/futex/functional/futex_requeue.c | 76 ++----
.../selftests/futex/functional/futex_requeue_pi.c | 261 ++++++++++-----------
.../functional/futex_requeue_pi_mismatched_ops.c | 80 ++-----
.../functional/futex_requeue_pi_signal_restart.c | 129 +++-------
.../selftests/futex/functional/futex_wait.c | 103 +++-----
.../functional/futex_wait_private_mapped_file.c | 83 ++-----
.../futex/functional/futex_wait_timeout.c | 139 +++++------
.../functional/futex_wait_uninitialized_heap.c | 76 ++----
.../futex/functional/futex_wait_wouldblock.c | 75 ++----
.../selftests/futex/functional/futex_waitv.c | 98 ++++----
tools/testing/selftests/futex/functional/run.sh | 62 +----
tools/testing/selftests/futex/include/logging.h | 148 ------------
tools/testing/selftests/kselftest.h | 14 ++
tools/testing/selftests/kselftest_harness.h | 13 +-
18 files changed, 465 insertions(+), 1004 deletions(-)
---
base-commit: ed0272f0675f31642c3d445a596b544de9db405b
change-id: 20250703-tonyk-robust_test_cleanup-d1f3406365d9
Best regards,
--
André Almeida <andrealmeid(a)igalia.com>
This patch series introduces LANDLOCK_SCOPE_MEMFD_EXEC, a new Landlock
scoping mechanism that restricts execution of anonymous memory file
descriptors (memfd) created via memfd_create(2). This addresses security
gaps where processes can bypass W^X policies and execute arbitrary code
through anonymous memory objects.
Fixes: https://github.com/landlock-lsm/linux/issues/37
SECURITY PROBLEM
================
Current Landlock filesystem restrictions do not cover memfd objects,
allowing processes to:
1. Read-to-execute bypass: Create writable memfd, inject code,
then execute via mmap(PROT_EXEC) or direct execve()
2. Anonymous execution: Execute code without touching the filesystem via
execve("/proc/self/fd/N") where N is a memfd descriptor
3. Cross-domain access violations: Pass memfd between processes to
bypass domain restrictions
These scenarios can occur in sandboxed environments where filesystem
access is restricted but memfd creation remains possible.
IMPLEMENTATION
==============
The implementation adds hierarchical execution control through domain
scoping:
Core Components:
- is_memfd_file(): Reliable memfd detection via "memfd:" dentry prefix
- domain_is_scoped(): Cross-domain hierarchy checking (moved to domain.c)
- LSM hooks: mmap_file, file_mprotect, bprm_creds_for_exec
- Creation-time restrictions: hook_file_alloc_security
Security Matrix:
Execution decisions follow domain hierarchy rules preventing both
same-domain bypass attempts and cross-domain access violations while
preserving legitimate hierarchical access patterns.
Domain Hierarchy with LANDLOCK_SCOPE_MEMFD_EXEC:
===============================================
Root (no domain) - No restrictions
|
+-- Domain A [SCOPE_MEMFD_EXEC] Layer 1
| +-- memfd_A (tagged with Domain A as creator)
| |
| +-- Domain A1 (child) [NO SCOPE] Layer 2
| | +-- Inherits Layer 1 restrictions from parent
| | +-- memfd_A1 (can create, inherits restrictions)
| | +-- Domain A1a [SCOPE_MEMFD_EXEC] Layer 3
| | +-- memfd_A1a (tagged with Domain A1a)
| |
| +-- Domain A2 (child) [SCOPE_MEMFD_EXEC] Layer 2
| +-- memfd_A2 (tagged with Domain A2 as creator)
| +-- CANNOT access memfd_A1 (different subtree)
|
+-- Domain B [SCOPE_MEMFD_EXEC] Layer 1
+-- memfd_B (tagged with Domain B as creator)
+-- CANNOT access ANY memfd from Domain A subtree
Execution Decision Matrix:
========================
Executor-> | A | A1 | A1a | A2 | B | Root
Creator | | | | | |
------------|-----|----|-----|----|----|-----
Domain A | X | X | X | X | X | Y
Domain A1 | Y | X | X | X | X | Y
Domain A1a | Y | Y | X | X | X | Y
Domain A2 | Y | X | X | X | X | Y
Domain B | X | X | X | X | X | Y
Root | Y | Y | Y | Y | Y | Y
Legend: Y = Execution allowed, X = Execution denied
Scenarios Covered:
- Direct mmap(PROT_EXEC) on memfd files
- Two-stage mmap(PROT_READ) + mprotect(PROT_EXEC) bypass attempts
- execve("/proc/self/fd/N") anonymous execution
- execveat() and fexecve() file descriptor execution
- Cross-process memfd inheritance and IPC passing
TESTING
=======
All patches have been validated with:
- scripts/checkpatch.pl --strict (clean)
- Selftests covering same-domain restrictions, cross-domain
hierarchy enforcement, and regular file isolation
- KUnit tests for memfd detection edge cases
DISCLAIMER
==========
My understanding of Landlock scoping semantics may be limited, but this
implementation reflects my current understanding based on available
documentation and code analysis. I welcome feedback and corrections
regarding the scoping logic and domain hierarchy enforcement.
Signed-off-by: Abhinav Saxena <xandfury(a)gmail.com>
---
Abhinav Saxena (4):
landlock: add LANDLOCK_SCOPE_MEMFD_EXEC scope
landlock: implement memfd detection
landlock: add memfd exec LSM hooks and scoping
selftests/landlock: add memfd execution tests
include/uapi/linux/landlock.h | 5 +
security/landlock/.kunitconfig | 1 +
security/landlock/audit.c | 4 +
security/landlock/audit.h | 1 +
security/landlock/cred.c | 14 -
security/landlock/domain.c | 67 ++++
security/landlock/domain.h | 4 +
security/landlock/fs.c | 405 ++++++++++++++++++++-
security/landlock/limits.h | 2 +-
security/landlock/task.c | 67 ----
.../selftests/landlock/scoped_memfd_exec_test.c | 325 +++++++++++++++++
11 files changed, 812 insertions(+), 83 deletions(-)
---
base-commit: 5b74b2eff1eeefe43584e5b7b348c8cd3b723d38
change-id: 20250716-memfd-exec-ac0d582018c3
Best regards,
--
Abhinav Saxena <xandfury(a)gmail.com>
Reading /proc/pid/maps requires read-locking mmap_lock which prevents any
other task from concurrently modifying the address space. This guarantees
coherent reporting of virtual address ranges, however it can block
important updates from happening. Oftentimes /proc/pid/maps readers are
low priority monitoring tasks and them blocking high priority tasks
results in priority inversion.
Locking the entire address space is required to present fully coherent
picture of the address space, however even current implementation does not
strictly guarantee that by outputting vmas in page-size chunks and
dropping mmap_lock in between each chunk. Address space modifications are
possible while mmap_lock is dropped and userspace reading the content is
expected to deal with possible concurrent address space modifications.
Considering these relaxed rules, holding mmap_lock is not strictly needed
as long as we can guarantee that a concurrently modified vma is reported
either in its original form or after it was modified.
This patchset switches from holding mmap_lock while reading /proc/pid/maps
to taking per-vma locks as we walk the vma tree. This reduces the
contention with tasks modifying the address space because they would have
to contend for the same vma as opposed to the entire address space.
Previous version of this patchset [1] tried to perform /proc/pid/maps
reading under RCU, however its implementation is quite complex and the
results are worse than the new version because it still relied on
mmap_lock speculation which retries if any part of the address space gets
modified. New implementaion is both simpler and results in less
contention. Note that similar approach would not work for /proc/pid/smaps
reading as it also walks the page table and that's not RCU-safe.
Paul McKenney's designed a test [2] to measure mmap/munmap latencies while
concurrently reading /proc/pid/maps. The test has a pair of processes
scanning /proc/PID/maps, and another process unmapping and remapping 4K
pages from a 128MB range of anonymous memory. At the end of each 10
second run, the latency of each mmap() or munmap() operation is measured,
and for each run the maximum and mean latency is printed. The map/unmap
process is started first, its PID is passed to the scanners, and then the
map/unmap process waits until both scanners are running before starting
its timed test. The scanners keep scanning until the specified
/proc/PID/maps file disappears.
The latest results from Paul:
Stock mm-unstable, all of the runs had maximum latencies in excess of
0.5 milliseconds, and with 80% of the runs' latencies exceeding a full
millisecond, and ranging up beyond 4 full milliseconds. In contrast,
99% of the runs with this patch series applied had maximum latencies
of less than 0.5 milliseconds, with the single outlier at only 0.608
milliseconds.
From a median-performance (as opposed to maximum-latency) viewpoint,
this patch series also looks good, with stock mm weighing in at 11
microseconds and patch series at 6 microseconds, better than a 2x
improvement.
Before the change:
./run-proc-vs-map.sh --nsamples 100 --rawdata -- --busyduration 2
0.011 0.008 0.521
0.011 0.008 0.552
0.011 0.008 0.590
0.011 0.008 0.660
...
0.011 0.015 2.987
0.011 0.015 3.038
0.011 0.016 3.431
0.011 0.016 4.707
After the change:
./run-proc-vs-map.sh --nsamples 100 --rawdata -- --busyduration 2
0.006 0.005 0.026
0.006 0.005 0.029
0.006 0.005 0.034
0.006 0.005 0.035
...
0.006 0.006 0.421
0.006 0.006 0.423
0.006 0.006 0.439
0.006 0.006 0.608
The patchset also adds a number of tests to check for /proc/pid/maps data
coherency. They are designed to detect any unexpected data tearing while
performing some common address space modifications (vma split, resize and
remap). Even before these changes, reading /proc/pid/maps might have
inconsistent data because the file is read page-by-page with mmap_lock
being dropped between the pages. An example of user-visible inconsistency
can be that the same vma is printed twice: once before it was modified and
then after the modifications. For example if vma was extended, it might be
found and reported twice. What is not expected is to see a gap where there
should have been a vma both before and after modification. This patchset
increases the chances of such tearing, therefore it's even more important
now to test for unexpected inconsistencies.
In [3] Lorenzo identified the following possible vma merging/splitting
scenarios:
Merges with changes to existing vmas:
1 Merge both - mapping a vma over another one and between two vmas which
can be merged after this replacement;
2. Merge left full - mapping a vma at the end of an existing one and
completely over its right neighbor;
3. Merge left partial - mapping a vma at the end of an existing one and
partially over its right neighbor;
4. Merge right full - mapping a vma before the start of an existing one
and completely over its left neighbor;
5. Merge right partial - mapping a vma before the start of an existing one
and partially over its left neighbor;
Merges without changes to existing vmas:
6. Merge both - mapping a vma into a gap between two vmas which can be
merged after the insertion;
7. Merge left - mapping a vma at the end of an existing one;
8. Merge right - mapping a vma before the start end of an existing one;
Splits
9. Split with new vma at the lower address;
10. Split with new vma at the higher address;
If such merges or splits happen concurrently with the /proc/maps reading
we might report a vma twice, once before the modification and once after
it is modified:
Case 1 might report overwritten and previous vma along with the final
merged vma;
Case 2 might report previous and the final merged vma;
Case 3 might cause us to retry once we detect the temporary gap caused by
shrinking of the right neighbor;
Case 4 might report overritten and the final merged vma;
Case 5 might cause us to retry once we detect the temporary gap caused by
shrinking of the left neighbor;
Case 6 might report previous vma and the gap along with the final marged
vma;
Case 7 might report previous and the final merged vma;
Case 8 might report the original gap and the final merged vma covering the
gap;
Case 9 might cause us to retry once we detect the temporary gap caused by
shrinking of the original vma at the vma start;
Case 10 might cause us to retry once we detect the temporary gap caused by
shrinking of the original vma at the vma end;
In all these cases the retry mechanism prevents us from reporting possible
temporary gaps.
Changes since v7 [4]:
- Refactored tests to use kselftest harness, per David Hildenbrand and
Lorenzo Stoakes
- Removed PROCMAP_QUERY selftest, per David Hildenbrand and
Lorenzo Stoakes
- Added Acked-by, per David Hildenbrand
- Replaced sentinels values with named definitions, per Vlastimil Babka
- Added Reviewed-by, per Vlastimil Babka
!!! NOTES FOR APPLYING THE PATCHSET !!!
Applies cleanly over mm-unstable after reverting v7 version of this
patchset (from 94951ab6fe6f to e47914e6c28f in mm-unstable).
[1] https://lore.kernel.org/all/20250418174959.1431962-1-surenb@google.com/
[2] https://github.com/paulmckrcu/proc-mmap_sem-test
[3] https://lore.kernel.org/all/e1863f40-39ab-4e5b-984a-c48765ffde1c@lucifer.lo…
[4] https://lore.kernel.org/all/20250716030557.1547501-1-surenb@google.com/
Suren Baghdasaryan (6):
selftests/proc: add /proc/pid/maps tearing from vma split test
selftests/proc: extend /proc/pid/maps tearing test to include vma
resizing
selftests/proc: extend /proc/pid/maps tearing test to include vma
remapping
selftests/proc: add verbose mode for /proc/pid/maps tearing tests
fs/proc/task_mmu: remove conversion of seq_file position to unsigned
fs/proc/task_mmu: read proc/pid/maps under per-vma lock
fs/proc/internal.h | 5 +
fs/proc/task_mmu.c | 158 +++-
include/linux/mmap_lock.h | 11 +
mm/madvise.c | 3 +-
mm/mmap_lock.c | 93 +++
tools/testing/selftests/proc/.gitignore | 1 +
tools/testing/selftests/proc/Makefile | 1 +
tools/testing/selftests/proc/proc-maps-race.c | 741 ++++++++++++++++++
8 files changed, 997 insertions(+), 16 deletions(-)
create mode 100644 tools/testing/selftests/proc/proc-maps-race.c
--
2.50.0.727.gbf7dc18ff4-goog
Problem
=======
When host APEI is unable to claim synchronous external abort (SEA)
during stage-2 guest abort, today KVM directly injects an async SError
into the VCPU then resumes it. The injected SError usually results in
unpleasant guest kernel panic.
One of the major situation of guest SEA is when VCPU consumes recoverable
uncorrected memory error (UER), which is not uncommon at all in modern
datacenter servers with large amounts of physical memory. Although SError
and guest panic is sufficient to stop the propagation of corrupted memory
there is room to recover from an UER in a more graceful manner.
Proposed Solution
=================
Alternatively KVM can replay the SEA to the faulting VCPU, via existing
KVM_SET_VCPU_EVENTS API. If the memory poison consumption or the fault
that cause SEA is not from guest kernel, the blast radius can be limited
to the consuming or faulting guest userspace process, so the VM can keep
running.
In addition, instead of doing under the hood without involving userspace,
there are benefits to redirect the SEA to VMM:
- VM customers care about the disruptions caused by memory errors, and
VMM usually has the responsibility to start the process of notifying
the customers of memory error events in their VMs. For example some
cloud provider emits a critical log in their observability UI [1], and
provides playbook for customers on how to mitigate disruptions to
their workloads.
- VMM can protect future memory error consumption by unmapping the poisoned
pages from stage-2 page table with KVM userfault, or by splitting the
memslot that contains the poisoned guest pages [2].
- VMM can keep track of SEA events in the VM. When VMM thinks the status
on the host or the VM is bad enough, e.g. number of distinct SEAs
exceeds a threshold, it can restart the VM on another healthy host.
- Behavior parity with x86 architecture. When machine check exception
(MCE) is caused by VCPU, kernel or KVM signals userspace SIGBUS to
let VMM either recover from the MCE, or terminate itself with VM.
The prior RFC proposes to implement SIGBUS on arm64 as well, but
Marc preferred VCPU exit over signal [3]. However, implementation
aside, returning SEA to VMM is on par with returning MCE to VMM.
Once SEA is redirected to VMM, among other actions, VMM is encouraged
to inject external aborts into the faulting VCPU, which is already
supported by KVM on arm64. We notice injecting instruction abort is not
fully supported by KVM_SET_VCPU_EVENTS. Complement it in the patchset.
New UAPIs
=========
This patchset introduces following userspace-visiable changes to empower
VMM to control what happens next for SEA on guest memory:
- KVM_CAP_ARM_SEA_TO_USER. While taking SEA, if userspace has enabled
this new capability at VM creation, and the SEA is not caused by
memory allocated for stage-2 translation table, instead of injecting
SError, return KVM_EXIT_ARM_SEA to userspace.
- KVM_EXIT_ARM_SEA. This is the VM exit reason VMM gets. The details
about the SEA is provided in arm_sea as much as possible, including
sanitized ESR value at EL2, if guest virtual and physical addresses
(GPA and GVA) are available and the values if available.
- KVM_CAP_ARM_INJECT_EXT_IABT. VMM today can inject external data abort
to VCPU via KVM_SET_VCPU_EVENTS API. However, in case of instruction
abort, VMM cannot inject it via KVM_SET_VCPU_EVENTS.
KVM_CAP_ARM_INJECT_EXT_IABT is just a natural extend to
KVM_CAP_ARM_INJECT_EXT_DABT that tells VMM KVM_SET_VCPU_EVENTS now
supports external instruction abort.
* From v1 [4]:
- Rebased on commit 4d62121ce9b5 ("KVM: arm64: vgic-debug: Avoid
dereferencing NULL ITE pointer").
- Sanitize ESR_EL2 before reporting it to userspace.
- Do not do KVM_EXIT_ARM_SEA when SEA is caused by memory allocated to
stage-2 translation table.
[1] https://cloud.google.com/solutions/sap/docs/manage-host-errors
[2] https://lore.kernel.org/kvm/20250109204929.1106563-1-jthoughton@google.com
[3] https://lore.kernel.org/kvm/86pljbqqh0.wl-maz@kernel.org
[4] https://lore.kernel.org/kvm/20250505161412.1926643-1-jiaqiyan@google.com
Jiaqi Yan (5):
KVM: arm64: VM exit to userspace to handle SEA
KVM: arm64: Set FnV for VCPU when FAR_EL2 is invalid
KVM: selftests: Test for KVM_EXIT_ARM_SEA and KVM_CAP_ARM_SEA_TO_USER
KVM: selftests: Test for KVM_CAP_INJECT_EXT_IABT
Documentation: kvm: new uAPI for handling SEA
Raghavendra Rao Ananta (1):
KVM: arm64: Allow userspace to inject external instruction aborts
Documentation/virt/kvm/api.rst | 128 ++++++-
arch/arm64/include/asm/kvm_emulate.h | 67 ++++
arch/arm64/include/asm/kvm_host.h | 8 +
arch/arm64/include/asm/kvm_ras.h | 2 +-
arch/arm64/include/uapi/asm/kvm.h | 3 +-
arch/arm64/kvm/arm.c | 6 +
arch/arm64/kvm/guest.c | 13 +-
arch/arm64/kvm/inject_fault.c | 3 +
arch/arm64/kvm/mmu.c | 59 ++-
include/uapi/linux/kvm.h | 12 +
tools/arch/arm64/include/asm/esr.h | 2 +
tools/arch/arm64/include/uapi/asm/kvm.h | 3 +-
tools/testing/selftests/kvm/Makefile.kvm | 2 +
.../testing/selftests/kvm/arm64/inject_iabt.c | 98 +++++
.../testing/selftests/kvm/arm64/sea_to_user.c | 340 ++++++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 1 +
16 files changed, 718 insertions(+), 29 deletions(-)
create mode 100644 tools/testing/selftests/kvm/arm64/inject_iabt.c
create mode 100644 tools/testing/selftests/kvm/arm64/sea_to_user.c
--
2.49.0.1266.g31b7d2e469-goog
Hi ,
Interested in getting the GSX 2025 attendee list?
Expo Name: Global Security Exchange (GSX) 2025
Total Number of records: 17,000 records
List includes: Company Name, Contact Name, Job Title, Mailing Address, Phone, Emails, etc.
Are you considering buying these leads? If yes, I can send you the pricing information.
Awaiting your message
Regards
Ben Graham
Demand Generation Manager
US Marketing Data Inc.,
Please reply with REMOVE if you don't wish to receive further emails
"auto" was defined as a keyword back in the K&R days, but as a storage
type specifier. No one ever used it, since it was and is the default
storage type for local variables.
C++11 recycled the keyword to allow a type to be declared based on the
type of an initializer. This was finally adopted into standard C in
C23.
gcc and clang provide the "__auto_type" alias keyword as an extension
for pre-C23, however, there is no reason to pollute the bulk of the
source base with this temporary keyword; instead define "auto" as a
macro unless the compiler is running in C23+ mode.
This macro is added in <linux/compiler_types.h> because that header is
included in some of the tools headers, wheres <linux/compiler.h> is
not as it has a bunch of very kernel-specific things in it.
---
arch/nios2/include/asm/uaccess.h | 4 ++--
arch/x86/include/asm/bug.h | 2 +-
arch/x86/include/asm/string_64.h | 6 +++---
arch/x86/include/asm/uaccess_64.h | 2 +-
fs/proc/inode.c | 16 ++++++++--------
include/linux/cleanup.h | 4 ++--
include/linux/compiler.h | 2 +-
include/linux/compiler_types.h | 13 +++++++++++++
include/linux/minmax.h | 6 +++---
tools/testing/selftests/bpf/prog_tests/socket_helpers.h | 9 +++++++--
tools/virtio/linux/compiler.h | 2 +-
11 files changed, 42 insertions(+), 24 deletions(-)
Hi all,
I was starting to work on the memfd-exec[1] feature and observed that
Landlock's scoped-IPC features (abstract UNIX sockets and signals)
follow a consistent high-level model, which I'm calling a
resource-accessor pattern:
Resource Process <-> Accessor Process
- Resource process: owns or manages the asset
- socket creator (bind/accept)
- signal handler
- memfd creator
- Accessor process: attempts to use the asset
- socket client (connect/sendto)
- signal sender
- memfd executor
RESOURCE-ACCESSOR PATTERN FUNDAMENTALS
======================================
This pattern appears fundamental to Landlock scoping because:
1. Consistent enforcement model: Landlock restrictions are enforced
only on the accessor side; the resource side remains unconstrained
across all scope types.
2. Reflects actual security boundaries: In practice, sandboxed
processes typically need to access resources created by other
processes, not the reverse.
3. Scalable design: This model works consistently whether processes
are in parent-child relationships or independent peer domains.
4. Real-world usage patterns: Container runtimes and sandbox
orchestrators routinely start multiple workers that restrict
themselves independently.
CURRENT TEST COVERAGE GAP
=========================
Existing self-tests cover hierarchical resource <-> accessor pairs
but do not exercise the case where each task enters an independent
domain. While 'sibling_domain' tests exist, they still use
parent-child relationship patterns rather than true peer domains.
Current Coverage (Linear Hierarchies Only):
-------------------------------------------
Type 1: Parent-Child (scoped_domains)
P1 ---- P2
Type 2: Three Generations (scoped_vs_unscoped)
P1 ---- P2 ---- P3
Variations tested for both types:
- No domains
- Various scoped domain combinations
- Nested domains within inherited domains
- Mixed domain types (SCOPE vs OTHER vs NONE)
Missing Coverage (True Sibling Scenarios):
------------------------------------------
Root
|
+-- Child A [various domain types]
|
+-- Child B [various domain types]
Missing test scenarios:
- A <-> B cross-sibling communication
- Mixed sibling domain combinations
- Sibling isolation enforcement
- Parent -> A, Parent -> B differential access
SOLUTION
========
This series implements the missing sibling pattern using the
resource-accessor model. The tests create a fork tree that looks
like this:
coordinator (no domain)
|
+-- resource_proc (Domain X) /* owns the resource */
|
+-- accessor_proc (Domain Y) /* tries to access */
This directly addresses the missing coverage by creating two
independent child processes that establish peer domains, rather than
the hierarchical parent-child domains covered by existing tests.
Both children call landlock_restrict_self() for the first time, so
their struct landlock_domain->parent pointers are NULL, creating
true peer domains. The harness exposes four test variants:
Variant name | Resource domain | Accessor domain | Result
-------------------|-----------------|-----------------|----------
none_to_none | none | none | ALLOW
none_to_scoped | none | scoped | DENY
scoped_to_none | scoped | none | ALLOW
scoped_to_scoped | scoped | scoped (peer) | DENY
The scoped_to_scoped case was missing from current coverage.
TESTING
=======
All patches apply cleanly to v6.14-rc2 and pass on landlock/master.
The helpers are small and re-use the existing kselftest_harness.h
fixture/variant pattern. All patches have been validated with
scripts/checkpatch.pl --strict and show no warnings.
This series introduces **no kernel changes**, only selftests additions.
Feedback very welcome.
Thanks,
Abhinav
[1] https://github.com/landlock-lsm/linux/issues/37
Links:
- Landlock documentation: https://docs.kernel.org/userspace-api/landlock.html
- Landlock LSM kernel docs: https://docs.kernel.org/security/landlock.html
- Existing tests: tools/testing/selftests/landlock/scoped_*
Signed-off-by: Abhinav Saxena <xandfury(a)gmail.com>
---
Abhinav Saxena (3):
selftests/landlock: move sandbox_type to common
selftests/landlock: add cross-domain variants
selftests/landlock: add cross-domain signal tests
tools/testing/selftests/landlock/scoped_common.h | 7 +
.../landlock/scoped_cross_domain_variants.h | 54 +++++
.../landlock/scoped_multiple_domain_variants.h | 7 -
.../selftests/landlock/scoped_signal_test.c | 237 +++++++++++++++++++++
4 files changed, 298 insertions(+), 7 deletions(-)
---
base-commit: 5b74b2eff1eeefe43584e5b7b348c8cd3b723d38
change-id: 20250715-landlock_abstractions-dbc0aabf1063
Best regards,
--
Abhinav Saxena <xandfury(a)gmail.com>