The new KUnit module handling has KUnit test suites listed in a
.kunit_test_suites section of each module. This should be loaded when
the module is, but at the moment this only happens if KUnit is built-in.
Also load this when KUnit is enabled as a module: it'll not be usable
unless KUnit is loaded, but such modules are likely to depend on KUnit
anyway, so it's unlikely to ever be loaded needlessly.
Fixes: 3d6e44623841 ("kunit: unify module and builtin suite definitions")
Signed-off-by: David Gow <davidgow(a)google.com>
---
kernel/module/main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 324a770f789c..222a0b7263b6 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2094,7 +2094,7 @@ static int find_module_sections(struct module *mod, struct load_info *info)
sizeof(*mod->static_call_sites),
&mod->num_static_call_sites);
#endif
-#ifdef CONFIG_KUNIT
+#if IS_ENABLED(CONFIG_KUNIT)
mod->kunit_suites = section_objs(info, ".kunit_test_suites",
sizeof(*mod->kunit_suites),
&mod->num_kunit_suites);
--
2.37.0.144.g8ac04bfd2-goog
Hello,
This patch series implements a new syscall, process_memwatch. Currently,
only the support to watch soft-dirty PTE bit is added. This syscall is
generic to watch the memory of the process. There is enough room to add
more operations like this to watch memory in the future.
Soft-dirty PTE bit of the memory pages can be viewed by using pagemap
procfs file. The soft-dirty PTE bit for the memory in a process can be
cleared by writing to the clear_refs file. This series adds features that
weren't possible through the Proc FS interface.
- There is no atomic get soft-dirty PTE bit status and clear operation
possible.
- The soft-dirty PTE bit of only a part of memory cannot be cleared.
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The Proc FS interface is enough for that as I think the process
is frozen. We have the use case where we need to track the soft-dirty
PTE bit for running processes. We need this tracking and clear mechanism
of a region of memory while the process is running to emulate the
getWriteWatch() syscall of Windows. This syscall is used by games to keep
track of dirty pages and keep processing only the dirty pages. This
syscall can be used by the CRIU project and other applications which
require soft-dirty PTE bit information.
As in the current kernel there is no way to clear a part of memory (instead
of clearing the Soft-Dirty bits for the entire processi) and get+clear
operation cannot be performed atomically, there are other methods to mimic
this information entirely in userspace with poor performance:
- The mprotect syscall and SIGSEGV handler for bookkeeping
- The userfaultfd syscall with the handler for bookkeeping
long process_memwatch(int pidfd, unsigned long start, int len,
unsigned int flags, void *vec, int vec_len);
This syscall can be used by the CRIU project and other applications which
require soft-dirty PTE bit information. The following operations are
supported in this syscall:
- Get the pages that are soft-dirty.
- Clear the pages which are soft-dirty.
- The optional flag to ignore the VM_SOFTDIRTY and only track per page
soft-dirty PTE bit
There are two decisions which have been taken about how to get the output
from the syscall.
- Return offsets of the pages from the start in the vec
- Stop execution when vec is filled with dirty pages
These two arguments doesn't follow the mincore() philosophy where the
output array corresponds to the address range in one to one fashion, hence
the output buffer length isn't passed and only a flag is set if the page
is present. This makes mincore() easy to use with less control. We are
passing the size of the output array and putting return data consecutively
which is offset of dirty pages from the start. The user can convert these
offsets back into the dirty page addresses easily. Suppose, the user want
to get first 10 dirty pages from a total memory of 100 pages. He'll
allocate output buffer of size 10 and process_memwatch() syscall will
abort after finding the 10 pages. This behaviour is needed to support
Windows' getWriteWatch(). The behaviour like mincore() can be achieved by
passing output buffer of 100 size. This interface can be used for any
desired behaviour.
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (5):
fs/proc/task_mmu: make functions global to be used in other files
mm: Implement process_memwatch syscall
mm: wire up process_memwatch syscall for x86
selftests: vm: add process_memwatch syscall tests
mm: add process_memwatch syscall documentation
Documentation/admin-guide/mm/soft-dirty.rst | 48 +-
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
fs/proc/task_mmu.c | 84 +--
include/linux/mm_inline.h | 99 +++
include/linux/syscalls.h | 3 +-
include/uapi/asm-generic/unistd.h | 5 +-
include/uapi/linux/memwatch.h | 12 +
kernel/sys_ni.c | 1 +
mm/Makefile | 2 +-
mm/memwatch.c | 285 ++++++++
tools/include/uapi/asm-generic/unistd.h | 5 +-
.../arch/x86/entry/syscalls/syscall_64.tbl | 1 +
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 2 +
tools/testing/selftests/vm/memwatch_test.c | 635 ++++++++++++++++++
16 files changed, 1098 insertions(+), 87 deletions(-)
create mode 100644 include/uapi/linux/memwatch.h
create mode 100644 mm/memwatch.c
create mode 100644 tools/testing/selftests/vm/memwatch_test.c
--
2.30.2
Add bpf trampoline support for arm64. Most of the logic is the same as
x86.
Tested on raspberry pi 4b and qemu with KASLR disabled (avoid long jump),
result:
#9 /1 bpf_cookie/kprobe:OK
#9 /2 bpf_cookie/multi_kprobe_link_api:FAIL
#9 /3 bpf_cookie/multi_kprobe_attach_api:FAIL
#9 /4 bpf_cookie/uprobe:OK
#9 /5 bpf_cookie/tracepoint:OK
#9 /6 bpf_cookie/perf_event:OK
#9 /7 bpf_cookie/trampoline:OK
#9 /8 bpf_cookie/lsm:OK
#9 bpf_cookie:FAIL
#18 /1 bpf_tcp_ca/dctcp:OK
#18 /2 bpf_tcp_ca/cubic:OK
#18 /3 bpf_tcp_ca/invalid_license:OK
#18 /4 bpf_tcp_ca/dctcp_fallback:OK
#18 /5 bpf_tcp_ca/rel_setsockopt:OK
#18 bpf_tcp_ca:OK
#51 /1 dummy_st_ops/dummy_st_ops_attach:OK
#51 /2 dummy_st_ops/dummy_init_ret_value:OK
#51 /3 dummy_st_ops/dummy_init_ptr_arg:OK
#51 /4 dummy_st_ops/dummy_multiple_args:OK
#51 dummy_st_ops:OK
#55 fentry_fexit:OK
#56 fentry_test:OK
#57 /1 fexit_bpf2bpf/target_no_callees:OK
#57 /2 fexit_bpf2bpf/target_yes_callees:OK
#57 /3 fexit_bpf2bpf/func_replace:OK
#57 /4 fexit_bpf2bpf/func_replace_verify:OK
#57 /5 fexit_bpf2bpf/func_sockmap_update:OK
#57 /6 fexit_bpf2bpf/func_replace_return_code:OK
#57 /7 fexit_bpf2bpf/func_map_prog_compatibility:OK
#57 /8 fexit_bpf2bpf/func_replace_multi:OK
#57 /9 fexit_bpf2bpf/fmod_ret_freplace:OK
#57 fexit_bpf2bpf:OK
#58 fexit_sleep:OK
#59 fexit_stress:OK
#60 fexit_test:OK
#67 get_func_args_test:OK
#68 get_func_ip_test:OK
#104 modify_return:OK
#237 xdp_bpf2bpf:OK
bpf_cookie/multi_kprobe_link_api and bpf_cookie/multi_kprobe_attach_api
failed due to lack of multi_kprobe on arm64.
v5:
- As Alexei suggested, remove is_valid_bpf_tramp_flags()
v4: https://lore.kernel.org/bpf/20220517071838.3366093-1-xukuohai@huawei.com/
- Run the test cases on raspberry pi 4b
- Rebase and add cookie to trampoline
- As Steve suggested, move trace_direct_tramp() back to entry-ftrace.S to
avoid messing up generic code with architecture specific code
- As Jakub suggested, merge patch 4 and patch 5 of v3 to provide full function
in one patch
- As Mark suggested, add a comment for the use of aarch64_insn_patch_text_nosync()
- Do not generate trampoline for long jump to avoid triggering ftrace_bug
- Round stack size to multiples of 16B to avoid SPAlignmentFault
- Use callee saved register x20 to reduce the use of mov_i64
- Add missing BTI J instructions
- Trivial spelling and code style fixes
v3: https://lore.kernel.org/bpf/20220424154028.1698685-1-xukuohai@huawei.com/
- Append test results for bpf_tcp_ca, dummy_st_ops, fexit_bpf2bpf,
xdp_bpf2bpf
- Support to poke bpf progs
- Fix return value of arch_prepare_bpf_trampoline() to the total number
of bytes instead of number of instructions
- Do not check whether CONFIG_DYNAMIC_FTRACE_WITH_REGS is enabled in
arch_prepare_bpf_trampoline, since the trampoline may be hooked to a bpf
prog
- Restrict bpf_arch_text_poke() to poke bpf text only, as kernel functions
are poked by ftrace
- Rewrite trace_direct_tramp() in inline assembly in trace_selftest.c
to avoid messing entry-ftrace.S
- isolate arch_ftrace_set_direct_caller() with macro
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS to avoid compile error
when this macro is disabled
- Some trivial code sytle fixes
v2: https://lore.kernel.org/bpf/20220414162220.1985095-1-xukuohai@huawei.com/
- Add Song's ACK
- Change the multi-line comment in is_valid_bpf_tramp_flags() into net
style (patch 3)
- Fix a deadloop issue in ftrace selftest (patch 2)
- Replace pt_regs->x0 with pt_regs->orig_x0 in patch 1 commit message
- Replace "bpf trampoline" with "custom trampoline" in patch 1, as
ftrace direct call is not only used by bpf trampoline.
v1: https://lore.kernel.org/bpf/20220413054959.1053668-1-xukuohai@huawei.com/
Xu Kuohai (6):
arm64: ftrace: Add ftrace direct call support
ftrace: Fix deadloop caused by direct call in ftrace selftest
bpf: Remove is_valid_bpf_tramp_flags()
bpf, arm64: Impelment bpf_arch_text_poke() for arm64
bpf, arm64: bpf trampoline for arm64
selftests/bpf: Fix trivial typo in fentry_fexit.c
arch/arm64/Kconfig | 2 +
arch/arm64/include/asm/ftrace.h | 22 +
arch/arm64/kernel/asm-offsets.c | 1 +
arch/arm64/kernel/entry-ftrace.S | 28 +-
arch/arm64/net/bpf_jit.h | 1 +
arch/arm64/net/bpf_jit_comp.c | 523 +++++++++++++++++-
arch/x86/net/bpf_jit_comp.c | 20 -
kernel/bpf/bpf_struct_ops.c | 3 +
kernel/bpf/trampoline.c | 3 +
kernel/trace/trace_selftest.c | 2 +
.../selftests/bpf/prog_tests/fentry_fexit.c | 4 +-
11 files changed, 570 insertions(+), 39 deletions(-)
--
2.30.2
The emulator mishandles LEA with register source operand. Even though such
LEA is illegal, it can be encoded and fed to CPU. In which case real
hardware throws #UD. The emulator, instead, returns address of
x86_emulate_ctxt._regs. This info leak hurts host's kASLR.
Tell the decoder that illegal LEA is not to be emulated.
Signed-off-by: Michal Luczaj <mhal(a)rbox.co>
---
What the emulator does for LEA is simply:
case 0x8d: /* lea r16/r32, m */
ctxt->dst.val = ctxt->src.addr.mem.ea;
break;
And it makes sense if you assume that LEA's source operand is always
memory. But because there is a race window between VM-exit and the decoder
instruction fetch, emulator can be force fed an arbitrary opcode of choice.
Including some that are simply illegal and would cause #UD in normal
circumstances. Such as a LEA with a register-direct source operand -- for
which the emulator sets `op->addr.reg`, but reads `op->addr.mem.ea`.
union {
unsigned long *reg;
struct segmented_address {
ulong ea;
unsigned seg;
} mem;
...
} addr;
Because `reg` and `mem` are in union, emulator reveals address in host's
memory.
I hope this patch is not considered an `instr_dual` abuse?
arch/x86/kvm/emulate.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f8382abe22ff..7c14706372d0 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4566,6 +4566,10 @@ static const struct mode_dual mode_dual_63 = {
N, I(DstReg | SrcMem32 | ModRM | Mov, em_movsxd)
};
+static const struct instr_dual instr_dual_8d = {
+ D(DstReg | SrcMem | ModRM | NoAccess), N
+};
+
static const struct opcode opcode_table[256] = {
/* 0x00 - 0x07 */
F6ALU(Lock, em_add),
@@ -4622,7 +4626,7 @@ static const struct opcode opcode_table[256] = {
I2bv(DstMem | SrcReg | ModRM | Mov | PageTable, em_mov),
I2bv(DstReg | SrcMem | ModRM | Mov, em_mov),
I(DstMem | SrcNone | ModRM | Mov | PageTable, em_mov_rm_sreg),
- D(ModRM | SrcMem | NoAccess | DstReg),
+ ID(0, &instr_dual_8d),
I(ImplicitOps | SrcMem16 | ModRM, em_mov_sreg_rm),
G(0, group1A),
/* 0x90 - 0x97 */
--
2.32.0
Build commands start with "make". It is missing. Add "make" to the start
of the build command.
Fixes: 820636106342 ("docs/kselftest: add more guidelines for adding new tests")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
Documentation/dev-tools/kselftest.rst | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst
index ee6467ca8293..9dd94c334f05 100644
--- a/Documentation/dev-tools/kselftest.rst
+++ b/Documentation/dev-tools/kselftest.rst
@@ -255,9 +255,9 @@ Contributing new tests (details)
* All changes should pass::
- kselftest-{all,install,clean,gen_tar}
- kselftest-{all,install,clean,gen_tar} O=abo_path
- kselftest-{all,install,clean,gen_tar} O=rel_path
+ make kselftest-{all,install,clean,gen_tar}
+ make kselftest-{all,install,clean,gen_tar} O=abs_path
+ make kselftest-{all,install,clean,gen_tar} O=rel_path
make -C tools/testing/selftests {all,install,clean,gen_tar}
make -C tools/testing/selftests {all,install,clean,gen_tar} O=abs_path
make -C tools/testing/selftests {all,install,clean,gen_tar} O=rel_path
--
2.30.2
From: Vincent Cheng <vincent.cheng.xh(a)renesas.com>
This series adds adjust phase to the PTP Hardware Clock device interface.
Some PTP hardware clocks have a write phase mode that has
a built-in hardware filtering capability. The write phase mode
utilizes a phase offset control word instead of a frequency offset
control word. Add adjust phase function to take advantage of this
capability.
Changes since v1:
- As suggested by Richard Cochran:
1. ops->adjphase is new so need to check for non-null function pointer.
2. Kernel coding style uses lower_case_underscores.
3. Use existing PTP clock API for delayed worker.
Vincent Cheng (3):
ptp: Add adjphase function to support phase offset control.
ptp: Add adjust_phase to ptp_clock_caps capability.
ptp: ptp_clockmatrix: Add adjphase() to support PHC write phase mode.
drivers/ptp/ptp_chardev.c | 1 +
drivers/ptp/ptp_clock.c | 3 ++
drivers/ptp/ptp_clockmatrix.c | 92 +++++++++++++++++++++++++++++++++++
drivers/ptp/ptp_clockmatrix.h | 8 ++-
include/linux/ptp_clock_kernel.h | 6 ++-
include/uapi/linux/ptp_clock.h | 4 +-
tools/testing/selftests/ptp/testptp.c | 6 ++-
7 files changed, 114 insertions(+), 6 deletions(-)
--
2.7.4
While the sdhci-of-aspeed KUnit tests do work when builtin, and do work
when KUnit itself is being built as a module, the two together break.
This is because the KUnit tests (understandably) depend on KUnit, so a
built-in test cannot build if KUnit is a module.
Fix this by adding a dependency on (MMC_SDHCI_OF_ASPEED=m || KUNIT=y),
which only excludes this one problematic configuration.
This was reported on a nasty openrisc-randconfig run by the kernel test
robot, though for some reason (compiler optimisations removing the test
code?) I wasn't able to reproduce it locally on x86:
https://lore.kernel.org/linux-mm/202207140122.fzhlf60k-lkp@intel.com/T/
Fixes: 291cd54e5b05 ("mmc: sdhci-of-aspeed: test: Use kunit_test_suite() macro")
Reported-by: kernel test robot <lkp(a)intel.com>
Signed-off-by: David Gow <davidgow(a)google.com>
---
drivers/mmc/host/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
index 10c563999d3d..e63608834411 100644
--- a/drivers/mmc/host/Kconfig
+++ b/drivers/mmc/host/Kconfig
@@ -171,6 +171,7 @@ config MMC_SDHCI_OF_ASPEED
config MMC_SDHCI_OF_ASPEED_TEST
bool "Tests for the ASPEED SDHCI driver" if !KUNIT_ALL_TESTS
depends on MMC_SDHCI_OF_ASPEED && KUNIT
+ depends on (MMC_SDHCI_OF_ASPEED=m || KUNIT=y)
default KUNIT_ALL_TESTS
help
Enable KUnit tests for the ASPEED SDHCI driver. Select this
--
2.37.0.170.g444d1eabd0-goog
While creating a LSM BPF MAC policy to block user namespace creation, we
used the LSM cred_prepare hook because that is the closest hook to prevent
a call to create_user_ns().
The calls look something like this:
cred = prepare_creds()
security_prepare_creds()
call_int_hook(cred_prepare, ...
if (cred)
create_user_ns(cred)
We noticed that error codes were not propagated from this hook and
introduced a patch [1] to propagate those errors.
The discussion notes that security_prepare_creds()
is not appropriate for MAC policies, and instead the hook is
meant for LSM authors to prepare credentials for mutation. [2]
Ultimately, we concluded that a better course of action is to introduce
a new security hook for LSM authors. [3]
This patch set first introduces a new security_create_user_ns() function
and userns_create LSM hook, then marks the hook as sleepable in BPF.
Links:
1. https://lore.kernel.org/all/20220608150942.776446-1-fred@cloudflare.com/
2. https://lore.kernel.org/all/87y1xzyhub.fsf@email.froward.int.ebiederm.org/
3. https://lore.kernel.org/all/9fe9cd9f-1ded-a179-8ded-5fde8960a586@cloudflare…
Past discussions:
V2: https://lore.kernel.org/all/20220707223228.1940249-1-fred@cloudflare.com/
V1: https://lore.kernel.org/all/20220621233939.993579-1-fred@cloudflare.com/
Changes since v2:
- Rename create_user_ns hook to userns_create
- Use user_namespace as an object opposed to a generic namespace object
- s/domB_t/domA_t in commit message
Changes since v1:
- Add selftests/bpf: Add tests verifying bpf lsm create_user_ns hook patch
- Add selinux: Implement create_user_ns hook patch
- Change function signature of security_create_user_ns() to only take
struct cred
- Move security_create_user_ns() call after id mapping check in
create_user_ns()
- Update documentation to reflect changes
Frederick Lawler (4):
security, lsm: Introduce security_create_user_ns()
bpf-lsm: Make bpf_lsm_userns_create() sleepable
selftests/bpf: Add tests verifying bpf lsm userns_create hook
selinux: Implement userns_create hook
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 +
include/linux/security.h | 6 ++
kernel/bpf/bpf_lsm.c | 1 +
kernel/user_namespace.c | 5 ++
security/security.c | 5 ++
security/selinux/hooks.c | 9 ++
security/selinux/include/classmap.h | 2 +
.../selftests/bpf/prog_tests/deny_namespace.c | 88 +++++++++++++++++++
.../selftests/bpf/progs/test_deny_namespace.c | 39 ++++++++
10 files changed, 160 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/deny_namespace.c
create mode 100644 tools/testing/selftests/bpf/progs/test_deny_namespace.c
--
2.30.2
This series is based on torvalds/master.
The series is split up like so:
- Patch 1 is a simple fixup which we should take in any case (even by itself).
- Patches 2-6 add the feature, configurable selftest support, and docs.
Why not ...?
============
- Why not /proc/[pid]/userfaultfd? The proposed use case for this is for one
process to open a userfaultfd which can intercept another process' page
faults. This seems to me like exactly what CAP_SYS_PTRACE is for, though, so I
think this use case can simply use a syscall without the powers CAP_SYS_PTRACE
grants being "too much".
- Why not use a syscall? Access to syscalls is generally controlled by
capabilities. We don't have a capability which is used for userfaultfd access
without also granting more / other permissions as well, and adding a new
capability was rejected [1].
- It's possible a LSM could be used to control access instead. I suspect
adding a brand new one just for this would be rejected, but I think some
existing ones like SELinux can be used to filter syscall access. Enabling
SELinux for large production deployments which don't already use it is
likely to be a huge undertaking though, and I don't think this use case by
itself is enough to motivate that kind of architectural change.
Changelog
=========
v3->v4:
- Picked up an Acked-by on 5/5.
- Updated cover letter to cover "why not ...".
- Refactored userfaultfd_allowed() into userfaultfd_syscall_allowed(). [Peter]
- Removed obsolete comment from a previous version. [Peter]
- Refactored userfaultfd_open() in selftest. [Peter]
- Reworded admin-guide documentation. [Mike, Peter]
- Squashed 2 commits adding /dev/userfaultfd to selftest and making selftest
configurable. [Peter]
- Added "syscall" test modifier (the default behavior) to selftest. [Peter]
v2->v3:
- Rebased onto linux-next/akpm-base, in order to be based on top of the
run_vmtests.sh refactor which was merged previously.
- Picked up some Reviewed-by's.
- Fixed ioctl definition (_IO instead of _IOWR), and stopped using
compat_ptr_ioctl since it is unneeded for ioctls which don't take a pointer.
- Removed the "handle_kernel_faults" bool, simplifying the code. The result is
logically equivalent, but simpler.
- Fixed userfaultfd selftest so it returns KSFT_SKIP appropriately.
- Reworded documentation per Shuah's feedback on v2.
- Improved example usage for userfaultfd selftest.
v1->v2:
- Add documentation update.
- Test *both* userfaultfd(2) and /dev/userfaultfd via the selftest.
[1]: https://lore.kernel.org/lkml/686276b9-4530-2045-6bd8-170e5943abe4@schaufler…
Axel Rasmussen (5):
selftests: vm: add hugetlb_shared userfaultfd test to run_vmtests.sh
userfaultfd: add /dev/userfaultfd for fine grained access control
userfaultfd: selftests: modify selftest to use /dev/userfaultfd
userfaultfd: update documentation to describe /dev/userfaultfd
selftests: vm: add /dev/userfaultfd test cases to run_vmtests.sh
Documentation/admin-guide/mm/userfaultfd.rst | 41 +++++++++++-
Documentation/admin-guide/sysctl/vm.rst | 3 +
fs/userfaultfd.c | 69 ++++++++++++++++----
include/uapi/linux/userfaultfd.h | 4 ++
tools/testing/selftests/vm/run_vmtests.sh | 11 +++-
tools/testing/selftests/vm/userfaultfd.c | 69 +++++++++++++++++---
6 files changed, 169 insertions(+), 28 deletions(-)
--
2.37.0.170.g444d1eabd0-goog