This fixes the way the Authority Mask Register (AMR) is updated
by the existing pkey tests and adds a new test to verify the
functionality of execute-disabled pkeys.
Previous versions can be found at:
v2: https://lore.kernel.org/linuxppc-dev/20200527030342.13712-1-sandipan@linux.…
v1: https://lore.kernel.org/linuxppc-dev/20200508162332.65316-1-sandipan@linux.…
Changes in v3:
- Fixed AMR writes for existing pkey tests (new patch).
- Moved Hash MMU check under utilities (new patch) and removed duplicate
code.
- Fixed comments on why the pkey permission bits were redefined.
- Switched to existing mfspr() macro for reading AMR.
- Switched to sig_atomic_t as data type for variables updated in the
signal handlers.
- Switched to exit()-ing if the signal handlers come across an unexpected
condition instead of trying to reset page and pkey permissions.
- Switched to write() from printf() for printing error messages from
the signal handlers.
- Switched to getpagesize().
- Renamed fault counter to denote remaining faults.
- Dropped unnecessary randomization for choosing an address to fault at.
- Added additional information on change in permissions due to AMR and
IAMR bits in comments.
- Switched the first instruction word of the executable region to a trap
to test if it is actually overwritten by a no-op later.
- Added an new test scenario where the pkey imposes no restrictions and
an attempt is made to jump to the executable region again.
Changes in v2:
- Added .gitignore entry for test binary.
- Fixed builds for older distros where siginfo_t might not have si_pkey as
a formal member based on discussion with Michael.
Sandipan Das (3):
selftests: powerpc: Fix pkey access right updates
selftests: powerpc: Move Hash MMU check to utilities
selftests: powerpc: Add test for execute-disabled pkeys
tools/testing/selftests/powerpc/include/reg.h | 6 +
.../testing/selftests/powerpc/include/utils.h | 1 +
tools/testing/selftests/powerpc/mm/.gitignore | 1 +
tools/testing/selftests/powerpc/mm/Makefile | 5 +-
.../selftests/powerpc/mm/bad_accesses.c | 28 --
.../selftests/powerpc/mm/pkey_exec_prot.c | 388 ++++++++++++++++++
.../selftests/powerpc/ptrace/core-pkey.c | 2 +-
.../selftests/powerpc/ptrace/ptrace-pkey.c | 2 +-
tools/testing/selftests/powerpc/utils.c | 28 ++
9 files changed, 429 insertions(+), 32 deletions(-)
create mode 100644 tools/testing/selftests/powerpc/mm/pkey_exec_prot.c
--
2.25.1
Hi,
v2:
- switch harness from XFAIL to SKIP
- pass skip reason from test into TAP output
- add acks/reviews
v1: https://lore.kernel.org/lkml/20200611224028.3275174-1-keescook@chromium.org/
I finally got around to converting the kselftest_harness.h API to actually
use the kselftest.h API so all the tools using it can actually report
TAP correctly. As part of this, there are a bunch of related cleanups,
API updates, and additions.
Thanks!
-Kees
Kees Cook (8):
selftests/clone3: Reorder reporting output
selftests: Remove unneeded selftest API headers
selftests/binderfs: Fix harness API usage
selftests: Add header documentation and helpers
selftests/harness: Switch to TAP output
selftests/harness: Refactor XFAIL into SKIP
selftests/harness: Display signed values correctly
selftests/harness: Report skip reason
tools/testing/selftests/clone3/clone3.c | 2 +-
.../selftests/clone3/clone3_clear_sighand.c | 3 +-
.../testing/selftests/clone3/clone3_set_tid.c | 2 +-
.../filesystems/binderfs/binderfs_test.c | 284 +++++++++---------
tools/testing/selftests/kselftest.h | 78 ++++-
tools/testing/selftests/kselftest_harness.h | 169 ++++++++---
.../pid_namespace/regression_enomem.c | 1 -
.../selftests/pidfd/pidfd_getfd_test.c | 1 -
.../selftests/pidfd/pidfd_setns_test.c | 1 -
tools/testing/selftests/seccomp/seccomp_bpf.c | 8 +-
.../selftests/uevent/uevent_filtering.c | 1 -
11 files changed, 356 insertions(+), 194 deletions(-)
--
2.25.1
The paravirtualized CR pinning patchset is a strengthened version of
existing control registers pinning for paravritualized guests. It
protects KVM guests from ROP based attacks which attempt to disable key
security features. Virtualized Linux guests such as Kata Containers, AWS
Lambda, and Chromos Termina will get this protection enabled by default
when they update their kernel / configs. Using virtualization allows us
to provide a stronger version of a proven exploit mitigation technique.
We’ve patched KVM to create 6 new KVM specific MSRs used to query which
bits may be pinned, and to set which bits are pinned high or low in
control registers 0 and 4. Linux guest support was added so that
non-kexec guests will be able to take advantage of this strengthened
protection by default. A plan for enabling guests with kexec is proposed
in this cover letter. As part of that plan, we add a command line flag
that allows users to opt-in to the protection on boot if they have kexec
built into their kernel, effectively opting out of kexec support.
Hibernation and suspend to ram were enabled by updating the location
where bits in control register 4 were saved to and restored from. The
work also includes minor patches for QEMU to ensure reboot works by
clearing the added MSRs and exposing the new CPUID feature bit. There is
one SMM related selftest added in this patchset and another patch for
kvm-unit-tests that will be sent separately.
Thank you to Sean and Drew who reviewed v2, to Boris, Paolo, Andy, and
Liran who reviewed v1, and to Sean, Dave, Kristen, and Rick who've
provided feedback throughout. I appreciate your time spent reviewing and
feedback.
Here are the previous RFC versions of this patchset for reference
RFC v2: https://lkml.org/lkml/2020/2/18/1162
RFC v1: https://lkml.org/lkml/2019/12/24/380
=== High level overview of the changes ===
- A CPUID feature bit as well as MSRs were added to KVM. Guests can use
the CPUID feature bit to determine if MSRs are available. Reading the
first 2 MSRs returns the bits which may be pinned for CR0/4
respectively. The next 4 MSRs are writeable and allow the guest and
host userspace to set which bits are pinned low or pinned high for
CR0/4.
- Hibernation and suspend-to-RAM are supported. This was done by
updating mmu_cr4_features on feature identification of the boot CPU.
As such, mmu_cr4_features is no longer read only after init.
- CPU hotplug is supported. Pinning is per vCPU. When running as a guest
pinning is requested after CPU identification for non-boot CPUs. The
boot CPU requests pinning a directly after existing pinning is setup.
- Nested virtualization is supported. SVM / VMX restore pinned bits on
VM-Exit if they had been unset in the host VMCB / VMCS.
- As suggested by Sean, unpinning of pinned bits on return from SMM due
to modification of SMRAM will cause an unhandleable emulation fault
resulting in termination of the guest.
- kexec support is still pending, since the plan is a bit long it's been
moved to the end of the cover letter. It talks about the decision to
make a command line parameter, why we opt in to pinning (and
effectively out of kexec). Being that those changes wouldn't be
localized to KVM (and this patchset is on top of kvm/next).
- As Paolo requested, a patch will be sent immediately following this
patchset for kvm-unit-tests with the unit tests for general
functionality. selftests are included for SMM specific functionality.
=== Chanages since RFCv2 ===
- Related to Drew's comments
- Used linux/stringify.h in selftests
- Added comments on why we don't use GUEST_* due to SMM trickiness.
We opt not to use GUEST_SYNC() because we also have to make a sync call
from SMM. As such, the address of the ucall struct we'd need to pass isn't
something we can put into the machine code in a maintainable way. At least
so far as I could tell.
- Related to Sean's comments
- Allowed pinning bits high or low rather than just high
- Cleaner code path for guest and host when writing to MSRs
- Didn't use read modify write behavior due to that requiring changes to
selftest save restore code which made me suspect that there might be
issues with other VMMs. The issue was because we read CR0/4 in the RWM
operation on the MSR, we must do KVM_SET_REGS before KVM_SET_MSRS, we also
had to call KVM_SET_SREGS and add checks for if we're in SMM or not. This
made it a bit more messy overall so I went with the first approach Sean
suggested where we just have pin high/low semantics.
- If the guest writes values to the allowed MSRs that are not the correct
value the wrmsr fails.
- If SMRAM modification would result in unpinning bits we bail with
X86EMUL_UNHANDLEABLE
- Added silent restoration of pinned bits for SVM and VMX when they may have
been modified in VMCB / VMCS. This didn't seem like a place were we'd want
to inject a fault, please let me know if we should.
=== Description of changes and rational ===
Paravirtualized Control Register pinning is a strengthened version of
existing protections on the Write Protect, Supervisor Mode Execution /
Access Protection, and User-Mode Instruction Prevention bits. The
existing protections prevent native_write_cr*() functions from writing
values which disable those bits. This patchset prevents any guest
writes to control registers from disabling pinned bits, not just writes
from native_write_cr*(). This stops attackers within the guest from
using ROP to disable protection bits.
https://web.archive.org/web/20171029060939/http://www.blackbunny.io/linux-k…
The protection is implemented by adding MSRs to KVM which contain the
bits that are allowed to be pinned, and the bits which are pinned. The
guest or userspace can enable bit pinning by reading MSRs to check
which bits are allowed to be pinned, and then writing MSRs to set which
bits they want pinned.
Other hypervisors such as HyperV have implemented similar protections
for Control Registers and MSRs; which security researchers have found
effective.
https://www.abatchy.com/2018/01/kernel-exploitation-4
We add a CR pin feature bit to the KVM cpuid, read only MSRs which
guests use to identify which bits they may request be pinned, and CR
pinned low/high MSRs which contain the pinned bits. Guests can request
that KVM pin bits within control register 0 or 4 via the CR pinned MSRs.
Writes to the MSRs fail if they include bits that aren't allowed to be
pinned. Host userspace may clear or modify pinned bits at any time. Once
pinned bits are set, the guest may pin more allowed bits, but may never
clear pinned bits.
In the event that the guest vCPU attempts to disable any of the pinned
bits, the vCPU that issued the write is sent a general protection
fault, and the register is left unchanged.
When running with KVM guest support and paravirtualized CR pinning
enabled, paravirtualized and existing pinning are setup at the same
point on the boot CPU. Non-boot CPUs setup pinning upon identification.
Pinning is not active when running in SMM. Entering SMM disables pinned
bits. Writes to control registers within SMM would therefore trigger
general protection faults if pinning was enforced. Upon exit from SMM,
SMRAM is modified to ensure the values of CR0/4 that will be restored
contain the correct values for pinned bits. CR0/4 values are then
restored from SMRAM as usual.
When running with nested virtualization, should pinned bits be cleared
from host VMCS / VMCB, on VM-Exit, they will be silently restored.
Should userspace expose the CR pining CPUID feature bit, it must zero
CR pinned MSRs on reboot. If it does not, it runs the risk of having
the guest enable pinning and subsequently cause general protection
faults on next boot due to early boot code setting control registers to
values which do not contain the pinned bits.
Hibernation to disk and suspend-to-RAM are supported. identify_cpu was
updated to ensure SMEP/SMAP/UMIP are present in mmu_cr4_features. This
is necessary to ensure protections stay active during hibernation image
restoration.
Guests using the kexec system call currently do not support
paravirtualized control register pinning. This is due to early boot
code writing known good values to control registers, these values do
not contain the protected bits. This is due to CPU feature
identification being done at a later time, when the kernel properly
checks if it can enable protections. As such, the pv_cr_pin command
line option has been added which instructs the kernel to disable kexec
in favor of enabling paravirtualized control register pinning.
crashkernel is also disabled when the pv_cr_pin parameter is specified
due to its reliance on kexec.
When we make kexec compatible, we will still need a way for a kernel
with support to know if the kernel it is attempting to load has
support. If a kernel with this enabled attempts to kexec a kernel where
this is not supported, it would trigger a fault almost immediately.
Liran suggested adding a section to the built image acting as a flag to
signify support for being kexec'd by a kernel with pinning enabled.
Should that approach be implemented, it is likely that the command line
flag (pv_cr_pin) would still be desired for some deprecation period. We
wouldn't want the default behavior to change from being able to kexec
older kernels to not being able to, as this might break some users
workflows. Since we require that the user opt-in to break kexec we've
held off on attempting to fix kexec in this patchset. This way no one
sees any behavior they are not explicitly opting in to.
Security conscious kernel configurations disable kexec already, per
KSPP guidelines. Projects such as Kata Containers, AWS Lambda, ChromeOS
Termina, and others using KVM to virtualize Linux will benefit from
this protection without the need to specify pv_cr_pin on the command
line.
Pinning of sensitive CR bits has already been implemented to protect
against exploits directly calling native_write_cr*(). The current
protection cannot stop ROP attacks which jump directly to a MOV CR
instruction. Guests running with paravirtualized CR pinning are now
protected against the use of ROP to disable CR bits. The same bits that
are being pinned natively may be pinned via the CR pinned MSRs. These
bits are WP in CR0, and SMEP, SMAP, and UMIP in CR4.
Future patches could implement similar MSRs to protect bits in MSRs.
The NXE bit of the EFER MSR is a prime candidate.
=== Plan for kexec support ===
Andy's suggestion of a boot option has been incorporated as the
pv_cr_pin command line option. Boris mentioned that short-term
solutions become immutable. However, for the reasons outlined below
we need a way for the user to opt-in to pinning over kexec if both
are compiled in, and the command line parameter seems to be a good
way to do that. Liran's proposed solution of a flag within the ELF
would allow us to identify which kernels have support is assumed to
be implemented in the following scenarios.
We then have the following cases (without the addition of pv_cr_pin):
- Kernel running without pinning enabled kexecs kernel with pinning.
- Loaded kernel has kexec
- Do not enable pinning
- Loaded kernel lacks kexec
- Enable pinning
- Kernel running with pinning enabled kexecs kernel with pinning (as
identified by ELF addition).
- Okay
- Kernel running with pinning enabled kexecs kernel without pinning
(as identified by lack of ELF addition).
- User is presented with an error saying that they may not kexec
a kernel without pinning support.
With the addition of pv_cr_pin we have the following situations:
- Kernel running without pinning enabled kexecs kernel with pinning.
- Loaded kernel has kexec
- pv_cr_pin command line parameter present for new kernel
- Enable pinning
- pv_cr_pin command line parameter not present for new kernel
- Do not enable pinning
- Loaded kernel lacks kexec
- Enable pinning
- Kernel running with pinning enabled kexecs kernel with pinning (as
identified by ELF addition).
- Okay
- Kernel running with kexec and pinning enabled (opt-in via pv_cr_pin)
kexecs kernel without pinning (as identified by lack of ELF addition).
- User is presented with an error saying that they have opted
into pinning support and may not kexec a kernel without pinning
support.
Without the command line parameter I'm not sure how we could preserve
users workflows which might rely on kexecing older kernels (ones
which wouldn't have support). I see the benefit here being that users
have to opt-in to the possibility of breaking their workflow, via
their addition of the pv_cr_pin command line flag. Which could of
course also be called nokexec. A deprecation period could then be
chosen where eventually pinning takes preference over kexec and users
are presented with the error if they try to kexec an older kernel.
Input on this would be much appreciated, as well as if this is the
best way to handle things or if there's another way that would be
preferred. This is just what we were able to come up with to ensure
users didn't get anything broken they didn't agree to have broken.
Thanks,
John
John Andersen (4):
X86: Update mmu_cr4_features during feature identification
KVM: x86: Introduce paravirt feature CR0/CR4 pinning
selftests: kvm: add test for CR pinning with SMM
X86: Use KVM CR pin MSRs
.../admin-guide/kernel-parameters.txt | 11 +
Documentation/virt/kvm/msr.rst | 53 +++++
arch/x86/Kconfig | 10 +
arch/x86/include/asm/kvm_host.h | 7 +
arch/x86/include/asm/kvm_para.h | 28 +++
arch/x86/include/uapi/asm/kvm_para.h | 7 +
arch/x86/kernel/cpu/common.c | 11 +-
arch/x86/kernel/kvm.c | 39 ++++
arch/x86/kernel/setup.c | 12 +-
arch/x86/kvm/cpuid.c | 3 +-
arch/x86/kvm/emulate.c | 3 +-
arch/x86/kvm/kvm_emulate.h | 2 +-
arch/x86/kvm/svm/nested.c | 11 +-
arch/x86/kvm/vmx/nested.c | 10 +-
arch/x86/kvm/x86.c | 106 ++++++++-
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/x86_64/processor.h | 13 ++
.../selftests/kvm/x86_64/smm_cr_pin_test.c | 207 ++++++++++++++++++
19 files changed, 521 insertions(+), 14 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/smm_cr_pin_test.c
base-commit: 49b3deaad3452217d62dbd78da8df24eb0c7e169
--
2.21.0
This is v2 of the patch to fix TAP output for skipped tests. I noticed
and fixed two other occurrences of "not ok ... # SKIP" which according
to the TAP specification should be marked as "ok ... # SKIP" instead.
Unfortunately, closer analysis showed ksft_exit_skip to be a badly
misused API. It should be used when the remainder of the testcase
is being skipped, but TAP only supports this before the test plan
has been emitted (in which case you're supposed to print "1..0 # SKIP".
Therefore, in patch 1 I'm mostly trying to do something sensible,
printing "1..0 # SKIP" is possible or "ok ... # SKIP" if not (which is
no worse than what was doing before). The remaining five patches
show what needs to be done in order to avoid ksft_exit_skip misuse;
while working on them I found other bugs that I've fixed at the same
time; see patch 2 for an example.
In the interest of full disclosure, I won't be able to do more cleanups
of ksft_exit_skip callers. However, I have fixed all those that did
call ksft_set_plan() as those at least try to produce TAP output.
Paolo
Paolo Bonzini (6):
kselftest: fix TAP output for skipped tests
selftests: breakpoints: fix computation of test plan
selftests: breakpoints: do not use ksft_exit_skip after ksft_set_plan
selftests: pidfd: do not use ksft_exit_skip after ksft_set_plan
selftests: sigaltstack: do not use ksft_exit_skip after ksft_set_plan
selftests: sync_test: do not use ksft_exit_skip after ksft_set_plan
.../breakpoints/step_after_suspend_test.c | 53 +++++++++++--------
tools/testing/selftests/kselftest.h | 28 +++++++---
tools/testing/selftests/kselftest/runner.sh | 2 +-
tools/testing/selftests/pidfd/pidfd_test.c | 39 +++++++++++---
tools/testing/selftests/sigaltstack/sas.c | 4 +-
tools/testing/selftests/sync/sync_test.c | 2 +-
6 files changed, 87 insertions(+), 41 deletions(-)
--
2.26.2
Hello!
v5:
- merge ioctl fixes into Sargun's patches directly
- adjust new API to avoid "ufd_required" argument
- drop general clean up patches now present in for-next/seccomp
v4: https://lore.kernel.org/lkml/20200616032524.460144-1-keescook@chromium.org/
This continues the thread-merge between [1] and [2]. tl;dr: add a way for
a seccomp user_notif process manager to inject files into the managed
process in order to handle emulation of various fd-returning syscalls
across security boundaries. Containers folks and Chrome are in need
of the feature, and investigating this solution uncovered (and fixed)
implementation issues with existing file sending routines.
I intend to carry this in the seccomp tree, unless someone has objections.
:) Please review and test!
-Kees
[1] https://lore.kernel.org/lkml/20200603011044.7972-1-sargun@sargun.me/
[2] https://lore.kernel.org/lkml/20200610045214.1175600-1-keescook@chromium.org/
Kees Cook (5):
net/scm: Regularize compat handling of scm_detach_fds()
fs: Move __scm_install_fd() to __fd_install_received()
fs: Add fd_install_received() wrapper for __fd_install_received()
pidfd: Replace open-coded partial fd_install_received()
fs: Expand __fd_install_received() to accept fd
Sargun Dhillon (2):
seccomp: Introduce addfd ioctl to seccomp user notifier
selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD
fs/file.c | 63 +++++
include/linux/file.h | 19 ++
include/uapi/linux/seccomp.h | 22 ++
kernel/pid.c | 11 +-
kernel/seccomp.c | 172 ++++++++++++-
net/compat.c | 55 ++---
net/core/scm.c | 50 +---
tools/testing/selftests/seccomp/seccomp_bpf.c | 229 ++++++++++++++++++
8 files changed, 540 insertions(+), 81 deletions(-)
--
2.25.1
Commit 8b59cd81dc5e ("kbuild: ensure full rebuild when the compiler is
updated") introduced a new CONFIG option CONFIG_CC_VERSION_TEXT. On my
system, this is set to "gcc (GCC) 10.1.0" which breaks KUnit config
parsing which did not like the spaces in the string.
Fix this by updating the regex to allow strings containing spaces.
Fixes: 8b59cd81dc5e ("kbuild: ensure full rebuild when the compiler is updated")
Signed-off-by: Rikard Falkeborn <rikard.falkeborn(a)gmail.com>
---
Maybe it would have been sufficient to just use
CONFIG_PATTERN = r'^CONFIG_(\w+)=(.*)$' instead?
tools/testing/kunit/kunit_config.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/kunit/kunit_config.py b/tools/testing/kunit/kunit_config.py
index e75063d603b5..02ffc3a3e5dc 100644
--- a/tools/testing/kunit/kunit_config.py
+++ b/tools/testing/kunit/kunit_config.py
@@ -10,7 +10,7 @@ import collections
import re
CONFIG_IS_NOT_SET_PATTERN = r'^# CONFIG_(\w+) is not set$'
-CONFIG_PATTERN = r'^CONFIG_(\w+)=(\S+)$'
+CONFIG_PATTERN = r'^CONFIG_(\w+)=(\S+|".*")$'
KconfigEntryBase = collections.namedtuple('KconfigEntry', ['name', 'value'])
--
2.27.0
The test_vmlinux test uses hrtimer_nanosleep as hook to test tracing
programs. But it seems Clang may have done an aggressive optimization,
causing fentry and kprobe to not hook on this function properly on a
Clang build kernel.
A possible fix is switching to use a more reliable function, e.g. the
ones exported to kernel modules such as hrtimer_range_start_ns. After
we switch to using hrtimer_range_start_ns, the test passes again even
on a clang build kernel.
Tested:
In a clang build kernel, the test fail even when the flags
{fentry, kprobe}_called are set unconditionally in handle__kprobe()
and handle__fentry(), which implies the programs do not hook on
hrtimer_nanosleep() properly. This could be because clang's code
transformation is too aggressive.
test_vmlinux:PASS:skel_open 0 nsec
test_vmlinux:PASS:skel_attach 0 nsec
test_vmlinux:PASS:tp 0 nsec
test_vmlinux:PASS:raw_tp 0 nsec
test_vmlinux:PASS:tp_btf 0 nsec
test_vmlinux:FAIL:kprobe not called
test_vmlinux:FAIL:fentry not called
After we switch to hrtimer_range_start_ns, the test passes.
test_vmlinux:PASS:skel_open 0 nsec
test_vmlinux:PASS:skel_attach 0 nsec
test_vmlinux:PASS:tp 0 nsec
test_vmlinux:PASS:raw_tp 0 nsec
test_vmlinux:PASS:tp_btf 0 nsec
test_vmlinux:PASS:kprobe 0 nsec
test_vmlinux:PASS:fentry 0 nsec
Signed-off-by: Hao Luo <haoluo(a)google.com>
---
tools/testing/selftests/bpf/progs/test_vmlinux.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/bpf/progs/test_vmlinux.c b/tools/testing/selftests/bpf/progs/test_vmlinux.c
index 5611b564d3b1..29fa09d6a6c6 100644
--- a/tools/testing/selftests/bpf/progs/test_vmlinux.c
+++ b/tools/testing/selftests/bpf/progs/test_vmlinux.c
@@ -63,20 +63,20 @@ int BPF_PROG(handle__tp_btf, struct pt_regs *regs, long id)
return 0;
}
-SEC("kprobe/hrtimer_nanosleep")
-int BPF_KPROBE(handle__kprobe,
- ktime_t rqtp, enum hrtimer_mode mode, clockid_t clockid)
+SEC("kprobe/hrtimer_start_range_ns")
+int BPF_KPROBE(handle__kprobe, struct hrtimer *timer, ktime_t tim, u64 delta_ns,
+ const enum hrtimer_mode mode)
{
- if (rqtp == MY_TV_NSEC)
+ if (tim == MY_TV_NSEC)
kprobe_called = true;
return 0;
}
-SEC("fentry/hrtimer_nanosleep")
-int BPF_PROG(handle__fentry,
- ktime_t rqtp, enum hrtimer_mode mode, clockid_t clockid)
+SEC("fentry/hrtimer_start_range_ns")
+int BPF_PROG(handle__fentry, struct hrtimer *timer, ktime_t tim, u64 delta_ns,
+ const enum hrtimer_mode mode)
{
- if (rqtp == MY_TV_NSEC)
+ if (tim == MY_TV_NSEC)
fentry_called = true;
return 0;
}
--
2.27.0.212.ge8ba1cc988-goog
The goal for this series is to introduce the hmm_range_fault() output
array flags HMM_PFN_PMD and HMM_PFN_PUD. This allows a device driver to
know that a given 4K PFN is actually mapped by the CPU using either a
PMD sized or PUD sized CPU page table entry and therefore the device
driver can safely map system memory using larger device MMU PTEs.
The series is based on 5.8.0-rc3 and is intended for Jason Gunthorpe's
hmm tree. These were originally part of a larger series:
https://lore.kernel.org/linux-mm/20200619215649.32297-1-rcampbell@nvidia.co…
Changes in v2:
Make the hmm_range_fault() API changes into a separate series and add
two output flags for PMD/PUD instead of a single compund page flag as
suggested by Jason Gunthorpe.
Make the nouveau page table changes a separate patch as suggested by
Ben Skeggs.
Only add support for 2MB nouveau mappings initially since changing the
1:1 CPU/GPU page table size assumptions requires a bigger set of changes.
Rebase to 5.8.0-rc3.
Ralph Campbell (5):
nouveau/hmm: fault one page at a time
mm/hmm: add output flags for PMD/PUD page mapping
nouveau: fix mapping 2MB sysmem pages
nouveau/hmm: support mapping large sysmem pages
hmm: add tests for HMM_PFN_PMD flag
drivers/gpu/drm/nouveau/nouveau_svm.c | 238 ++++++++----------
drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c | 5 +-
.../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c | 82 ++++++
include/linux/hmm.h | 11 +-
lib/test_hmm.c | 4 +
lib/test_hmm_uapi.h | 4 +
mm/hmm.c | 13 +-
tools/testing/selftests/vm/hmm-tests.c | 76 ++++++
8 files changed, 290 insertions(+), 143 deletions(-)
--
2.20.1