Real-time setups try hard to ensure proper isolation between time
critical applications and e.g. network processing performed by the
network stack in softirq and RPS is used to move the softirq
activity away from the isolated core.
If the network configuration is dynamic, with netns and devices
routinely created at run-time, enforcing the correct RPS setting
on each newly created device allowing to transient bad configuration
became complex.
These series try to address the above, introducing a new
sysctl knob: rps_default_mask. The new sysctl entry allows
configuring a systemwide RPS mask, to be enforced since receive
queue creation time without any fourther per device configuration
required.
Additionally, a simple self-test is introduced to check the
rps_default_mask behavior.
v1 -> v2:
- fix sparse warning in patch 2/3
Paolo Abeni (3):
net/sysctl: factor-out netdev_rx_queue_set_rps_mask() helper
net/core: introduce default_rps_mask netns attribute
self-tests: introduce self-tests for RPS default mask
Documentation/admin-guide/sysctl/net.rst | 6 ++
include/linux/netdevice.h | 1 +
net/core/net-sysfs.c | 73 +++++++++++--------
net/core/sysctl_net_core.c | 58 +++++++++++++++
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/config | 3 +
.../testing/selftests/net/rps_default_mask.sh | 57 +++++++++++++++
7 files changed, 169 insertions(+), 30 deletions(-)
create mode 100755 tools/testing/selftests/net/rps_default_mask.sh
--
2.26.2
This series provides initial support for the ARMv9 Scalable Matrix
Extension (SME). SME takes the approach used for vectors in SVE and
extends this to provide architectural support for matrix operations. A
more detailed overview can be found in [1].
For the kernel SME can be thought of as a series of features which are
intended to be used together by applications but operate mostly
orthogonally:
- The ZA matrix register.
- Streaming mode, in which ZA can be accessed and a subset of SVE
features are available.
- A second vector length, used for streaming mode SVE and ZA and
controlled using a similar interface to that for SVE.
- TPIDR2, a new userspace controllable system register intended for use
by the C library for storing context related to the ZA ABI.
A substantial part of the series is dedicated to refactoring the
existing SVE support so that we don't need to duplicate code for
handling vector lengths and the SVE registers, this involves creating an
array of vector types and making the users take the vector type as a
parameter. I'm not 100% happy with this but wasn't able to come up with
anything better, duplicating code definitely felt like a bad idea so
this felt like the least bad thing. If this approach makes sense to
people it might make sense to split this off into a separate series
and/or merge it while the rest is pending review to try to make things a
little more digestable, the series is very large so it'd probably make
things easier to digest if some of the preparatory refactoring could be
merged before the rest is ready.
One feature of the architecture of particular note is that switching
to and from streaming mode may change the size of and invalidate the
contents of the SVE registers, and when in streaming mode the FFR is not
accessible. This complicates aspects of the ABI like signal handling
and ptrace.
This initial implementation is mainly intended to get the ABI in place,
there are several areas which will be worked on going forwards - some of
these will be blockers, others could be handled in followup serieses:
- SME is currently not supported for KVM guests, this will be done as a
followup series. A host system can use SME and run KVM guests but
SME is not available in the guests.
- The KVM host support is done in a very simplistic way, were anyone to
attempt to use it in production there would be performance impacts on
hosts with SME support. As part of this we also add enumeration of
fine grained traps.
- There is not currently ptrace or signal support TPIDR2, this will be
done as a followup series.
- No support is currently provided for scheduler control of SME or SME
applications, given the size of the SME register state the context
switch overhead may be noticable so this may be needed especially for
real time applications. Similar concerns already exist for larger
SVE vector lengths but are amplified for SME, particularly as the
vector length increases.
- There has been no work on optimising the performance of anything the
kernel does.
It is not expected that any systems will be encountered that support SME
but not SVE, SME is an ARMv9 feature and SVE is mandatory for ARMv9.
The code attempts to handle any such systems that are encountered but
this hasn't been tested extensively.
v14:
- Rebase onto v5.18-rc3.
v13:
- Preserve ZA in both parent and child on clone() and add a test case
for this.
- Fix EFI integration for FA64.
- Minor tweaks to the ABI document following Catlain's review.
- Add and make use of thread_get_cur_vl() helper.
- Fix some issues with SVE/FPSIMD register type moves in streaming SVE
ptrace.
- Typo fixes.
- Roll in separately posted series extending ptrace coverage in
kselftest for better integrated testing of the series.
v12:
- Fix some typos in the ABI document.
- Print a message when we skip a vector length in the signal tests.
- Add note of earliest toolchain versions with SME to manual encodings
for future reference now that's landed.
- Drop reference to PCS in sme.rst, it's not referenced and one of the
links was broken.
- Encode smstop and smstart as sysregs in the kernel.
- Don't redundantly flush the SVE register state when loading FPSIMD
state with SME enabled for the task, the architecture will do this
for us.
- Introduce and use task_get_cur_vl() to get the vector length for the
currently active SVE registers.
- Fix support for !FA64 mode in signal and syscall tests.
- Simplify instruction sequence for ssve_regs signal test.
- Actually include the ZA signal test in the patch set.
v11:
- Rebase onto v5.17-rc3.
- Provide a sme-inst.h to collect manual encodings in kselftest.
v10:
- Actually do the rebase of fixups from the previous version into
relevant patches.
v9:
- Remove defensive programming around IS_ENABLED() and FGT in KVM code.
- Fix naming of TPIDR2 FGT register bit.
- Add patches making handling of floating point register bits more
consistent (also sent as separate series).
- Drop now unused enumeration of fine grained traps.
v8:
- Rebase onto v5.17-rc1.
- Support interoperation with KVM, SME is disabled for KVM guests with
minimal handling for cleaning up SME state when entering and leaving
the guest.
- Document and implement that signal handlers are invoked with ZA and
streaming mode disabled.
- Use the RDSVL instruction introduced in EAC2 of the architecture to
obtain the streaming mode vector length during enumeration, ZA state
loading/saving and in test programs.
- Store a pointer to SVCR in fpsimd_last_state and use it in fpsimd_save()
for interoperation with KVM.
- Add a test case sme_trap_no_sm checking that we generate a SIGILL
when using an instruction that requires streaming mode without
enabling it.
- Add basic ZA context form validation to testcases helper library.
- Move signal tests over to validating streaming VL from ZA information.
- Pulled in patch removing ARRAY_SIZE() so that kselftest builds
cleanly and to avoid trivial conflicts.
v7:
- Rebase onto v5.16-rc3.
- Reduce indentation when supporting custom triggers for signal tests
as suggested by Catalin.
- Change to specifying a width for all CPU features rather than adding
single bit specific infrastructure.
- Don't require zeroing of non-shared SVE state during syscalls.
v6:
- Rebase onto v5.16-rc1.
- Return to disabling TIF_SVE on kernel entry even if we have SME
state, this avoids the need for KVM to handle the case where TIF_SVE
is set on guest entry.
- Add syscall-abi.h to SME updates to syscall-abi, mistakenly omitted
from commit.
v5:
- Rebase onto currently merged SVE and kselftest patches.
- Add support for the FA64 option, introduced in the recently published
EAC1 update to the specification.
- Pull in test program for the syscall ABI previously sent separately
with some revisions and add coverage for the SME ABI.
- Fix checking for options with 1 bit fields in ID_AA64SMFR0_EL1.
- Minor fixes and clarifications to the ABI documentation.
v4:
- Rebase onto merged patches.
- Remove an uneeded NULL check in vec_proc_do_default_vl().
- Include patch to factor out utility routines in kselftests written in
assembler.
- Specify -ffreestanding when building TPIDR2 test.
v3:
- Skip FFR rather than predicate registers in sve_flush_live().
- Don't assume a bool is all zeros in sve_flush_live() as per AAPCS.
- Don't redundantly specify a zero index when clearing FFR.
v2:
- Fix several issues with !SME and !SVE configurations.
- Preserve TPIDR2 when creating a new thread/process unless
CLONE_SETTLS is set.
- Report traps due to using features in an invalid mode as SIGILL.
- Spell out streaming mode behaviour in SVE ABI documentation more
directly.
- Document TPIDR2 in the ABI document.
- Use SMSTART and SMSTOP rather than read/modify/write sequences.
- Rework logic for exiting streaming mode on syscall.
- Don't needlessly initialise SVCR on access trap.
- Always restore SME VL for userspace if SME traps are disabled.
- Only yield to encourage preemption every 128 iterations in za-test,
otherwise do a getpid(), and validate SVCR after syscall.
- Leave streaming mode disabled except when reading the vector length
in za-test, and disable ZA after detecting a mismatch.
- Add SME support to vlset.
- Clarifications and typo fixes in comments.
- Move sme_alloc() forward declaration back a patch.
[1] https://community.arm.com/developer/ip-products/processors/b/processors-ip-…
Mark Brown (39):
kselftest/arm64: Fix comment for ptrace_sve_get_fpsimd_data()
kselftest/arm64: Remove assumption that tasks start FPSIMD only
kselftest/arm64: Validate setting via FPSIMD and read via SVE regsets
arm64/sme: Provide ABI documentation for SME
arm64/sme: System register and exception syndrome definitions
arm64/sme: Manually encode SME instructions
arm64/sme: Early CPU setup for SME
arm64/sme: Basic enumeration support
arm64/sme: Identify supported SME vector lengths at boot
arm64/sme: Implement sysctl to set the default vector length
arm64/sme: Implement vector length configuration prctl()s
arm64/sme: Implement support for TPIDR2
arm64/sme: Implement SVCR context switching
arm64/sme: Implement streaming SVE context switching
arm64/sme: Implement ZA context switching
arm64/sme: Implement traps and syscall handling for SME
arm64/sme: Disable ZA and streaming mode when handling signals
arm64/sme: Implement streaming SVE signal handling
arm64/sme: Implement ZA signal handling
arm64/sme: Implement ptrace support for streaming mode SVE registers
arm64/sme: Add ptrace support for ZA
arm64/sme: Disable streaming mode and ZA when flushing CPU state
arm64/sme: Save and restore streaming mode over EFI runtime calls
KVM: arm64: Hide SME system registers from guests
KVM: arm64: Trap SME usage in guest
KVM: arm64: Handle SME host state when running guests
arm64/sme: Provide Kconfig for SME
kselftest/arm64: Add manual encodings for SME instructions
kselftest/arm64: sme: Add SME support to vlset
kselftest/arm64: Add tests for TPIDR2
kselftest/arm64: Extend vector configuration API tests to cover SME
kselftest/arm64: sme: Provide streaming mode SVE stress test
kselftest/arm64: signal: Handle ZA signal context in core code
kselftest/arm64: Add stress test for SME ZA context switching
kselftest/arm64: signal: Add SME signal handling tests
kselftest/arm64: Add streaming SVE to SVE ptrace tests
kselftest/arm64: Add coverage for the ZA ptrace interface
kselftest/arm64: Add SME support to syscall ABI test
selftests/arm64: Add a testcase for handling of ZA on clone()
Documentation/arm64/elf_hwcaps.rst | 33 +
Documentation/arm64/index.rst | 1 +
Documentation/arm64/sme.rst | 428 +++++++++++++
Documentation/arm64/sve.rst | 70 ++-
arch/arm64/Kconfig | 11 +
arch/arm64/include/asm/cpu.h | 4 +
arch/arm64/include/asm/cpufeature.h | 24 +
arch/arm64/include/asm/el2_setup.h | 64 +-
arch/arm64/include/asm/esr.h | 13 +-
arch/arm64/include/asm/exception.h | 1 +
arch/arm64/include/asm/fpsimd.h | 123 +++-
arch/arm64/include/asm/fpsimdmacros.h | 87 +++
arch/arm64/include/asm/hwcap.h | 8 +
arch/arm64/include/asm/kvm_arm.h | 1 +
arch/arm64/include/asm/kvm_host.h | 4 +
arch/arm64/include/asm/processor.h | 26 +-
arch/arm64/include/asm/sysreg.h | 67 ++
arch/arm64/include/asm/thread_info.h | 2 +
arch/arm64/include/uapi/asm/hwcap.h | 8 +
arch/arm64/include/uapi/asm/ptrace.h | 69 ++-
arch/arm64/include/uapi/asm/sigcontext.h | 55 +-
arch/arm64/kernel/cpufeature.c | 106 ++++
arch/arm64/kernel/cpuinfo.c | 13 +
arch/arm64/kernel/entry-common.c | 11 +
arch/arm64/kernel/entry-fpsimd.S | 36 ++
arch/arm64/kernel/fpsimd.c | 585 ++++++++++++++++--
arch/arm64/kernel/process.c | 44 +-
arch/arm64/kernel/ptrace.c | 358 +++++++++--
arch/arm64/kernel/signal.c | 188 +++++-
arch/arm64/kernel/syscall.c | 29 +-
arch/arm64/kernel/traps.c | 1 +
arch/arm64/kvm/fpsimd.c | 43 +-
arch/arm64/kvm/hyp/nvhe/switch.c | 30 +
arch/arm64/kvm/hyp/vhe/switch.c | 11 +-
arch/arm64/kvm/sys_regs.c | 9 +-
arch/arm64/tools/cpucaps | 2 +
include/uapi/linux/elf.h | 2 +
include/uapi/linux/prctl.h | 9 +
kernel/sys.c | 12 +
tools/testing/selftests/arm64/abi/.gitignore | 1 +
tools/testing/selftests/arm64/abi/Makefile | 9 +-
.../selftests/arm64/abi/syscall-abi-asm.S | 79 ++-
.../testing/selftests/arm64/abi/syscall-abi.c | 204 +++++-
.../testing/selftests/arm64/abi/syscall-abi.h | 15 +
tools/testing/selftests/arm64/abi/tpidr2.c | 298 +++++++++
tools/testing/selftests/arm64/fp/.gitignore | 5 +
tools/testing/selftests/arm64/fp/Makefile | 19 +-
tools/testing/selftests/arm64/fp/rdvl-sme.c | 14 +
tools/testing/selftests/arm64/fp/rdvl.S | 10 +
tools/testing/selftests/arm64/fp/rdvl.h | 1 +
tools/testing/selftests/arm64/fp/sme-inst.h | 51 ++
tools/testing/selftests/arm64/fp/ssve-stress | 59 ++
tools/testing/selftests/arm64/fp/sve-ptrace.c | 175 +++++-
tools/testing/selftests/arm64/fp/sve-test.S | 20 +
tools/testing/selftests/arm64/fp/vec-syscfg.c | 10 +
tools/testing/selftests/arm64/fp/vlset.c | 10 +-
.../testing/selftests/arm64/fp/za-fork-asm.S | 61 ++
tools/testing/selftests/arm64/fp/za-fork.c | 156 +++++
tools/testing/selftests/arm64/fp/za-ptrace.c | 356 +++++++++++
tools/testing/selftests/arm64/fp/za-stress | 59 ++
tools/testing/selftests/arm64/fp/za-test.S | 388 ++++++++++++
.../testing/selftests/arm64/signal/.gitignore | 3 +
.../selftests/arm64/signal/test_signals.h | 4 +
.../arm64/signal/test_signals_utils.c | 6 +
.../testcases/fake_sigreturn_sme_change_vl.c | 92 +++
.../arm64/signal/testcases/sme_trap_no_sm.c | 38 ++
.../signal/testcases/sme_trap_non_streaming.c | 45 ++
.../arm64/signal/testcases/sme_trap_za.c | 36 ++
.../selftests/arm64/signal/testcases/sme_vl.c | 68 ++
.../arm64/signal/testcases/ssve_regs.c | 135 ++++
.../arm64/signal/testcases/testcases.c | 36 ++
.../arm64/signal/testcases/testcases.h | 3 +-
.../arm64/signal/testcases/za_regs.c | 128 ++++
73 files changed, 4991 insertions(+), 191 deletions(-)
create mode 100644 Documentation/arm64/sme.rst
create mode 100644 tools/testing/selftests/arm64/abi/syscall-abi.h
create mode 100644 tools/testing/selftests/arm64/abi/tpidr2.c
create mode 100644 tools/testing/selftests/arm64/fp/rdvl-sme.c
create mode 100644 tools/testing/selftests/arm64/fp/sme-inst.h
create mode 100644 tools/testing/selftests/arm64/fp/ssve-stress
create mode 100644 tools/testing/selftests/arm64/fp/za-fork-asm.S
create mode 100644 tools/testing/selftests/arm64/fp/za-fork.c
create mode 100644 tools/testing/selftests/arm64/fp/za-ptrace.c
create mode 100644 tools/testing/selftests/arm64/fp/za-stress
create mode 100644 tools/testing/selftests/arm64/fp/za-test.S
create mode 100644 tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_sme_change_vl.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_trap_no_sm.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_trap_non_streaming.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_trap_za.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_vl.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/ssve_regs.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/za_regs.c
base-commit: b2d229d4ddb17db541098b83524d901257e93845
--
2.30.2
Hi there,
this is the v2 of the patchset. The v1 can be found at [1]. There is only one
change in patch 1, which changed the target directory to build the test modules.
All other changes happen in patch 2.
Thanks for reviewing!
Changes from v1:
# test_modules/Makefile
* Build the test modules targeting /lib/modules, instead of ksrc when building
from the kernel source.
# test_modules/test_klp_syscall.c
* Added a parameter array to receive the pids that should transition to the
new system call. (suggedted by Joe)
* Create a new sysfs file /sys/kernel/test_klp_syscall/npids to show how many
pids from the argument need to transition to the new state. (suggested by
Joe)
* Fix the PPC32 support by adding the syscall wrapper for archs that select it
by default, without erroring out. PPC does not set SYSCALL_WRAPPER, so
having it set in v1 was a mistake. (suggested by Joe)
* The aarch64 syscall prefix was added too, since the livepatch support will come soon.
# test_binaries/test_klp-call_getpid.c
* Change %d/%u in printf (suggested byu Joe)
* Change run -> stop variable name, and inverted the assignments (suggested by
* Joe).
# File test-syscall.sh
* Fixed test-syscall.sh to call test_klp-call-getpid in test_binaries dir
* Load test_klp_syscall passed the pids of the test_klp-call_getpid instances.
Check the sysfs file from test_klp_syscall module to check that all pids
transitioned correctly. (suggested by Joe)
* Simplified the loop that calls test_klp-call_getpid. (suggested by Joe)
* Removed the "success" comment from the script, as it's implicit that it
succeed. Otherwise load_lp would error out. (suggested by Joe)
* Changed the commit message of patch 2 to further detail what means "tricky"
when livepatching syscalls. (suggested by Joe)
[1]: 20220603143242.870-1-mpdesouza(a)suse.com
Marcos Paulo de Souza (2):
livepatch: Move tests from lib/livepatch to selftests/livepatch
selftests: livepatch: Test livepatching a heavily called syscall
arch/s390/configs/debug_defconfig | 1 -
arch/s390/configs/defconfig | 1 -
lib/Kconfig.debug | 22 ---
lib/Makefile | 2 -
lib/livepatch/Makefile | 14 --
tools/testing/selftests/livepatch/Makefile | 35 +++-
tools/testing/selftests/livepatch/README | 5 +-
tools/testing/selftests/livepatch/config | 1 -
.../testing/selftests/livepatch/functions.sh | 34 ++--
.../selftests/livepatch/test-callbacks.sh | 50 +++---
.../selftests/livepatch/test-ftrace.sh | 6 +-
.../selftests/livepatch/test-livepatch.sh | 10 +-
.../selftests/livepatch/test-shadow-vars.sh | 2 +-
.../testing/selftests/livepatch/test-state.sh | 18 +--
.../selftests/livepatch/test-syscall.sh | 52 ++++++
.../test_binaries/test_klp-call_getpid.c | 48 ++++++
.../selftests/livepatch/test_modules/Makefile | 20 +++
.../test_modules}/test_klp_atomic_replace.c | 0
.../test_modules}/test_klp_callbacks_busy.c | 0
.../test_modules}/test_klp_callbacks_demo.c | 0
.../test_modules}/test_klp_callbacks_demo2.c | 0
.../test_modules}/test_klp_callbacks_mod.c | 0
.../test_modules}/test_klp_livepatch.c | 0
.../test_modules}/test_klp_shadow_vars.c | 0
.../livepatch/test_modules}/test_klp_state.c | 0
.../livepatch/test_modules}/test_klp_state2.c | 0
.../livepatch/test_modules}/test_klp_state3.c | 0
.../livepatch/test_modules/test_klp_syscall.c | 150 ++++++++++++++++++
28 files changed, 360 insertions(+), 111 deletions(-)
delete mode 100644 lib/livepatch/Makefile
create mode 100755 tools/testing/selftests/livepatch/test-syscall.sh
create mode 100644 tools/testing/selftests/livepatch/test_binaries/test_klp-call_getpid.c
create mode 100644 tools/testing/selftests/livepatch/test_modules/Makefile
rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_atomic_replace.c (100%)
rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_callbacks_busy.c (100%)
rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_callbacks_demo.c (100%)
rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_callbacks_demo2.c (100%)
rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_callbacks_mod.c (100%)
rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_livepatch.c (100%)
rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_shadow_vars.c (100%)
rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_state.c (100%)
rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_state2.c (100%)
rename {lib/livepatch => tools/testing/selftests/livepatch/test_modules}/test_klp_state3.c (100%)
create mode 100644 tools/testing/selftests/livepatch/test_modules/test_klp_syscall.c
--
2.35.3
Show for each node if every memory descriptor in that node has the
EFI_MEMORY_CPU_CRYPTO attribute.
fwupd project plans to use it as part of a check to see if the users
have properly configured memory hardware encryption
capabilities. fwupd's people have seen cases where it seems like there
is memory encryption because all the hardware is capable of doing it,
but on a closer look there is not, either because of system firmware
or because some component requires updating to enable the feature.
The MKTME/TME spec says that it will only encrypt those memory regions
which are flagged with the EFI_MEMORY_CPU_CRYPTO attribute.
If all nodes are capable of encryption and if the system have tme/sme
on we can pretty confidently say that the device is actively
encrypting all its memory.
It's planned to make this check part of an specification that can be
passed to people purchasing hardware
These checks will run at every boot. The specification is called Host
Security ID: https://fwupd.github.io/libfwupdplugin/hsi.html.
We choosed to do it a per-node basis because although an ABI that
shows that the whole system memory is capable of encryption would be
useful for the fwupd use case, doing it in a per-node basis would make
the path easier to give the capability to the user to target
allocations from applications to NUMA nodes which have encryption
capabilities in the future.
Changes since v8:
Add unit tests to e820_range_* functions
Changes since v7:
Less kerneldocs
Less verbosity in the e820 code
Changes since v6:
Fixes in __e820__handle_range_update
Const correctness in e820.c
Correct alignment in memblock.h
Rework memblock_overlaps_region
Changes since v5:
Refactor e820__range_{update, remove, set_crypto_capable} in order to
avoid code duplication.
Warn the user when a node has both encryptable and non-encryptable
regions.
Check that e820_table has enough size to store both current e820_table
and EFI memmap.
Changes since v4:
Add enum to represent the cryptographic capabilities in e820:
e820_crypto_capabilities.
Revert __e820__range_update, only adding the new argument for
__e820__range_add about crypto capabilities.
Add a function __e820__range_update_crypto similar to
__e820__range_update but to only update this new field.
Changes since v3:
Update date in Doc/ABI file.
More information about the fwupd usecase and the rationale behind
doing it in a per-NUMA-node.
Changes since v2:
e820__range_mark_crypto -> e820__range_mark_crypto_capable.
In e820__range_remove: Create a region with crypto capabilities
instead of creating one without it and then mark it.
Changes since v1:
Modify __e820__range_update to update the crypto capabilities of a
range; now this function will change the crypto capability of a range
if it's called with the same old_type and new_type. Rework
efi_mark_e820_regions_as_crypto_capable based on this.
Update do_add_efi_memmap to mark the regions as it creates them.
Change the type of crypto_capable in e820_entry from bool to u8.
Fix e820__update_table changes.
Remove memblock_add_crypto_capable. Now you have to add the region and
mark it then.
Better place for crypto_capable in pglist_data.
Martin Fernandez (9):
mm/memblock: Tag memblocks with crypto capabilities
mm/mmzone: Tag pg_data_t with crypto capabilities
x86/e820: Add infrastructure to refactor e820__range_{update,remove}
x86/e820: Refactor __e820__range_update
x86/e820: Refactor e820__range_remove
x86/e820: Tag e820_entry with crypto capabilities
x86/e820: Add unit tests for e820_range_* functions
x86/efi: Mark e820_entries as crypto capable from EFI memmap
drivers/node: Show in sysfs node's crypto capabilities
Documentation/ABI/testing/sysfs-devices-node | 10 +
arch/x86/Kconfig.debug | 10 +
arch/x86/include/asm/e820/api.h | 1 +
arch/x86/include/asm/e820/types.h | 12 +-
arch/x86/kernel/e820.c | 393 ++++++++++++++-----
arch/x86/kernel/e820_test.c | 249 ++++++++++++
arch/x86/platform/efi/efi.c | 37 ++
drivers/base/node.c | 10 +
include/linux/memblock.h | 5 +
include/linux/mmzone.h | 3 +
mm/memblock.c | 62 +++
mm/page_alloc.c | 1 +
12 files changed, 695 insertions(+), 98 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-devices-node
create mode 100644 arch/x86/kernel/e820_test.c
--
2.30.2
Hi,
here comes the v7 of the HID-BPF series.
Again, for a full explanation of HID-BPF, please refer to the last patch
in this series (24/24).
This version sees some minor improvements compared to v6, only
focusing on the reviews I got.
I also wanted to mention that I have started working on the userspace
side to "see" how the BPF programs would look when automatically loaded.
See the udev-hid-bpf project[0] for the code.
The idea is to define the HID-BPF userspace programs so we can reuse
the same semantic in the kernel.
I am quite happy with the results: this looks pretty similar to a kernel
module in term of design. The .bpf.c file is a standalone compilation
unit, and instead of having a table of ids, the filename is used (based
on the modalias). This allows to have a "probe" like function that we
can run to decide if the program needs to be attached or not.
All in all, the end result is that we can write the bpf program, compile
it locally, and send the result to the user. The user needs to drop the
.bpf.o in a local folder, and udev-hid-bpf will pick it up the next time
the device is plugged in. No other operations is required.
Next step will be to drop the same source file in the kernel source
tree, and have some magic to automatically load the compiled program
when the device is loaded.
Cheers,
Benjamin
[0] https://gitlab.freedesktop.org/bentiss/udev-hid-bpf (warning: probably
not the best rust code ever)
Benjamin Tissoires (24):
selftests/bpf: fix config for CLS_BPF
bpf/verifier: allow kfunc to read user provided context
bpf/verifier: do not clear meta in check_mem_size
selftests/bpf: add test for accessing ctx from syscall program type
bpf/verifier: allow kfunc to return an allocated mem
selftests/bpf: Add tests for kfunc returning a memory pointer
bpf: prepare for more bpf syscall to be used from kernel and user
space.
libbpf: add map_get_fd_by_id and map_delete_elem in light skeleton
HID: core: store the unique system identifier in hid_device
HID: export hid_report_type to uapi
HID: convert defines of HID class requests into a proper enum
HID: Kconfig: split HID support and hid-core compilation
HID: initial BPF implementation
selftests/bpf: add tests for the HID-bpf initial implementation
HID: bpf: allocate data memory for device_event BPF programs
selftests/bpf/hid: add test to change the report size
HID: bpf: introduce hid_hw_request()
selftests/bpf: add tests for bpf_hid_hw_request
HID: bpf: allow to change the report descriptor
selftests/bpf: add report descriptor fixup tests
selftests/bpf: Add a test for BPF_F_INSERT_HEAD
samples/bpf: add new hid_mouse example
HID: bpf: add Surface Dial example
Documentation: add HID-BPF docs
Documentation/hid/hid-bpf.rst | 512 +++++++++
Documentation/hid/index.rst | 1 +
drivers/Makefile | 2 +-
drivers/hid/Kconfig | 20 +-
drivers/hid/Makefile | 2 +
drivers/hid/bpf/Kconfig | 18 +
drivers/hid/bpf/Makefile | 11 +
drivers/hid/bpf/entrypoints/Makefile | 93 ++
drivers/hid/bpf/entrypoints/README | 4 +
drivers/hid/bpf/entrypoints/entrypoints.bpf.c | 66 ++
.../hid/bpf/entrypoints/entrypoints.lskel.h | 682 ++++++++++++
drivers/hid/bpf/hid_bpf_dispatch.c | 553 ++++++++++
drivers/hid/bpf/hid_bpf_dispatch.h | 28 +
drivers/hid/bpf/hid_bpf_jmp_table.c | 577 ++++++++++
drivers/hid/hid-core.c | 49 +-
include/linux/bpf.h | 9 +-
include/linux/btf.h | 10 +
include/linux/hid.h | 38 +-
include/linux/hid_bpf.h | 148 +++
include/uapi/linux/hid.h | 26 +-
include/uapi/linux/hid_bpf.h | 25 +
kernel/bpf/btf.c | 91 +-
kernel/bpf/syscall.c | 10 +-
kernel/bpf/verifier.c | 65 +-
net/bpf/test_run.c | 23 +
samples/bpf/.gitignore | 2 +
samples/bpf/Makefile | 27 +
samples/bpf/hid_mouse.bpf.c | 134 +++
samples/bpf/hid_mouse.c | 147 +++
samples/bpf/hid_surface_dial.bpf.c | 161 +++
samples/bpf/hid_surface_dial.c | 212 ++++
tools/include/uapi/linux/hid.h | 62 ++
tools/include/uapi/linux/hid_bpf.h | 25 +
tools/lib/bpf/skel_internal.h | 23 +
tools/testing/selftests/bpf/Makefile | 5 +-
tools/testing/selftests/bpf/config | 5 +-
tools/testing/selftests/bpf/prog_tests/hid.c | 990 ++++++++++++++++++
.../selftests/bpf/prog_tests/kfunc_call.c | 76 ++
tools/testing/selftests/bpf/progs/hid.c | 206 ++++
.../selftests/bpf/progs/kfunc_call_test.c | 125 +++
40 files changed, 5184 insertions(+), 79 deletions(-)
create mode 100644 Documentation/hid/hid-bpf.rst
create mode 100644 drivers/hid/bpf/Kconfig
create mode 100644 drivers/hid/bpf/Makefile
create mode 100644 drivers/hid/bpf/entrypoints/Makefile
create mode 100644 drivers/hid/bpf/entrypoints/README
create mode 100644 drivers/hid/bpf/entrypoints/entrypoints.bpf.c
create mode 100644 drivers/hid/bpf/entrypoints/entrypoints.lskel.h
create mode 100644 drivers/hid/bpf/hid_bpf_dispatch.c
create mode 100644 drivers/hid/bpf/hid_bpf_dispatch.h
create mode 100644 drivers/hid/bpf/hid_bpf_jmp_table.c
create mode 100644 include/linux/hid_bpf.h
create mode 100644 include/uapi/linux/hid_bpf.h
create mode 100644 samples/bpf/hid_mouse.bpf.c
create mode 100644 samples/bpf/hid_mouse.c
create mode 100644 samples/bpf/hid_surface_dial.bpf.c
create mode 100644 samples/bpf/hid_surface_dial.c
create mode 100644 tools/include/uapi/linux/hid.h
create mode 100644 tools/include/uapi/linux/hid_bpf.h
create mode 100644 tools/testing/selftests/bpf/prog_tests/hid.c
create mode 100644 tools/testing/selftests/bpf/progs/hid.c
--
2.36.1
The primary reason to invoke the oom reaper from the exit_mmap path used
to be a prevention of an excessive oom killing if the oom victim exit
races with the oom reaper (see [1] for more details). The invocation has
moved around since then because of the interaction with the munlock
logic but the underlying reason has remained the same (see [2]).
Munlock code is no longer a problem since [3] and there shouldn't be
any blocking operation before the memory is unmapped by exit_mmap so
the oom reaper invocation can be dropped. The unmapping part can be done
with the non-exclusive mmap_sem and the exclusive one is only required
when page tables are freed.
Remove the oom_reaper from exit_mmap which will make the code easier to
read. This is really unlikely to make any observable difference although
some microbenchmarks could benefit from one less branch that needs to be
evaluated even though it almost never is true.
[1] 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
[2] 27ae357fa82b ("mm, oom: fix concurrent munlock and oom reaper unmap, v3")
[3] a213e5cf71cb ("mm/munlock: delete munlock_vma_pages_all(), allow oomreap")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
---
Notes:
- Rebased over git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
mm-unstable branch per Andrew's request but applies cleany to Linus' ToT
- Conflicts with maple-tree patchset. Resolving these was discussed in
https://lore.kernel.org/all/20220519223438.qx35hbpfnnfnpouw@revolver/
include/linux/oom.h | 2 --
mm/mmap.c | 31 ++++++++++++-------------------
mm/oom_kill.c | 2 +-
3 files changed, 13 insertions(+), 22 deletions(-)
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 02d1e7bbd8cd..6cdde62b078b 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -106,8 +106,6 @@ static inline vm_fault_t check_stable_address_space(struct mm_struct *mm)
return 0;
}
-bool __oom_reap_task_mm(struct mm_struct *mm);
-
long oom_badness(struct task_struct *p,
unsigned long totalpages);
diff --git a/mm/mmap.c b/mm/mmap.c
index 2b9305ed0dda..b7918e6bb0db 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3110,30 +3110,13 @@ void exit_mmap(struct mm_struct *mm)
/* mm's last user has gone, and its about to be pulled down */
mmu_notifier_release(mm);
- if (unlikely(mm_is_oom_victim(mm))) {
- /*
- * Manually reap the mm to free as much memory as possible.
- * Then, as the oom reaper does, set MMF_OOM_SKIP to disregard
- * this mm from further consideration. Taking mm->mmap_lock for
- * write after setting MMF_OOM_SKIP will guarantee that the oom
- * reaper will not run on this mm again after mmap_lock is
- * dropped.
- *
- * Nothing can be holding mm->mmap_lock here and the above call
- * to mmu_notifier_release(mm) ensures mmu notifier callbacks in
- * __oom_reap_task_mm() will not block.
- */
- (void)__oom_reap_task_mm(mm);
- set_bit(MMF_OOM_SKIP, &mm->flags);
- }
-
- mmap_write_lock(mm);
+ mmap_read_lock(mm);
arch_exit_mmap(mm);
vma = mm->mmap;
if (!vma) {
/* Can happen if dup_mmap() received an OOM */
- mmap_write_unlock(mm);
+ mmap_read_unlock(mm);
return;
}
@@ -3143,6 +3126,16 @@ void exit_mmap(struct mm_struct *mm)
/* update_hiwater_rss(mm) here? but nobody should be looking */
/* Use -1 here to ensure all VMAs in the mm are unmapped */
unmap_vmas(&tlb, vma, 0, -1);
+ mmap_read_unlock(mm);
+
+ /*
+ * Set MMF_OOM_SKIP to hide this task from the oom killer/reaper
+ * because the memory has been already freed. Do not bother checking
+ * mm_is_oom_victim because setting a bit unconditionally is cheaper.
+ */
+ set_bit(MMF_OOM_SKIP, &mm->flags);
+
+ mmap_write_lock(mm);
free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8a70bca67c94..98dca2b42357 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -538,7 +538,7 @@ static DECLARE_WAIT_QUEUE_HEAD(oom_reaper_wait);
static struct task_struct *oom_reaper_list;
static DEFINE_SPINLOCK(oom_reaper_lock);
-bool __oom_reap_task_mm(struct mm_struct *mm)
+static bool __oom_reap_task_mm(struct mm_struct *mm)
{
struct vm_area_struct *vma;
bool ret = true;
--
2.36.1.255.ge46751e96f-goog
In the "mode_filter_without_nnp" test in seccomp_bpf, there is currently
a TODO which asks to check the capability CAP_SYS_ADMIN instead of euid.
This patch adds support to check if the calling process has the flag
CAP_SYS_ADMIN, and also if this flag has CAP_EFFECTIVE set.
Signed-off-by: Gautam Menghani <gautammenghani201(a)gmail.com>
---
tools/testing/selftests/seccomp/seccomp_bpf.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 136df5b76319..16b0edc520ef 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -392,6 +392,8 @@ TEST(mode_filter_without_nnp)
.filter = filter,
};
long ret;
+ cap_t cap = cap_get_proc();
+ cap_flag_value_t is_cap_sys_admin = 0;
ret = prctl(PR_GET_NO_NEW_PRIVS, 0, NULL, 0, 0);
ASSERT_LE(0, ret) {
@@ -400,8 +402,8 @@ TEST(mode_filter_without_nnp)
errno = 0;
ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0);
/* Succeeds with CAP_SYS_ADMIN, fails without */
- /* TODO(wad) check caps not euid */
- if (geteuid()) {
+ cap_get_flag(cap, CAP_SYS_ADMIN, CAP_EFFECTIVE, &is_cap_sys_admin);
+ if (!is_cap_sys_admin) {
EXPECT_EQ(-1, ret);
EXPECT_EQ(EACCES, errno);
} else {
--
2.34.1
This patch set extends the locked port feature for devices
that are behind a locked port, but do not have the ability to
authorize themselves as a supplicant using IEEE 802.1X.
Such devices can be printers, meters or anything related to
fixed installations. Instead of 802.1X authorization, devices
can get access based on their MAC addresses being whitelisted.
For an authorization daemon to detect that a device is trying
to get access through a locked port, the bridge will add the
MAC address of the device to the FDB with a locked flag to it.
Thus the authorization daemon can catch the FDB add event and
check if the MAC address is in the whitelist and if so replace
the FDB entry without the locked flag enabled, and thus open
the port for the device.
This feature is known as MAC-Auth or MAC Authentication Bypass
(MAB) in Cisco terminology, where the full MAB concept involves
additional Cisco infrastructure for authorization. There is no
real authentication process, as the MAC address of the device
is the only input the authorization daemon, in the general
case, has to base the decision if to unlock the port or not.
With this patch set, an implementation of the offloaded case is
supplied for the mv88e6xxx driver. When a packet ingresses on
a locked port, an ATU miss violation event will occur. When
handling such ATU miss violation interrupts, the MAC address of
the device is added to the FDB with a zero destination port
vector (DPV) and the MAC address is communicated through the
switchdev layer to the bridge, so that a FDB entry with the
locked flag enabled can be added.
Log:
v3: Added timers and lists in the driver (mv88e6xxx)
to keep track of and remove locked entries.
v4: Leave out enforcing a limit to the number of
locked entries in the bridge.
Removed the timers in the driver and use the
worker only. Add locked FDB flag to all drivers
using port_fdb_add() from the dsa api and let
all drivers ignore entries with this flag set.
Change how to get the ageing timeout of locked
entries. See global1_atu.c and switchdev.c.
Use struct mv88e6xxx_port for locked entries
variables instead of struct dsa_port.
Hans Schultz (6):
net: bridge: add locked entry fdb flag to extend locked port feature
net: switchdev: add support for offloading of fdb locked flag
drivers: net: dsa: add locked fdb entry flag to drivers
net: dsa: mv88e6xxx: allow reading FID when handling ATU violations
net: dsa: mv88e6xxx: mac-auth/MAB implementation
selftests: forwarding: add test of MAC-Auth Bypass to locked port
tests
drivers/net/dsa/b53/b53_common.c | 5 +
drivers/net/dsa/b53/b53_priv.h | 1 +
drivers/net/dsa/hirschmann/hellcreek.c | 5 +
drivers/net/dsa/lan9303-core.c | 5 +
drivers/net/dsa/lantiq_gswip.c | 5 +
drivers/net/dsa/microchip/ksz9477.c | 5 +
drivers/net/dsa/mt7530.c | 5 +
drivers/net/dsa/mv88e6xxx/Makefile | 1 +
drivers/net/dsa/mv88e6xxx/chip.c | 54 +++-
drivers/net/dsa/mv88e6xxx/chip.h | 15 +
drivers/net/dsa/mv88e6xxx/global1.h | 1 +
drivers/net/dsa/mv88e6xxx/global1_atu.c | 32 +-
drivers/net/dsa/mv88e6xxx/port.c | 30 +-
drivers/net/dsa/mv88e6xxx/port.h | 2 +
drivers/net/dsa/mv88e6xxx/switchdev.c | 280 ++++++++++++++++++
drivers/net/dsa/mv88e6xxx/switchdev.h | 37 +++
drivers/net/dsa/ocelot/felix.c | 5 +
drivers/net/dsa/qca8k.c | 5 +
drivers/net/dsa/sja1105/sja1105_main.c | 5 +
include/net/dsa.h | 7 +
include/net/switchdev.h | 1 +
include/uapi/linux/neighbour.h | 1 +
net/bridge/br.c | 3 +-
net/bridge/br_fdb.c | 19 +-
net/bridge/br_input.c | 10 +-
net/bridge/br_private.h | 5 +-
net/bridge/br_switchdev.c | 1 +
net/dsa/dsa_priv.h | 4 +-
net/dsa/port.c | 7 +-
net/dsa/slave.c | 4 +-
net/dsa/switch.c | 10 +-
.../net/forwarding/bridge_locked_port.sh | 30 +-
32 files changed, 566 insertions(+), 34 deletions(-)
create mode 100644 drivers/net/dsa/mv88e6xxx/switchdev.c
create mode 100644 drivers/net/dsa/mv88e6xxx/switchdev.h
--
2.30.2