Real-time setups try hard to ensure proper isolation between time
critical applications and e.g. network processing performed by the
network stack in softirq and RPS is used to move the softirq
activity away from the isolated core.
If the network configuration is dynamic, with netns and devices
routinely created at run-time, enforcing the correct RPS setting
on each newly created device allowing to transient bad configuration
became complex.
These series try to address the above, introducing a new
sysctl knob: rps_default_mask. The new sysctl entry allows
configuring a systemwide RPS mask, to be enforced since receive
queue creation time without any fourther per device configuration
required.
Additionally, a simple self-test is introduced to check the
rps_default_mask behavior.
v1 -> v2:
- fix sparse warning in patch 2/3
Paolo Abeni (3):
net/sysctl: factor-out netdev_rx_queue_set_rps_mask() helper
net/core: introduce default_rps_mask netns attribute
self-tests: introduce self-tests for RPS default mask
Documentation/admin-guide/sysctl/net.rst | 6 ++
include/linux/netdevice.h | 1 +
net/core/net-sysfs.c | 73 +++++++++++--------
net/core/sysctl_net_core.c | 58 +++++++++++++++
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/config | 3 +
.../testing/selftests/net/rps_default_mask.sh | 57 +++++++++++++++
7 files changed, 169 insertions(+), 30 deletions(-)
create mode 100755 tools/testing/selftests/net/rps_default_mask.sh
--
2.26.2
From: Vincent Cheng <vincent.cheng.xh(a)renesas.com>
This series adds adjust phase to the PTP Hardware Clock device interface.
Some PTP hardware clocks have a write phase mode that has
a built-in hardware filtering capability. The write phase mode
utilizes a phase offset control word instead of a frequency offset
control word. Add adjust phase function to take advantage of this
capability.
Changes since v1:
- As suggested by Richard Cochran:
1. ops->adjphase is new so need to check for non-null function pointer.
2. Kernel coding style uses lower_case_underscores.
3. Use existing PTP clock API for delayed worker.
Vincent Cheng (3):
ptp: Add adjphase function to support phase offset control.
ptp: Add adjust_phase to ptp_clock_caps capability.
ptp: ptp_clockmatrix: Add adjphase() to support PHC write phase mode.
drivers/ptp/ptp_chardev.c | 1 +
drivers/ptp/ptp_clock.c | 3 ++
drivers/ptp/ptp_clockmatrix.c | 92 +++++++++++++++++++++++++++++++++++
drivers/ptp/ptp_clockmatrix.h | 8 ++-
include/linux/ptp_clock_kernel.h | 6 ++-
include/uapi/linux/ptp_clock.h | 4 +-
tools/testing/selftests/ptp/testptp.c | 6 ++-
7 files changed, 114 insertions(+), 6 deletions(-)
--
2.7.4
Add RSEQ, restartable sequence, support and related selftest to RISCV.
The Kconfig option HAVE_REGS_AND_STACK_ACCESS_API is also required by
RSEQ because RSEQ will modify the content of pt_regs.sepc through
instruction_pointer_set() during the fixup procedure. In order to select
the config HAVE_REGS_AND_STACK_ACCESS_API, the missing APIs for accessing
pt_regs are also added in this patch set.
The relevant RSEQ tests in kselftest require the Binutils patch "RISC-V:
Fix linker problems with TLS copy relocs" to avoid placing
PREINIT_ARRAY and TLS variable of librseq.so at the same address.
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=3e7bd7f…
A segmental fault will happen if binutils misses this patch.
Patrick Stählin (1):
riscv: add required functions to enable HAVE_REGS_AND_STACK_ACCESS_API
Vincent Chen (2):
riscv: Add support for restartable sequence
rseq/selftests: Add support for riscv
Changes since v1:
1. Use the correct register name to access pt_regs
arch/riscv/Kconfig | 2 +
arch/riscv/include/asm/ptrace.h | 29 +-
arch/riscv/kernel/entry.S | 4 +
arch/riscv/kernel/ptrace.c | 99 +++++
arch/riscv/kernel/signal.c | 2 +
tools/testing/selftests/rseq/param_test.c | 23 ++
tools/testing/selftests/rseq/rseq-riscv.h | 622 ++++++++++++++++++++++++++++++
tools/testing/selftests/rseq/rseq.h | 2 +
8 files changed, 782 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/rseq/rseq-riscv.h
--
2.7.4
This is a repost of the mremap speed up patches, adding Kirill's
Acked-by's (from a separate discussion). The previous versions are
posted at:
v1 - https://lore.kernel.org/r/20200930222130.4175584-1-kaleshsingh@google.com
v2 - https://lore.kernel.org/r/20201002162101.665549-1-kaleshsingh@google.com
v3 - http://lore.kernel.org/r/20201005154017.474722-1-kaleshsingh@google.com
mremap time can be optimized by moving entries at the PMD/PUD level if
the source and destination addresses are PMD/PUD-aligned and
PMD/PUD-sized. Enable moving at the PMD and PUD levels on arm64 and
x86. Other architectures where this type of move is supported and known to
be safe can also opt-in to these optimizations by enabling HAVE_MOVE_PMD
and HAVE_MOVE_PUD.
Observed Performance Improvements for remapping a PUD-aligned 1GB-sized
region on x86 and arm64:
- HAVE_MOVE_PMD is already enabled on x86 : N/A
- Enabling HAVE_MOVE_PUD on x86 : ~13x speed up
- Enabling HAVE_MOVE_PMD on arm64 : ~ 8x speed up
- Enabling HAVE_MOVE_PUD on arm64 : ~19x speed up
Altogether, HAVE_MOVE_PMD and HAVE_MOVE_PUD
give a total of ~150x speed up on arm64.
Changes in v2:
- Reduce mremap_test time by only validating a configurable
threshold of the remapped region, as per John.
- Use a random pattern for mremap validation. Provide pattern
seed in test output, as per John.
- Moved set_pud_at() to separate patch, per Kirill.
- Use switch() instead of ifs in move_pgt_entry(), per Kirill.
- Update commit message with description of Android
garbage collector use case for HAVE_MOVE_PUD, as per Joel.
- Fix build test error reported by kernel test robot in [1].
Changes in v3:
- Make lines 80 cols or less where they don’t need to be longer,
per John.
- Removed unused PATTERN_SIZE in mremap_test
- Added Reviewed-by tag for patch 1/5 (mremap kselftest patch).
- Use switch() instead of ifs in get_extent(), per Kirill
- Add BUILD_BUG() is get_extent() default case.
- Move get_old_pud() and alloc_new_pud() out of
#ifdef CONFIG_HAVE_MOVE_PUD, per Kirill.
- Have get_old_pmd() and alloc_new_pmd() use get_old_pud() and
alloc_old_pud(), per Kirill.
- Replace #ifdef CONFIG_HAVE_MOVE_PMD / PUD in move_page_tables()
with IS_ENABLED(CONFIG_HAVE_MOVE_PMD / PUD), per Kirill.
- Fold Add set_pud_at() patch into patch 4/5, per Kirill.
[1] https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org/thread/CKPGL4F…
Kalesh Singh (5):
kselftests: vm: Add mremap tests
arm64: mremap speedup - Enable HAVE_MOVE_PMD
mm: Speedup mremap on 1GB or larger regions
arm64: mremap speedup - Enable HAVE_MOVE_PUD
x86: mremap speedup - Enable HAVE_MOVE_PUD
arch/Kconfig | 7 +
arch/arm64/Kconfig | 2 +
arch/arm64/include/asm/pgtable.h | 1 +
arch/x86/Kconfig | 1 +
mm/mremap.c | 230 ++++++++++++---
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 1 +
tools/testing/selftests/vm/mremap_test.c | 344 +++++++++++++++++++++++
tools/testing/selftests/vm/run_vmtests | 11 +
9 files changed, 558 insertions(+), 40 deletions(-)
create mode 100644 tools/testing/selftests/vm/mremap_test.c
--
2.28.0.1011.ga647a8990f-goog
The eeh-basic test got its own 60 seconds timeout (defined in commit
414f50434aa2 "selftests/eeh: Bump EEH wait time to 60s") per breakable
device.
And we have discovered that the number of breakable devices varies
on different hardware. The device recovery time ranges from 0 to 35
seconds. In our test pool it will take about 30 seconds to run on a
Power8 system that with 5 breakable devices, 60 seconds to run on a
Power9 system that with 4 breakable devices.
Extend the timeout setting in the kselftest framework to 5 minutes
to give it a chance to finish.
Signed-off-by: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
---
tools/testing/selftests/powerpc/eeh/Makefile | 2 +-
tools/testing/selftests/powerpc/eeh/settings | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/powerpc/eeh/settings
diff --git a/tools/testing/selftests/powerpc/eeh/Makefile b/tools/testing/selftests/powerpc/eeh/Makefile
index b397bab..ae963eb 100644
--- a/tools/testing/selftests/powerpc/eeh/Makefile
+++ b/tools/testing/selftests/powerpc/eeh/Makefile
@@ -3,7 +3,7 @@ noarg:
$(MAKE) -C ../
TEST_PROGS := eeh-basic.sh
-TEST_FILES := eeh-functions.sh
+TEST_FILES := eeh-functions.sh settings
top_srcdir = ../../../../..
include ../../lib.mk
diff --git a/tools/testing/selftests/powerpc/eeh/settings b/tools/testing/selftests/powerpc/eeh/settings
new file mode 100644
index 0000000..694d707
--- /dev/null
+++ b/tools/testing/selftests/powerpc/eeh/settings
@@ -0,0 +1 @@
+timeout=300
--
2.7.4
From: Mike Rapoport <rppt(a)linux.ibm.com>
Hi,
This is an implementation of "secret" mappings backed by a file descriptor.
I've dropped the boot time reservation patch for now as it is not strictly
required for the basic usage and can be easily added later either with or
without CMA.
v6 changes:
* Silence the warning about missing syscall, thanks to Qian Cai
* Replace spaces with tabs in Kconfig additions, per Randy
* Add a selftest.
v5 changes:
* rebase on v5.9-rc5
* drop boot time memory reservation patch
v4 changes:
* rebase on v5.9-rc1
* Do not redefine PMD_PAGE_ORDER in fs/dax.c, thanks Kirill
* Make secret mappings exclusive by default and only require flags to
memfd_secret() system call for uncached mappings, thanks again Kirill :)
v3 changes:
* Squash kernel-parameters.txt update into the commit that added the
command line option.
* Make uncached mode explicitly selectable by architectures. For now enable
it only on x86.
v2 changes:
* Follow Michael's suggestion and name the new system call 'memfd_secret'
* Add kernel-parameters documentation about the boot option
* Fix i386-tinyconfig regression reported by the kbuild bot.
CONFIG_SECRETMEM now depends on !EMBEDDED to disable it on small systems
from one side and still make it available unconditionally on
architectures that support SET_DIRECT_MAP.
The file descriptor backing secret memory mappings is created using a
dedicated memfd_secret system call The desired protection mode for the
memory is configured using flags parameter of the system call. The mmap()
of the file descriptor created with memfd_secret() will create a "secret"
memory mapping. The pages in that mapping will be marked as not present in
the direct map and will have desired protection bits set in the user page
table. For instance, current implementation allows uncached mappings.
Although normally Linux userspace mappings are protected from other users,
such secret mappings are useful for environments where a hostile tenant is
trying to trick the kernel into giving them access to other tenants
mappings.
Additionally, the secret mappings may be used as a mean to protect guest
memory in a virtual machine host.
For demonstration of secret memory usage we've created a userspace library
[1] that does two things: the first is act as a preloader for openssl to
redirect all the OPENSSL_malloc calls to secret memory meaning any secret
keys get automatically protected this way and the other thing it does is
expose the API to the user who needs it. We anticipate that a lot of the
use cases would be like the openssl one: many toolkits that deal with
secret keys already have special handling for the memory to try to give
them greater protection, so this would simply be pluggable into the
toolkits without any need for user application modification.
I've hesitated whether to continue to use new flags to memfd_create() or to
add a new system call and I've decided to use a new system call after I've
started to look into man pages update. There would have been two completely
independent descriptions and I think it would have been very confusing.
Hiding secret memory mappings behind an anonymous file allows (ab)use of
the page cache for tracking pages allocated for the "secret" mappings as
well as using address_space_operations for e.g. page migration callbacks.
The anonymous file may be also used implicitly, like hugetlb files, to
implement mmap(MAP_SECRET) and use the secret memory areas with "native" mm
ABIs in the future.
As the fragmentation of the direct map was one of the major concerns raised
during the previous postings, I've added an amortizing cache of PMD-size
pages to each file descriptor that is used as an allocation pool for the
secret memory areas.
v5: https://lore.kernel.org/lkml/20200916073539.3552-1-rppt@kernel.org
v4: https://lore.kernel.org/lkml/20200818141554.13945-1-rppt@kernel.org
v3: https://lore.kernel.org/lkml/20200804095035.18778-1-rppt@kernel.org
v2: https://lore.kernel.org/lkml/20200727162935.31714-1-rppt@kernel.org
v1: https://lore.kernel.org/lkml/20200720092435.17469-1-rppt@kernel.org
Mike Rapoport (6):
mm: add definition of PMD_PAGE_ORDER
mmap: make mlock_future_check() global
mm: introduce memfd_secret system call to create "secret" memory areas
arch, mm: wire up memfd_secret system call were relevant
mm: secretmem: use PMD-size pages to amortize direct map fragmentation
secretmem: test: add basic selftest for memfd_secret(2)
arch/Kconfig | 7 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 +
arch/arm64/include/uapi/asm/unistd.h | 1 +
arch/riscv/include/asm/unistd.h | 1 +
arch/x86/Kconfig | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
fs/dax.c | 11 +-
include/linux/pgtable.h | 3 +
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 7 +-
include/uapi/linux/magic.h | 1 +
include/uapi/linux/secretmem.h | 8 +
kernel/sys_ni.c | 2 +
mm/Kconfig | 4 +
mm/Makefile | 1 +
mm/internal.h | 3 +
mm/mmap.c | 5 +-
mm/secretmem.c | 333 ++++++++++++++++++++++
scripts/checksyscalls.sh | 4 +
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 3 +-
tools/testing/selftests/vm/memfd_secret.c | 296 +++++++++++++++++++
tools/testing/selftests/vm/run_vmtests | 17 ++
25 files changed, 703 insertions(+), 13 deletions(-)
create mode 100644 include/uapi/linux/secretmem.h
create mode 100644 mm/secretmem.c
create mode 100644 tools/testing/selftests/vm/memfd_secret.c
--
2.28.0
This patch series is a result of discussion at the refcount_t BOF
the Linux Plumbers Conference. In this discussion, we identified
a need for looking closely and investigating atomic_t usages in
the kernel when it is used strictly as a counter without it
controlling object lifetimes and state changes.
There are a number of atomic_t usages in the kernel where atomic_t api
is used strictly for counting and not for managing object lifetime. In
some cases, atomic_t might not even be needed.
The purpose of these counters is to clearly differentiate atomic_t
counters from atomic_t usages that guard object lifetimes, hence prone
to overflow and underflow errors. It allows tools that scan for underflow
and overflow on atomic_t usages to detect overflow and underflows to scan
just the cases that are prone to errors.
Simple atomic counters api provides interfaces for simple atomic counters
that just count, and don't guard resource lifetimes. The interfaces are
built on top of atomic_t api, providing a smaller subset of atomic_t
interfaces necessary to support simple counters.
Counter wraps around to INT_MIN when it overflows and should not be used
to guard resource lifetimes, device usage and open counts that control
state changes, and pm states. Overflowing to INT_MIN is consistent with
the atomic_t api, which it is built on top of.
Using counter_atomic* to guard lifetimes could lead to use-after free
when it overflows and undefined behavior when used to manage state
changes and device usage/open states.
This patch series introduces Simple atomic counters. Counter atomic ops
leverage atomic_t and provide a sub-set of atomic_t ops.
In addition this patch series converts a few drivers to use the new api.
The following criteria is used for select variables for conversion:
1. Variable doesn't guard object lifetimes, manage state changes e.g:
device usage counts, device open counts, and pm states.
2. Variable is used for stats and counters.
3. The conversion doesn't change the overflow behavior.
Note: Would like to get this into Linux 5.10-rc1 so we can continue
updating drivers that can be updated to use this API. If this all looks
good, Kees, would you like to take this through your tree or would you
like to take this through mine.
Changes since Patch v2:
-- Thanks for reviews and reviewed-by, and Acked-by tags. Updated
the patches with the tags.
-- Minor changes to address Greg's comment to remove default from
Kconfig
-- Added Copyrights to new files
Updates to address comments on v2 from Kees Cook
-- Updated Patch 1/11 to make clear that the counter wraps around to
INT_MIN and that this behavior is consistent with the atomic_t
api, on which this counter built api built on top of.
-- Other patch change logs updated with the correct wrap around
behavior.
-- Patch 1/11 is updated to add tests with constants for overflow
and underflow.
-- Patch 8/11 - added inits for the stat counters
-- Patch 10/11 - fixes the vmci_num_guest_devices != 0 to >0 which is
safer than checking for !=0.
Changes since Patch v1
-- Thanks for reviews and reviewed-by, and Acked-by tags. Updated
the patches with the tags.
-- Addressed Kees's and Joel's comments:
1. Removed dec_return interfaces
2. Removed counter_simple interfaces to be added later with changes
to drivers that use them (if any).
Changes since RFC:
-- Thanks for reviews and reviewed-by, and Acked-by tags. Updated
the patches with the tags.
-- Addressed Kees's comments:
1. Non-atomic counters renamed to counter_simple32 and counter_simple64
to clearly indicate size.
2. Added warning for counter_simple* usage and it should be used only
when there is no need for atomicity.
3. Renamed counter_atomic to counter_atomic32 to clearly indicate size.
4. Renamed counter_atomic_long to counter_atomic64 and it now uses
atomic64_t ops and indicates size.
5. Test updated for the API renames.
6. Added helper functions for test results printing
7. Verified that the test module compiles in kunit env. and test
module can be loaded to run the test.
8. Updated Documentation to reflect the intent to make the API
restricted so it can never be used to guard object lifetimes
and state management. I left _return ops for now, inc_return
is necessary for now as per the discussion we had on this topic.
-- Updated driver patches with API name changes.
-- We discussed if binder counters can be non-atomic. For now I left
them the same as the RFC patch - using counter_atomic32
-- Unrelated to this patch series:
The patch series review uncovered improvements could be made to
test_async_driver_probe and vmw_vmci/vmci_guest. I will track
these for fixing later.
Shuah Khan (11):
counters: Introduce counter_atomic* counters
selftests:lib:test_counters: add new test for counters
drivers/base: convert deferred_trigger_count and probe_count to
counter_atomic32
drivers/base/devcoredump: convert devcd_count to counter_atomic32
drivers/acpi: convert seqno counter_atomic32
drivers/acpi/apei: convert seqno counter_atomic32
drivers/android/binder: convert stats, transaction_log to
counter_atomic32
drivers/base/test/test_async_driver_probe: convert to use
counter_atomic32
drivers/char/ipmi: convert stats to use counter_atomic32
drivers/misc/vmw_vmci: convert num guest devices counter to
counter_atomic32
drivers/edac: convert pci counters to counter_atomic32
Documentation/core-api/counters.rst | 109 ++++++++++++
MAINTAINERS | 8 +
drivers/acpi/acpi_extlog.c | 5 +-
drivers/acpi/apei/ghes.c | 5 +-
drivers/android/binder.c | 41 ++---
drivers/android/binder_internal.h | 3 +-
drivers/base/dd.c | 19 +-
drivers/base/devcoredump.c | 5 +-
drivers/base/test/test_async_driver_probe.c | 26 +--
drivers/char/ipmi/ipmi_msghandler.c | 9 +-
drivers/char/ipmi/ipmi_si_intf.c | 9 +-
drivers/edac/edac_pci.h | 5 +-
drivers/edac/edac_pci_sysfs.c | 28 +--
drivers/misc/vmw_vmci/vmci_guest.c | 9 +-
include/linux/counters.h | 176 +++++++++++++++++++
lib/Kconfig | 9 +
lib/Makefile | 1 +
lib/test_counters.c | 162 +++++++++++++++++
tools/testing/selftests/lib/Makefile | 1 +
tools/testing/selftests/lib/config | 1 +
tools/testing/selftests/lib/test_counters.sh | 10 ++
21 files changed, 567 insertions(+), 74 deletions(-)
create mode 100644 Documentation/core-api/counters.rst
create mode 100644 include/linux/counters.h
create mode 100644 lib/test_counters.c
create mode 100755 tools/testing/selftests/lib/test_counters.sh
--
2.25.1