The primary reason to invoke the oom reaper from the exit_mmap path used
to be a prevention of an excessive oom killing if the oom victim exit
races with the oom reaper (see [1] for more details). The invocation has
moved around since then because of the interaction with the munlock
logic but the underlying reason has remained the same (see [2]).
Munlock code is no longer a problem since [3] and there shouldn't be
any blocking operation before the memory is unmapped by exit_mmap so
the oom reaper invocation can be dropped. The unmapping part can be done
with the non-exclusive mmap_sem and the exclusive one is only required
when page tables are freed.
Remove the oom_reaper from exit_mmap which will make the code easier to
read. This is really unlikely to make any observable difference although
some microbenchmarks could benefit from one less branch that needs to be
evaluated even though it almost never is true.
[1] 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
[2] 27ae357fa82b ("mm, oom: fix concurrent munlock and oom reaper unmap, v3")
[3] a213e5cf71cb ("mm/munlock: delete munlock_vma_pages_all(), allow oomreap")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
---
Notes:
- Rebased over git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
mm-unstable branch per Andrew's request but applies cleany to Linus' ToT
- Conflicts with maple-tree patchset. Resolving these was discussed in
https://lore.kernel.org/all/20220519223438.qx35hbpfnnfnpouw@revolver/
include/linux/oom.h | 2 --
mm/mmap.c | 31 ++++++++++++-------------------
mm/oom_kill.c | 2 +-
3 files changed, 13 insertions(+), 22 deletions(-)
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 02d1e7bbd8cd..6cdde62b078b 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -106,8 +106,6 @@ static inline vm_fault_t check_stable_address_space(struct mm_struct *mm)
return 0;
}
-bool __oom_reap_task_mm(struct mm_struct *mm);
-
long oom_badness(struct task_struct *p,
unsigned long totalpages);
diff --git a/mm/mmap.c b/mm/mmap.c
index 2b9305ed0dda..b7918e6bb0db 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3110,30 +3110,13 @@ void exit_mmap(struct mm_struct *mm)
/* mm's last user has gone, and its about to be pulled down */
mmu_notifier_release(mm);
- if (unlikely(mm_is_oom_victim(mm))) {
- /*
- * Manually reap the mm to free as much memory as possible.
- * Then, as the oom reaper does, set MMF_OOM_SKIP to disregard
- * this mm from further consideration. Taking mm->mmap_lock for
- * write after setting MMF_OOM_SKIP will guarantee that the oom
- * reaper will not run on this mm again after mmap_lock is
- * dropped.
- *
- * Nothing can be holding mm->mmap_lock here and the above call
- * to mmu_notifier_release(mm) ensures mmu notifier callbacks in
- * __oom_reap_task_mm() will not block.
- */
- (void)__oom_reap_task_mm(mm);
- set_bit(MMF_OOM_SKIP, &mm->flags);
- }
-
- mmap_write_lock(mm);
+ mmap_read_lock(mm);
arch_exit_mmap(mm);
vma = mm->mmap;
if (!vma) {
/* Can happen if dup_mmap() received an OOM */
- mmap_write_unlock(mm);
+ mmap_read_unlock(mm);
return;
}
@@ -3143,6 +3126,16 @@ void exit_mmap(struct mm_struct *mm)
/* update_hiwater_rss(mm) here? but nobody should be looking */
/* Use -1 here to ensure all VMAs in the mm are unmapped */
unmap_vmas(&tlb, vma, 0, -1);
+ mmap_read_unlock(mm);
+
+ /*
+ * Set MMF_OOM_SKIP to hide this task from the oom killer/reaper
+ * because the memory has been already freed. Do not bother checking
+ * mm_is_oom_victim because setting a bit unconditionally is cheaper.
+ */
+ set_bit(MMF_OOM_SKIP, &mm->flags);
+
+ mmap_write_lock(mm);
free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8a70bca67c94..98dca2b42357 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -538,7 +538,7 @@ static DECLARE_WAIT_QUEUE_HEAD(oom_reaper_wait);
static struct task_struct *oom_reaper_list;
static DEFINE_SPINLOCK(oom_reaper_lock);
-bool __oom_reap_task_mm(struct mm_struct *mm)
+static bool __oom_reap_task_mm(struct mm_struct *mm)
{
struct vm_area_struct *vma;
bool ret = true;
--
2.36.1.255.ge46751e96f-goog
Add bpf trampoline support for arm64. Most of the logic is the same as
x86.
Tested on raspberry pi 4b and qemu with KASLR disabled (avoid long jump),
result:
#9 /1 bpf_cookie/kprobe:OK
#9 /2 bpf_cookie/multi_kprobe_link_api:FAIL
#9 /3 bpf_cookie/multi_kprobe_attach_api:FAIL
#9 /4 bpf_cookie/uprobe:OK
#9 /5 bpf_cookie/tracepoint:OK
#9 /6 bpf_cookie/perf_event:OK
#9 /7 bpf_cookie/trampoline:OK
#9 /8 bpf_cookie/lsm:OK
#9 bpf_cookie:FAIL
#18 /1 bpf_tcp_ca/dctcp:OK
#18 /2 bpf_tcp_ca/cubic:OK
#18 /3 bpf_tcp_ca/invalid_license:OK
#18 /4 bpf_tcp_ca/dctcp_fallback:OK
#18 /5 bpf_tcp_ca/rel_setsockopt:OK
#18 bpf_tcp_ca:OK
#51 /1 dummy_st_ops/dummy_st_ops_attach:OK
#51 /2 dummy_st_ops/dummy_init_ret_value:OK
#51 /3 dummy_st_ops/dummy_init_ptr_arg:OK
#51 /4 dummy_st_ops/dummy_multiple_args:OK
#51 dummy_st_ops:OK
#55 fentry_fexit:OK
#56 fentry_test:OK
#57 /1 fexit_bpf2bpf/target_no_callees:OK
#57 /2 fexit_bpf2bpf/target_yes_callees:OK
#57 /3 fexit_bpf2bpf/func_replace:OK
#57 /4 fexit_bpf2bpf/func_replace_verify:OK
#57 /5 fexit_bpf2bpf/func_sockmap_update:OK
#57 /6 fexit_bpf2bpf/func_replace_return_code:OK
#57 /7 fexit_bpf2bpf/func_map_prog_compatibility:OK
#57 /8 fexit_bpf2bpf/func_replace_multi:OK
#57 /9 fexit_bpf2bpf/fmod_ret_freplace:OK
#57 fexit_bpf2bpf:OK
#58 fexit_sleep:OK
#59 fexit_stress:OK
#60 fexit_test:OK
#67 get_func_args_test:OK
#68 get_func_ip_test:OK
#104 modify_return:OK
#237 xdp_bpf2bpf:OK
bpf_cookie/multi_kprobe_link_api and bpf_cookie/multi_kprobe_attach_api
failed due to lack of multi_kprobe on arm64.
v5:
- As Alexei suggested, remove is_valid_bpf_tramp_flags()
v4: https://lore.kernel.org/bpf/20220517071838.3366093-1-xukuohai@huawei.com/
- Run the test cases on raspberry pi 4b
- Rebase and add cookie to trampoline
- As Steve suggested, move trace_direct_tramp() back to entry-ftrace.S to
avoid messing up generic code with architecture specific code
- As Jakub suggested, merge patch 4 and patch 5 of v3 to provide full function
in one patch
- As Mark suggested, add a comment for the use of aarch64_insn_patch_text_nosync()
- Do not generate trampoline for long jump to avoid triggering ftrace_bug
- Round stack size to multiples of 16B to avoid SPAlignmentFault
- Use callee saved register x20 to reduce the use of mov_i64
- Add missing BTI J instructions
- Trivial spelling and code style fixes
v3: https://lore.kernel.org/bpf/20220424154028.1698685-1-xukuohai@huawei.com/
- Append test results for bpf_tcp_ca, dummy_st_ops, fexit_bpf2bpf,
xdp_bpf2bpf
- Support to poke bpf progs
- Fix return value of arch_prepare_bpf_trampoline() to the total number
of bytes instead of number of instructions
- Do not check whether CONFIG_DYNAMIC_FTRACE_WITH_REGS is enabled in
arch_prepare_bpf_trampoline, since the trampoline may be hooked to a bpf
prog
- Restrict bpf_arch_text_poke() to poke bpf text only, as kernel functions
are poked by ftrace
- Rewrite trace_direct_tramp() in inline assembly in trace_selftest.c
to avoid messing entry-ftrace.S
- isolate arch_ftrace_set_direct_caller() with macro
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS to avoid compile error
when this macro is disabled
- Some trivial code sytle fixes
v2: https://lore.kernel.org/bpf/20220414162220.1985095-1-xukuohai@huawei.com/
- Add Song's ACK
- Change the multi-line comment in is_valid_bpf_tramp_flags() into net
style (patch 3)
- Fix a deadloop issue in ftrace selftest (patch 2)
- Replace pt_regs->x0 with pt_regs->orig_x0 in patch 1 commit message
- Replace "bpf trampoline" with "custom trampoline" in patch 1, as
ftrace direct call is not only used by bpf trampoline.
v1: https://lore.kernel.org/bpf/20220413054959.1053668-1-xukuohai@huawei.com/
Xu Kuohai (6):
arm64: ftrace: Add ftrace direct call support
ftrace: Fix deadloop caused by direct call in ftrace selftest
bpf: Remove is_valid_bpf_tramp_flags()
bpf, arm64: Impelment bpf_arch_text_poke() for arm64
bpf, arm64: bpf trampoline for arm64
selftests/bpf: Fix trivial typo in fentry_fexit.c
arch/arm64/Kconfig | 2 +
arch/arm64/include/asm/ftrace.h | 22 +
arch/arm64/kernel/asm-offsets.c | 1 +
arch/arm64/kernel/entry-ftrace.S | 28 +-
arch/arm64/net/bpf_jit.h | 1 +
arch/arm64/net/bpf_jit_comp.c | 523 +++++++++++++++++-
arch/x86/net/bpf_jit_comp.c | 20 -
kernel/bpf/bpf_struct_ops.c | 3 +
kernel/bpf/trampoline.c | 3 +
kernel/trace/trace_selftest.c | 2 +
.../selftests/bpf/prog_tests/fentry_fexit.c | 4 +-
11 files changed, 570 insertions(+), 39 deletions(-)
--
2.30.2
From: Vincent Cheng <vincent.cheng.xh(a)renesas.com>
This series adds adjust phase to the PTP Hardware Clock device interface.
Some PTP hardware clocks have a write phase mode that has
a built-in hardware filtering capability. The write phase mode
utilizes a phase offset control word instead of a frequency offset
control word. Add adjust phase function to take advantage of this
capability.
Changes since v1:
- As suggested by Richard Cochran:
1. ops->adjphase is new so need to check for non-null function pointer.
2. Kernel coding style uses lower_case_underscores.
3. Use existing PTP clock API for delayed worker.
Vincent Cheng (3):
ptp: Add adjphase function to support phase offset control.
ptp: Add adjust_phase to ptp_clock_caps capability.
ptp: ptp_clockmatrix: Add adjphase() to support PHC write phase mode.
drivers/ptp/ptp_chardev.c | 1 +
drivers/ptp/ptp_clock.c | 3 ++
drivers/ptp/ptp_clockmatrix.c | 92 +++++++++++++++++++++++++++++++++++
drivers/ptp/ptp_clockmatrix.h | 8 ++-
include/linux/ptp_clock_kernel.h | 6 ++-
include/uapi/linux/ptp_clock.h | 4 +-
tools/testing/selftests/ptp/testptp.c | 6 ++-
7 files changed, 114 insertions(+), 6 deletions(-)
--
2.7.4
The default file permissions on a memfd include execute bits, which
means that such a memfd can be filled with a executable and passed to
the exec() family of functions. This is undesirable on systems where all
code is verified and all filesystems are intended to be mounted noexec,
since an attacker may be able to use a memfd to load unverified code and
execute it.
Additionally, execution via memfd is a common way to avoid scrutiny for
malicious code, since it allows execution of a program without a file
ever appearing on disk. This attack vector is not totally mitigated with
this new flag, since the default memfd file permissions must remain
executable to avoid breaking existing legitimate uses, but it should be
possible to use other security mechanisms to prevent memfd_create calls
without MFD_NOEXEC on systems where it is known that executable memfds
are not necessary.
This patch series adds a new MFD_NOEXEC flag for memfd_create(), which
allows creation of non-executable memfds, and as part of the
implementation of this new flag, it also adds a new F_SEAL_EXEC seal,
which will prevent modification of any of the execute bits of a sealed
memfd.
I am not sure if this is the best way to implement the desired behavior
(for example, the F_SEAL_EXEC seal is really more of an implementation
detail and feels a bit clunky to expose), so suggestions are welcome
for alternate approaches.
Daniel Verkamp (4):
mm/memfd: add F_SEAL_EXEC
mm/memfd: add MFD_NOEXEC flag to memfd_create
selftests/memfd: add tests for F_SEAL_EXEC
selftests/memfd: add tests for MFD_NOEXEC
include/uapi/linux/fcntl.h | 1 +
include/uapi/linux/memfd.h | 1 +
mm/memfd.c | 12 ++-
mm/shmem.c | 6 ++
tools/testing/selftests/memfd/memfd_test.c | 114 +++++++++++++++++++++
5 files changed, 133 insertions(+), 1 deletion(-)
--
2.35.1.1094.g7c7d902a7c-goog
This v2 series implements selftests targeting the feature floated by Chao
via:
https://lore.kernel.org/linux-mm/20220310140911.50924-1-chao.p.peng@linux.i…
Below changes aim to test the fd based approach for guest private memory
in context of normal (non-confidential) VMs executing on non-confidential
platforms.
priv_memfd_test.c file adds a suite of selftests to access private memory
from the guest via private/shared accesses and checking if the contents
can be leaked to/accessed by vmm via shared memory view.
Updates in V2:
1) Tests are added to exercise implicit/explicit memory conversion paths.
2) Test is added to exercise UPM feature without double memory allocation.
This series has dependency on following patches:
1) V5 series patches from Chao mentioned above.
2) https://github.com/vishals4gh/linux/commit/b9adedf777ad84af39042e9c19899600…
- Fixes host kernel crash with current implementation
3) https://github.com/vishals4gh/linux/commit/0577e351ee36d52c1f6cdcb1b8de7aa6…
- Confidential platforms along with the confidentiality aware software stack
support a notion of private/shared accesses from the confidential VMs.
Generally, a bit in the GPA conveys the shared/private-ness of the access.
Non-confidential platforms don't have a notion of private or shared accesses
from the guest VMs. To support this notion, KVM_HC_MAP_GPA_RANGE is modified
to allow marking an access from a VM within a GPA range as always shared or
private. There is an ongoing discussion about adding support for
software-only confidential VMs, which should replace this patch.
4) https://github.com/vishals4gh/linux/commit/8d46aea9a7d72e4b1b998066ce0dde08…
- Temporary placeholder to be able to test memory conversion paths
till the memory conversion exit error code is finalized.
5) https://github.com/vishals4gh/linux/commit/4c36706477c62d9416d635fa6ac4ef64…
- Fixes GFN calculation during memory conversion path.
Github link for the patches posted as part of this series:
https://github.com/vishals4gh/linux/commits/priv_memfd_selftests_rfc_v2
Austin Diviness (1):
selftests: kvm: Add hugepage support to priv_memfd_test suite.
Vishal Annapurve (7):
selftests: kvm: Fix inline assembly for hypercall
selftests: kvm: Add a basic selftest to test private memory
selftests: kvm: priv_memfd_test: Add support for memory conversion
selftests: kvm: priv_memfd_test: Add shared access test
selftests: kvm: Add implicit memory conversion tests
selftests: kvm: Add KVM_HC_MAP_GPA_RANGE hypercall test
selftests: kvm: priv_memfd: Add test avoiding double allocation
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/lib/x86_64/processor.c | 2 +-
tools/testing/selftests/kvm/priv_memfd_test.c | 1359 +++++++++++++++++
3 files changed, 1361 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/kvm/priv_memfd_test.c
--
2.36.0.512.ge40c2bad7a-goog
Add a new QEMU config for kunit_tool, x86_64-smp, which provides an
8-cpu SMP setup. No other kunit_tool configurations provide an SMP
setup, so this is the best bet for testing things like KCSAN, which
require a multicore/multi-cpu system.
The choice of 8 CPUs is pretty arbitrary: it's enough to get tests like
KCSAN to run with a nontrivial number of worker threads, while still
working relatively quickly on older machines.
Signed-off-by: David Gow <davidgow(a)google.com>
---
This is based off the discussion in:
https://groups.google.com/g/kasan-dev/c/A7XzC2pXRC8
---
tools/testing/kunit/qemu_configs/x86_64-smp.py | 13 +++++++++++++
1 file changed, 13 insertions(+)
create mode 100644 tools/testing/kunit/qemu_configs/x86_64-smp.py
diff --git a/tools/testing/kunit/qemu_configs/x86_64-smp.py b/tools/testing/kunit/qemu_configs/x86_64-smp.py
new file mode 100644
index 000000000000..a95623f5f8b7
--- /dev/null
+++ b/tools/testing/kunit/qemu_configs/x86_64-smp.py
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0
+from ..qemu_config import QemuArchParams
+
+QEMU_ARCH = QemuArchParams(linux_arch='x86_64',
+ kconfig='''
+CONFIG_SERIAL_8250=y
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_SMP=y
+ ''',
+ qemu_arch='x86_64',
+ kernel_path='arch/x86/boot/bzImage',
+ kernel_command_line='console=ttyS0',
+ extra_qemu_params=['-smp', '8'])
--
2.36.0.550.gb090851708-goog
This patch set extends the locked port feature for devices
that are behind a locked port, but do not have the ability to
authorize themselves as a supplicant using IEEE 802.1X.
Such devices can be printers, meters or anything related to
fixed installations. Instead of 802.1X authorization, devices
can get access based on their MAC addresses being whitelisted.
For an authorization daemon to detect that a device is trying
to get access through a locked port, the bridge will add the
MAC address of the device to the FDB with a locked flag to it.
Thus the authorization daemon can catch the FDB add event and
check if the MAC address is in the whitelist and if so replace
the FDB entry without the locked flag enabled, and thus open
the port for the device.
This feature is known as MAC-Auth or MAC Authentication Bypass
(MAB) in Cisco terminology, where the full MAB concept involves
additional Cisco infrastructure for authorization. There is no
real authentication process, as the MAC address of the device
is the only input the authorization daemon, in the general
case, has to base the decision if to unlock the port or not.
With this patch set, an implementation of the offloaded case is
supplied for the mv88e6xxx driver. When a packet ingresses on
a locked port, an ATU miss violation event will occur. When
handling such ATU miss violation interrupts, the MAC address of
the device is added to the FDB with a zero destination port
vector (DPV) and the MAC address is communicated through the
switchdev layer to the bridge, so that a FDB entry with the
locked flag enabled can be added.
Hans Schultz (4):
net: bridge: add fdb flag to extent locked port feature
net: switchdev: add support for offloading of fdb locked flag
net: dsa: mv88e6xxx: mac-auth/MAB implementation
selftests: forwarding: add test of MAC-Auth Bypass to locked port
tests
drivers/net/dsa/mv88e6xxx/Makefile | 1 +
drivers/net/dsa/mv88e6xxx/chip.c | 40 ++-
drivers/net/dsa/mv88e6xxx/chip.h | 5 +
drivers/net/dsa/mv88e6xxx/global1.h | 1 +
drivers/net/dsa/mv88e6xxx/global1_atu.c | 35 ++-
.../net/dsa/mv88e6xxx/mv88e6xxx_switchdev.c | 249 ++++++++++++++++++
.../net/dsa/mv88e6xxx/mv88e6xxx_switchdev.h | 40 +++
drivers/net/dsa/mv88e6xxx/port.c | 32 ++-
drivers/net/dsa/mv88e6xxx/port.h | 2 +
include/net/dsa.h | 6 +
include/net/switchdev.h | 3 +-
include/uapi/linux/neighbour.h | 1 +
net/bridge/br.c | 3 +-
net/bridge/br_fdb.c | 18 +-
net/bridge/br_if.c | 1 +
net/bridge/br_input.c | 11 +-
net/bridge/br_private.h | 9 +-
.../net/forwarding/bridge_locked_port.sh | 42 ++-
18 files changed, 470 insertions(+), 29 deletions(-)
create mode 100644 drivers/net/dsa/mv88e6xxx/mv88e6xxx_switchdev.c
create mode 100644 drivers/net/dsa/mv88e6xxx/mv88e6xxx_switchdev.h
--
2.30.2
Note: this series applies on top of
https://lore.kernel.org/linux-kselftest/20220516194730.1546328-2-dlatypov@g….
That patch greatly simplified the process of adding new flags.
This flag would let users pass additional arguments to QEMU when using a
non-UML arch to run their tests.
E.g. for kcsan's tests, they require SMP and with this patch, you can do
$ ./tools/testing/kunit/kunit.py run --kconfig_add=CONFIG_SMP --qemu_args='-smp 8'
This is proposed as an alternative to users manually creating new
qemu_config python files and also to [1], where we discussed checking in
a new x86_64 variant w/ `-smp 8` hard-coded into it.
This patch also contains a fix to the example `run_kunit` bash function
since it didn't quote properly and would parse the example above as
--qemu_args='-smp' '8'
no matter how you tried to quote your arguments.
[1] https://lore.kernel.org/linux-kselftest/20220518073232.526443-1-davidgow@go…
Daniel Latypov (3):
Documentation: kunit: fix example run_kunit func to allow spaces in
args
kunit: tool: simplify creating LinuxSourceTreeOperations
kunit: tool: introduce --qemu_args
.../dev-tools/kunit/running_tips.rst | 2 +-
tools/testing/kunit/kunit.py | 14 +++++++++-
tools/testing/kunit/kunit_kernel.py | 26 +++++++++++--------
tools/testing/kunit/kunit_tool_test.py | 20 +++++++++++---
4 files changed, 46 insertions(+), 16 deletions(-)
--
2.36.1.124.g0e6072fb45-goog