The primary reason to invoke the oom reaper from the exit_mmap path used
to be a prevention of an excessive oom killing if the oom victim exit
races with the oom reaper (see [1] for more details). The invocation has
moved around since then because of the interaction with the munlock
logic but the underlying reason has remained the same (see [2]).
Munlock code is no longer a problem since [3] and there shouldn't be
any blocking operation before the memory is unmapped by exit_mmap so
the oom reaper invocation can be dropped. The unmapping part can be done
with the non-exclusive mmap_sem and the exclusive one is only required
when page tables are freed.
Remove the oom_reaper from exit_mmap which will make the code easier to
read. This is really unlikely to make any observable difference although
some microbenchmarks could benefit from one less branch that needs to be
evaluated even though it almost never is true.
[1] 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
[2] 27ae357fa82b ("mm, oom: fix concurrent munlock and oom reaper unmap, v3")
[3] a213e5cf71cb ("mm/munlock: delete munlock_vma_pages_all(), allow oomreap")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
---
Notes:
- Rebased over git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
mm-unstable branch per Andrew's request but applies cleany to Linus' ToT
- Conflicts with maple-tree patchset. Resolving these was discussed in
https://lore.kernel.org/all/20220519223438.qx35hbpfnnfnpouw@revolver/
include/linux/oom.h | 2 --
mm/mmap.c | 31 ++++++++++++-------------------
mm/oom_kill.c | 2 +-
3 files changed, 13 insertions(+), 22 deletions(-)
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 02d1e7bbd8cd..6cdde62b078b 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -106,8 +106,6 @@ static inline vm_fault_t check_stable_address_space(struct mm_struct *mm)
return 0;
}
-bool __oom_reap_task_mm(struct mm_struct *mm);
-
long oom_badness(struct task_struct *p,
unsigned long totalpages);
diff --git a/mm/mmap.c b/mm/mmap.c
index 2b9305ed0dda..b7918e6bb0db 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3110,30 +3110,13 @@ void exit_mmap(struct mm_struct *mm)
/* mm's last user has gone, and its about to be pulled down */
mmu_notifier_release(mm);
- if (unlikely(mm_is_oom_victim(mm))) {
- /*
- * Manually reap the mm to free as much memory as possible.
- * Then, as the oom reaper does, set MMF_OOM_SKIP to disregard
- * this mm from further consideration. Taking mm->mmap_lock for
- * write after setting MMF_OOM_SKIP will guarantee that the oom
- * reaper will not run on this mm again after mmap_lock is
- * dropped.
- *
- * Nothing can be holding mm->mmap_lock here and the above call
- * to mmu_notifier_release(mm) ensures mmu notifier callbacks in
- * __oom_reap_task_mm() will not block.
- */
- (void)__oom_reap_task_mm(mm);
- set_bit(MMF_OOM_SKIP, &mm->flags);
- }
-
- mmap_write_lock(mm);
+ mmap_read_lock(mm);
arch_exit_mmap(mm);
vma = mm->mmap;
if (!vma) {
/* Can happen if dup_mmap() received an OOM */
- mmap_write_unlock(mm);
+ mmap_read_unlock(mm);
return;
}
@@ -3143,6 +3126,16 @@ void exit_mmap(struct mm_struct *mm)
/* update_hiwater_rss(mm) here? but nobody should be looking */
/* Use -1 here to ensure all VMAs in the mm are unmapped */
unmap_vmas(&tlb, vma, 0, -1);
+ mmap_read_unlock(mm);
+
+ /*
+ * Set MMF_OOM_SKIP to hide this task from the oom killer/reaper
+ * because the memory has been already freed. Do not bother checking
+ * mm_is_oom_victim because setting a bit unconditionally is cheaper.
+ */
+ set_bit(MMF_OOM_SKIP, &mm->flags);
+
+ mmap_write_lock(mm);
free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8a70bca67c94..98dca2b42357 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -538,7 +538,7 @@ static DECLARE_WAIT_QUEUE_HEAD(oom_reaper_wait);
static struct task_struct *oom_reaper_list;
static DEFINE_SPINLOCK(oom_reaper_lock);
-bool __oom_reap_task_mm(struct mm_struct *mm)
+static bool __oom_reap_task_mm(struct mm_struct *mm)
{
struct vm_area_struct *vma;
bool ret = true;
--
2.36.1.255.ge46751e96f-goog
Add bpf trampoline support for arm64. Most of the logic is the same as
x86.
Tested on raspberry pi 4b and qemu with KASLR disabled (avoid long jump),
result:
#9 /1 bpf_cookie/kprobe:OK
#9 /2 bpf_cookie/multi_kprobe_link_api:FAIL
#9 /3 bpf_cookie/multi_kprobe_attach_api:FAIL
#9 /4 bpf_cookie/uprobe:OK
#9 /5 bpf_cookie/tracepoint:OK
#9 /6 bpf_cookie/perf_event:OK
#9 /7 bpf_cookie/trampoline:OK
#9 /8 bpf_cookie/lsm:OK
#9 bpf_cookie:FAIL
#18 /1 bpf_tcp_ca/dctcp:OK
#18 /2 bpf_tcp_ca/cubic:OK
#18 /3 bpf_tcp_ca/invalid_license:OK
#18 /4 bpf_tcp_ca/dctcp_fallback:OK
#18 /5 bpf_tcp_ca/rel_setsockopt:OK
#18 bpf_tcp_ca:OK
#51 /1 dummy_st_ops/dummy_st_ops_attach:OK
#51 /2 dummy_st_ops/dummy_init_ret_value:OK
#51 /3 dummy_st_ops/dummy_init_ptr_arg:OK
#51 /4 dummy_st_ops/dummy_multiple_args:OK
#51 dummy_st_ops:OK
#55 fentry_fexit:OK
#56 fentry_test:OK
#57 /1 fexit_bpf2bpf/target_no_callees:OK
#57 /2 fexit_bpf2bpf/target_yes_callees:OK
#57 /3 fexit_bpf2bpf/func_replace:OK
#57 /4 fexit_bpf2bpf/func_replace_verify:OK
#57 /5 fexit_bpf2bpf/func_sockmap_update:OK
#57 /6 fexit_bpf2bpf/func_replace_return_code:OK
#57 /7 fexit_bpf2bpf/func_map_prog_compatibility:OK
#57 /8 fexit_bpf2bpf/func_replace_multi:OK
#57 /9 fexit_bpf2bpf/fmod_ret_freplace:OK
#57 fexit_bpf2bpf:OK
#58 fexit_sleep:OK
#59 fexit_stress:OK
#60 fexit_test:OK
#67 get_func_args_test:OK
#68 get_func_ip_test:OK
#104 modify_return:OK
#237 xdp_bpf2bpf:OK
bpf_cookie/multi_kprobe_link_api and bpf_cookie/multi_kprobe_attach_api
failed due to lack of multi_kprobe on arm64.
v5:
- As Alexei suggested, remove is_valid_bpf_tramp_flags()
v4: https://lore.kernel.org/bpf/20220517071838.3366093-1-xukuohai@huawei.com/
- Run the test cases on raspberry pi 4b
- Rebase and add cookie to trampoline
- As Steve suggested, move trace_direct_tramp() back to entry-ftrace.S to
avoid messing up generic code with architecture specific code
- As Jakub suggested, merge patch 4 and patch 5 of v3 to provide full function
in one patch
- As Mark suggested, add a comment for the use of aarch64_insn_patch_text_nosync()
- Do not generate trampoline for long jump to avoid triggering ftrace_bug
- Round stack size to multiples of 16B to avoid SPAlignmentFault
- Use callee saved register x20 to reduce the use of mov_i64
- Add missing BTI J instructions
- Trivial spelling and code style fixes
v3: https://lore.kernel.org/bpf/20220424154028.1698685-1-xukuohai@huawei.com/
- Append test results for bpf_tcp_ca, dummy_st_ops, fexit_bpf2bpf,
xdp_bpf2bpf
- Support to poke bpf progs
- Fix return value of arch_prepare_bpf_trampoline() to the total number
of bytes instead of number of instructions
- Do not check whether CONFIG_DYNAMIC_FTRACE_WITH_REGS is enabled in
arch_prepare_bpf_trampoline, since the trampoline may be hooked to a bpf
prog
- Restrict bpf_arch_text_poke() to poke bpf text only, as kernel functions
are poked by ftrace
- Rewrite trace_direct_tramp() in inline assembly in trace_selftest.c
to avoid messing entry-ftrace.S
- isolate arch_ftrace_set_direct_caller() with macro
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS to avoid compile error
when this macro is disabled
- Some trivial code sytle fixes
v2: https://lore.kernel.org/bpf/20220414162220.1985095-1-xukuohai@huawei.com/
- Add Song's ACK
- Change the multi-line comment in is_valid_bpf_tramp_flags() into net
style (patch 3)
- Fix a deadloop issue in ftrace selftest (patch 2)
- Replace pt_regs->x0 with pt_regs->orig_x0 in patch 1 commit message
- Replace "bpf trampoline" with "custom trampoline" in patch 1, as
ftrace direct call is not only used by bpf trampoline.
v1: https://lore.kernel.org/bpf/20220413054959.1053668-1-xukuohai@huawei.com/
Xu Kuohai (6):
arm64: ftrace: Add ftrace direct call support
ftrace: Fix deadloop caused by direct call in ftrace selftest
bpf: Remove is_valid_bpf_tramp_flags()
bpf, arm64: Impelment bpf_arch_text_poke() for arm64
bpf, arm64: bpf trampoline for arm64
selftests/bpf: Fix trivial typo in fentry_fexit.c
arch/arm64/Kconfig | 2 +
arch/arm64/include/asm/ftrace.h | 22 +
arch/arm64/kernel/asm-offsets.c | 1 +
arch/arm64/kernel/entry-ftrace.S | 28 +-
arch/arm64/net/bpf_jit.h | 1 +
arch/arm64/net/bpf_jit_comp.c | 523 +++++++++++++++++-
arch/x86/net/bpf_jit_comp.c | 20 -
kernel/bpf/bpf_struct_ops.c | 3 +
kernel/bpf/trampoline.c | 3 +
kernel/trace/trace_selftest.c | 2 +
.../selftests/bpf/prog_tests/fentry_fexit.c | 4 +-
11 files changed, 570 insertions(+), 39 deletions(-)
--
2.30.2
From: Vincent Cheng <vincent.cheng.xh(a)renesas.com>
This series adds adjust phase to the PTP Hardware Clock device interface.
Some PTP hardware clocks have a write phase mode that has
a built-in hardware filtering capability. The write phase mode
utilizes a phase offset control word instead of a frequency offset
control word. Add adjust phase function to take advantage of this
capability.
Changes since v1:
- As suggested by Richard Cochran:
1. ops->adjphase is new so need to check for non-null function pointer.
2. Kernel coding style uses lower_case_underscores.
3. Use existing PTP clock API for delayed worker.
Vincent Cheng (3):
ptp: Add adjphase function to support phase offset control.
ptp: Add adjust_phase to ptp_clock_caps capability.
ptp: ptp_clockmatrix: Add adjphase() to support PHC write phase mode.
drivers/ptp/ptp_chardev.c | 1 +
drivers/ptp/ptp_clock.c | 3 ++
drivers/ptp/ptp_clockmatrix.c | 92 +++++++++++++++++++++++++++++++++++
drivers/ptp/ptp_clockmatrix.h | 8 ++-
include/linux/ptp_clock_kernel.h | 6 ++-
include/uapi/linux/ptp_clock.h | 4 +-
tools/testing/selftests/ptp/testptp.c | 6 ++-
7 files changed, 114 insertions(+), 6 deletions(-)
--
2.7.4
The default file permissions on a memfd include execute bits, which
means that such a memfd can be filled with a executable and passed to
the exec() family of functions. This is undesirable on systems where all
code is verified and all filesystems are intended to be mounted noexec,
since an attacker may be able to use a memfd to load unverified code and
execute it.
Additionally, execution via memfd is a common way to avoid scrutiny for
malicious code, since it allows execution of a program without a file
ever appearing on disk. This attack vector is not totally mitigated with
this new flag, since the default memfd file permissions must remain
executable to avoid breaking existing legitimate uses, but it should be
possible to use other security mechanisms to prevent memfd_create calls
without MFD_NOEXEC on systems where it is known that executable memfds
are not necessary.
This patch series adds a new MFD_NOEXEC flag for memfd_create(), which
allows creation of non-executable memfds, and as part of the
implementation of this new flag, it also adds a new F_SEAL_EXEC seal,
which will prevent modification of any of the execute bits of a sealed
memfd.
I am not sure if this is the best way to implement the desired behavior
(for example, the F_SEAL_EXEC seal is really more of an implementation
detail and feels a bit clunky to expose), so suggestions are welcome
for alternate approaches.
Daniel Verkamp (4):
mm/memfd: add F_SEAL_EXEC
mm/memfd: add MFD_NOEXEC flag to memfd_create
selftests/memfd: add tests for F_SEAL_EXEC
selftests/memfd: add tests for MFD_NOEXEC
include/uapi/linux/fcntl.h | 1 +
include/uapi/linux/memfd.h | 1 +
mm/memfd.c | 12 ++-
mm/shmem.c | 6 ++
tools/testing/selftests/memfd/memfd_test.c | 114 +++++++++++++++++++++
5 files changed, 133 insertions(+), 1 deletion(-)
--
2.35.1.1094.g7c7d902a7c-goog
On Thu, Jun 30, 2022 at 01:16:34PM +0200, Hans Schultz wrote:
> This patch is related to the patch set
> "Add support for locked bridge ports (for 802.1X)"
> Link: https://lore.kernel.org/netdev/20220223101650.1212814-1-schultz.hans+netdev…
>
> This patch makes the locked port feature work with learning turned on,
> which is enabled with the command:
>
> bridge link set dev DEV learning on
>
> Without this patch, link local traffic (01:80:c2) like EAPOL packets will
> create a fdb entry when ingressing on a locked port with learning turned
> on, thus unintentionally opening up the port for traffic for the said MAC.
>
> Some switchcore features like Mac-Auth and refreshing of FDB entries,
> require learning enables on some switchcores, f.ex. the mv88e6xxx family.
> Other features may apply too.
>
> Since many switchcores trap or mirror various multicast packets to the
> CPU, link local traffic will unintentionally unlock the port for the
> SA mac in question unless prevented by this patch.
Why not just teach hostapd to do:
echo 1 > /sys/class/net/br0/bridge/no_linklocal_learn
?
This v2 series implements selftests targeting the feature floated by Chao
via:
https://lore.kernel.org/linux-mm/20220310140911.50924-1-chao.p.peng@linux.i…
Below changes aim to test the fd based approach for guest private memory
in context of normal (non-confidential) VMs executing on non-confidential
platforms.
priv_memfd_test.c file adds a suite of selftests to access private memory
from the guest via private/shared accesses and checking if the contents
can be leaked to/accessed by vmm via shared memory view.
Updates in V2:
1) Tests are added to exercise implicit/explicit memory conversion paths.
2) Test is added to exercise UPM feature without double memory allocation.
This series has dependency on following patches:
1) V5 series patches from Chao mentioned above.
2) https://github.com/vishals4gh/linux/commit/b9adedf777ad84af39042e9c19899600…
- Fixes host kernel crash with current implementation
3) https://github.com/vishals4gh/linux/commit/0577e351ee36d52c1f6cdcb1b8de7aa6…
- Confidential platforms along with the confidentiality aware software stack
support a notion of private/shared accesses from the confidential VMs.
Generally, a bit in the GPA conveys the shared/private-ness of the access.
Non-confidential platforms don't have a notion of private or shared accesses
from the guest VMs. To support this notion, KVM_HC_MAP_GPA_RANGE is modified
to allow marking an access from a VM within a GPA range as always shared or
private. There is an ongoing discussion about adding support for
software-only confidential VMs, which should replace this patch.
4) https://github.com/vishals4gh/linux/commit/8d46aea9a7d72e4b1b998066ce0dde08…
- Temporary placeholder to be able to test memory conversion paths
till the memory conversion exit error code is finalized.
5) https://github.com/vishals4gh/linux/commit/4c36706477c62d9416d635fa6ac4ef64…
- Fixes GFN calculation during memory conversion path.
Github link for the patches posted as part of this series:
https://github.com/vishals4gh/linux/commits/priv_memfd_selftests_rfc_v2
Austin Diviness (1):
selftests: kvm: Add hugepage support to priv_memfd_test suite.
Vishal Annapurve (7):
selftests: kvm: Fix inline assembly for hypercall
selftests: kvm: Add a basic selftest to test private memory
selftests: kvm: priv_memfd_test: Add support for memory conversion
selftests: kvm: priv_memfd_test: Add shared access test
selftests: kvm: Add implicit memory conversion tests
selftests: kvm: Add KVM_HC_MAP_GPA_RANGE hypercall test
selftests: kvm: priv_memfd: Add test avoiding double allocation
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/lib/x86_64/processor.c | 2 +-
tools/testing/selftests/kvm/priv_memfd_test.c | 1359 +++++++++++++++++
3 files changed, 1361 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/kvm/priv_memfd_test.c
--
2.36.0.512.ge40c2bad7a-goog
Add a new QEMU config for kunit_tool, x86_64-smp, which provides an
8-cpu SMP setup. No other kunit_tool configurations provide an SMP
setup, so this is the best bet for testing things like KCSAN, which
require a multicore/multi-cpu system.
The choice of 8 CPUs is pretty arbitrary: it's enough to get tests like
KCSAN to run with a nontrivial number of worker threads, while still
working relatively quickly on older machines.
Signed-off-by: David Gow <davidgow(a)google.com>
---
This is based off the discussion in:
https://groups.google.com/g/kasan-dev/c/A7XzC2pXRC8
---
tools/testing/kunit/qemu_configs/x86_64-smp.py | 13 +++++++++++++
1 file changed, 13 insertions(+)
create mode 100644 tools/testing/kunit/qemu_configs/x86_64-smp.py
diff --git a/tools/testing/kunit/qemu_configs/x86_64-smp.py b/tools/testing/kunit/qemu_configs/x86_64-smp.py
new file mode 100644
index 000000000000..a95623f5f8b7
--- /dev/null
+++ b/tools/testing/kunit/qemu_configs/x86_64-smp.py
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0
+from ..qemu_config import QemuArchParams
+
+QEMU_ARCH = QemuArchParams(linux_arch='x86_64',
+ kconfig='''
+CONFIG_SERIAL_8250=y
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_SMP=y
+ ''',
+ qemu_arch='x86_64',
+ kernel_path='arch/x86/boot/bzImage',
+ kernel_command_line='console=ttyS0',
+ extra_qemu_params=['-smp', '8'])
--
2.36.0.550.gb090851708-goog