Introduce SW acceleration for IPIP tunnels in the netfilter flowtable
infrastructure.
---
Changes in v6:
- Rebase on top of nf-next main branch
- Link to v5: https://lore.kernel.org/r/20250721-nf-flowtable-ipip-v5-0-0865af9e58c6@kern…
Changes in v5:
- Rely on __ipv4_addr_hash() to compute the hash used as encap ID
- Remove unnecessary pskb_may_pull() in nf_flow_tuple_encap()
- Add nf_flow_ip4_ecanp_pop utility routine
- Link to v4: https://lore.kernel.org/r/20250718-nf-flowtable-ipip-v4-0-f8bb1c18b986@kern…
Changes in v4:
- Use the hash value of the saddr, daddr and protocol of outer IP header as
encapsulation id.
- Link to v3: https://lore.kernel.org/r/20250703-nf-flowtable-ipip-v3-0-880afd319b9f@kern…
Changes in v3:
- Add outer IP header sanity checks
- target nf-next tree instead of net-next
- Link to v2: https://lore.kernel.org/r/20250627-nf-flowtable-ipip-v2-0-c713003ce75b@kern…
Changes in v2:
- Introduce IPIP flowtable selftest
- Link to v1: https://lore.kernel.org/r/20250623-nf-flowtable-ipip-v1-1-2853596e3941@kern…
---
Lorenzo Bianconi (2):
net: netfilter: Add IPIP flowtable SW acceleration
selftests: netfilter: nft_flowtable.sh: Add IPIP flowtable selftest
include/linux/netdevice.h | 1 +
net/ipv4/ipip.c | 28 +++++++++++
net/netfilter/nf_flow_table_ip.c | 56 +++++++++++++++++++++-
net/netfilter/nft_flow_offload.c | 1 +
.../selftests/net/netfilter/nft_flowtable.sh | 40 ++++++++++++++++
5 files changed, 124 insertions(+), 2 deletions(-)
---
base-commit: bab3ce404553de56242d7b09ad7ea5b70441ea41
change-id: 20250623-nf-flowtable-ipip-1b3d7b08d067
Best regards,
--
Lorenzo Bianconi <lorenzo(a)kernel.org>
This series fixes issues in devlink_rate_tc_bw.py selftest that made
its checks unreliable and its documentation inconsistent with the
actual configuration.
Thanks
Carolina Jubran (3):
selftests: drv-net: Fix and clarify TC bandwidth split in
devlink_rate_tc_bw.py
selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py
selftests: drv-net: Relax total BW check in devlink_rate_tc_bw.py
.../drivers/net/hw/devlink_rate_tc_bw.py | 102 ++++++++----------
1 file changed, 44 insertions(+), 58 deletions(-)
--
2.38.1
This patchset ensures that the number of hugepages is correctly set in the
system so that the uffd-stress test does not fail due to the racy nature of
the test. Patch 1 corrects the hugepage constraint in the run_vmtests.sh
script, whereas patch 2 corrects the constraint in the test itself.
Dev Jain (2):
selftests/mm/uffd-stress: Make test operate on less hugetlb memory
selftests/mm/uffd-stress: Stricten constraint on free hugepages before
the test
tools/testing/selftests/mm/run_vmtests.sh | 2 +-
tools/testing/selftests/mm/uffd-stress.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
--
2.30.2
The arm64 Guarded Control Stack (GCS) feature provides support for
hardware protected stacks of return addresses, intended to provide
hardening against return oriented programming (ROP) attacks and to make
it easier to gather call stacks for applications such as profiling.
When GCS is active a secondary stack called the Guarded Control Stack is
maintained, protected with a memory attribute which means that it can
only be written with specific GCS operations. The current GCS pointer
can not be directly written to by userspace. When a BL is executed the
value stored in LR is also pushed onto the GCS, and when a RET is
executed the top of the GCS is popped and compared to LR with a fault
being raised if the values do not match. GCS operations may only be
performed on GCS pages, a data abort is generated if they are not.
The combination of hardware enforcement and lack of extra instructions
in the function entry and exit paths should result in something which
has less overhead and is more difficult to attack than a purely software
implementation like clang's shadow stacks.
This series implements support for managing GCS for KVM guests, it also
includes a fix for S1PIE which has also been sent separately as this
feature is a dependency for GCS. It is based on:
https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/gcs
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v15:
- Rebase onto v6.17-rc1.
- Link to v14: https://lore.kernel.org/r/20241005-arm64-gcs-v14-0-59060cd6092b@kernel.org
Changes in v14:
- Rebase onto arm64/for-next/gcs which includes all the non-KVM support.
- Manage the fine grained traps for GCS instructions.
- Manage PSTATE.EXLOCK when delivering exceptions to KVM guests.
- Link to v13: https://lore.kernel.org/r/20241001-arm64-gcs-v13-0-222b78d87eee@kernel.org
Changes in v13:
- Rebase onto v6.12-rc1.
- Allocate VM_HIGH_ARCH_6 since protection keys used all the existing
bits.
- Implement mm_release() and free transparently allocated GCSs there.
- Use bit 32 of AT_HWCAP for GCS due to AT_HWCAP2 being filled.
- Since we now only set GCSCRE0_EL1 on change ensure that it is
initialised with GCSPR_EL0 accessible to EL0.
- Fix OOM handling on thread copy.
- Link to v12: https://lore.kernel.org/r/20240829-arm64-gcs-v12-0-42fec947436a@kernel.org
Changes in v12:
- Clarify and simplify the signal handling code so we work with the
register state.
- When checking for write aborts to shadow stack pages ensure the fault
is a data abort.
- Depend on !UPROBES.
- Comment cleanups.
- Link to v11: https://lore.kernel.org/r/20240822-arm64-gcs-v11-0-41b81947ecb5@kernel.org
Changes in v11:
- Remove the dependency on the addition of clone3() support for shadow
stacks, rebasing onto v6.11-rc3.
- Make ID_AA64PFR1_EL1.GCS writeable in KVM.
- Hide GCS registers when GCS is not enabled for KVM guests.
- Require HCRX_EL2.GCSEn if booting at EL1.
- Require that GCSCR_EL1 and GCSCRE0_EL1 be initialised regardless of
if we boot at EL2 or EL1.
- Remove some stray use of bit 63 in signal cap tokens.
- Warn if we see a GCS with VM_SHARED.
- Remove rdundant check for VM_WRITE in fault handling.
- Cleanups and clarifications in the ABI document.
- Clean up and improve documentation of some sync placement.
- Only set the EL0 GCS mode if it's actually changed.
- Various minor fixes and tweaks.
- Link to v10: https://lore.kernel.org/r/20240801-arm64-gcs-v10-0-699e2bd2190b@kernel.org
Changes in v10:
- Fix issues with THP.
- Tighten up requirements for initialising GCSCR*.
- Only generate GCS signal frames for threads using GCS.
- Only context switch EL1 GCS registers if S1PIE is enabled.
- Move context switch of GCSCRE0_EL1 to EL0 context switch.
- Make GCS registers unconditionally visible to userspace.
- Use FHU infrastructure.
- Don't change writability of ID_AA64PFR1_EL1 for KVM.
- Remove unused arguments from alloc_gcs().
- Typo fixes.
- Link to v9: https://lore.kernel.org/r/20240625-arm64-gcs-v9-0-0f634469b8f0@kernel.org
Changes in v9:
- Rebase onto v6.10-rc3.
- Restructure and clarify memory management fault handling.
- Fix up basic-gcs for the latest clone3() changes.
- Convert to newly merged KVM ID register based feature configuration.
- Fixes for NV traps.
- Link to v8: https://lore.kernel.org/r/20240203-arm64-gcs-v8-0-c9fec77673ef@kernel.org
Changes in v8:
- Invalidate signal cap token on stack when consuming.
- Typo and other trivial fixes.
- Don't try to use process_vm_write() on GCS, it intentionally does not
work.
- Fix leak of thread GCSs.
- Rebase onto latest clone3() series.
- Link to v7: https://lore.kernel.org/r/20231122-arm64-gcs-v7-0-201c483bd775@kernel.org
Changes in v7:
- Rebase onto v6.7-rc2 via the clone3() patch series.
- Change the token used to cap the stack during signal handling to be
compatible with GCSPOPM.
- Fix flags for new page types.
- Fold in support for clone3().
- Replace copy_to_user_gcs() with put_user_gcs().
- Link to v6: https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org
Changes in v6:
- Rebase onto v6.6-rc3.
- Add some more gcsb_dsync() barriers following spec clarifications.
- Due to ongoing discussion around clone()/clone3() I've not updated
anything there, the behaviour is the same as on previous versions.
- Link to v5: https://lore.kernel.org/r/20230822-arm64-gcs-v5-0-9ef181dd6324@kernel.org
Changes in v5:
- Don't map any permissions for user GCSs, we always use EL0 accessors
or use a separate mapping of the page.
- Reduce the standard size of the GCS to RLIMIT_STACK/2.
- Enforce a PAGE_SIZE alignment requirement on map_shadow_stack().
- Clarifications and fixes to documentation.
- More tests.
- Link to v4: https://lore.kernel.org/r/20230807-arm64-gcs-v4-0-68cfa37f9069@kernel.org
Changes in v4:
- Implement flags for map_shadow_stack() allowing the cap and end of
stack marker to be enabled independently or not at all.
- Relax size and alignment requirements for map_shadow_stack().
- Add more blurb explaining the advantages of hardware enforcement.
- Link to v3: https://lore.kernel.org/r/20230731-arm64-gcs-v3-0-cddf9f980d98@kernel.org
Changes in v3:
- Rebase onto v6.5-rc4.
- Add a GCS barrier on context switch.
- Add a GCS stress test.
- Link to v2: https://lore.kernel.org/r/20230724-arm64-gcs-v2-0-dc2c1d44c2eb@kernel.org
Changes in v2:
- Rebase onto v6.5-rc3.
- Rework prctl() interface to allow each bit to be locked independently.
- map_shadow_stack() now places the cap token based on the size
requested by the caller not the actual space allocated.
- Mode changes other than enable via ptrace are now supported.
- Expand test coverage.
- Various smaller fixes and adjustments.
- Link to v1: https://lore.kernel.org/r/20230716-arm64-gcs-v1-0-bf567f93bba6@kernel.org
---
Mark Brown (6):
arm64/gcs: Ensure FGTs for EL1 GCS instructions are disabled
KVM: arm64: Manage GCS access and registers for guests
KVM: arm64: Forward GCS exceptions to nested guests
KVM: arm64: Set PSTATE.EXLOCK when entering an exception
KVM: arm64: Allow GCS to be enabled for guests
KVM: selftests: arm64: Add GCS registers to get-reg-list
arch/arm64/include/asm/el2_setup.h | 4 +++
arch/arm64/include/asm/kvm_emulate.h | 3 ++
arch/arm64/include/asm/kvm_host.h | 14 +++++++++
arch/arm64/include/asm/vncr_mapping.h | 2 ++
arch/arm64/include/uapi/asm/ptrace.h | 1 +
arch/arm64/kvm/handle_exit.c | 14 +++++++--
arch/arm64/kvm/hyp/exception.c | 37 ++++++++++++++++++++++++
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 31 ++++++++++++++++++++
arch/arm64/kvm/hyp/vhe/sysreg-sr.c | 10 +++++++
arch/arm64/kvm/sys_regs.c | 32 ++++++++++++++++++--
tools/testing/selftests/kvm/arm64/get-reg-list.c | 12 ++++++++
11 files changed, 155 insertions(+), 5 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20230303-arm64-gcs-e311ab0d8729
Best regards,
--
Mark Brown <broonie(a)kernel.org>
This is based on mm-unstable and was cross-compiled heavily.
I should probably have already dropped the RFC label but I want to hear
first if I ignored some corner case (SG entries?) and I need to do
at least a bit more testing.
I will only CC non-MM folks on the cover letter and the respective patch
to not flood too many inboxes (the lists receive all patches).
---
As discussed recently with Linus, nth_page() is just nasty and we would
like to remove it.
To recap, the reason we currently need nth_page() within a folio is because
on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the
memmap is allocated per memory section.
While buddy allocations cannot cross memory section boundaries, hugetlb
and dax folios can.
So crossing a memory section means that "page++" could do the wrong thing.
Instead, nth_page() on these problematic configs always goes from
page->pfn, to the go from (++pfn)->page, which is rather nasty.
Likely, many people have no idea when nth_page() is required and when
it might be dropped.
We refer to such problematic PFN ranges and "non-contiguous pages".
If we only deal with "contiguous pages", there is not need for nth_page().
Besides that "obvious" folio case, we might end up using nth_page()
within CMA allocations (again, could span memory sections), and in
one corner case (kfence) when processing memblock allocations (again,
could span memory sections).
So let's handle all that, add sanity checks, and remove nth_page().
Patch #1 -> #5 : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups
Patch #6 -> #12 : disallow folios to have non-contiguous pages
Patch #13 -> #20 : remove nth_page() usage within folios
Patch #21 : disallow CMA allocations of non-contiguous pages
Patch #22 -> #31 : sanity+check + remove nth_page() usage within SG entry
Patch #32 : sanity-check + remove nth_page() usage in
unpin_user_page_range_dirty_lock()
Patch #33 : remove nth_page() in kfence
Patch #34 : adjust stale comment regarding nth_page
Patch #35 : mm: remove nth_page()
A lot of this is inspired from the discussion at [1] between Linus, Jason
and me, so cudos to them.
[1] https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-G…
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Marek Szyprowski <m.szyprowski(a)samsung.com>
Cc: Robin Murphy <robin.murphy(a)arm.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Brendan Jackman <jackmanb(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: Dennis Zhou <dennis(a)kernel.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Christoph Lameter <cl(a)gentwo.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: x86(a)kernel.org
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: linux-mips(a)vger.kernel.org
Cc: linux-s390(a)vger.kernel.org
Cc: linux-crypto(a)vger.kernel.org
Cc: linux-ide(a)vger.kernel.org
Cc: intel-gfx(a)lists.freedesktop.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linux-mmc(a)vger.kernel.org
Cc: linux-arm-kernel(a)axis.com
Cc: linux-scsi(a)vger.kernel.org
Cc: kvm(a)vger.kernel.org
Cc: virtualization(a)lists.linux.dev
Cc: linux-mm(a)kvack.org
Cc: io-uring(a)vger.kernel.org
Cc: iommu(a)lists.linux.dev
Cc: kasan-dev(a)googlegroups.com
Cc: wireguard(a)lists.zx2c4.com
Cc: netdev(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-riscv(a)lists.infradead.org
David Hildenbrand (35):
mm: stop making SPARSEMEM_VMEMMAP user-selectable
arm64: Kconfig: drop superfluous "select SPARSEMEM_VMEMMAP"
s390/Kconfig: drop superfluous "select SPARSEMEM_VMEMMAP"
x86/Kconfig: drop superfluous "select SPARSEMEM_VMEMMAP"
wireguard: selftests: remove CONFIG_SPARSEMEM_VMEMMAP=y from qemu
kernel config
mm/page_alloc: reject unreasonable folio/compound page sizes in
alloc_contig_range_noprof()
mm/memremap: reject unreasonable folio/compound page sizes in
memremap_pages()
mm/hugetlb: check for unreasonable folio sizes when registering hstate
mm/mm_init: make memmap_init_compound() look more like
prep_compound_page()
mm/hugetlb: cleanup hugetlb_folio_init_tail_vmemmap()
mm: sanity-check maximum folio size in folio_set_order()
mm: limit folio/compound page sizes in problematic kernel configs
mm: simplify folio_page() and folio_page_idx()
mm/mm/percpu-km: drop nth_page() usage within single allocation
fs: hugetlbfs: remove nth_page() usage within folio in
adjust_range_hwpoison()
mm/pagewalk: drop nth_page() usage within folio in folio_walk_start()
mm/gup: drop nth_page() usage within folio when recording subpages
io_uring/zcrx: remove "struct io_copy_cache" and one nth_page() usage
io_uring/zcrx: remove nth_page() usage within folio
mips: mm: convert __flush_dcache_pages() to
__flush_dcache_folio_pages()
mm/cma: refuse handing out non-contiguous page ranges
dma-remap: drop nth_page() in dma_common_contiguous_remap()
scatterlist: disallow non-contigous page ranges in a single SG entry
ata: libata-eh: drop nth_page() usage within SG entry
drm/i915/gem: drop nth_page() usage within SG entry
mspro_block: drop nth_page() usage within SG entry
memstick: drop nth_page() usage within SG entry
mmc: drop nth_page() usage within SG entry
scsi: core: drop nth_page() usage within SG entry
vfio/pci: drop nth_page() usage within SG entry
crypto: remove nth_page() usage within SG entry
mm/gup: drop nth_page() usage in unpin_user_page_range_dirty_lock()
kfence: drop nth_page() usage
block: update comment of "struct bio_vec" regarding nth_page()
mm: remove nth_page()
arch/arm64/Kconfig | 1 -
arch/mips/include/asm/cacheflush.h | 11 +++--
arch/mips/mm/cache.c | 8 ++--
arch/s390/Kconfig | 1 -
arch/x86/Kconfig | 1 -
crypto/ahash.c | 4 +-
crypto/scompress.c | 8 ++--
drivers/ata/libata-sff.c | 6 +--
drivers/gpu/drm/i915/gem/i915_gem_pages.c | 2 +-
drivers/memstick/core/mspro_block.c | 3 +-
drivers/memstick/host/jmb38x_ms.c | 3 +-
drivers/memstick/host/tifm_ms.c | 3 +-
drivers/mmc/host/tifm_sd.c | 4 +-
drivers/mmc/host/usdhi6rol0.c | 4 +-
drivers/scsi/scsi_lib.c | 3 +-
drivers/scsi/sg.c | 3 +-
drivers/vfio/pci/pds/lm.c | 3 +-
drivers/vfio/pci/virtio/migrate.c | 3 +-
fs/hugetlbfs/inode.c | 25 ++++------
include/crypto/scatterwalk.h | 4 +-
include/linux/bvec.h | 7 +--
include/linux/mm.h | 48 +++++++++++++++----
include/linux/page-flags.h | 5 +-
include/linux/scatterlist.h | 4 +-
io_uring/zcrx.c | 34 ++++---------
kernel/dma/remap.c | 2 +-
mm/Kconfig | 3 +-
mm/cma.c | 36 +++++++++-----
mm/gup.c | 13 +++--
mm/hugetlb.c | 23 ++++-----
mm/internal.h | 1 +
mm/kfence/core.c | 17 ++++---
mm/memremap.c | 3 ++
mm/mm_init.c | 13 ++---
mm/page_alloc.c | 5 +-
mm/pagewalk.c | 2 +-
mm/percpu-km.c | 2 +-
mm/util.c | 33 +++++++++++++
tools/testing/scatterlist/linux/mm.h | 1 -
.../selftests/wireguard/qemu/kernel.config | 1 -
40 files changed, 203 insertions(+), 150 deletions(-)
base-commit: c0e3b3f33ba7b767368de4afabaf7c1ddfdc3872
--
2.50.1
From: Vivek Yadav <vivekyadav1207731111(a)gmail.com>
Hi all,
This small series makes cosmetic style cleanups in the arm64 kselftests
to improve readability and suppress checkpatch warnings. These changes
are purely cosmetic and do not affect functionality.
Changes in this series:
* Suppress unnecessary checkpatch warning in a comment
* Add parentheses around sizeof for clarity
* Remove redundant blank line
---
Vivek Yadav (3):
kselftest/arm64: Remove extra blank line
kselftest/arm64: Supress warning and improve readability
kselftest/arm64: Add parentheses around sizeof for clarity
tools/testing/selftests/arm64/abi/hwcap.c | 1 -
tools/testing/selftests/arm64/bti/assembler.h | 1 -
tools/testing/selftests/arm64/fp/fp-ptrace.c | 1 -
tools/testing/selftests/arm64/fp/fp-stress.c | 4 ++--
tools/testing/selftests/arm64/fp/sve-ptrace.c | 2 +-
tools/testing/selftests/arm64/fp/vec-syscfg.c | 1 -
tools/testing/selftests/arm64/fp/zt-ptrace.c | 1 -
tools/testing/selftests/arm64/gcs/gcs-locking.c | 1 -
8 files changed, 3 insertions(+), 9 deletions(-)
--
2.25.1