All packets in the same flow (L3/L4 depending on multipath hash policy)
should be directed to the same target, but after [0]/[1] we see stray
packets directed towards other targets. This, for instance, causes RST
to be sent on TCP connections.
The first two patches solve the problem by ignoring route hints for
destinations that are part of multipath group, by using new SKB flags
for IPv4 and IPv6. The third patch is a selftest that tests the
scenario.
Thanks to Ido, for reviewing and suggesting a way forward in [2].
[0]: commit 02b24941619f ("ipv4: use dst hint for ipv4 list receive")
[1]: commit 197dbf24e360
("ipv6: introduce and uses route look hints for list input.")
[2]: https://lore.kernel.org/netdev/20230815201048.1796-1-sriram.yagnaraman@est.…
Sriram Yagnaraman (3):
ipv4: ignore dst hint for multipath routes
ipv6: ignore dst hint for multipath routes
selftests: forwarding: Add test for load-balancing between multiple
servers
include/linux/ipv6.h | 1 +
include/net/ip.h | 1 +
net/ipv4/ip_input.c | 3 +-
net/ipv4/route.c | 1 +
net/ipv6/ip6_input.c | 3 +-
net/ipv6/route.c | 2 +
.../testing/selftests/net/forwarding/Makefile | 1 +
.../net/forwarding/router_multipath_vip.sh | 403 ++++++++++++++++++
8 files changed, 413 insertions(+), 2 deletions(-)
create mode 100755 tools/testing/selftests/net/forwarding/router_multipath_vip.sh
--
2.34.1
On Thu, Jun 22 2023 at 14:01, Erdem Aktas wrote:
> On Mon, Jun 12, 2023 at 12:03 PM Dan Williams <dan.j.williams(a)intel.com>
> wrote:
>> Now multiple
>> confidential computing vendors trying to develop similar flows with
>> differentiated formats where that differentiation need not leak over the
>> ABI boundary.
>>
>
> <Just my personal opinion below>
> I agree with this statement in the high level but it is also somehow
> surprising for me after all the discussion happened around this topic.
> Honestly, I feel like there are multiple versions of "Intel" working in
> different directions.
>
> If we want multiple vendors trying to do the similar things behind a common
> ABI, it should start with the spec. Since this comment is coming from
> Intel, I wonder if there is any plan to combine the GHCB and GHCI
> interfaces under common ABI in the future or why it did not even happen in
> the first place.
You are conflating things here.
The GETQUOTE TDVMCALL interface is part of the Guest-Hypervisor
Communication Interface (GHCI), which is a firmware interface.
Firmware (likewise hardware) interfaces have the unfortunate property
that they are mostly cast in stone.
But that has absolutely nothing to do with the way how the kernel
implements support for them. If we'd follow your reasoning then we'd
have a gazillion of vendor specific SCSI stacks in the kernel.
> What I see is that Intel has GETQUOTE TDVMCALL interface in its spec and
> again Intel does not really want to provide support for it in linux. It
> feels really frustrating.
Intel definitely wants to provide support for this interface and this
very thread is about that support. But Intel is not in a position to
define what the kernel community has to accept or not, neither is
Google.
Sure, it would have been more efficient to come up with a better
interface earlier, but that's neither an Intel nor a TDX specific
problem.
It's just how kernel development works. Some ideas look good on first
sight, some stuff slips through and at some point the maintainers
realize that this is not the way to go and request a proper generalized
and maintainable implementation.
If you can provide compelling technical reasons why the IOCTL is the
better and more maintainable approach for the kernel, then we are all
ears and happy to debate that on the technical level.
Feel free to be frustrated, but I can assure you that the only way to
resolve this dilemma is to sit down and actually get work done in a way
which is acceptable by the kernel community at the technical level.
Everything else is frustrating for everyone involved, not only you.
Thanks,
tglx
The arm64 Guarded Control Stack (GCS) feature provides support for
hardware protected stacks of return addresses, intended to provide
hardening against return oriented programming (ROP) attacks and to make
it easier to gather call stacks for applications such as profiling.
When GCS is active a secondary stack called the Guarded Control Stack is
maintained, protected with a memory attribute which means that it can
only be written with specific GCS operations. The current GCS pointer
can not be directly written to by userspace. When a BL is executed the
value stored in LR is also pushed onto the GCS, and when a RET is
executed the top of the GCS is popped and compared to LR with a fault
being raised if the values do not match. GCS operations may only be
performed on GCS pages, a data abort is generated if they are not.
The combination of hardware enforcement and lack of extra instructions
in the function entry and exit paths should result in something which
has less overhead and is more difficult to attack than a purely software
implementation like clang's shadow stacks.
This series implements support for use of GCS by userspace, along with
support for use of GCS within KVM guests. It does not enable use of GCS
by either EL1 or EL2, this will be implemented separately. Executables
are started without GCS and must use a prctl() to enable it, it is
expected that this will be done very early in application execution by
the dynamic linker or other startup code. For dynamic linking this will
be done by checking that everything in the executable is marked as GCS
compatible.
x86 has an equivalent feature called shadow stacks, this series depends
on the x86 patches for generic memory management support for the new
guarded/shadow stack page type and shares APIs as much as possible. As
there has been extensive discussion with the wider community around the
ABI for shadow stacks I have as far as practical kept implementation
decisions close to those for x86, anticipating that review would lead to
similar conclusions in the absence of strong reasoning for divergence.
The main divergence I am concious of is that x86 allows shadow stack to
be enabled and disabled repeatedly, freeing the shadow stack for the
thread whenever disabled, while this implementation keeps the GCS
allocated after disable but refuses to reenable it. This is to avoid
races with things actively walking the GCS during a disable, we do
anticipate that some systems will wish to disable GCS at runtime but are
not aware of any demand for subsequently reenabling it.
x86 uses an arch_prctl() to manage enable and disable, since only x86
and S/390 use arch_prctl() a generic prctl() was proposed[1] as part of a
patch set for the equivalent RISC-V zisslpcfi feature which I initially
adopted fairly directly but following review feedback has been revised
quite a bit.
There is an open issue with support for CRIU, on x86 this required the
ability to set the GCS mode via ptrace. This series supports
configuring mode bits other than enable/disable via ptrace but it needs
to be confirmed if this is sufficient.
There's a few bits where I'm not convinced with where I've placed
things, in particular the GCS write operation is in the GCS header not
in uaccess.h, I wasn't sure what was clearest there and am probably too
close to the code to have a clear opinion. The reporting of GCS in
/proc/PID/smaps is also a bit awkward.
The series depends on the x86 shadow stack support:
https://lore.kernel.org/lkml/20230227222957.24501-1-rick.p.edgecombe@intel.…
I've rebased this onto v6.5-rc4 but not included it in the series in
order to avoid confusion with Rick's work and cut down the size of the
series, you can see the branch at:
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc.git arm64-gcs
[1] https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/
Pending feedback from Catalin:
- Switch copy_to_user_gcs() to be put_user_gcs().
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v5:
- Don't map any permissions for user GCSs, we always use EL0 accessors
or use a separate mapping of the page.
- Reduce the standard size of the GCS to RLIMIT_STACK/2.
- Enforce a PAGE_SIZE alignment requirement on map_shadow_stack().
- Clarifications and fixes to documentation.
- More tests.
- Link to v4: https://lore.kernel.org/r/20230807-arm64-gcs-v4-0-68cfa37f9069@kernel.org
Changes in v4:
- Implement flags for map_shadow_stack() allowing the cap and end of
stack marker to be enabled independently or not at all.
- Relax size and alignment requirements for map_shadow_stack().
- Add more blurb explaining the advantages of hardware enforcement.
- Link to v3: https://lore.kernel.org/r/20230731-arm64-gcs-v3-0-cddf9f980d98@kernel.org
Changes in v3:
- Rebase onto v6.5-rc4.
- Add a GCS barrier on context switch.
- Add a GCS stress test.
- Link to v2: https://lore.kernel.org/r/20230724-arm64-gcs-v2-0-dc2c1d44c2eb@kernel.org
Changes in v2:
- Rebase onto v6.5-rc3.
- Rework prctl() interface to allow each bit to be locked independently.
- map_shadow_stack() now places the cap token based on the size
requested by the caller not the actual space allocated.
- Mode changes other than enable via ptrace are now supported.
- Expand test coverage.
- Various smaller fixes and adjustments.
- Link to v1: https://lore.kernel.org/r/20230716-arm64-gcs-v1-0-bf567f93bba6@kernel.org
---
Mark Brown (37):
arm64/mm: Restructure arch_validate_flags() for extensibility
prctl: arch-agnostic prctl for shadow stack
arm64: Document boot requirements for Guarded Control Stacks
arm64/gcs: Document the ABI for Guarded Control Stacks
arm64/sysreg: Add new system registers for GCS
arm64/sysreg: Add definitions for architected GCS caps
arm64/gcs: Add manual encodings of GCS instructions
arm64/gcs: Provide copy_to_user_gcs()
arm64/cpufeature: Runtime detection of Guarded Control Stack (GCS)
arm64/mm: Allocate PIE slots for EL0 guarded control stack
mm: Define VM_SHADOW_STACK for arm64 when we support GCS
arm64/mm: Map pages for guarded control stack
KVM: arm64: Manage GCS registers for guests
arm64/gcs: Allow GCS usage at EL0 and EL1
arm64/idreg: Add overrride for GCS
arm64/hwcap: Add hwcap for GCS
arm64/traps: Handle GCS exceptions
arm64/mm: Handle GCS data aborts
arm64/gcs: Context switch GCS state for EL0
arm64/gcs: Allocate a new GCS for threads with GCS enabled
arm64/gcs: Implement shadow stack prctl() interface
arm64/mm: Implement map_shadow_stack()
arm64/signal: Set up and restore the GCS context for signal handlers
arm64/signal: Expose GCS state in signal frames
arm64/ptrace: Expose GCS via ptrace and core files
arm64: Add Kconfig for Guarded Control Stack (GCS)
kselftest/arm64: Verify the GCS hwcap
kselftest/arm64: Add GCS as a detected feature in the signal tests
kselftest/arm64: Add framework support for GCS to signal handling tests
kselftest/arm64: Allow signals tests to specify an expected si_code
kselftest/arm64: Always run signals tests with GCS enabled
kselftest/arm64: Add very basic GCS test program
kselftest/arm64: Add a GCS test program built with the system libc
kselftest/arm64: Add test coverage for GCS mode locking
selftests/arm64: Add GCS signal tests
kselftest/arm64: Add a GCS stress test
kselftest/arm64: Enable GCS for the FP stress tests
Documentation/admin-guide/kernel-parameters.txt | 3 +
Documentation/arch/arm64/booting.rst | 22 +
Documentation/arch/arm64/elf_hwcaps.rst | 3 +
Documentation/arch/arm64/gcs.rst | 233 +++++++
Documentation/arch/arm64/index.rst | 1 +
Documentation/filesystems/proc.rst | 2 +-
arch/arm64/Kconfig | 19 +
arch/arm64/include/asm/cpufeature.h | 6 +
arch/arm64/include/asm/el2_setup.h | 17 +
arch/arm64/include/asm/esr.h | 28 +-
arch/arm64/include/asm/exception.h | 2 +
arch/arm64/include/asm/gcs.h | 106 +++
arch/arm64/include/asm/hwcap.h | 1 +
arch/arm64/include/asm/kvm_arm.h | 4 +-
arch/arm64/include/asm/kvm_host.h | 12 +
arch/arm64/include/asm/mman.h | 20 +-
arch/arm64/include/asm/pgtable-prot.h | 14 +-
arch/arm64/include/asm/processor.h | 7 +
arch/arm64/include/asm/sysreg.h | 20 +
arch/arm64/include/asm/uaccess.h | 42 ++
arch/arm64/include/uapi/asm/hwcap.h | 1 +
arch/arm64/include/uapi/asm/ptrace.h | 8 +
arch/arm64/include/uapi/asm/sigcontext.h | 9 +
arch/arm64/kernel/cpufeature.c | 19 +
arch/arm64/kernel/cpuinfo.c | 1 +
arch/arm64/kernel/entry-common.c | 23 +
arch/arm64/kernel/idreg-override.c | 2 +
arch/arm64/kernel/process.c | 89 +++
arch/arm64/kernel/ptrace.c | 59 ++
arch/arm64/kernel/signal.c | 237 ++++++-
arch/arm64/kernel/traps.c | 11 +
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 17 +
arch/arm64/kvm/sys_regs.c | 22 +
arch/arm64/mm/Makefile | 1 +
arch/arm64/mm/fault.c | 79 ++-
arch/arm64/mm/gcs.c | 225 +++++++
arch/arm64/mm/mmap.c | 13 +-
arch/arm64/tools/cpucaps | 1 +
arch/arm64/tools/sysreg | 55 ++
fs/proc/task_mmu.c | 3 +
include/linux/mm.h | 16 +-
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 5 +-
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 22 +
kernel/sys.c | 30 +
kernel/sys_ni.c | 1 +
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/arm64/abi/hwcap.c | 19 +
tools/testing/selftests/arm64/fp/assembler.h | 15 +
tools/testing/selftests/arm64/fp/fpsimd-test.S | 2 +
tools/testing/selftests/arm64/fp/sve-test.S | 2 +
tools/testing/selftests/arm64/fp/za-test.S | 2 +
tools/testing/selftests/arm64/fp/zt-test.S | 2 +
tools/testing/selftests/arm64/gcs/.gitignore | 5 +
tools/testing/selftests/arm64/gcs/Makefile | 24 +
tools/testing/selftests/arm64/gcs/asm-offsets.h | 0
tools/testing/selftests/arm64/gcs/basic-gcs.c | 356 ++++++++++
tools/testing/selftests/arm64/gcs/gcs-locking.c | 200 ++++++
.../selftests/arm64/gcs/gcs-stress-thread.S | 311 +++++++++
tools/testing/selftests/arm64/gcs/gcs-stress.c | 532 +++++++++++++++
tools/testing/selftests/arm64/gcs/gcs-util.h | 100 +++
tools/testing/selftests/arm64/gcs/libc-gcs.c | 742 +++++++++++++++++++++
tools/testing/selftests/arm64/signal/.gitignore | 1 +
.../testing/selftests/arm64/signal/test_signals.c | 17 +-
.../testing/selftests/arm64/signal/test_signals.h | 6 +
.../selftests/arm64/signal/test_signals_utils.c | 32 +-
.../selftests/arm64/signal/test_signals_utils.h | 39 ++
.../arm64/signal/testcases/gcs_exception_fault.c | 59 ++
.../selftests/arm64/signal/testcases/gcs_frame.c | 78 +++
.../arm64/signal/testcases/gcs_write_fault.c | 67 ++
.../selftests/arm64/signal/testcases/testcases.c | 7 +
.../selftests/arm64/signal/testcases/testcases.h | 1 +
73 files changed, 4096 insertions(+), 38 deletions(-)
---
base-commit: e514f673179ed8af6c64d79f8d43e2569ad6cb9f
change-id: 20230303-arm64-gcs-e311ab0d8729
Best regards,
--
Mark Brown <broonie(a)kernel.org>
I ran all kernel selftests on some test machine, and stumbled upon
cachestat failing (among others).
These patches fix the run on older kernels and when the current
directory is on a tmpfs instance.
I dropped the first two fix patches from v1, since Shuah applied those
already. [PATCH v2 1/2] is almost the same as [PATCH 3/3] from v1, but
using the proper skip function from kselftest.h. I am not sure if Shuah
applied that already, if yes, it's not a big problem, the output is the
same.
Patch 2/2 implements the tmpfs detection that Nhat suggested the last
time (many thanks for pointing me to statfs and the magics!).
Cheers,
Andre
Andre Przywara (2):
selftests: cachestat: test for cachestat availability
selftests: cachestat: catch failing fsync test on tmpfs
.../selftests/cachestat/test_cachestat.c | 80 +++++++++++++++----
1 file changed, 65 insertions(+), 15 deletions(-)
--
2.25.1
From: Björn Töpel <bjorn(a)rivosinc.com>
When kselftest is built/installed with the 'gen_tar' target, rsync is
used for the installation step to copy files. Extra care is needed for
tests that have symlinks. Commit ae108c48b5d2 ("selftests: net: Fix
cross-tree inclusion of scripts") added '-L' (transform symlink into
referent file/dir) to rsync, to fix dangling links. However, that
broke some tests where the symlink (being a symlink) is part of the
test (e.g. exec:execveat).
Use rsync's '--copy-unsafe-links' that does right thing.
Fixes: ae108c48b5d2 ("selftests: net: Fix cross-tree inclusion of scripts")
Signed-off-by: Björn Töpel <bjorn(a)rivosinc.com>
---
tools/testing/selftests/lib.mk | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index d17854285f2b..118e0964bda9 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -106,7 +106,7 @@ endef
run_tests: all
ifdef building_out_of_srctree
@if [ "X$(TEST_PROGS)$(TEST_PROGS_EXTENDED)$(TEST_FILES)" != "X" ]; then \
- rsync -aLq $(TEST_PROGS) $(TEST_PROGS_EXTENDED) $(TEST_FILES) $(OUTPUT); \
+ rsync -aq --copy-unsafe-links $(TEST_PROGS) $(TEST_PROGS_EXTENDED) $(TEST_FILES) $(OUTPUT); \
fi
@if [ "X$(TEST_PROGS)" != "X" ]; then \
$(call RUN_TESTS, $(TEST_GEN_PROGS) $(TEST_CUSTOM_PROGS) \
@@ -120,7 +120,7 @@ endif
define INSTALL_SINGLE_RULE
$(if $(INSTALL_LIST),@mkdir -p $(INSTALL_PATH))
- $(if $(INSTALL_LIST),rsync -aL $(INSTALL_LIST) $(INSTALL_PATH)/)
+ $(if $(INSTALL_LIST),rsync -a --copy-unsafe-links $(INSTALL_LIST) $(INSTALL_PATH)/)
endef
define INSTALL_RULE
base-commit: f7757129e3dea336c407551c98f50057c22bb266
--
2.39.2
This patch chain changes the logging implementation to use string_stream
so that the log will grow dynamically.
The first 8 patches add test code for string_stream, and make some
changes to string_stream needed to be able to use it for the log.
The final patch adds a performance report of string_stream.
CHANGES SINCE V3:
Completely rewritten to use string_stream instead of implementing a
separate extending-buffer implementation for logging.
I have used the performance test from the final patch on my original
fixed-size-fragment implementation from V3 to get a comparison of the
two implementations (run on i3-8145UE CPU @ 2.20GHz):
string_stream V3 fixed-size-buffer
Time elapsed: 7748 us 3251 us
Total string length: 573890 573890
Bytes requested: 823994 728336
Actual bytes allocated: 1061440 728352
I don't think the difference is enough to be worth complicating the
string_stream implementation with my fixed-fragment implementation from
V3 of this patch chain.
Richard Fitzgerald (10):
kunit: string-stream: Improve testing of string_stream
kunit: string-stream: Don't create a fragment for empty strings
kunit: string-stream: Add cases for adding empty strings to a
string_stream
kunit: string-stream: Add option to make all lines end with newline
kunit: string-stream: Add cases for string_stream newline appending
kunit: string-stream: Pass struct kunit to string_stream_get_string()
kunit: string-stream: Decouple string_stream from kunit
kunit: string-stream: Add test for freeing resource-managed
string_stream
kunit: Use string_stream for test log
kunit: string-stream: Test performance of string_stream
include/kunit/test.h | 14 +-
lib/kunit/Makefile | 5 +-
lib/kunit/debugfs.c | 36 ++-
lib/kunit/kunit-test.c | 52 +---
lib/kunit/log-test.c | 72 ++++++
lib/kunit/string-stream-test.c | 447 +++++++++++++++++++++++++++++++--
lib/kunit/string-stream.c | 129 +++++++---
lib/kunit/string-stream.h | 22 +-
lib/kunit/test.c | 48 +---
9 files changed, 656 insertions(+), 169 deletions(-)
create mode 100644 lib/kunit/log-test.c
--
2.30.2