This series enables support for the data processing extensions in the
newly released 2023 architecture, this is mainly support for 8 bit
floating point formats. Most of the extensions only introduce new
instructions and therefore only require hwcaps but there is a new EL0
visible control register FPMR used to control the 8 bit floating point
formats, we need to manage traps for this and context switch it.
Due to uncertainty with the plan for parsing ID registers to identify
which features to expose to the guest the KVM support is placed at the
end of the series, it will need to be revised once that issue is
resolved. The sharing of floating point save code between the host and
guest kernels slightly complicates the introduction of KVM support, we
first introduce host support with some placeholders for KVM then replace
those with the actual KVM support.
I've not added test coverage for ptrace, I've got a test program which
exercises all the FP ptrace interfaces and their interactions together,
my plan is to cover it there rather than add another tiny test program
that duplicates the boilerplace for tracing a target and doesn't
actually run the traced program.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v4:
- Rebase onto v6.8-rc1.
- Move KVM support to the end of the series.
- Link to v3: https://lore.kernel.org/r/20231205-arm64-2023-dpisa-v3-0-dbcbcd867a7f@kerne…
Changes in v3:
- Rebase onto v6.7-rc3.
- Hook up traps for FPMR in emulate-nested.c.
- Link to v2: https://lore.kernel.org/r/20231114-arm64-2023-dpisa-v2-0-47251894f6a8@kerne…
Changes in v2:
- Rebase onto v6.7-rc1.
- Link to v1: https://lore.kernel.org/r/20231026-arm64-2023-dpisa-v1-0-8470dd989bb2@kerne…
---
Mark Brown (14):
arm64/cpufeature: Hook new identification registers up to cpufeature
arm64/fpsimd: Enable host kernel access to FPMR
arm64/fpsimd: Support FEAT_FPMR
arm64/signal: Add FPMR signal handling
arm64/ptrace: Expose FPMR via ptrace
arm64/hwcap: Define hwcaps for 2023 DPISA features
kselftest/arm64: Handle FPMR context in generic signal frame parser
kselftest/arm64: Add basic FPMR test
kselftest/arm64: Add 2023 DPISA hwcap test coverage
KVM: arm64: Share all userspace hardened thread data with the hypervisor
KVM: arm64: Add newly allocated ID registers to register descriptions
KVM: arm64: Support FEAT_FPMR for guests
KVM: arm64: selftests: Document feature registers added in 2023 extensions
KVM: arm64: selftests: Teach get-reg-list about FPMR
Documentation/arch/arm64/elf_hwcaps.rst | 49 +++++
arch/arm64/include/asm/cpu.h | 3 +
arch/arm64/include/asm/cpufeature.h | 5 +
arch/arm64/include/asm/fpsimd.h | 2 +
arch/arm64/include/asm/hwcap.h | 15 ++
arch/arm64/include/asm/kvm_arm.h | 4 +-
arch/arm64/include/asm/kvm_host.h | 5 +-
arch/arm64/include/asm/processor.h | 6 +-
arch/arm64/include/uapi/asm/hwcap.h | 15 ++
arch/arm64/include/uapi/asm/sigcontext.h | 8 +
arch/arm64/kernel/cpufeature.c | 72 +++++++
arch/arm64/kernel/cpuinfo.c | 18 ++
arch/arm64/kernel/fpsimd.c | 13 ++
arch/arm64/kernel/ptrace.c | 42 ++++
arch/arm64/kernel/signal.c | 59 ++++++
arch/arm64/kvm/emulate-nested.c | 8 +
arch/arm64/kvm/fpsimd.c | 14 +-
arch/arm64/kvm/hyp/include/hyp/switch.h | 9 +-
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 4 +-
arch/arm64/kvm/sys_regs.c | 17 +-
arch/arm64/tools/cpucaps | 1 +
include/uapi/linux/elf.h | 1 +
tools/testing/selftests/arm64/abi/hwcap.c | 217 +++++++++++++++++++++
tools/testing/selftests/arm64/signal/.gitignore | 1 +
.../arm64/signal/testcases/fpmr_siginfo.c | 82 ++++++++
.../selftests/arm64/signal/testcases/testcases.c | 8 +
.../selftests/arm64/signal/testcases/testcases.h | 1 +
tools/testing/selftests/kvm/aarch64/get-reg-list.c | 11 +-
28 files changed, 670 insertions(+), 20 deletions(-)
---
base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
change-id: 20231003-arm64-2023-dpisa-2f3d25746474
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Major changes in v1:
--------------
1. Implemented MVP queue API ndos to remove the userspace-visible
driver reset.
2. Fixed issues in the napi_pp_put_page() devmem frag unref path.
3. Removed RFC tag.
Many smaller addressed comments across all the patches (patches have
individual change log).
Full tree including the rest of the GVE driver changes:
https://github.com/mina/linux/commits/tcpdevmem-v1
Cc: Yunsheng Lin <linyunsheng(a)huawei.com>
Cc: Shailend Chand <shailend(a)google.com>
Cc: Harshitha Ramamurthy <hramamurthy(a)google.com>
Changes in RFC v3:
------------------
1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the
series reviewable and mergable.
2. Implemented multi-rx-queue binding which was a todo in v2.
3. Fix to cmsg handling.
The sticking point in RFC v2[2] was the device reset required to refill
the device rx-queues after the dmabuf bind/unbind. The solution
suggested as I understand is a subset of the per-queue management ops
Jakub suggested or similar:
https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/
This is not addressed in this revision, because:
1. This point was discussed at netconf & netdev and there is openness to
using the current approach of requiring a device reset.
2. Implementing individual queue resetting seems to be difficult for my
test bed with GVE. My prototype to test this ran into issues with the
rx-queues not coming back up properly if reset individually. At the
moment I'm unsure if it's a mistake in the POC or a genuine issue in
the virtualization stack behind GVE, which currently doesn't test
individual rx-queue restart.
3. Our usecases are not bothered by requiring a device reset to refill
the buffer queues, and we'd like to support NICs that run into this
limitation with resetting individual queues.
My thought is that drivers that have trouble with per-queue configs can
use the support in this series, while drivers that support new netdev
ops to reset individual queues can automatically reset the queue as
part of the dma-buf bind/unbind.
The same approach with device resets is presented again for consideration
with other sticking points addressed.
This proposal includes the rx devmem path only proposed for merge. For a
snapshot of my entire tree which includes the GVE POC page pool support &
device memory support:
https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3
[1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.…
[2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4…
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Jeroen de Borst <jeroendb(a)google.com>
Cc: Praveen Kaligineedi <pkaligineedi(a)google.com>
Changes in RFC v2:
------------------
The sticking point in RFC v1[1] was the dma-buf pages approach we used to
deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
that attempts to resolve this by implementing scatterlist support in the
networking stack, such that we can import the dma-buf scatterlist
directly. This is the approach proposed at a high level here[2].
Detailed changes:
1. Replaced dma-buf pages approach with importing scatterlist into the
page pool.
2. Replace the dma-buf pages centric API with a netlink API.
3. Removed the TX path implementation - there is no issue with
implementing the TX path with scatterlist approach, but leaving
out the TX path makes it easier to review.
4. Functionality is tested with this proposal, but I have not conducted
perf testing yet. I'm not sure there are regressions, but I removed
perf claims from the cover letter until they can be re-confirmed.
5. Added Signed-off-by: contributors to the implementation.
6. Fixed some bugs with the RX path since RFC v1.
Any feedback welcome, but specifically the biggest pending questions
needing feedback IMO are:
1. Feedback on the scatterlist-based approach in general.
2. Netlink API (Patch 1 & 2).
3. Approach to handle all the drivers that expect to receive pages from
the page pool (Patch 6).
[1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c…
[2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX…
----------------------
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large amount of data transfers have device memory as the source and/or
destination. Accelerators drastically increased the volume of such transfers.
Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time, improving data transfer throughput & efficiency can help
improving GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs, much of this data does not require host processing.
Today, the majority of the Device-to-Device data transfers the network are
implemented as the following low level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strains on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through the
root complex.
* Patch overview:
** Part 1: netlink API
Gives user ability to bind dma-buf to an RX queue.
** Part 2: scatterlist support
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, networking stack (skbs, drivers, and
page pool) operate on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.
Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.
** part 3: page pool support
We piggy back on page pool memory providers proposal:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory
TCP changes from the driver.
** part 4: support for unreadable skb frags
Page pool iovs are not accessible by the host; we implement changes
throughput the networking stack to correctly handle skbs with unreadable
frags.
** Part 5: recvmsg() APIs
We define user APIs for the user to send and receive device memory.
Not included with this RFC is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This RFC is built on top of net-next with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP require the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using device memory
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
* Testing:
The series includes a udmabuf kselftest that show a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
** Test Setup
Kernel: net-next with this RFC and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Jakub Kicinski (2):
net: page_pool: factor out releasing DMA from releasing the page
net: page_pool: create hooks for custom page providers
Mina Almasry (14):
queue_api: define queue api
gve: implement queue api
net: netdev netlink api to bind dma-buf to a net device
netdev: support binding dma-buf to netdevice
netdev: netdevice devmem allocator
memory-provider: dmabuf devmem memory provider
page_pool: device memory support
page_pool: don't release iov on elevanted refcount
net: support non paged skb frags
net: add support for skbs with unreadable frags
tcp: RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags
net: add devmem TCP documentation
selftests: add ncdevmem, netcat for devmem TCP
Documentation/netlink/specs/netdev.yaml | 52 ++
Documentation/networking/devmem.rst | 270 ++++++++++
drivers/net/ethernet/google/gve/gve_adminq.c | 6 +-
drivers/net/ethernet/google/gve/gve_adminq.h | 3 +
drivers/net/ethernet/google/gve/gve_dqo.h | 2 +
drivers/net/ethernet/google/gve/gve_main.c | 286 +++++++++++
drivers/net/ethernet/google/gve/gve_rx_dqo.c | 5 +-
include/linux/netdevice.h | 24 +
include/linux/skbuff.h | 56 ++-
include/linux/socket.h | 1 +
include/net/devmem.h | 109 +++++
include/net/netdev_rx_queue.h | 1 +
include/net/page_pool/helpers.h | 162 +++++-
include/net/page_pool/types.h | 48 ++
include/net/sock.h | 2 +
include/net/tcp.h | 5 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/netdev.h | 19 +
include/uapi/linux/uio.h | 14 +
net/core/datagram.c | 6 +
net/core/dev.c | 314 +++++++++++-
net/core/gro.c | 7 +-
net/core/netdev-genl-gen.c | 19 +
net/core/netdev-genl-gen.h | 2 +
net/core/netdev-genl.c | 124 +++++
net/core/page_pool.c | 239 +++++++--
net/core/skbuff.c | 108 +++-
net/core/sock.c | 38 ++
net/ipv4/tcp.c | 196 +++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 8 +
net/ipv4/tcp_output.c | 5 +-
net/packet/af_packet.c | 4 +-
tools/include/uapi/linux/netdev.h | 19 +
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 5 +
tools/testing/selftests/net/ncdevmem.c | 489 +++++++++++++++++++
37 files changed, 2585 insertions(+), 83 deletions(-)
create mode 100644 Documentation/networking/devmem.rst
create mode 100644 include/net/devmem.h
create mode 100644 tools/testing/selftests/net/ncdevmem.c
--
2.43.0.472.g3155946c3a-goog
Previous patch series[1] changes a mmap behavior that treats the hint
address as the upper bound of the mmap address range. The motivation of the
previous patch series is that some user space software may assume 48-bit
address space and use higher bits to encode some information, which may
collide with large virtual address space mmap may return. However, to make
sv48 by default, we don't need to change the meaning of the hint address on
mmap as the upper bound of the mmap address range, especially when this
behavior only shows up on the RISC-V. This behavior also breaks some user
space software which assumes mmap should try to create mapping on the hint
address if possible. As the mmap manpage said:
> If addr is not NULL, then the kernel takes it as a hint about where to
> place the mapping; on Linux, the kernel will pick a nearby page boundary
> (but always above or equal to the value specified by
> /proc/sys/vm/mmap_min_addr) and attempt to create the mapping there.
Unfortunately, what mmap said is not true on RISC-V since kernel v6.6.
Other ISAs with larger than 48-bit virtual address space like x86, arm64,
and powerpc do not have this special mmap behavior on hint address. They
all just make 48-bit / 47-bit virtual address space by default, and if a
user space software wants to large virtual address space, it only need to
specify a hint address larger than 48-bit / 47-bit.
Thus, this patch series keeps the change of mmap to use sv48 by default but
does not treat the hint address as the upper bound of the mmap address
range. After this patch, the behavior of mmap will align with existing
behavior on other ISAs with larger than 48-bit virtual address space like
x86, arm64, and powerpc. The user space software will no longer need to
rewrite their code to fit with this special mmap behavior only on RISC-V.
My concern is that the change of mmap behavior on the hint address is
already in the upstream kernel since v6.6, and it might be hard to revert
it although it already brings some regression on some user space software.
And it will be harder than adding it since v6.6 because mmap not creating
mapping on the hint address is very common, especially when running on a
machine without sv57 / sv48. However, if some user space software already
adopted this special mmap behavior on RISC-V, we should not return a mmap
address larger than the hint if the address is larger than BIT(38). My
opinion is that revert this change on the next kernel release might be a
good choice as only a few of hardware support sv57 / sv48 now, these
changes will have no impact on sv39 systems.
Moreover, previous patch series said it make sv48 by default, which is
in the cover letter, kernel documentation and MMAP_VA_BITS defination.
However, the code on arch_get_mmap_end and arch_get_mmap_base marco still
use sv39 by default, which makes me confused, and I still use sv48 by
default in this patch series including arch_get_mmap_end and
arch_get_mmap_base.
Changes in v2:
- correct arch_get_mmap_end and arch_get_mmap_base
- Add description in documentation about mmap behavior on kernel v6.6-6.7.
- Improve commit message and cover letter
- Rebase to newest riscv/for-next branch
- Link to v1: https://lore.kernel.org/linux-riscv/tencent_F3B3B5AB1C9D704763CA423E1A41F8B…
[1]. https://lore.kernel.org/linux-riscv/20230809232218.849726-1-charlie@rivosin…
Yangyu Chen (3):
RISC-V: mm: do not treat hint addr on mmap as the upper bound to
search
RISC-V: mm: only test mmap without hint
Documentation: riscv: correct sv57 kernel behavior
Documentation/arch/riscv/vm-layout.rst | 54 ++++++++++++-------
arch/riscv/include/asm/processor.h | 38 +++----------
.../selftests/riscv/mm/mmap_bottomup.c | 12 -----
.../testing/selftests/riscv/mm/mmap_default.c | 12 -----
tools/testing/selftests/riscv/mm/mmap_test.h | 30 -----------
5 files changed, 41 insertions(+), 105 deletions(-)
--
2.43.0
The RISC-V arch_timer selftests is used to validate Sstc timer
functionality in a guest, which sets up periodic timer interrupts
and check the basic interrupt status upon its receipt.
This KVM selftests was ported from aarch64 arch_timer and tested
with Linux v6.7-rc8 on a Qemu riscv64 virt machine.
---
Changed since v4:
* Rebased to Linux 6.7-rc8
* Added new patch(2/12) to clean up the data type in struct test_args
* Re-ordered patch(11/11) in v4 to patch(3/12)
* Changed the timer_err_margin_us type from int to uint32_t
Haibo Xu (11):
KVM: arm64: selftests: Data type cleanup for arch_timer test
KVM: arm64: selftests: Enable tuning of error margin in arch_timer
test
KVM: arm64: selftests: Split arch_timer test code
KVM: selftests: Add CONFIG_64BIT definition for the build
tools: riscv: Add header file csr.h
tools: riscv: Add header file vdso/processor.h
KVM: riscv: selftests: Switch to use macro from csr.h
KVM: riscv: selftests: Add exception handling support
KVM: riscv: selftests: Add guest helper to get vcpu id
KVM: riscv: selftests: Change vcpu_has_ext to a common function
KVM: riscv: selftests: Add sstc timer test
Paolo Bonzini (1):
selftests/kvm: Fix issues with $(SPLIT_TESTS)
tools/arch/riscv/include/asm/csr.h | 541 ++++++++++++++++++
tools/arch/riscv/include/asm/vdso/processor.h | 32 ++
tools/testing/selftests/kvm/Makefile | 27 +-
.../selftests/kvm/aarch64/arch_timer.c | 295 +---------
tools/testing/selftests/kvm/arch_timer.c | 259 +++++++++
.../selftests/kvm/include/aarch64/processor.h | 4 -
.../selftests/kvm/include/kvm_util_base.h | 9 +
.../selftests/kvm/include/riscv/arch_timer.h | 71 +++
.../selftests/kvm/include/riscv/processor.h | 65 ++-
.../testing/selftests/kvm/include/test_util.h | 2 +
.../selftests/kvm/include/timer_test.h | 45 ++
.../selftests/kvm/lib/riscv/handlers.S | 101 ++++
.../selftests/kvm/lib/riscv/processor.c | 87 +++
.../testing/selftests/kvm/riscv/arch_timer.c | 111 ++++
.../selftests/kvm/riscv/get-reg-list.c | 11 +-
15 files changed, 1353 insertions(+), 307 deletions(-)
create mode 100644 tools/arch/riscv/include/asm/csr.h
create mode 100644 tools/arch/riscv/include/asm/vdso/processor.h
create mode 100644 tools/testing/selftests/kvm/arch_timer.c
create mode 100644 tools/testing/selftests/kvm/include/riscv/arch_timer.h
create mode 100644 tools/testing/selftests/kvm/include/timer_test.h
create mode 100644 tools/testing/selftests/kvm/lib/riscv/handlers.S
create mode 100644 tools/testing/selftests/kvm/riscv/arch_timer.c
--
2.34.1
From: Deepak Gupta <debug(a)rivosinc.com>
It's been almost an year since I posted my last patch series [1] to
enable CPU assisted control-flow integrity for usermode on riscv. A lot
has changed since then and so has the patches. It's been a while and since
this is a reboot of series, starting with RFC and v1.
Securing control-flow integrity for usermode requires following
- Securing forward control flow : All callsites must reach
reach a target that they actually intend to reach.
- Securing backward control flow : All function returns must
return to location where they were called from.
This patch series use riscv cpu extension `zicfilp` [2] to secure forward
control flow and `zicfiss` [2] to secure backward control flow. `zicfilp`
enforces that all indirect calls or jmps must land on a landing pad instr
and label embedded in landing pad instr must match a value programmed in
`x7` register (at callsite via compiler). `zicfiss` introduces shadow stack
which can only be writeable via shadow stack instructions (sspush and
ssamoswap) and thus can't be tampered with via inadvertent stores. More
details about extension can be read from [2] and there are details in
documentation as well (in this patch series).
Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow
integrity for user mode programs can be compiled in the kernel.
Enabling of control flow integrity for user programs is left to user runtime
(specifically expected from dynamic loader). There has been a lot of earlier
discussion on the enabling topic around x86 shadow stack enabling [3, 4, 5] and
overall consensus had been to let dynamic loader (or usermode) to decide for
enabling the feature.
This patch series introduces arch agnostic `prctls` to enable shadow stack
and indirect branch tracking. And implements them on riscv. arm64 is expected
to implement shadow stack part of these arch agnostic `prctls` [6]
Changes since last time
***********************
Spec changes
------------
- Forward cfi spec has become much simpler. `lpad` instruction is pseudo for
`auipc rd, <20bit_imm>`. `lpad` checks x7 against 20bit embedded in instr.
Thus label width is 20bit.
- Shadow stack management instructions are reduced to
sspush - to push x1/x5 on shadow stack
sspopchk - pops from shadow stack and comapres with x1/x5.
ssamoswap - atomically swap value on shadow stack.
rdssp - reads current shadow stack pointer
- Shadow stack accesses on readonly memory always raise AMO/store page fault.
`sspopchk` is load but if underlying page is readonly, it'll raise a store
page fault. It simplifies hardware and kernel for COW handling for shadow
stack pages.
- riscv defines a new exception type `software check exception` and control flow
violations raise software check exception.
- enabling controls for shadow stack and landing are in xenvcfg CSR and controls
lower privilege mode enabling. As an example senvcfg controls enabling for U and
menvcfg controls enabling for S mode.
core mm shadow stack enabling
-----------------------------
Shadow stack for x86 usermode are now in mainline and thus this patch
series builds on top of that for arch-agnostic mm related changes. Big
thanks and shout out to Rick Edgecombe for that.
selftests
---------
Created some minimal selftests to test the patch series.
[1] - https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/
[2] - https://github.com/riscv/riscv-cfi
[3] - https://lore.kernel.org/lkml/ZWHcBq0bJ+15eeKs@finisterre.sirena.org.uk/T/#m…
[4] - https://lore.kernel.org/all/20220130211838.8382-1-rick.p.edgecombe@intel.co…
[5] - https://lore.kernel.org/lkml/CAHk-=wgP5mk3poVeejw16Asbid0ghDt4okHnWaWKLBkRh…
[6] - https://lore.kernel.org/linux-mm/20231122-arm64-gcs-v7-2-201c483bd775@kerne…
Deepak Gupta (27):
riscv: abstract envcfg CSR
riscv: envcfg save and restore on trap entry/exit
riscv: define default value for envcfg
riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv
riscv: zicfiss/zicfilp enumeration
riscv: zicfiss/zicfilp extension csr and bit definitions
riscv: kernel handling on trap entry/exit for user cfi
mm: Define VM_SHADOW_STACK for RISC-V
mm: abstract shadow stack vma behind `arch_is_shadow_stack`
riscv/mm : Introducing new protection flag "PROT_SHADOWSTACK"
riscv: Implementing "PROT_SHADOWSTACK" on riscv
riscv mm: manufacture shadow stack pte
riscv mmu: teach pte_mkwrite to manufacture shadow stack PTEs
riscv mmu: write protect and shadow stack
riscv/mm: Implement map_shadow_stack() syscall
riscv/shstk: If needed allocate a new shadow stack on clone
prctl: arch-agnostic prtcl for indirect branch tracking
riscv: Implements arch agnostic shadow stack prctls
riscv: Implements arch argnostic indirect branch tracking prctls
riscv/traps: Introduce software check exception
riscv sigcontext: adding cfi state field in sigcontext
riscv signal: Save and restore of shadow stack for signal
riscv: select config for shadow stack and landing pad instr support
riscv/ptrace: riscv cfi status and state via ptrace and in core files
riscv: Documentation for landing pad / indirect branch tracking
riscv: Documentation for shadow stack on riscv
kselftest/riscv: kselftest for user mode cfi
Mark Brown (1):
prctl: arch-agnostic prctl for shadow stack
Documentation/arch/riscv/zicfilp.rst | 104 ++++
Documentation/arch/riscv/zicfiss.rst | 169 ++++++
arch/riscv/Kconfig | 16 +
arch/riscv/include/asm/asm-prototypes.h | 1 +
arch/riscv/include/asm/cpufeature.h | 18 +
arch/riscv/include/asm/csr.h | 20 +
arch/riscv/include/asm/hwcap.h | 2 +
arch/riscv/include/asm/mman.h | 42 ++
arch/riscv/include/asm/pgtable.h | 32 +-
arch/riscv/include/asm/processor.h | 2 +
arch/riscv/include/asm/thread_info.h | 4 +
arch/riscv/include/asm/usercfi.h | 106 ++++
arch/riscv/include/uapi/asm/ptrace.h | 18 +
arch/riscv/include/uapi/asm/sigcontext.h | 5 +
arch/riscv/kernel/Makefile | 2 +
arch/riscv/kernel/asm-offsets.c | 6 +-
arch/riscv/kernel/cpufeature.c | 4 +-
arch/riscv/kernel/entry.S | 32 ++
arch/riscv/kernel/process.c | 16 +
arch/riscv/kernel/ptrace.c | 83 +++
arch/riscv/kernel/signal.c | 45 ++
arch/riscv/kernel/sys_riscv.c | 19 +
arch/riscv/kernel/traps.c | 38 ++
arch/riscv/kernel/usercfi.c | 497 ++++++++++++++++++
arch/riscv/mm/init.c | 2 +-
arch/riscv/mm/pgtable.c | 21 +
include/linux/mm.h | 35 +-
include/uapi/asm-generic/mman.h | 1 +
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 49 ++
kernel/sys.c | 60 +++
mm/gup.c | 5 +-
mm/internal.h | 2 +-
mm/mmap.c | 1 +
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/cfi/Makefile | 10 +
.../testing/selftests/riscv/cfi/cfi_rv_test.h | 85 +++
.../selftests/riscv/cfi/riscv_cfi_test.c | 91 ++++
.../testing/selftests/riscv/cfi/shadowstack.c | 376 +++++++++++++
.../testing/selftests/riscv/cfi/shadowstack.h | 39 ++
40 files changed, 2050 insertions(+), 11 deletions(-)
create mode 100644 Documentation/arch/riscv/zicfilp.rst
create mode 100644 Documentation/arch/riscv/zicfiss.rst
create mode 100644 arch/riscv/include/asm/mman.h
create mode 100644 arch/riscv/include/asm/usercfi.h
create mode 100644 arch/riscv/kernel/usercfi.c
create mode 100644 tools/testing/selftests/riscv/cfi/Makefile
create mode 100644 tools/testing/selftests/riscv/cfi/cfi_rv_test.h
create mode 100644 tools/testing/selftests/riscv/cfi/riscv_cfi_test.c
create mode 100644 tools/testing/selftests/riscv/cfi/shadowstack.c
create mode 100644 tools/testing/selftests/riscv/cfi/shadowstack.h
--
2.43.0
If an integer's type has x bits, shifting the integer left by x or more
is undefined behavior.
This can happen in the rotate function when attempting to do a rotation
of the whole value by 0.
Fixes: 0dd714bfd200 ("KVM: s390: selftest: memop: Add cmpxchg tests")
Signed-off-by: Nina Schoetterl-Glausch <nsg(a)linux.ibm.com>
---
v1 -> v2:
use early return instead of modulus
tools/testing/selftests/kvm/s390x/memop.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/kvm/s390x/memop.c b/tools/testing/selftests/kvm/s390x/memop.c
index bb3ca9a5d731..4ec8d0181e8d 100644
--- a/tools/testing/selftests/kvm/s390x/memop.c
+++ b/tools/testing/selftests/kvm/s390x/memop.c
@@ -489,6 +489,8 @@ static __uint128_t rotate(int size, __uint128_t val, int amount)
amount = (amount + bits) % bits;
val = cut_to_size(size, val);
+ if (!amount)
+ return val;
return (val << (bits - amount)) | (val >> amount);
}
base-commit: 305230142ae0637213bf6e04f6d9f10bbcb74af8
--
2.40.1
Hi all:
The core frequency is subjected to the process variation in semiconductors.
Not all cores are able to reach the maximum frequency respecting the
infrastructure limits. Consequently, AMD has redefined the concept of
maximum frequency of a part. This means that a fraction of cores can reach
maximum frequency. To find the best process scheduling policy for a given
scenario, OS needs to know the core ordering informed by the platform through
highest performance capability register of the CPPC interface.
Earlier implementations of amd-pstate preferred core only support a static
core ranking and targeted performance. Now it has the ability to dynamically
change the preferred core based on the workload and platform conditions and
accounting for thermals and aging.
Amd-pstate driver utilizes the functions and data structures provided by
the ITMT architecture to enable the scheduler to favor scheduling on cores
which can be get a higher frequency with lower voltage.
We call it amd-pstate preferred core.
Here sched_set_itmt_core_prio() is called to set priorities and
sched_set_itmt_support() is called to enable ITMT feature.
Amd-pstate driver uses the highest performance value to indicate
the priority of CPU. The higher value has a higher priority.
Amd-pstate driver will provide an initial core ordering at boot time.
It relies on the CPPC interface to communicate the core ranking to the
operating system and scheduler to make sure that OS is choosing the cores
with highest performance firstly for scheduling the process. When amd-pstate
driver receives a message with the highest performance change, it will
update the core ranking.
Changes from V13->V14:
- cpufreq:
- - fix build error without CONFIG_CPU_FREQ
- ACPI: CPPC:
Changes from V12->V13:
- ACPI: CPPC:
- - modify commit message.
- - modify handle function of the notify(0x85).
- cpufreq: amd-pstate:
- - implement update_limits() callback function.
- x86:
- - pick up Acked-By flag added by Petkov.
Changes from V11->V12:
- all:
- - pick up Reviewed-By flag added by Perry.
- cpufreq: amd-pstate:
- - rebase the latest linux-next and fixed conflicts.
- - fixed the issue about cpudata without init in amd_pstate_update_highest_perf().
Changes from V10->V11:
- cpufreq: amd-pstate:
- - according Perry's commnts, I replace the string with str_enabled_disable().
Changes from V9->V10:
- cpufreq: amd-pstate:
- - add judgement for highest_perf. When it is less than 255, the
preferred core feature is enabled. And it will set the priority.
- - deleset "static u32 max_highest_perf" etc, because amd p-state
perferred coe does not require specail process for hotpulg.
Changes form V8->V9:
- all:
- - pick up Tested-By flag added by Oleksandr.
- cpufreq: amd-pstate:
- - pick up Review-By flag added by Wyes.
- - ignore modification of bug.
- - add a attribute of prefcore_ranking.
- - modify data type conversion from u32 to int.
- Documentation: amd-pstate:
- - pick up Review-By flag added by Wyes.
Changes form V7->V8:
- all:
- - pick up Review-By flag added by Mario and Ray.
- cpufreq: amd-pstate:
- - use hw_prefcore embeds into cpudata structure.
- - delete preferred core init from cpu online/off.
Changes form V6->V7:
- x86:
- - Modify kconfig about X86_AMD_PSTATE.
- cpufreq: amd-pstate:
- - modify incorrect comments about scheduler_work().
- - convert highest_perf data type.
- - modify preferred core init when cpu init and online.
- ACPI: CPPC:
- - modify link of CPPC highest performance.
- cpufreq:
- - modify link of CPPC highest performance changed.
Changes form V5->V6:
- cpufreq: amd-pstate:
- - modify the wrong tag order.
- - modify warning about hw_prefcore sysfs attribute.
- - delete duplicate comments.
- - modify the variable name cppc_highest_perf to prefcore_ranking.
- - modify judgment conditions for setting highest_perf.
- - modify sysfs attribute for CPPC highest perf to pr_debug message.
- Documentation: amd-pstate:
- - modify warning: title underline too short.
Changes form V4->V5:
- cpufreq: amd-pstate:
- - modify sysfs attribute for CPPC highest perf.
- - modify warning about comments
- - rebase linux-next
- cpufreq:
- - Moidfy warning about function declarations.
- Documentation: amd-pstate:
- - align with ``amd-pstat``
Changes form V3->V4:
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.
Changes form V2->V3:
- x86:
- - Modify kconfig and description.
- cpufreq: amd-pstate:
- - Add Co-developed-by tag in commit message.
- cpufreq:
- - Modify commit message.
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.
Changes form V1->V2:
- ACPI: CPPC:
- - Add reference link.
- cpufreq:
- - Moidfy link error.
- cpufreq: amd-pstate:
- - Init the priorities of all online CPUs
- - Use a single variable to represent the status of preferred core.
- Documentation:
- - Default enabled preferred core.
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.
- - Default enabled preferred core.
- - Use a single variable to represent the status of preferred core.
Meng Li (7):
x86: Drop CPU_SUP_INTEL from SCHED_MC_PRIO for the expansion.
ACPI: CPPC: Add get the highest performance cppc control
cpufreq: amd-pstate: Enable amd-pstate preferred core supporting.
cpufreq: Add a notification message that the highest perf has changed
cpufreq: amd-pstate: Update amd-pstate preferred core ranking
dynamically
Documentation: amd-pstate: introduce amd-pstate preferred core
Documentation: introduce amd-pstate preferrd core mode kernel command
line options
.../admin-guide/kernel-parameters.txt | 5 +
Documentation/admin-guide/pm/amd-pstate.rst | 59 +++++-
arch/x86/Kconfig | 5 +-
drivers/acpi/cppc_acpi.c | 13 ++
drivers/acpi/processor_driver.c | 6 +
drivers/cpufreq/amd-pstate.c | 183 +++++++++++++++++-
include/acpi/cppc_acpi.h | 5 +
include/linux/amd-pstate.h | 10 +
include/linux/cpufreq.h | 1 +
9 files changed, 275 insertions(+), 12 deletions(-)
--
2.34.1
This series adds a few missing functions to the shell KTAP helpers
script which are present in the C counterpart, kselftest.h.
This series depends on
"selftests: Move KTAP bash helpers to selftests common folder"
https://lore.kernel.org/all/20240102141528.169947-1-laura.nao@collabora.com/
Signed-off-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
---
Nícolas F. R. A. Prado (4):
selftests: ktap_helpers: Add helper to print diagnostic messages
selftests: ktap_helpers: Add helper to pass/fail test based on exit code
selftests: ktap_helpers: Add a helper to abort the test
selftests: ktap_helpers: Add a helper to finish the test
tools/testing/selftests/kselftest/ktap_helpers.sh | 39 +++++++++++++++++++++--
1 file changed, 37 insertions(+), 2 deletions(-)
---
base-commit: f1ca07220ad16a9efae7f68e4bade0db89b63a3c
change-id: 20240131-ktap-sh-helpers-extend-805b77ca773c
Best regards,
--
Nícolas F. R. A. Prado <nfraprado(a)collabora.com>