Real-time setups try hard to ensure proper isolation between time
critical applications and e.g. network processing performed by the
network stack in softirq and RPS is used to move the softirq
activity away from the isolated core.
If the network configuration is dynamic, with netns and devices
routinely created at run-time, enforcing the correct RPS setting
on each newly created device allowing to transient bad configuration
became complex.
These series try to address the above, introducing a new
sysctl knob: rps_default_mask. The new sysctl entry allows
configuring a systemwide RPS mask, to be enforced since receive
queue creation time without any fourther per device configuration
required.
Additionally, a simple self-test is introduced to check the
rps_default_mask behavior.
v1 -> v2:
- fix sparse warning in patch 2/3
Paolo Abeni (3):
net/sysctl: factor-out netdev_rx_queue_set_rps_mask() helper
net/core: introduce default_rps_mask netns attribute
self-tests: introduce self-tests for RPS default mask
Documentation/admin-guide/sysctl/net.rst | 6 ++
include/linux/netdevice.h | 1 +
net/core/net-sysfs.c | 73 +++++++++++--------
net/core/sysctl_net_core.c | 58 +++++++++++++++
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/config | 3 +
.../testing/selftests/net/rps_default_mask.sh | 57 +++++++++++++++
7 files changed, 169 insertions(+), 30 deletions(-)
create mode 100755 tools/testing/selftests/net/rps_default_mask.sh
--
2.26.2
From: Vincent Cheng <vincent.cheng.xh(a)renesas.com>
This series adds adjust phase to the PTP Hardware Clock device interface.
Some PTP hardware clocks have a write phase mode that has
a built-in hardware filtering capability. The write phase mode
utilizes a phase offset control word instead of a frequency offset
control word. Add adjust phase function to take advantage of this
capability.
Changes since v1:
- As suggested by Richard Cochran:
1. ops->adjphase is new so need to check for non-null function pointer.
2. Kernel coding style uses lower_case_underscores.
3. Use existing PTP clock API for delayed worker.
Vincent Cheng (3):
ptp: Add adjphase function to support phase offset control.
ptp: Add adjust_phase to ptp_clock_caps capability.
ptp: ptp_clockmatrix: Add adjphase() to support PHC write phase mode.
drivers/ptp/ptp_chardev.c | 1 +
drivers/ptp/ptp_clock.c | 3 ++
drivers/ptp/ptp_clockmatrix.c | 92 +++++++++++++++++++++++++++++++++++
drivers/ptp/ptp_clockmatrix.h | 8 ++-
include/linux/ptp_clock_kernel.h | 6 ++-
include/uapi/linux/ptp_clock.h | 4 +-
tools/testing/selftests/ptp/testptp.c | 6 ++-
7 files changed, 114 insertions(+), 6 deletions(-)
--
2.7.4
Current PTP driver exposes one PTP device to user which binds network
interface/interfaces to provide timestamping. Actually we have a way
utilizing timecounter/cyclecounter to virtualize any number of PTP
clocks based on a same free running physical clock for using.
The purpose of having multiple PTP virtual clocks is for user space
to directly/easily use them for multiple domains synchronization.
user
space: ^ ^
| SO_TIMESTAMPING new flag: | Packets with
| SOF_TIMESTAMPING_BIND_PHC | TX/RX HW timestamps
v v
+--------------------------------------------+
sock: | sock (new member sk_bind_phc) |
+--------------------------------------------+
^ ^
| ethtool_get_phc_vclocks | Convert HW timestamps
| | to sk_bind_phc
v v
+--------------+--------------+--------------+
vclock: | ptp1 | ptp2 | ptpN |
+--------------+--------------+--------------+
pclock: | ptp0 free running |
+--------------------------------------------+
The block diagram may explain how it works. Besides the PTP virtual
clocks, the packet HW timestamp converting to the bound PHC is also
done in sock driver. For user space, PTP virtual clocks can be
created via sysfs, and extended SO_TIMESTAMPING API (new flag
SOF_TIMESTAMPING_BIND_PHC) can be used to bind one PTP virtual clock
for timestamping.
The test tool timestamping.c (together with linuxptp phc_ctl tool) can
be used to verify:
# echo 4 > /sys/class/ptp/ptp0/n_vclocks
[ 129.399472] ptp ptp0: new virtual clock ptp2
[ 129.404234] ptp ptp0: new virtual clock ptp3
[ 129.409532] ptp ptp0: new virtual clock ptp4
[ 129.413942] ptp ptp0: new virtual clock ptp5
[ 129.418257] ptp ptp0: guarantee physical clock free running
#
# phc_ctl /dev/ptp2 set 10000
# phc_ctl /dev/ptp3 set 20000
#
# timestamping eno0 2 SOF_TIMESTAMPING_TX_HARDWARE SOF_TIMESTAMPING_RAW_HARDWARE SOF_TIMESTAMPING_BIND_PHC
# timestamping eno0 2 SOF_TIMESTAMPING_RX_HARDWARE SOF_TIMESTAMPING_RAW_HARDWARE SOF_TIMESTAMPING_BIND_PHC
# timestamping eno0 3 SOF_TIMESTAMPING_TX_HARDWARE SOF_TIMESTAMPING_RAW_HARDWARE SOF_TIMESTAMPING_BIND_PHC
# timestamping eno0 3 SOF_TIMESTAMPING_RX_HARDWARE SOF_TIMESTAMPING_RAW_HARDWARE SOF_TIMESTAMPING_BIND_PHC
Changes for v2:
- Converted to num_vclocks for creating virtual clocks.
- Guranteed physical clock free running when using virtual
clocks.
- Fixed build warning.
- Updated copyright.
Changes for v3:
- Supported PTP virtual clock in default in PTP driver.
- Protected concurrency of ptp->num_vclocks accessing.
- Supported PHC vclocks query via ethtool.
- Extended SO_TIMESTAMPING API for PHC binding.
- Converted HW timestamps to PHC bound, instead of previous
binding domain value to PHC idea.
- Other minor fixes.
Changes for v4:
- Used do_aux_work callback for vclock refreshing instead.
- Used unsigned int for vclocks number, and max_vclocks
for limitiation.
- Fixed mutex locking.
- Dynamically allocated memory for vclock index storage.
- Removed ethtool ioctl command for vclocks getting.
- Updated doc for ethtool phc vclocks get.
- Converted to mptcp_setsockopt_sol_socket_timestamping().
- Passed so_timestamping for sock_set_timestamping.
- Fixed checkpatch/build.
- Other minor fixed.
Yangbo Lu (11):
ptp: add ptp virtual clock driver framework
ptp: support ptp physical/virtual clocks conversion
ptp: track available ptp vclocks information
ptp: add kernel API ptp_get_vclocks_index()
ethtool: add a new command for getting PHC virtual clocks
ptp: add kernel API ptp_convert_timestamp()
mptcp: setsockopt: convert to
mptcp_setsockopt_sol_socket_timestamping()
net: sock: extend SO_TIMESTAMPING for PHC binding
net: socket: support hardware timestamp conversion to PHC bound
selftests/net: timestamping: support binding PHC
MAINTAINERS: add entry for PTP virtual clock driver
Documentation/ABI/testing/sysfs-ptp | 20 ++
Documentation/networking/ethtool-netlink.rst | 22 ++
MAINTAINERS | 7 +
drivers/ptp/Makefile | 2 +-
drivers/ptp/ptp_clock.c | 41 +++-
drivers/ptp/ptp_private.h | 39 ++++
drivers/ptp/ptp_sysfs.c | 160 ++++++++++++++
drivers/ptp/ptp_vclock.c | 219 +++++++++++++++++++
include/linux/ethtool.h | 10 +
include/linux/ptp_clock_kernel.h | 31 ++-
include/net/sock.h | 8 +-
include/uapi/linux/ethtool_netlink.h | 15 ++
include/uapi/linux/net_tstamp.h | 17 +-
net/core/sock.c | 65 +++++-
net/ethtool/Makefile | 2 +-
net/ethtool/common.c | 14 ++
net/ethtool/netlink.c | 10 +
net/ethtool/netlink.h | 2 +
net/ethtool/phc_vclocks.c | 94 ++++++++
net/mptcp/sockopt.c | 69 ++++--
net/socket.c | 19 +-
tools/testing/selftests/net/timestamping.c | 62 ++++--
22 files changed, 875 insertions(+), 53 deletions(-)
create mode 100644 drivers/ptp/ptp_vclock.c
create mode 100644 net/ethtool/phc_vclocks.c
base-commit: 19938bafa7ae8fc0a4a2c1c1430abb1a04668da1
--
2.25.1
When running the openat2 test suite on ARM64 platform, we got below failure,
since the definition of the O_LARGEFILE is different on ARM64. So we can
set the correct O_LARGEFILE definition on ARM64 to fix this issue.
Signed-off-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
---
tools/testing/selftests/openat2/openat2_test.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/openat2/openat2_test.c b/tools/testing/selftests/openat2/openat2_test.c
index d7ec1e7..1bddbe9 100644
--- a/tools/testing/selftests/openat2/openat2_test.c
+++ b/tools/testing/selftests/openat2/openat2_test.c
@@ -22,7 +22,11 @@
* XXX: This is wrong on {mips, parisc, powerpc, sparc}.
*/
#undef O_LARGEFILE
+#ifdef __aarch64__
+#define O_LARGEFILE 0x20000
+#else
#define O_LARGEFILE 0x8000
+#endif
struct open_how_ext {
struct open_how inner;
--
1.8.3.1
This patchset is an implementation of futex2 interface on top of existing
futex.c code.
* What happened to the current futex()?
The futex() is implemented using a multiplexed interface that doesn't
scale well and gives headaches to people. We don't want to add more
features there.
* New features at futex2()
** NUMA-awareness
At the current implementation, all futex kernel side infrastructure is
stored on a single node. Given that, all futex() calls issued by
processors that aren't located on that node will have a memory access
penalty when doing it.
** Variable sized futexes
Futexes are used to implement atomic operations in userspace.
Supporting 8, 16, 32 and 64 bit sized futexes allows user libraries to
implement all those sizes in a performant way. Thanks Boost devs for
feedback: https://lists.boost.org/Archives/boost/2021/05/251508.php
Embedded systems or anything with memory constrains could benefit of
using smaller sizes for the futex userspace integer.
** Wait on multiple futexes
Proton's (a set of compatibility tools to run Windows games) fork of Wine
benefits of this feature to implement WaitForMultipleObjects from Win32 in
a performant way. Native game engines will benefit from this as well,
given that this is a common wait pattern for games.
* The interface
The new interface has one syscall per operation as opposed to the
current multiplexing one. The details can be found in the following
patches, but this is a high level summary of what the interface can do:
- Supports wake/wait semantics, as in futex()
- Supports requeue operations, similarly as FUTEX_CMP_REQUEUE, but with
individual flags for each address
- Supports waiting for a vector of futexes, using a new syscall named
futex_waitv()
- The following features will be implemented in next patchset versions:
- Supports variable sized futexes (8bits, 16bits, 32bits and 64bits)
- Supports NUMA-awareness operations, where the user can specify on
which memory node would like to operate
* The patchset
Given that futex2 reuses futex code, the patches make futex.c functions
public and modify them as needed.
This patchset can be also found at my git tree:
https://gitlab.collabora.com/tonyk/linux/-/tree/futex2-dev
- Patch 1: Implements 32bit wait/wake
- Patches 2-3: Implement waitv and requeue.
- Patch 4: Add a documentation file which details the interface and
the internal implementation.
- Patches 5-10: Selftests for all operations along with perf
support for futex2.
- Patch 11: Proof of concept of waking threads at waitpid(), not to be
merged as it is.
* Testing
** Stability
- glibc[1]: nptl's low level locking was modified to use futex2 API
(except for PI). All nptl/ tests passed.
- Proton's Wine: Proton/Wine was modified in order to use futex2() for the
emulation of Windows NT sync mechanisms based on futex, called "fsync".
Triple-A games with huge CPU's loads and tons of parallel jobs worked
as expected when compared with the previous FUTEX_WAIT_MULTIPLE
implementation at futex(). Some games issue 42k futex2() calls
per second.
- perf: The perf benchmarks tests can also be used to stress the
interface, and they can be found in this patchset.
[1] https://gitlab.collabora.com/tonyk/glibc/-/tree/futex2-dev
** Performance
- Using perf, no significant difference was measured when comparing
futex() and futex2() for the following benchmarks: hash, wake and
wake-parallel.
- I measured a 15% overhead for the perf's requeue benchmark, comparing
futex2() to futex(). Requeue patch provides more details about why this
happens and how to overcome this.
* Changelog
Changes from v4:
- Use existing futex.c code when possible
- Cleaned up cover letter, check v4 for a more verbose version
v4: https://lore.kernel.org/lkml/20210603195924.361327-1-andrealmeid@collabora.…
André Almeida (11):
futex2: Implement wait and wake functions
futex2: Implement vectorized wait
futex2: Implement requeue operation
docs: locking: futex2: Add documentation
selftests: futex2: Add wake/wait test
selftests: futex2: Add timeout test
selftests: futex2: Add wouldblock test
selftests: futex2: Add waitv test
selftests: futex2: Add requeue test
perf bench: Add futex2 benchmark tests
kernel: Enable waitpid() for futex2
Documentation/locking/futex2.rst | 185 ++++++
Documentation/locking/index.rst | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 4 +
arch/x86/entry/syscalls/syscall_64.tbl | 4 +
include/linux/compat.h | 23 +
include/linux/futex.h | 103 ++++
include/linux/syscalls.h | 8 +
include/uapi/asm-generic/unistd.h | 11 +-
include/uapi/linux/futex.h | 27 +
init/Kconfig | 7 +
kernel/Makefile | 1 +
kernel/fork.c | 2 +
kernel/futex.c | 111 +---
kernel/futex2.c | 566 ++++++++++++++++++
kernel/sys_ni.c | 9 +
tools/arch/x86/include/asm/unistd_64.h | 12 +
tools/perf/bench/bench.h | 4 +
tools/perf/bench/futex-hash.c | 24 +-
tools/perf/bench/futex-requeue.c | 57 +-
tools/perf/bench/futex-wake-parallel.c | 41 +-
tools/perf/bench/futex-wake.c | 37 +-
tools/perf/bench/futex.h | 47 ++
tools/perf/builtin-bench.c | 18 +-
.../selftests/futex/functional/.gitignore | 3 +
.../selftests/futex/functional/Makefile | 6 +-
.../futex/functional/futex2_requeue.c | 164 +++++
.../selftests/futex/functional/futex2_wait.c | 195 ++++++
.../selftests/futex/functional/futex2_waitv.c | 154 +++++
.../futex/functional/futex_wait_timeout.c | 24 +-
.../futex/functional/futex_wait_wouldblock.c | 33 +-
.../testing/selftests/futex/functional/run.sh | 6 +
.../selftests/futex/include/futex2test.h | 112 ++++
32 files changed, 1865 insertions(+), 134 deletions(-)
create mode 100644 Documentation/locking/futex2.rst
create mode 100644 kernel/futex2.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_requeue.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_wait.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_waitv.c
create mode 100644 tools/testing/selftests/futex/include/futex2test.h
--
2.32.0