Hello,
This patch series implements a new ioctl on the pagemap procfs file that
can get the soft-dirty PTE bit, clear it, or perform both get and clear
atomically on a specified range of memory.
The soft-dirty PTE bit of memory pages can be read through the pagemap
procfs file. The soft-dirty PTE bit for the whole memory range of the
process can be cleared by writing to the clear_refs file. That interface
has two limitations, which this series addresses:
- There is no operation that atomically gets the soft-dirty PTE bit
status and clears it.
- The soft-dirty PTE bit cannot be cleared for only a part of the memory.
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for that, as I think the process
is frozen. Our use case requires tracking the soft-dirty PTE bit of
running processes. We need to track and clear the bit for a region of
memory while the process is running in order to emulate the Windows
GetWriteWatch() API. It is used by games to keep track of dirty pages
and to process only those pages. This new ioctl can be used by the CRIU
project and other applications which require soft-dirty PTE bit
information.
The current kernel offers no way to clear the soft-dirty bits for only a
part of memory (instead of the entire process), and the get+clear
operation cannot be performed atomically. The alternatives emulate this
tracking entirely in userspace, with poor performance:
- The mprotect syscall with a SIGSEGV handler for bookkeeping
- The userfaultfd syscall with a handler for bookkeeping
Some benchmarks can be seen in [1].
The following operations are supported by this ioctl:
- Get the pages that are soft-dirty.
- Clear the pages which are soft-dirty.
- Optionally ignore VM_SOFTDIRTY and track only the per-page
soft-dirty PTE bit.
Two decisions have been made about how the ioctl returns its output:
- Return the offsets of the dirty pages from the start of the range in vec
- Stop execution as soon as vec is filled with dirty pages
These two decisions do not follow the mincore() philosophy, where the
output array corresponds one to one to the address range; there, the
output buffer length is not passed and only a flag is set for each page
that is present. This makes mincore() easy to use but gives less control.
We pass the size of the output array and store the return data
consecutively, as offsets of the dirty pages from the start. The user can
easily convert these offsets back into dirty page addresses. Suppose the
user wants the first 10 dirty pages out of a total of 100 pages. They
allocate an output buffer of size 10 and the ioctl stops after finding
10 dirty pages. This behaviour is needed to support Windows'
getWriteWatch(). mincore()-like behaviour can be achieved by passing an
output buffer of size 100, so the interface covers both usage styles.
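As a rough illustration of the output format described above, the following userspace model (not the kernel implementation) fills an output vector with the offsets of dirty pages, stops once the vector is full, and converts an offset back into an address. The names collect_dirty_offsets() and offset_to_addr() are hypothetical, the dirty state comes from a plain byte array instead of PTEs, and offsets are assumed to be counted in pages.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace model only: dirty[i] != 0 means page i of the range is dirty.
 * Stores the page offsets of dirty pages consecutively in vec and stops
 * as soon as vec_len entries have been written.
 * Returns the number of entries written.
 */
size_t collect_dirty_offsets(const unsigned char *dirty, size_t npages,
                             uint64_t *vec, size_t vec_len)
{
    size_t found = 0;

    for (size_t i = 0; i < npages && found < vec_len; i++) {
        if (dirty[i])
            vec[found++] = i;
    }
    return found;
}

/* Convert a page offset (assumed unit: pages) back into an address. */
uintptr_t offset_to_addr(uintptr_t start, uint64_t page_offset,
                         size_t page_size)
{
    return start + (uintptr_t)page_offset * page_size;
}
```

With a 100-page range and a vec_len of 10, the loop stops after the tenth dirty page, matching the example above; with a vec_len of 100 every dirty page in the range is reported, which is closer to what mincore() offers.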
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (4):
fs/proc/task_mmu: update functions to clear the soft-dirty PTE bit
fs/proc/task_mmu: Implement IOCTL to get and clear soft dirty PTE bit
selftests: vm: add pagemap ioctl tests
mm: add documentation of the new ioctl on pagemap
Documentation/admin-guide/mm/soft-dirty.rst | 42 +-
fs/proc/task_mmu.c | 342 ++++++++++-
include/uapi/linux/fs.h | 23 +
tools/include/uapi/linux/fs.h | 23 +
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 2 +
tools/testing/selftests/vm/pagemap_ioctl.c | 649 ++++++++++++++++++++
7 files changed, 1050 insertions(+), 32 deletions(-)
create mode 100644 tools/testing/selftests/vm/pagemap_ioctl.c
--
2.30.2
This v3 series implements selftests targeting the feature floated by Chao
via:
https://lore.kernel.org/linux-mm/20220706082016.2603916-12-chao.p.peng@linu…
The changes below aim to test the fd-based approach for guest private
memory in the context of normal (non-confidential) VMs executing on
non-confidential platforms.
private_mem_test.c adds a selftest that accesses private memory from the
guest via private/shared accesses and checks whether the contents can be
leaked to or accessed by the VMM via the shared memory view before and
after conversions.
Updates in V3:
1) Series is based on v7 series from Chao
2) Changes are introduced in KVM to help execute private mem selftests
3) Selftests are executing from private memory
4) Test implementation is simplified to contain implicit/explicit memory
conversion paths according to feedback from Sean.
5) Addressed comments from Sean and Shuah.
This series depends on the following patches:
1) V7 series patches from Chao mentioned above.
2) https://lore.kernel.org/lkml/20220810152033.946942-1-pgonda@google.com/T/#u
- Series posted by Peter containing patches from Michael and Sean.
Github link for the patches posted as part of this series:
https://github.com/vishals4gh/linux/commits/priv_memfd_selftests_rfc_v3
Vishal Annapurve (6):
kvm: x86: Add support for testing private memory
selftests: kvm: Add support for private memory
selftests: kvm: ucall: Allow querying ucall pool gpa
selftests: kvm: x86: Execute hypercall as per the cpu
selftests: kvm: x86: Execute VMs with private memory
sefltests: kvm: x86: Add selftest for private memory
arch/x86/include/uapi/asm/kvm_para.h | 2 +
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/mmu/mmu.c | 19 ++
arch/x86/kvm/mmu/mmu_internal.h | 2 +-
arch/x86/kvm/x86.c | 67 +++-
include/linux/kvm_host.h | 12 +
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 2 +
.../selftests/kvm/include/kvm_util_base.h | 12 +-
.../selftests/kvm/include/ucall_common.h | 2 +
.../kvm/include/x86_64/private_mem.h | 51 +++
tools/testing/selftests/kvm/lib/kvm_util.c | 40 ++-
.../testing/selftests/kvm/lib/ucall_common.c | 12 +
.../selftests/kvm/lib/x86_64/private_mem.c | 297 ++++++++++++++++++
.../selftests/kvm/lib/x86_64/processor.c | 15 +-
.../selftests/kvm/x86_64/private_mem_test.c | 262 +++++++++++++++
virt/kvm/Kconfig | 9 +
virt/kvm/kvm_main.c | 90 +++++-
18 files changed, 887 insertions(+), 9 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/x86_64/private_mem.h
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/private_mem.c
create mode 100644 tools/testing/selftests/kvm/x86_64/private_mem_test.c
--
2.37.1.595.g718a3a8f04-goog
This patch set extends the locked port feature for devices
that are behind a locked port, but do not have the ability to
authorize themselves as a supplicant using IEEE 802.1X.
Such devices can be printers, meters or anything related to
fixed installations. Instead of 802.1X authorization, devices
can get access based on their MAC addresses being whitelisted.
For an authorization daemon to detect that a device is trying
to get access through a locked port, the bridge adds the MAC
address of the device to the FDB with the locked flag set.
The authorization daemon can then catch the FDB add event and
check whether the MAC address is in the whitelist. If it is, the
daemon replaces the FDB entry with one that does not have the
locked flag set, and thus opens the port for the device.
This feature is known as MAC-Auth or MAC Authentication Bypass
(MAB) in Cisco terminology, where the full MAB concept involves
additional Cisco infrastructure for authorization. There is no
real authentication process, as the MAC address of the device
is, in the general case, the only input on which the
authorization daemon can base its decision whether to unlock
the port.
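The decision the authorization daemon makes can be sketched as follows. This is a minimal model under stated assumptions, not code from this series: struct mac_addr, struct fdb_event and mab_should_unlock() are hypothetical names, and receiving the FDB add event and replacing the entry are left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical types for illustration; not the kernel's FDB structures. */
struct mac_addr {
    uint8_t b[MAC_LEN];
};

struct fdb_event {
    struct mac_addr mac;
    bool locked;    /* entry was added by the bridge with the locked flag */
};

/* Linear search of the whitelist for an exact MAC match. */
bool mac_in_whitelist(const struct mac_addr *mac,
                      const struct mac_addr *whitelist, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (memcmp(mac->b, whitelist[i].b, MAC_LEN) == 0)
            return true;
    }
    return false;
}

/*
 * Returns true when the daemon should replace the locked FDB entry with
 * an unlocked one. The MAC address is the only input to the decision.
 */
bool mab_should_unlock(const struct fdb_event *ev,
                       const struct mac_addr *whitelist, size_t n)
{
    if (!ev->locked)
        return false;    /* nothing to authorize */
    return mac_in_whitelist(&ev->mac, whitelist, n);
}
```

Because the MAC address is the sole input, anything that can spoof a whitelisted address passes this check, which is why the text above notes that no real authentication takes place.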
With this patch set, an implementation of the offloaded case is
supplied for the mv88e6xxx driver. When a packet ingresses on
a locked port, an ATU miss violation event will occur. When
handling such ATU miss violation interrupts, the MAC address of
the device is added to the FDB with a zero destination port
vector (DPV) and the MAC address is communicated through the
switchdev layer to the bridge, so that an FDB entry with the
locked flag enabled can be added.
Log:
v3: Added timers and lists in the driver (mv88e6xxx)
to keep track of and remove locked entries.
v4: Leave out enforcing a limit to the number of
locked entries in the bridge.
Removed the timers in the driver and use the
worker only. Add locked FDB flag to all drivers
using port_fdb_add() from the dsa api and let
all drivers ignore entries with this flag set.
Change how to get the ageing timeout of locked
entries. See global1_atu.c and switchdev.c.
Use struct mv88e6xxx_port for locked entries
variables instead of struct dsa_port.
v5: Added 'mab' flag to enable MAB/MacAuth feature,
in a similar way to the locked feature flag.
In these implementations for the mv88e6xxx, the
switchport must be configured with learning on.
To tell userspace about the behavior of the
locked entries in the driver, a 'blackhole'
FDB flag has been added, which is set on locked
FDB entries coming from the driver. Those locked
entries also carry the 'sticky' flag, as the
driver's locked entries cannot roam.
Fixed issues with taking mutex locks, and added
a function to read the FID that supports all
versions of the chipset family.
Hans Schultz (6):
net: bridge: add locked entry fdb flag to extend locked port feature
net: switchdev: add support for offloading of fdb locked flag
drivers: net: dsa: add locked fdb entry flag to drivers
net: dsa: mv88e6xxx: allow reading FID when handling ATU violations
net: dsa: mv88e6xxx: MacAuth/MAB implementation
selftests: forwarding: add test of MAC-Auth Bypass to locked port
tests
drivers/net/dsa/b53/b53_common.c | 5 +
drivers/net/dsa/b53/b53_priv.h | 1 +
drivers/net/dsa/hirschmann/hellcreek.c | 5 +
drivers/net/dsa/lan9303-core.c | 5 +
drivers/net/dsa/lantiq_gswip.c | 5 +
drivers/net/dsa/microchip/ksz_common.c | 5 +
drivers/net/dsa/mt7530.c | 5 +
drivers/net/dsa/mv88e6xxx/Makefile | 1 +
drivers/net/dsa/mv88e6xxx/chip.c | 81 ++++-
drivers/net/dsa/mv88e6xxx/chip.h | 19 ++
drivers/net/dsa/mv88e6xxx/global1.h | 1 +
drivers/net/dsa/mv88e6xxx/global1_atu.c | 76 ++++-
drivers/net/dsa/mv88e6xxx/port.c | 15 +-
drivers/net/dsa/mv88e6xxx/port.h | 6 +
drivers/net/dsa/mv88e6xxx/switchdev.c | 285 ++++++++++++++++++
drivers/net/dsa/mv88e6xxx/switchdev.h | 37 +++
drivers/net/dsa/ocelot/felix.c | 5 +
drivers/net/dsa/qca/qca8k-common.c | 5 +
drivers/net/dsa/qca/qca8k.h | 1 +
drivers/net/dsa/sja1105/sja1105_main.c | 7 +-
include/linux/if_bridge.h | 1 +
include/net/dsa.h | 1 +
include/net/switchdev.h | 3 +
include/uapi/linux/if_link.h | 1 +
include/uapi/linux/neighbour.h | 4 +-
net/bridge/br.c | 5 +-
net/bridge/br_fdb.c | 43 ++-
net/bridge/br_input.c | 16 +-
net/bridge/br_netlink.c | 9 +-
net/bridge/br_private.h | 7 +-
net/bridge/br_switchdev.c | 5 +-
net/dsa/dsa_priv.h | 4 +-
net/dsa/port.c | 7 +-
net/dsa/slave.c | 4 +-
net/dsa/switch.c | 10 +-
.../net/forwarding/bridge_locked_port.sh | 107 ++++++-
.../net/forwarding/bridge_sticky_fdb.sh | 21 +-
37 files changed, 768 insertions(+), 50 deletions(-)
create mode 100644 drivers/net/dsa/mv88e6xxx/switchdev.c
create mode 100644 drivers/net/dsa/mv88e6xxx/switchdev.h
--
2.30.2
v2:
- Added enable check in executor.c to prevent wrong error output from
kunit_tool.py when run against a KUnit disabled kernel
- kunit_tool.py now passes kunit.enable=1
- Flipped around logic of new config to KUNIT_DEFAULT_ENABLED
- Modules containing tests are now loaded, but their tests are not executed
- Various message/description text clean up
There are some use cases where the kernel binary is desired to be the same
for both production and testing. This poses a problem for users of KUnit
as built-in tests will automatically run at startup and test modules
can still be loaded leaving the kernel in an unsafe state. There is a
"test" taint flag that gets set if a test runs but nothing to prevent
the execution.
This patch adds the kunit.enable module parameter, which must be set to
true, in addition to KUNIT being enabled, for KUnit tests to run.
The default value is true, which preserves backwards compatibility. However,
for the production+testing use case the new config option
KUNIT_DEFAULT_ENABLED can be set to N, requiring the tester to opt in by
passing kunit.enable=1 to the kernel.
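The interaction between the config option and the parameter can be modelled in userspace as below. This is a hypothetical sketch of the rule described above, not the code in lib/kunit: the function names are invented, and the parsing of the parameter value is deliberately simplified to a leading 1/y/Y or 0/n/N.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Effective value of kunit.enable: the command-line value if one was
 * given and recognized, otherwise the KUNIT_DEFAULT_ENABLED default.
 * param is the string after "kunit.enable=", or NULL if absent.
 */
bool kunit_enable_effective(bool default_enabled, const char *param)
{
    if (param == NULL)
        return default_enabled;

    switch (param[0]) {
    case '1': case 'y': case 'Y':
        return true;
    case '0': case 'n': case 'N':
        return false;
    default:
        return default_enabled;    /* unrecognized: keep the default */
    }
}

/* Tests run only if KUNIT is built in or loaded AND kunit.enable is true. */
bool kunit_tests_would_run(bool config_kunit, bool default_enabled,
                           const char *param)
{
    return config_kunit && kunit_enable_effective(default_enabled, param);
}
```

In this model a production kernel built with KUNIT_DEFAULT_ENABLED=N corresponds to kunit_tests_would_run(true, false, NULL), which is false, while the same binary booted with kunit.enable=1 corresponds to kunit_tests_would_run(true, false, "1"), which is true.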
Joe Fradley (2):
kunit: add kunit.enable to enable/disable KUnit test
kunit: no longer call module_info(test, "Y") for kunit modules
.../admin-guide/kernel-parameters.txt | 6 +++++
include/kunit/test.h | 3 ++-
lib/kunit/Kconfig | 11 +++++++++
lib/kunit/executor.c | 4 ++++
lib/kunit/test.c | 24 +++++++++++++++++++
tools/testing/kunit/kunit_kernel.py | 1 +
6 files changed, 48 insertions(+), 1 deletion(-)
--
2.37.1.595.g718a3a8f04-goog
Fix the comment to accurately describe the test and the recently added
SYSTEM_SUSPEND test case.
What was once psci_cpu_on_test was renamed and extended to squeeze in a
test case for PSCI SYSTEM_SUSPEND. Nonetheless, the author of those
changes (whoever they may be...) failed to update the file comment to
reflect what had changed.
Reported-by: Reiji Watanabe <reijiw(a)google.com>
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
---
Forgetting the name of the darned UAPI event. Tsk tsk.
tools/testing/selftests/kvm/aarch64/psci_test.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/aarch64/psci_test.c b/tools/testing/selftests/kvm/aarch64/psci_test.c
index f7621f6e938e..e0b9e81a3e09 100644
--- a/tools/testing/selftests/kvm/aarch64/psci_test.c
+++ b/tools/testing/selftests/kvm/aarch64/psci_test.c
@@ -1,12 +1,14 @@
// SPDX-License-Identifier: GPL-2.0-only
/*
- * psci_cpu_on_test - Test that the observable state of a vCPU targeted by the
- * CPU_ON PSCI call matches what the caller requested.
+ * psci_test - Tests relating to KVM's PSCI implementation.
*
* Copyright (c) 2021 Google LLC.
*
- * This is a regression test for a race between KVM servicing the PSCI call and
- * userspace reading the vCPUs registers.
+ * This test includes:
+ * - A regression test for a race between KVM servicing the PSCI CPU_ON call
+ * and userspace reading the targeted vCPU's registers.
+ * - A test for KVM's handling of PSCI SYSTEM_SUSPEND and the associated
+ * KVM_SYSTEM_EVENT_SUSPEND UAPI.
*/
#define _GNU_SOURCE
base-commit: 568035b01cfb107af8d2e4bd2fb9aea22cf5b868
--
2.37.1.595.g718a3a8f04-goog
QUIC requires end-to-end encryption of the data. The application usually
prepares the data in clear text, encrypts it and calls send(), which implies
multiple copies of the data before the packets hit the networking stack.
Similar to kTLS, QUIC kernel offload of cryptography reduces the memory
pressure by reducing the number of copies.
The scope of kernel support is limited to the symmetric cryptography,
leaving the handshake to the user space library. For QUIC in particular,
the application packets that require symmetric cryptography are the 1RTT
packets with short headers. Kernel will encrypt the application packets
on transmission and decrypt on receive. This series implements Tx only,
because in QUIC server applications Tx outweighs Rx by orders of
magnitude.
Supporting the combination of QUIC and GSO requires the application to
correctly place the data and the kernel to correctly slice it. The
encryption process appends a fixed number of bytes (the authentication
tag, whose length depends on the cipher) to the end of the message to
authenticate it. The GSO value should include this overhead; the offload
then subtracts the tag size when parsing the input on Tx, before chunking
and encrypting it.
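The size arithmetic described above can be written out as a small sketch. It is a simplification under stated assumptions, not the parsing done in net/quic: the function names are hypothetical, per-packet QUIC header bytes are ignored, and the 16-byte tag used in the note below is the AES-128-GCM tag length.

```c
#include <stddef.h>

/*
 * Bytes of application input consumed per GSO segment: the GSO value
 * includes the tag, so the tag length is subtracted before chunking.
 * Returns 0 if the GSO value leaves no room for data.
 */
size_t quic_plain_per_segment(size_t gso_size, size_t tag_len)
{
    return gso_size > tag_len ? gso_size - tag_len : 0;
}

/* Number of segments needed for input_len bytes (rounded up). */
size_t quic_segment_count(size_t input_len, size_t gso_size, size_t tag_len)
{
    size_t per = quic_plain_per_segment(gso_size, tag_len);

    if (per == 0)
        return 0;
    return (input_len + per - 1) / per;
}
```

For example, with a GSO value of 1200 and a 16-byte tag, each segment consumes 1184 bytes of input, so 3000 bytes of input are sliced into three segments, each of which grows by the tag length once encrypted.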
With the kernel cryptography, the buffer copy operation is combined
with the encryption operation, which reduces memory bandwidth by 5-8%.
When devices supporting QUIC encryption in hardware come to the market,
we will be able to free a further 7% of CPU utilization which is used
today for crypto operations.
Adel Abouchaev (6):
Documentation on QUIC kernel Tx crypto.
Define QUIC specific constants, control and data plane structures
Add UDP ULP operations, initialization and handling prototype
functions.
Implement QUIC offload functions
Add flow counters and Tx processing error counter
Add self tests for ULP operations, flow setup and crypto tests
Documentation/networking/index.rst | 1 +
Documentation/networking/quic.rst | 185 ++++
include/net/inet_sock.h | 2 +
include/net/netns/mib.h | 3 +
include/net/quic.h | 63 ++
include/net/snmp.h | 6 +
include/net/udp.h | 33 +
include/uapi/linux/quic.h | 60 +
include/uapi/linux/snmp.h | 9 +
include/uapi/linux/udp.h | 4 +
net/Kconfig | 1 +
net/Makefile | 1 +
net/ipv4/Makefile | 3 +-
net/ipv4/udp.c | 15 +
net/ipv4/udp_ulp.c | 192 ++++
net/quic/Kconfig | 16 +
net/quic/Makefile | 8 +
net/quic/quic_main.c | 1417 ++++++++++++++++++++++++
net/quic/quic_proc.c | 45 +
tools/testing/selftests/net/.gitignore | 4 +-
tools/testing/selftests/net/Makefile | 3 +-
tools/testing/selftests/net/quic.c | 1153 +++++++++++++++++++
tools/testing/selftests/net/quic.sh | 46 +
23 files changed, 3267 insertions(+), 3 deletions(-)
create mode 100644 Documentation/networking/quic.rst
create mode 100644 include/net/quic.h
create mode 100644 include/uapi/linux/quic.h
create mode 100644 net/ipv4/udp_ulp.c
create mode 100644 net/quic/Kconfig
create mode 100644 net/quic/Makefile
create mode 100644 net/quic/quic_main.c
create mode 100644 net/quic/quic_proc.c
create mode 100644 tools/testing/selftests/net/quic.c
create mode 100755 tools/testing/selftests/net/quic.sh
base-commit: fd78d07c7c35de260eb89f1be4a1e7487b8092ad
--
2.30.2
In test_sockmap.c, the test case first sets the socket to non-blocking
mode, and then calls select() and recvmsg() to receive data.
If an error occurs, the non-blocking setting makes recvmsg() return
immediately rather than block forever.
However, the fcntl() call used to set non-blocking mode is wrong.
To set the socket to non-blocking mode, we need to use
> fcntl(fd, F_SETFL, O_NONBLOCK);
rather than:
> fcntl(fd, O_NONBLOCK);
Signed-off-by: Qiao Ma <mqaio(a)linux.alibaba.com>
---
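For reference, the fcntl() usage described in the commit message can be sketched as a standalone helper. The two-argument form passes O_NONBLOCK where fcntl() expects a command, so it never changes the file status flags. The helper name set_nonblocking() is hypothetical, and unlike the patch below it also reads the current flags with F_GETFL first so that they are preserved, which is a common idiom rather than something this patch does.

```c
#include <fcntl.h>

/*
 * Put fd into non-blocking mode while keeping its other status flags.
 * Returns 0 on success and -1 on failure (errno is set by fcntl()).
 */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    return 0;
}
```

Checking the return value, as the patch does, matters here because a silent failure would leave the socket blocking and recvmsg() could hang again.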
tools/testing/selftests/bpf/test_sockmap.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 0fbaccdc8861..abb4102f33b0 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -598,7 +598,12 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
struct timeval timeout;
fd_set w;
- fcntl(fd, fd_flags);
+ err = fcntl(fd, F_SETFL, fd_flags);
+ if (err < 0) {
+ perror("fcntl failed");
+ goto out_errno;
+ }
+
/* Account for pop bytes noting each iteration of apply will
* call msg_pop_data helper so we need to account for this
* by calculating the number of apply iterations. Note user
--
1.8.3.1
The livepatch kselftests rely on comparing expected and actual output
from such commands as sysctl. A recent commit in procps-ng v4.0.0 [1]
changed sysctl's output to emit key pathnames like:
sysctl: setting key "/proc/sys/kernel/ftrace_enabled": Device or resource busy
versus previous dotted output:
sysctl: setting key "kernel.ftrace_enabled": Device or resource busy
The modification in output was later reverted [2], but since the change
has been tagged in procps-ng v4.0.0, update the livepatch kselftest to
handle either case.
[1] https://gitlab.com/procps-ng/procps/-/commit/6389deca5bf667f5fab5912acde78b…
[2] https://gitlab.com/procps-ng/procps/-/commit/b159c198c9160a8eb13254e2b631d0…
Reported-by: Dennis(Zhuoheng) Li <denli(a)redhat.com>
Signed-off-by: Joe Lawrence <joe.lawrence(a)redhat.com>
---
tools/testing/selftests/livepatch/functions.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/livepatch/functions.sh b/tools/testing/selftests/livepatch/functions.sh
index 9230b869371d..d5001c9eb72e 100644
--- a/tools/testing/selftests/livepatch/functions.sh
+++ b/tools/testing/selftests/livepatch/functions.sh
@@ -86,7 +86,7 @@ function set_ftrace_enabled() {
if [[ "$result" != "$1" ]] ; then
if [[ $can_fail -eq 1 ]] ; then
- echo "livepatch: $err" > /dev/kmsg
+ echo "livepatch: $err" | sed 's#/proc/sys/kernel/#kernel.#' > /dev/kmsg
return
fi
--
2.26.3