Hi Linus,
Please pull the following KUnit update for Linux 5.17-rc1.
This KUnit update for Linux 5.17-rc1 consists of several fixes and
enhancements. A few highlights:
- The --kconfig_add option allows easily tweaking kunitconfigs
- the build subcommand can now reconfigure if needed
- doesn't error on tests without test plans
- doesn't crash if no parameters are generated
- defaults --jobs to # of CPUs
- reports test parameter results as (K)TAP subtests
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit fa55b7dcdc43c1aa1ba12bca9d2dd4318c2a0dbf:
Linux 5.16-rc1 (2021-11-14 13:56:52 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux-kselftest-kunit-5.17-rc1
for you to fetch changes up to ad659ccb5412874c6a89d3588cb18857c00e9d0f:
kunit: tool: Default --jobs to number of CPUs (2021-12-15 16:44:55 -0700)
----------------------------------------------------------------
linux-kselftest-kunit-5.17-rc1
This KUnit update for Linux 5.17-rc1 consists of several fixes and
enhancements. A few highlights:
- The --kconfig_add option allows easily tweaking kunitconfigs
- the build subcommand can now reconfigure if needed
- doesn't error on tests without test plans
- doesn't crash if no parameters are generated
- defaults --jobs to # of CPUs
- reports test parameter results as (K)TAP subtests
----------------------------------------------------------------
Daniel Latypov (13):
kunit: tool: fix --json output for skipped tests
Documentation: kunit: remove claims that kunit is a mocking framework
kunit: add run_checks.py script to validate kunit changes
kunit: tool: print parsed test results fully incrementally
kunit: tool: move Kconfig read_from_file/parse_from_string to package-level
kunit: tool: add --kconfig_add to allow easily tweaking kunitconfigs
kunit: tool: revamp message for invalid kunitconfig
kunit: tool: reconfigure when the used kunitconfig changes
kunit: tool: suggest using decode_stacktrace.sh on kernel crash
kunit: tool: use dataclass instead of collections.namedtuple
kunit: tool: delete kunit_parser.TestResult type
kunit: tool: make `build` subcommand also reconfigure if needed
kunit: tool: fix newly introduced typechecker errors
David Gow (5):
kunit: tool: Do not error on tests without test plans
kunit: tool: Report an error if any test has no subtests
kunit: Don't crash if no parameters are generated
kunit: Report test parameter results as (K)TAP subtests
kunit: tool: Default --jobs to number of CPUs
Documentation/dev-tools/kunit/api/index.rst | 3 +-
Documentation/dev-tools/kunit/api/test.rst | 3 +-
Documentation/dev-tools/kunit/index.rst | 2 +-
Documentation/dev-tools/kunit/start.rst | 8 +-
lib/kunit/test.c | 25 +--
tools/testing/kunit/kunit.py | 182 ++++++++++++---------
tools/testing/kunit/kunit_config.py | 61 +++----
tools/testing/kunit/kunit_json.py | 8 +-
tools/testing/kunit/kunit_kernel.py | 76 ++++++---
tools/testing/kunit/kunit_parser.py | 57 ++++---
tools/testing/kunit/kunit_tool_test.py | 171 ++++++++++++++++---
tools/testing/kunit/run_checks.py | 81 +++++++++
.../test_is_test_passed-no_tests_no_plan.log | 7 +
13 files changed, 480 insertions(+), 204 deletions(-)
create mode 100755 tools/testing/kunit/run_checks.py
create mode 100644 tools/testing/kunit/test_data/test_is_test_passed-no_tests_no_plan.log
----------------------------------------------------------------
Hi Linus,
Please pull these seccomp selftest updates for v5.17-rc1. The core
seccomp code hasn't changed for this cycle, but the selftests were
improved while helping to debug the recent signal handling refactoring
work Eric did.
Thanks!
-Kees
The following changes since commit d9bbdbf324cda23aa44873f505be77ed4b61d79c:
x86: deduplicate the spectre_v2_user documentation (2021-10-04 12:12:57 -0700)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git tags/seccomp-v5.17-rc1
for you to fetch changes up to 1e6d69c7b9cd7735bbf4c6754ccbb9cce8bd8ff4:
selftests/seccomp: Report event mismatches more clearly (2021-11-03 12:02:07 -0700)
----------------------------------------------------------------
seccomp updates for v5.17-rc1
- Improve seccomp selftests in support of signal handler refactoring (Kees Cook)
----------------------------------------------------------------
Kees Cook (2):
selftests/seccomp: Stop USER_NOTIF test if kcmp() fails
selftests/seccomp: Report event mismatches more clearly
tools/testing/selftests/seccomp/seccomp_bpf.c | 56 ++++++++++++++++++++++++---
1 file changed, 50 insertions(+), 6 deletions(-)
--
Kees Cook
Fix a typo: actualy -> actual
Signed-off-by: Qinghua Jin <qhjin.dev(a)gmail.com>
---
Documentation/dev-tools/kunit/usage.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/dev-tools/kunit/usage.rst b/Documentation/dev-tools/kunit/usage.rst
index 63f1bb89ebf5..b9940758787c 100644
--- a/Documentation/dev-tools/kunit/usage.rst
+++ b/Documentation/dev-tools/kunit/usage.rst
@@ -615,7 +615,7 @@ kunit_tool) only fully supports running tests inside of UML and QEMU; however,
this is only due to our own time limitations as humans working on KUnit. It is
entirely possible to support other emulators and even actual hardware, but for
now QEMU and UML is what is fully supported within the KUnit Wrapper. Again, to
-be clear, this is just the Wrapper. The actualy KUnit tests and the KUnit
+be clear, this is just the Wrapper. The actual KUnit tests and the KUnit
library they are written in is fully architecture agnostic and can be used in
virtually any setup, you just won't have the benefit of typing a single command
out of the box and having everything magically work perfectly.
--
2.30.2
From: Menglong Dong <imagedong(a)tencent.com>
The return value of BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND() in
__inet_bind() is not handled properly. When the return value
is non-zero, it will set inet_saddr and inet_rcv_saddr to 0 and
exit:
exit:
err = BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk);
if (err) {
inet->inet_saddr = inet->inet_rcv_saddr = 0;
goto out_release_sock;
}
Let's take UDP as an example and see what happens. A UDP
socket is added to 'udp_prot.h.udp_table->hash' and
'udp_prot.h.udp_table->hash2' after the sk->sk_prot->get_port()
call succeeds. If 'inet->inet_rcv_saddr' is specified here,
then 'sk' will be in an 'hslot2' of 'hash2' that it doesn't belong
to (because inet_saddr is changed to 0), and received UDP packets
will not be passed to this sock. If 'inet->inet_rcv_saddr' is not
specified here, the sock will work fine, as it can receive packets
properly, which is weird, as the 'bind()' has already failed.
To undo the get_port() operation, introduce the 'put_port' field
for 'struct proto'. For the TCP proto, it is inet_put_port(); for the
UDP proto, it is udp_lib_unhash(); for the ICMP proto, it is
ping_unhash().
Therefore, after sys_bind() fails because of
BPF_CGROUP_RUN_PROG_INET4_POST_BIND(), the socket is unbound, which
means that it can then be bound to another port.
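The get_port()/put_port() pairing can be illustrated with a small
user-space model in Python. This is only a sketch of the idea, not the
kernel code: ToyProto, toy_bind() and the hook callback are hypothetical
names invented for illustration, and a plain set stands in for the
protocol hash tables.

```python
import errno


class ToyProto:
    """Toy stand-in for 'struct proto': a set models the port hash table."""

    def __init__(self):
        self.bound_ports = set()

    def get_port(self, port):
        """Reserve a port; return False if it is already taken."""
        if port in self.bound_ports:
            return False
        self.bound_ports.add(port)
        return True

    def put_port(self, port):
        """Undo get_port() so that the port can be used again."""
        self.bound_ports.discard(port)


def toy_bind(proto, port, post_bind_hook):
    """Return 0 on success or a negative errno-style value on failure."""
    if not proto.get_port(port):
        return -errno.EADDRINUSE
    err = post_bind_hook(port)
    if err:
        # Without this undo step the port would stay reserved even
        # though bind() reported a failure to the caller.
        proto.put_port(port)
        return err
    return 0
```

With a hook that rejects one port, a failed toy_bind() leaves no
reservation behind, so a later bind to a different port (or to the same
port, once the hook allows it) can succeed.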
The second patch uses C99 initializers in test_sock.c.
The third patch adds the selftests for this modification.
Changes since v4:
- use C99 initializers in test_sock.c before adding the test case
Changes since v3:
- add the third patch which uses C99 initializers in test_sock.c
Changes since v2:
- NULL check for sk->sk_prot->put_port
Changes since v1:
- introduce 'put_port' field for 'struct proto'
- add selftests for it
Menglong Dong (3):
net: bpf: handle return value of
BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND()
bpf: selftests: use C99 initializers in test_sock.c
bpf: selftests: add bind retry for post_bind{4, 6}
include/net/sock.h | 1 +
net/ipv4/af_inet.c | 2 +
net/ipv4/ping.c | 1 +
net/ipv4/tcp_ipv4.c | 1 +
net/ipv4/udp.c | 1 +
net/ipv6/af_inet6.c | 2 +
net/ipv6/ping.c | 1 +
net/ipv6/tcp_ipv6.c | 1 +
net/ipv6/udp.c | 1 +
tools/testing/selftests/bpf/test_sock.c | 370 ++++++++++++++----------
10 files changed, 233 insertions(+), 148 deletions(-)
--
2.27.0
The hugetlb cgroup reservation test charge_reserved_hugetlb.sh assumes
that no cgroup filesystems are mounted before running the test. That is
not true in many cases. As a result, the test fails to run. Fix that by
querying the current cgroup mount settings and using the existing cgroup
setup, mounting a fresh cgroup filesystem only when none is found.
A similar change is made for hugetlb_reparenting_test.sh as well,
though it still has problems if cgroup v2 isn't used.
The patched test scripts were run on a CentOS 8 based system to verify
that they ran properly.
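The mount-point lookup described above can be modelled in a few lines
of Python. This is a sketch for illustration only, not the patched
shell code: first_mount_point() is a hypothetical helper, it assumes the
usual "<source> on <target> type <fstype> (<options>)" layout of mount
output, and it does not handle mount points containing spaces.

```python
def first_mount_point(mount_output, required_option=None):
    """Return the mount point of the first matching line, or "" if none.

    mount_output is the text printed by e.g. `mount -t cgroup2`.
    If required_option is given (e.g. "hugetlb" for cgroup v1), only
    lines whose option list contains it are considered.
    """
    for line in mount_output.splitlines():
        fields = line.split()
        # Expected fields: source, "on", target, "type", fstype, "(opts)"
        if len(fields) < 6:
            continue
        options = fields[5].strip("()").split(",")
        if required_option is None or required_option in options:
            return fields[2]
    return ""
```

An empty result corresponds to the case where the scripts fall back to
mounting /dev/cgroup/memory themselves and remember to unmount it at
the end.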
Fixes: 29750f71a9b4 ("hugetlb_cgroup: add hugetlb_cgroup reservation tests")
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
.../selftests/vm/charge_reserved_hugetlb.sh | 34 +++++++++++--------
.../selftests/vm/hugetlb_reparenting_test.sh | 21 +++++++-----
.../selftests/vm/write_hugetlb_memory.sh | 2 +-
3 files changed, 34 insertions(+), 23 deletions(-)
mode change 100644 => 100755 tools/testing/selftests/vm/charge_reserved_hugetlb.sh
mode change 100644 => 100755 tools/testing/selftests/vm/hugetlb_reparenting_test.sh
mode change 100644 => 100755 tools/testing/selftests/vm/write_hugetlb_memory.sh
diff --git a/tools/testing/selftests/vm/charge_reserved_hugetlb.sh b/tools/testing/selftests/vm/charge_reserved_hugetlb.sh
old mode 100644
new mode 100755
index fe8fcfb334e0..a5cb4b09a46c
--- a/tools/testing/selftests/vm/charge_reserved_hugetlb.sh
+++ b/tools/testing/selftests/vm/charge_reserved_hugetlb.sh
@@ -24,19 +24,23 @@ if [[ "$1" == "-cgroup-v2" ]]; then
reservation_usage_file=rsvd.current
fi
-cgroup_path=/dev/cgroup/memory
-if [[ ! -e $cgroup_path ]]; then
- mkdir -p $cgroup_path
- if [[ $cgroup2 ]]; then
+if [[ $cgroup2 ]]; then
+ cgroup_path=$(mount -t cgroup2 | head -1 | awk -e '{print $3}')
+ if [[ -z "$cgroup_path" ]]; then
+ cgroup_path=/dev/cgroup/memory
mount -t cgroup2 none $cgroup_path
- else
+ do_umount=1
+ fi
+ echo "+hugetlb" >$cgroup_path/cgroup.subtree_control
+else
+ cgroup_path=$(mount -t cgroup | grep ",hugetlb" | awk -e '{print $3}')
+ if [[ -z "$cgroup_path" ]]; then
+ cgroup_path=/dev/cgroup/memory
mount -t cgroup memory,hugetlb $cgroup_path
+ do_umount=1
fi
fi
-
-if [[ $cgroup2 ]]; then
- echo "+hugetlb" >/dev/cgroup/memory/cgroup.subtree_control
-fi
+export cgroup_path
function cleanup() {
if [[ $cgroup2 ]]; then
@@ -108,7 +112,7 @@ function setup_cgroup() {
function wait_for_hugetlb_memory_to_get_depleted() {
local cgroup="$1"
- local path="/dev/cgroup/memory/$cgroup/hugetlb.${MB}MB.$reservation_usage_file"
+ local path="$cgroup_path/$cgroup/hugetlb.${MB}MB.$reservation_usage_file"
# Wait for hugetlbfs memory to get depleted.
while [ $(cat $path) != 0 ]; do
echo Waiting for hugetlb memory to get depleted.
@@ -121,7 +125,7 @@ function wait_for_hugetlb_memory_to_get_reserved() {
local cgroup="$1"
local size="$2"
- local path="/dev/cgroup/memory/$cgroup/hugetlb.${MB}MB.$reservation_usage_file"
+ local path="$cgroup_path/$cgroup/hugetlb.${MB}MB.$reservation_usage_file"
# Wait for hugetlbfs memory to get written.
while [ $(cat $path) != $size ]; do
echo Waiting for hugetlb memory reservation to reach size $size.
@@ -134,7 +138,7 @@ function wait_for_hugetlb_memory_to_get_written() {
local cgroup="$1"
local size="$2"
- local path="/dev/cgroup/memory/$cgroup/hugetlb.${MB}MB.$fault_usage_file"
+ local path="$cgroup_path/$cgroup/hugetlb.${MB}MB.$fault_usage_file"
# Wait for hugetlbfs memory to get written.
while [ $(cat $path) != $size ]; do
echo Waiting for hugetlb memory to reach size $size.
@@ -574,5 +578,7 @@ for populate in "" "-o"; do
done # populate
done # method
-umount $cgroup_path
-rmdir $cgroup_path
+if [[ $do_umount ]]; then
+ umount $cgroup_path
+ rmdir $cgroup_path
+fi
diff --git a/tools/testing/selftests/vm/hugetlb_reparenting_test.sh b/tools/testing/selftests/vm/hugetlb_reparenting_test.sh
old mode 100644
new mode 100755
index 4a9a3afe9fd4..bf2d2a684edf
--- a/tools/testing/selftests/vm/hugetlb_reparenting_test.sh
+++ b/tools/testing/selftests/vm/hugetlb_reparenting_test.sh
@@ -18,19 +18,24 @@ if [[ "$1" == "-cgroup-v2" ]]; then
usage_file=current
fi
-CGROUP_ROOT='/dev/cgroup/memory'
-MNT='/mnt/huge/'
-if [[ ! -e $CGROUP_ROOT ]]; then
- mkdir -p $CGROUP_ROOT
- if [[ $cgroup2 ]]; then
+if [[ $cgroup2 ]]; then
+ CGROUP_ROOT=$(mount -t cgroup2 | head -1 | awk -e '{print $3}')
+ if [[ -z "$CGROUP_ROOT" ]]; then
+ CGROUP_ROOT=/dev/cgroup/memory
mount -t cgroup2 none $CGROUP_ROOT
- sleep 1
- echo "+hugetlb +memory" >$CGROUP_ROOT/cgroup.subtree_control
- else
+ do_umount=1
+ fi
+ echo "+hugetlb +memory" >$CGROUP_ROOT/cgroup.subtree_control
+else
+ CGROUP_ROOT=$(mount -t cgroup | grep ",hugetlb" | awk -e '{print $3}')
+ if [[ -z "$CGROUP_ROOT" ]]; then
+ CGROUP_ROOT=/dev/cgroup/memory
mount -t cgroup memory,hugetlb $CGROUP_ROOT
+ do_umount=1
fi
fi
+MNT='/mnt/huge/'
function get_machine_hugepage_size() {
hpz=$(grep -i hugepagesize /proc/meminfo)
diff --git a/tools/testing/selftests/vm/write_hugetlb_memory.sh b/tools/testing/selftests/vm/write_hugetlb_memory.sh
old mode 100644
new mode 100755
index d3d0d108924d..70a02301f4c2
--- a/tools/testing/selftests/vm/write_hugetlb_memory.sh
+++ b/tools/testing/selftests/vm/write_hugetlb_memory.sh
@@ -14,7 +14,7 @@ want_sleep=$8
reserve=$9
echo "Putting task in cgroup '$cgroup'"
-echo $$ > /dev/cgroup/memory/"$cgroup"/cgroup.procs
+echo $$ > ${cgroup_path:-/dev/cgroup/memory}/"$cgroup"/cgroup.procs
echo "Method is $method"
--
2.27.0
While building selftests the following warnings were noticed for arm
architecture on Linux stable v5.15.13 kernel and also on Linus's tree.
arm-linux-gnueabihf-gcc -Wall -Wl,--no-as-needed -O2 -g
-I../../../../usr/include/ txtimestamp.c -o
/home/tuxbuild/.cache/tuxmake/builds/current/kselftest/net/txtimestamp
txtimestamp.c: In function 'validate_timestamp':
txtimestamp.c:164:29: warning: format '%lu' expects argument of type
'long unsigned int', but argument 3 has type 'int64_t' {aka 'long long
int'} [-Wformat=]
164 | fprintf(stderr, "ERROR: %lu us expected between %d and %d\n",
| ~~^
| |
| long unsigned int
| %lld
165 | cur64 - start64, min_delay, max_delay);
| ~~~~~~~~~~~~~~~
| |
| int64_t {aka long long int}
txtimestamp.c: In function '__print_ts_delta_formatted':
txtimestamp.c:173:22: warning: format '%lu' expects argument of type
'long unsigned int', but argument 3 has type 'int64_t' {aka 'long long
int'} [-Wformat=]
173 | fprintf(stderr, "%lu ns", ts_delta);
| ~~^ ~~~~~~~~
| | |
| | int64_t {aka long long int}
| long unsigned int
| %lld
txtimestamp.c:175:22: warning: format '%lu' expects argument of type
'long unsigned int', but argument 3 has type 'int64_t' {aka 'long long
int'} [-Wformat=]
175 | fprintf(stderr, "%lu us", ts_delta / NSEC_PER_USEC);
| ~~^
| |
| long unsigned int
| %lld
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
build link:
https://builds.tuxbuild.com/23HFntxpqyCx0RbiuadfGZ36Kym/
metadata:
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
git commit: 734eb1fd2073f503f5c6b44f1c0d453ca6986b84
git describe: v5.15.13
toolchain: gcc-11
kernel-config: https://builds.tuxbuild.com/23HFntxpqyCx0RbiuadfGZ36Kym/config
# To install tuxmake on your system globally:
# sudo pip3 install -U tuxmake
tuxmake --runtime podman --target-arch arm --toolchain gcc-10 \
--kconfig https://builds.tuxbuild.com/23HFntxpqyCx0RbiuadfGZ36Kym/config \
dtbs dtbs-legacy headers kernel kselftest kselftest-merge modules
--
Linaro LKFT
https://lkft.linaro.org
From: Menglong Dong <imagedong(a)tencent.com>
The return value of BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND() in
__inet_bind() is not handled properly. When the return value
is non-zero, it will set inet_saddr and inet_rcv_saddr to 0 and
exit:
exit:
err = BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk);
if (err) {
inet->inet_saddr = inet->inet_rcv_saddr = 0;
goto out_release_sock;
}
Let's take UDP as an example and see what happens. A UDP
socket is added to 'udp_prot.h.udp_table->hash' and
'udp_prot.h.udp_table->hash2' after the sk->sk_prot->get_port()
call succeeds. If 'inet->inet_rcv_saddr' is specified here,
then 'sk' will be in an 'hslot2' of 'hash2' that it doesn't belong
to (because inet_saddr is changed to 0), and received UDP packets
will not be passed to this sock. If 'inet->inet_rcv_saddr' is not
specified here, the sock will work fine, as it can receive packets
properly, which is weird, as the 'bind()' has already failed.
To undo the get_port() operation, introduce the 'put_port' field
for 'struct proto'. For the TCP proto, it is inet_put_port(); for the
UDP proto, it is udp_lib_unhash(); for the ICMP proto, it is
ping_unhash().
Therefore, after sys_bind() fails because of
BPF_CGROUP_RUN_PROG_INET4_POST_BIND(), the socket is unbound, which
means that it can then be bound to another port.
The second patch adds the selftests for this modification.
The third patch uses C99 initializers in test_sock.c.
Changes since v3:
- add the third patch which uses C99 initializers in test_sock.c
Changes since v2:
- NULL check for sk->sk_prot->put_port
Changes since v1:
- introduce 'put_port' field for 'struct proto'
- add selftests for it
Menglong Dong (3):
net: bpf: handle return value of
BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND()
bpf: selftests: add bind retry for post_bind{4, 6}
bpf: selftests: use C99 initializers in test_sock.c
include/net/sock.h | 1 +
net/ipv4/af_inet.c | 2 +
net/ipv4/ping.c | 1 +
net/ipv4/tcp_ipv4.c | 1 +
net/ipv4/udp.c | 1 +
net/ipv6/af_inet6.c | 2 +
net/ipv6/ping.c | 1 +
net/ipv6/tcp_ipv6.c | 1 +
net/ipv6/udp.c | 1 +
tools/testing/selftests/bpf/test_sock.c | 370 ++++++++++++++----------
10 files changed, 233 insertions(+), 148 deletions(-)
--
2.27.0
Thanks a lot for all the review comments and guidance! Hope this
version is in a good state now. :)
(Jing is temporarily on leave for family reasons; Yang helped work out
this version)
----
v4->v5:
- Directly call fpu core to expand fpstate buffer in kvm_check_cpuid()
and remove duplicated permission check there (Sean)
- Accordingly remove Thomas's reviewed-by as a different wrapper is
introduced now (patch-7)
- Properly queue #NM exception in nested scenario (Sean)
- Verify non-XFD related #NM usage in nested scenario (Sean)
- Hide XFD in kvm_cpu_cap on 32bit host kernels (Sean)
- Use xstate_required_size() in KVM_CAP_XSAVE2 which may be called
before any vcpu is created (Sean/Paolo)
- Replace boot_cpu_has with kvm_cpu_cap_has when disabling RDMSR
interception for xfd_err (Sean)
v3->v4:
- Verify kvm selftest for AMX (Paolo)
- Expand fpstate buffer in kvm_check_cpuid() and improve patch
description (Sean)
- Drop 'preemption' word in #NM interception patch (Sean)
- Remove 'trap_nm' flag. Replace it by: (Sean)
* Trapping #NM according to guest_fpu::xfd when write to xfd is
intercepted.
* Always trapping #NM when xfd write interception is disabled
- Use better name for #NM related functions (Sean)
- Drop '#ifdef CONFIG_X86_64' in __kvm_set_xcr (Sean)
- Update description for KVM_CAP_XSAVE2 and prevent the guest from
using the wrong ioctl (Sean)
- Replace 'xfd_out_of_sync' with a better name (Sean)
v2->v3:
- Trap #NM until write IA32_XFD with a non-zero value (Thomas)
- Revise return value in __xstate_request_perm() (Thomas)
- Revise doc for KVM_GET_SUPPORTED_CPUID (Paolo)
- Add Thomas's reviewed-by on one patch
- Reorder disabling read interception of XFD_ERR patch (Paolo)
- Move disabling r/w interception of XFD from x86.c to vmx.c (Paolo)
- Provide the API doc together with the new KVM_GET_XSAVE2 ioctl (Paolo)
- Make KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) return minimum size of struct
kvm_xsave (4K) (Paolo)
- Request permission at the start of vm_create_with_vcpus() in selftest
- Request permission conditionally when XFD is supported (Paolo)
v1->v2:
- Live migration supported and verified with a selftest
- Rebase to Thomas's new series for guest fpstate reallocation [1]
- Expand fpstate at KVM_SET_CPUID2 instead of when emulating XCR0
and IA32_XFD (Thomas/Paolo)
- Accordingly remove all exit-to-userspace stuff
- Intercept #NM to save guest XFD_ERR and restore host/guest value
at preemption on/off boundary (Thomas)
- Accordingly remove all xfd_err logic in preemption callback and
fpu_swap_kvm_fpstate()
- Reuse KVM_SET_XSAVE to handle both legacy and expanded buffer (Paolo)
- Don't return dynamic bits w/o prctl() in KVM_GET_SUPPORTED_CPUID (Paolo)
- Check guest permissions for dynamic features in CPUID[0xD] instead
of only for AMX at KVM_SET_CPUID (Paolo)
- Remove dynamic bit check for 32-bit guest in __kvm_set_xcr() (Paolo)
- Fix CPUID emulation for 0x1d and 0x1e (Paolo)
- Move "disable interception" to the end of the series (Paolo)
This series brings AMX (Advanced Matrix eXtensions) virtualization support to
KVM. The preparatory series from Thomas [1] is also included.
A large portion of the changes in this series is to deal with eXtended Feature
Disable (XFD) which allows resizing of the fpstate buffer to support
dynamically-enabled XSTATE features with large state component (e.g. 8K for AMX).
There are a lot of simplifications when comparing v5 to the original proposal [2]
and the first version [3]. Thanks to Thomas, Paolo and Sean for many good
suggestions.
The support is based on the following key changes:
- Guest permissions for dynamically-enabled XSAVE features
Native tasks have to request permission via prctl() before touching
a dynamically-resized XSTATE component. Introduce guest permissions
for a similar purpose. Userspace VMM is expected to request guest
permission only once when the first vCPU is created.
KVM checks guest permission in KVM_SET_CPUID2. Setting XFD in guest
cpuid w/o proper permissions fails this operation. In the meantime,
unpermitted features are also excluded in KVM_GET_SUPPORTED_CPUID.
- Extend fpstate reallocation mechanism to cover guest fpu
Unlike native tasks, which have reallocation triggered from the #NM
handler, guest fpstate reallocation is requested by KVM when it
identifies the intention to use dynamically-enabled XSAVE
features inside the guest.
Extend fpu core to allow KVM to request fpstate buffer expansion
for a guest fpu container.
- Trigger fpstate reallocation in KVM
This could be done either before the guest runs or deferred until xfd
is updated in the emulation path. Per the discussion in [1] we decided
to go with the former option in KVM_SET_CPUID2, with the fpstate buffer
sized accordingly. This spares a lot of code and also avoids imposing an
ordered restore sequence (XCR0, XFD and XSTATE) on the userspace VMM.
- RDMSR/WRMSR emulation for IA32_XFD
Because fpstate expansion is completed in KVM_SET_CPUID2, emulating
r/w access to IA32_XFD simply involves the xfd field in the guest
fpu container. On a write, if the guest fpu is currently active, the
software state (guest_fpstate::xfd and per-cpu xfd cache) is also
updated.
- RDMSR/WRMSR emulation for XFD_ERR
When XFD causes an instruction to generate #NM, XFD_ERR contains
information about which disabled state components are being accessed.
It'd be problematic if the XFD_ERR value generated in the guest is
consumed/clobbered by the host before the guest itself does so.
Intercept the #NM exception to save the guest XFD_ERR value when
IA32_XFD is written with a non-zero value for the first time. There is
at most one interception per guest task given a dynamic feature.
RDMSR/WRMSR emulation uses the saved value. The host value (always
ZERO outside of the host #NM handler) is restored before enabling
interrupts. The saved guest value is restored right before entering
the guest (with interrupts disabled).
- Get/set dynamic xfeature state for migration
Introduce a new capability (KVM_CAP_XSAVE2) to deal with >4KB fpstate
buffers. Reading this capability returns the size of the current
guest fpstate (e.g. after expansion). Userspace VMM uses a new ioctl
(KVM_GET_XSAVE2) to read guest fpstate from the kernel and reuses
the existing ioctl (KVM_SET_XSAVE) to write guest fpstate to the
kernel. KVM_SET_XSAVE is extended to do a properly sized memdup_user()
based on the guest fpstate.
- Expose related cpuid bits to guest
The last step is to allow exposing XFD, AMX_TILE, AMX_INT8 and
AMX_BF16 in guest cpuid. Adding those bits into kvm_cpu_caps finally
activates all the preceding logic in this series.
- Optimization: disable interception for IA32_XFD
IA32_XFD can be frequently updated by the guest, as it is part of
the task state and swapped on context switch when prev and next have
different XFD settings. Always intercepting WRMSR can easily cause
non-negligible overhead.
Disable r/w emulation for IA32_XFD after intercepting the first
WRMSR(IA32_XFD) with a non-zero value. However, MSR passthrough
implies the software state (guest_fpstate::xfd and per-cpu xfd
cache) might be out of sync with the MSR. This suggests KVM needs to
re-sync them at VM-exit before preemption is enabled.
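The guest-permission rule from the first key change (unpermitted
dynamic features are hidden from KVM_GET_SUPPORTED_CPUID and rejected
at KVM_SET_CPUID2) can be sketched as bitmask arithmetic in Python.
This is an illustrative model only, not the KVM implementation: the
function names and the DYNAMIC_FEATURES mask are hypothetical, and the
AMX bit positions are an assumption based on the XSAVE state-component
numbering rather than anything stated in this cover letter.

```python
# Assumed XSAVE state-component bits for AMX (not stated in this series
# description): 17 = tile configuration, 18 = tile data.
XFEATURE_MASK_XTILE_CFG = 1 << 17
XFEATURE_MASK_XTILE_DATA = 1 << 18

# Hypothetical mask of dynamically-enabled features needing permission.
DYNAMIC_FEATURES = XFEATURE_MASK_XTILE_DATA


def supported_xfeatures(host_supported, guest_permitted):
    """Model of filtering: hide dynamic features lacking guest permission."""
    unpermitted = DYNAMIC_FEATURES & ~guest_permitted
    return host_supported & ~unpermitted


def set_cpuid_allowed(requested, guest_permitted):
    """Model of the check: fail if a dynamic feature is requested
    without the matching guest permission."""
    return (requested & DYNAMIC_FEATURES & ~guest_permitted) == 0
```

In this model a VMM that has not requested guest permission never sees
the tile-data bit as supported, and an attempt to set it anyway is
refused; once permission is granted both operations go through.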
Thanks Jun Nakajima and Kevin Tian for the design suggestions when this was
being internally worked on.
[1] https://lore.kernel.org/all/20211214022825.563892248@linutronix.de/
[2] https://www.spinics.net/lists/kvm/msg259015.html
[3] https://lore.kernel.org/lkml/20211208000359.2853257-1-yang.zhong@intel.com/
Thanks,
Yang
----
Guang Zeng (1):
kvm: x86: Add support for getting/setting expanded xstate buffer
Jing Liu (11):
kvm: x86: Fix xstate_required_size() to follow XSTATE alignment rule
kvm: x86: Exclude unpermitted xfeatures at KVM_GET_SUPPORTED_CPUID
x86/fpu: Make XFD initialization in __fpstate_reset() a function
argument
kvm: x86: Enable dynamic xfeatures at KVM_SET_CPUID2
kvm: x86: Add emulation for IA32_XFD
x86/fpu: Prepare xfd_err in struct fpu_guest
kvm: x86: Intercept #NM for saving IA32_XFD_ERR
kvm: x86: Emulate IA32_XFD_ERR for guest
kvm: x86: Disable RDMSR interception of IA32_XFD_ERR
kvm: x86: Add XCR0 support for Intel AMX
kvm: x86: Add CPUID support for Intel AMX
Kevin Tian (2):
x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation
kvm: x86: Disable interception for IA32_XFD on demand
Sean Christopherson (1):
x86/fpu: Provide fpu_enable_guest_xfd_features() for KVM
Thomas Gleixner (5):
x86/fpu: Extend fpu_xstate_prctl() with guest permissions
x86/fpu: Prepare guest FPU for dynamically enabled FPU features
x86/fpu: Add guest support to xfd_enable_feature()
x86/fpu: Add uabi_size to guest_fpu
x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state()
Wei Wang (1):
kvm: selftests: Add support for KVM_CAP_XSAVE2
Documentation/virt/kvm/api.rst | 46 +++++-
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/fpu/api.h | 11 ++
arch/x86/include/asm/fpu/types.h | 32 ++++
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/uapi/asm/kvm.h | 16 +-
arch/x86/include/uapi/asm/prctl.h | 26 ++--
arch/x86/kernel/fpu/core.c | 94 ++++++++++-
arch/x86/kernel/fpu/xstate.c | 147 +++++++++++-------
arch/x86/kernel/fpu/xstate.h | 15 +-
arch/x86/kernel/process.c | 2 +
arch/x86/kvm/cpuid.c | 86 +++++++---
arch/x86/kvm/cpuid.h | 2 +
arch/x86/kvm/vmx/vmcs.h | 5 +
arch/x86/kvm/vmx/vmx.c | 68 ++++++++
arch/x86/kvm/vmx/vmx.h | 2 +-
arch/x86/kvm/x86.c | 112 ++++++++++++-
include/uapi/linux/kvm.h | 4 +
tools/arch/x86/include/uapi/asm/kvm.h | 16 +-
tools/include/uapi/linux/kvm.h | 3 +
.../testing/selftests/kvm/include/kvm_util.h | 2 +
.../selftests/kvm/include/x86_64/processor.h | 10 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 32 ++++
.../selftests/kvm/lib/x86_64/processor.c | 67 +++++++-
.../testing/selftests/kvm/x86_64/evmcs_test.c | 2 +-
tools/testing/selftests/kvm/x86_64/smm_test.c | 2 +-
.../testing/selftests/kvm/x86_64/state_test.c | 2 +-
.../kvm/x86_64/vmx_preemption_timer_test.c | 2 +-
28 files changed, 702 insertions(+), 107 deletions(-)
These patches are based on kvm/next, and are also available at:
https://github.com/mdroth/linux/commits/sev-selftests-ucall-rfc1
== BACKGROUND ==
These patches are a prerequisite for adding selftest support for SEV guests
and possibly other confidential computing implementations in the future.
They were motivated by a suggestion Paolo made in response to the initial
SEV selftest RFC:
https://lore.kernel.org/lkml/20211025035833.yqphcnf5u3lk4zgg@amd.com/T/#m95…
Since the changes touch multiple archs and ended up creating a bit more churn
than expected, I thought it would be a good idea to carve this out into a
separate standalone series for reviewers who may be more interested in the
ucall changes than anything SEV-related.
To summarize, x86 relies on a ucall based on using PIO instructions to generate
an exit to userspace and provide the GVA of a dynamically-allocated ucall
struct that resides in guest memory and contains information about how to
handle/interpret the exit. This doesn't work for SEV guests for 3 main reasons:
1) The guest memory is generally encrypted during run-time, so the guest
needs to ensure the ucall struct is allocated in shared memory.
2) The guest page table is also encrypted, so the address would need to be a
GPA instead of a GVA.
3) The guest vCPU registers may also be encrypted in the case of
SEV-ES/SEV-SNP, so the approach of examining vCPU register state has
additional requirements such as requiring guest code to implement a #VC
handler that can provide the appropriate registers via a vmgexit.
To address these issues, the SEV selftest RFC1 patchset introduced a set of new
SEV-specific interfaces that closely mirrored the functionality of
ucall()/get_ucall(), but relied on a pre-allocated/static ucall buffer in
shared guest memory so that guest code could pass messages/state to the host
by simply writing to this pre-arranged shared memory region and then generating
an exit to userspace (via a halt instruction).
Paolo suggested instead implementing support for test/guest-specific ucall
implementations that could be used as an alternative to the default PIO-based
ucall implementations as-needed based on test/guest requirements, while still
allowing for tests to use a common set of interfaces like ucall()/get_ucall().
== OVERVIEW ==
This series implements the above functionality by introducing a new ucall_ops
struct that can be used to register a particular ucall implementation as needed,
then re-implements x86/arm64/s390x in terms of the ucall_ops.
But for the purposes of introducing a new ucall_ops implementation appropriate
for SEV, there are a couple issues that resulted in the need for some additional
ucall interfaces as well:
a) ucall() doesn't take a pointer to the ucall struct it modifies, so to make
it work in the case of an implementation that relies on a pre-allocated ucall
struct in shared guest memory some sort of global lookup functionality
would be needed to locate the appropriate ucall struct for a particular
VM/vcpu combination, and this would need to be made accessible for use by
the guest as well. Guests would then need some way of determining what
VM/vcpu identifiers they need to use to do the lookup, which to do reliably
would likely require seeding the guest with those identifiers in advance,
which is possible, but much more easily achievable by simply adding a
ucall() alternative that accepts a pointer to the ucall struct for that
particular VM/vcpu.
b) get_ucall() *does* take a pointer to a ucall struct, but currently zeroes
it out and uses it to copy the guest's ucall struct into. It *could* be
re-purposed to handle the case where the pointer is an actual pointer to
the ucall struct in shared guest memory, but that could cause problems
since callers would need some idea of what the underlying ucall
implementation expects. Ideally the interfaces would be agnostic to the
ucall implementation.
So to address those issues, this series also allows ucall implementations to
optionally be extended to support a set of 'shared' ops that are used in the
following manner:
host:
uc_gva = ucall_shared_alloc()
setup_vm_args(vm, uc_gva)
guest:
ucall_shared(uc_gva, ...)
host:
get_ucall_shared(uc_gva, ...)
and then implements a new ucall implementation, ucall_ops_halt, based around
these shared interfaces and halt instructions.
While this doesn't really meet the initial goal of re-using the existing
ucall interfaces as-is, the hope is that these *_shared interfaces are
general enough to be re-usable for things other than SEV, or at least improve on
code readability over the initial SEV-specific interfaces.
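As a rough illustration of the host/guest flow above, the following is a minimal, self-contained sketch and not the code from this series. All types and names here (struct ucall_ops fields, ucall_shared_sketch(), get_ucall_shared_sketch(), the exit counter) are hypothetical stand-ins, and the "exit to userspace" is only simulated by a counter instead of a real halt instruction.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not the selftest library's real types. */
#define UCALL_MAX_ARGS 6

struct ucall {
	uint64_t cmd;
	uint64_t args[UCALL_MAX_ARGS];
};

/* One ucall implementation. The "shared" hooks are optional in the series. */
struct ucall_ops {
	const char *name;
	void (*send_cmd_shared)(struct ucall *uc);                        /* guest */
	uint64_t (*recv_cmd_shared)(struct ucall *uc, struct ucall *out); /* host */
};

static int exits_to_userspace; /* counts simulated guest exits */

static void halt_send_cmd_shared(struct ucall *uc)
{
	(void)uc;             /* the message already lives in shared memory */
	exits_to_userspace++; /* stands in for executing a halt instruction */
}

static uint64_t halt_recv_cmd_shared(struct ucall *uc, struct ucall *out)
{
	memcpy(out, uc, sizeof(*out)); /* host reads the shared struct directly */
	return out->cmd;
}

static const struct ucall_ops ucall_ops_halt_sketch = {
	.name = "halt (sketch)",
	.send_cmd_shared = halt_send_cmd_shared,
	.recv_cmd_shared = halt_recv_cmd_shared,
};

static const struct ucall_ops *cur_ops = &ucall_ops_halt_sketch;

/* Guest side: fill the caller-provided shared struct, then exit to the host. */
static void ucall_shared_sketch(struct ucall *uc, uint64_t cmd,
				size_t nargs, const uint64_t *args)
{
	size_t i;

	if (nargs > UCALL_MAX_ARGS)
		nargs = UCALL_MAX_ARGS;
	memset(uc, 0, sizeof(*uc));
	uc->cmd = cmd;
	for (i = 0; i < nargs; i++)
		uc->args[i] = args[i];
	cur_ops->send_cmd_shared(uc);
}

/* Host side: fetch the message from the same shared struct. */
static uint64_t get_ucall_shared_sketch(struct ucall *uc, struct ucall *out)
{
	return cur_ops->recv_cmd_shared(uc, out);
}
```

The point of passing the struct pointer explicitly is that neither side needs a global VM/vcpu lookup to find the buffer, which is the problem described in (a) above.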
Any review/comments are greatly appreciated!
----------------------------------------------------------------
Michael Roth (10):
kvm: selftests: move base kvm_util.h declarations to kvm_util_base.h
kvm: selftests: move ucall declarations into ucall_common.h
kvm: selftests: introduce ucall_ops for test/arch-specific ucall implementations
kvm: arm64: selftests: use ucall_ops to define default ucall implementation
(COMPILE-TESTED ONLY) kvm: s390: selftests: use ucall_ops to define default ucall implementation
kvm: selftests: add ucall interfaces based around shared memory
kvm: selftests: add ucall_shared ops for PIO
kvm: selftests: introduce ucall implementation based on halt instructions
kvm: selftests: add GUEST_SHARED_* macros for shared ucall implementations
kvm: selftests: add ucall_test to test various ucall functionality
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 5 +-
.../testing/selftests/kvm/include/aarch64/ucall.h | 18 +
tools/testing/selftests/kvm/include/kvm_util.h | 408 +--------------------
.../testing/selftests/kvm/include/kvm_util_base.h | 368 +++++++++++++++++++
tools/testing/selftests/kvm/include/s390x/ucall.h | 18 +
tools/testing/selftests/kvm/include/ucall_common.h | 147 ++++++++
tools/testing/selftests/kvm/include/x86_64/ucall.h | 19 +
tools/testing/selftests/kvm/lib/aarch64/ucall.c | 43 +--
tools/testing/selftests/kvm/lib/s390x/ucall.c | 45 +--
tools/testing/selftests/kvm/lib/ucall_common.c | 133 +++++++
tools/testing/selftests/kvm/lib/x86_64/ucall.c | 82 +++--
tools/testing/selftests/kvm/ucall_test.c | 182 +++++++++
13 files changed, 982 insertions(+), 487 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/aarch64/ucall.h
create mode 100644 tools/testing/selftests/kvm/include/kvm_util_base.h
create mode 100644 tools/testing/selftests/kvm/include/s390x/ucall.h
create mode 100644 tools/testing/selftests/kvm/include/ucall_common.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/ucall.h
create mode 100644 tools/testing/selftests/kvm/lib/ucall_common.c
create mode 100644 tools/testing/selftests/kvm/ucall_test.c
The amt.sh test script does not work because it lacks execute
permission. Add execute permission to it.
Reported-by: Hangbin Liu <liuhangbin(a)gmail.com>
Fixes: c08e8baea78e ("selftests: add amt interface selftest script")
Signed-off-by: Taehee Yoo <ap420073(a)gmail.com>
---
tools/testing/selftests/net/amt.sh | 0
1 file changed, 0 insertions(+), 0 deletions(-)
mode change 100644 => 100755 tools/testing/selftests/net/amt.sh
diff --git a/tools/testing/selftests/net/amt.sh b/tools/testing/selftests/net/amt.sh
old mode 100644
new mode 100755
--
2.17.1
From: Menglong Dong <imagedong(a)tencent.com>
The return value of BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND() in
__inet_bind() is not handled properly. When the return value
is non-zero, it will set inet_saddr and inet_rcv_saddr to 0 and
exit:
err = BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk);
if (err) {
inet->inet_saddr = inet->inet_rcv_saddr = 0;
goto out_release_sock;
}
Let's take UDP as an example and see what happens. A UDP
socket is added to 'udp_prot.h.udp_table->hash' and
'udp_prot.h.udp_table->hash2' after sk->sk_prot->get_port()
succeeds. If 'inet->inet_rcv_saddr' is specified here,
then 'sk' will be in an 'hslot2' of 'hash2' that it doesn't belong
to (because inet_saddr is changed to 0), and received UDP packets
will not be passed to this sock. If 'inet->inet_rcv_saddr' is not
specified here, the sock will work fine and receive packets
properly, which is weird, as the bind() has already failed.
To undo the get_port() operation, introduce the 'put_port' field
for 'struct proto'. For TCP proto, it is inet_put_port(); For UDP
proto, it is udp_lib_unhash(); For icmp proto, it is
ping_unhash().
Therefore, after sys_bind() fails because of
BPF_CGROUP_RUN_PROG_INET4_POST_BIND(), the socket is unbound, which
means that it can try to bind to another port.
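To make the undo step concrete, here is a small user-space sketch of the idea, not the kernel patch itself. The struct and function names (proto_sketch, sock_sketch, bind_sketch(), demo_get_port(), demo_put_port()) are hypothetical, and the post_bind_err parameter merely stands in for the return value of the BPF post-bind hook.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical miniature of the idea; these are not the kernel's definitions. */
struct sock_sketch;

struct proto_sketch {
	int  (*get_port)(struct sock_sketch *sk, unsigned short port);
	void (*put_port)(struct sock_sketch *sk); /* optional undo hook */
};

struct sock_sketch {
	const struct proto_sketch *prot;
	unsigned int rcv_saddr;
	unsigned int saddr;
	unsigned short bound_port; /* 0 means "not hashed / not bound" */
};

static int demo_get_port(struct sock_sketch *sk, unsigned short port)
{
	sk->bound_port = port; /* stands in for hashing the socket */
	return 0;
}

static void demo_put_port(struct sock_sketch *sk)
{
	sk->bound_port = 0;    /* stands in for unhashing the socket */
}

/* post_bind_err stands in for the BPF post-bind hook's return value. */
static int bind_sketch(struct sock_sketch *sk, unsigned int addr,
		       unsigned short port, int post_bind_err)
{
	sk->rcv_saddr = sk->saddr = addr;

	if (sk->prot->get_port(sk, port)) {
		sk->rcv_saddr = sk->saddr = 0;
		return -EADDRINUSE;
	}

	if (post_bind_err) {
		sk->rcv_saddr = sk->saddr = 0;
		/* Undo get_port(); the hook may be absent, hence the NULL check. */
		if (sk->prot->put_port)
			sk->prot->put_port(sk);
		return post_bind_err;
	}

	return 0;
}
```

After a failed bind_sketch() the socket is left with no port and no address, so a later call with a different port starts from a clean state.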
The second patch adds selftests for this change.
Changes since v2:
- NULL check for sk->sk_prot->put_port
Changes since v1:
- introduce 'put_port' field for 'struct proto'
- add selftests for it
Menglong Dong (2):
net: bpf: handle return value of
BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND()
bpf: selftests: add bind retry for post_bind{4, 6}
include/net/sock.h | 1 +
net/ipv4/af_inet.c | 2 +
net/ipv4/ping.c | 1 +
net/ipv4/tcp_ipv4.c | 1 +
net/ipv4/udp.c | 1 +
net/ipv6/af_inet6.c | 2 +
net/ipv6/ping.c | 1 +
net/ipv6/tcp_ipv6.c | 1 +
net/ipv6/udp.c | 1 +
tools/testing/selftests/bpf/test_sock.c | 166 +++++++++++++++++++++---
10 files changed, 157 insertions(+), 20 deletions(-)
--
2.27.0
Your review is highly appreciated. This version mostly addresses the comments
from Sean. Most comments are adopted, except three that remain open and
need more discussion:
- Move the entire xfd write emulation code to x86.c. Doing so requires
introducing a new kvm_x86_ops callback to disable msr write bitmap.
According to Paolo's earlier comment he prefers to handle it in vmx.c.
- Directly check msr_bitmap in update_exception_bitmap() (for
trapping #NM) and vcpu_enter_guest() (for syncing guest xfd after
vm-exit) instead of introducing an extra flag in the last patch. However,
doing so requires another new kvm_x86_ops callback for checking
msr_bitmap since vcpu_enter_guest() is x86 common code. Having an
extra flag sounds simpler here (at least for the initial AMX support).
It does penalize nested guest with one xfd sync per exit, but it's not
worse than a normal guest which initializes xfd but doesn't run
AMX applications at all. Those could be improved afterwards.
- Disable #NM trap for nested guest. This version still chooses to always
trap #NM (regardless of L1 or L2) as long as xfd write interception is disabled.
In reality #NM is rare if nested guest doesn't intend to run AMX applications
and always-trap is safer than dynamic trap for the basic support in case
of any oversight here.
(Jing is temporarily on leave for family reasons; Yang helped work out this version.)
----
v3->v4:
- Verify kvm selftest for AMX (Paolo)
- Move fpstate buffer expansion from kvm_vcpu_after_set_cpuid() to
kvm_check_cpuid() and improve patch description (Sean)
- Drop 'preemption' word in #NM interception patch (Sean)
- Remove 'trap_nm' flag. Replace it by: (Sean)
* Trapping #NM according to guest_fpu::xfd when write to xfd is
intercepted.
* Always trapping #NM when xfd write interception is disabled
- Use better name for #NM related functions (Sean)
- Drop '#ifdef CONFIG_X86_64' in __kvm_set_xcr (Sean)
- Update description for KVM_CAP_XSAVE2 and prevent the guest from
using the wrong ioctl (Sean)
- Replace 'xfd_out_of_sync' with a better name (Sean)
v2->v3:
- Trap #NM until write IA32_XFD with a non-zero value (Thomas)
- Revise return value in __xstate_request_perm() (Thomas)
- Revise doc for KVM_GET_SUPPORTED_CPUID (Paolo)
- Add Thomas's reviewed-by on one patch
- Reorder disabling read interception of XFD_ERR patch (Paolo)
- Move disabling r/w interception of XFD from x86.c to vmx.c (Paolo)
- Provide the API doc together with the new KVM_GET_XSAVE2 ioctl (Paolo)
- Make KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) return minimum size of struct
kvm_xsave (4K) (Paolo)
- Request permission at the start of vm_create_with_vcpus() in selftest
- Request permission conditionally when XFD is supported (Paolo)
v1->v2:
- Live migration supported and verified with a selftest
- Rebase to Thomas's new series for guest fpstate reallocation [1]
- Expand fpstate at KVM_SET_CPUID2 instead of when emulating XCR0
and IA32_XFD (Thomas/Paolo)
- Accordingly remove all exit-to-userspace stuff
- Intercept #NM to save guest XFD_ERR and restore host/guest value
at preemption on/off boundary (Thomas)
- Accordingly remove all xfd_err logic in preemption callback and
fpu_swap_kvm_fpstate()
- Reuse KVM_SET_XSAVE to handle both legacy and expanded buffer (Paolo)
- Don't return dynamic bits w/o prctl() in KVM_GET_SUPPORTED_CPUID (Paolo)
- Check guest permissions for dynamic features in CPUID[0xD] instead
of only for AMX at KVM_SET_CPUID (Paolo)
- Remove dynamic bit check for 32-bit guest in __kvm_set_xcr() (Paolo)
- Fix CPUID emulation for 0x1d and 0x1e (Paolo)
- Move "disable interception" to the end of the series (Paolo)
This series brings AMX (Advanced Matrix eXtensions) virtualization support
to KVM. The preparatory series from Thomas [1] is also included.
A large portion of the changes in this series is to deal with eXtended
Feature Disable (XFD) which allows resizing of the fpstate buffer to
support dynamically-enabled XSTATE features with large state component
(e.g. 8K for AMX).
There are a lot of simplifications when comparing v2/v3 to the original
proposal [2] and the first version [3]. Thanks to Thomas and Paolo for
many good suggestions.
The support is based on following key changes:
- Guest permissions for dynamically-enabled XSAVE features
Native tasks have to request permission via prctl() before touching
a dynamically-resized XSTATE component. Introduce guest permissions
for a similar purpose. Userspace VMM is expected to request guest
permission only once when the first vCPU is created.
KVM checks guest permission in KVM_SET_CPUID2. Setting XFD in guest
cpuid w/o proper permissions fails this operation. In the meantime,
unpermitted features are also excluded in KVM_GET_SUPPORTED_CPUID.
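The permission rules just described can be sketched as two bitmask checks. This is an illustrative model only: the helper names are hypothetical, the -EINVAL return code is an arbitrary choice for the sketch, and it assumes AMX tile data (XSAVE state component 18) is the only dynamically-enabled feature.

```c
#include <errno.h>
#include <stdint.h>

/* XSAVE state-component bits used by AMX. */
#define XFEATURE_MASK_XTILE_CFG  (1ULL << 17)
#define XFEATURE_MASK_XTILE_DATA (1ULL << 18)

/* Assumption for this sketch: tile data is the only dynamic feature. */
#define XFEATURE_MASK_DYNAMIC    XFEATURE_MASK_XTILE_DATA

/*
 * Models KVM_GET_SUPPORTED_CPUID: dynamic features for which no guest
 * permission was requested are hidden from the reported mask.
 */
static uint64_t supported_xfeatures_sketch(uint64_t host_supported,
					   uint64_t guest_perm)
{
	uint64_t unpermitted = XFEATURE_MASK_DYNAMIC & ~guest_perm;

	return host_supported & ~unpermitted;
}

/*
 * Models the KVM_SET_CPUID2 check: fail if the guest CPUID asks for a
 * dynamic feature without permission. The error code is illustrative.
 */
static int check_guest_xfeatures_sketch(uint64_t requested,
					uint64_t guest_perm)
{
	if (requested & XFEATURE_MASK_DYNAMIC & ~guest_perm)
		return -EINVAL;

	return 0;
}
```

Non-dynamic features pass both helpers regardless of the permission mask, which mirrors the statement that only dynamically-enabled features need the extra request.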
- Extend fpstate reallocation mechanism to cover guest fpu
Unlike native tasks which have reallocation triggered from #NM
handler, guest fpstate reallocation is requested by KVM when it
identifies the intention to use dynamically-enabled XSAVE
features inside guest.
Extend fpu core to allow KVM to request fpstate buffer expansion
for a guest fpu container.
- Trigger fpstate reallocation in KVM
This could be done either statically (before guest runs) or
dynamically (in the emulation path). According to discussion [1]
we decided to statically enable all xfeatures allowed by guest perm
in KVM_SET_CPUID2, with fpstate buffer sized accordingly. This spares
a lot of code and also avoids imposing an ordered restore sequence
(XCR0, XFD and XSTATE) on userspace VMM.
- RDMSR/WRMSR emulation for IA32_XFD
Because fpstate expansion is completed in KVM_SET_CPUID2, emulating
r/w access to IA32_XFD simply involves the xfd field in the guest
fpu container. On a write, if the guest fpu is currently active, the
software state (guest_fpstate::xfd and per-cpu xfd cache) is also
updated.
- RDMSR/WRMSR emulation for XFD_ERR
When XFD causes an instruction to generate #NM, XFD_ERR contains
information about which disabled state components are being accessed.
It'd be problematic if the XFD_ERR value generated in guest is
consumed/clobbered by the host before the guest itself does so.
Intercept the #NM exception to save the guest XFD_ERR value once
IA32_XFD is written with a non-zero value for the first time. There
is at most one interception per guest task given a dynamic feature.
RDMSR/WRMSR emulation uses the saved value. The host value (always
ZERO outside of the host #NM handler) is restored before enabling
preemption. The saved guest value is restored right before entering
the guest (with preemption disabled).
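The save/restore ordering for XFD_ERR can be modeled in a few lines. This is only a sketch of the sequence described above: a plain variable stands in for the hardware MSR, and every name (msr_xfd_err, guest_fpu_sketch, the helper functions) is hypothetical rather than taken from the patches.

```c
#include <stdint.h>

/* Hypothetical model: one variable stands in for the hardware XFD_ERR MSR. */
static uint64_t msr_xfd_err;

struct guest_fpu_sketch {
	uint64_t xfd_err; /* guest value saved on an intercepted #NM */
};

/* VM-exit caused by a guest #NM: capture the guest's XFD_ERR value. */
static void on_nm_vmexit(struct guest_fpu_sketch *g)
{
	g->xfd_err = msr_xfd_err;
}

/* Before preemption is re-enabled: put back the host value, which is zero. */
static void restore_host_xfd_err(void)
{
	msr_xfd_err = 0;
}

/* Right before VM-entry, with preemption disabled: reload the guest value. */
static void restore_guest_xfd_err(const struct guest_fpu_sketch *g)
{
	msr_xfd_err = g->xfd_err;
}

/* RDMSR/WRMSR emulation works on the saved copy, not on the hardware MSR. */
static uint64_t emulate_rdmsr_xfd_err(const struct guest_fpu_sketch *g)
{
	return g->xfd_err;
}

static void emulate_wrmsr_xfd_err(struct guest_fpu_sketch *g, uint64_t val)
{
	g->xfd_err = val;
}
```

Keeping the guest value in a software copy means host code that runs with preemption enabled always observes zero, while the guest still reads back what its own #NM produced.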
- Get/set dynamic xfeature state for migration
Introduce new capability (KVM_CAP_XSAVE2) to deal with >4KB fpstate
buffer. Reading this capability returns the size of the current
guest fpstate (e.g. after expansion). Userspace VMM uses a new ioctl
(KVM_GET_XSAVE2) to read guest fpstate from the kernel and reuses
the existing ioctl (KVM_SET_XSAVE) to update guest fpstate in the
kernel. KVM_SET_XSAVE is extended to do properly sized memdup_user()
based on the guest fpstate.
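On the userspace side, the main consequence is that the VMM can no longer assume a fixed 4KB buffer. A minimal sketch of the sizing logic follows; the helper names are hypothetical, cap_ret is assumed to be the value returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2), and the actual ioctl calls are deliberately left out.

```c
#include <stddef.h>
#include <stdlib.h>

/* Size of the legacy struct kvm_xsave (4KB). */
#define LEGACY_KVM_XSAVE_SIZE 4096

/*
 * cap_ret: assumed to be what KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) returned.
 * Anything smaller than the legacy size (including 0 for "not supported")
 * falls back to the legacy 4KB buffer.
 */
static size_t xsave_buf_size_sketch(int cap_ret)
{
	if (cap_ret < LEGACY_KVM_XSAVE_SIZE)
		return LEGACY_KVM_XSAVE_SIZE;

	return (size_t)cap_ret;
}

/* Allocate a zeroed buffer large enough for the current guest fpstate. */
static void *alloc_xsave_buf_sketch(int cap_ret, size_t *size_out)
{
	*size_out = xsave_buf_size_sketch(cap_ret);
	return calloc(1, *size_out);
}
```

A VMM would then hand this buffer to KVM_GET_XSAVE2 on the source and to KVM_SET_XSAVE on the destination; since the size depends on the fpstate after expansion, it makes sense to query it after guest CPUID has been set.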
- Expose related cpuid bits to guest
The last step is to allow exposing XFD, AMX_TILE, AMX_INT8 and
AMX_BF16 in guest cpuid. Adding those bits into kvm_cpu_caps finally
activates all the preceding logic in this series.
- Optimization: disable interception for IA32_XFD
IA32_XFD can be frequently updated by the guest, as it is part of
the task state and swapped in context switch when prev and next have
different XFD setting. Always intercepting WRMSR can easily cause
non-negligible overhead.
Disable r/w emulation for IA32_XFD after intercepting the first
WRMSR(IA32_XFD) with a non-zero value. However MSR passthrough
implies the software state (guest_fpstate::xfd and per-cpu xfd
cache) might be out of sync with MSR. This suggests KVM needs to
re-sync them at VM-exit before preemption is enabled.
Thanks to Jun Nakajima and Kevin Tian for the design suggestions while this
version was being worked on internally.
[1] https://lore.kernel.org/all/20211214022825.563892248@linutronix.de/
[2] https://www.spinics.net/lists/kvm/msg259015.html
[3] https://lore.kernel.org/lkml/20211208000359.2853257-1-yang.zhong@intel.com/
Thanks,
Yang
---
Guang Zeng (1):
kvm: x86: Add support for getting/setting expanded xstate buffer
Jing Liu (11):
kvm: x86: Fix xstate_required_size() to follow XSTATE alignment rule
kvm: x86: Exclude unpermitted xfeatures at KVM_GET_SUPPORTED_CPUID
x86/fpu: Make XFD initialization in __fpstate_reset() a function
argument
kvm: x86: Check and enable permitted dynamic xfeatures at
KVM_SET_CPUID2
kvm: x86: Add emulation for IA32_XFD
x86/fpu: Prepare xfd_err in struct fpu_guest
kvm: x86: Intercept #NM for saving IA32_XFD_ERR
kvm: x86: Emulate IA32_XFD_ERR for guest
kvm: x86: Disable RDMSR interception of IA32_XFD_ERR
kvm: x86: Add XCR0 support for Intel AMX
kvm: x86: Add CPUID support for Intel AMX
Kevin Tian (3):
x86/fpu: Provide fpu_update_guest_perm_features() for guest
x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation
kvm: x86: Disable interception for IA32_XFD on demand
Thomas Gleixner (5):
x86/fpu: Extend fpu_xstate_prctl() with guest permissions
x86/fpu: Prepare guest FPU for dynamically enabled FPU features
x86/fpu: Add guest support to xfd_enable_feature()
x86/fpu: Add uabi_size to guest_fpu
x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state()
Wei Wang (1):
kvm: selftests: Add support for KVM_CAP_XSAVE2
Documentation/virt/kvm/api.rst | 46 +++++-
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/fpu/api.h | 11 ++
arch/x86/include/asm/fpu/types.h | 32 ++++
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/uapi/asm/kvm.h | 16 +-
arch/x86/include/uapi/asm/prctl.h | 26 ++--
arch/x86/kernel/fpu/core.c | 104 ++++++++++++-
arch/x86/kernel/fpu/xstate.c | 147 +++++++++++-------
arch/x86/kernel/fpu/xstate.h | 15 +-
arch/x86/kernel/process.c | 2 +
arch/x86/kvm/cpuid.c | 99 +++++++++---
arch/x86/kvm/vmx/vmcs.h | 5 +
arch/x86/kvm/vmx/vmx.c | 45 +++++-
arch/x86/kvm/vmx/vmx.h | 2 +-
arch/x86/kvm/x86.c | 105 ++++++++++++-
include/uapi/linux/kvm.h | 4 +
tools/arch/x86/include/uapi/asm/kvm.h | 16 +-
tools/include/uapi/linux/kvm.h | 3 +
.../testing/selftests/kvm/include/kvm_util.h | 2 +
.../selftests/kvm/include/x86_64/processor.h | 10 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 32 ++++
.../selftests/kvm/lib/x86_64/processor.c | 67 +++++++-
.../testing/selftests/kvm/x86_64/evmcs_test.c | 2 +-
tools/testing/selftests/kvm/x86_64/smm_test.c | 2 +-
.../testing/selftests/kvm/x86_64/state_test.c | 2 +-
.../kvm/x86_64/vmx_preemption_timer_test.c | 2 +-
27 files changed, 691 insertions(+), 109 deletions(-)
These patches are also available at:
https://github.com/mdroth/linux/commits/sev-selftests-v2
They are based on top of the recent RFC:
"KVM: selftests: Add support for test-selectable ucall implementations"
https://lore.kernel.org/all/20211210164620.11636-1-michael.roth@amd.com/T/
https://github.com/mdroth/linux/commits/sev-selftests-ucall-rfc1
which provides a new ucall implementation that this series relies on.
Those patches were in turn based on kvm/next as of 2021-12-10.
== OVERVIEW ==
This series introduces a set of memory encryption-related parameter/hooks
in the core kselftest library, then uses the hooks to implement a small
library for creating/managing SEV, SEV-ES, and (eventually) SEV-SNP guests.
This library is then used to implement a basic boot/memory test that's run
for variants of SEV/SEV-ES guests.
- Patches 1-8 implement SEV boot tests and should run against existing
kernels
- Patch 9 is a KVM change that's required to allow SEV-ES/SEV-SNP
guests to boot with an externally generated page table, and is a
host kernel prerequisite for the remaining patches in the series.
- Patches 10-13 extend the boot tests to cover SEV-ES
Any review/comments are greatly appreciated!
v2:
- rebased on ucall_ops patchset (which is based on kvm/next 2021-12-10)
- remove SEV-SNP support for now
- provide encryption bitmap as const* to original rather than as a copy
(Mingwei, Paolo)
- drop SEV-specific synchronization helpers in favor of ucall_ops_halt (Paolo)
- don't pass around addresses with c-bit included, add them as-needed via
addr_gpa2raw() (e.g. when adding PTEs, or initializing initial
cr3/vm->pgd) (Paolo)
- rename lib/sev.c functions for better consistency (Krish)
- move more test setup code out of main test function and into
setup_test_common() (Krish)
- suppress compiler warnings due to -Waddress-of-packed-member like kernel
does
- don't require SNP support in minimum firmware version detection (Marc)
- allow SEV device path to be configured via make SEV_PATH= (Marc)
----------------------------------------------------------------
Michael Roth (13):
KVM: selftests: move vm_phy_pages_alloc() earlier in file
KVM: selftests: sparsebit: add const where appropriate
KVM: selftests: add hooks for managing encrypted guest memory
KVM: selftests: handle encryption bits in page tables
KVM: selftests: add support for encrypted vm_vaddr_* allocations
KVM: selftests: ensure ucall_shared_alloc() allocates shared memory
KVM: selftests: add library for creating/interacting with SEV guests
KVM: selftests: add SEV boot tests
KVM: SVM: include CR3 in initial VMSA state for SEV-ES guests
KVM: selftests: account for error code in #VC exception frame
KVM: selftests: add support for creating SEV-ES guests
KVM: selftests: add library for handling SEV-ES-related exits
KVM: selftests: add SEV-ES boot tests
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm/svm.c | 19 ++
arch/x86/kvm/vmx/vmx.c | 6 +
arch/x86/kvm/x86.c | 1 +
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 10 +-
.../testing/selftests/kvm/include/kvm_util_base.h | 10 +
tools/testing/selftests/kvm/include/sparsebit.h | 36 +--
tools/testing/selftests/kvm/include/x86_64/sev.h | 44 +++
.../selftests/kvm/include/x86_64/sev_exitlib.h | 14 +
tools/testing/selftests/kvm/include/x86_64/svm.h | 35 +++
.../selftests/kvm/include/x86_64/svm_util.h | 1 +
tools/testing/selftests/kvm/lib/kvm_util.c | 270 ++++++++++++------
.../testing/selftests/kvm/lib/kvm_util_internal.h | 10 +
tools/testing/selftests/kvm/lib/sparsebit.c | 48 ++--
tools/testing/selftests/kvm/lib/ucall_common.c | 4 +-
tools/testing/selftests/kvm/lib/x86_64/handlers.S | 4 +-
tools/testing/selftests/kvm/lib/x86_64/processor.c | 16 +-
tools/testing/selftests/kvm/lib/x86_64/sev.c | 252 ++++++++++++++++
.../testing/selftests/kvm/lib/x86_64/sev_exitlib.c | 249 ++++++++++++++++
.../selftests/kvm/x86_64/sev_all_boot_test.c | 316 +++++++++++++++++++++
22 files changed, 1215 insertions(+), 133 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/x86_64/sev.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/sev_exitlib.h
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/sev.c
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/sev_exitlib.c
create mode 100644 tools/testing/selftests/kvm/x86_64/sev_all_boot_test.c
Your review is highly appreciated. We will continue working on the
remaining selftest and send it out later.
TODO:
- kvm selftest for AMX is still in progress;
----
v2->v3:
- Trap #NM until write IA32_XFD with a non-zero value (Thomas)
- Revise return value in __xstate_request_perm() (Thomas)
- Revise doc for KVM_GET_SUPPORTED_CPUID (Paolo)
- Add Thomas's reviewed-by on one patch
- Reorder disabling read interception of XFD_ERR patch (Paolo)
- Move disabling r/w interception of XFD from x86.c to vmx.c (Paolo)
- Provide the API doc together with the new KVM_GET_XSAVE2 ioctl (Paolo)
- Make KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) return minimum size of struct
kvm_xsave (4K) (Paolo)
- Request permission at the start of vm_create_with_vcpus() in selftest
- Request permission conditionally when XFD is supported (Paolo)
v1->v2:
- Live migration supported and verified with a selftest
- Rebase to Thomas's new series for guest fpstate reallocation [1]
- Expand fpstate at KVM_SET_CPUID2 instead of when emulating XCR0
and IA32_XFD (Thomas/Paolo)
- Accordingly remove all exit-to-userspace stuff
- Intercept #NM to save guest XFD_ERR and restore host/guest value
at preemption on/off boundary (Thomas)
- Accordingly remove all xfd_err logic in preemption callback and
fpu_swap_kvm_fpstate()
- Reuse KVM_SET_XSAVE to handle both legacy and expanded buffer (Paolo)
- Don't return dynamic bits w/o prctl() in KVM_GET_SUPPORTED_CPUID (Paolo)
- Check guest permissions for dynamic features in CPUID[0xD] instead
of only for AMX at KVM_SET_CPUID (Paolo)
- Remove dynamic bit check for 32-bit guest in __kvm_set_xcr() (Paolo)
- Fix CPUID emulation for 0x1d and 0x1e (Paolo)
- Move "disable interception" to the end of the series (Paolo)
This series brings AMX (Advanced Matrix eXtensions) virtualization
support to KVM. The preparatory series from Thomas [1] is also included.
A large portion of the changes in this series is to deal with eXtended
Feature Disable (XFD) which allows resizing of the fpstate buffer to
support dynamically-enabled XSTATE features with large state component
(e.g. 8K for AMX).
There are a lot of simplifications when comparing v2/v3 to the original
proposal [2] and the first version [3]. Thanks to Thomas and Paolo for
many good suggestions.
The support is based on following key changes:
- Guest permissions for dynamically-enabled XSAVE features
Native tasks have to request permission via prctl() before touching
a dynamically-resized XSTATE component. Introduce guest permissions
for a similar purpose. Userspace VMM is expected to request guest
permission only once when the first vCPU is created.
KVM checks guest permission in KVM_SET_CPUID2. Setting XFD in guest
cpuid w/o proper permissions fails this operation. In the meantime,
unpermitted features are also excluded in KVM_GET_SUPPORTED_CPUID.
- Extend fpstate reallocation mechanism to cover guest fpu
Unlike native tasks which have reallocation triggered from #NM
handler, guest fpstate reallocation is requested by KVM when it
identifies the intention to use dynamically-enabled XSAVE
features inside guest.
Extend fpu core to allow KVM to request fpstate buffer expansion
for a guest fpu container.
- Trigger fpstate reallocation in KVM
This could be done either statically (before guest runs) or
dynamically (in the emulation path). According to discussion [1]
we decided to statically enable all xfeatures allowed by guest perm
in KVM_SET_CPUID2, with fpstate buffer sized accordingly. This spares
a lot of code and also avoids imposing an ordered restore sequence
(XCR0, XFD and XSTATE) on userspace VMM.
- RDMSR/WRMSR emulation for IA32_XFD
Because fpstate expansion is completed in KVM_SET_CPUID2, emulating
r/w access to IA32_XFD simply involves the xfd field in the guest
fpu container. On a write, if the guest fpu is currently active, the
software state (guest_fpstate::xfd and per-cpu xfd cache) is also
updated.
- RDMSR/WRMSR emulation for XFD_ERR
When XFD causes an instruction to generate #NM, XFD_ERR contains
information about which disabled state components are being accessed.
It'd be problematic if the XFD_ERR value generated in guest is
consumed/clobbered by the host before the guest itself does so.
Intercept the #NM exception to save the guest XFD_ERR value once
IA32_XFD is written with a non-zero value for the first time. There
is at most one interception per guest task given a dynamic feature.
RDMSR/WRMSR emulation uses the saved value. The host value (always
ZERO outside of the host #NM handler) is restored before enabling
preemption. The saved guest value is restored right before entering
the guest (with preemption disabled).
- Get/set dynamic xfeature state for migration
Introduce new capability (KVM_CAP_XSAVE2) to deal with >4KB fpstate
buffer. Reading this capability returns the size of the current
guest fpstate (e.g. after expansion). Userspace VMM uses a new ioctl
(KVM_GET_XSAVE2) to read guest fpstate from the kernel and reuses
the existing ioctl (KVM_SET_XSAVE) to update guest fpstate in the
kernel. KVM_SET_XSAVE is extended to do properly sized memdup_user()
based on the guest fpstate.
- Expose related cpuid bits to guest
The last step is to allow exposing XFD, AMX_TILE, AMX_INT8 and
AMX_BF16 in guest cpuid. Adding those bits into kvm_cpu_caps finally
activates all the preceding logic in this series.
- Optimization: disable interception for IA32_XFD
IA32_XFD can be frequently updated by the guest, as it is part of
the task state and swapped in context switch when prev and next have
different XFD setting. Always intercepting WRMSR can easily cause
non-negligible overhead.
Disable r/w emulation for IA32_XFD after intercepting the first
WRMSR(IA32_XFD) with a non-zero value. However MSR passthrough
implies the software state (guest_fpstate::xfd and per-cpu xfd
cache) might be out of sync with MSR. This suggests KVM needs to
re-sync them at VM-exit before preemption is enabled.
To verify AMX virtualization overhead on non-AMX usages, we ran the
Phoronix kernel build test in the guest w/ and w/o AMX in cpuid. The
result shows no observable difference between the two configurations.
Thanks to Jun Nakajima and Kevin Tian for the design suggestions while
this version was being worked on internally.
[1] https://lore.kernel.org/all/20211214022825.563892248@linutronix.de/
[2] https://www.spinics.net/lists/kvm/msg259015.html
[3] https://lore.kernel.org/lkml/20211208000359.2853257-1-yang.zhong@intel.com/
Thanks,
Jing
---
Guang Zeng (1):
kvm: x86: Get/set expanded xstate buffer
Jing Liu (13):
kvm: x86: Fix xstate_required_size() to follow XSTATE alignment rule
kvm: x86: Exclude unpermitted xfeatures at KVM_GET_SUPPORTED_CPUID
kvm: x86: Check permitted dynamic xfeatures at KVM_SET_CPUID2
x86/fpu: Make XFD initialization in __fpstate_reset() a function
argument
kvm: x86: Enable dynamic XSAVE features at KVM_SET_CPUID2
kvm: x86: Add emulation for IA32_XFD
x86/fpu: Prepare xfd_err in struct fpu_guest
kvm: x86: Intercept #NM for saving IA32_XFD_ERR
kvm: x86: Emulate IA32_XFD_ERR for guest
kvm: x86: Disable RDMSR interception of IA32_XFD_ERR
kvm: x86: Add XCR0 support for Intel AMX
kvm: x86: Add CPUID support for Intel AMX
kvm: x86: Disable interception for IA32_XFD on demand
Kevin Tian (2):
x86/fpu: Provide fpu_update_guest_perm_features() for guest
x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation
Thomas Gleixner (5):
x86/fpu: Extend fpu_xstate_prctl() with guest permissions
x86/fpu: Prepare guest FPU for dynamically enabled FPU features
x86/fpu: Add guest support to xfd_enable_feature()
x86/fpu: Add uabi_size to guest_fpu
x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state()
Wei Wang (1):
kvm: selftests: Add support for KVM_CAP_XSAVE2
Documentation/virt/kvm/api.rst | 46 +++++-
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/fpu/api.h | 11 ++
arch/x86/include/asm/fpu/types.h | 32 ++++
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/include/uapi/asm/kvm.h | 16 +-
arch/x86/include/uapi/asm/prctl.h | 26 ++--
arch/x86/kernel/fpu/core.c | 104 ++++++++++++-
arch/x86/kernel/fpu/xstate.c | 147 +++++++++++-------
arch/x86/kernel/fpu/xstate.h | 15 +-
arch/x86/kernel/process.c | 2 +
arch/x86/kvm/cpuid.c | 93 ++++++++---
arch/x86/kvm/vmx/vmcs.h | 5 +
arch/x86/kvm/vmx/vmx.c | 32 +++-
arch/x86/kvm/vmx/vmx.h | 2 +-
arch/x86/kvm/x86.c | 102 +++++++++++-
include/uapi/linux/kvm.h | 4 +
tools/arch/x86/include/uapi/asm/kvm.h | 16 +-
tools/include/uapi/linux/kvm.h | 3 +
.../testing/selftests/kvm/include/kvm_util.h | 2 +
.../selftests/kvm/include/x86_64/processor.h | 10 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 32 ++++
.../selftests/kvm/lib/x86_64/processor.c | 67 +++++++-
.../testing/selftests/kvm/x86_64/evmcs_test.c | 2 +-
tools/testing/selftests/kvm/x86_64/smm_test.c | 2 +-
.../testing/selftests/kvm/x86_64/state_test.c | 2 +-
.../kvm/x86_64/vmx_preemption_timer_test.c | 2 +-
27 files changed, 668 insertions(+), 111 deletions(-)
--
2.27.0
Hi everybody,
as discussed in the linux-mm alignment session on Wednesday, this is part 1
of the COW fixes: fix the COW security issue using GUP-triggered
unsharing of shared anonymous pages (ordinary, THP, hugetlb). In the
meeting slides, this approach was referred to as "Copy On Read". If anybody
wants to have access to the slides, please feel free to reach out.
The patches are based on v5.16-rc5 and available at:
https://github.com/davidhildenbrand/linux/pull/new/unshare_v1
It is currently again possible for a child process to observe modifications
of anonymous pages performed by the parent process after fork() in some
cases, which is not only a violation of the POSIX semantics of MAP_PRIVATE,
but more importantly a real security issue.
This issue, including other related COW issues, has been summarized at [1]:
"
1. Observing Memory Modifications of Private Pages From A Child Process
Long story short: process-private memory might not be as private as you
think once you fork(): successive modifications of private memory
regions in the parent process can still be observed by the child
process, for example, by smart use of vmsplice()+munmap().
The core problem is that pinning pages readable in a child process, such
as done via the vmsplice system call, can result in a child process
observing memory modifications done in the parent process the child is
not supposed to observe. [1] contains an excellent summary and [2]
contains further details. This issue was assigned CVE-2020-29374 [9].
For this to trigger, it's required to use a fork() without subsequent
exec(), for example, as used under Android zygote. Without further
details about an application that forks less-privileged child processes,
one cannot really say what's actually affected and what's not -- see the
details section at the end of this mail for a short sshd/openssh analysis.
While commit 17839856fd58 ("gup: document and work around "COW can break
either way" issue") fixed this issue and resulted in other problems
(e.g., ptrace on pmem), commit 09854ba94c6a ("mm: do_wp_page()
simplification") re-introduced part of the problem unfortunately.
The original reproducer can be modified quite easily to use THP [3] and
make the issue appear again on upstream kernels. I modified it to use
hugetlb [4] and it triggers as well. The problem is certainly less
severe with hugetlb than with THP; it merely highlights that we still
have plenty of open holes we should be closing/fixing.
Regarding vmsplice(), the only known workaround is to disallow the
vmsplice() system call ... or disable THP and hugetlb. But who knows
what else is affected (RDMA? O_DIRECT?) to achieve the same goal -- in
the end, it's a more generic issue.
"
This security issue was first reported by Jann Horn on 27 May 2020 and it
currently affects anonymous THP and hugetlb again. The "security issue"
part for hugetlb might be less important than for THP. However, with this
approach it is easy to get the MAP_PRIVATE semantics right for any anonymous
page in that regard and to avoid any such information leaks without much
added complexity.
Ordinary anonymous pages are currently not affected, because the COW logic
was changed in commit 09854ba94c6a ("mm: do_wp_page() simplification")
for them to COW on "page_count() != 1" instead of "mapcount > 1", which
unfortunately results in other COW issues, some of them documented in [1]
as well.
To fix this COW issue once and for all, introduce GUP-triggered unsharing
that can be conditionally triggered via FAULT_FLAG_UNSHARE. In contrast to
traditional COW, unsharing will leave the copied page mapped
write-protected in the page table, not having the semantics of a write
fault.
Logically, unsharing is triggered "early", as soon as GUP performs the
action that could result in a COW getting missed later and the security
issue triggering: however, unsharing is not triggered as before via a
write fault with undesired side effects.
Long story short, GUP triggers unsharing if all of the following conditions
are met:
* The page is mapped R/O
* We have an anonymous page, excluding KSM
* We want to read (!FOLL_WRITE)
* Unsharing is not disabled (!FOLL_NOUNSHARE)
* We want to take a reference (FOLL_GET or FOLL_PIN)
* The page is a shared anonymous page: mapcount > 1
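The conditions above can be read as a single predicate. The following is
only a minimal sketch of that logic, not the code from the series: the flag
values, struct page_view and gup_must_unshare() are hypothetical names made
up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration; the kernel's FOLL_* values differ. */
#define FOLL_WRITE     0x01u
#define FOLL_GET       0x02u
#define FOLL_PIN       0x04u
#define FOLL_NOUNSHARE 0x08u

/* Hypothetical, simplified view of the state GUP would inspect. */
struct page_view {
	bool mapped_readonly; /* page is mapped R/O in the page table */
	bool anon;            /* anonymous page */
	bool ksm;             /* KSM page, excluded from unsharing */
	int mapcount;         /* number of page table mappings */
};

/* Returns true only if all conditions from the list are met. */
static bool gup_must_unshare(unsigned int gup_flags, const struct page_view *p)
{
	assert(p);
	if (!p->mapped_readonly)
		return false;
	if (!p->anon || p->ksm)
		return false;
	if (gup_flags & FOLL_WRITE)
		return false; /* write access goes through ordinary COW */
	if (gup_flags & FOLL_NOUNSHARE)
		return false; /* caller explicitly disabled unsharing */
	if (!(gup_flags & (FOLL_GET | FOLL_PIN)))
		return false; /* no reference is taken */
	return p->mapcount > 1; /* shared anonymous page */
}
```

In this sketch a R/O pin of a page with mapcount 2 requests unsharing, while
the same pin on an exclusively mapped page (mapcount 1) does not.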
To reliably detect shared anonymous THP without heavy locking, introduce
a mapcount_seqcount seqlock that protects the mapcount of a THP and can
be used to read an atomic mapcount value. The mapcount_seqcount is stored
inside the memmap of the compound page -- to keep it simple, factor out
a raw_seqlock_t from the seqlock_t.
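For readers unfamiliar with sequence counters, the reader-side retry loop
can be sketched in user space with C11 atomics. This is an illustration of
the general technique only, not the kernel implementation: struct
mapcount_seq, its two counter fields and both helpers are hypothetical, and
writers are assumed to be serialized by some other means.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the per-THP state protected by the seqcount. */
struct mapcount_seq {
	atomic_uint seq;              /* odd while a writer is in progress */
	atomic_int compound_mapcount; /* assumed: mappings of the whole THP */
	atomic_int subpage_mapcount;  /* assumed: mappings of one subpage */
};

/* Writer side: bump the sequence before and after the update. */
static void mapcount_write(struct mapcount_seq *m, int compound, int sub)
{
	assert(m);
	atomic_fetch_add(&m->seq, 1); /* sequence becomes odd */
	atomic_store(&m->compound_mapcount, compound);
	atomic_store(&m->subpage_mapcount, sub);
	atomic_fetch_add(&m->seq, 1); /* sequence becomes even again */
}

/* Reader side: retry until both counters were read under one even sequence. */
static int mapcount_read(struct mapcount_seq *m)
{
	for (;;) {
		unsigned int start = atomic_load(&m->seq);
		int total;

		if (start & 1u)
			continue; /* writer active, try again */
		total = atomic_load(&m->compound_mapcount) +
			atomic_load(&m->subpage_mapcount);
		if (atomic_load(&m->seq) == start)
			return total; /* no writer interfered */
	}
}
```

The reader never blocks a writer; it only spins while an update is in
flight, which is the spinning behaviour discussed under "Future work".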
As this patch series introduces the same unsharing logic for any
anonymous pages, it also paves the way to fix other COW issues, e.g.,
documented in [1], without reintroducing the security issue or
reintroducing other issues we observed in the past (e.g., broken ptrace on
pmem).
All reproducers for this COW issue have been consolidated in the selftest
included in this series. Hopefully we'll get this fixed for good.
Future work:
* get_user_pages_fast_only() can currently spin on the mapcount_seqcount
when reading the mapcount, which might be a rare event. While this is
fine even when done from get_user_pages_fast_only() in IRQ context, we
might want to just fail fast in get_user_pages_fast_only(). We already
have patches prepared that add page_anon_maybe_shared() and
page_trans_huge_anon_maybe_shared() that will return "true" in case
spinning would be required and make get_user_pages_fast_only() fail fast.
I'm excluding them for simplicity.
... even better would be finding a way to just not need the
mapcount_seqcount, but THP splitting and PageDoubleMap() gives us a
hard time -- but maybe we'll eventually find a way someday :)
* Part 2 will tackle the other user-space visible breakages / COW issues
raised in [1]. This series is the basis for adjusting the COW logic once
again without re-introducing the COW issue fixed in this series and
without reintroducing the issues we saw with the original CVE fix
(e.g., breaking ptrace on pmem). There might be further parts to improve
the GUP long-term <-> MM synchronicity and to optimize some things
around that.
The idea is by Andrea and some patches are rewritten versions of prototype
patches by Andrea. I cross-compiled and tested as well as possible.
I'll CC locking+selftest folks only on the relevant patch and the cover
letter to minimize the noise. I'll put everyone on CC who was either
involved with the COW issues in the past or attended the linux-mm alignment
session on Wednesday. Apologies if I forget anyone :)
[1] https://lore.kernel.org/r/3ae33b08-d9ef-f846-56fb-645e3b9b4c66@redhat.com
David Hildenbrand (11):
seqlock: provide lockdep-free raw_seqcount_t variant
mm: thp: consolidate mapcount logic on THP split
mm: simplify hugetlb and file-THP handling in __page_mapcount()
mm: thp: simlify total_mapcount()
mm: thp: allow for reading the THP mapcount atomically via a
raw_seqlock_t
mm: support GUP-triggered unsharing via FAULT_FLAG_UNSHARE (!hugetlb)
mm: gup: trigger unsharing via FAULT_FLAG_UNSHARE when required
(!hugetlb)
mm: hugetlb: support GUP-triggered unsharing via FAULT_FLAG_UNSHARE
mm: gup: trigger unsharing via FAULT_FLAG_UNSHARE when required
(hugetlb)
mm: thp: introduce and use page_trans_huge_anon_shared()
selftests/vm: add tests for the known COW security issues
Documentation/locking/seqlock.rst | 50 ++++
include/linux/huge_mm.h | 72 +++++
include/linux/mm.h | 14 +
include/linux/mm_types.h | 9 +
include/linux/seqlock.h | 145 +++++++---
mm/gup.c | 89 +++++-
mm/huge_memory.c | 120 +++++++--
mm/hugetlb.c | 129 +++++++--
mm/memory.c | 136 ++++++++--
mm/rmap.c | 40 +--
mm/swapfile.c | 35 ++-
mm/util.c | 24 +-
tools/testing/selftests/vm/Makefile | 1 +
tools/testing/selftests/vm/gup_cow.c | 312 ++++++++++++++++++++++
tools/testing/selftests/vm/run_vmtests.sh | 16 ++
15 files changed, 1044 insertions(+), 148 deletions(-)
create mode 100644 tools/testing/selftests/vm/gup_cow.c
--
2.31.1
From: Mike Kravetz <mike.kravetz(a)oracle.com>
[ Upstream commit f5c73297181c6b3ad76537bad98eaad6d29b9333 ]
Currently, userfaultfd selftest for hugetlb as run from run_vmtests.sh
or any environment where there are 'just enough' hugetlb pages will
always fail with:
testing events (fork, remap, remove):
ERROR: UFFDIO_COPY error: -12 (errno=12, line=616)
The ENOMEM error code implies there are not enough hugetlb pages.
However, there are free hugetlb pages but they are all reserved. There
is a basic problem with the way the test allocates hugetlb pages which
has existed since the test was originally written.
Due to the way 'cleanup' was done between different phases of the test,
this issue was masked until recently. The issue was uncovered by commit
8ba6e8640844 ("userfaultfd/selftests: reinitialize test context in each
test").
For the hugetlb test, src and dst areas are allocated as PRIVATE
mappings of a hugetlb file. This means that at mmap time, pages are
reserved for the src and dst areas. At the start of event testing (and
other tests) the src area is populated which results in allocation of
huge pages to fill the area and consumption of reserves associated with
the area. Then, a child is forked to fault in the dst area. Note that
the dst area was allocated in the parent and hence the parent owns the
reserves associated with the mapping. The child has normal access to
the dst area, but can not use the reserves created/owned by the parent.
Thus, if there are no other huge pages available allocation of a page
for the dst by the child will fail.
Fix by not creating reserves for the dst area. In this way the child
can use free (non-reserved) pages.
Also, MAP_PRIVATE of a file only makes sense if you are interested in
the contents of the file before making a COW copy. The test does not do
this. So, just use MAP_ANONYMOUS | MAP_HUGETLB to create an anonymous
hugetlb mapping. There is no need to create a hugetlb file in the
non-shared case.
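A minimal sketch of the allocation described above, assuming an anonymous
private hugetlb mapping in which only the src area keeps its reserves. The
helper names private_hugetlb_flags() and alloc_private_hugetlb() are
hypothetical and do not appear in the test; the actual change is in the diff
below.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Flags for a private hugetlb area: the dst area is mapped without reserves. */
static int private_hugetlb_flags(bool is_src)
{
	int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB;

	if (!is_src)
		flags |= MAP_NORESERVE; /* child may then use free huge pages */
	return flags;
}

/* Returns the mapping, or NULL if the mapping could not be created. */
static void *alloc_private_hugetlb(size_t len, bool is_src)
{
	void *area;

	assert(len > 0);
	area = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    private_hugetlb_flags(is_src), -1, 0);
	return area == MAP_FAILED ? NULL : area;
}
```

With MAP_NORESERVE no huge pages are set aside at mmap time, so a fault in
the child is served from the free pool instead of failing on reserves owned
by the parent.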
Link: https://lkml.kernel.org/r/20211217172919.7861-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mina Almasry <almasrymina(a)google.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/vm/userfaultfd.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 60aa1a4fc69b6..81690f1737c80 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -86,7 +86,7 @@ static bool test_uffdio_minor = false;
static bool map_shared;
static int shm_fd;
-static int huge_fd;
+static int huge_fd = -1; /* only used for hugetlb_shared test */
static char *huge_fd_off0;
static unsigned long long *count_verify;
static int uffd = -1;
@@ -222,6 +222,9 @@ static void noop_alias_mapping(__u64 *start, size_t len, unsigned long offset)
static void hugetlb_release_pages(char *rel_area)
{
+ if (huge_fd == -1)
+ return;
+
if (fallocate(huge_fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
rel_area == huge_fd_off0 ? 0 : nr_pages * page_size,
nr_pages * page_size))
@@ -234,16 +237,17 @@ static void hugetlb_allocate_area(void **alloc_area)
char **alloc_area_alias;
*alloc_area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
- (map_shared ? MAP_SHARED : MAP_PRIVATE) |
- MAP_HUGETLB,
- huge_fd, *alloc_area == area_src ? 0 :
- nr_pages * page_size);
+ map_shared ? MAP_SHARED :
+ MAP_PRIVATE | MAP_HUGETLB |
+ (*alloc_area == area_src ? 0 : MAP_NORESERVE),
+ huge_fd,
+ *alloc_area == area_src ? 0 : nr_pages * page_size);
if (*alloc_area == MAP_FAILED)
err("mmap of hugetlbfs file failed");
if (map_shared) {
area_alias = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
- MAP_SHARED | MAP_HUGETLB,
+ MAP_SHARED,
huge_fd, *alloc_area == area_src ? 0 :
nr_pages * page_size);
if (area_alias == MAP_FAILED)
--
2.34.1
This series adds initial support for testing KVM RISC-V 64-bit using the
kernel selftests framework. PATCH1 and PATCH2 of this series do some
groundwork in KVM RISC-V to enable RISC-V support in the KVM selftests,
whereas the remaining patches make the required changes in the KVM
selftests.
These patches can be found in riscv_kvm_selftests_v3 branch at:
https://github.com/avpatel/linux.git
Changes since v2:
- Rebased series on Linux-5.16-rc6
- Renamed kvm_riscv_stage2_gpa_size() to kvm_riscv_stage2_gpa_bits()
in PATCH2
Changes since v1:
- Renamed kvm_sbi_ext_expevend_handler() to kvm_sbi_ext_forward_handler()
in PATCH1
- Renamed KVM_CAP_RISCV_VM_GPA_SIZE to KVM_CAP_VM_GPA_BITS in PATCH2
and PATCH4
Anup Patel (4):
RISC-V: KVM: Forward SBI experimental and vendor extensions
RISC-V: KVM: Add VM capability to allow userspace get GPA bits
KVM: selftests: Add EXTRA_CFLAGS in top-level Makefile
KVM: selftests: Add initial support for RISC-V 64-bit
arch/riscv/include/asm/kvm_host.h | 1 +
arch/riscv/kvm/mmu.c | 5 +
arch/riscv/kvm/vcpu_sbi.c | 4 +
arch/riscv/kvm/vcpu_sbi_base.c | 27 ++
arch/riscv/kvm/vm.c | 3 +
include/uapi/linux/kvm.h | 1 +
tools/testing/selftests/kvm/Makefile | 14 +-
.../testing/selftests/kvm/include/kvm_util.h | 10 +
.../selftests/kvm/include/riscv/processor.h | 135 +++++++
tools/testing/selftests/kvm/lib/guest_modes.c | 10 +
.../selftests/kvm/lib/riscv/processor.c | 362 ++++++++++++++++++
tools/testing/selftests/kvm/lib/riscv/ucall.c | 87 +++++
12 files changed, 658 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/kvm/include/riscv/processor.h
create mode 100644 tools/testing/selftests/kvm/lib/riscv/processor.c
create mode 100644 tools/testing/selftests/kvm/lib/riscv/ucall.c
--
2.25.1
These two patches improve the mixer test by checking that the default
values for enums are valid and by extending the checks to cover all the
values for multi-value controls, not just the first one. The series also
factors out the validation that values are OK for controls for future reuse.
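As a rough illustration of the enum check described above, here is a sketch
that validates every value of a multi-value enumerated control against the
number of items. It does not use alsa-lib; enum_values_valid() and its
parameters are hypothetical and are not taken from mixer-test.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if every value read from an enumerated control is a valid
 * item index, i.e. lies in [0, items). All values are checked, not just
 * the first one.
 */
static bool enum_values_valid(const unsigned int *values, size_t count,
			      unsigned int items)
{
	assert(values);
	for (size_t i = 0; i < count; i++) {
		if (values[i] >= items)
			return false;
	}
	return true;
}
```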
Mark Brown (2):
kselftest: alsa: Factor out check that values meet constraints
kselftest: alsa: Validate values read from enumerations
tools/testing/selftests/alsa/mixer-test.c | 158 ++++++++++++++--------
1 file changed, 99 insertions(+), 59 deletions(-)
base-commit: b73dad806533cad55df41a9c0349969b56d4ff7f
--
2.30.2
The XSAVE feature set supports the saving and restoring of xstate components,
which is used for process context switching. The state components include
x87 state for FPU execution environment, SSE state, AVX state and so on.
In order to ensure that XSAVE works correctly, add basic XSAVE tests for XSAVE
architecture functionality.
This patch set tests and verifies the basic functions of XSAVE in user
space; it tests the "FPU, SSE(XMM), AVX2(YMM), AVX512 opmask and PKRU parts"
xstates with the following cases:
1. In nested signal processing, each signal handler uses its own xstates, and
the xstates of the signal handler under test should not change after another
nested signal handler has completed; the xstates content of the process
should also not change after the nested signal handling is complete.
2. The xstates content of the child process should be the same as that of the
parent process. The xstates content of the process should be the same across
process switching.
These are the xstate positions for FP, XMM, Header, YMM, AVX512 opmask and PKRU.
They can be saved by the xsave instruction, and a mask controls which parts are
saved (the header is always saved):
FP (0-159 bytes)
XMM (160-415 bytes)
Reserved (416-511 bytes, SDM vol1 13.4.1)
Header_used (512-527 bytes)
Header_reserved (528-575 bytes, must be 00, otherwise xrstor will #GP)
YMM (Offset:CPUID.(EAX=0D,ECX=2).EBX Size:CPUID.(EAX=0D,ECX=2).EAX)
AVX512 opmask (Offset:CPUID.(EAX=0D,ECX=5).EBX Size:CPUID.(EAX=0D,ECX=5).EAX)
PKRU (Offset:CPUID.(EAX=0D,ECX=9).EBX Size:CPUID.(EAX=0D,ECX=9).EAX)
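The layout above could be queried from user space roughly as follows. This
is a sketch, not code from the patches: the enum constants, struct
xstate_component and xstate_component_query() are hypothetical names, and
the CPUID lookup relies on __get_cpuid_count() from the compiler's
<cpuid.h>, which is only available on x86.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Fixed offsets of the legacy region and the XSAVE header, as listed above. */
enum {
	XSAVE_FP_OFFSET  = 0,
	XSAVE_XMM_OFFSET = 160,
	XSAVE_HDR_OFFSET = 512,
	XSAVE_HDR_SIZE   = 64,
	XSAVE_EXT_OFFSET = XSAVE_HDR_OFFSET + XSAVE_HDR_SIZE, /* 576 */
};

struct xstate_component {
	uint32_t offset; /* CPUID.(EAX=0D,ECX=nr).EBX */
	uint32_t size;   /* CPUID.(EAX=0D,ECX=nr).EAX */
};

/*
 * Look up offset and size of extended state component nr
 * (2 = YMM, 5 = AVX512 opmask, 9 = PKRU).
 * Returns 0 on success, -1 if the CPU does not report the component.
 */
static int xstate_component_query(unsigned int nr, struct xstate_component *c)
{
	assert(c);
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

	if (!__get_cpuid_count(0xd, nr, &eax, &ebx, &ecx, &edx))
		return -1; /* CPUID leaf 0xD not available */
	if (eax == 0)
		return -1; /* component not supported */
	c->size = eax;
	c->offset = ebx;
	return 0;
#else
	(void)nr;
	return -1; /* not an x86 build */
#endif
}
```

A caller would typically check the return value first, since components such
as PKRU are not present on every CPU.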
The test uses the syscall() function instead of fork(), because the libc
syscall() function restores the xstates after the system call if the xstates
were changed within it.
Populating the xstates avoids libc where possible, and every key test action
avoids libc, except for syscall(), until the test has failed or finished.
In order to prevent GCC from generating any FP code by mistake, the
"-mno-sse -mno-mmx -mno-sse2 -mno-avx -mno-pku" compiler parameters are added
to avoid false failures. Thanks to Dave Hansen for the suggestion.
This series introduces only the most basic XSAVE tests. In the future, the
intention is to continue expanding the scope of these selftests to include
more xstates and kernel XSAVE-related functionality tests.
========
- Change from v6 to v7:
- Added the error number and error description of the reason for the
failure, thanks to Shuah Khan for the suggestion.
- Added a description of what these tests are doing in the head comments.
- Added changes update in the head comments.
- Added description of the purpose of the function, thanks to Shuah Khan.
- Change from v5 to v6:
- In order to prevent GCC from generating any FP code by mistake, the
"-mno-sse -mno-mmx -mno-sse2 -mno-avx -mno-pku" compiler parameters were
added, following the parameters used for compiling the x86 kernel. Thanks
to Dave Hansen for the suggestion.
- Removed the use of "kselftest.h", because kselftest.h included <stdlib.h>,
and "stdlib.h" would use sse instructions in its libc, and this *XSAVE*
test needed to be compiled without libc sse instructions(-mno-sse).
- Improved the description in commit header, thanks to Chen Yu for the suggestion.
- Because the test code could not use the builtin xsave64 in libc without sse,
added an xsave function implemented directly with the instruction.
- Every key test action would not use libc(like printf) except syscall until
it has failed or finished. If it failed, it would print the reason for the failure.
- Used __cpuid_count() instead of native_cpuid(), because __cpuid_count()
was a macro definition function with one instruction in libc and did not
change xstate. Thanks to Reinette Chatre and Shuah Khan.
https://lore.kernel.org/linux-sgx/8b7c98f4-f050-bc1c-5699-fa598ecc66a2@linu…
- Change from v4 to v5:
- Moved code files into tools/testing/selftests/x86.
- Deleted the xsave instruction test, because it is not related to the kernel.
- Improved case description.
- Added AVX512 opmask change and related XSAVE content verification.
- Added PKRU part xstate test into instruction and signal handling test.
- Added XSAVE process switch test for FPU, AVX2, AVX512 opmask and PKRU part.
- Change from v3 to v4:
- Improve the comment in patch 1.
- Change from v2 to v3:
- Improve the description of patch 2 git log.
- Change from v1 to v2:
- Improve the cover-letter. Thanks Dave Hansen's suggestion.
Pengfei Xu (2):
selftests/x86: add xsave test related to nested signal handling
selftests/x86: add xsave test related to process switching
tools/testing/selftests/x86/Makefile | 4 +-
tools/testing/selftests/x86/xsave_common.h | 397 ++++++++++++++++++
tools/testing/selftests/x86/xsave_fork_test.c | 148 +++++++
.../selftests/x86/xsave_signal_handle.c | 189 +++++++++
4 files changed, 737 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/x86/xsave_common.h
create mode 100644 tools/testing/selftests/x86/xsave_fork_test.c
create mode 100644 tools/testing/selftests/x86/xsave_signal_handle.c
--
2.31.1
The KUnit documentation was not very organized. There was little
information related to KUnit architecture and the importance of unit
testing.
Add some new pages, expand and reorganize the existing documentation.
Reword pages to make information and style more consistent.
Changes since v5:
https://lore.kernel.org/linux-kselftest/20211217043716.794289-1-sharinder@g…
-- Forgot to add the new .svg diagram file to git.
Changes since v4:
https://lore.kernel.org/linux-kselftest/20211216055958.634097-1-sharinder@g…
-- Replaced kunit_suitememorydiagram.png with kunit_suitememorydiagram.svg
Changes since v3:
https://lore.kernel.org/linux-kselftest/20211210052812.1998578-1-sharinder@…
--Reworded sentences as per comments
--Replaced Elixir links with kernel.org links or kernel-doc references
Changes since v2:
https://lore.kernel.org/linux-kselftest/20211207054019.1455054-1-sharinder@…
--Reworded sentences as per comments
--Expanded the explanation in usage.rst for accessing the current test example
--Standardized on US English in style.rst
Changes since v1:
https://lore.kernel.org/linux-kselftest/20211203042437.740255-1-sharinder@g…
--Fixed spelling mistakes
--Restored paragraph about kunit_tool introduction
--Added note about CONFIG_KUNIT_ALL_TESTS (Thanks Tim Bird for review
comments)
-- Miscellaneous changes
Harinder Singh (7):
Documentation: KUnit: Rewrite main page
Documentation: KUnit: Rewrite getting started
Documentation: KUnit: Added KUnit Architecture
Documentation: kunit: Reorganize documentation related to running
tests
Documentation: KUnit: Rework writing page to focus on writing tests
Documentation: KUnit: Restyle Test Style and Nomenclature page
Documentation: KUnit: Restyled Frequently Asked Questions
.../dev-tools/kunit/architecture.rst | 204 +++++++
Documentation/dev-tools/kunit/faq.rst | 73 ++-
Documentation/dev-tools/kunit/index.rst | 172 +++---
.../kunit/kunit_suitememorydiagram.svg | 81 +++
Documentation/dev-tools/kunit/run_manual.rst | 57 ++
Documentation/dev-tools/kunit/run_wrapper.rst | 247 ++++++++
Documentation/dev-tools/kunit/start.rst | 198 +++---
Documentation/dev-tools/kunit/style.rst | 105 ++--
Documentation/dev-tools/kunit/usage.rst | 578 ++++++++----------
9 files changed, 1128 insertions(+), 587 deletions(-)
create mode 100644 Documentation/dev-tools/kunit/architecture.rst
create mode 100644 Documentation/dev-tools/kunit/kunit_suitememorydiagram.svg
create mode 100644 Documentation/dev-tools/kunit/run_manual.rst
create mode 100644 Documentation/dev-tools/kunit/run_wrapper.rst
base-commit: 4c388a8e740d3235a194f330c8ef327deef710f6
--
2.34.1.173.g76aa8bc2d0-goog
This is similar to TCP-MD5 in functionality but it's sufficiently
different that packet formats and interfaces are incompatible.
Compared to TCP-MD5 more algorithms are supported and multiple keys
can be used on the same connection but there is still no negotiation
mechanism.
Expected use-case is protecting long-duration BGP/LDP connections
between routers using pre-shared keys. The goal of this series is to
allow routers using the Linux TCP stack to interoperate with vendors
such as Cisco and Juniper.
Both algorithms described in RFC5926 are implemented but the code is not
very easily extensible beyond that. In particular there are several code
paths making stack allocations based on RFC5926 maximum, those would
have to be increased. Support for arbitrary algorithms was requested
in reply to previous posts but I believe there is no real use case for
that.
The current implementation is somewhat loose regarding configuration:
* Overlapping MKTs can be configured despite what RFC5925 says
* Current key can be deleted. RFC says this shouldn't be allowed but
enforcing this belongs at an admin shell rather than in the kernel.
* If multiple keys are valid for a destination the kernel picks one
in an unpredictable manner (this can be overridden).
These conditions could be tightened but it is not clear the kernel
should spend effort to prevent misconfiguration from userspace.
The major change in this version is switching from per-socket to
per-netns keys. This is quite a large change and means that keys can
leak if not explicitly removed but the expected usecase is long-lived
routing daemon anyway. The fact that key management no longer needs
to be duplicated on listen sockets and active connections actually
simplifies them.
The TCP_AUTHOPT option still needs to be enabled for each individual
socket in order for AO keys to take effect.
Here are some known flaws and limitations:
* Crypto API is used with buffers on the stack and inside struct sock,
this might not work on all arches. I'm currently only testing x64 VMs.
* Interaction with TCP-MD5 not tested in all corners.
* Interaction with FASTOPEN not tested and unlikely to work because of
sequence number assumptions for syn/ack.
* No way to limit keys on a per-port basis (used to be implicit with
per-socket keys).
* Not clear if crypto_ahash_setkey might sleep. If some implementations
do that then maybe they could be excluded through alloc flags.
* Traffic key is not cached (reducing performance)
* No caching or hashing for key lookups so this will scale poorly with
many keys
There is relatively little code sharing with the TCP_MD5SIG feature and
earlier versions shared even less. Unlike MD5 the AO feature is kept
separate from the rest of the TCP code and reusing code would require
many unrelated cleanup changes.
I'm not convinced that "authopt" is a particularly good naming convention,
this name is a personal invention that does not appear anywhere else.
The RFC calls this "tcp-ao". Perhaps TCP_AOSIG would be a better name
and it would also make the close connection to TCP_MD5SIG more visible?
Some testing support is included in nettest and fcnal-test.sh, similar
to the current level of tcp-md5 testing.
A more elaborate test suite using pytest and scapy is available out of
tree: https://github.com/cdleonard/tcp-authopt-test That test suite is
much larger than the kernel code and did not receive many comments in
previous posts so I will attempt to push it separately (if at all).
Changes for frr (old): https://github.com/FRRouting/frr/pull/9442
That PR was made early for ABI feedback, it has many issues.
Changes for yabgp (old): https://github.com/cdleonard/yabgp/commits/tcp_authopt
This can be used for easy interoperability testing with cisco/juniper/etc.
Would need updates for global keys to avoid leaks
Changes since PATCH v3:
* Made keys global (per-netns rather than per-sock).
* Add /proc/net/tcp_authopt with a table of keys (not sockets).
* Fix part of the shash/ahash conversion having slipped from patch 3 to patch 5
* Fix tcp_parse_sig_options assigning NULL incorrectly when both MD5 and AO
are disabled (kernel build robot)
* Fix sparse endianness warnings in prefix match (kernel build robot)
* Fix several incorrect RCU annotations reported by sparse (kernel build robot)
Link: https://lore.kernel.org/netdev/cover.1638962992.git.cdleonard@gmail.com/
Changes since PATCH v2:
* Protect tcp_authopt_alg_get/put_tfm with local_bh_disable instead of
preempt_disable. This caused signature corruption when send path executing
with BH enabled was interrupted by recv.
* Fix accepted keyids not configured locally as "unexpected". If any key
is configured that matches the peer then traffic MUST be signed.
* Fix issues related to sne rollover during handshake itself. (Francesco)
* Implement and test prefixlen (David)
* Replace shash with ahash and reuse some of the MD5 code (Dmitry)
* Parse md5+ao options only once in the same function (Dmitry)
* Pass tcp_authopt_info into inbound check path, this avoids second rcu
dereference for same packet.
* Pass tcp_request_socket into inbound check path instead of just listen
socket. This is required for SNE rollover during handshake and clarifies
ISN handling.
* Do not allow disabling via sysctl after enabling once, this is difficult
to support well (David)
* Verbose check for sysctl_tcp_authopt (Dmitry)
* Use netif_index_is_l3_master (David)
* Cleanup ipvx_addr_match (David)
* Add a #define tcp_authopt_needed to wrap static key usage because it looks
nicer.
* Replace rcu_read_lock with rcu_dereference_protected in SNE updates (Eric)
* Remove test suite
Link: https://lore.kernel.org/netdev/cover.1635784253.git.cdleonard@gmail.com/
Changes since PATCH v1:
* Implement Sequence Number Extension
* Implement l3index for vrf: TCP_AUTHOPT_KEY_IFINDEX as equivalent of
TCP_MD5SIG_FLAG_IFINDEX
* Expand TCP-AO tests in fcnal-test.sh to near-parity with md5.
* Show addr/port on failure similar to md5
* Remove tox dependency from test suite (create venv directly)
* Switch default pytest output format to TAP (kselftest standard)
* Fix _copy_from_sockptr_tolerant stack corruption on short sockopts.
This was covered in test but error was invisible without STACKPROTECTOR=y
* Fix sysctl_tcp_authopt check in tcp_get_authopt_val before memset. This
was harmless because error code is checked in getsockopt anyway.
* Fix dropping md5 packets on all sockets with AO enabled
* Fix checking (key->recv_id & TCP_AUTHOPT_KEY_ADDR_BIND) instead of
key->flags in tcp_authopt_key_match_exact
* Fix PATCH 1/19 not compiling due to missing "int err" declaration
* Add ratelimited message for AO and MD5 both present
* Export all symbols required by CONFIG_IPV6=m (again)
* Fix compilation with CONFIG_TCP_AUTHOPT=y CONFIG_TCP_MD5SIG=n
* Fix checkpatch issues
* Pass -rrequirements.txt to tox to avoid dependency variation.
Link: https://lore.kernel.org/netdev/cover.1632240523.git.cdleonard@gmail.com/
Changes since RFCv3:
* Implement TCP_AUTHOPT handling for timewait and reset replies. Write
tests to execute these paths by injecting packets with scapy
* Handle combining md5 and authopt: if both are configured use authopt.
* Fix locking issues around send_key, introduced in one of the later patches.
* Handle IPv4-mapped-IPv6 addresses: it used to be that an ipv4 SYN sent
to an ipv6 socket with TCP-AO triggered WARN
* Implement un-namespaced sysctl that disables this feature by default
* Allocate new key before removing any old one in setsockopt (Dmitry)
* Remove tcp_authopt_key_info.local_id because it's no longer used (Dmitry)
* Propagate errors from TCP_AUTHOPT getsockopt (Dmitry)
* Fix no-longer-correct TCP_AUTHOPT_KEY_DEL docs (Dmitry)
* Simplify crypto allocation (Eric)
* Use kzalloc instead of __GFP_ZERO (Eric)
* Add static_key_false tcp_authopt_needed (Eric)
* Clear authopt_info copied from oldsk in __tcp_authopt_openreq (Eric)
* Replace memcmp in ipv4 and ipv6 addr comparisons (Eric)
* Export symbols for CONFIG_IPV6=m (kernel test robot)
* Mark more functions static (kernel test robot)
* Fix build with CONFIG_PROVE_RCU_LIST=y (kernel test robot)
Link: https://lore.kernel.org/netdev/cover.1629840814.git.cdleonard@gmail.com/
Changes since RFCv2:
* Removed local_id from ABI and match on send_id/recv_id/addr
* Add all relevant out-of-tree tests to tools/testing/selftests
* Return an error instead of ignoring unknown flags; hopefully this makes
it easier to extend.
* Check sk_family before __tcp_authopt_info_get_or_create in tcp_set_authopt_key
* Use sock_owned_by_me instead of WARN_ON(!lockdep_sock_is_held(sk))
* Fix some intermediate build failures reported by kbuild robot
* Improve documentation
Link: https://lore.kernel.org/netdev/cover.1628544649.git.cdleonard@gmail.com/
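The "return an error instead of ignoring unknown flags" point can be illustrated with a minimal sketch. The flag names and values below are hypothetical and are not the real uapi definitions; only the pattern (reject any bit outside the known mask) is the point.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only; not the real uapi values. */
#define EXAMPLE_FLAG_A		(1u << 0)
#define EXAMPLE_FLAG_B		(1u << 1)
#define EXAMPLE_KNOWN_FLAGS	(EXAMPLE_FLAG_A | EXAMPLE_FLAG_B)

static_assert((EXAMPLE_FLAG_A & EXAMPLE_FLAG_B) == 0,
	      "flag bits must not overlap");

/* Return 0 if only known bits are set, -EINVAL otherwise. */
static int example_check_flags(uint32_t flags)
{
	if (flags & ~EXAMPLE_KNOWN_FLAGS)
		return -EINVAL;
	return 0;
}
```

Because unknown bits fail today, a later revision can assign them a meaning without silently changing the behaviour of existing callers.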
Changes since RFC:
* Split into per-topic commits for ease of review. The intermediate
commits compile with a few "unused function" warnings and don't do
anything useful by themselves.
* Add ABI documentation including kernel-doc on uapi
* Fix lockdep warnings from crypto by creating pools with one shash for
each cpu
* Accept short options to setsockopt by padding with zeros; this
approach allows increasing the size of the structs in the future.
* Support for aes-128-cmac-96
* Support for binding addresses to keys in a way similar to old tcp_md5
* Add support for retrieving received keyid/rnextkeyid and controlling
the keyid/rnextkeyid being sent.
Link: https://lore.kernel.org/netdev/01383a8751e97ef826ef2adf93bfde3a08195a43.162…
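The "accept short options by padding with zeros" approach mentioned in the RFC changes can be sketched as follows. This is a userspace illustration under stated assumptions, not the kernel code: struct example_opt and example_copy_opt() are hypothetical names, and rejecting oversized buffers is a choice made for this sketch only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option struct; the field names are illustrative only. */
struct example_opt {
	unsigned int flags;
	unsigned char send_id;
	unsigned char recv_id;
	unsigned char added_later[8];	/* stands in for a later ABI addition */
};

/*
 * Copy a caller-supplied buffer of optlen bytes into *dst.
 * Shorter buffers are accepted: the missing trailing bytes read as zero.
 * Longer buffers are rejected, which is an assumption of this sketch.
 */
static int example_copy_opt(struct example_opt *dst, const void *src,
			    size_t optlen)
{
	assert(dst != NULL && src != NULL);
	if (optlen > sizeof(*dst))
		return -EINVAL;
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, optlen);
	return 0;
}
```

A caller built against an older, shorter struct passes fewer bytes, and the fields it does not know about are seen as zero, which is why the struct can grow later.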
Leonard Crestez (19):
tcp: authopt: Initial support and key management
docs: Add user documentation for tcp_authopt
tcp: authopt: Add crypto initialization
tcp: md5: Refactor tcp_sig_hash_skb_data for AO
tcp: authopt: Compute packet signatures
tcp: authopt: Hook into tcp core
tcp: authopt: Disable via sysctl by default
tcp: authopt: Implement Sequence Number Extension
tcp: ipv6: Add AO signing for tcp_v6_send_response
tcp: authopt: Add support for signing skb-less replies
tcp: ipv4: Add AO signing for skb-less replies
tcp: authopt: Add key selection controls
tcp: authopt: Add initial l3index support
tcp: authopt: Add NOSEND/NORECV flags
tcp: authopt: Add prefixlen support
tcp: authopt: Add /proc/net/tcp_authopt listing all keys
selftests: nettest: Rename md5_prefix to key_addr_prefix
selftests: nettest: Initial tcp_authopt support
selftests: net/fcnal: Initial tcp_authopt support
Documentation/networking/index.rst | 1 +
Documentation/networking/ip-sysctl.rst | 6 +
Documentation/networking/tcp_authopt.rst | 88 +
include/linux/tcp.h | 9 +
include/net/net_namespace.h | 4 +
include/net/netns/tcp_authopt.h | 12 +
include/net/tcp.h | 27 +-
include/net/tcp_authopt.h | 323 ++++
include/uapi/linux/snmp.h | 1 +
include/uapi/linux/tcp.h | 137 ++
net/ipv4/Kconfig | 14 +
net/ipv4/Makefile | 1 +
net/ipv4/proc.c | 1 +
net/ipv4/sysctl_net_ipv4.c | 39 +
net/ipv4/tcp.c | 68 +-
net/ipv4/tcp_authopt.c | 1799 +++++++++++++++++++++
net/ipv4/tcp_input.c | 41 +-
net/ipv4/tcp_ipv4.c | 138 +-
net/ipv4/tcp_minisocks.c | 12 +
net/ipv4/tcp_output.c | 86 +-
net/ipv6/tcp_ipv6.c | 110 +-
tools/testing/selftests/net/fcnal-test.sh | 329 +++-
tools/testing/selftests/net/nettest.c | 204 ++-
23 files changed, 3364 insertions(+), 86 deletions(-)
create mode 100644 Documentation/networking/tcp_authopt.rst
create mode 100644 include/net/netns/tcp_authopt.h
create mode 100644 include/net/tcp_authopt.h
create mode 100644 net/ipv4/tcp_authopt.c
base-commit: f4f2970dfd87e5132c436e6125148914596a9863
--
2.25.1
Some distros are now storing the Kconfig in /lib/modules/`uname -r`/config.
Check there first before attempting to read it from /proc or extract it
from the kernel image.
Fix "ignored null byte in input" warning.
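The Kconfig lookup order described above (the /lib/modules location first, then the other sources) can be sketched as below. This is an illustration only: the selftest itself is a shell script, first_readable() and modules_config_path() are hypothetical helper names, and extracting the config from the kernel image is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/utsname.h>
#include <unistd.h>

/* Return the first readable path in the list, or NULL if none is readable. */
static const char *first_readable(const char *const *paths, size_t n)
{
	assert(paths != NULL);
	for (size_t i = 0; i < n; i++)
		if (paths[i] && access(paths[i], R_OK) == 0)
			return paths[i];
	return NULL;
}

/* Build "/lib/modules/<release>/config" for the running kernel; 0 on success. */
static int modules_config_path(char *buf, size_t len)
{
	struct utsname u;
	int n;

	if (uname(&u) != 0)
		return -1;
	n = snprintf(buf, len, "/lib/modules/%s/config", u.release);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would build the /lib/modules path, pass it to first_readable() ahead of a /proc candidate (assumed here to be /proc/config.gz, which the text above does not name), and fall back to extracting the config from the kernel image only if NULL is returned.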
Mimi Zohar (2):
selftest/kexec: fix "ignored null byte in input" warning
selftests/kexec: update searching for the Kconfig
tools/testing/selftests/kexec/kexec_common_lib.sh | 13 +++++++++----
.../testing/selftests/kexec/test_kexec_file_load.sh | 5 +++--
2 files changed, 12 insertions(+), 6 deletions(-)
--
2.27.0
Synchronous Ethernet networks use a physical layer clock to syntonize
the frequency across different network elements.
A basic SyncE node as defined in ITU-T G.8264 consists of an Ethernet
Equipment Clock (EEC) and has the ability to synchronize to reference
frequency sources.
This patch series is a prerequisite for EEC object and adds ability
to enable recovered clocks in the physical layer of the netdev object.
Recovered clocks can be used as one of the reference signals by the EEC.
Further work is required to add the DPLL subsystem, link it to the
netdev object and create API to read the EEC DPLL state.
v6:
- adapt to removal of 'enum ice_status' in net-next
v5:
- rewritten the documentation
- fixed doxygen headers
v4:
- Dropped EEC_STATE reporting (TBD: DPLL subsystem)
- moved recovered clock configuration to ethtool netlink
v3:
- remove RTM_GETRCLKRANGE
- return state of all possible pins in the RTM_GETRCLKSTATE
- clarify documentation
v2:
- improved documentation
- fixed kdoc warning
RFC history:
v2:
- removed whitespace changes
- fix issues reported by test robot
v3:
- Changed naming from SyncE to EEC
- Clarify cover letter and commit message for patch 1
v4:
- Removed sync_source and pin_idx info
- Changed one structure to attributes
- Added EEC_SRC_PORT flag to indicate that the EEC is synchronized
to the recovered clock of a port that returns the state
v5:
- add EEC source as an optional attribute
- implement support for recovered clocks
- align states returned by EEC to ITU-T G.781
v6:
- fix EEC clock state reporting
- add documentation
- fix descriptions in code comments
Arkadiusz Kubalewski (4):
ice: add support detecting features based on netlist
ethtool: Add ability to configure recovered clock for SyncE feature
ice: add support for monitoring SyncE DPLL state
ice: add support for recovered clocks
Documentation/networking/ethtool-netlink.rst | 62 ++++
drivers/net/ethernet/intel/ice/ice.h | 7 +
.../net/ethernet/intel/ice/ice_adminq_cmd.h | 70 ++++-
drivers/net/ethernet/intel/ice/ice_common.c | 224 +++++++++++++++
drivers/net/ethernet/intel/ice/ice_common.h | 20 ++
drivers/net/ethernet/intel/ice/ice_devids.h | 3 +
drivers/net/ethernet/intel/ice/ice_ethtool.c | 96 +++++++
drivers/net/ethernet/intel/ice/ice_lib.c | 6 +-
drivers/net/ethernet/intel/ice/ice_ptp.c | 35 +++
drivers/net/ethernet/intel/ice/ice_ptp_hw.c | 49 ++++
drivers/net/ethernet/intel/ice/ice_ptp_hw.h | 36 +++
drivers/net/ethernet/intel/ice/ice_type.h | 1 +
include/linux/ethtool.h | 9 +
include/uapi/linux/ethtool_netlink.h | 21 ++
net/ethtool/Makefile | 3 +-
net/ethtool/netlink.c | 20 ++
net/ethtool/netlink.h | 4 +
net/ethtool/synce.c | 267 ++++++++++++++++++
18 files changed, 930 insertions(+), 3 deletions(-)
create mode 100644 net/ethtool/synce.c
--
2.31.1
This series adds initial support for testing KVM RISC-V 64-bit using the
kernel selftests framework. PATCH1 and PATCH2 of this series do some
groundwork in KVM RISC-V to implement RISC-V support in the KVM
selftests, whereas the remaining patches make the required changes in
the KVM selftests.
These patches can be found in riscv_kvm_selftests_v2 branch at:
https://github.com/avpatel/linux.git
Changes since v1:
- Renamed kvm_sbi_ext_expevend_handler() to kvm_sbi_ext_forward_handler()
in PATCH1
- Renamed KVM_CAP_RISCV_VM_GPA_SIZE to KVM_CAP_VM_GPA_BITS in PATCH2
and PATCH4
Anup Patel (4):
RISC-V: KVM: Forward SBI experimental and vendor extensions
RISC-V: KVM: Add VM capability to allow userspace get GPA bits
KVM: selftests: Add EXTRA_CFLAGS in top-level Makefile
KVM: selftests: Add initial support for RISC-V 64-bit
arch/riscv/include/asm/kvm_host.h | 1 +
arch/riscv/kvm/mmu.c | 5 +
arch/riscv/kvm/vcpu_sbi.c | 4 +
arch/riscv/kvm/vcpu_sbi_base.c | 27 ++
arch/riscv/kvm/vm.c | 3 +
include/uapi/linux/kvm.h | 1 +
tools/testing/selftests/kvm/Makefile | 14 +-
.../testing/selftests/kvm/include/kvm_util.h | 10 +
.../selftests/kvm/include/riscv/processor.h | 135 +++++++
tools/testing/selftests/kvm/lib/guest_modes.c | 10 +
.../selftests/kvm/lib/riscv/processor.c | 362 ++++++++++++++++++
tools/testing/selftests/kvm/lib/riscv/ucall.c | 87 +++++
12 files changed, 658 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/kvm/include/riscv/processor.h
create mode 100644 tools/testing/selftests/kvm/lib/riscv/processor.c
create mode 100644 tools/testing/selftests/kvm/lib/riscv/ucall.c
--
2.25.1
The XSAVE feature set supports the saving and restoring of xstate components,
which is used for process context switching. The state components include
x87 state for FPU execution environment, SSE state, AVX state and so on.
In order to ensure that XSAVE works correctly, add XSAVE basic test for XSAVE
architecture functionality.
This patch set tests and verifies the basic functions of XSAVE in user
space; it tests "FPU, SSE(XMM), AVX2(YMM), AVX512 opmask and PKRU parts"
xstates with following cases:
1. In nested signal processing, each signal handler uses its own xstates.
The xstates of the signal handler under test should not change after
another nested signal handler completes, and the xstates content of the
process should not change after the nested signal handling is complete.
2. The xstates content of the child process should be the same as that of the
parent process. The xstates content of the process should be the same across
process switching.
These are the xstate positions for FP, XMM, Header, YMM, AVX512 opmask and PKRU.
They can be saved by the xsave instruction, and a mask controls which parts are
saved (the Header is always saved):
FP (0 - 159 bytes)
XMM (160-415 bytes)
Reserved (416-511 bytes SDM vol1 13.4.1)
Header_used (512-527 bytes)
Header_reserved (528-575 bytes must be 00, otherwise xrstor will #GP)
YMM (Offset:CPUID.(EAX=0D,ECX=2).EBX Size:CPUID(EAX=0D,ECX=2).EAX)
AVX512 opmask (Offset:CPUID.(EAX=0D,ECX=5).EBX Size:CPUID(EAX=0D,ECX=5).EAX)
PKRU (Offset:CPUID.(EAX=0D,ECX=9).EBX Size:CPUID(EAX=0D,ECX=9).EAX)
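The fixed part of this layout can be written down as a small lookup, sketched below. The byte ranges are taken from the list above; the enum, macro and function names are hypothetical, and the extended components (YMM, AVX512 opmask, PKRU) are lumped together because their offsets must be read from CPUID at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Regions of the XSAVE area, named after the layout listed above. */
enum xsave_region {
	XR_FP,
	XR_XMM,
	XR_RESERVED,
	XR_HEADER_USED,
	XR_HEADER_RESERVED,
	XR_EXTENDED,	/* YMM, AVX512 opmask, PKRU: offsets come from CPUID */
};

/* First byte of each region, in bytes from the start of the XSAVE area. */
#define XR_XMM_START			160
#define XR_RESERVED_START		416
#define XR_HEADER_USED_START		512
#define XR_HEADER_RESERVED_START	528
#define XR_EXTENDED_START		576

static_assert(XR_EXTENDED_START - XR_HEADER_USED_START == 64,
	      "header (used + reserved) spans 64 bytes");

/* Map a byte offset within the XSAVE area to the region that contains it. */
static enum xsave_region xsave_region_of(size_t off)
{
	if (off < XR_XMM_START)
		return XR_FP;
	if (off < XR_RESERVED_START)
		return XR_XMM;
	if (off < XR_HEADER_USED_START)
		return XR_RESERVED;
	if (off < XR_HEADER_RESERVED_START)
		return XR_HEADER_USED;
	if (off < XR_EXTENDED_START)
		return XR_HEADER_RESERVED;
	return XR_EXTENDED;
}
```

A test that compares two saved buffers could use such a lookup to report which component differs, rather than only a raw byte offset.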
The test uses the libc syscall() function instead of fork(), because
syscall() restores the xstates after the system call if they were changed
inside the libc function.
Populating the xstates avoids libc where possible, and every key test action
avoids libc, except for syscall(), until the test has failed or finished.
In order to prevent GCC from generating any FP code by mistake, the
"-mno-sse -mno-mmx -mno-sse2 -mno-avx -mno-pku" compiler parameters are added
to avoid false failures. Thanks to Dave Hansen for the suggestion.
This series introduces only the most basic XSAVE tests. In the future, the
intention is to continue expanding the scope of these selftests to include
more xstates and kernel XSAVE-related functionality tests.
========
- Change from v5 to v6:
- In order to prevent GCC from generating any FP code by mistake, the
"-mno-sse -mno-mmx -mno-sse2 -mno-avx -mno-pku" compiler parameters were
added; they follow the parameters used for compiling the x86 kernel. Thanks
to Dave Hansen for the suggestion.
- Removed the use of "kselftest.h", because kselftest.h included <stdlib.h>,
and "stdlib.h" would use sse instructions in its libc, and this *XSAVE*
test needed to be compiled without libc sse instructions (-mno-sse).
- Improved the description in the commit header, thanks to Chen Yu's suggestion.
- Because the test code could not use the builtin xsave64 in libc without sse,
added an xsave function that issues the instruction directly.
- Every key test action does not use libc (like printf), except for syscall,
until it has failed or finished. If it fails, it prints the reason.
- Used __cpuid_count() instead of native_cpuid(), because __cpuid_count()
is a macro in libc that expands to one instruction and does not
change xstate. Thanks Chatre Reinette, Shuah.
https://lore.kernel.org/linux-sgx/8b7c98f4-f050-bc1c-5699-fa598ecc66a2@linu…
- Change from v4 to v5:
- Moved code files into tools/testing/selftests/x86.
- Deleted the xsave instruction test, because it's not related to the kernel.
- Improved case description.
- Added AVX512 opmask change and related XSAVE content verification.
- Added PKRU part xstate test into instruction and signal handling test.
- Added XSAVE process switch test for FPU, AVX2, AVX512 opmask and PKRU part.
- Change from v3 to v4:
- Improve the comment in patch 1.
- Change from v2 to v3:
- Improve the description of patch 2 git log.
- Change from v1 to v2:
- Improve the cover letter. Thanks to Dave Hansen for the suggestion.
Pengfei Xu (2):
selftests/x86: add xsave test related to nested signal handling
selftests/x86: add xsave test related to process switching
tools/testing/selftests/x86/Makefile | 4 +-
tools/testing/selftests/x86/xsave_common.h | 380 ++++++++++++++++++
tools/testing/selftests/x86/xsave_fork_test.c | 117 ++++++
.../selftests/x86/xsave_signal_handle.c | 151 +++++++
4 files changed, 651 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/x86/xsave_common.h
create mode 100644 tools/testing/selftests/x86/xsave_fork_test.c
create mode 100644 tools/testing/selftests/x86/xsave_signal_handle.c
--
2.31.1