This series aims to clarify the behavior of the KVM_GET_EMULATED_CPUID
ioctl, and fix a corner case where -E2BIG is returned when
the nent field of struct kvm_cpuid2 is matching the amount of
emulated entries that kvm returns.
Patch 1 proposes the nent field fix to cpuid.c,
patch 2 updates the ioctl documentation accordingly and
patches 3 and 4 extend the x86_64/get_cpuid_test.c selftest to check
the intended behavior of KVM_GET_EMULATED_CPUID.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit(a)redhat.com>
---
v3:
- clearer commit message and problem explanation
- pre-initialize the stack variable 'entry' in __do_cpuid_func_emulated
so that the various eax/ebx/ecx are initialized if not set by func.
Emanuele Giuseppe Esposito (4):
KVM: x86: Fix a spurious -E2BIG in KVM_GET_EMULATED_CPUID
Documentation: KVM: update KVM_GET_EMULATED_CPUID ioctl description
selftests: add kvm_get_emulated_cpuid to processor.h
selftests: KVM: extend get_cpuid_test to include
KVM_GET_EMULATED_CPUID
Documentation/virt/kvm/api.rst | 10 +--
arch/x86/kvm/cpuid.c | 33 ++++---
.../selftests/kvm/include/x86_64/processor.h | 1 +
.../selftests/kvm/lib/x86_64/processor.c | 33 +++++++
.../selftests/kvm/x86_64/get_cpuid_test.c | 90 ++++++++++++++++++-
5 files changed, 142 insertions(+), 25 deletions(-)
--
2.30.2
This patchset introduces batched operations for the per-cpu variant of
the array map.
It also introduces a standard way to define per-cpu values via the
'BPF_PERCPU_TYPE()' macro, which handles the alignment transparently.
This was already implemented in the selftests and was merely refactored
out to libbpf, with some simplifications for reuse.
The tests were updated to reflect all the new changes.
Pedro Tammela (3):
bpf: add batched ops support for percpu array
libbpf: selftests: refactor 'BPF_PERCPU_TYPE()' and 'bpf_percpu()'
macros
bpf: selftests: update array map tests for per-cpu batched ops
kernel/bpf/arraymap.c | 2 +
tools/lib/bpf/bpf.h | 10 ++
tools/testing/selftests/bpf/bpf_util.h | 7 --
.../bpf/map_tests/array_map_batch_ops.c | 114 +++++++++++++-----
.../bpf/map_tests/htab_map_batch_ops.c | 48 ++++----
.../selftests/bpf/prog_tests/map_init.c | 5 +-
tools/testing/selftests/bpf/test_maps.c | 16 +--
7 files changed, 133 insertions(+), 69 deletions(-)
--
2.25.1
Changelog
RFC v1-->v2
The timer based test produces run to run variance on some intel based
systems that sport a mechansim of "C-state pre-wake" which can
pre-wake a CPU from an idle state when timers are armed.
Hence invoking the timer tests is now parameterized for systems and
architectures that don't support pre-wakeup logic and need granular
timer measurements along with IPI results.
This RFC does not yet support treating of CPU 0s idle states differently
especially as reported on Intel systems. More understanding is needed
on systems to determine if only CPU 0 is treated differently of if they
are more CPUs that cannot have its idle state properties changed.
RFC v1: https://lkml.org/lkml/2021/3/15/492
---
A kernel module + userspace driver to estimate the wakeup latency
caused by going into stop states. The motivation behind this program is
to find significant deviations behind advertised latency and residency
values.
The patchset measures latencies for two kinds of events. IPIs and Timers
As this is a software-only mechanism, there will additional latencies of
the kernel-firmware-hardware interactions. To account for that, the
program also measures a baseline latency on a 100 percent loaded CPU
and the latencies achieved must be in view relative to that.
To achieve this, we introduce a kernel module and expose its control
knobs through the debugfs interface that the selftests can engage with.
The kernel module provides the following interfaces within
/sys/kernel/debug/latency_test/ for,
IPI test:
ipi_cpu_dest = Destination CPU for the IPI
ipi_cpu_src = Origin of the IPI
ipi_latency_ns = Measured latency time in ns
Timeout test:
timeout_cpu_src = CPU on which the timer to be queued
timeout_expected_ns = Timer duration
timeout_diff_ns = Difference of actual duration vs expected timer
Sample output on a POWER9 system is as follows:
# --IPI Latency Test---
# Baseline Average IPI latency(ns): 3114
# Observed Average IPI latency(ns) - State0: 3265
# Observed Average IPI latency(ns) - State1: 3507
# Observed Average IPI latency(ns) - State2: 3739
# Observed Average IPI latency(ns) - State3: 3807
# Observed Average IPI latency(ns) - State4: 17070
# Observed Average IPI latency(ns) - State5: 1038174
# Observed Average IPI latency(ns) - State6: 1068784
#
# --Timeout Latency Test--
# Baseline Average timeout diff(ns): 1420
# Observed Average timeout diff(ns) - State0: 1640
# Observed Average timeout diff(ns) - State1: 1764
# Observed Average timeout diff(ns) - State2: 1715
# Observed Average timeout diff(ns) - State3: 1845
# Observed Average timeout diff(ns) - State4: 16581
# Observed Average timeout diff(ns) - State5: 939977
# Observed Average timeout diff(ns) - State6: 1073024
Things to keep in mind:
1. This kernel module + bash driver does not guarantee idleness on a
core when the IPI and the Timer is armed. It only invokes sleep and
hopes that the core is idle once the IPI/Timer is invoked onto it.
Hence this program must be run on a completely idle system for best
results
2. Even on a completely idle system, there maybe book-keeping tasks or
jitter tasks that can run on the core we want idle. This can create
outliers in the latency measurement. Thankfully, these outliers
should be large enough to easily weed them out.
3. A userspace only selftest variant was also sent out as RFC based on
suggestions over the previous patchset to simply the kernel
complexeity. However, a userspace only approach had more noise in
the latency measurement due to userspace-kernel interactions
which led to run to run variance and a lesser accurate test.
Another downside of the nature of a userspace program is that it
takes orders of magnitude longer to complete a full system test
compared to the kernel framework.
RFC patch: https://lkml.org/lkml/2020/9/2/356
4. For Intel Systems, the Timer based latencies don't exactly give out
the measure of idle latencies. This is because of a hardware
optimization mechanism that pre-arms a CPU when a timer is set to
wakeup. That doesn't make this metric useless for Intel systems,
it just means that is measuring IPI/Timer responding latency rather
than idle wakeup latencies.
(Source: https://lkml.org/lkml/2020/9/2/610)
For solution to this problem, a hardware based latency analyzer is
devised by Artem Bityutskiy from Intel.
https://youtu.be/Opk92aQyvt0?t=8266https://intel.github.io/wult/
Pratik Rajesh Sampat (2):
cpuidle: Extract IPI based and timer based wakeup latency from idle
states
selftest/cpuidle: Add support for cpuidle latency measurement
drivers/cpuidle/Makefile | 1 +
drivers/cpuidle/test-cpuidle_latency.c | 157 ++++++++++
lib/Kconfig.debug | 10 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/cpuidle/Makefile | 6 +
tools/testing/selftests/cpuidle/cpuidle.sh | 323 +++++++++++++++++++++
tools/testing/selftests/cpuidle/settings | 2 +
7 files changed, 500 insertions(+)
create mode 100644 drivers/cpuidle/test-cpuidle_latency.c
create mode 100644 tools/testing/selftests/cpuidle/Makefile
create mode 100755 tools/testing/selftests/cpuidle/cpuidle.sh
create mode 100644 tools/testing/selftests/cpuidle/settings
--
2.17.1
This patchset provides a file descriptor for every VM and VCPU to read
KVM statistics data in binary format.
It is meant to provide a lightweight, flexible, scalable and efficient
lock-free solution for user space telemetry applications to pull the
statistics data periodically for large scale systems. The pulling
frequency could be as high as a few times per second.
In this patchset, every statistics data are treated to have some
attributes as below:
* architecture dependent or common
* VM statistics data or VCPU statistics data
* type: cumulative, instantaneous,
* unit: none for simple counter, nanosecond, microsecond,
millisecond, second, Byte, KiByte, MiByte, GiByte. Clock Cycles
Since no lock/synchronization is used, the consistency between all
the statistics data is not guaranteed. That means not all statistics
data are read out at the exact same time, since the statistics date
are still being updated by KVM subsystems while they are read out.
Jing Zhang (4):
KVM: stats: Separate common stats from architecture specific ones
KVM: stats: Add fd-based API to read binary stats data
KVM: stats: Add documentation for statistics data binary interface
KVM: selftests: Add selftest for KVM statistics data binary interface
Documentation/virt/kvm/api.rst | 169 ++++++++
arch/arm64/include/asm/kvm_host.h | 9 +-
arch/arm64/kvm/guest.c | 42 +-
arch/mips/include/asm/kvm_host.h | 9 +-
arch/mips/kvm/mips.c | 67 +++-
arch/powerpc/include/asm/kvm_host.h | 9 +-
arch/powerpc/kvm/book3s.c | 68 +++-
arch/powerpc/kvm/book3s_hv.c | 12 +-
arch/powerpc/kvm/book3s_pr.c | 2 +-
arch/powerpc/kvm/book3s_pr_papr.c | 2 +-
arch/powerpc/kvm/booke.c | 63 ++-
arch/s390/include/asm/kvm_host.h | 9 +-
arch/s390/kvm/kvm-s390.c | 133 ++++++-
arch/x86/include/asm/kvm_host.h | 9 +-
arch/x86/kvm/x86.c | 71 +++-
include/linux/kvm_host.h | 132 ++++++-
include/linux/kvm_types.h | 12 +
include/uapi/linux/kvm.h | 48 +++
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 3 +
.../testing/selftests/kvm/include/kvm_util.h | 3 +
.../selftests/kvm/kvm_bin_form_stats.c | 370 ++++++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 11 +
virt/kvm/kvm_main.c | 237 ++++++++++-
24 files changed, 1401 insertions(+), 90 deletions(-)
create mode 100644 tools/testing/selftests/kvm/kvm_bin_form_stats.c
base-commit: f96be2deac9bca3ef5a2b0b66b71fcef8bad586d
--
2.31.0.208.g409f899ff0-goog
The current way to provide a no-op flag to 'bpf_ringbuf_submit()',
'bpf_ringbuf_discard()' and 'bpf_ringbuf_output()' is to provide a '0'
value.
A '0' value might notify the consumer if it already caught up in processing,
so let's provide a more descriptive notation for this value.
Signed-off-by: Pedro Tammela <pctammela(a)mojatatu.com>
---
include/uapi/linux/bpf.h | 8 ++++++++
tools/include/uapi/linux/bpf.h | 8 ++++++++
tools/testing/selftests/bpf/progs/ima.c | 2 +-
tools/testing/selftests/bpf/progs/ringbuf_bench.c | 2 +-
tools/testing/selftests/bpf/progs/test_ringbuf.c | 2 +-
tools/testing/selftests/bpf/progs/test_ringbuf_multi.c | 2 +-
6 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 598716742593..100cb2e4c104 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4058,6 +4058,8 @@ union bpf_attr {
* Copy *size* bytes from *data* into a ring buffer *ringbuf*.
* If **BPF_RB_NO_WAKEUP** is specified in *flags*, no notification
* of new data availability is sent.
+ * If **BPF_RB_MAY_WAKEUP** is specified in *flags*, notification
+ * of new data availability is sent if needed.
* If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification
* of new data availability is sent unconditionally.
* Return
@@ -4066,6 +4068,7 @@ union bpf_attr {
* void *bpf_ringbuf_reserve(void *ringbuf, u64 size, u64 flags)
* Description
* Reserve *size* bytes of payload in a ring buffer *ringbuf*.
+ * *flags* must be 0.
* Return
* Valid pointer with *size* bytes of memory available; NULL,
* otherwise.
@@ -4075,6 +4078,8 @@ union bpf_attr {
* Submit reserved ring buffer sample, pointed to by *data*.
* If **BPF_RB_NO_WAKEUP** is specified in *flags*, no notification
* of new data availability is sent.
+ * If **BPF_RB_MAY_WAKEUP** is specified in *flags*, notification
+ * of new data availability is sent if needed.
* If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification
* of new data availability is sent unconditionally.
* Return
@@ -4085,6 +4090,8 @@ union bpf_attr {
* Discard reserved ring buffer sample, pointed to by *data*.
* If **BPF_RB_NO_WAKEUP** is specified in *flags*, no notification
* of new data availability is sent.
+ * If **BPF_RB_MAY_WAKEUP** is specified in *flags*, notification
+ * of new data availability is sent if needed.
* If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification
* of new data availability is sent unconditionally.
* Return
@@ -4965,6 +4972,7 @@ enum {
* BPF_FUNC_bpf_ringbuf_output flags.
*/
enum {
+ BPF_RB_MAY_WAKEUP = 0,
BPF_RB_NO_WAKEUP = (1ULL << 0),
BPF_RB_FORCE_WAKEUP = (1ULL << 1),
};
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index ab9f2233607c..3d6d324184c0 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4058,6 +4058,8 @@ union bpf_attr {
* Copy *size* bytes from *data* into a ring buffer *ringbuf*.
* If **BPF_RB_NO_WAKEUP** is specified in *flags*, no notification
* of new data availability is sent.
+ * If **BPF_RB_MAY_WAKEUP** is specified in *flags*, notification
+ * of new data availability is sent if needed.
* If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification
* of new data availability is sent unconditionally.
* Return
@@ -4066,6 +4068,7 @@ union bpf_attr {
* void *bpf_ringbuf_reserve(void *ringbuf, u64 size, u64 flags)
* Description
* Reserve *size* bytes of payload in a ring buffer *ringbuf*.
+ * *flags* must be 0.
* Return
* Valid pointer with *size* bytes of memory available; NULL,
* otherwise.
@@ -4075,6 +4078,8 @@ union bpf_attr {
* Submit reserved ring buffer sample, pointed to by *data*.
* If **BPF_RB_NO_WAKEUP** is specified in *flags*, no notification
* of new data availability is sent.
+ * If **BPF_RB_MAY_WAKEUP** is specified in *flags*, notification
+ * of new data availability is sent if needed.
* If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification
* of new data availability is sent unconditionally.
* Return
@@ -4085,6 +4090,8 @@ union bpf_attr {
* Discard reserved ring buffer sample, pointed to by *data*.
* If **BPF_RB_NO_WAKEUP** is specified in *flags*, no notification
* of new data availability is sent.
+ * If **BPF_RB_MAY_WAKEUP** is specified in *flags*, notification
+ * of new data availability is sent if needed.
* If **BPF_RB_FORCE_WAKEUP** is specified in *flags*, notification
* of new data availability is sent unconditionally.
* Return
@@ -4959,6 +4966,7 @@ enum {
* BPF_FUNC_bpf_ringbuf_output flags.
*/
enum {
+ BPF_RB_MAY_WAKEUP = 0,
BPF_RB_NO_WAKEUP = (1ULL << 0),
BPF_RB_FORCE_WAKEUP = (1ULL << 1),
};
diff --git a/tools/testing/selftests/bpf/progs/ima.c b/tools/testing/selftests/bpf/progs/ima.c
index 96060ff4ffc6..0f4daced6aad 100644
--- a/tools/testing/selftests/bpf/progs/ima.c
+++ b/tools/testing/selftests/bpf/progs/ima.c
@@ -38,7 +38,7 @@ void BPF_PROG(ima, struct linux_binprm *bprm)
return;
*sample = ima_hash;
- bpf_ringbuf_submit(sample, 0);
+ bpf_ringbuf_submit(sample, BPF_RB_MAY_WAKEUP);
}
return;
diff --git a/tools/testing/selftests/bpf/progs/ringbuf_bench.c b/tools/testing/selftests/bpf/progs/ringbuf_bench.c
index 123607d314d6..808e2e0e3d64 100644
--- a/tools/testing/selftests/bpf/progs/ringbuf_bench.c
+++ b/tools/testing/selftests/bpf/progs/ringbuf_bench.c
@@ -24,7 +24,7 @@ static __always_inline long get_flags()
long sz;
if (!wakeup_data_size)
- return 0;
+ return BPF_RB_MAY_WAKEUP;
sz = bpf_ringbuf_query(&ringbuf, BPF_RB_AVAIL_DATA);
return sz >= wakeup_data_size ? BPF_RB_FORCE_WAKEUP : BPF_RB_NO_WAKEUP;
diff --git a/tools/testing/selftests/bpf/progs/test_ringbuf.c b/tools/testing/selftests/bpf/progs/test_ringbuf.c
index 8ba9959b036b..03a5cbd21356 100644
--- a/tools/testing/selftests/bpf/progs/test_ringbuf.c
+++ b/tools/testing/selftests/bpf/progs/test_ringbuf.c
@@ -21,7 +21,7 @@ struct {
/* inputs */
int pid = 0;
long value = 0;
-long flags = 0;
+long flags = BPF_RB_MAY_WAKEUP;
/* outputs */
long total = 0;
diff --git a/tools/testing/selftests/bpf/progs/test_ringbuf_multi.c b/tools/testing/selftests/bpf/progs/test_ringbuf_multi.c
index edf3b6953533..f33c3fdfb1d6 100644
--- a/tools/testing/selftests/bpf/progs/test_ringbuf_multi.c
+++ b/tools/testing/selftests/bpf/progs/test_ringbuf_multi.c
@@ -71,7 +71,7 @@ int test_ringbuf(void *ctx)
sample->seq = total;
total += 1;
- bpf_ringbuf_submit(sample, 0);
+ bpf_ringbuf_submit(sample, BPF_RB_MAY_WAKEUP);
return 0;
}
--
2.25.1
v1 by Uriel is here: [1].
Since it's been a while, I've dropped the Reviewed-By's.
It depended on commit 83c4e7a0363b ("KUnit: KASAN Integration") which
hadn't been merged yet, so that caused some kerfuffle with applying them
previously and the series was reverted.
This revives the series but makes the kunit_fail_current_test() function
take a format string and logs the file and line number of the failing
code, addressing Alan Maguire's comments on the previous version.
As a result, the patch that makes UBSAN errors was tweaked slightly to
include an error message.
v2 -> v3:
Try and fail to make kunit_fail_current_test() work on CONFIG_KUNIT=m
s/_/__ on the helper func to match others in test.c
v3 -> v4:
Revert to only enabling kunit_fail_current_test() for CONFIG_KUNIT=y
[1] https://lore.kernel.org/linux-kselftest/20200806174326.3577537-1-urielguaja…
Uriel Guajardo (2):
kunit: support failure from dynamic analysis tools
kunit: ubsan integration
include/kunit/test-bug.h | 30 ++++++++++++++++++++++++++++++
lib/kunit/test.c | 39 +++++++++++++++++++++++++++++++++++----
lib/ubsan.c | 3 +++
3 files changed, 68 insertions(+), 4 deletions(-)
create mode 100644 include/kunit/test-bug.h
base-commit: a74e6a014c9d4d4161061f770c9b4f98372ac778
--
2.31.0.rc2.261.g7f71774620-goog
This series improves the defensive posture of sysfs's use of seq_file
to gain the vmap guard pages at the end of vmalloc buffers to stop a
class of recurring flaw[1]. The long-term goal is to switch sysfs from
a buffer to using seq_file directly, but this will take time to refactor.
Included is also a Clang fix for NULL arithmetic and an LKDTM test to
validate vmalloc guard pages.
v4:
- fix NULL arithmetic (Arnd)
- add lkdtm test
- reword commit message
v3: https://lore.kernel.org/lkml/20210401022145.2019422-1-keescook@chromium.org/
v2: https://lore.kernel.org/lkml/20210315174851.622228-1-keescook@chromium.org/
v1: https://lore.kernel.org/lkml/20210312205558.2947488-1-keescook@chromium.org/
Thanks!
-Kees
Arnd Bergmann (1):
seq_file: Fix clang warning for NULL pointer arithmetic
Kees Cook (2):
lkdtm/heap: Add vmalloc linear overflow test
sysfs: Unconditionally use vmalloc for buffer
drivers/misc/lkdtm/core.c | 3 ++-
drivers/misc/lkdtm/heap.c | 21 +++++++++++++++++-
drivers/misc/lkdtm/lkdtm.h | 3 ++-
fs/kernfs/file.c | 9 +++++---
fs/seq_file.c | 5 ++++-
fs/sysfs/file.c | 29 +++++++++++++++++++++++++
include/linux/seq_file.h | 6 +++++
tools/testing/selftests/lkdtm/tests.txt | 3 ++-
8 files changed, 71 insertions(+), 8 deletions(-)
--
2.25.1
v1 by Uriel is here: [1].
Since it's been a while, I've dropped the Reviewed-By's.
It depended on commit 83c4e7a0363b ("KUnit: KASAN Integration") which
hadn't been merged yet, so that caused some kerfuffle with applying them
previously and the series was reverted.
This revives the series but makes the kunit_fail_current_test() function
take a format string and logs the file and line number of the failing
code, addressing Alan Maguire's comments on the previous version.
As a result, the patch that makes UBSAN errors was tweaked slightly to
include an error message.
v2 -> v3:
Try and fail to make kunit_fail_current_test() work on CONFIG_KUNIT=m
s/_/__ on the helper func to match others in test.c
v3 -> v4:
Revert to only enabling kunit_fail_current_test() for CONFIG_KUNIT=y
v4 -> v5:
Delete blank line to make checkpatch.pl --strict happy
[1] https://lore.kernel.org/linux-kselftest/20200806174326.3577537-1-urielguaja…
Uriel Guajardo (2):
kunit: support failure from dynamic analysis tools
kunit: ubsan integration
include/kunit/test-bug.h | 29 +++++++++++++++++++++++++++++
lib/kunit/test.c | 39 +++++++++++++++++++++++++++++++++++----
lib/ubsan.c | 3 +++
3 files changed, 67 insertions(+), 4 deletions(-)
create mode 100644 include/kunit/test-bug.h
base-commit: 1678e493d530e7977cce34e59a86bb86f3c5631e
--
2.31.0.208.g409f899ff0-goog
This patch set has several miscellaneous fixes to resctrl selftest tool
that are easily visible to user. V1 had fixes to CAT test and CMT test
but they were dropped in V2 because having them here made the patchset
humongous. So, changes to CAT test and CMT test will be posted in another
patchset.
Change Log:
v6:
- Add Tested-by: Babu Moger <babu.moger(a)amd.com>.
- Replace "cat" by CAT_STR etc (Babu).
- Capitalize the first letter of printed message (Babu).
v5:
- Address various comments from Shuah Khan:
1. Move a few fixing patches before cleaning patches.
2. Call kselftest APIs to log test results instead of printf().
3. Add .gitignore to ignore resctrl_tests.
4. Share show_cache_info() in CAT and CMT tests.
5. Define long_mask, cbm_mask, count_of_bits etc as static variables.
v4:
- Address various comments from Shuah Khan:
1. Combine a few patches e.g. a couple of fixing typos patches into one
and a couple of unmounting patches into one etc.
2. Add config file.
3. Remove "Fixes" tags.
4. Change strcmp() to strncmp().
5. Move the global variable fixing patch to the patch 1 so that the
compilation issue is fixed first.
Please note:
- I didn't move the patch of renaming CQM to CMT to the end of the series
because code and commit messages in a few other patches depend on the
new term of "CMT". If move the renaming patch to the end, the previous
patches use the old "CQM" term and code which will be changed soon at
the end of series and will cause more code and explanations.
[v3: https://lkml.org/lkml/2020/10/28/137]
v3:
Address various comments (commit messages, return value on test failure,
print failure info on test failure etc) from Reinette and Tony.
[v2: https://lore.kernel.org/linux-kselftest/cover.1589835155.git.sai.praneeth.p…]
v2:
1. Dropped changes to CAT test and CMT test as they will be posted in a later
series.
2. Added several other fixes
[v1: https://lore.kernel.org/linux-kselftest/cover.1583657204.git.sai.praneeth.p…]
Fenghua Yu (19):
selftests/resctrl: Enable gcc checks to detect buffer overflows
selftests/resctrl: Fix compilation issues for global variables
selftests/resctrl: Fix compilation issues for other global variables
selftests/resctrl: Clean up resctrl features check
selftests/resctrl: Fix missing options "-n" and "-p"
selftests/resctrl: Rename CQM test as CMT test
selftests/resctrl: Call kselftest APIs to log test results
selftests/resctrl: Share show_cache_info() by CAT and CMT tests
selftests/resctrl: Add config dependencies
selftests/resctrl: Check for resctrl mount point only if resctrl FS is
supported
selftests/resctrl: Use resctrl/info for feature detection
selftests/resctrl: Fix MBA/MBM results reporting format
selftests/resctrl: Don't hard code value of "no_of_bits" variable
selftests/resctrl: Modularize resctrl test suite main() function
selftests/resctrl: Skip the test if requested resctrl feature is not
supported
selftests/resctrl: Fix unmount resctrl FS
selftests/resctrl: Fix incorrect parsing of iMC counters
selftests/resctrl: Fix checking for < 0 for unsigned values
selftests/resctrl: Create .gitignore to include resctrl_tests
Reinette Chatre (2):
selftests/resctrl: Ensure sibling CPU is not same as original CPU
selftests/resctrl: Fix a printed message
tools/testing/selftests/resctrl/.gitignore | 2 +
tools/testing/selftests/resctrl/Makefile | 2 +-
tools/testing/selftests/resctrl/README | 4 +-
tools/testing/selftests/resctrl/cache.c | 52 +++++-
tools/testing/selftests/resctrl/cat_test.c | 57 ++----
.../resctrl/{cqm_test.c => cmt_test.c} | 75 +++-----
tools/testing/selftests/resctrl/config | 2 +
tools/testing/selftests/resctrl/fill_buf.c | 4 +-
tools/testing/selftests/resctrl/mba_test.c | 43 ++---
tools/testing/selftests/resctrl/mbm_test.c | 42 ++---
tools/testing/selftests/resctrl/resctrl.h | 29 +++-
.../testing/selftests/resctrl/resctrl_tests.c | 163 ++++++++++++------
tools/testing/selftests/resctrl/resctrl_val.c | 95 ++++++----
tools/testing/selftests/resctrl/resctrlfs.c | 134 ++++++++------
14 files changed, 408 insertions(+), 296 deletions(-)
create mode 100644 tools/testing/selftests/resctrl/.gitignore
rename tools/testing/selftests/resctrl/{cqm_test.c => cmt_test.c} (56%)
create mode 100644 tools/testing/selftests/resctrl/config
--
2.31.0
A 'single_cpu_test' parameter is odd and it does not exist
anymore. Instead there was introduced a 'nr_threads' one.
If it is not set it behaves as the former parameter.
That is why update a "stress mode" according to this change
specifying number of workers which are equal to number of CPUs.
Also update an output of help message based on a new interface.
CC: linux-kselftest(a)vger.kernel.org
CC: Shuah Khan <shuah(a)kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
---
tools/testing/selftests/vm/test_vmalloc.sh | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/vm/test_vmalloc.sh b/tools/testing/selftests/vm/test_vmalloc.sh
index 06d2bb109f06..d73b846736f1 100755
--- a/tools/testing/selftests/vm/test_vmalloc.sh
+++ b/tools/testing/selftests/vm/test_vmalloc.sh
@@ -11,6 +11,7 @@
TEST_NAME="vmalloc"
DRIVER="test_${TEST_NAME}"
+NUM_CPUS=`grep -c ^processor /proc/cpuinfo`
# 1 if fails
exitcode=1
@@ -22,9 +23,9 @@ ksft_skip=4
# Static templates for performance, stressing and smoke tests.
# Also it is possible to pass any supported parameters manualy.
#
-PERF_PARAM="single_cpu_test=1 sequential_test_order=1 test_repeat_count=3"
-SMOKE_PARAM="single_cpu_test=1 test_loop_count=10000 test_repeat_count=10"
-STRESS_PARAM="test_repeat_count=20"
+PERF_PARAM="sequential_test_order=1 test_repeat_count=3"
+SMOKE_PARAM="test_loop_count=10000 test_repeat_count=10"
+STRESS_PARAM="nr_threads=$NUM_CPUS test_repeat_count=20"
check_test_requirements()
{
@@ -58,8 +59,8 @@ run_perfformance_check()
run_stability_check()
{
- echo "Run stability tests. In order to stress vmalloc subsystem we run"
- echo "all available test cases on all available CPUs simultaneously."
+ echo "Run stability tests. In order to stress vmalloc subsystem all"
+ echo "available test cases are run by NUM_CPUS workers simultaneously."
echo "It will take time, so be patient."
modprobe $DRIVER $STRESS_PARAM > /dev/null 2>&1
@@ -92,17 +93,17 @@ usage()
echo "# Shows help message"
echo "./${DRIVER}.sh"
echo
- echo "# Runs 1 test(id_1), repeats it 5 times on all online CPUs"
- echo "./${DRIVER}.sh run_test_mask=1 test_repeat_count=5"
+ echo "# Runs 1 test(id_1), repeats it 5 times by NUM_CPUS workers"
+ echo "./${DRIVER}.sh nr_threads=$NUM_CPUS run_test_mask=1 test_repeat_count=5"
echo
echo -n "# Runs 4 tests(id_1|id_2|id_4|id_16) on one CPU with "
echo "sequential order"
- echo -n "./${DRIVER}.sh single_cpu_test=1 sequential_test_order=1 "
+ echo -n "./${DRIVER}.sh sequential_test_order=1 "
echo "run_test_mask=23"
echo
- echo -n "# Runs all tests on all online CPUs, shuffled order, repeats "
+ echo -n "# Runs all tests by NUM_CPUS workers, shuffled order, repeats "
echo "20 times"
- echo "./${DRIVER}.sh test_repeat_count=20"
+ echo "./${DRIVER}.sh nr_threads=$NUM_CPUS test_repeat_count=20"
echo
echo "# Performance analysis"
echo "./${DRIVER}.sh performance"
--
2.20.1
TL;DR
$ ./tools/testing/kunit/kunit.py run --kunitconfig=lib/kunit
Per suggestion from Ted [1], we can reduce the amount of typing by
assuming a convention that these files are named '.kunitconfig'.
In the case of [1], we now have
$ ./tools/testing/kunit/kunit.py run --kunitconfig=fs/ext4
Also add in such a fragment for kunit itself so we can give that as an
example more close to home (and thus less likely to be accidentally
broken).
[1] https://lore.kernel.org/linux-ext4/YCNF4yP1dB97zzwD@mit.edu/
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
---
lib/kunit/.kunitconfig | 3 +++
tools/testing/kunit/kunit.py | 4 +++-
tools/testing/kunit/kunit_kernel.py | 2 ++
tools/testing/kunit/kunit_tool_test.py | 6 ++++++
4 files changed, 14 insertions(+), 1 deletion(-)
create mode 100644 lib/kunit/.kunitconfig
diff --git a/lib/kunit/.kunitconfig b/lib/kunit/.kunitconfig
new file mode 100644
index 000000000000..9235b7d42d38
--- /dev/null
+++ b/lib/kunit/.kunitconfig
@@ -0,0 +1,3 @@
+CONFIG_KUNIT=y
+CONFIG_KUNIT_TEST=y
+CONFIG_KUNIT_EXAMPLE_TEST=y
diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index d5144fcb03ac..5da8fb3762f9 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -184,7 +184,9 @@ def add_common_opts(parser) -> None:
help='Run all KUnit tests through allyesconfig',
action='store_true')
parser.add_argument('--kunitconfig',
- help='Path to Kconfig fragment that enables KUnit tests',
+ help='Path to Kconfig fragment that enables KUnit tests.'
+ ' If given a directory, (e.g. lib/kunit), "/.kunitconfig" '
+ 'will get automatically appended.',
metavar='kunitconfig')
def add_build_opts(parser) -> None:
diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
index f309a33256cd..89a7d4024e87 100644
--- a/tools/testing/kunit/kunit_kernel.py
+++ b/tools/testing/kunit/kunit_kernel.py
@@ -132,6 +132,8 @@ class LinuxSourceTree(object):
return
if kunitconfig_path:
+ if os.path.isdir(kunitconfig_path):
+ kunitconfig_path = os.path.join(kunitconfig_path, KUNITCONFIG_PATH)
if not os.path.exists(kunitconfig_path):
raise ConfigError(f'Specified kunitconfig ({kunitconfig_path}) does not exist')
else:
diff --git a/tools/testing/kunit/kunit_tool_test.py b/tools/testing/kunit/kunit_tool_test.py
index 1ad3049e9069..2e809dd956a7 100755
--- a/tools/testing/kunit/kunit_tool_test.py
+++ b/tools/testing/kunit/kunit_tool_test.py
@@ -251,6 +251,12 @@ class LinuxSourceTreeTest(unittest.TestCase):
with tempfile.NamedTemporaryFile('wt') as kunitconfig:
tree = kunit_kernel.LinuxSourceTree('', kunitconfig_path=kunitconfig.name)
+ def test_dir_kunitconfig(self):
+ with tempfile.TemporaryDirectory('') as dir:
+ with open(os.path.join(dir, '.kunitconfig'), 'w') as f:
+ pass
+ tree = kunit_kernel.LinuxSourceTree('', kunitconfig_path=dir)
+
# TODO: add more test cases.
base-commit: b12b47249688915e987a9a2a393b522f86f6b7ab
--
2.30.0.617.g56c4b15f3c-goog
This patch set has several miscellaneous fixes to resctrl selftest tool
that are easily visible to user. V1 had fixes to CAT test and CMT test
but they were dropped in V2 because having them here made the patchset
humongous. So, changes to CAT test and CMT test will be posted in another
patchset.
Change Log:
v5:
- Address various comments from Shuah Khan:
1. Move a few fixing patches before cleaning patches.
2. Call kselftest APIs to log test results instead of printf().
3. Add .gitignore to ignore resctrl_tests.
4. Share show_cache_info() in CAT and CMT tests.
5. Define long_mask, cbm_mask, count_of_bits etc as static variables.
v4:
- Address various comments from Shuah Khan:
1. Combine a few patches e.g. a couple of fixing typos patches into one
and a couple of unmounting patches into one etc.
2. Add config file.
3. Remove "Fixes" tags.
4. Change strcmp() to strncmp().
5. Move the global variable fixing patch to the patch 1 so that the
compilation issue is fixed first.
Please note:
- I didn't move the patch of renaming CQM to CMT to the end of the series
because code and commit messages in a few other patches depend on the
new term of "CMT". If move the renaming patch to the end, the previous
patches use the old "CQM" term and code which will be changed soon at
the end of series and will cause more code and explanations.
[v3: https://lkml.org/lkml/2020/10/28/137]
v3:
Address various comments (commit messages, return value on test failure,
print failure info on test failure etc) from Reinette and Tony.
[v2: https://lore.kernel.org/linux-kselftest/cover.1589835155.git.sai.praneeth.p…]
v2:
1. Dropped changes to CAT test and CMT test as they will be posted in a later
series.
2. Added several other fixes
[v1: https://lore.kernel.org/linux-kselftest/cover.1583657204.git.sai.praneeth.p…]
Fenghua Yu (19):
selftests/resctrl: Enable gcc checks to detect buffer overflows
selftests/resctrl: Fix compilation issues for global variables
selftests/resctrl: Fix compilation issues for other global variables
selftests/resctrl: Clean up resctrl features check
selftests/resctrl: Fix missing options "-n" and "-p"
selftests/resctrl: Rename CQM test as CMT test
selftests/resctrl: Call kselftest APIs to log test results
selftests/resctrl: Share show_cache_info() by CAT and CMT tests
selftests/resctrl: Add config dependencies
selftests/resctrl: Check for resctrl mount point only if resctrl FS is
supported
selftests/resctrl: Use resctrl/info for feature detection
selftests/resctrl: Fix MBA/MBM results reporting format
selftests/resctrl: Don't hard code value of "no_of_bits" variable
selftests/resctrl: Modularize resctrl test suite main() function
selftests/resctrl: Skip the test if requested resctrl feature is not
supported
selftests/resctrl: Fix unmount resctrl FS
selftests/resctrl: Fix incorrect parsing of iMC counters
selftests/resctrl: Fix checking for < 0 for unsigned values
selftests/resctrl: Create .gitignore to include resctrl_tests
Reinette Chatre (2):
selftests/resctrl: Ensure sibling CPU is not same as original CPU
selftests/resctrl: Fix a printed message
tools/testing/selftests/resctrl/.gitignore | 2 +
tools/testing/selftests/resctrl/Makefile | 2 +-
tools/testing/selftests/resctrl/README | 4 +-
tools/testing/selftests/resctrl/cache.c | 52 +++++-
tools/testing/selftests/resctrl/cat_test.c | 57 ++----
.../resctrl/{cqm_test.c => cmt_test.c} | 75 +++-----
tools/testing/selftests/resctrl/config | 2 +
tools/testing/selftests/resctrl/fill_buf.c | 4 +-
tools/testing/selftests/resctrl/mba_test.c | 43 ++---
tools/testing/selftests/resctrl/mbm_test.c | 42 ++---
tools/testing/selftests/resctrl/resctrl.h | 29 +++-
.../testing/selftests/resctrl/resctrl_tests.c | 163 ++++++++++++------
tools/testing/selftests/resctrl/resctrl_val.c | 95 ++++++----
tools/testing/selftests/resctrl/resctrlfs.c | 134 ++++++++------
14 files changed, 408 insertions(+), 296 deletions(-)
create mode 100644 tools/testing/selftests/resctrl/.gitignore
rename tools/testing/selftests/resctrl/{cqm_test.c => cmt_test.c} (56%)
create mode 100644 tools/testing/selftests/resctrl/config
--
2.30.1
This series aims to clarify the behavior of
KVM_GET_EMULATED_CPUID and KVM_GET_SUPPORTED
ioctls, and fix a corner case where the nent field of the
struct kvm_cpuid2 is matching the amount of entries that kvm returns.
Patch 1 proposes the nent field fix to cpuid.c,
patch 2 updates the ioctl documentation accordingly and
patches 3 and 4 provide a selftest to check KVM_GET_EMULATED_CPUID
accordingly.
Emanuele Giuseppe Esposito (4):
kvm: cpuid: adjust the returned nent field of kvm_cpuid2 for
KVM_GET_SUPPORTED_CPUID and KVM_GET_EMULATED_CPUID
Documentation: kvm: update KVM_GET_EMULATED_CPUID ioctl description
selftests: add kvm_get_emulated_cpuid
selftests: kvm: add get_emulated_cpuid test
Documentation/virt/kvm/api.rst | 10 +-
arch/x86/kvm/cpuid.c | 6 +
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/x86_64/processor.h | 1 +
.../selftests/kvm/lib/x86_64/processor.c | 33 ++++
.../selftests/kvm/x86_64/get_emulated_cpuid.c | 183 ++++++++++++++++++
7 files changed, 229 insertions(+), 6 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/get_emulated_cpuid.c
--
2.30.2
This series aims to clarify the behavior of
KVM_GET_EMULATED_CPUID and KVM_GET_SUPPORTED
ioctls, and fix a corner case where the nent field of the
struct kvm_cpuid2 is matching the amount of entries that kvm returns.
Patch 1 proposes the nent field fix to cpuid.c,
patch 2 updates the ioctl documentation accordingly and
patches 3 and 4 provide a selftest to check KVM_GET_EMULATED_CPUID
accordingly.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit(a)redhat.com>
---
v2:
- better fix in cpuid.c, perform the nent check after the switch statement
- fix bug in get_emulated_cpuid.c selftest, each entry needs to have at least
the padding zeroed otherwise it fails.
Emanuele Giuseppe Esposito (4):
kvm: cpuid: adjust the returned nent field of kvm_cpuid2 for
KVM_GET_SUPPORTED_CPUID and KVM_GET_EMULATED_CPUID
Documentation: kvm: update KVM_GET_EMULATED_CPUID ioctl description
selftests: add kvm_get_emulated_cpuid
selftests: kvm: add get_emulated_cpuid test
Documentation/virt/kvm/api.rst | 10 +-
arch/x86/kvm/cpuid.c | 35 ++--
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/x86_64/processor.h | 1 +
.../selftests/kvm/lib/x86_64/processor.c | 33 +++
.../selftests/kvm/x86_64/get_emulated_cpuid.c | 198 ++++++++++++++++++
7 files changed, 256 insertions(+), 23 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/get_emulated_cpuid.c
--
2.30.2
From: Ira Weiny <ira.weiny(a)intel.com>
Introduce a new page protection mechanism for supervisor pages, Protection Key
Supervisor (PKS).
Generally PKS enables protections on 'domains' of supervisor pages to limit
supervisor mode access to pages beyond the normal paging protections. PKS
works in a similar fashion to user space pkeys, PKU. As with PKU, supervisor
pkeys are checked in addition to normal paging protections and Access or Writes
can be disabled via a MSR update without TLB flushes when permissions change.
Also like PKU, a page mapping is assigned to a domain by setting pkey bits in
the page table entry for that mapping.
Access is controlled through a PKRS register which is updated via WRMSR/RDMSR.
XSAVE is not supported for the PKRS MSR. Therefore the implementation
saves/restores the MSR across context switches and during exceptions. Nested
exceptions are supported by each exception getting a new PKS state.
For consistent behavior with current paging protections, pkey 0 is reserved and
configured to allow full access via the pkey mechanism, thus preserving the
default paging protections on mappings with the default pkey value of 0.
Other keys, (1-15) are allocated by an allocator which prepares us for key
contention from day one. Kernel users should be prepared for the allocator to
fail either because of key exhaustion or due to PKS not being supported on the
CPU instance.
The following are key attributes of PKS.
1) Fast switching of permissions
1a) Prevents access without page table manipulations
1b) No TLB flushes required
2) Works on a per thread basis
PKS is available with 4 and 5 level paging. Like PKRU it consumes 4 bits from
the PTE to store the pkey within the entry.
All code to support PKS is configured via ARCH_ENABLE_SUPERVISOR_PKEYS which
is designed to only be turned on when a user is configured on in the kernel.
Those users must depend on ARCH_HAS_SUPERVISOR_PKEYS to properly work with
other architectures which do not yet support PKS.
Originally this series was submitted as part of a large patch set which
converted the kmap call sites.[1]
Many follow on discussions revealed a few problems. The first of which was
that some callers leak a kmap mapping across threads rather than containing it
to a critical section. Attempts were made to see if these 'global kmaps' could
be supported.[2] However, supporting global kmaps had many problems. Work is
being done in parallel on converting as many kmap calls to the new
kmap_local_page().[3]
Changes from V4 [5]
From kernel test robot <lkp(a)intel.com>
Fix i386 build: pks_init_task not found
Move MSR_IA32_PKRS and INIT_PKRS_VALUE into patch 5 where they are
first 'used'. (Technically nothing is 'used' until the final
test patch. But review wise this is much cleaner.)
From Sean Christoperson
Add documentation details on what happens if the pkey is violated
Change cpu_feature_enabled to be in WARN_ON check
Clean up commit message of patch 6
Fix some checkpatch errors
[1] https://lore.kernel.org/lkml/20201009195033.3208459-1-ira.weiny@intel.com/
[2] https://lore.kernel.org/lkml/87mtycqcjf.fsf@nanos.tec.linutronix.de/
[3] https://lore.kernel.org/lkml/20210128061503.1496847-1-ira.weiny@intel.com/https://lore.kernel.org/lkml/20210210062221.3023586-1-ira.weiny@intel.com/https://lore.kernel.org/lkml/20210205170030.856723-1-ira.weiny@intel.com/https://lore.kernel.org/lkml/20210217024826.3466046-1-ira.weiny@intel.com/
[4] https://lore.kernel.org/lkml/20201106232908.364581-1-ira.weiny@intel.com/
[5] https://lore.kernel.org/lkml/20210322053020.2287058-1-ira.weiny@intel.com/
Fenghua Yu (1):
x86/pks: Add PKS kernel API
Ira Weiny (9):
x86/pkeys: Create pkeys_common.h
x86/fpu: Refactor arch_set_user_pkey_access() for PKS support
x86/pks: Add additional PKEY helper macros
x86/pks: Add PKS defines and Kconfig options
x86/pks: Add PKS setup code
x86/fault: Adjust WARN_ON for PKey fault
x86/pks: Preserve the PKRS MSR on context switch
x86/entry: Preserve PKRS MSR across exceptions
x86/pks: Add PKS test code
Documentation/core-api/protection-keys.rst | 112 +++-
arch/x86/Kconfig | 1 +
arch/x86/entry/calling.h | 26 +
arch/x86/entry/common.c | 58 ++
arch/x86/entry/entry_64.S | 22 +-
arch/x86/entry/entry_64_compat.S | 6 +-
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/pgtable.h | 10 +-
arch/x86/include/asm/pgtable_types.h | 12 +
arch/x86/include/asm/pkeys.h | 4 +
arch/x86/include/asm/pkeys_common.h | 34 +
arch/x86/include/asm/pks.h | 54 ++
arch/x86/include/asm/processor-flags.h | 2 +
arch/x86/include/asm/processor.h | 47 +-
arch/x86/include/uapi/asm/processor-flags.h | 2 +
arch/x86/kernel/cpu/common.c | 2 +
arch/x86/kernel/fpu/xstate.c | 22 +-
arch/x86/kernel/head_64.S | 7 +-
arch/x86/kernel/process.c | 3 +
arch/x86/kernel/process_64.c | 2 +
arch/x86/mm/fault.c | 31 +-
arch/x86/mm/pkeys.c | 218 +++++-
include/linux/pgtable.h | 4 +
include/linux/pkeys.h | 34 +
kernel/entry/common.c | 14 +-
lib/Kconfig.debug | 11 +
lib/Makefile | 3 +
lib/pks/Makefile | 3 +
lib/pks/pks_test.c | 693 ++++++++++++++++++++
mm/Kconfig | 5 +
tools/testing/selftests/x86/Makefile | 3 +-
tools/testing/selftests/x86/test_pks.c | 150 +++++
34 files changed, 1528 insertions(+), 77 deletions(-)
create mode 100644 arch/x86/include/asm/pkeys_common.h
create mode 100644 arch/x86/include/asm/pks.h
create mode 100644 lib/pks/Makefile
create mode 100644 lib/pks/pks_test.c
create mode 100644 tools/testing/selftests/x86/test_pks.c
--
2.28.0.rc0.12.gb6a658bd00c9
From: Mike Rapoport <rppt(a)linux.ibm.com>
Yuri Norov says:
If parameter size is the same for native and compat ABIs, we may
wire a syscall made by compat client to native handler. This is
true for unsigned int, but not true for unsigned long or pointer.
That's why I suggest using unsigned int and so avoid creating compat
entry point.
Use unsigned int as the type of the flags parameter in memfd_secret()
system call.
Signed-off-by: Mike Rapoport <rppt(a)linux.ibm.com>
---
@Andrew,
The patch is vs v5.12-rc5-mmots-2021-03-30-23, I'd appreciate if it would
be added as a fixup to the memfd_secret series.
include/linux/syscalls.h | 2 +-
mm/secretmem.c | 2 +-
tools/testing/selftests/vm/memfd_secret.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 49c93c906893..1a1b5d724497 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1050,7 +1050,7 @@ asmlinkage long sys_landlock_create_ruleset(const struct landlock_ruleset_attr _
asmlinkage long sys_landlock_add_rule(int ruleset_fd, enum landlock_rule_type rule_type,
const void __user *rule_attr, __u32 flags);
asmlinkage long sys_landlock_restrict_self(int ruleset_fd, __u32 flags);
-asmlinkage long sys_memfd_secret(unsigned long flags);
+asmlinkage long sys_memfd_secret(unsigned int flags);
/*
* Architecture-specific system calls
diff --git a/mm/secretmem.c b/mm/secretmem.c
index f2ae3f32a193..3b1ba3991964 100644
--- a/mm/secretmem.c
+++ b/mm/secretmem.c
@@ -199,7 +199,7 @@ static struct file *secretmem_file_create(unsigned long flags)
return file;
}
-SYSCALL_DEFINE1(memfd_secret, unsigned long, flags)
+SYSCALL_DEFINE1(memfd_secret, unsigned int, flags)
{
struct file *file;
int fd, err;
diff --git a/tools/testing/selftests/vm/memfd_secret.c b/tools/testing/selftests/vm/memfd_secret.c
index c878c2b841fc..2462f52e9c96 100644
--- a/tools/testing/selftests/vm/memfd_secret.c
+++ b/tools/testing/selftests/vm/memfd_secret.c
@@ -38,7 +38,7 @@ static unsigned long page_size;
static unsigned long mlock_limit_cur;
static unsigned long mlock_limit_max;
-static int memfd_secret(unsigned long flags)
+static int memfd_secret(unsigned int flags)
{
return syscall(__NR_memfd_secret, flags);
}
--
2.28.0
The perf subsystem today unifies various tracing and monitoring
features, from both software and hardware. One benefit of the perf
subsystem is automatically inheriting events to child tasks, which
enables process-wide events monitoring with low overheads. By default
perf events are non-intrusive, not affecting behaviour of the tasks
being monitored.
For certain use-cases, however, it makes sense to leverage the
generality of the perf events subsystem and optionally allow the tasks
being monitored to receive signals on events they are interested in.
This patch series adds the option to synchronously signal user space on
events.
To better support process-wide synchronous self-monitoring, without
events propagating to children that do not share the current process's
shared environment, two pre-requisite patches are added to optionally
restrict inheritance to CLONE_THREAD, and remove events on exec (without
affecting the parent).
Examples how to use these features can be found in the tests added at
the end of the series. In addition to the tests added, the series has
also been subjected to syzkaller fuzzing (focus on 'kernel/events/'
coverage).
Motivation and Example Uses
---------------------------
1. Our immediate motivation is low-overhead sampling-based race
detection for user space [1]. By using perf_event_open() at
process initialization, we can create hardware
breakpoint/watchpoint events that are propagated automatically
to all threads in a process. As far as we are aware, today no
existing kernel facility (such as ptrace) allows us to set up
process-wide watchpoints with minimal overheads (that are
comparable to mprotect() of whole pages).
2. Other low-overhead error detectors that rely on detecting
accesses to certain memory locations or code, process-wide and
also only in a specific set of subtasks or threads.
[1] https://llvm.org/devmtg/2020-09/slides/Morehouse-GWP-Tsan.pdf
Other ideas for use-cases we found interesting, but should only
illustrate the range of potential to further motivate the utility (we're
sure there are more):
3. Code hot patching without full stop-the-world. Specifically, by
setting a code breakpoint to entry to the patched routine, then
send signals to threads and check that they are not in the
routine, but without stopping them further. If any of the
threads will enter the routine, it will receive SIGTRAP and
pause.
4. Safepoints without mprotect(). Some Java implementations use
"load from a known memory location" as a safepoint. When threads
need to be stopped, the page containing the location is
mprotect()ed and threads get a signal. This could be replaced with
a watchpoint, which does not require a whole page nor DTLB
shootdowns.
5. Threads receiving signals on performance events to
throttle/unthrottle themselves.
6. Tracking data flow globally.
Changelog
---------
v3:
* Add patch "perf: Rework perf_event_exit_event()" to beginning of
series, courtesy of Peter Zijlstra.
* Rework "perf: Add support for event removal on exec" based on
the added "perf: Rework perf_event_exit_event()".
* Fix kselftests to work with more recent libc, due to the way it forces
using the kernel's own siginfo_t.
* Add basic perf-tool built-in test.
v2/RFC: https://lkml.kernel.org/r/20210310104139.679618-1-elver@google.com
* Patch "Support only inheriting events if cloned with CLONE_THREAD"
added to series.
* Patch "Add support for event removal on exec" added to series.
* Patch "Add kselftest for process-wide sigtrap handling" added to
series.
* Patch "Add kselftest for remove_on_exec" added to series.
* Implicitly restrict inheriting events if sigtrap, but the child was
cloned with CLONE_CLEAR_SIGHAND, because it is not generally safe if
the child cleared all signal handlers to continue sending SIGTRAP.
* Various minor fixes (see details in patches).
v1/RFC: https://lkml.kernel.org/r/20210223143426.2412737-1-elver@google.com
Pre-series: The discussion at [2] led to the changes in this series. The
approach taken in "Add support for SIGTRAP on perf events" to trigger
the signal was suggested by Peter Zijlstra in [3].
[2] https://lore.kernel.org/lkml/CACT4Y+YPrXGw+AtESxAgPyZ84TYkNZdP0xpocX2jwVAbZ…
[3] https://lore.kernel.org/lkml/YBv3rAT566k+6zjg@hirez.programming.kicks-ass.n…
Marco Elver (10):
perf: Apply PERF_EVENT_IOC_MODIFY_ATTRIBUTES to children
perf: Support only inheriting events if cloned with CLONE_THREAD
perf: Add support for event removal on exec
signal: Introduce TRAP_PERF si_code and si_perf to siginfo
perf: Add support for SIGTRAP on perf events
perf: Add breakpoint information to siginfo on SIGTRAP
selftests/perf_events: Add kselftest for process-wide sigtrap handling
selftests/perf_events: Add kselftest for remove_on_exec
tools headers uapi: Sync tools/include/uapi/linux/perf_event.h
perf test: Add basic stress test for sigtrap handling
Peter Zijlstra (1):
perf: Rework perf_event_exit_event()
arch/m68k/kernel/signal.c | 3 +
arch/x86/kernel/signal_compat.c | 5 +-
fs/signalfd.c | 4 +
include/linux/compat.h | 2 +
include/linux/perf_event.h | 6 +-
include/linux/signal.h | 1 +
include/uapi/asm-generic/siginfo.h | 6 +-
include/uapi/linux/perf_event.h | 5 +-
include/uapi/linux/signalfd.h | 4 +-
kernel/events/core.c | 297 +++++++++++++-----
kernel/fork.c | 2 +-
kernel/signal.c | 11 +
tools/include/uapi/linux/perf_event.h | 5 +-
tools/perf/tests/Build | 1 +
tools/perf/tests/builtin-test.c | 5 +
tools/perf/tests/sigtrap.c | 148 +++++++++
tools/perf/tests/tests.h | 1 +
.../testing/selftests/perf_events/.gitignore | 3 +
tools/testing/selftests/perf_events/Makefile | 6 +
tools/testing/selftests/perf_events/config | 1 +
.../selftests/perf_events/remove_on_exec.c | 260 +++++++++++++++
tools/testing/selftests/perf_events/settings | 1 +
.../selftests/perf_events/sigtrap_threads.c | 206 ++++++++++++
23 files changed, 896 insertions(+), 87 deletions(-)
create mode 100644 tools/perf/tests/sigtrap.c
create mode 100644 tools/testing/selftests/perf_events/.gitignore
create mode 100644 tools/testing/selftests/perf_events/Makefile
create mode 100644 tools/testing/selftests/perf_events/config
create mode 100644 tools/testing/selftests/perf_events/remove_on_exec.c
create mode 100644 tools/testing/selftests/perf_events/settings
create mode 100644 tools/testing/selftests/perf_events/sigtrap_threads.c
--
2.31.0.291.g576ba9dcdaf-goog
Previously, we shared too much of the code with COPY and ZEROPAGE, so we
manipulated things in various invalid ways:
- Previously, we unconditionally called shmem_inode_acct_block. In the
continue case, we're looking up an existing page which would have been
accounted for properly when it was allocated. So doing it twice
results in double-counting, and eventually leaking.
- Previously, we made the pte writable whenever the VMA was writable.
However, for continue, consider this case:
1. A tmpfs file was created
2. The non-UFFD-registered side mmap()-s with MAP_SHARED
3. The UFFD-registered side mmap()-s with MAP_PRIVATE
In this case, even though the UFFD-registered VMA may be writable, we
still want CoW behavior. So, check for this case and don't make the
pte writable.
- The initial pgoff / max_off check isn't necessary, so we can skip past
it. The second one seems likely to be unnecessary too, but keep it
just in case. Modify both checks to use pgoff, as offset is equivalent
and not needed.
- Previously, we unconditionally called ClearPageDirty() in the error
path. In the continue case though, since this is an existing page, it
might have already been dirty before we started touching it. It's very
problematic to clear the bit incorrectly, but not a problem to leave
it - so, just omit the ClearPageDirty() entirely.
- Previously, we unconditionally removed the page from the page cache in
the error path. But in the continue case, we didn't add it - it was
already there because the page is present in some second
(non-UFFD-registered) mapping. So, removing it is invalid.
Because the error handling issues are easy to exercise in the selftest,
make a small modification there to do so.
Finally, refactor shmem_mcopy_atomic_pte a bit. By this point, we've
added a lot of "if (!is_continue)"-s everywhere. It's cleaner to just
check for that mode first thing, and then "goto" down to where the parts
we actually want are. This leaves the code in between cleaner.
Changes since v2:
- Drop the ClearPageDirty() entirely, instead of trying to remember the
old value.
- Modify both pgoff / max_off checks to use pgoff. It's equivalent to
offset, but offset wasn't initialized until the first check (which
we're skipping).
- Keep the second pgoff / max_off check in the continue case.
Changes since v1:
- Refactor to skip ahead with goto, instead of adding several more
"if (!is_continue)".
- Fix unconditional ClearPageDirty().
- Don't pte_mkwrite() when is_continue && !VM_SHARED.
Fixes: 00da60b9d0a0 ("userfaultfd: support minor fault handling for shmem")
Signed-off-by: Axel Rasmussen <axelrasmussen(a)google.com>
---
mm/shmem.c | 60 +++++++++++++-----------
tools/testing/selftests/vm/userfaultfd.c | 12 +++++
2 files changed, 44 insertions(+), 28 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index d2e0e81b7d2e..fbcce850a16e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2377,18 +2377,22 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
struct page *page;
pte_t _dst_pte, *dst_pte;
int ret;
- pgoff_t offset, max_off;
-
- ret = -ENOMEM;
- if (!shmem_inode_acct_block(inode, 1))
- goto out;
+ pgoff_t max_off;
+ int writable;
if (is_continue) {
ret = -EFAULT;
page = find_lock_page(mapping, pgoff);
if (!page)
- goto out_unacct_blocks;
- } else if (!*pagep) {
+ goto out;
+ goto install_ptes;
+ }
+
+ ret = -ENOMEM;
+ if (!shmem_inode_acct_block(inode, 1))
+ goto out;
+
+ if (!*pagep) {
page = shmem_alloc_page(gfp, info, pgoff);
if (!page)
goto out_unacct_blocks;
@@ -2415,30 +2419,29 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
*pagep = NULL;
}
- if (!is_continue) {
- VM_BUG_ON(PageSwapBacked(page));
- VM_BUG_ON(PageLocked(page));
- __SetPageLocked(page);
- __SetPageSwapBacked(page);
- __SetPageUptodate(page);
- }
+ VM_BUG_ON(PageSwapBacked(page));
+ VM_BUG_ON(PageLocked(page));
+ __SetPageLocked(page);
+ __SetPageSwapBacked(page);
+ __SetPageUptodate(page);
ret = -EFAULT;
- offset = linear_page_index(dst_vma, dst_addr);
max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
- if (unlikely(offset >= max_off))
+ if (unlikely(pgoff >= max_off))
goto out_release;
- /* If page wasn't already in the page cache, add it. */
- if (!is_continue) {
- ret = shmem_add_to_page_cache(page, mapping, pgoff, NULL,
- gfp & GFP_RECLAIM_MASK, dst_mm);
- if (ret)
- goto out_release;
- }
+ ret = shmem_add_to_page_cache(page, mapping, pgoff, NULL,
+ gfp & GFP_RECLAIM_MASK, dst_mm);
+ if (ret)
+ goto out_release;
+install_ptes:
_dst_pte = mk_pte(page, dst_vma->vm_page_prot);
- if (dst_vma->vm_flags & VM_WRITE)
+ /* For CONTINUE on a non-shared VMA, don't pte_mkwrite for CoW. */
+ writable = is_continue && !(dst_vma->vm_flags & VM_SHARED)
+ ? 0
+ : dst_vma->vm_flags & VM_WRITE;
+ if (writable)
_dst_pte = pte_mkwrite(pte_mkdirty(_dst_pte));
else {
/*
@@ -2455,7 +2458,7 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
ret = -EFAULT;
max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
- if (unlikely(offset >= max_off))
+ if (unlikely(pgoff >= max_off))
goto out_release_unlock;
ret = -EEXIST;
@@ -2485,13 +2488,14 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
return ret;
out_release_unlock:
pte_unmap_unlock(dst_pte, ptl);
- ClearPageDirty(page);
- delete_from_page_cache(page);
+ if (!is_continue)
+ delete_from_page_cache(page);
out_release:
unlock_page(page);
put_page(page);
out_unacct_blocks:
- shmem_inode_unacct_blocks(inode, 1);
+ if (!is_continue)
+ shmem_inode_unacct_blocks(inode, 1);
goto out;
}
#endif /* CONFIG_USERFAULTFD */
diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index f6c86b036d0f..d8541a59dae5 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -485,6 +485,7 @@ static void wp_range(int ufd, __u64 start, __u64 len, bool wp)
static void continue_range(int ufd, __u64 start, __u64 len)
{
struct uffdio_continue req;
+ int ret;
req.range.start = start;
req.range.len = len;
@@ -493,6 +494,17 @@ static void continue_range(int ufd, __u64 start, __u64 len)
if (ioctl(ufd, UFFDIO_CONTINUE, &req))
err("UFFDIO_CONTINUE failed for address 0x%" PRIx64,
(uint64_t)start);
+
+ /*
+ * Error handling within the kernel for continue is subtly different
+ * from copy or zeropage, so it may be a source of bugs. Trigger an
+ * error (-EEXIST) on purpose, to verify doing so doesn't cause a BUG.
+ */
+ req.mapped = 0;
+ ret = ioctl(ufd, UFFDIO_CONTINUE, &req);
+ if (ret >= 0 || req.mapped != -EEXIST)
+ err("failed to exercise UFFDIO_CONTINUE error handling, ret=%d, mapped=%" PRId64,
+ ret, req.mapped);
}
static void *locking_thread(void *arg)
--
2.31.0.291.g576ba9dcdaf-goog
Good Day Sir/Ms,
We are please to invite you or your company to quote the
following item listed below:
Product/Model No: A702TH FYNE PRESSURE REGULATOR
Model Number: A702TH
Qty. 30 units
Compulsory,Kindly send your quotation to:
quotation(a)pfizerbvsupply.com
for immediate approval.
Kind Regards,
Albert Bourla
PFIZER B.V Supply Chain Manager
Tel: +31(0)208080 880
ADDRESS: Rivium Westlaan 142, 2909 LD
Capelle aan den IJssel, Netherlands
From: Ira Weiny <ira.weiny(a)intel.com>
Introduce a new page protection mechanism for supervisor pages, Protection Key
Supervisor (PKS).
Generally PKS enables protections on 'domains' of supervisor pages to limit
supervisor mode access to pages beyond the normal paging protections. PKS
works in a similar fashion to user space pkeys, PKU. As with PKU, supervisor
pkeys are checked in addition to normal paging protections and Access or Writes
can be disabled via a MSR update without TLB flushes when permissions change.
Also like PKU, a page mapping is assigned to a domain by setting pkey bits in
the page table entry for that mapping.
Access is controlled through a PKRS register which is updated via WRMSR/RDMSR.
XSAVE is not supported for the PKRS MSR. Therefore the implementation
saves/restores the MSR across context switches and during exceptions. Nested
exceptions are supported by each exception getting a new PKS state.
For consistent behavior with current paging protections, pkey 0 is reserved and
configured to allow full access via the pkey mechanism, thus preserving the
default paging protections on mappings with the default pkey value of 0.
Other keys, (1-15) are allocated by an allocator which prepares us for key
contention from day one. Kernel users should be prepared for the allocator to
fail either because of key exhaustion or due to PKS not being supported on the
CPU instance.
The following are key attributes of PKS.
1) Fast switching of permissions
1a) Prevents access without page table manipulations
1b) No TLB flushes required
2) Works on a per thread basis
PKS is available with 4 and 5 level paging. Like PKRU it consumes 4 bits from
the PTE to store the pkey within the entry.
All code to support PKS is configured via ARCH_ENABLE_SUPERVISOR_PKEYS which
is designed to only be turned on when a user is configured on in the kernel.
Those users must depend on ARCH_HAS_SUPERVISOR_PKEYS to properly work with
other architectures which do not yet support PKS.
Originally this series was submitted as part of a large patch set which
converted the kmap call sites.[1]
Many follow on discussions revealed a few problems. The first of which was
that some callers leak a kmap mapping across threads rather than containing it
to a critical section. Attempts were made to see if these 'global kmaps' could
be supported.[2] However, supporting global kmaps had many problems. Work is
being done in parallel on converting as many kmap calls to the new
kmap_local_page().[3]
Changes from V3 [4]
Add ARCH_ENABLE_SUPERVISOR_PKEYS config which is selected by kernel
users to add the functionality to the core. However, they should only
select this if ARCH_HAS_SUPERVISOR_PKEYS is available.
Clean up test code for context switching
Adjust for extended_pt_regs
Reduce output unless --debug is specified
Address internal review comments from Dan Williams and Dave Hansen
Help with macros and assembly coding
Change names of various functions
Clean up documentation
Move all #ifdefery into header files.
Clean up cover letter.
Make extended_pt_regs handling a macro rather than coding
around every call to C
Add macross for PKS shift/mask
New patch : x86/pks: Add additional PKEY helper macros
Preserve pkrs_cache as static when PKS_TEST is not configured
Remove unnecessary pr_* prints
Clarify pks_key_alloc flags parameter
Change CONFIG_PKS_TESTING to CONFIG_PKS_TEST
Clean up test code separation from main code in fault.c
Remove module boilerplate from test code
Clean up all commit messages
Address comments from Thomas Gleixner
Provide a warning and fallback to no protection if a global
mapping is requested.
Fix context switch. Fix where pks_sched_in() is called.
Fix test to actually do a context switch
Remove unecessary noinstr's
From Andy Lutomirski
Use extended_pt_regs idea to stash pks values on the stack
Drop patches 5/10 and 7/10
And use extended_pt_regs to print pkey info on fault
Adjust tests
Comments from Randy Dunlap:
Fix gramatical errors in doc
Clean up kernel docs
Rebase to 5.12
[1] https://lore.kernel.org/lkml/20201009195033.3208459-1-ira.weiny@intel.com/
[2] https://lore.kernel.org/lkml/87mtycqcjf.fsf@nanos.tec.linutronix.de/
[3] https://lore.kernel.org/lkml/20210128061503.1496847-1-ira.weiny@intel.com/https://lore.kernel.org/lkml/20210210062221.3023586-1-ira.weiny@intel.com/https://lore.kernel.org/lkml/20210205170030.856723-1-ira.weiny@intel.com/https://lore.kernel.org/lkml/20210217024826.3466046-1-ira.weiny@intel.com/
[4] https://lore.kernel.org/lkml/20201106232908.364581-1-ira.weiny@intel.com/
</proposed cover letter>
Fenghua Yu (1):
x86/pks: Add PKS kernel API
Ira Weiny (9):
x86/pkeys: Create pkeys_common.h
x86/fpu: Refactor arch_set_user_pkey_access() for PKS support
x86/pks: Add additional PKEY helper macros
x86/pks: Add PKS defines and Kconfig options
x86/pks: Add PKS setup code
x86/fault: Adjust WARN_ON for PKey fault
x86/pks: Preserve the PKRS MSR on context switch
x86/entry: Preserve PKRS MSR across exceptions
x86/pks: Add PKS test code
Documentation/core-api/protection-keys.rst | 111 +++-
arch/x86/Kconfig | 1 +
arch/x86/entry/calling.h | 26 +
arch/x86/entry/common.c | 58 ++
arch/x86/entry/entry_64.S | 22 +-
arch/x86/entry/entry_64_compat.S | 6 +-
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/pgtable.h | 10 +-
arch/x86/include/asm/pgtable_types.h | 12 +
arch/x86/include/asm/pkeys.h | 4 +
arch/x86/include/asm/pkeys_common.h | 34 +
arch/x86/include/asm/pks.h | 54 ++
arch/x86/include/asm/processor-flags.h | 2 +
arch/x86/include/asm/processor.h | 43 +-
arch/x86/include/uapi/asm/processor-flags.h | 2 +
arch/x86/kernel/cpu/common.c | 2 +
arch/x86/kernel/fpu/xstate.c | 22 +-
arch/x86/kernel/head_64.S | 7 +-
arch/x86/kernel/process.c | 3 +
arch/x86/kernel/process_64.c | 2 +
arch/x86/mm/fault.c | 27 +-
arch/x86/mm/pkeys.c | 218 +++++-
include/linux/pgtable.h | 4 +
include/linux/pkeys.h | 34 +
kernel/entry/common.c | 14 +-
lib/Kconfig.debug | 11 +
lib/Makefile | 3 +
lib/pks/Makefile | 3 +
lib/pks/pks_test.c | 693 ++++++++++++++++++++
mm/Kconfig | 5 +
tools/testing/selftests/x86/Makefile | 3 +-
tools/testing/selftests/x86/test_pks.c | 150 +++++
34 files changed, 1519 insertions(+), 77 deletions(-)
create mode 100644 arch/x86/include/asm/pkeys_common.h
create mode 100644 arch/x86/include/asm/pks.h
create mode 100644 lib/pks/Makefile
create mode 100644 lib/pks/pks_test.c
create mode 100644 tools/testing/selftests/x86/test_pks.c
--
2.28.0.rc0.12.gb6a658bd00c9
If a signed number field starts with a '-' the field width must be > 1,
or unlimited, to allow at least one digit after the '-'.
This patch adds a check for this. If a signed field starts with '-'
and field_width == 1 the scanf will quit.
It is ok for a signed number field to have a field width of 1 if it
starts with a digit. In that case the single digit can be converted.
Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
Reviewed-by: Petr Mladek <pmladek(a)suse.com>
Acked-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
---
lib/vsprintf.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 41ddc353ebb8..f78651e9b030 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -3466,8 +3466,12 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
str = skip_spaces(str);
digit = *str;
- if (is_sign && digit == '-')
+ if (is_sign && digit == '-') {
+ if (field_width == 1)
+ break;
+
digit = *(str + 1);
+ }
if (!digit
|| (base == 16 && !isxdigit(digit))
--
2.20.1
Previously, we shared too much of the code with COPY and ZEROPAGE, so we
manipulated things in various invalid ways:
- Previously, we unconditionally called shmem_inode_acct_block. In the
continue case, we're looking up an existing page which would have been
accounted for properly when it was allocated. So doing it twice
results in double-counting, and eventually leaking.
- Previously, we made the pte writable whenever the VMA was writable.
However, for continue, consider this case:
1. A tmpfs file was created
2. The non-UFFD-registered side mmap()-s with MAP_SHARED
3. The UFFD-registered side mmap()-s with MAP_PRIVATE
In this case, even though the UFFD-registered VMA may be writable, we
still want CoW behavior. So, check for this case and don't make the
pte writable.
- The offset / max_off checking doesn't necessarily hurt anything, but
it's not needed in the CONTINUE case, so skip it.
- Previously, we unconditionally called ClearPageDirty() in the error
path. In the continue case though, since this is an existing page, it
might have already been dirty before we started touching it. So,
remember whether or not it was dirty before we set_page_dirty(), and
only clear the bit if it wasn't dirty before.
- Previously, we unconditionally removed the page from the page cache in
the error path. But in the continue case, we didn't add it - it was
already there because the page is present in some second
(non-UFFD-registered) mapping. So, removing it is invalid.
Because the error handling issues are easy to exercise in the selftest,
make a small modification there to do so.
Finally, refactor shmem_mcopy_atomic_pte a bit. By this point, we've
added a lot of "if (!is_continue)"-s everywhere. It's cleaner to just
check for that mode first thing, and then "goto" down to where the parts
we actually want are. This leaves the code in between cleaner.
Changes since v1:
- Refactor to skip ahead with goto, instead of adding several more
"if (!is_continue)".
- Fix unconditional ClearPageDirty().
- Don't pte_mkwrite() when is_continue && !VM_SHARED.
Fixes: 00da60b9d0a0 ("userfaultfd: support minor fault handling for shmem")
Signed-off-by: Axel Rasmussen <axelrasmussen(a)google.com>
---
mm/shmem.c | 67 ++++++++++++++----------
tools/testing/selftests/vm/userfaultfd.c | 12 +++++
2 files changed, 51 insertions(+), 28 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index d2e0e81b7d2e..8ab1f1f29987 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2378,17 +2378,22 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
pte_t _dst_pte, *dst_pte;
int ret;
pgoff_t offset, max_off;
-
- ret = -ENOMEM;
- if (!shmem_inode_acct_block(inode, 1))
- goto out;
+ int writable;
+ bool was_dirty;
if (is_continue) {
ret = -EFAULT;
page = find_lock_page(mapping, pgoff);
if (!page)
- goto out_unacct_blocks;
- } else if (!*pagep) {
+ goto out;
+ goto install_ptes;
+ }
+
+ ret = -ENOMEM;
+ if (!shmem_inode_acct_block(inode, 1))
+ goto out;
+
+ if (!*pagep) {
page = shmem_alloc_page(gfp, info, pgoff);
if (!page)
goto out_unacct_blocks;
@@ -2415,13 +2420,11 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
*pagep = NULL;
}
- if (!is_continue) {
- VM_BUG_ON(PageSwapBacked(page));
- VM_BUG_ON(PageLocked(page));
- __SetPageLocked(page);
- __SetPageSwapBacked(page);
- __SetPageUptodate(page);
- }
+ VM_BUG_ON(PageSwapBacked(page));
+ VM_BUG_ON(PageLocked(page));
+ __SetPageLocked(page);
+ __SetPageSwapBacked(page);
+ __SetPageUptodate(page);
ret = -EFAULT;
offset = linear_page_index(dst_vma, dst_addr);
@@ -2429,16 +2432,18 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
if (unlikely(offset >= max_off))
goto out_release;
- /* If page wasn't already in the page cache, add it. */
- if (!is_continue) {
- ret = shmem_add_to_page_cache(page, mapping, pgoff, NULL,
- gfp & GFP_RECLAIM_MASK, dst_mm);
- if (ret)
- goto out_release;
- }
+ ret = shmem_add_to_page_cache(page, mapping, pgoff, NULL,
+ gfp & GFP_RECLAIM_MASK, dst_mm);
+ if (ret)
+ goto out_release;
+install_ptes:
_dst_pte = mk_pte(page, dst_vma->vm_page_prot);
- if (dst_vma->vm_flags & VM_WRITE)
+ /* For CONTINUE on a non-shared VMA, don't pte_mkwrite for CoW. */
+ writable = is_continue && !(dst_vma->vm_flags & VM_SHARED)
+ ? 0
+ : dst_vma->vm_flags & VM_WRITE;
+ if (writable)
_dst_pte = pte_mkwrite(pte_mkdirty(_dst_pte));
else {
/*
@@ -2448,15 +2453,18 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
* unconditionally before unlock_page(), but doing it
* only if VM_WRITE is not set is faster.
*/
+ was_dirty = PageDirty(page);
set_page_dirty(page);
}
dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
- ret = -EFAULT;
- max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
- if (unlikely(offset >= max_off))
- goto out_release_unlock;
+ if (!is_continue) {
+ ret = -EFAULT;
+ max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
+ if (unlikely(offset >= max_off))
+ goto out_release_unlock;
+ }
ret = -EEXIST;
if (!pte_none(*dst_pte))
@@ -2485,13 +2493,16 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
return ret;
out_release_unlock:
pte_unmap_unlock(dst_pte, ptl);
- ClearPageDirty(page);
- delete_from_page_cache(page);
+ if (!was_dirty)
+ ClearPageDirty(page);
+ if (!is_continue)
+ delete_from_page_cache(page);
out_release:
unlock_page(page);
put_page(page);
out_unacct_blocks:
- shmem_inode_unacct_blocks(inode, 1);
+ if (!is_continue)
+ shmem_inode_unacct_blocks(inode, 1);
goto out;
}
#endif /* CONFIG_USERFAULTFD */
diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index f6c86b036d0f..d8541a59dae5 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -485,6 +485,7 @@ static void wp_range(int ufd, __u64 start, __u64 len, bool wp)
static void continue_range(int ufd, __u64 start, __u64 len)
{
struct uffdio_continue req;
+ int ret;
req.range.start = start;
req.range.len = len;
@@ -493,6 +494,17 @@ static void continue_range(int ufd, __u64 start, __u64 len)
if (ioctl(ufd, UFFDIO_CONTINUE, &req))
err("UFFDIO_CONTINUE failed for address 0x%" PRIx64,
(uint64_t)start);
+
+ /*
+ * Error handling within the kernel for continue is subtly different
+ * from copy or zeropage, so it may be a source of bugs. Trigger an
+ * error (-EEXIST) on purpose, to verify doing so doesn't cause a BUG.
+ */
+ req.mapped = 0;
+ ret = ioctl(ufd, UFFDIO_CONTINUE, &req);
+ if (ret >= 0 || req.mapped != -EEXIST)
+ err("failed to exercise UFFDIO_CONTINUE error handling, ret=%d, mapped=%" PRId64,
+ ret, req.mapped);
}
static void *locking_thread(void *arg)
--
2.31.0.291.g576ba9dcdaf-goog
From: Andre Przywara <andre.przywara(a)arm.com>
[ Upstream commit 7011d72588d16a9e5f5d85acbc8b10019809599c ]
The "First Fault Register" (FFR) is an SVE register that mimics a
predicate register, but clears bits when a load or store fails to handle
an element of a vector. The supposed usage scenario is to initialise
this register (using SETFFR), then *read* it later on to learn about
elements that failed to load or store. Explicit writes to this register
using the WRFFR instruction are only supposed to *restore* values
previously read from the register (for context-switching only).
As the manual describes, this register holds only certain values, it:
"... contains a monotonic predicate value, in which starting from bit 0
there are zero or more 1 bits, followed only by 0 bits in any remaining
bit positions."
Any other value is UNPREDICTABLE and is not supposed to be "restored"
into the register.
The SVE test currently tries to write a signature pattern into the
register, which is *not* a canonical FFR value. Apparently the existing
setups treat UNPREDICTABLE as "read-as-written", but a new
implementation actually only stores canonical values. As a consequence,
the sve-test fails immediately when comparing the FFR value:
-----------
# ./sve-test
Vector length: 128 bits
PID: 207
Mismatch: PID=207, iteration=0, reg=48
Expected [cf00]
Got [0f00]
Aborted
-----------
Fix this by only populating the FFR with proper canonical values.
Effectively the requirement described above limits us to 17 unique
values over 16 bits worth of FFR, so we condense our signature down to 4
bits (2 bits from the PID, 2 bits from the generation) and generate the
canonical pattern from it. Any bits describing elements above the
minimum 128 bit are set to 0.
This aligns the FFR usage to the architecture and fixes the test on
microarchitectures implementing FFR in a more restricted way.
Signed-off-by: Andre Przywara <andre.przywara(a)arm.com>
Reviwed-by: Mark Brown <broonie(a)kernel.org>
Link: https://lore.kernel.org/r/20210319120128.29452-1-andre.przywara@arm.com
Signed-off-by: Will Deacon <will(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/arm64/fp/sve-test.S | 22 ++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/arm64/fp/sve-test.S b/tools/testing/selftests/arm64/fp/sve-test.S
index f95074c9b48b..07f14e279a90 100644
--- a/tools/testing/selftests/arm64/fp/sve-test.S
+++ b/tools/testing/selftests/arm64/fp/sve-test.S
@@ -284,16 +284,28 @@ endfunction
// Set up test pattern in the FFR
// x0: pid
// x2: generation
+//
+// We need to generate a canonical FFR value, which consists of a number of
+// low "1" bits, followed by a number of zeros. This gives us 17 unique values
+// per 16 bits of FFR, so we create a 4 bit signature out of the PID and
+// generation, and use that as the initial number of ones in the pattern.
+// We fill the upper lanes of FFR with zeros.
// Beware: corrupts P0.
function setup_ffr
mov x4, x30
- bl pattern
+ and w0, w0, #0x3
+ bfi w0, w2, #2, #2
+ mov w1, #1
+ lsl w1, w1, w0
+ sub w1, w1, #1
+
ldr x0, =ffrref
- ldr x1, =scratch
- rdvl x2, #1
- lsr x2, x2, #3
- bl memcpy
+ strh w1, [x0], 2
+ rdvl x1, #1
+ lsr x1, x1, #3
+ sub x1, x1, #2
+ bl memclr
mov x0, #0
ldr x1, =ffrref
--
2.30.1
From: David Gow <davidgow(a)google.com>
[ Upstream commit 7421b1a4d10c633ca5f14c8236d3e2c1de07e52b ]
The first argument to namedtuple() should match the name of the type,
which wasn't the case for KconfigEntryBase.
Fixing this is enough to make mypy show no python typing errors again.
Fixes 97752c39bd ("kunit: kunit_tool: Allow .kunitconfig to disable config items")
Signed-off-by: David Gow <davidgow(a)google.com>
Reviewed-by: Daniel Latypov <dlatypov(a)google.com>
Acked-by: Brendan Higgins <brendanhiggins(a)google.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/kunit/kunit_config.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/kunit/kunit_config.py b/tools/testing/kunit/kunit_config.py
index 02ffc3a3e5dc..b30e9d6db6b4 100644
--- a/tools/testing/kunit/kunit_config.py
+++ b/tools/testing/kunit/kunit_config.py
@@ -12,7 +12,7 @@ import re
CONFIG_IS_NOT_SET_PATTERN = r'^# CONFIG_(\w+) is not set$'
CONFIG_PATTERN = r'^CONFIG_(\w+)=(\S+|".*")$'
-KconfigEntryBase = collections.namedtuple('KconfigEntry', ['name', 'value'])
+KconfigEntryBase = collections.namedtuple('KconfigEntryBase', ['name', 'value'])
class KconfigEntry(KconfigEntryBase):
--
2.30.1
From: Andre Przywara <andre.przywara(a)arm.com>
[ Upstream commit 7011d72588d16a9e5f5d85acbc8b10019809599c ]
The "First Fault Register" (FFR) is an SVE register that mimics a
predicate register, but clears bits when a load or store fails to handle
an element of a vector. The supposed usage scenario is to initialise
this register (using SETFFR), then *read* it later on to learn about
elements that failed to load or store. Explicit writes to this register
using the WRFFR instruction are only supposed to *restore* values
previously read from the register (for context-switching only).
As the manual describes, this register holds only certain values, it:
"... contains a monotonic predicate value, in which starting from bit 0
there are zero or more 1 bits, followed only by 0 bits in any remaining
bit positions."
Any other value is UNPREDICTABLE and is not supposed to be "restored"
into the register.
The SVE test currently tries to write a signature pattern into the
register, which is *not* a canonical FFR value. Apparently the existing
setups treat UNPREDICTABLE as "read-as-written", but a new
implementation actually only stores canonical values. As a consequence,
the sve-test fails immediately when comparing the FFR value:
-----------
# ./sve-test
Vector length: 128 bits
PID: 207
Mismatch: PID=207, iteration=0, reg=48
Expected [cf00]
Got [0f00]
Aborted
-----------
Fix this by only populating the FFR with proper canonical values.
Effectively the requirement described above limits us to 17 unique
values over 16 bits worth of FFR, so we condense our signature down to 4
bits (2 bits from the PID, 2 bits from the generation) and generate the
canonical pattern from it. Any bits describing elements above the
minimum 128 bit are set to 0.
This aligns the FFR usage to the architecture and fixes the test on
microarchitectures implementing FFR in a more restricted way.
Signed-off-by: Andre Przywara <andre.przywara(a)arm.com>
Reviwed-by: Mark Brown <broonie(a)kernel.org>
Link: https://lore.kernel.org/r/20210319120128.29452-1-andre.przywara@arm.com
Signed-off-by: Will Deacon <will(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/arm64/fp/sve-test.S | 22 ++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/arm64/fp/sve-test.S b/tools/testing/selftests/arm64/fp/sve-test.S
index 9210691aa998..e3e08d9c7020 100644
--- a/tools/testing/selftests/arm64/fp/sve-test.S
+++ b/tools/testing/selftests/arm64/fp/sve-test.S
@@ -284,16 +284,28 @@ endfunction
// Set up test pattern in the FFR
// x0: pid
// x2: generation
+//
+// We need to generate a canonical FFR value, which consists of a number of
+// low "1" bits, followed by a number of zeros. This gives us 17 unique values
+// per 16 bits of FFR, so we create a 4 bit signature out of the PID and
+// generation, and use that as the initial number of ones in the pattern.
+// We fill the upper lanes of FFR with zeros.
// Beware: corrupts P0.
function setup_ffr
mov x4, x30
- bl pattern
+ and w0, w0, #0x3
+ bfi w0, w2, #2, #2
+ mov w1, #1
+ lsl w1, w1, w0
+ sub w1, w1, #1
+
ldr x0, =ffrref
- ldr x1, =scratch
- rdvl x2, #1
- lsr x2, x2, #3
- bl memcpy
+ strh w1, [x0], 2
+ rdvl x1, #1
+ lsr x1, x1, #3
+ sub x1, x1, #2
+ bl memclr
mov x0, #0
ldr x1, =ffrref
--
2.30.1
From: David Gow <davidgow(a)google.com>
[ Upstream commit 7421b1a4d10c633ca5f14c8236d3e2c1de07e52b ]
The first argument to namedtuple() should match the name of the type,
which wasn't the case for KconfigEntryBase.
Fixing this is enough to make mypy show no python typing errors again.
Fixes 97752c39bd ("kunit: kunit_tool: Allow .kunitconfig to disable config items")
Signed-off-by: David Gow <davidgow(a)google.com>
Reviewed-by: Daniel Latypov <dlatypov(a)google.com>
Acked-by: Brendan Higgins <brendanhiggins(a)google.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/kunit/kunit_config.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/kunit/kunit_config.py b/tools/testing/kunit/kunit_config.py
index bdd60230764b..27fe086d2d0d 100644
--- a/tools/testing/kunit/kunit_config.py
+++ b/tools/testing/kunit/kunit_config.py
@@ -13,7 +13,7 @@ from typing import List, Set
CONFIG_IS_NOT_SET_PATTERN = r'^# CONFIG_(\w+) is not set$'
CONFIG_PATTERN = r'^CONFIG_(\w+)=(\S+|".*")$'
-KconfigEntryBase = collections.namedtuple('KconfigEntry', ['name', 'value'])
+KconfigEntryBase = collections.namedtuple('KconfigEntryBase', ['name', 'value'])
class KconfigEntry(KconfigEntryBase):
--
2.30.1
If a signed number field starts with a '-' the field width must be > 1,
or unlimited, to allow at least one digit after the '-'.
This patch adds a check for this. If a signed field starts with '-'
and field_width == 1 the scanf will quit.
It is ok for a signed number field to have a field width of 1 if it
starts with a digit. In that case the single digit can be converted.
Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
Reviewed-by: Petr Mladek <pmladek(a)suse.com>
Acked-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
---
lib/vsprintf.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 41ddc353ebb8..f78651e9b030 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -3466,8 +3466,12 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
str = skip_spaces(str);
digit = *str;
- if (is_sign && digit == '-')
+ if (is_sign && digit == '-') {
+ if (field_width == 1)
+ break;
+
digit = *(str + 1);
+ }
if (!digit
|| (base == 16 && !isxdigit(digit))
--
2.20.1
Hi,
This v5 series can mainly include two parts.
Based on kvm queue branch: https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=queue
In the first part, all the known hugetlb backing src types specified
with different hugepage sizes are listed, so that we can specify use
of hugetlb source of the exact granularity that we want, instead of
the system default ones. And as all the known hugetlb page sizes are
listed, it's appropriate for all architectures. Besides, a helper that
can get granularity of different backing src types(anonumous/thp/hugetlb)
is added, so that we can use the accurate backing src granularity for
kinds of alignment or guest memory accessing of vcpus.
In the second part, a new test is added:
This test is added to serve as a performance tester and a bug reproducer
for kvm page table code (GPA->HPA mappings), it gives guidance for the
people trying to make some improvement for kvm. And the following explains
what we can exactly do through this test.
The function guest_code() can cover the conditions where a single vcpu or
multiple vcpus access guest pages within the same memory region, in three
VM stages(before dirty logging, during dirty logging, after dirty logging).
Besides, the backing src memory type(ANONYMOUS/THP/HUGETLB) of the tested
memory region can be specified by users, which means normal page mappings
or block mappings can be chosen by users to be created in the test.
If ANONYMOUS memory is specified, kvm will create normal page mappings
for the tested memory region before dirty logging, and update attributes
of the page mappings from RO to RW during dirty logging. If THP/HUGETLB
memory is specified, kvm will create block mappings for the tested memory
region before dirty logging, and split the blcok mappings into normal page
mappings during dirty logging, and coalesce the page mappings back into
block mappings after dirty logging is stopped.
So in summary, as a performance tester, this test can present the
performance of kvm creating/updating normal page mappings, or the
performance of kvm creating/splitting/recovering block mappings,
through execution time.
When we need to coalesce the page mappings back to block mappings after
dirty logging is stopped, we have to firstly invalidate *all* the TLB
entries for the page mappings right before installation of the block entry,
because a TLB conflict abort error could occur if we can't invalidate the
TLB entries fully. We have hit this TLB conflict twice on aarch64 software
implementation and fixed it. As this test can imulate process from dirty
logging enabled to dirty logging stopped of a VM with block mappings,
so it can also reproduce this TLB conflict abort due to inadequate TLB
invalidation when coalescing tables.
Links about the TLB conflict abort:
https://lore.kernel.org/lkml/20201201201034.116760-3-wangyanan55@huawei.com/
---
Change logs:
v4->v5:
- Use synchronization(sem_wait) for time measurement
- Add a new patch about TEST_ASSERT(patch 4)
- Address Andrew Jones's comments for v4 series
- Add Andrew Jones's R-b tags in some patches
- v4: https://lore.kernel.org/lkml/20210302125751.19080-1-wangyanan55@huawei.com/
v3->v4:
- Add a helper to get system default hugetlb page size
- Add tags of Reviewed-by of Ben in the patches
- v3: https://lore.kernel.org/lkml/20210301065916.11484-1-wangyanan55@huawei.com/
v2->v3:
- Add tags of Suggested-by, Reviewed-by in the patches
- Add a generic micro to get hugetlb page sizes
- Some changes for suggestions about v2 series
- v2: https://lore.kernel.org/lkml/20210225055940.18748-1-wangyanan55@huawei.com/
v1->v2:
- Add a patch to sync header files
- Add helpers to get granularity of different backing src types
- Some changes for suggestions about v1 series
- v1: https://lore.kernel.org/lkml/20210208090841.333724-1-wangyanan55@huawei.com/
---
Yanan Wang (10):
tools headers: sync headers of asm-generic/hugetlb_encode.h
tools headers: Add a macro to get HUGETLB page sizes for mmap
KVM: selftests: Use flag CLOCK_MONOTONIC_RAW for timing
KVM: selftests: Print the errno besides error-string in TEST_ASSERT
KVM: selftests: Make a generic helper to get vm guest mode strings
KVM: selftests: Add a helper to get system configured THP page size
KVM: selftests: Add a helper to get system default hugetlb page size
KVM: selftests: List all hugetlb src types specified with page sizes
KVM: selftests: Adapt vm_userspace_mem_region_add to new helpers
KVM: selftests: Add a test for kvm page table code
include/uapi/linux/mman.h | 2 +
tools/include/asm-generic/hugetlb_encode.h | 3 +
tools/include/uapi/linux/mman.h | 2 +
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 3 +
.../selftests/kvm/demand_paging_test.c | 8 +-
.../selftests/kvm/dirty_log_perf_test.c | 14 +-
.../testing/selftests/kvm/include/kvm_util.h | 4 +-
.../testing/selftests/kvm/include/test_util.h | 21 +-
.../selftests/kvm/kvm_page_table_test.c | 512 ++++++++++++++++++
tools/testing/selftests/kvm/lib/assert.c | 4 +-
tools/testing/selftests/kvm/lib/kvm_util.c | 59 +-
tools/testing/selftests/kvm/lib/test_util.c | 163 +++++-
tools/testing/selftests/kvm/steal_time.c | 4 +-
14 files changed, 739 insertions(+), 61 deletions(-)
create mode 100644 tools/testing/selftests/kvm/kvm_page_table_test.c
--
2.23.0
Previously, in the error path, we unconditionally removed the page from
the page cache. But in the continue case, we didn't add it - it was
already there because the page is used by a second (non-UFFD-registered)
mapping. So, in that case, it's incorrect to remove it as the other
mapping may still use it normally.
For this error handling failure, trivially exercise it in the
userfaultfd selftest, to detect this kind of bug in the future.
Also, we previously were unconditionally calling shmem_inode_acct_block.
In the continue case, however, this is incorrect, because we would have
already accounted for the RAM usage when the page was originally
allocated (since at this point it's already in the page cache). So,
doing it in the continue case causes us to double-count.
Fixes: 00da60b9d0a0 ("userfaultfd: support minor fault handling for shmem")
Signed-off-by: Axel Rasmussen <axelrasmussen(a)google.com>
---
mm/shmem.c | 15 ++++++++++-----
tools/testing/selftests/vm/userfaultfd.c | 12 ++++++++++++
2 files changed, 22 insertions(+), 5 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index d2e0e81b7d2e..5ac8ea737004 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2379,9 +2379,11 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
int ret;
pgoff_t offset, max_off;
- ret = -ENOMEM;
- if (!shmem_inode_acct_block(inode, 1))
- goto out;
+ if (!is_continue) {
+ ret = -ENOMEM;
+ if (!shmem_inode_acct_block(inode, 1))
+ goto out;
+ }
if (is_continue) {
ret = -EFAULT;
@@ -2389,6 +2391,7 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
if (!page)
goto out_unacct_blocks;
} else if (!*pagep) {
+ ret = -ENOMEM;
page = shmem_alloc_page(gfp, info, pgoff);
if (!page)
goto out_unacct_blocks;
@@ -2486,12 +2489,14 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
out_release_unlock:
pte_unmap_unlock(dst_pte, ptl);
ClearPageDirty(page);
- delete_from_page_cache(page);
+ if (!is_continue)
+ delete_from_page_cache(page);
out_release:
unlock_page(page);
put_page(page);
out_unacct_blocks:
- shmem_inode_unacct_blocks(inode, 1);
+ if (!is_continue)
+ shmem_inode_unacct_blocks(inode, 1);
goto out;
}
#endif /* CONFIG_USERFAULTFD */
diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index f6c86b036d0f..d8541a59dae5 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -485,6 +485,7 @@ static void wp_range(int ufd, __u64 start, __u64 len, bool wp)
static void continue_range(int ufd, __u64 start, __u64 len)
{
struct uffdio_continue req;
+ int ret;
req.range.start = start;
req.range.len = len;
@@ -493,6 +494,17 @@ static void continue_range(int ufd, __u64 start, __u64 len)
if (ioctl(ufd, UFFDIO_CONTINUE, &req))
err("UFFDIO_CONTINUE failed for address 0x%" PRIx64,
(uint64_t)start);
+
+ /*
+ * Error handling within the kernel for continue is subtly different
+ * from copy or zeropage, so it may be a source of bugs. Trigger an
+ * error (-EEXIST) on purpose, to verify doing so doesn't cause a BUG.
+ */
+ req.mapped = 0;
+ ret = ioctl(ufd, UFFDIO_CONTINUE, &req);
+ if (ret >= 0 || req.mapped != -EEXIST)
+ err("failed to exercise UFFDIO_CONTINUE error handling, ret=%d, mapped=%" PRId64,
+ ret, req.mapped);
}
static void *locking_thread(void *arg)
--
2.31.0.291.g576ba9dcdaf-goog
From: Colin Ian King <colin.king(a)canonical.com>
There is a spelling mistake in a comment. Fix it.
Signed-off-by: Colin Ian King <colin.king(a)canonical.com>
---
tools/testing/selftests/timers/clocksource-switch.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/timers/clocksource-switch.c b/tools/testing/selftests/timers/clocksource-switch.c
index bfc974b4572d..2d66abd877e6 100644
--- a/tools/testing/selftests/timers/clocksource-switch.c
+++ b/tools/testing/selftests/timers/clocksource-switch.c
@@ -3,7 +3,7 @@
* (C) Copyright IBM 2012
* Licensed under the GPLv2
*
- * NOTE: This is a meta-test which quickly changes the clocksourc and
+ * NOTE: This is a meta-test which quickly changes the clocksource and
* then uses other tests to detect problems. Thus this test requires
* that the inconsistency-check and nanosleep tests be present in the
* same directory it is run from.
--
2.30.2
Currently the following command produces an error message:
linux# make kselftest TARGETS=bpf O=/mnt/linux-build
# selftests: bpf: test_libbpf.sh
# ./test_libbpf.sh: line 23: ./test_libbpf_open: No such file or directory
# test_libbpf: failed at file test_l4lb.o
# selftests: test_libbpf [FAILED]
The error message might not affect the return code of make, therefore
one needs to grep make output in order to detect it.
This is not the only instance of the same underlying problem; any test
with more than one element in $(TEST_PROGS) fails the same way. Another
example:
linux# make O=/mnt/linux-build TARGETS=splice kselftest
[...]
# ./short_splice_read.sh: 15: ./splice_read: not found
# FAIL: /sys/module/test_module/sections/.init.text 2
not ok 2 selftests: splice: short_splice_read.sh # exit=1
The current logic prepends $(OUTPUT) only to the first member of
$(TEST_PROGS). After that, run_one() does
cd `dirname $TEST`
For all tests except the first one, `dirname $TEST` is ., which means
they cannot access the files generated in $(OUTPUT).
Fix by using $(addprefix) to prepend $(OUTPUT)/ to each member of
$(TEST_PROGS).
Fixes: 1a940687e424 ("selftests: lib.mk: copy test scripts and test files for make O=dir run")
Signed-off-by: Ilya Leoshkevich <iii(a)linux.ibm.com>
---
v1->v2:
- Append / to $(OUTPUT).
- Use $(addprefix) instead of $(foreach).
v2->v3:
- Split the patch in two.
- Improve the commit message.
v3: https://lore.kernel.org/linux-kselftest/20191024121347.22189-1-iii@linux.ib…
v3->v4:
- Drop the first patch.
- Add a note regarding make return code to the commit message.
v4: https://lore.kernel.org/linux-kselftest/20191115150428.61131-1-iii@linux.ib…
v4->v5:
- Add another reproducer to the commit message.
tools/testing/selftests/lib.mk | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index a5ce26d548e4..be17462fe146 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -74,7 +74,8 @@ ifdef building_out_of_srctree
rsync -aq $(TEST_PROGS) $(TEST_PROGS_EXTENDED) $(TEST_FILES) $(OUTPUT); \
fi
@if [ "X$(TEST_PROGS)" != "X" ]; then \
- $(call RUN_TESTS, $(TEST_GEN_PROGS) $(TEST_CUSTOM_PROGS) $(OUTPUT)/$(TEST_PROGS)) ; \
+ $(call RUN_TESTS, $(TEST_GEN_PROGS) $(TEST_CUSTOM_PROGS) \
+ $(addprefix $(OUTPUT)/,$(TEST_PROGS))) ; \
else \
$(call RUN_TESTS, $(TEST_GEN_PROGS) $(TEST_CUSTOM_PROGS)); \
fi
--
2.29.2
Attacks against vulnerable userspace applications with the purpose to break
ASLR or bypass canaries traditionally use some level of brute force with
the help of the fork system call. This is possible since when creating a
new process using fork its memory contents are the same as those of the
parent process (the process that called the fork system call). So, the
attacker can test the memory infinite times to find the correct memory
values or the correct memory addresses without worrying about crashing the
application.
Based on the above scenario it would be nice to have this detected and
mitigated, and this is the goal of this patch serie. Specifically the
following attacks are expected to be detected:
1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
desirable memory layout is got (e.g. Stack Clash).
2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly
until a desirable memory layout is got (e.g. what CTFs do for simple
network service).
3.- Launching processes without exec() (e.g. Android Zygote) and exposing
state to attack a sibling.
4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until
the previously shared memory layout of all the other children is
exposed (e.g. kind of related to HeartBleed).
In each case, a privilege boundary has been crossed:
Case 1: setuid/setgid process
Case 2: network to local
Case 3: privilege changes
Case 4: network to local
So, what will really be detected are fork/exec brute force attacks that
cross any of the commented bounds.
The implementation details and comparison against other existing
implementations can be found in the "Documentation" patch.
Knowing all this information I will explain now the different patches:
The 1/8 patch defines a new LSM hook to get the fatal signal of a task.
This will be useful during the attack detection phase.
The 2/8 patch defines a new LSM and manages the statistical data shared by
all the fork hierarchy processes.
The 3/8 patch detects a fork/exec brute force attack.
The 4/8 patch narrows the detection taken into account the privilege
boundary crossing.
The 5/8 patch mitigates a brute force attack.
The 6/8 patch adds self-tests to validate the Brute LSM expectations.
The 7/8 patch adds the documentation to explain this implementation.
The 8/8 patch updates the maintainers file.
This patch serie is a task of the KSPP [1] and can also be accessed from my
github tree [2] in the "brute_v6" branch.
[1] https://github.com/KSPP/linux/issues/39
[2] https://github.com/johwood/linux/
The previous versions can be found in:
RFC
https://lore.kernel.org/kernel-hardening/20200910202107.3799376-1-keescook@…
Version 2
https://lore.kernel.org/kernel-hardening/20201025134540.3770-1-john.wood@gm…
Version 3
https://lore.kernel.org/lkml/20210221154919.68050-1-john.wood@gmx.com/
Version 4
https://lore.kernel.org/lkml/20210227150956.6022-1-john.wood@gmx.com/
Version 5
https://lore.kernel.org/kernel-hardening/20210227153013.6747-1-john.wood@gm…
Changelog RFC -> v2
-------------------
- Rename this feature with a more suitable name (Jann Horn, Kees Cook).
- Convert the code to an LSM (Kees Cook).
- Add locking to avoid data races (Jann Horn).
- Add a new LSM hook to get the fatal signal of a task (Jann Horn, Kees
Cook).
- Add the last crashes timestamps list to avoid false positives in the
attack detection (Jann Horn).
- Use "period" instead of "rate" (Jann Horn).
- Other minor changes suggested (Jann Horn, Kees Cook).
Changelog v2 -> v3
------------------
- Compute the application crash period on an on-going basis (Kees Cook).
- Detect a brute force attack through the execve system call (Kees Cook).
- Detect an slow brute force attack (Randy Dunlap).
- Fine tuning the detection taken into account privilege boundary crossing
(Kees Cook).
- Taken into account only fatal signals delivered by the kernel (Kees
Cook).
- Remove the sysctl attributes to fine tuning the detection (Kees Cook).
- Remove the prctls to allow per process enabling/disabling (Kees Cook).
- Improve the documentation (Kees Cook).
- Fix some typos in the documentation (Randy Dunlap).
- Add self-test to validate the expectations (Kees Cook).
Changelog v3 -> v4
------------------
- Fix all the warnings shown by the tool "scripts/kernel-doc" (Randy
Dunlap).
Changelog v4 -> v5
------------------
- Fix some typos (Randy Dunlap).
Changelog v5 -> v6
------------------
- Fix a reported deadlock (kernel test robot).
- Add high level details to the documentation (Andi Kleen).
Any constructive comments are welcome.
Thanks.
John Wood (8):
security: Add LSM hook at the point where a task gets a fatal signal
security/brute: Define a LSM and manage statistical data
securtiy/brute: Detect a brute force attack
security/brute: Fine tuning the attack detection
security/brute: Mitigate a brute force attack
selftests/brute: Add tests for the Brute LSM
Documentation: Add documentation for the Brute LSM
MAINTAINERS: Add a new entry for the Brute LSM
Documentation/admin-guide/LSM/Brute.rst | 278 ++++++
Documentation/admin-guide/LSM/index.rst | 1 +
MAINTAINERS | 7 +
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 +
include/linux/security.h | 4 +
kernel/signal.c | 1 +
security/Kconfig | 11 +-
security/Makefile | 4 +
security/brute/Kconfig | 13 +
security/brute/Makefile | 2 +
security/brute/brute.c | 1107 ++++++++++++++++++++++
security/security.c | 5 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/brute/.gitignore | 2 +
tools/testing/selftests/brute/Makefile | 5 +
tools/testing/selftests/brute/config | 1 +
tools/testing/selftests/brute/exec.c | 44 +
tools/testing/selftests/brute/test.c | 507 ++++++++++
tools/testing/selftests/brute/test.sh | 226 +++++
20 files changed, 2219 insertions(+), 5 deletions(-)
create mode 100644 Documentation/admin-guide/LSM/Brute.rst
create mode 100644 security/brute/Kconfig
create mode 100644 security/brute/Makefile
create mode 100644 security/brute/brute.c
create mode 100644 tools/testing/selftests/brute/.gitignore
create mode 100644 tools/testing/selftests/brute/Makefile
create mode 100644 tools/testing/selftests/brute/config
create mode 100644 tools/testing/selftests/brute/exec.c
create mode 100644 tools/testing/selftests/brute/test.c
create mode 100755 tools/testing/selftests/brute/test.sh
--
2.25.1