Hello
At first, I thought that the proposed system call is capable of
reading *multiple* small files using a single system call - which
would help increase HDD/SSD queue utilization and increase IOPS (I/O
operations per second) - but that isn't the case and the proposed
system call can read just a single file.
Without the ability to read multiple small files using a single system
call, it is impossible to increase IOPS (unless an application is
using multiple reader threads or somehow instructs the kernel to
prefetch multiple files into memory).
While you are at it, why not also add a readfiles system call to read
multiple, presumably small, files? The initial unoptimized
implementation of readfiles syscall can simply call readfile
sequentially.
Sincerely
Jan (atomsymbol)
With procfs v3.3.16, the sysctl command doesn't print the set key and
value on error. This change breaks livepatch selftest test-ftrace.sh,
that tests the interaction of sysctl ftrace_enabled:
Make it work with all sysctl versions using '-q' option.
Explicitly print the final status on success so that it can be verified
in the log. The error message is enough on failure.
Reported-by: Kamalesh Babulal <kamalesh(a)linux.vnet.ibm.com>
Signed-off-by: Petr Mladek <pmladek(a)suse.com>
---
The patch has been created against livepatch.git,
branch for-5.9/selftests-cleanup. But it applies also against
the current Linus' tree.
tools/testing/selftests/livepatch/functions.sh | 3 ++-
tools/testing/selftests/livepatch/test-ftrace.sh | 2 +-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/livepatch/functions.sh b/tools/testing/selftests/livepatch/functions.sh
index 408529d94ddb..1aba83c87ad3 100644
--- a/tools/testing/selftests/livepatch/functions.sh
+++ b/tools/testing/selftests/livepatch/functions.sh
@@ -75,7 +75,8 @@ function set_dynamic_debug() {
}
function set_ftrace_enabled() {
- result=$(sysctl kernel.ftrace_enabled="$1" 2>&1 | paste --serial --delimiters=' ')
+ result=$(sysctl -q kernel.ftrace_enabled="$1" 2>&1 && \
+ sysctl kernel.ftrace_enabled 2>&1)
echo "livepatch: $result" > /dev/kmsg
}
diff --git a/tools/testing/selftests/livepatch/test-ftrace.sh b/tools/testing/selftests/livepatch/test-ftrace.sh
index 9160c9ec3b6f..552e165512f4 100755
--- a/tools/testing/selftests/livepatch/test-ftrace.sh
+++ b/tools/testing/selftests/livepatch/test-ftrace.sh
@@ -51,7 +51,7 @@ livepatch: '$MOD_LIVEPATCH': initializing patching transition
livepatch: '$MOD_LIVEPATCH': starting patching transition
livepatch: '$MOD_LIVEPATCH': completing patching transition
livepatch: '$MOD_LIVEPATCH': patching complete
-livepatch: sysctl: setting key \"kernel.ftrace_enabled\": Device or resource busy kernel.ftrace_enabled = 0
+livepatch: sysctl: setting key \"kernel.ftrace_enabled\": Device or resource busy
% echo 0 > /sys/kernel/livepatch/$MOD_LIVEPATCH/enabled
livepatch: '$MOD_LIVEPATCH': initializing unpatching transition
livepatch: '$MOD_LIVEPATCH': starting unpatching transition
--
2.26.2
During setup():
...
for ns in h0 r1 h1 h2 h3
do
create_ns ${ns}
done
...
while in cleanup():
...
for n in h1 r1 h2 h3 h4
do
ip netns del ${n} 2>/dev/null
done
...
and after removing the stderr redirection in cleanup():
$ sudo ./fib_nexthop_multiprefix.sh
...
TEST: IPv4: host 0 to host 3, mtu 1400 [ OK ]
TEST: IPv6: host 0 to host 3, mtu 1400 [ OK ]
Cannot remove namespace file "/run/netns/h4": No such file or directory
$ echo $?
1
and a non-zero return code, make kselftests fail (even if the test
itself is fine):
...
not ok 34 selftests: net: fib_nexthop_multiprefix.sh # exit=1
...
Signed-off-by: Paolo Pisati <paolo.pisati(a)canonical.com>
---
tools/testing/selftests/net/fib_nexthop_multiprefix.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/fib_nexthop_multiprefix.sh b/tools/testing/selftests/net/fib_nexthop_multiprefix.sh
index 9dc35a16e415..51df5e305855 100755
--- a/tools/testing/selftests/net/fib_nexthop_multiprefix.sh
+++ b/tools/testing/selftests/net/fib_nexthop_multiprefix.sh
@@ -144,7 +144,7 @@ setup()
cleanup()
{
- for n in h1 r1 h2 h3 h4
+ for n in h0 r1 h1 h2 h3
do
ip netns del ${n} 2>/dev/null
done
--
2.25.1
Apparently we haven't run the unit tests for kunit_tool in a while and
consequently some things have broken. This patchset fixes those issues.
Brendan Higgins (2):
kunit: tool: fix broken default args in unit tests
kunit: tool: fix improper treatment of file location
tools/testing/kunit/kunit.py | 24 ------------------------
tools/testing/kunit/kunit_tool_test.py | 14 +++++++-------
2 files changed, 7 insertions(+), 31 deletions(-)
base-commit: a581387e415bbb0085e7e67906c8f4a99746590e
--
2.27.0.389.gc38d7665816-goog
From: Ira Weiny <ira.weiny(a)intel.com>
This RFC series has been reviewed by Dave Hansen.
This patch set introduces a new page protection mechanism for supervisor pages,
Protection Key Supervisor (PKS) and an initial user of them, persistent memory,
PMEM.
PKS enables protections on 'domains' of supervisor pages to limit supervisor
mode access to those pages beyond the normal paging protections. They work in
a similar fashion to user space pkeys. Like User page pkeys (PKU), supervisor
pkeys are checked in addition to normal paging protections and Access or Writes
can be disabled via a MSR update without TLB flushes when permissions change.
A page mapping is assigned to a domain by setting a pkey in the page table
entry.
Unlike User pkeys no new instructions are added; rather WRMSR/RDMSR are used to
update the PKRS register.
XSAVE is not supported for the PKRS MSR. To reduce software complexity the
implementation saves/restores the MSR across context switches but not during
irqs. This is a compromise which results is a hardening of unwanted access
without absolute restriction.
For consistent behavior with current paging protections, pkey 0 is reserved and
configured to allow full access via the pkey mechanism, thus preserving the
default paging protections on mappings with the default pkey value of 0.
Other keys, (1-15) are allocated by an allocator which prepares us for key
contention from day one. Kernel users should be prepared for the allocator to
fail either because of key exhaustion or due to PKS not being supported on the
arch and/or CPU instance.
Protecting against stray writes is particularly important for PMEM because,
unlike writes to anonymous memory, writes to PMEM persists across a reboot.
Thus data corruption could result in permanent loss of data.
The following attributes of PKS makes it perfect as a mechanism to protect PMEM
from stray access within the kernel:
1) Fast switching of permissions
2) Prevents access without page table manipulations
3) Works on a per thread basis
4) No TLB flushes required
The second half of this series thus uses the PKS mechanism to protect PMEM from
stray access.
Implementation details
----------------------
Modifications of task struct in patches:
(x86/pks: Preserve the PKRS MSR on context switch)
(memremap: Add zone device access protection)
Because pkey access is per-thread 2 modifications are made to the task struct.
The first is a saved copy of the MSR during context switches. The second
reference counts access to the device domain to correctly handle kmap nesting
properly.
Maintain PKS setting in a re-entrant manner in patch:
(memremap: Add zone device access protection)
Using local_irq_save() seems to be the safest and fastest way to maintain kmap
as re-entrant. But there may be a better way. spin_lock_irq() and atomic
counters were considered. But atomic counters do not properly protect the pkey
update and spin_lock_irq() is unnecessary as the pkey protections are thread
local. Suggestions are welcome.
The use of kmap in patch:
(kmap: Add stray write protection for device pages)
To keep general access to PMEM pages general, we piggy back on the kmap()
interface as there are many places in the kernel who do not have, nor should be
required to have, a priori knowledge that a page is PMEM. The modifications to
the kmap code is careful to quickly determine which pages don't require special
handling to reduce overhead for non PMEM pages.
Breakdown of patches
--------------------
Implement PKS within x86 arch:
x86/pkeys: Create pkeys_internal.h
x86/fpu: Refactor arch_set_user_pkey_access() for PKS support
x86/pks: Enable Protection Keys Supervisor (PKS)
x86/pks: Preserve the PKRS MSR on context switch
x86/pks: Add PKS kernel API
x86/pks: Add a debugfs file for allocated PKS keys
Documentation/pkeys: Update documentation for kernel pkeys
x86/pks: Add PKS Test code
pre-req bug fixes for dax:
fs/dax: Remove unused size parameter
drivers/dax: Expand lock scope to cover the use of addresses
Add stray write protection to PMEM:
memremap: Add zone device access protection
kmap: Add stray write protection for device pages
dax: Stray write protection for dax_direct_access()
nvdimm/pmem: Stray write protection for pmem->virt_addr
[dax|pmem]: Enable stray write protection
Fenghua Yu (4):
x86/fpu: Refactor arch_set_user_pkey_access() for PKS support
x86/pks: Enable Protection Keys Supervisor (PKS)
x86/pks: Add PKS kernel API
x86/pks: Add a debugfs file for allocated PKS keys
Ira Weiny (11):
x86/pkeys: Create pkeys_internal.h
x86/pks: Preserve the PKRS MSR on context switch
Documentation/pkeys: Update documentation for kernel pkeys
x86/pks: Add PKS Test code
fs/dax: Remove unused size parameter
drivers/dax: Expand lock scope to cover the use of addresses
memremap: Add zone device access protection
kmap: Add stray write protection for device pages
dax: Stray write protection for dax_direct_access()
nvdimm/pmem: Stray write protection for pmem->virt_addr
[dax|pmem]: Enable stray write protection
Documentation/core-api/protection-keys.rst | 81 +++-
arch/x86/Kconfig | 1 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/pgtable.h | 13 +-
arch/x86/include/asm/pgtable_types.h | 4 +
arch/x86/include/asm/pkeys.h | 43 ++
arch/x86/include/asm/pkeys_internal.h | 35 ++
arch/x86/include/asm/processor.h | 13 +
arch/x86/include/uapi/asm/processor-flags.h | 2 +
arch/x86/kernel/cpu/common.c | 17 +
arch/x86/kernel/fpu/xstate.c | 17 +-
arch/x86/kernel/process.c | 35 ++
arch/x86/mm/fault.c | 16 +-
arch/x86/mm/pkeys.c | 174 +++++++-
drivers/dax/device.c | 2 +
drivers/dax/super.c | 5 +-
drivers/nvdimm/pmem.c | 6 +
fs/dax.c | 13 +-
include/linux/highmem.h | 32 +-
include/linux/memremap.h | 1 +
include/linux/mm.h | 33 ++
include/linux/pkeys.h | 18 +
include/linux/sched.h | 3 +
init/init_task.c | 3 +
kernel/fork.c | 3 +
lib/Kconfig.debug | 12 +
lib/Makefile | 3 +
lib/pks/Makefile | 3 +
lib/pks/pks_test.c | 452 ++++++++++++++++++++
mm/Kconfig | 15 +
mm/memremap.c | 111 +++++
tools/testing/selftests/x86/Makefile | 3 +-
tools/testing/selftests/x86/test_pks.c | 65 +++
34 files changed, 1175 insertions(+), 61 deletions(-)
create mode 100644 arch/x86/include/asm/pkeys_internal.h
create mode 100644 lib/pks/Makefile
create mode 100644 lib/pks/pks_test.c
create mode 100644 tools/testing/selftests/x86/test_pks.c
--
2.25.1
When the selftest "step" counter grew beyond 255, non-fatal warnings
were being emitted, which is noisy and pointless. There are selftests
with more than 255 steps (especially those in loops, etc). Instead,
just cap "steps" to 254 and do not report the saturation.
Reported-by: Ralph Campbell <rcampbell(a)nvidia.com>
Tested-by: Ralph Campbell <rcampbell(a)nvidia.com>
Fixes: 9847d24af95c ("selftests/harness: Refactor XFAIL into SKIP")
Signed-off-by: Kees Cook <keescook(a)chromium.org>
---
tools/testing/selftests/kselftest_harness.h | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index 935029d4fb21..4f78e4805633 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -680,7 +680,8 @@
__bail(_assert, _metadata->no_print, _metadata->step))
#define __INC_STEP(_metadata) \
- if (_metadata->passed && _metadata->step < 255) \
+ /* Keep "step" below 255 (which is used for "SKIP" reporting). */ \
+ if (_metadata->passed && _metadata->step < 253) \
_metadata->step++;
#define is_signed_type(var) (!!(((__typeof__(var))(-1)) < (__typeof__(var))1))
@@ -976,12 +977,6 @@ void __run_test(struct __fixture_metadata *f,
t->passed = 0;
} else if (t->pid == 0) {
t->fn(t, variant);
- /* Make sure step doesn't get lost in reporting */
- if (t->step >= 255) {
- ksft_print_msg("Too many test steps (%u)!?\n", t->step);
- t->step = 254;
- }
- /* Use 255 for SKIP */
if (t->skip)
_exit(255);
/* Pass is exit 0 */
--
2.25.1
--
Kees Cook
BusyBox diff doesn't support the GNU diff '--LTYPE-line-format' options
that were used in the selftests to filter older kernel log messages from
dmesg output.
Use "comm" which is more available in smaller boot environments.
Reported-by: Naresh Kamboju <naresh.kamboju(a)linaro.org>
Signed-off-by: Joe Lawrence <joe.lawrence(a)redhat.com>
---
based-on: livepatching.git/for-5.9/selftests-cleanup
merge-thru: livepatching.git
tools/testing/selftests/livepatch/functions.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/livepatch/functions.sh b/tools/testing/selftests/livepatch/functions.sh
index 36648ca367c2..408529d94ddb 100644
--- a/tools/testing/selftests/livepatch/functions.sh
+++ b/tools/testing/selftests/livepatch/functions.sh
@@ -277,7 +277,7 @@ function check_result {
# help differentiate repeated testing runs. Remove them with a
# post-comparison sed filter.
- result=$(dmesg | diff --changed-group-format='%>' --unchanged-group-format='' "$SAVED_DMESG" - | \
+ result=$(dmesg | comm -13 "$SAVED_DMESG" - | \
grep -e 'livepatch:' -e 'test_klp' | \
grep -v '\(tainting\|taints\) kernel' | \
sed 's/^\[[ 0-9.]*\] //')
--
2.21.3
On Thu, Jul 09, 2020 at 09:27:43AM -0700, Andy Lutomirski wrote:
> On Thu, Jul 9, 2020 at 9:22 AM Dave Hansen <dave.hansen(a)intel.com> wrote:
> >
> > On 7/9/20 9:07 AM, Andy Lutomirski wrote:
> > > On Thu, Jul 9, 2020 at 8:56 AM Dave Hansen <dave.hansen(a)intel.com> wrote:
> > >> On 7/9/20 8:44 AM, Andersen, John wrote:
> > >>> Bits which are allowed to be pinned default to WP for CR0 and SMEP,
> > >>> SMAP, and UMIP for CR4.
> > >> I think it also makes sense to have FSGSBASE in this set.
> > >>
> > >> I know it hasn't been tested, but I think we should do the legwork to
> > >> test it. If not in this set, can we agree that it's a logical next step?
> > > I have no objection to pinning FSGSBASE, but is there a clear
> > > description of the threat model that this whole series is meant to
> > > address? The idea is to provide a degree of protection against an
> > > attacker who is able to convince a guest kernel to write something
> > > inappropriate to CR4, right? How realistic is this?
> >
> > If a quick search can find this:
> >
> > > https://googleprojectzero.blogspot.com/2017/05/exploiting-linux-kernel-via-…
> >
> > I'd pretty confident that the guys doing actual bad things have it in
> > their toolbox too.
> >
>
> True, but we have the existing software CR4 pinning. I suppose the
> virtualization version is stronger.
>
Yes, as Kees said this will be stronger because it stops ROP and other gadget
based techniques which avoid the use of native_write_cr0/4().
With regards to what should be done in this patchset and what in other
patchsets. I have a fix for kexec thanks to Arvind's note about
TRAMPOLINE_32BIT_CODE_SIZE. The physical host boots fine now and the virtual
one can kexec fine.
What remains to be done on that front is to add some identifying information to
the kernel image to declare that it supports paravirtualized control register
pinning or not.
Liran suggested adding a section to the built image acting as a flag to signify
support for being kexec'd by a kernel with pinning enabled. If anyone has any
opinions on how they'd like to see this implemented please let me know.
Otherwise I'll just take a stab at it and you'll all see it hopefully in the
next version.
With regards to FSGSBASE, are we open to validating and adding that to the
DEFAULT set as a part of a separate patchset? This patchset is focused on
replicating the functionality we already have natively.
Hi,
v2:
- switch harness from XFAIL to SKIP
- pass skip reason from test into TAP output
- add acks/reviews
v1: https://lore.kernel.org/lkml/20200611224028.3275174-1-keescook@chromium.org/
I finally got around to converting the kselftest_harness.h API to actually
use the kselftest.h API so all the tools using it can actually report
TAP correctly. As part of this, there are a bunch of related cleanups,
API updates, and additions.
Thanks!
-Kees
Kees Cook (8):
selftests/clone3: Reorder reporting output
selftests: Remove unneeded selftest API headers
selftests/binderfs: Fix harness API usage
selftests: Add header documentation and helpers
selftests/harness: Switch to TAP output
selftests/harness: Refactor XFAIL into SKIP
selftests/harness: Display signed values correctly
selftests/harness: Report skip reason
tools/testing/selftests/clone3/clone3.c | 2 +-
.../selftests/clone3/clone3_clear_sighand.c | 3 +-
.../testing/selftests/clone3/clone3_set_tid.c | 2 +-
.../filesystems/binderfs/binderfs_test.c | 284 +++++++++---------
tools/testing/selftests/kselftest.h | 78 ++++-
tools/testing/selftests/kselftest_harness.h | 169 ++++++++---
.../pid_namespace/regression_enomem.c | 1 -
.../selftests/pidfd/pidfd_getfd_test.c | 1 -
.../selftests/pidfd/pidfd_setns_test.c | 1 -
tools/testing/selftests/seccomp/seccomp_bpf.c | 8 +-
.../selftests/uevent/uevent_filtering.c | 1 -
11 files changed, 356 insertions(+), 194 deletions(-)
--
2.25.1