+ linux-kselftest(a)vger.kernel.org
On 20/07/2022 at 17:24, Matthias May wrote:
> Hi
>
> I finally got around to doing the previously mentioned selftest for gretap, vxlan
> and geneve.
> See the bash script below.
>
> Many of the vxlan/geneve tests are currently failing, while gretap works on
> net-next because of the fixes I sent.
> What is the policy on sending selftests that are failing?
> Are fixes for the failures required in advance?
I don't know, I've added linux-kselftest(a)vger.kernel.org to the thread.
Regards,
Nicolas
>
> I'm not sure I can fix them.
> Geneve seems to ignore the 3 upper bits of the DSCP completely.
While creating an LSM BPF MAC policy to block user namespace creation, we
used the LSM cred_prepare hook because that is the closest hook to prevent
a call to create_user_ns().
The calls look something like this:
    cred = prepare_creds()
        security_prepare_creds()
            call_int_hook(cred_prepare, ...
    if (cred)
        create_user_ns(cred)
We noticed that error codes were not propagated from this hook and
introduced a patch [1] to propagate those errors.
The discussion notes that security_prepare_creds()
is not appropriate for MAC policies, and instead the hook is
meant for LSM authors to prepare credentials for mutation. [2]
Ultimately, we concluded that a better course of action is to introduce
a new security hook for LSM authors. [3]
This patch set first introduces a new security_create_user_ns() function
and create_user_ns LSM hook, then marks the hook as sleepable in BPF.
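As a rough userspace model of the flow described above (not kernel code; every type and function here is a hypothetical stand-in that only borrows the names used in this cover letter), the difference between denying through cred_prepare and denying through a dedicated hook might be sketched as follows. The model assumes that a failing cred_prepare hook only makes prepare_creds() return NULL, so the caller falls back to -ENOMEM, whereas a hook called on the create_user_ns() path can hand its own error code back.

```c
#include <errno.h>
#include <stddef.h>

/* Userspace model only: hypothetical stand-ins for the kernel symbols. */
struct cred { int unused; };

static struct cred model_cred;

/* Values the two modelled LSM hooks return: 0 to allow, -EPERM to deny. */
static int cred_prepare_hook_result;
static int create_user_ns_hook_result;

/* A cred_prepare failure is only visible to the caller as a NULL cred. */
static struct cred *prepare_creds_model(void)
{
	if (cred_prepare_hook_result < 0)
		return NULL;
	return &model_cred;
}

/* The dedicated hook's return value is passed straight back. */
static int create_user_ns_model(struct cred *cred)
{
	(void)cred;
	return create_user_ns_hook_result;
}

/* Caller modelled on the pseudocode above. */
static int unshare_userns_model(void)
{
	struct cred *cred = prepare_creds_model();

	if (!cred)
		return -ENOMEM;	/* assumed fallback: the hook's reason is lost */
	return create_user_ns_model(cred);
}

/* Deny in cred_prepare: the caller cannot tell a denial from -ENOMEM. */
int deny_via_cred_prepare(void)
{
	cred_prepare_hook_result = -EPERM;
	create_user_ns_hook_result = 0;
	return unshare_userns_model();
}

/* Deny in the new hook: the policy's own error code reaches the caller. */
int deny_via_create_user_ns(void)
{
	cred_prepare_hook_result = 0;
	create_user_ns_hook_result = -EPERM;
	return unshare_userns_model();
}
```

In this model deny_via_cred_prepare() yields -ENOMEM while deny_via_create_user_ns() yields -EPERM, which is the behavioural difference the new hook is meant to provide.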
Links:
1. https://lore.kernel.org/all/20220608150942.776446-1-fred@cloudflare.com/
2. https://lore.kernel.org/all/87y1xzyhub.fsf@email.froward.int.ebiederm.org/
3. https://lore.kernel.org/all/9fe9cd9f-1ded-a179-8ded-5fde8960a586@cloudflare…
Changes since v1:
- Add selftests/bpf: Add tests verifying bpf lsm create_user_ns hook patch
- Add selinux: Implement create_user_ns hook patch
- Change function signature of security_create_user_ns() to only take
  struct cred
- Move security_create_user_ns() call after id mapping check in
  create_user_ns()
- Update documentation to reflect changes
Frederick Lawler (4):
  security, lsm: Introduce security_create_user_ns()
  bpf-lsm: Make bpf_lsm_create_user_ns() sleepable
  selftests/bpf: Add tests verifying bpf lsm create_user_ns hook
  selinux: Implement create_user_ns hook
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 +
include/linux/security.h | 6 ++
kernel/bpf/bpf_lsm.c | 1 +
kernel/user_namespace.c | 5 ++
security/security.c | 5 ++
security/selinux/hooks.c | 9 ++
security/selinux/include/classmap.h | 2 +
.../selftests/bpf/prog_tests/deny_namespace.c | 88 +++++++++++++++++++
.../selftests/bpf/progs/test_deny_namespace.c | 39 ++++++++
10 files changed, 160 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/deny_namespace.c
create mode 100644 tools/testing/selftests/bpf/progs/test_deny_namespace.c
--
2.30.2
The buffer used for verifying SVE Z registers allocated enough space for
16 maximally sized registers rather than 32 due to using the macro for the
number of P registers. In practice this didn't matter since for historical
reasons the maximum VQ defined in the ABI is greater than the architectural
maximum so we will always allocate more space than is needed even with
emulated platforms implementing the architectural maximum. Still, we should
use the right define.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/arm64/abi/syscall-abi.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/arm64/abi/syscall-abi.c b/tools/testing/selftests/arm64/abi/syscall-abi.c
index b632bfe9e022..95229fa73232 100644
--- a/tools/testing/selftests/arm64/abi/syscall-abi.c
+++ b/tools/testing/selftests/arm64/abi/syscall-abi.c
@@ -113,8 +113,8 @@ static int check_fpr(struct syscall_cfg *cfg, int sve_vl, int sme_vl,
 }
 
 static uint8_t z_zero[__SVE_ZREG_SIZE(SVE_VQ_MAX)];
-uint8_t z_in[SVE_NUM_PREGS * __SVE_ZREG_SIZE(SVE_VQ_MAX)];
-uint8_t z_out[SVE_NUM_PREGS * __SVE_ZREG_SIZE(SVE_VQ_MAX)];
+uint8_t z_in[SVE_NUM_ZREGS * __SVE_ZREG_SIZE(SVE_VQ_MAX)];
+uint8_t z_out[SVE_NUM_ZREGS * __SVE_ZREG_SIZE(SVE_VQ_MAX)];
 
 static void setup_z(struct syscall_cfg *cfg, int sve_vl, int sme_vl,
 		    uint64_t svcr)
--
2.30.2
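The sizing argument in the SVE commit message above can be checked with a little arithmetic. The sketch below uses local SKETCH_* constants rather than the real uapi macros; the values (16 bytes per vector quadword, an ABI maximum VQ of 512, an architectural maximum VQ of 16, 32 Z registers and 16 P registers) are assumptions taken from memory of the arm64 headers and should be checked against the tree.

```c
#include <stddef.h>

/* Local stand-ins; assumed to match the arm64 uapi definitions. */
#define SKETCH_VQ_BYTES		16	/* one vector quadword is 128 bits */
#define SKETCH_VQ_MAX		512	/* maximum VQ defined in the ABI */
#define SKETCH_VQ_ARCH_MAX	16	/* architectural maximum (2048-bit vectors) */
#define SKETCH_NUM_ZREGS	32
#define SKETCH_NUM_PREGS	16
#define SKETCH_ZREG_SIZE(vq)	((size_t)(vq) * SKETCH_VQ_BYTES)

/* Buffer size before the fix: sized with the number of P registers. */
size_t old_buffer_size(void)
{
	return SKETCH_NUM_PREGS * SKETCH_ZREG_SIZE(SKETCH_VQ_MAX);
}

/* Buffer size after the fix: sized with the number of Z registers. */
size_t new_buffer_size(void)
{
	return SKETCH_NUM_ZREGS * SKETCH_ZREG_SIZE(SKETCH_VQ_MAX);
}

/* Space actually needed for 32 Z registers at the architectural maximum. */
size_t needed_size(void)
{
	return SKETCH_NUM_ZREGS * SKETCH_ZREG_SIZE(SKETCH_VQ_ARCH_MAX);
}
```

Under these assumptions the old buffer was already far larger than anything the architecture can require, which is why the wrong macro never caused a visible failure.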
Hi Paul,
as previously promised, here comes the nolibc update which introduces the
minimal self-test infrastructure that aims at being reasonably easy to
expand further.
It's based on your branch "dev.2022.06.30b" that contains the previous
minor fixes that aimed at addressing Linus' concerns about the build
process inconsistencies.
It tries to mimic the regular build process as closely as possible: it
reuses the same ARCH, CC and CROSS_COMPILE to build the test program, which
is embedded into an initramfs, and the kernel is (re)built with that
initramfs. Then you can decide to run that kernel under QEMU for the
supported archs, and the output of the tests appears in an output text
file in a format that's easily greppable and diffable. A single target
"run" does everything.
By default it will reuse your existing .config (so that developers
continue to use their regular config handling), though it can also
create a known-to-work defconfig for each arch. The reason behind this
is that it took me a moment to figure out certain defconfig + machine name
combinations and I found it better to put them there once and for all.
I've successfully tested it on arm, arm64, i386, x86_64. riscv64 works
except for two syscalls which return unexpected errors, and mips segfaults
in sbrk(). I don't know why yet, but this proves that it's worth having
such a test.
There are not that many tests yet (71). Those that have to run can be
filtered either from the program's command line or from a NOLIBC_TEST
environment variable so that it's possible to skip broken ones or to
focus on a few ranges only.
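To give an idea of what range-based filtering could look like, here is a minimal sketch. The spec format (a comma-separated list of numbers or "min-max" ranges such as "5-15,30") and the helper name test_selected() are assumptions made for illustration only; the format actually accepted by the test program may differ.

```c
#include <stdlib.h>

/*
 * Returns 1 if test number `num` is selected by `spec`, 0 otherwise.
 * `spec` uses a hypothetical format: comma-separated numbers or
 * "min-max" ranges. A NULL or empty spec selects every test.
 */
int test_selected(const char *spec, int num)
{
	if (!spec || !*spec)
		return 1;

	while (*spec) {
		char *end;
		long min = strtol(spec, &end, 10);
		long max = min;

		if (end == spec)
			return 0;	/* malformed spec: select nothing */

		if (*end == '-') {
			const char *p = end + 1;

			max = strtol(p, &end, 10);
			if (end == p)
				return 0;	/* range without an upper bound */
		}

		if (num >= min && num <= max)
			return 1;

		if (*end == ',')
			end++;
		else if (*end)
			return 0;	/* unexpected character */
		spec = end;
	}
	return 0;
}
```

A caller could then guard each test with something like test_selected(getenv("NOLIBC_TEST"), n), so that an unset variable runs everything.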
Tests are numbered and are conveniently handled in a switch/case
statement so that a relative line number assigns the number to the test.
That's convenient because the vast majority of syscall tests are
one-liners. This sometimes slightly upsets checkpatch when lines get
moderately long but that significantly improves legibility.
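The line-number trick described above can be sketched roughly as follows. This is a simplified illustration, not the code from the series: the CASE_TEST() macro as written here, the run_test() wrapper and the three dummy tests are assumptions. Each test sits on its own line directly below the switch, so its number is simply its line offset from the switch statement.

```c
#include <stdio.h>

/* One test per line: the case label is the line the macro is used on. */
#define CASE_TEST(name) \
	case __LINE__: snprintf(buf, len, "%d %s", test, #name);

/*
 * Runs test number `test`, writes "<number> <name>" into buf and returns
 * the value the one-liner produced, or -1 if no test has that number.
 */
int run_test(int test, char *buf, size_t len)
{
	int ret = -1;

	buf[0] = '\0';
	/* The first CASE_TEST line below is test 0, the next one test 1... */
	switch (test + __LINE__ + 1) {
	CASE_TEST(first);  ret = 10; break;
	CASE_TEST(second); ret = 20; break;
	CASE_TEST(third);  ret = 30; break;
	}
	return ret;
}
```

Adding a test is then a matter of adding one line; no number has to be maintained by hand, at the cost of keeping each test on a single line with no blank lines in between.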
There are expectations for both successes and failures (e.g. -1 ENOTDIR).
I'm sure this can be improved later (and that's the goal). Right now it
covers two test families:
- syscalls
- stdlib (str* functions mostly)
I suspect that over time we might want to split syscalls into different
parts (e.g. core, fs, etc maybe) but I could be wrong.
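The expectations mentioned above (a plain success, or -1 together with a specific errno) could be checked with helpers along these lines. The names expect_zero() and expect_syserr() are hypothetical and chosen for this sketch only; the two calls used as examples mirror the close_m1 and chdir_dot entries in the sample output.

```c
#include <errno.h>
#include <unistd.h>

/* Passes when the call succeeded with a return value of 0. */
int expect_zero(int ret)
{
	return ret == 0;
}

/* Passes when the call failed with -1 and errno holds the expected code. */
int expect_syserr(int ret, int expected_errno)
{
	return ret == -1 && errno == expected_errno;
}
```

For instance, expect_syserr(close(-1), EBADF) and expect_zero(chdir(".")) should both pass on a normal system, since errno is read only after the tested call has returned.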
The program can automatically modulate QEMU's return value on x86 when
QEMU is run with the appropriate options, but for now I'm not using it
as I felt like it didn't bring much value, and the output is more useful.
That's debatable, and maybe some might want to use it in bisect scripts
for example. It's too early to say IMHO.
Oh, I also arranged the code so that the test also builds with glibc. I
noticed that when adding a new test that fails, sometimes it's convenient
to see if it's the nolibc part that's broken or the test. I don't find
this critical but the required includes and ifdefs are there so that it
should be easy to maintain over time as well.
I'm obviously interested in comments, but really, I don't want to
overdesign something for a first step. It remains a very modest test
program and I'd like it to remain easy to hack on and to contribute
new tests that are deemed useful.
I'm CCing the few who already contributed some patches and/or expressed
interest, as well as Linus who had a first bad experience when trying to
test it, hoping this one will be better. I'm pasting below [1] a copy of
a test on x86_64, that's summed up as "71 test(s) passed" at the
end of the "run" target.
If there's no objection, it would be nice to have this with your current
series, as it definitely helps spot and fix the bugs. In parallel I'll see
if I can figure out the problems with the two tests that each fail on a
specific arch and I might possibly have a few extra fixes for the current
nolibc.
Thank you!
Willy
[1] example output
----8<----
Running test 'syscall'
0 getpid = 1 [OK]
1 getppid = 0 [OK]
5 getpgid_self = 0 [OK]
6 getpgid_bad = -1 ESRCH [OK]
7 kill_0 = 0 [OK]
8 kill_CONT = 0 [OK]
9 kill_BADPID = -1 ESRCH [OK]
10 sbrk = 0 [OK]
11 brk = 0 [OK]
12 chdir_root = 0 [OK]
13 chdir_dot = 0 [OK]
14 chdir_blah = -1 ENOENT [OK]
15 chmod_net = 0 [OK]
16 chmod_self = -1 EPERM [OK]
17 chown_self = -1 EPERM [OK]
18 chroot_root = 0 [OK]
19 chroot_blah = -1 ENOENT [OK]
20 chroot_exe = -1 ENOTDIR [OK]
21 close_m1 = -1 EBADF [OK]
22 close_dup = 0 [OK]
23 dup_0 = 3 [OK]
24 dup_m1 = -1 EBADF [OK]
25 dup2_0 = 100 [OK]
26 dup2_m1 = -1 EBADF [OK]
27 dup3_0 = 100 [OK]
28 dup3_m1 = -1 EBADF [OK]
29 execve_root = -1 EACCES [OK]
30 getdents64_root = 120 [OK]
31 getdents64_null = -1 ENOTDIR [OK]
32 gettimeofday_null = 0 [OK]
38 ioctl_tiocinq = 0 [OK]
39 ioctl_tiocinq = 0 [OK]
40 link_root1 = -1 EEXIST [OK]
41 link_blah = -1 ENOENT [OK]
42 link_dir = -1 EPERM [OK]
43 link_cross = -1 EXDEV [OK]
44 lseek_m1 = -1 EBADF [OK]
45 lseek_0 = -1 ESPIPE [OK]
46 mkdir_root = -1 EEXIST [OK]
47 open_tty = 3 [OK]
48 open_blah = -1 ENOENT [OK]
49 poll_null = 0 [OK]
50 poll_stdout = 1 [OK]
51 poll_fault = -1 EFAULT [OK]
52 read_badf = -1 EBADF [OK]
53 sched_yield = 0 [OK]
54 select_null = 0 [OK]
55 select_stdout = 1 [OK]
56 select_fault = -1 EFAULT [OK]
57 stat_blah = -1 ENOENT [OK]
58 stat_fault = -1 EFAULT [OK]
59 symlink_root = -1 EEXIST [OK]
60 unlink_root = -1 EISDIR [OK]
61 unlink_blah = -1 ENOENT [OK]
62 wait_child = -1 ECHILD [OK]
63 waitpid_min = -1 ESRCH [OK]
64 waitpid_child = -1 ECHILD [OK]
65 write_badf = -1 EBADF [OK]
66 write_zero = 0 [OK]
Errors during this test: 0
Running test 'stdlib'
0 getenv_TERM = <linux> [OK]
1 getenv_blah = <(null)> [OK]
2 setcmp_blah_blah = 0 [OK]
3 setcmp_blah_blah2 = -50 [OK]
4 setncmp_blah_blah = 0 [OK]
5 setncmp_blah_blah4 = 0 [OK]
6 setncmp_blah_blah5 = -53 [OK]
7 setncmp_blah_blah6 = -54 [OK]
8 strchr_foobar_o = <oobar> [OK]
9 strchr_foobar_z = <(null)> [OK]
10 strrchr_foobar_o = <obar> [OK]
11 strrchr_foobar_z = <(null)> [OK]
Errors during this test: 0
Total number of errors: 0
---->8----
--
Willy Tarreau (17):
  tools/nolibc: make argc 32-bit in riscv startup code
  tools/nolibc: fix build warning in sys_mmap() when my_syscall6 is not
    defined
  tools/nolibc: make sys_mmap() automatically use the right __NR_mmap
    definition
  selftests/nolibc: add basic infrastructure to ease creation of nolibc
    tests
  selftests/nolibc: support a test definition format
  selftests/nolibc: implement a few tests for various syscalls
  selftests/nolibc: add a few tests for some stdlib functions
  selftests/nolibc: exit with poweroff on success when getpid() == 1
  selftests/nolibc: on x86, support exiting with isa-debug-exit
  selftests/nolibc: recreate and populate /dev and /proc if missing
  selftests/nolibc: condition some tests on /proc existence
  selftests/nolibc: support glibc as well
  selftests/nolibc: add a "kernel" target to build the kernel with the
    initramfs
  selftests/nolibc: add a "defconfig" target
  selftests/nolibc: add a "run" target to start the kernel in QEMU
  selftests/nolibc: "sysroot" target installs a local copy of the
    sysroot
  selftests/nolibc: add a "help" target
MAINTAINERS | 1 +
tools/include/nolibc/arch-riscv.h | 2 +-
tools/include/nolibc/sys.h | 4 +-
tools/testing/selftests/nolibc/Makefile | 135 ++++
tools/testing/selftests/nolibc/nolibc-test.c | 757 +++++++++++++++++++
5 files changed, 896 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/nolibc/Makefile
create mode 100644 tools/testing/selftests/nolibc/nolibc-test.c
--
2.17.5
On Tue, Jul 19, 2022 at 10:42 PM Karl MacMillan
<karl(a)bigbadwolfsecurity.com> wrote:
> On Thu, Jul 7, 2022 at 6:34 PM Frederick Lawler <fred(a)cloudflare.com> wrote:
>>
>> Unprivileged user namespace creation is an intended feature to enable
>> sandboxing, however this feature is often used as an initial step to
>> perform a privilege escalation attack.
>>
>> This patch implements a new namespace { userns_create } access control
>> permission to restrict which domains allow or deny user namespace
>> creation. This is necessary for system administrators to quickly protect
>> their systems while waiting for vulnerability patches to be applied.
>>
>> This permission can be used in the following way:
>>
>> allow domA_t domB_t : namespace { userns_create };
>
>
> Isn’t this actually domA_t domA_t : namespace . . .
>
> I got confused reading this initially trying to figure out what the second domain type would be, but looking at the code cleared that up.
Ah, good catch, thanks Karl!
--
paul-moore.com