On Fri, Jul 12, 2024 at 5:44 PM Paul Moore <paul@paul-moore.com> wrote:
On Thu, Jul 11, 2024 at 7:13 AM Xu Kuohai <xukuohai@huaweicloud.com> wrote:
From: Xu Kuohai <xukuohai@huawei.com>
An LSM BPF prog that returns a positive number and is attached to the file_alloc_security hook causes a kernel panic.
Here is a panic log:
[ 441.235774] BUG: kernel NULL pointer dereference, address: 00000000000009
[ 441.236748] #PF: supervisor write access in kernel mode
[ 441.237429] #PF: error_code(0x0002) - not-present page
[ 441.238119] PGD 800000000b02f067 P4D 800000000b02f067 PUD b031067 PMD 0
[ 441.238990] Oops: 0002 [#1] PREEMPT SMP PTI
[ 441.239546] CPU: 0 PID: 347 Comm: loader Not tainted 6.8.0-rc6-gafe0cbf23373 #22
[ 441.240496] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b4
[ 441.241933] RIP: 0010:alloc_file+0x4b/0x190
[ 441.242485] Code: 8b 04 25 c0 3c 1f 00 48 8b b0 30 0c 00 00 e8 9c fe ff ff 48 3d 00 f0 ff fb
[ 441.244820] RSP: 0018:ffffc90000c67c40 EFLAGS: 00010203
[ 441.245484] RAX: ffff888006a891a0 RBX: ffffffff8223bd00 RCX: 0000000035b08000
[ 441.246391] RDX: ffff88800b95f7b0 RSI: 00000000001fc110 RDI: f089cd0b8088ffff
[ 441.247294] RBP: ffffc90000c67c58 R08: 0000000000000001 R09: 0000000000000001
[ 441.248209] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000001
[ 441.249108] R13: ffffc90000c67c78 R14: ffffffff8223bd00 R15: fffffffffffffff4
[ 441.250007] FS: 00000000005f3300(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
[ 441.251053] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 441.251788] CR2: 00000000000001a9 CR3: 000000000bdc4003 CR4: 0000000000170ef0
[ 441.252688] Call Trace:
[ 441.253011] <TASK>
[ 441.253296] ? __die+0x24/0x70
[ 441.253702] ? page_fault_oops+0x15b/0x480
[ 441.254236] ? fixup_exception+0x26/0x330
[ 441.254750] ? exc_page_fault+0x6d/0x1c0
[ 441.255257] ? asm_exc_page_fault+0x26/0x30
[ 441.255792] ? alloc_file+0x4b/0x190
[ 441.256257] alloc_file_pseudo+0x9f/0xf0
[ 441.256760] __anon_inode_getfile+0x87/0x190
[ 441.257311] ? lock_release+0x14e/0x3f0
[ 441.257808] bpf_link_prime+0xe8/0x1d0
[ 441.258315] bpf_tracing_prog_attach+0x311/0x570
[ 441.258916] ? __pfx_bpf_lsm_file_alloc_security+0x10/0x10
[ 441.259605] __sys_bpf+0x1bb7/0x2dc0
[ 441.260070] __x64_sys_bpf+0x20/0x30
[ 441.260533] do_syscall_64+0x72/0x140
[ 441.261004] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 441.261643] RIP: 0033:0x4b0349
[ 441.262045] Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 88
[ 441.264355] RSP: 002b:00007fff74daee38 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
[ 441.265293] RAX: ffffffffffffffda RBX: 00007fff74daef30 RCX: 00000000004b0349
[ 441.266187] RDX: 0000000000000040 RSI: 00007fff74daee50 RDI: 000000000000001c
[ 441.267114] RBP: 000000000000001b R08: 00000000005ef820 R09: 0000000000000000
[ 441.268018] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
[ 441.268907] R13: 0000000000000004 R14: 00000000005ef018 R15: 00000000004004e8
This is because the filesystem uses IS_ERR to check if the return value is an error code. If it is not, the filesystem takes the return value as a file pointer. Since the positive number returned by the BPF prog is not a real file pointer, this misinterpretation causes a panic.
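To make the failure mode concrete, here is a minimal userspace sketch of the pointer-encoded error convention. It is not the kernel code: err_ptr() and is_err() are stand-ins modeled on the kernel's ERR_PTR() and IS_ERR(), MAX_ERRNO is assumed to be 4095 as in the kernel's err.h, and treated_as_valid_pointer() is a hypothetical helper added only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mirrors the kernel's MAX_ERRNO. */
#define MAX_ERRNO 4095

/* Stand-in for ERR_PTR(): encode an error code in a pointer value. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Stand-in for IS_ERR(): only the top MAX_ERRNO addresses count as errors. */
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical helper: true when a nonzero hook return value, once
 * encoded as a pointer, would slip past the is_err() check and be
 * used as if it were a real object.
 */
static inline bool treated_as_valid_pointer(long hook_ret)
{
	return hook_ret != 0 && !is_err(err_ptr(hook_ret));
}
```

With these definitions, is_err(err_ptr(-12)) is true, while is_err(err_ptr(1)) is false, so a return value of 1 ends up being dereferenced as a tiny bogus address, which matches the low fault address in the log above.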
Since other LSM modules always return either a negative error code or a valid pointer, this specific issue only exists in BPF LSM. The proposed solution is to reject LSM BPF progs returning unexpected values in the verifier. This patch set adds a return value check to ensure only BPF progs returning expected values are accepted.
Since each LSM hook has different expected return values, we need to know the expected return values for each individual hook to do the check. Earlier versions of the patch set used LSM hook annotations to specify the return value range for each hook. Based on Paul's suggestion, the current version gets rid of such annotations and instead converts hook return values to a common pattern: return 0 on success and a negative error code on failure.
Basically, LSM hooks are divided into two types: hooks that return either a negative error code or zero or other values, and hooks that never return a negative error code. This patch set converts all hooks of the first type and part of the second type to return 0 on success and a negative error code on failure (see patches 1-10). For certain hooks, like ismaclabel and inode_xattr_skipcap, the hook name already implies that returning 0 or 1 is the best choice, so they are not converted. There are four unconverted hooks. Except for ismaclabel, which is not used by BPF LSM, the other three are specified with a BTF ID list to only return 0 or 1.
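The resulting check can be pictured roughly as below. This is only a sketch of the idea, not the verifier code from the patch set: struct retval_range, expected_range() and prog_accepted() are hypothetical names, and the lower bound of -MAX_ERRNO for "negative error code" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: smallest error code a hook may return is -MAX_ERRNO. */
#define MAX_ERRNO 4095

/* Inclusive range of return values. */
struct retval_range {
	int minval;
	int maxval;
};

/*
 * Hypothetical: range accepted by a hook. Converted hooks accept 0 or
 * a negative error code; the unconverted hooks accept only 0 or 1.
 */
static inline struct retval_range expected_range(bool returns_bool)
{
	struct retval_range r;

	if (returns_bool) {
		r.minval = 0;
		r.maxval = 1;
	} else {
		r.minval = -MAX_ERRNO;
		r.maxval = 0;
	}
	return r;
}

/*
 * Hypothetical: accept a prog only if every value it may return,
 * [prog_min, prog_max], lies inside the hook's expected range.
 */
static inline bool prog_accepted(int prog_min, int prog_max, bool returns_bool)
{
	struct retval_range r = expected_range(returns_bool);

	return prog_min >= r.minval && prog_max <= r.maxval;
}
```

Under these assumptions, a prog that may return 1 on a converted hook such as file_alloc_security is rejected (prog_accepted(1, 1, false) is false), while the same prog on a 0-or-1 hook is accepted.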
Thank you for following up on your initial work with this patchset, Xu Kuohai. It doesn't look like I'm going to be able to finish my review by the end of the day today, so expect that a bit later, but so far I think most of the changes look good and provide a nice improvement :)
You should have my feedback now, let me know if you have any questions.
One additional comment I might make is that you may want to wait until after v6.11-rc1 is released and I've had a chance to rebase the lsm/{dev,next} branches and merge the patchsets which are currently queued; there are a few patches queued up which will have an impact on this work. While it's an unstable branch, you can take a peek at those queued patches in the lsm/dev-staging branch.
https://github.com/LinuxSecurityModule/kernel/blob/main/README.md