On Fri, 22 Feb 2019 09:43:14 -0800 Linus Torvalds torvalds@linux-foundation.org wrote:
On Fri, Feb 22, 2019 at 12:35 AM Masami Hiramatsu mhiramat@kernel.org wrote:
Or, can we do this?
long __probe_user_read(void *dst, const void *src, size_t size) {
Add a
if (!access_ok(src, size)) ret = -EFAULT; else {
.. do the pagefault_disable() etc .. }
Since kprobes handler runs in IRQ context, we can not use access_ok() in it. (only on x86 + CONFIG_DEBUG_ATOMIC_SLEEP=y)
In arch/x86/include/asm/uaccess.h:
#define access_ok(addr, size) \ ({ \ WARN_ON_IN_IRQ(); \ likely(!__range_not_ok(addr, size, user_addr_max())); \ })
Do we need acccess_ok_inatomic()?
BTW, it seems a bit strange that this WARN_ON_IN_IRQ() is only in x86 access_ok() implementation, since CONFIG_DEBUG_ATOMIC_SLEEP(which defines WARN_ON_IN_IRQ) doesn't depend on x86, and access_ok() is widely used in kernel. I think it would be better that each arch provides __access_ok() and include/linux/uaccess.h provides access_ok() with WARN_ON_IN_IRQ().
to after the "set_fs()", and it looks good to me. Make it clear that yes, this works _only_ for user reads.
Adn that makes all the games with "kernel_uaccess_faults_ok" pointless, so you can just remove them.
OK.
(note that the "access_ok()" has to come after we've done "set_fs()", because it takes the address limit from that).
Also, since normally we'd expect that we already have USER_DS, it might be worthwhile to do this with a wrapper, something along the lines of
mm_segment_t old_fs = get_fs(); if (segment_eq(old_fs, USER_DS)) return __normal_probe_user_read(); set_fs(USER_DS); ret = __normal_probe_user_read(); set_fs(old_fs); return ret;
and have that __normal_probe_user_read() just do
if (!access_ok(src, size)) return -EFAULT; pagefault_disable(); ret = __copy_from_user_inatomic(dst, ...); pagefault_enable(); return ret ? -EFAULT : 0;
which looks more obvious.
OK.
Also, I would suggest that you just make the argument type be "const void __user *", since the whole point is that this takes a user pointer, and nothing else.
Ah, right.
Then we should still probably fix up "__probe_kernel_read()" to not allow user accesses. The easiest way to do that is actually likely to use the "unsafe_get_user()" functions *without* doing a uaccess_begin(), which will mean that modern CPU's will simply fault on a kernel access to user space.
Or, use __chk_user_ptr(ptr) to check it?
Thank you,
The nice thing about that is that usually developers will have access to exactly those modern boxes, so the people who notice that it doesn't work are the right people.
Alternatively, we should just make it be architecture-specific, so that architectures can decide "this address cannot be a kernel address" and refuse to do it.
Linus