Hi Linus,
On Fri, Sep 07, 2018 at 09:30:35AM -0700, Linus Torvalds wrote:
On Fri, Sep 7, 2018 at 8:26 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
So it's not about casting to another pointer; it's rather about no longer using the value as a user pointer but as an actual (untyped, untagged) virtual address.
[...]
I actually originally wanted to have sparse not just check types, but actually do transformations too, in order to check more.
[...]
But it sounds like this is exactly what you guys would want for the tagged pointers. Some functions can take a "wild" pointer, because they deal with the tag part natively. And others need to be "checked" and have gone through the cleaning and verification.
But sparse is sadly not the right tool for this, and having a single "__user" address space is not sufficient. I guess for the arm64 case, you really could make up a *new* address space: "__user_untagged", and then have functions that convert from "void __user *" to "void __user_untagged *", and then mark the functions that need the tag removed as taking that new kind of user pointer.
Fortunately, most (all) functions taking a __user pointer can cope with tagged pointers since they never dereference the pointer directly but pass it through uaccess functions (which can access tagged pointers without untagging). The problem appears when the pointer is no longer used for access but converted to a long for other uses like rbtree look-up, so not actually dereferenced. Such conversion, in a few cases, needs to lose the tag.
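To illustrate the look-up case, here is a minimal user-space sketch (not kernel code). It assumes the tag sits in the top byte of the address, and it uses a binary search over a sorted array as a stand-in for the rbtree look-up. The names set_tag(), strip_tag() and lookup() are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the tag occupies the top byte (bits 63:56) of the address. */
#define TAG_SHIFT	56
#define TAG_MASK	(UINT64_C(0xff) << TAG_SHIFT)

/* Hypothetical helper: put a tag in the top byte of a user address. */
static inline uint64_t set_tag(uint64_t addr, uint8_t tag)
{
	return (addr & ~TAG_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

/* Hypothetical helper: clear the top byte (only valid for user addresses). */
static inline uint64_t strip_tag(uint64_t addr)
{
	return addr & ~TAG_MASK;
}

/*
 * Binary search over sorted, untagged keys, standing in for an rbtree
 * look-up. Returns the index of the matching key, or -1 if none matches.
 */
static inline int lookup(const uint64_t *keys, size_t n, uint64_t addr)
{
	size_t lo = 0, hi = n;

	assert(keys != NULL || n == 0);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (keys[mid] == addr)
			return (int)mid;
		if (keys[mid] < addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -1;
}
```

With keys stored untagged, a look-up using a tagged value compares greater than every key and misses, while the same value passed through strip_tag() finds its entry. The access itself never needed the tag removed; only the comparison did.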
Of course, there are lots of void __user * conversions to long where removing the tag is not always the right thing or required (hence the __force annotations in this patchset).
As Luc mentioned in this thread, we can consider that __user pointers are always tagged. What I think we'd need is a few annotations where ulong must be an __untagged address (and I guess in smaller numbers than the __force ones proposed here). For example, we can allow get_user_pages() to get an (ulong)(void __user *) conversion but find_vma() would only take an (unsigned long __untagged) argument. Such attribute conversion would be handled by an untagged_addr() macro. So we move the detection problem from the pointer to ulong conversion to the ulong (tagged by default) to ulong __untagged conversion (I'm not sure sparse can do this).
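Roughly what I have in mind, as a user-space sketch only. The __untagged attribute is hypothetical (sparse has nothing like it today), so here it expands to nothing and merely documents intent. The untagged_addr() below is a simplified stand-in that clears the top byte, and range_contains() and page_base() are made-up stand-ins for a find_vma()-like and a get_user_pages()-like function respectively. The 4K page size is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical annotation: not provided by sparse. For a regular
 * compiler it expands to nothing.
 */
#define __untagged

/* Assumption: the tag occupies the top byte (bits 63:56). */
#define TAG_MASK	(UINT64_C(0xff) << 56)

/*
 * Simplified stand-in for untagged_addr(): clears the top byte, which is
 * only correct for user addresses. This is the one place where a tagged
 * ulong becomes an __untagged one.
 */
static inline uint64_t __untagged untagged_addr(uint64_t addr)
{
	return addr & ~TAG_MASK;
}

struct range {
	uint64_t start;
	uint64_t end;	/* exclusive */
};

/*
 * Stand-in for a find_vma()-like look-up: it compares addresses, so it
 * only accepts an __untagged value. The assert is a run-time substitute
 * for the static check a tool would ideally perform.
 */
static inline int range_contains(const struct range *r, uint64_t __untagged addr)
{
	assert((addr & TAG_MASK) == 0);
	return addr >= r->start && addr < r->end;
}

/*
 * Stand-in for a get_user_pages()-like function: it accepts a plain
 * (tagged by default) ulong and untags internally where it needs to.
 */
static inline uint64_t page_base(uint64_t addr)
{
	return untagged_addr(addr) & ~UINT64_C(0xfff);
}
```

A caller holding a tagged value would write range_contains(&r, untagged_addr(addr)), and the checker would flag a call that passes addr directly. Whether sparse can track an attribute on an integer type this way is exactly the open question.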
That's slightly different from trying to identify all the __user pointer to long conversions but, as you said, that's probably not a complete solution anyway and it comes with lots of __force annotations throughout the kernel.