Hi Andrey,
On Wed, Dec 12, 2018 at 03:23:25PM +0100, Andrey Konovalov wrote:
On Mon, Dec 10, 2018 at 3:31 PM Vincenzo Frascino vincenzo.frascino@arm.com wrote:
On arm64 the TCR_EL1.TBI0 bit has been set since Linux 3.x hence the userspace (EL0) is allowed to set a non-zero value in the top byte but the resulting pointers are not allowed at the user-kernel syscall ABI boundary.
This patchset proposes a relaxation of the ABI and a mechanism to advertise it to the userspace via an AT_FLAGS.
The rationale behind the choice of AT_FLAGS is that the Unix System V ABI defines AT_FLAGS as "flags", leaving some degree of freedom in interpretation. There are two previous attempts of using AT_FLAGS in the Linux Kernel for different reasons: the first was more generic and was used to expose the support for the GNU STACK NX feature [1] and the second was done for the MIPS architecture and was used to expose the support of "MIPS ABI Extension for IEEE Std 754 Non-Compliant Interlinking" [2]. Both the changes are currently _not_ merged in mainline. The only architecture that reserves some of the bits in AT_FLAGS is currently MIPS, which introduced the concept of platform specific ABI (psABI) reserving the top-byte [3].
When ARM64_AT_FLAGS_SYSCALL_TBI is set the kernel is advertising to the userspace that a relaxed ABI is supported hence this type of pointers are now allowed to be passed to the syscalls when they are in memory ranges obtained by anonymous mmap() or brk().
The userspace _must_ verify that the flag is set before passing tagged pointers to the syscalls allowed by this relaxation.
More in general, exposing the ARM64_AT_FLAGS_SYSCALL_TBI flag and mandating to the software to check that the feature is present, before using the associated functionality, it provides a degree of control on the decision of disabling such a feature in future without consequently breaking the userspace.
[...]
Acked-by: Andrey Konovalov andreyknvl@google.com
Thanks for the ack. However, if we go ahead with this ABI proposal it means that your patches need to be reworked to allow a non-zero top byte in all syscalls, including mmap() and friends, ioctl(). There are ABI concerns in either case but I'd rather have this discussion in the open. It doesn't necessarily mean that I endorse this proposal, I would like feedback and not just from kernel developers but user space ones.
The summary of our internal discussions (mostly between kernel developers) is that we can't properly describe a user ABI that covers future syscalls or syscall extensions while not all syscalls accept tagged pointers. So we tweaked the requirements slightly to only allow tagged pointers back into the kernel *if* the originating address is from an anonymous mmap() or below sbrk(0). This should cover some of the ioctls or getsockopt(TCP_ZEROCOPY_RECEIVE) where the user passes a pointer to a buffer obtained via mmap() on the device operations.
(sorry for not being clear on what Vincenzo's proposal implies)