On Wed, Aug 26, 2015 at 12:30:51AM +0100, Russell King - ARM Linux wrote:
From what I can tell from the oops dumps you have, the failure is in padzero calling clear_user, which then calls __clear_user_std. That path already changes the domain register to make userspace accessible; we can see that in the register values in the oops dump, where r4 is 0x15 and r3 is 0x55. This follows the assembly I have here, where r3 will be written to the DACR, followed by an ISB, then a call to __clear_user_std before restoring it to the state saved in r4.
I wonder whether we need something stronger there, maybe a DSB to prevent the accesses trying to occur before the DACR update. I'll need to read the ARM or talk to Will about that.
I finally found the cause of it - it comes down to the crappy idea that assembler macros which depend on a config option should be done like this:
#ifdef CONFIG_symbol .macro foo ... .endm .macro bar ... .endm #else .macro foo ... .endm .macro bar ... .endm #endif
On the face of it, it looks like a good idea, but it _really_ isn't, especially when we end up with it getting to:
#ifdef CONFIG_symbol .macro foo ... .endm .macro bar ... .endm .macro baz ... .endm #else .macro foo ... .endm .macro bar ... .endm #ifdef CONFIG_symbol2 .macro baz ... .endm #else .macro baz ... .endm #endif #endif
which is exactly what we have for the svc_exit, svc_exit_with_fiq and restore_user macros in entry-header.S.
It's no wonder, therefore, that the Thumb2 SVC exit paths totally missed out on being updated with the DACR restore. So, what was happening in these failure cases was:
- CPU sets DACR to allow userspace access in padzero - CPU drops into arm_clear_user - CPU writes to userspace, but page is not present. - CPU triggers data abort, saves state (including DACR), disables user access, and processes the exception - CPU returns from data abort processing, but _omits_ to restore the DACR. - CPU retries the write to userspace, and hits a domain fault because the DACR is now set to deny access to userspace.
To prevent this kind of oversight in the future, the only sane way to handle this is to ensure locality of definition. The best way to ensure that is not by the above crappy structure, but by:
.macro foo ... common bits ... #ifdef CONFIG_symbol ... #else ... #endif .endm .macro bar #ifdef CONFIG_symbol ... #else ... #endif .endm .macro baz #ifdef CONFIG_symbol ... #elif defined(CONFIG_symbol2) ... #else ... #endif .endm
which (a) has the effect of keeping multiple definitions of the same macro in close proximity, and (b) allows common code (such as in the case of svc_exit) to be kept in exactly one place and not spread across multiple definitions - which is exactly what leads to stuff being missed when it comes to update it.
Now, if I can stop silly git commit from deleting the above preprocessor lines in my commit message which illustrate the problem, I'll push out the updates. (# is such a wonderful comment prefix, except when you want to quote some C preprocessor in the commit message.)