On 1/19/2018 9:06 AM, Paul Moore wrote:
On Fri, Jan 19, 2018 at 10:49 AM, Mark Salyzyn salyzyn@android.com wrote:
On 01/18/2018 02:36 PM, Paul Moore wrote:
On Thu, Jan 18, 2018 at 4:58 PM, Mark Salyzyn salyzyn@android.com wrote:
general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 14233 Comm: syz-executor2 Not tainted 4.4.112-g5f6325b #28 . . . [<ffffffff81b6a19d>] selinux_socket_setsockopt+0x4d/0x80 security/selinux/hooks.c:4338 [<ffffffff81b4873d>] security_socket_setsockopt+0x7d/0xb0 security/security.c:1257 [<ffffffff82df1ac8>] SYSC_setsockopt net/socket.c:1757 [inline] [<ffffffff82df1ac8>] SyS_setsockopt+0xe8/0x250 net/socket.c:1746 [<ffffffff83776499>] entry_SYSCALL_64_fastpath+0x16/0x92 Code: c2 42 9b b6 81 be 01 00 00 00 48 c7 c7 a0 cb 2b 84 e8 f7 2f 6d ff 49 8d 7d 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 83 01 00 00 41 8b 75 10 31 RIP [<ffffffff81b69b7e>] sock_has_perm+0x1fe/0x3e0 security/selinux/hooks.c:4069 RSP <ffff8800b5957ce0> ---[ end trace 7b5aaf788fef6174 ]---
In the absence of commit a4298e4522d6 ("net: add SOCK_RCU_FREE socket flag") and all the associated infrastructure changes to take advantage of a RCU grace period before freeing, there is a heightened possibility that a security check is performed while an ill-timed setsockopt call races in from user space. It then is prudent to null check sk_security, and if the case, reject the permissions.
This adjustment is orthogonal to infrastructure improvements that may nullify the needed check, but should be added as good code hygiene.
I'm skeptical that this is the full solution for systems that lack the SOCK_RCU_FREE protection. Is this really limited to just setsockopt()?
Maybe overstepped in my analysis and assumptions?
This is a result of a fuzzer hitting an android 4.4 KASAN kernel. We (so far) have _not_ seen this with an android 4.9 KASAN kernel (which has the SOCK_RCU_FREE adjustments). There is no standalone duplication or PoC _except_ via the fuzzer. The rest of the statements stands based on this tidbit (statements of general good code hygiene, not 100% sure SOCK_RCU_FREE usage is completely covered, KISS solution etc).
To be honest, yes, this may be a layer in the onion (swat this NULL check does not by itself solve the _problem_), I'd prefer kernel continuing on in a rational manner rather than panic ... and I have a gut feeling this could be a gratuitous NULL check if all the bugs in the network layer have been solved <that may be sarcasm, I can not tell>. Programming to solve a problem with one's gut is not a good practice, but hygiene is. This is 10 characters, and an estimated 1.2ns of added hygiene.
No, I do not think this is limited to setsockopt, but would be willing to believe a multithreaded attack of any socket functions or ioctl would drop down to the check with sock_has_perm at possibly the wrong time in socket teardown.
I'm not necessarily opposed to adding additional safety checks, if warranted, but I am opposed to adding a single check and declaring mission accomplished when there is a suspicion that additional checks may be needed.
Perhaps in this particular case it really is only setsockopt(), but from what I can tell from your comments and the SOCK_RCU_FREE commit message it would appear that there is a race condition here between a socket's lifetime and its visibility to userspace. Assuming this is the core problem you are trying to workaround with this patch, I suspect that more than just the SELinux/LSM hook for setsockopt() is affected; before merging this patch I would like to see a better investigation into all the socket related SELinux/LSM hooks to see if they suffer from the same problem.
There have been a few cases where Smack has been affected in similar ways outside of setsockopt(). If there's a problem here there's likely to be a problem in Smack that presents itself in a very different way.
For example, if we stick with the setsockopt() syscall and the SELinux/LSM hook we can see that there are two functions which access the socket struct: sock_has_perm() and selinux_netlbl_socket_setsockopt(). While you did add the safety check to sock_has_perm() you neglected to add a check to selinux_netlbl_socket_setsockopt(), I'm guessing because the Android kernels probably do not enable CONFIG_NETLABEL. Looking beyond setsockopt() into other socket related syscalls I see a number of hooks which should have similar protections.
When you see problems like this, please do the investigation to make sure you are fixing everything like it and not just the one instance that blew up.