On 8/9/21 1:16 PM, Al Viro wrote:
On Mon, Aug 09, 2021 at 08:04:40PM +0000, Al Viro wrote:
On Mon, Aug 09, 2021 at 12:40:03PM -0700, Shoaib Rao wrote:
Page faults occur all the time; the page may not even be in the cache, or the mapping may not be there (mmap), so I would not consider this a bug. The code should complain about all other calls, as they are also copying to user pages. I must not be following some semantics for the code to be triggered, but I cannot figure that out. What is the recommended interface for copying to user space from the kernel?
What are you talking about? Yes, page faults happen. No, they must not be triggered in contexts where you cannot afford to go to sleep. In particular, you can't do that while holding a spinlock.
There are things that can't be done under a spinlock. If your commit is attempting that, it's simply broken.
... in particular, this
+#if IS_ENABLED(CONFIG_AF_UNIX_OOB)
+	mutex_lock(&u->iolock);
+	unix_state_lock(sk);
+	err = unix_stream_recv_urg(state);
+	unix_state_unlock(sk);
+	mutex_unlock(&u->iolock);
+#endif
is 100% broken, since you *are* attempting to copy data to userland between spin_lock(&unix_sk(s)->lock) and spin_unlock(&unix_sk(s)->lock).
You can't do blocking operations under a spinlock. And copyout is inherently a blocking operation - it can require any kind of IO to complete. If you have the destination (very much valid - no bad addresses there) in the middle of a page mmapped from a file and currently not paged in, you *must* read the current contents of the page, at least into the parts of the page that are not going to be overwritten by your copyout. No way around that. And that can involve any kind of delays and any amount of disk/network/whatnot traffic.
You fundamentally cannot do that kind of thing without giving the CPU up. And under a spinlock you are not allowed to do that.
In the current form that commit is obviously broken.
I am quite aware of spinlocks and mutexes and all the other kernel structures etc... As I said, the fact that Linux uses locks* for spinlocks and mutexes is confusing unless you look at the details of the lock. I will fix the issue; it is a simple fix: copy the byte to a kernel variable, release the lock, then copy the byte to userland.
Shoaib
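
For readers following the thread, the fix described above (take a private copy under the lock, drop the lock, then copy out) can be sketched in userspace C. This is only an illustrative model, not the actual patch: struct oob_sock, oob_lock(), oob_unlock(), fake_copy_to_user() and oob_recv_urg() are hypothetical names invented here, the toy atomic_flag spinlock merely stands in for unix_state_lock(), and fake_copy_to_user() stands in for copy_to_user(), which in the kernel may fault and sleep.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a socket holding one pending OOB byte. */
struct oob_sock {
	atomic_flag lock;	/* toy spinlock; models unix_state_lock() */
	bool has_oob;		/* is an OOB byte pending? */
	char oob_byte;		/* the pending byte, protected by lock */
};

static void oob_lock(struct oob_sock *s)
{
	/* Busy-wait until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
		;
}

static void oob_unlock(struct oob_sock *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/*
 * Stand-in for copy_to_user(). The real call may page-fault and sleep,
 * which is why it must never run with a spinlock held.
 * Returns the number of bytes NOT copied (always 0 in this model).
 */
static size_t fake_copy_to_user(char *dst, const char *src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = src[i];
	return 0;
}

/* Returns 1 on success, -EINVAL if no byte is pending, -EFAULT on copy failure. */
static int oob_recv_urg(struct oob_sock *s, char *user_dst)
{
	char tmp;

	assert(s != NULL && user_dst != NULL);

	oob_lock(s);
	if (!s->has_oob) {
		oob_unlock(s);
		return -EINVAL;
	}
	tmp = s->oob_byte;	/* private copy taken under the lock */
	s->has_oob = false;	/* consume the byte while still locked */
	oob_unlock(s);

	/* The potentially blocking copy happens only after the lock is dropped. */
	if (fake_copy_to_user(user_dst, &tmp, 1) != 0)
		return -EFAULT;
	return 1;
}
```

Note one consequence of this ordering in the sketch: the byte is consumed before the copy-out, so if the copy fails the byte is lost. Whether that is acceptable, or whether the byte should be put back, is a design decision the real fix would have to make.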