Hi,
I'm working on Android Linux kernel version 3.0.15 and am seeing a deadlock in the ashmem driver while it handles an mmap request. I'd appreciate your help in finding the correct fix. The locks involved in the deadlock are:
- mm->mmap_sem
- ashmem_mutex
The following sequence of events leads to the deadlock. Threads A and B belong to the same process (system_server) and therefore share the same mm struct.
A1) In A's context, an mmap system call is made with an ashmem fd.
A2) The system call sys_mmap_pgoff acquires mmap_sem of the mm and sleeps before calling ashmem's .mmap handler, i.e. before calling ashmem_mmap.
Now thread B runs and does the following:
B1) In B's context, the ashmem ioctl is called with ASHMEM_SET_NAME.
B2) The code acquires ashmem_mutex and performs a copy_from_user.
B3) copy_from_user triggers a legitimate data abort while copying the data from user space, which is handled in the normal way: do_DataAbort --> do_page_fault.
B4) do_page_fault finds that mm->mmap_sem is unavailable, because A holds it (A and B share the mm), so B sleeps.
Now thread A runs again:
A3) It calls ashmem_mmap, which tries to acquire ashmem_mutex. The mutex is held by B, so A sleeps.
Now A holds mmap_sem and waits for B to release ashmem_mutex, while B holds ashmem_mutex and waits for mmap_sem, which is held by A.
This creates a deadlock in the system. I'm not sure how to use these locks so as to prevent this scenario. Any suggestions would be of great help.
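To make the lock-order cycle concrete, here is a userspace sketch, not driver code, that models the A/B sequence above under stated assumptions: a pthread rwlock stands in for mm->mmap_sem (A takes it for write as sys_mmap_pgoff does, B's fault path wants it for read) and a pthread mutex stands in for ashmem_mutex. Barriers force the A2/B2 interleaving, and timed lock calls replace the unbounded sleeps so the program terminates. The functions run_abba_demo, thread_a and thread_b are hypothetical names for illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Userspace stand-ins (assumption): rwlock ~ mm->mmap_sem, mutex ~ ashmem_mutex. */
static pthread_rwlock_t mmap_sem = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t ashmem_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t both_hold_first, both_tried_second;
static int a_result, b_result;

/* Absolute CLOCK_REALTIME deadline `ms` milliseconds from now. */
static struct timespec deadline_ms(long ms)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec++;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

/* Thread A (mmap path): holds mmap_sem for write, then wants ashmem_mutex. */
static void *thread_a(void *arg)
{
    (void)arg;
    pthread_rwlock_wrlock(&mmap_sem);                        /* A2 */
    pthread_barrier_wait(&both_hold_first);
    struct timespec ts = deadline_ms(200);
    a_result = pthread_mutex_timedlock(&ashmem_mutex, &ts);  /* A3 */
    pthread_barrier_wait(&both_tried_second);  /* keep holding until B has tried too */
    if (a_result == 0)
        pthread_mutex_unlock(&ashmem_mutex);
    pthread_rwlock_unlock(&mmap_sem);
    return NULL;
}

/* Thread B (ASHMEM_SET_NAME path): holds ashmem_mutex, then the fault wants mmap_sem. */
static void *thread_b(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&ashmem_mutex);                       /* B2 */
    pthread_barrier_wait(&both_hold_first);
    struct timespec ts = deadline_ms(200);
    b_result = pthread_rwlock_timedrdlock(&mmap_sem, &ts);   /* B3/B4 */
    pthread_barrier_wait(&both_tried_second);  /* keep holding until A has tried too */
    if (b_result == 0)
        pthread_rwlock_unlock(&mmap_sem);
    pthread_mutex_unlock(&ashmem_mutex);
    return NULL;
}

/* Returns 1 if each thread timed out waiting for the lock the other held. */
int run_abba_demo(void)
{
    pthread_t a, b;
    pthread_barrier_init(&both_hold_first, NULL, 2);
    pthread_barrier_init(&both_tried_second, NULL, 2);
    pthread_create(&a, NULL, thread_a, NULL);
    pthread_create(&b, NULL, thread_b, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_barrier_destroy(&both_hold_first);
    pthread_barrier_destroy(&both_tried_second);
    return a_result == ETIMEDOUT && b_result == ETIMEDOUT;
}
```

Calling run_abba_demo() (built with -pthread) should return 1 after roughly 200 ms: with untimed locks, as in the driver, neither thread would ever wake up.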
Workaround: One possible workaround is to replace the mutex_lock call in ashmem_mmap with mutex_trylock and, if it fails, wait a few milliseconds and retry, giving up after a few iterations. This would bring the system out of the deadlock if this scenario occurs. I feel this suggestion is not clean, but I'm unable to think of anything better. Is there any suggestion for avoiding this scenario?
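The proposed trylock-and-retry workaround can be sketched in userspace as follows. This is an illustration under assumptions: pthread_mutex_trylock and nanosleep stand in for the kernel's mutex_trylock and a millisecond sleep, and lock_with_retry, attempts and delay_ms are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/*
 * Try to take `m` up to `attempts` times, sleeping `delay_ms` between
 * tries. Returns 0 once the lock is held, or EBUSY after giving up.
 * Any trylock failure is treated as "busy".
 */
int lock_with_retry(pthread_mutex_t *m, int attempts, long delay_ms)
{
    struct timespec delay = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };

    for (int i = 0; i < attempts; i++) {
        if (pthread_mutex_trylock(m) == 0)
            return 0;                   /* acquired */
        if (i + 1 < attempts)
            nanosleep(&delay, NULL);    /* back off before the next try */
    }
    return EBUSY;                       /* caller must fail the request */
}
```

Note that the kernel's mutex_trylock uses the opposite convention (it returns 1 on success and 0 on contention). The sketch also shows why the approach feels unclean: when it gives up, the mmap call has to fail with an error, and it cannot tell a real deadlock from ordinary contention that simply lasted longer than the retry budget.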
Warm Regards, Shankar
Isn't it better to drop ashmem_mutex before taking mmap_sem? The V4L2 code has an example of this: the soc-camera driver lock is dropped before acquiring the mmap_sem semaphore and is then re-acquired. See vb2_qbuf() in videobuf2-core.c.
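One way this idea might apply to ASHMEM_SET_NAME is to avoid holding the driver lock across the user copy at all: copy the name into a local buffer first (where a fault may take mmap_sem freely), then take the lock only to store the result. The userspace sketch below assumes that shape. NAME_LEN, struct area, area_mutex, copy_name_from_user and set_name are hypothetical stand-ins for the driver's ASHMEM_NAME_LEN, ashmem area, ashmem_mutex and copy_from_user path, not the actual driver code.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <string.h>

#define NAME_LEN 256   /* hypothetical stand-in for ASHMEM_NAME_LEN */

/* Hypothetical stand-in for ashmem_mutex. */
static pthread_mutex_t area_mutex = PTHREAD_MUTEX_INITIALIZER;

struct area {
    char name[NAME_LEN];
};

/*
 * Hypothetical stand-in for the user copy. In the kernel this copy may
 * fault and take mm->mmap_sem, so it must not run with area_mutex held.
 * Copies at most len - 1 bytes and always NUL-terminates.
 */
static int copy_name_from_user(char *dst, const char *user_src, size_t len)
{
    if (user_src == NULL)
        return -1;      /* a kernel version would report -EFAULT */
    strncpy(dst, user_src, len - 1);
    dst[len - 1] = '\0';
    return 0;
}

int set_name(struct area *a, const char *user_name)
{
    char local[NAME_LEN];

    /* 1. Copy (and possibly fault) with no driver lock held. */
    if (copy_name_from_user(local, user_name, sizeof local) != 0)
        return -1;

    /* 2. Take the lock only to publish the already-copied name. */
    pthread_mutex_lock(&area_mutex);
    memcpy(a->name, local, sizeof a->name);
    pthread_mutex_unlock(&area_mutex);
    return 0;
}
```

Because set_name never holds area_mutex while the copy can fault, the only remaining order is mmap_sem then ashmem_mutex (the mmap path), so the A/B cycle cannot form. The drop-and-reacquire pattern in vb2_qbuf() enforces the same ordering when the lock cannot be avoided entirely.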
- Nishanth Peethambaran +91-9448074166
On Thu, Feb 7, 2013 at 9:41 PM, Shankar Brahadeeswaran shankoo77@gmail.com wrote:
Linaro-mm-sig mailing list Linaro-mm-sig@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-mm-sig