On Sun, Aug 28, 2016 at 09:33:54PM +0100, Chris Wilson wrote:
On Sun, Aug 28, 2016 at 05:37:47PM +0100, Chris Wilson wrote:
> Currently we install a callback for performing poll on a dma-buf,
> irrespective of the timeout. This involves taking a spinlock, as well as
> unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
> multiple threads.
>
> We can query whether the poll will block prior to installing the
> callback to make the busy-query fast.
>
> Single thread: 60% faster
> 8 threads on 4 (+4 HT) cores: 600% faster
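
For readers unfamiliar with the busy-query being optimised: from userspace it is just poll() with a zero timeout. A minimal sketch follows, assuming nothing about the kernel side. The helper name fd_ready_now() is hypothetical, and any pollable fd can stand in for a dma-buf fd (a real one needs a driver that exports it).

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask whether `fd` is ready for `events` right now.
 * A timeout of 0 makes poll() return immediately instead of sleeping,
 * which is the poll(.timeout=0) busy-query the patch speeds up.
 *
 * Returns 1 if ready, 0 if a blocking poll would have to wait,
 * -1 if poll() itself failed.
 */
static int fd_ready_now(int fd, short events)
{
    struct pollfd pfd = { .fd = fd, .events = events };
    int ret;

    assert(fd >= 0); /* poll() silently ignores negative fds */

    ret = poll(&pfd, 1, 0);
    if (ret < 0)
        return -1;

    return (ret > 0 && (pfd.revents & events)) ? 1 : 0;
}
```

With a pipe as the stand-in, the read end reports 0 while empty and 1 once a byte has been written; no thread ever sleeps in either case.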
Hmm, this only really applies to the idle case. reservation_object_test_signaled_rcu() is still a major bottleneck when busy, due to the dance inside reservation_object_test_signaled_single().
The fix is not difficult, just requires extending the seqlock to catch the RCU race (i.e. earlier patches). I'll resend that series in the morning.
-Chris
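
As background on the seqlock idea mentioned above, here is a rough userspace sketch of the generic read-and-retry pattern using C11 atomics. It is not the kernel's seqcount API and not the patch series itself; struct fence_state and every function name here are hypothetical, and a single writer is assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for state guarded by a sequence count. */
struct fence_state {
    atomic_uint seq;      /* even: stable; odd: a writer is mid-update */
    atomic_int  signaled; /* the value readers want a consistent view of */
};

/* Wait until no writer is active, then return the sequence number seen. */
static unsigned read_begin(struct fence_state *s)
{
    unsigned seq;

    while ((seq = atomic_load_explicit(&s->seq, memory_order_acquire)) & 1)
        ; /* writer in progress, spin */
    return seq;
}

/* True if a writer ran since read_begin(), so the read must be redone. */
static bool read_retry(struct fence_state *s, unsigned start)
{
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&s->seq, memory_order_relaxed) != start;
}

/* Single writer assumed: bump to odd, update, bump back to even. */
static void write_signaled(struct fence_state *s, int value)
{
    unsigned seq = atomic_fetch_add_explicit(&s->seq, 1, memory_order_relaxed);

    assert((seq & 1) == 0);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->signaled, value, memory_order_relaxed);
    atomic_fetch_add_explicit(&s->seq, 1, memory_order_release);
}

/* Lock-free read: repeat until no writer interfered with the snapshot. */
static int test_signaled(struct fence_state *s)
{
    unsigned start;
    int value;

    do {
        start = read_begin(s);
        value = atomic_load_explicit(&s->signaled, memory_order_relaxed);
    } while (read_retry(s, start));
    return value;
}
```

The point of the pattern is that readers take no lock: they only detect that an update happened underneath them and go around again.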