Hi Greg.
On Fri, 3 Apr 2020 at 23:30, Giuliano Procida <gprocida@google.com> wrote:
Hi Greg.
I also have 4.14 and 4.9, I'll send them on for comparison.
I've done this.
I will try 4.4 but, as one call site doesn't exist and the other didn't have any locking to start with, I'd like to try to reproduce the issue first.
I have failed to build a bootable 4.4 kernel, which is both surprising and embarrassing: even after working around various known issues, my current toolchain produces kernels that either panic or triple-fault (apparently, as there's no log output, just a reboot) on my amd64 hardware. Running an old live distribution with a 4.4 kernel, I wasn't able to reproduce the issue that these fixes apparently resolve, even after several hours of running.
I've also spent most of 2 days looking at unfamiliar code.
The code in 4.4 uses a timer instead of a workqueue for timeout callbacks. The callbacks also have blk_queue_enter/exit protection in 4.9 but not in 4.4. I'm guessing, but don't know, that the execution contexts of timers and workqueues are sufficiently similar that this protection should be back-ported to 4.4. This is relatively simple: it needs bits of a couple of extra commits.
f5bbbbe4d635 adds to blk_mq_queue_tag_busy_iter an RCU-protected test to see if the blk_queue is held before doing any work. It also adds RCU synchronisation to code that manipulates the number of hardware queues. The follow-up 530ca2c9bd more sensibly just (possibly recursively) does try-to-enter/exit instead. 4.4 doesn't have code that manipulates the number of hardware queues. However, the blk_mq_queue_tag_busy_iter locking may be enough to prevent ioctl/procfs concurrency.
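For illustration only, the try-to-enter/exit idea can be modelled in userspace roughly as below. This is a sketch under stated assumptions, not the kernel code: struct toy_queue, toy_queue_try_enter, toy_queue_exit and toy_busy_iter are hypothetical names, and a plain atomic counter plus a "frozen" flag stand in for whatever the real queue usage counter does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a request queue (not the kernel struct). */
struct toy_queue {
    atomic_int usage;    /* number of users currently inside the queue */
    atomic_bool frozen;  /* set while the queue is being reconfigured */
    int tags[3];         /* toy "in-flight requests" */
    size_t nr_tags;
};

static void toy_queue_init(struct toy_queue *q)
{
    atomic_init(&q->usage, 0);
    atomic_init(&q->frozen, false);
    q->tags[0] = 1;
    q->tags[1] = 2;
    q->tags[2] = 3;
    q->nr_tags = 3;
}

/* Take a reference without blocking; back out if the queue is frozen. */
static bool toy_queue_try_enter(struct toy_queue *q)
{
    atomic_fetch_add(&q->usage, 1);
    if (atomic_load(&q->frozen)) {
        atomic_fetch_sub(&q->usage, 1);
        return false;
    }
    return true;
}

/* Drop a reference taken by toy_queue_try_enter(). */
static void toy_queue_exit(struct toy_queue *q)
{
    int prev = atomic_fetch_sub(&q->usage, 1);
    assert(prev > 0);
    (void)prev;
}

typedef void (*toy_iter_fn)(int tag, void *data);

/*
 * Visit every tag, but only while holding a queue reference.
 * Returns the number of tags visited, or 0 if the queue could not be entered.
 * Because the reference is a counter, a caller that already holds one
 * can call this again (the "possibly recursive" case).
 */
static size_t toy_busy_iter(struct toy_queue *q, toy_iter_fn fn, void *data)
{
    size_t i;

    if (!toy_queue_try_enter(q))
        return 0;
    for (i = 0; i < q->nr_tags; i++)
        fn(q->tags[i], data);
    toy_queue_exit(q);
    return q->nr_tags;
}

/* Example callback: accumulate the tag values into an int. */
static void toy_sum(int tag, void *data)
{
    *(int *)data += tag;
}
```

In this model, an iterator that arrives while the queue is frozen does no work at all rather than walking state that may be changing underneath it, which is the property the 4.4 patches would be relying on.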
To this end, I've put together patches for 4.4. They are completely untested. Once I've verified they actually compile I'll send them on.
Giuliano.
I should have some spare time for this soon.
Giuliano.
On Fri, 3 Apr 2020 at 10:20, Greg KH <greg@kroah.com> wrote:
On Wed, Apr 01, 2020 at 05:47:02PM +0000, Giuliano Procida wrote:
This issue was found in 4.14 and is present in earlier kernels.
Please backport
f5bbbbe4d635 blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter
530ca2c9bd69 blk-mq: Allow blocking queue tag iter callbacks
onto the stable branches that don't have these. The second is a fix for the first. Thank you.
4.19.y and later - commits already present
4.14.y - f5bbbbe4d635 doesn't patch cleanly but it's still straightforward, just drop the comment and code mentioning switching to 'none' in the trailing context
4.9.y - ditto
4.4.y - there was a refactoring of the code in commit 0bf6cd5b9531bcc29c0a5e504b6ce2984c6fd8d8 making this non-trivial
3.16.y - ditto
I am happy to try to produce clean patches, but it may be a day or so.
I have done this for 4.14.y and 4.9.y, can you please provide a backport for 4.4.y that I can queue up?
thanks,
greg k-h