Hello,
commit 310ca162d77 "block/loop: Use global lock for ioctl() operation." has been pushed to multiple stable trees. This patch is a part of larger series that overhauls the locking inside loopback device upstream and for 4.4, 4.9, and 4.14 stable trees only this patch from the series is applied. Our testing now has shown [1] that the patch alone makes present deadlocks inside loopback driver more likely (the openqa test in our infrastructure didn't hit the deadlock before whereas with the new kernel it hits it reliably every time). So I would suggest we revert 310ca162d77 from 4.4, 4.9, and 4.14 kernels.
Another option would be to backport other locking fixes for the loop device but honestly I don't think that's a stable material - never heard of real users hitting problems, only syzkaller could, and we are still fixing up some small glitches resulting from that rework...
Honza
[1] https://bugzilla.suse.com/show_bug.cgi?id=1129739
On Wed, Mar 20, 2019 at 01:58:06PM +0100, Jan Kara wrote:
Hello,
commit 310ca162d77 "block/loop: Use global lock for ioctl() operation." has been pushed to multiple stable trees. This patch is a part of larger series that overhauls the locking inside loopback device upstream and for 4.4, 4.9, and 4.14 stable trees only this patch from the series is applied. Our testing now has shown [1] that the patch alone makes present deadlocks inside loopback driver more likely (the openqa test in our infrastructure didn't hit the deadlock before whereas with the new kernel it hits it reliably every time). So I would suggest we revert 310ca162d77 from 4.4, 4.9, and 4.14 kernels.
Ugh, ok.
Another option would be to backport other locking fixes for the loop device but honestly I don't think that's a stable material - never heard of real users hitting problems, only syzkaller could, and we are still fixing up some small glitches resulting from that rework...
I tried to backport a number of the loop fixes, and odds are I am the one that grabbed this. There are other loop locking fixes in the stable releases, I don't know if I can just revert this one, let me check...
And yes, I did get some loop bugreports from the Android kernel team as they are using loop devices in large numbers on the new Android release for something. So I think they have already backported a number of these fixes to their trees, which made me want to push these out to more people.
Also, given that syzbot has a reproducer for 310ca162d77, are we worst off if we revert it?
thanks,
greg k-h
On Wed, Mar 20, 2019 at 04:12:31PM +0100, Greg KH wrote:
On Wed, Mar 20, 2019 at 01:58:06PM +0100, Jan Kara wrote:
Hello,
commit 310ca162d77 "block/loop: Use global lock for ioctl() operation." has been pushed to multiple stable trees. This patch is a part of larger series that overhauls the locking inside loopback device upstream and for 4.4, 4.9, and 4.14 stable trees only this patch from the series is applied. Our testing now has shown [1] that the patch alone makes present deadlocks inside loopback driver more likely (the openqa test in our infrastructure didn't hit the deadlock before whereas with the new kernel it hits it reliably every time). So I would suggest we revert 310ca162d77 from 4.4, 4.9, and 4.14 kernels.
Ugh, ok.
Another option would be to backport other locking fixes for the loop device but honestly I don't think that's a stable material - never heard of real users hitting problems, only syzkaller could, and we are still fixing up some small glitches resulting from that rework...
I tried to backport a number of the loop fixes, and odds are I am the one that grabbed this. There are other loop locking fixes in the stable releases, I don't know if I can just revert this one, let me check...
Nope, does not revert cleanly due to the other loop fixes I added after this one.
So either I back out _all_ of the loop patches, or we leave what we have now, or we add the proper fixes to get things working. I'd like to have this all work properly, so why not just have me backport the needed fixes that are upstream?
thanks,
greg k-h
On Wed 20-03-19 16:16:01, Greg KH wrote:
On Wed, Mar 20, 2019 at 04:12:31PM +0100, Greg KH wrote:
On Wed, Mar 20, 2019 at 01:58:06PM +0100, Jan Kara wrote:
Hello,
commit 310ca162d77 "block/loop: Use global lock for ioctl() operation." has been pushed to multiple stable trees. This patch is a part of larger series that overhauls the locking inside loopback device upstream and for 4.4, 4.9, and 4.14 stable trees only this patch from the series is applied. Our testing now has shown [1] that the patch alone makes present deadlocks inside loopback driver more likely (the openqa test in our infrastructure didn't hit the deadlock before whereas with the new kernel it hits it reliably every time). So I would suggest we revert 310ca162d77 from 4.4, 4.9, and 4.14 kernels.
Ugh, ok.
Another option would be to backport other locking fixes for the loop device but honestly I don't think that's a stable material - never heard of real users hitting problems, only syzkaller could, and we are still fixing up some small glitches resulting from that rework...
I tried to backport a number of the loop fixes, and odds are I am the one that grabbed this. There are other loop locking fixes in the stable releases, I don't know if I can just revert this one, let me check...
Nope, does not revert cleanly due to the other loop fixes I added after this one.
So either I back out _all_ of the loop patches, or we leave what we have now, or we add the proper fixes to get things working. I'd like to have this all work properly, so why not just have me backport the needed fixes that are upstream?
Sure, that is another option. Be sure to include recent fixups:
40853d6fc619 "loop: do not print warn message if partition scan is successful" 758a58d0bc67 "loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()"
otherwise you need commits:
b1ab5fa309e6 "block/loop: Don't grab "struct file" for vfs_getattr() operation." ... 628bd8594709 "loop: Fix double mutex_unlock(&loop_ctl_mutex) in loop_control_ioctl()"
to drivers/block/loop.c. It's just that my experience tells me that when there were 15 original patches in the series and then 3 followup fixups, the series is not usually worth the risk for something nobody hit besides fuzzers...
Honza
On Wed, Mar 20, 2019 at 01:58:06PM +0100, Jan Kara wrote:
Hello,
commit 310ca162d77 "block/loop: Use global lock for ioctl() operation." has been pushed to multiple stable trees. This patch is a part of larger series that overhauls the locking inside loopback device upstream and for 4.4, 4.9, and 4.14 stable trees only this patch from the series is applied. Our testing now has shown [1] that the patch alone makes present deadlocks inside loopback driver more likely (the openqa test in our infrastructure didn't hit the deadlock before whereas with the new kernel it hits it reliably every time). So I would suggest we revert 310ca162d77 from 4.4, 4.9, and 4.14 kernels.
Another option would be to backport other locking fixes for the loop device but honestly I don't think that's a stable material - never heard of real users hitting problems, only syzkaller could, and we are still fixing up some small glitches resulting from that rework...
Honza
Note, I do not have access to that bug report :(
On Wed 20-03-19 16:16:28, Greg KH wrote:
On Wed, Mar 20, 2019 at 01:58:06PM +0100, Jan Kara wrote:
Hello,
commit 310ca162d77 "block/loop: Use global lock for ioctl() operation." has been pushed to multiple stable trees. This patch is a part of larger series that overhauls the locking inside loopback device upstream and for 4.4, 4.9, and 4.14 stable trees only this patch from the series is applied. Our testing now has shown [1] that the patch alone makes present deadlocks inside loopback driver more likely (the openqa test in our infrastructure didn't hit the deadlock before whereas with the new kernel it hits it reliably every time). So I would suggest we revert 310ca162d77 from 4.4, 4.9, and 4.14 kernels.
Another option would be to backport other locking fixes for the loop device but honestly I don't think that's a stable material - never heard of real users hitting problems, only syzkaller could, and we are still fixing up some small glitches resulting from that rework...
Honza
Note, I do not have access to that bug report :(
Sorry, sometimes I hate how paranoid our bugzilla is. I've mirrorred the relevant parts to:
https://bugzilla.kernel.org/show_bug.cgi?id=202985
Honza
Hi Jan, hi Greg,
On Wed, Mar 20, 2019 at 01:58:06PM +0100, Jan Kara wrote:
Hello,
commit 310ca162d77 "block/loop: Use global lock for ioctl() operation." has been pushed to multiple stable trees. This patch is a part of larger series that overhauls the locking inside loopback device upstream and for 4.4, 4.9, and 4.14 stable trees only this patch from the series is applied. Our testing now has shown [1] that the patch alone makes present deadlocks inside loopback driver more likely (the openqa test in our infrastructure didn't hit the deadlock before whereas with the new kernel it hits it reliably every time). So I would suggest we revert 310ca162d77 from 4.4, 4.9, and 4.14 kernels.
A user in Debian reported [1], providing the following testcase which showed up after the recent update to 4.9.168-1 in Debian stretch (based on upstream v4.9.168) as follows:
dd if=/dev/zero of=/tmp/ff1.raw bs=1G seek=8 count=0 sync sleep 1 parted /tmp/ff1.raw mklabel msdos parted -s /tmp/ff1.raw mkpart primary linux-swap 1 100 parted -s -- /tmp/ff1.raw mkpart primary ext2 101 -1 parted -s -- /tmp/ff1.raw set 2 boot on sleep 5 losetup -Pf /tmp/ff1.raw --show
I have verified that the same happens with v4.9.171 where the mentioned commit was not reverted, and bisecting of the testcase showed it was introduced with 3ae3d167f5ec2c7bb5fcd12b7772cfadc93b2305 (v4.9.152~9) (which is the backport of 310ca162d77 for 4.9).
Reverting 3ae3d167f5ec2c7bb5fcd12b7772cfadc93b2305 on top of v4.9.171 worked and fixed the respective issue.
Can this commit in meanwhile be reverted or is there further ongoing work in integrating the followup fixes as mentioned in https://lore.kernel.org/stable/20190321104110.GF29086@quack2.suse.cz/ .
Regards, Salvatore
On Mon, Apr 29, 2019 at 02:05:42PM +0200, Salvatore Bonaccorso wrote:
Hi Jan, hi Greg,
On Wed, Mar 20, 2019 at 01:58:06PM +0100, Jan Kara wrote:
Hello,
commit 310ca162d77 "block/loop: Use global lock for ioctl() operation." has been pushed to multiple stable trees. This patch is a part of larger series that overhauls the locking inside loopback device upstream and for 4.4, 4.9, and 4.14 stable trees only this patch from the series is applied. Our testing now has shown [1] that the patch alone makes present deadlocks inside loopback driver more likely (the openqa test in our infrastructure didn't hit the deadlock before whereas with the new kernel it hits it reliably every time). So I would suggest we revert 310ca162d77 from 4.4, 4.9, and 4.14 kernels.
A user in Debian reported [1], providing the following testcase which showed up after the recent update to 4.9.168-1 in Debian stretch (based on upstream v4.9.168) as follows:
dd if=/dev/zero of=/tmp/ff1.raw bs=1G seek=8 count=0 sync sleep 1 parted /tmp/ff1.raw mklabel msdos parted -s /tmp/ff1.raw mkpart primary linux-swap 1 100 parted -s -- /tmp/ff1.raw mkpart primary ext2 101 -1 parted -s -- /tmp/ff1.raw set 2 boot on sleep 5 losetup -Pf /tmp/ff1.raw --show
I have verified that the same happens with v4.9.171 where the mentioned commit was not reverted, and bisecting of the testcase showed it was introduced with 3ae3d167f5ec2c7bb5fcd12b7772cfadc93b2305 (v4.9.152~9) (which is the backport of 310ca162d77 for 4.9).
Reverting 3ae3d167f5ec2c7bb5fcd12b7772cfadc93b2305 on top of v4.9.171 worked and fixed the respective issue.
Can this commit in meanwhile be reverted or is there further ongoing work in integrating the followup fixes as mentioned in https://lore.kernel.org/stable/20190321104110.GF29086@quack2.suse.cz/ .
Sorry for the delay here. No, I didn't find any time for the followup stuff here, and Jan is right, this should just be dropped.
I've now reverted it from 3.18.y, 4.4.y, 4.9.y, and 4.14.y.
thanks,
greg k-h
linux-stable-mirror@lists.linaro.org