Hi Johannes, hi Greg,
Any tree that back-ported 7e7efdda6adb ("wifi: cfg80211: fix CQM for non-range use") but does not contain 076fc8775daf ("wifi: cfg80211: remove wdev mutex"), which does not apply cleanly to 6.6.y or 6.1.y, will be affected.
You can find a downstream bug report at Arch Linux:
https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/17
So we should either revert 7e7efdda6adb or backport the needed changes to those kernel series. 6.7.y is reported to work with 6.7.0-rc4.
On Mon, Dec 11, 2023 at 04:02:11PM +0700, Philip Müller wrote:
> So we should either revert 7e7efdda6adb or backport the needed changes to those kernel series.
Yeah, this looks bad, I'll go just revert this for now and push out a new release with the fix as lots of people are hitting it.
thanks,
greg k-h
On 11.12.23 16:25, Greg Kroah-Hartman wrote:
> Yeah, this looks bad, I'll go just revert this for now and push out a new release with the fix as lots of people are hitting it.
Hi Greg,
there is actually a fix for it:
https://www.spinics.net/lists/stable/msg703040.html
On Mon, Dec 11, 2023 at 04:26:26PM +0700, Philip Müller wrote:
> there is actually a fix for it:
That "fix" was not cc:ed to any of the wifi developers and would need a lot of review before I feel comfortable accepting it, as I said in the response to that message.
Also, please point to lore.kernel.org lists, it's much easier to handle as we don't have any control over any other archive web site.
thanks,
greg k-h
On Mon, Dec 11, 2023 at 10:39:26AM +0100, Greg Kroah-Hartman wrote:
> Also, please point to lore.kernel.org lists, it's much easier to handle as we don't have any control over any other archive web site.
Also, have you tested that proposed fix?
thanks,
greg k-h
On 11.12.23 16:40, Greg Kroah-Hartman wrote:
> Also, have you tested that proposed fix?
Not yet. I'm currently building kernels on my end to see if it fixes the regression. A revert of the patch is also confirmed to work by users who have the issue. I can check with mine once I've released a kernel with Léo Lam's fix.
On 11.12.23 16:46, Philip Müller wrote:
> Not yet. I'm currently building kernels on my end to see if it fixes the regression.
According to the author of the patch, it was not yet tested:
This is a kernel bug on the 6.6.x stable branch. As people have correctly pointed out, 4a7e92551618 ("wifi: cfg80211: fix CQM for non-range use" backported to 6.6.x) is the culprit as it causes cfg80211_cqm_rssi_update not to release the wdev lock in some cases - which then causes various other things to deadlock.
I have submitted a patch: https://lore.kernel.org/stable/20231210213930.61378-1-leo@leolam.fr/T/
I'm pretty sure it will fix the issue but I haven't tested it.
https://bbs.archlinux.org/viewtopic.php?pid=2136529#p2136529
There is an Arch Kernel with that patch applied for testing: https://bbs.archlinux.org/viewtopic.php?pid=2136533#p2136533
The proper fix seems to be '076fc8775daf wifi: cfg80211: remove wdev mutex' which does not apply cleanly to either 6.6.y or 6.1.y as stated here: https://bbs.archlinux.org/viewtopic.php?pid=2136579#p2136579
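The failure mode described in the forum post quoted above, a function that returns on an error path without releasing a lock it took, can be sketched in user-space C. This is only a minimal illustration with a pthread mutex, not the cfg80211 code: `struct dev`, `rssi_update_buggy`, `rssi_update_fixed` and `lock_is_free` are hypothetical names, and the real function has more than one error path.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a device whose state is guarded by a mutex. */
struct dev {
    pthread_mutex_t lock;
    int connected;
    int threshold;
};

/* Buggy shape: the early return leaves the mutex held. */
int rssi_update_buggy(struct dev *d, int threshold)
{
    pthread_mutex_lock(&d->lock);
    if (!d->connected)
        return -ENOTCONN;           /* BUG: lock is never released */
    d->threshold = threshold;
    pthread_mutex_unlock(&d->lock);
    return 0;
}

/* Fixed shape: every exit goes through the single unlock at "out". */
int rssi_update_fixed(struct dev *d, int threshold)
{
    int err = 0;

    pthread_mutex_lock(&d->lock);
    if (!d->connected) {
        err = -ENOTCONN;
        goto out;
    }
    d->threshold = threshold;
out:
    pthread_mutex_unlock(&d->lock);
    return err;
}

/* Returns 1 if the lock can be taken right now, 0 if it is still held. */
int lock_is_free(struct dev *d)
{
    if (pthread_mutex_trylock(&d->lock) != 0)
        return 0;
    pthread_mutex_unlock(&d->lock);
    return 1;
}

/* Exercise both variants on a disconnected device. */
void demo(void)
{
    struct dev d;

    pthread_mutex_init(&d.lock, NULL);
    d.connected = 0;
    d.threshold = 0;

    assert(rssi_update_fixed(&d, -70) == -ENOTCONN);
    assert(lock_is_free(&d));       /* fixed variant released the lock */

    assert(rssi_update_buggy(&d, -70) == -ENOTCONN);
    assert(!lock_is_free(&d));      /* buggy variant leaked the lock */

    pthread_mutex_unlock(&d.lock);  /* clean up the leaked lock */
    pthread_mutex_destroy(&d.lock);
}
```

In the buggy variant the next caller that needs the lock blocks forever, which matches the "various other things deadlock" symptom reported for the backport.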
On Mon, Dec 11, 2023 at 05:17:47PM +0700, Philip Müller wrote:
> The proper fix seems to be '076fc8775daf wifi: cfg80211: remove wdev mutex' which does not apply cleanly to either 6.6.y or 6.1.y as stated here: https://bbs.archlinux.org/viewtopic.php?pid=2136579#p2136579
6.6.6 is out now, which should fix the issue for the distros to pick up; it reverts the offending commit. Now we can take the time to fix this up "properly" if developers want to.
thanks,
greg k-h
FWIW, that looks fine to me. I don't know how I managed to miss that. Sorry about that ☹
> That "fix" was not cc:ed to any of the wifi developers and would need a lot of review before I feel comfortable accepting it, as I said in the response to that message.
Indeed, I hadn't seen it before.
But I just checked the error paths there, and the fix adjusts all three of them correctly.
johannes
On 12.12.23 03:58, Berg, Johannes wrote:
> But I just checked the error paths there, and the fix adjusts all three of them correctly.
I have at least 7 users who have tested that fix on my end:
https://lore.kernel.org/stable/20231210213930.61378-1-leo@leolam.fr/
So it can also be called tested now:
https://forum.manjaro.org/t/153045/77
https://forum.manjaro.org/t/153045/88
https://forum.manjaro.org/t/153045/90
https://forum.manjaro.org/t/153045/92
https://forum.manjaro.org/t/153045/93
https://forum.manjaro.org/t/153045/94
On 12.12.23 05:26, Philip Müller wrote:
> I have at least 7 users who have tested that fix on my end:
> https://lore.kernel.org/stable/20231210213930.61378-1-leo@leolam.fr/
Since I re-applied Johannes' broken patch plus Leo's fix to the 6.x kernels on my end and haven't heard of any regressions so far, it can be called tested by the Manjaro community.
So Greg, how do we move forward with this one? Keep the revert or integrate Leo's work on top of Johannes'?
Johannes, how important is your fix for the stable 6.x kernels when done properly?
> So Greg, how do we move forward with this one? Keep the revert or integrate Leo's work on top of Johannes'?
It would be "resend with the fixes rolled in as a new backport".
> Johannes, how important is your fix for the stable 6.x kernels when done properly?
Well CQM was broken completely for anything but (effectively) brcmfmac ... That means roaming decisions will be less optimal, mostly.
Is that annoying? Probably. Super critical? I guess not.
johannes
On Thu, Dec 14, 2023 at 08:05:55AM +0000, Berg, Johannes wrote:
> > So Greg, how do we move forward with this one? Keep the revert or integrate Leo's work on top of Johannes'?
> It would be "resend with the fixes rolled in as a new backport".
No, the new change needs to be a separate commit.
> > Johannes, how important is your fix for the stable 6.x kernels when done properly?
> Well CQM was broken completely for anything but (effectively) brcmfmac ... That means roaming decisions will be less optimal, mostly.
> Is that annoying? Probably. Super critical? I guess not.
Is it a regression or was it always like this?
thanks,
greg k-h
> No, the new change needs to be a separate commit.
Oh, I stand corrected. I thought you said earlier you'd prefer a new, fixed, backport of the change that was meant to fix CQM but broke the locking, rather than two new commits.
> Is it a regression or was it always like this?
It was a regression.
johannes
On 14.12.23 15:24, Berg, Johannes wrote:
> Oh, I stand corrected. I thought you said earlier you'd prefer a new, fixed, backport of the change that was meant to fix CQM but broke the locking, rather than two new commits.
So basically the reverted patch by Johannes gets re-applied as it was and Leo's patch is added to the series to fix it. That is the way I currently ship it in my kernels.
We can add a Tested-by from my end if wanted.
On Thu, Dec 14, 2023 at 03:32:47PM +0700, Philip Müller wrote:
> So basically the reverted patch by Johannes gets re-applied as it was and Leo's patch is added to the series to fix it. That is the way I currently ship it in my kernels.
Great, can someone please send the series like this with your:
> We can add a Tested-by from my end if wanted.
that would be wonderful.
greg k-h
On 14.12.23 18:59, Greg Kroah-Hartman wrote:
> Great, can someone please send the series like this with your:
> > We can add a Tested-by from my end if wanted.
> that would be wonderful.
Hi Greg,
Leo provided the patch series here: https://lore.kernel.org/stable/20231216054715.7729-4-leo@leolam.fr/
However, it has no cover letter. Since we reverted Johannes' patch in both 6.1.67 and 6.6.6, both patches may be added to both series to restore the original intent.
thx.
Philip
On Sat, 2023-12-16 at 17:47 +0700, Philip Müller wrote:
> Leo provided the patch series here: https://lore.kernel.org/stable/20231216054715.7729-4-leo@leolam.fr/
> However, it has no cover letter. Since we reverted Johannes' patch in both 6.1.67 and 6.6.6, both patches may be added to both series to restore the original intent.
Ah sorry, I assumed the link I added in the patch description provided enough context!
Also, I should note that my Tested-by only covers 6.6.7, while Philip's Tested-by covers both 6.1 and 6.6, as there are forum users who tested both.
On 17.12.23 00:58, Léo Lam wrote:
> Also, I should note that my Tested-by only covers 6.6.7, while Philip's Tested-by covers both 6.1 and 6.6, as there are forum users who tested both.
This is now part of 6.1.70, but it has not landed in the 6.6.x series yet ...
On Wed, Jan 03, 2024 at 10:45:05AM +0700, Philip Müller wrote:
> This is now part of 6.1.70, but it has not landed in the 6.6.x series yet ...
Now queued up.
linux-stable-mirror@lists.linaro.org