Commit 4a7e92551618f3737b305f62451353ee05662f57 ("wifi: cfg80211: fix CQM for non-range use" on 6.6.x) causes nl80211_set_cqm_rssi not to release the wdev lock in some situations.
Of course, the ensuing deadlock causes userland network managers to break pretty badly, and on typical systems this also causes lockups on on suspend, poweroff and reboot. See [1], [2], [3] for example reports.
The upstream commit, 7e7efdda6adb385fbdfd6f819d76bc68c923c394 ("wifi: cfg80211: fix CQM for non-range use"), does not trigger this issue because the wdev lock does not exist there.
Fix the deadlock by releasing the lock before returning.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=218247 [2] https://bbs.archlinux.org/viewtopic.php?id=290976 [3] https://lore.kernel.org/all/87sf4belmm.fsf@turtle.gmx.de/
Fixes: 4a7e92551618 ("wifi: cfg80211: fix CQM for non-range use") Cc: stable@vger.kernel.org Signed-off-by: Léo Lam leo@leolam.fr --- net/wireless/nl80211.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 6a82dd876f27..0b0dfecedc50 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -12906,17 +12906,23 @@ static int nl80211_set_cqm_rssi(struct genl_info *info, lockdep_is_held(&wdev->mtx));
/* if already disabled just succeed */ - if (!n_thresholds && !old) - return 0; + if (!n_thresholds && !old) { + err = 0; + goto unlock; + }
if (n_thresholds > 1) { if (!wiphy_ext_feature_isset(&rdev->wiphy, NL80211_EXT_FEATURE_CQM_RSSI_LIST) || - !rdev->ops->set_cqm_rssi_range_config) - return -EOPNOTSUPP; + !rdev->ops->set_cqm_rssi_range_config) { + err = -EOPNOTSUPP; + goto unlock; + } } else { - if (!rdev->ops->set_cqm_rssi_config) - return -EOPNOTSUPP; + if (!rdev->ops->set_cqm_rssi_config) { + err = -EOPNOTSUPP; + goto unlock; + } }
if (n_thresholds) {
On Sun, Dec 10, 2023 at 09:39:30PM +0000, Léo Lam wrote:
Commit 4a7e92551618f3737b305f62451353ee05662f57 ("wifi: cfg80211: fix CQM for non-range use" on 6.6.x) causes nl80211_set_cqm_rssi not to release the wdev lock in some situations.
Of course, the ensuing deadlock causes userland network managers to break pretty badly, and on typical systems this also causes lockups on on suspend, poweroff and reboot. See [1], [2], [3] for example reports.
The upstream commit, 7e7efdda6adb385fbdfd6f819d76bc68c923c394 ("wifi: cfg80211: fix CQM for non-range use"), does not trigger this issue because the wdev lock does not exist there.
Fix the deadlock by releasing the lock before returning.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=218247 [2] https://bbs.archlinux.org/viewtopic.php?id=290976 [3] https://lore.kernel.org/all/87sf4belmm.fsf@turtle.gmx.de/
Fixes: 4a7e92551618 ("wifi: cfg80211: fix CQM for non-range use") Cc: stable@vger.kernel.org Signed-off-by: Léo Lam leo@leolam.fr
net/wireless/nl80211.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
So this is only for the 6.6.y tree? If so, you should at least cc: the other wireless developers involved in the original fix, right?
And what commit actually fixed this issue upstream, why not take that instead?
thanks,
greg k-h
On Mon, Dec 11, 2023 at 07:47:32AM +0100, Greg KH wrote:
On Sun, Dec 10, 2023 at 09:39:30PM +0000, Léo Lam wrote:
Commit 4a7e92551618f3737b305f62451353ee05662f57 ("wifi: cfg80211: fix CQM for non-range use" on 6.6.x) causes nl80211_set_cqm_rssi not to release the wdev lock in some situations.
Of course, the ensuing deadlock causes userland network managers to break pretty badly, and on typical systems this also causes lockups on on suspend, poweroff and reboot. See [1], [2], [3] for example reports.
The upstream commit, 7e7efdda6adb385fbdfd6f819d76bc68c923c394 ("wifi: cfg80211: fix CQM for non-range use"), does not trigger this issue because the wdev lock does not exist there.
Fix the deadlock by releasing the lock before returning.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=218247 [2] https://bbs.archlinux.org/viewtopic.php?id=290976 [3] https://lore.kernel.org/all/87sf4belmm.fsf@turtle.gmx.de/
Fixes: 4a7e92551618 ("wifi: cfg80211: fix CQM for non-range use") Cc: stable@vger.kernel.org Signed-off-by: Léo Lam leo@leolam.fr
net/wireless/nl80211.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
So this is only for the 6.6.y tree? If so, you should at least cc: the other wireless developers involved in the original fix, right?
And what commit actually fixed this issue upstream, why not take that instead?
I've reverted the offending commit in the last 6.1.y and 6.6.y release, so can you send this as a patch series, first one being the original backport, and the second one this one, AFTER it has been tested?
thanks,
greg k-h
On Mon, 2023-12-11 at 13:45 +0100, Greg KH wrote:
On Mon, Dec 11, 2023 at 07:47:32AM +0100, Greg KH wrote:
On Sun, Dec 10, 2023 at 09:39:30PM +0000, Léo Lam wrote:
Commit 4a7e92551618f3737b305f62451353ee05662f57 ("wifi: cfg80211: fix CQM for non-range use" on 6.6.x) causes nl80211_set_cqm_rssi not to release the wdev lock in some situations.
Of course, the ensuing deadlock causes userland network managers to break pretty badly, and on typical systems this also causes lockups on on suspend, poweroff and reboot. See [1], [2], [3] for example reports.
The upstream commit, 7e7efdda6adb385fbdfd6f819d76bc68c923c394 ("wifi: cfg80211: fix CQM for non-range use"), does not trigger this issue because the wdev lock does not exist there.
Fix the deadlock by releasing the lock before returning.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=218247 [2] https://bbs.archlinux.org/viewtopic.php?id=290976 [3] https://lore.kernel.org/all/87sf4belmm.fsf@turtle.gmx.de/
Fixes: 4a7e92551618 ("wifi: cfg80211: fix CQM for non-range use") Cc: stable@vger.kernel.org Signed-off-by: Léo Lam leo@leolam.fr
net/wireless/nl80211.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
So this is only for the 6.6.y tree? If so, you should at least cc: the other wireless developers involved in the original fix, right?
And what commit actually fixed this issue upstream, why not take that instead?
I've reverted the offending commit in the last 6.1.y and 6.6.y release, so can you send this as a patch series, first one being the original backport, and the second one this one, AFTER it has been tested?
Last night I didn't have the time to do more than just make sure the patch built fine, but it seems that several people on the Manjaro forum have since reported that it does fix their issue (when applied on top of the 6.6.5 tree) [1].
I will re-apply commit 7e7efdda6adb ("wifi: cfg80211: fix CQM for non- range use") with this patch on top of the latest 6.6.y version and verify that the deadlock is still fixed.
[1]https://lore.kernel.org/stable/43a1aa34-5109-41ad-88e7-19ba6101dad3@manjaro....
On Mon, 2023-12-11 at 07:47 +0100, Greg KH wrote:
On Sun, Dec 10, 2023 at 09:39:30PM +0000, Léo Lam wrote:
Commit 4a7e92551618f3737b305f62451353ee05662f57 ("wifi: cfg80211: fix CQM for non-range use" on 6.6.x) causes nl80211_set_cqm_rssi not to release the wdev lock in some situations.
Of course, the ensuing deadlock causes userland network managers to break pretty badly, and on typical systems this also causes lockups on on suspend, poweroff and reboot. See [1], [2], [3] for example reports.
The upstream commit, 7e7efdda6adb385fbdfd6f819d76bc68c923c394 ("wifi: cfg80211: fix CQM for non-range use"), does not trigger this issue because the wdev lock does not exist there.
Fix the deadlock by releasing the lock before returning.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=218247 [2] https://bbs.archlinux.org/viewtopic.php?id=290976 [3] https://lore.kernel.org/all/87sf4belmm.fsf@turtle.gmx.de/
Fixes: 4a7e92551618 ("wifi: cfg80211: fix CQM for non-range use") Cc: stable@vger.kernel.org Signed-off-by: Léo Lam leo@leolam.fr
net/wireless/nl80211.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
Apologies for the slow reply - been dealing with some eye soreness. :(
First of all, thank you for taking the time to review this and for reverting the broken commit so quickly as it seems quite a few users were hitting this.
So this is only for the 6.6.y tree? If so, you should at least cc: the other wireless developers involved in the original fix, right?
You're right. Sorry I forgot to cc: johannes.berg@intel.com; though just to clarify, there is nothing wrong with their commit per se; the issue comes from how it was backported without 076fc8775daf ("wifi: cfg80211: remove wdev mutex").
And what commit actually fixed this issue upstream, why not take that instead?
As far as I understand, this was never an issue upstream because 076fc8775daf ("wifi: cfg80211: remove wdev mutex") was committed in August, *before* commit 7e7efdda6adb ("wifi: cfg80211: fix CQM for non- range use") added the early returns in late November. This only became an issue on the 6.1.x and 6.6.x trees because the CQM fix commit was applied without first applying the "remove wdev mutex" as well.
I did consider taking 076fc8775daf (i.e. removing the wdev mutex) and applying it to the 6.6.x tree but that diff is much bigger than 100 lines long and I thought it would be simpler and safer to just fix the buggy error handling. Especially for a newcomer who isn't very familiar with the development process...
On 12.12.23 05:57, Léo Lam wrote:
On Mon, 2023-12-11 at 07:47 +0100, Greg KH wrote:
On Sun, Dec 10, 2023 at 09:39:30PM +0000, Léo Lam wrote:
Commit 4a7e92551618f3737b305f62451353ee05662f57 ("wifi: cfg80211: fix CQM for non-range use" on 6.6.x) causes nl80211_set_cqm_rssi not to release the wdev lock in some situations.
Of course, the ensuing deadlock causes userland network managers to break pretty badly, and on typical systems this also causes lockups on on suspend, poweroff and reboot. See [1], [2], [3] for example reports.
The upstream commit, 7e7efdda6adb385fbdfd6f819d76bc68c923c394 ("wifi: cfg80211: fix CQM for non-range use"), does not trigger this issue because the wdev lock does not exist there.
Fix the deadlock by releasing the lock before returning.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=218247 [2] https://bbs.archlinux.org/viewtopic.php?id=290976 [3] https://lore.kernel.org/all/87sf4belmm.fsf@turtle.gmx.de/
Fixes: 4a7e92551618 ("wifi: cfg80211: fix CQM for non-range use") Cc: stable@vger.kernel.org Signed-off-by: Léo Lam leo@leolam.fr
net/wireless/nl80211.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
Apologies for the slow reply - been dealing with some eye soreness. :(
First of all, thank you for taking the time to review this and for reverting the broken commit so quickly as it seems quite a few users were hitting this.
So this is only for the 6.6.y tree? If so, you should at least cc: the other wireless developers involved in the original fix, right?
You're right. Sorry I forgot to cc: johannes.berg@intel.com; though just to clarify, there is nothing wrong with their commit per se; the issue comes from how it was backported without 076fc8775daf ("wifi: cfg80211: remove wdev mutex").
And what commit actually fixed this issue upstream, why not take that instead?
As far as I understand, this was never an issue upstream because 076fc8775daf ("wifi: cfg80211: remove wdev mutex") was committed in August, *before* commit 7e7efdda6adb ("wifi: cfg80211: fix CQM for non- range use") added the early returns in late November. This only became an issue on the 6.1.x and 6.6.x trees because the CQM fix commit thxwas applied without first applying the "remove wdev mutex" as well.
I did consider taking 076fc8775daf (i.e. removing the wdev mutex) and applying it to the 6.6.x tree but that diff is much bigger than 100 lines long and I thought it would be simpler and safer to just fix the buggy error handling. Especially for a newcomer who isn't very familiar with the development process...
Hi Leo,
thx for the patch. At least some users on my end can say it fixed the issue for them. Also Johannes checked your patch by now: https://lore.kernel.org/stable/DM4PR11MB5359FE14974D50E0D48C2D02E98FA@DM4PR1...
So your patch can be applied via a patch series by including Johannes Berg's patch as well. Addressing all error paths works too in the end ;)
On Tue, 2023-12-12 at 06:15 +0700, Philip Müller wrote:
On 12.12.23 05:57, Léo Lam wrote:
On Mon, 2023-12-11 at 07:47 +0100, Greg KH wrote:
On Sun, Dec 10, 2023 at 09:39:30PM +0000, Léo Lam wrote:
Commit 4a7e92551618f3737b305f62451353ee05662f57 ("wifi: cfg80211: fix CQM for non-range use" on 6.6.x) causes nl80211_set_cqm_rssi not to release the wdev lock in some situations.
Of course, the ensuing deadlock causes userland network managers to break pretty badly, and on typical systems this also causes lockups on on suspend, poweroff and reboot. See [1], [2], [3] for example reports.
The upstream commit, 7e7efdda6adb385fbdfd6f819d76bc68c923c394 ("wifi: cfg80211: fix CQM for non-range use"), does not trigger this issue because the wdev lock does not exist there.
Fix the deadlock by releasing the lock before returning.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=218247 [2] https://bbs.archlinux.org/viewtopic.php?id=290976 [3] https://lore.kernel.org/all/87sf4belmm.fsf@turtle.gmx.de/
Fixes: 4a7e92551618 ("wifi: cfg80211: fix CQM for non-range use") Cc: stable@vger.kernel.org Signed-off-by: Léo Lam leo@leolam.fr
net/wireless/nl80211.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
Apologies for the slow reply - been dealing with some eye soreness. :(
First of all, thank you for taking the time to review this and for reverting the broken commit so quickly as it seems quite a few users were hitting this.
So this is only for the 6.6.y tree? If so, you should at least cc: the other wireless developers involved in the original fix, right?
You're right. Sorry I forgot to cc: johannes.berg@intel.com; though just to clarify, there is nothing wrong with their commit per se; the issue comes from how it was backported without 076fc8775daf ("wifi: cfg80211: remove wdev mutex").
And what commit actually fixed this issue upstream, why not take that instead?
As far as I understand, this was never an issue upstream because 076fc8775daf ("wifi: cfg80211: remove wdev mutex") was committed in August, *before* commit 7e7efdda6adb ("wifi: cfg80211: fix CQM for non- range use") added the early returns in late November. This only became an issue on the 6.1.x and 6.6.x trees because the CQM fix commit thxwas applied without first applying the "remove wdev mutex" as well.
I did consider taking 076fc8775daf (i.e. removing the wdev mutex) and applying it to the 6.6.x tree but that diff is much bigger than 100 lines long and I thought it would be simpler and safer to just fix the buggy error handling. Especially for a newcomer who isn't very familiar with the development process...
Hi Leo,
thx for the patch. At least some users on my end can say it fixed the issue for them. Also Johannes checked your patch by now: https://lore.kernel.org/stable/DM4PR11MB5359FE14974D50E0D48C2D02E98FA@DM4PR1...
So your patch can be applied via a patch series by including Johannes Berg's patch as well. Addressing all error paths works too in the end ;)
Sorry if this is a newbie question, but just to confirm: do we really want a patch series with:
1. Johannes Berg's original patch (7e7efdda6adb); 2. my error handling patch
...instead of an adjusted version of 7e7efdda6adb with the error handling fixed for the older trees?
I thought each patch in a series was supposed to produce a working kernel (to make bisects easier among other things).
On 11.12.23 04:39, Léo Lam wrote:
Commit 4a7e92551618f3737b305f62451353ee05662f57 ("wifi: cfg80211: fix CQM for non-range use" on 6.6.x) causes nl80211_set_cqm_rssi not to release the wdev lock in some situations.
Of course, the ensuing deadlock causes userland network managers to break pretty badly, and on typical systems this also causes lockups on on suspend, poweroff and reboot. See [1], [2], [3] for example reports.
The upstream commit, 7e7efdda6adb385fbdfd6f819d76bc68c923c394 ("wifi: cfg80211: fix CQM for non-range use"), does not trigger this issue because the wdev lock does not exist there.
Fix the deadlock by releasing the lock before returning.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=218247 [2] https://bbs.archlinux.org/viewtopic.php?id=290976 [3] https://lore.kernel.org/all/87sf4belmm.fsf@turtle.gmx.de/
Fixes: 4a7e92551618 ("wifi: cfg80211: fix CQM for non-range use") Cc: stable@vger.kernel.org Signed-off-by: Léo Lam leo@leolam.fr
net/wireless/nl80211.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 6a82dd876f27..0b0dfecedc50 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -12906,17 +12906,23 @@ static int nl80211_set_cqm_rssi(struct genl_info *info, lockdep_is_held(&wdev->mtx)); /* if already disabled just succeed */
- if (!n_thresholds && !old)
return 0;
- if (!n_thresholds && !old) {
err = 0;
goto unlock;
- }
if (n_thresholds > 1) { if (!wiphy_ext_feature_isset(&rdev->wiphy, NL80211_EXT_FEATURE_CQM_RSSI_LIST) ||
!rdev->ops->set_cqm_rssi_range_config)
return -EOPNOTSUPP;
!rdev->ops->set_cqm_rssi_range_config) {
err = -EOPNOTSUPP;
goto unlock;
} else {}
if (!rdev->ops->set_cqm_rssi_config)
return -EOPNOTSUPP;
if (!rdev->ops->set_cqm_rssi_config) {
err = -EOPNOTSUPP;
goto unlock;
}}
if (n_thresholds) {
I have at least 7 users who have tested that fix on my end:
https://lore.kernel.org/stable/20231210213930.61378-1-leo@leolam.fr/
So it can also be called tested now:
https://forum.manjaro.org/t/153045/77 https://forum.manjaro.org/t/153045/88 https://forum.manjaro.org/t/153045/90 https://forum.manjaro.org/t/153045/92 https://forum.manjaro.org/t/153045/93 https://forum.manjaro.org/t/153045/94
linux-stable-mirror@lists.linaro.org