On Mon, 2023-03-13 at 21:15 +0100, Alexander Wetzel wrote:
While drivers with native iTXQ support are able to handle that,
questionable. Looking at iwlwifi:
void iwl_mvm_mac_wake_tx_queue(struct ieee80211_hw *hw, struct ieee80211_txq *txq) { struct iwl_mvm *mvm = IWL_MAC80211_GET_MVM(hw); ... list_add_tail(&mvmtxq->list, &mvm->add_stream_txqs); ... }
which might explain some rare and hard to debug list corruptions in the driver that we've seen reports of ...
To avoid what seems to be a not needed distinction between native and drivers using ieee80211_handle_wake_tx_queue(), the serialization is done for drv_wake_tx_queue() here.
So probably no objection to that, at least for now, though in the common path (what I showed above was the 'first use' path), iwlwifi actually hits different HW resources, so it _could_ benefit from concurrent TX after fixing that bug.
The serialization works by detecting and blocking concurrent calls into drv_wake_tx_queue() and - when needed - restarting all queues after the wake_tx_queue ops returned from the driver.
This seems ... overly complex? It feels like you're implementing a kind of spinlock (using atomic bit ops) with very expensive handling of contention?
Since drivers are supposed to handle concurrent TX per AC, you could almost just do something like this:
diff --git a/include/net/mac80211.h b/include/net/mac80211.h index f07a3c1b4d9a..1946e28868ea 100644 --- a/include/net/mac80211.h +++ b/include/net/mac80211.h @@ -7108,10 +7108,8 @@ struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8 ac); */ void ieee80211_txq_schedule_start(struct ieee80211_hw *hw, u8 ac);
-/* (deprecated) */ -static inline void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 ac) -{ -} +/** ... */ +void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 ac);
void __ieee80211_schedule_txq(struct ieee80211_hw *hw, struct ieee80211_txq *txq, bool force); diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index 1fae44fb1be6..606ca8620d20 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -4250,11 +4250,17 @@ void ieee80211_txq_schedule_start(struct ieee80211_hw *hw, u8 ac) } else { local->schedule_round[ac] = 0; } - - spin_unlock_bh(&local->active_txq_lock[ac]); } EXPORT_SYMBOL(ieee80211_txq_schedule_start);
+void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 ac) +{ + struct ieee80211_local *local = hw_to_local(hw); + + spin_unlock_bh(&local->active_txq_lock[ac]); +} +EXPORT_SYMBOL(ieee80211_txq_schedule_end); + void __ieee80211_subif_start_xmit(struct sk_buff *skb, struct net_device *dev, u32 info_flags,
assuming that TXQ drivers actually still call ieee80211_txq_schedule_end() which says it's deprecated.
That even has _bh() so the tasklet can't be running anyway ...
So if the concurrency really is only TX vs. tasklet, then you could even just keep the BHs disabled (in _start spin_unlock only and then in _end local_bh_disable)?
Which may also be the solution for the regression in 6.2: Do it now for ieee80211_handle_wake_tx_queue() and apply this patch to the development tree only.
I'd argue the other way around - do it for all to fix these issues, and then audit drivers such as iwlwifi or even make concurrency here opt-in.
Felix did see some benefits of the concurrency I think though, so he might have a different opinion.
johannes