This patchset serves to prevent an AB/BA deadlock:
thread 0: * (lock A) acquire substream lock by snd_pcm_stream_lock_irq() in snd_pcm_status64() * (lock B) wait for tasklet to finish by calling tasklet_unlock_spin_wait() in tasklet_disable_in_atomic() in ohci_flush_iso_completions() of ohci.c
thread 1: * (lock B) enter tasklet * (lock A) attempt to acquire substream lock, waiting for it to be released: snd_pcm_stream_lock_irqsave() in snd_pcm_period_elapsed() in update_pcm_pointers() in process_ctx_payloads() in process_rx_packets() of amdtp-stream.c
? tasklet_unlock_spin_wait </NMI> <TASK> ohci_flush_iso_completions firewire_ohci amdtp_domain_stream_pcm_pointer snd_firewire_lib snd_pcm_update_hw_ptr0 snd_pcm snd_pcm_status64 snd_pcm
? native_queued_spin_lock_slowpath </NMI> <IRQ> _raw_spin_lock_irqsave snd_pcm_period_elapsed snd_pcm process_rx_packets snd_firewire_lib irq_target_callback snd_firewire_lib handle_it_packet firewire_ohci context_tasklet firewire_ohci
The issue has been reported as a regression of kernel 5.14: Link: https://lore.kernel.org/regressions/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76... ("[REGRESSION] ALSA: firewire-lib: snd_pcm_period_elapsed deadlock with Fireface 800")
Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event in process context") removed the process context workqueue from amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove its overhead. Commit b5b519965c4c ("ALSA: firewire-lib: obsolete workqueue for period update") belongs to the same patch series and removed the now-unused workqueue entirely.
Though being observed on RME Fireface 800, this issue would affect all Firewire audio interfaces using ohci amdtp + pcm streaming.
ALSA streaming, especially under intensive CPU load will reveal this issue the soonest due to issuing more hardIRQs, with time to occurrence ranging from 2 secons to 30 minutes after starting playback.
to reproduce the issue: direct ALSA playback to the device: mpv --audio-device=alsa/sysdefault:CARD=Fireface800 Spor-Ignition.flac Time to occurrence: 2s to 30m Likelihood increased by: - high CPU load stress --cpu $(nproc) - switching between applications via workspaces tested with i915 in Xfce PulsaAudio / PipeWire conceal the issue as they run PCM substream without period wakeup mode, issuing less hardIRQs.
Cc: stable@vger.kernel.org Backport note: Also applies to and fixes on (tested): 6.10.2, 6.9.12, 6.6.43, 6.1.102, 5.15.164
Edmund Raile (3): Revert "ALSA: firewire-lib: obsolete workqueue for period update" Revert "ALSA: firewire-lib: operate for period elapse event in process context" ALSA: firewire-lib: amdtp-stream work queue inline description
sound/firewire/amdtp-stream.c | 38 ++++++++++++++++++++++------------- sound/firewire/amdtp-stream.h | 1 + 2 files changed, 25 insertions(+), 14 deletions(-)
prepare resolution of AB/BA deadlock competition for substream lock: restore workqueue previously used for process context:
revert commit b5b519965c4c ("ALSA: firewire-lib: obsolete workqueue for period update")
Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl... Signed-off-by: Edmund Raile edmund.raile@protonmail.com --- sound/firewire/amdtp-stream.c | 15 +++++++++++++++ sound/firewire/amdtp-stream.h | 1 + 2 files changed, 16 insertions(+)
diff --git a/sound/firewire/amdtp-stream.c b/sound/firewire/amdtp-stream.c index d35d0a420ee0..31201d506a21 100644 --- a/sound/firewire/amdtp-stream.c +++ b/sound/firewire/amdtp-stream.c @@ -77,6 +77,8 @@ // overrun. Actual device can skip more, then this module stops the packet streaming. #define IR_JUMBO_PAYLOAD_MAX_SKIP_CYCLES 5
+static void pcm_period_work(struct work_struct *work); + /** * amdtp_stream_init - initialize an AMDTP stream structure * @s: the AMDTP stream to initialize @@ -105,6 +107,7 @@ int amdtp_stream_init(struct amdtp_stream *s, struct fw_unit *unit, s->flags = flags; s->context = ERR_PTR(-1); mutex_init(&s->mutex); + INIT_WORK(&s->period_work, pcm_period_work); s->packet_index = 0;
init_waitqueue_head(&s->ready_wait); @@ -347,6 +350,7 @@ EXPORT_SYMBOL(amdtp_stream_get_max_payload); */ void amdtp_stream_pcm_prepare(struct amdtp_stream *s) { + cancel_work_sync(&s->period_work); s->pcm_buffer_pointer = 0; s->pcm_period_pointer = 0; } @@ -624,6 +628,16 @@ static void update_pcm_pointers(struct amdtp_stream *s, } }
+static void pcm_period_work(struct work_struct *work) +{ + struct amdtp_stream *s = container_of(work, struct amdtp_stream, + period_work); + struct snd_pcm_substream *pcm = READ_ONCE(s->pcm); + + if (pcm) + snd_pcm_period_elapsed(pcm); +} + static int queue_packet(struct amdtp_stream *s, struct fw_iso_packet *params, bool sched_irq) { @@ -1910,6 +1924,7 @@ static void amdtp_stream_stop(struct amdtp_stream *s) return; }
+ cancel_work_sync(&s->period_work); fw_iso_context_stop(s->context); fw_iso_context_destroy(s->context); s->context = ERR_PTR(-1); diff --git a/sound/firewire/amdtp-stream.h b/sound/firewire/amdtp-stream.h index a1ed2e80f91a..775db3fc4959 100644 --- a/sound/firewire/amdtp-stream.h +++ b/sound/firewire/amdtp-stream.h @@ -191,5 +191,6 @@ struct amdtp_stream {
/* For a PCM substream processing. */ struct snd_pcm_substream *pcm; + struct work_struct period_work; snd_pcm_uframes_t pcm_buffer_pointer; unsigned int pcm_period_pointer;
Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event in process context") removed the process context workqueue from amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove its overhead.
With RME Fireface 800, this lead to a regression since Kernels 5.14.0, causing an AB/BA deadlock competition for the substream lock with eventual system freeze under ALSA operation:
thread 0: * (lock A) acquire substream lock by snd_pcm_stream_lock_irq() in snd_pcm_status64() * (lock B) wait for tasklet to finish by calling tasklet_unlock_spin_wait() in tasklet_disable_in_atomic() in ohci_flush_iso_completions() of ohci.c
thread 1: * (lock B) enter tasklet * (lock A) attempt to acquire substream lock, waiting for it to be released: snd_pcm_stream_lock_irqsave() in snd_pcm_period_elapsed() in update_pcm_pointers() in process_ctx_payloads() in process_rx_packets() of amdtp-stream.c
? tasklet_unlock_spin_wait </NMI> <TASK> ohci_flush_iso_completions firewire_ohci amdtp_domain_stream_pcm_pointer snd_firewire_lib snd_pcm_update_hw_ptr0 snd_pcm snd_pcm_status64 snd_pcm
? native_queued_spin_lock_slowpath </NMI> <IRQ> _raw_spin_lock_irqsave snd_pcm_period_elapsed snd_pcm process_rx_packets snd_firewire_lib irq_target_callback snd_firewire_lib handle_it_packet firewire_ohci context_tasklet firewire_ohci
Restore the process context work queue to prevent deadlock AB/BA deadlock competition for ALSA substream lock of snd_pcm_stream_lock_irq() in snd_pcm_status64() and snd_pcm_stream_lock_irqsave() in snd_pcm_period_elapsed().
commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event in process context")
Cc: stable@vger.kernel.org Fixes: 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event in process context") Link: https://lore.kernel.org/r/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl... Reported-by: edmund.raile edmund.raile@proton.me Closes: https://lore.kernel.org/r/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl... Signed-off-by: Edmund Raile edmund.raile@protonmail.com --- sound/firewire/amdtp-stream.c | 31 +++++++++++++++++-------------- 1 file changed, 17 insertions(+), 14 deletions(-)
diff --git a/sound/firewire/amdtp-stream.c b/sound/firewire/amdtp-stream.c index 31201d506a21..a07b0452267d 100644 --- a/sound/firewire/amdtp-stream.c +++ b/sound/firewire/amdtp-stream.c @@ -615,16 +615,8 @@ static void update_pcm_pointers(struct amdtp_stream *s, // The program in user process should periodically check the status of intermediate // buffer associated to PCM substream to process PCM frames in the buffer, instead // of receiving notification of period elapsed by poll wait. - if (!pcm->runtime->no_period_wakeup) { - if (in_softirq()) { - // In software IRQ context for 1394 OHCI. - snd_pcm_period_elapsed(pcm); - } else { - // In process context of ALSA PCM application under acquired lock of - // PCM substream. - snd_pcm_period_elapsed_under_stream_lock(pcm); - } - } + if (!pcm->runtime->no_period_wakeup) + queue_work(system_highpri_wq, &s->period_work); } }
@@ -1864,11 +1856,22 @@ unsigned long amdtp_domain_stream_pcm_pointer(struct amdtp_domain *d, { struct amdtp_stream *irq_target = d->irq_target;
- // Process isochronous packets queued till recent isochronous cycle to handle PCM frames. if (irq_target && amdtp_stream_running(irq_target)) { - // In software IRQ context, the call causes dead-lock to disable the tasklet - // synchronously. - if (!in_softirq()) + // This function is called in software IRQ context of + // period_work or process context. + // + // When the software IRQ context was scheduled by software IRQ + // context of IT contexts, queued packets were already handled. + // Therefore, no need to flush the queue in buffer furthermore. + // + // When the process context reach here, some packets will be + // already queued in the buffer. These packets should be handled + // immediately to keep better granularity of PCM pointer. + // + // Later, the process context will sometimes schedules software + // IRQ context of the period_work. Then, no need to flush the + // queue by the same reason as described in the above + if (current_work() != &s->period_work) fw_iso_context_flush_completions(irq_target->context); }
Replace prior inline description to prevent future deadlock.
Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simnl... Signed-off-by: Edmund Raile edmund.raile@protonmail.com --- sound/firewire/amdtp-stream.c | 20 ++++++-------------- 1 file changed, 6 insertions(+), 14 deletions(-)
diff --git a/sound/firewire/amdtp-stream.c b/sound/firewire/amdtp-stream.c index a07b0452267d..7438999e0510 100644 --- a/sound/firewire/amdtp-stream.c +++ b/sound/firewire/amdtp-stream.c @@ -1857,20 +1857,12 @@ unsigned long amdtp_domain_stream_pcm_pointer(struct amdtp_domain *d, struct amdtp_stream *irq_target = d->irq_target;
if (irq_target && amdtp_stream_running(irq_target)) { - // This function is called in software IRQ context of - // period_work or process context. - // - // When the software IRQ context was scheduled by software IRQ - // context of IT contexts, queued packets were already handled. - // Therefore, no need to flush the queue in buffer furthermore. - // - // When the process context reach here, some packets will be - // already queued in the buffer. These packets should be handled - // immediately to keep better granularity of PCM pointer. - // - // Later, the process context will sometimes schedules software - // IRQ context of the period_work. Then, no need to flush the - // queue by the same reason as described in the above + // use wq to prevent AB/BA deadlock competition for + // substream lock: + // fw_iso_context_flush_completions() acquires + // lock by ohci_flush_iso_completions(), + // amdtp-stream process_rx_packets() attempts to + // acquire same lock by snd_pcm_elapsed() if (current_work() != &s->period_work) fw_iso_context_flush_completions(irq_target->context); }
linux-stable-mirror@lists.linaro.org