c1839501fe3e ("ALSA: firewire-lib: fix wrong value as length of header for
CIP_NO_HEADER case") indeed fixes the issue.
Thank you very much!
I just hope non-CIP_NO_HEADER cases work, but then again, looking at the
original struct
```
struct {
struct fw_iso_packet params;
__be32 header[CIP_HEADER_QUADLETS];
} template = { {0}, {0} };
```
the fw_iso_packet wasn't at the end of the array, thus
```
struct fw_iso_packet {
u16 payload_length; /* Length of indirect payload */
u32 interrupt:1; /* Generate interrupt on this packet */
u32 skip:1; /* tx: Set to not send packet at all */
/* rx: Sync bit, wait for matching sy */
u32 tag:2; /* tx: Tag in packet header */
u32 sy:4; /* tx: Sy in packet header */
u32 header_length:8; /* Size of immediate header */
u32 header[]; /* tx: Top of 1394 isoch. data_block */
};
```
its header array wouldn't ever end up being initialized here with > 0 elements.
Maybe that wasn't ever required.
Tested-by: Edmund Raile <edmund.raile(a)protonmail.com>
Link: https://lore.kernel.org/r/20240725155640.128442-1-o-takashi@sakamocchi.jp
This patchset serves to prevent a deadlock between
process context and softIRQ context:
A. In the process context
* (lock A) Acquiring spin_lock by snd_pcm_stream_lock_irq() in
snd_pcm_status64()
* (lock B) Then attempt to enter tasklet
B. In the softIRQ context
* (lock B) Enter tasklet
* (lock A) Attempt to acquire spin_lock by snd_pcm_stream_lock_irqsave() in
snd_pcm_period_elapsed()
? tasklet_unlock_spin_wait
</NMI>
<TASK>
ohci_flush_iso_completions firewire_ohci
amdtp_domain_stream_pcm_pointer snd_firewire_lib
snd_pcm_update_hw_ptr0 snd_pcm
snd_pcm_status64 snd_pcm
? native_queued_spin_lock_slowpath
</NMI>
<IRQ>
_raw_spin_lock_irqsave
snd_pcm_period_elapsed snd_pcm
process_rx_packets snd_firewire_lib
irq_target_callback snd_firewire_lib
handle_it_packet firewire_ohci
context_tasklet firewire_ohci
The issue has been reported as a regression of kernel 5.14:
Link: https://lore.kernel.org/regressions/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg7…
("[REGRESSION] ALSA: firewire-lib: snd_pcm_period_elapsed deadlock with
Fireface 800")
Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event
in process context") removed the process context workqueue from
amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove
its overhead.
Commit b5b519965c4c ("ALSA: firewire-lib: obsolete workqueue for period
update") belongs to the same patch series and removed
the now-unused workqueue entirely.
Though being observed on RME Fireface 800, this issue would affect all
Firewire audio interfaces using ohci amdtp + pcm streaming.
ALSA streaming, especially under intensive CPU load will reveal this issue
the soonest due to issuing more hardIRQs, with time to occurrence ranging
from 2 secons to 30 minutes after starting playback.
to reproduce the issue:
direct ALSA playback to the device:
mpv --audio-device=alsa/sysdefault:CARD=Fireface800 Spor-Ignition.flac
Time to occurrence: 2s to 30m
Likelihood increased by:
- high CPU load
stress --cpu $(nproc)
- switching between applications via workspaces
tested with i915 in Xfce
PulsaAudio / PipeWire conceal the issue as they run PCM substream
without period wakeup mode, issuing less hardIRQs.
Backport note:
Also applies to and fixes on (tested):
6.10.2, 6.9.12, 6.6.43, 6.1.102, 5.15.164
Edmund Raile (2):
ALSA: firewire-lib: restore workqueue for process context
ALSA: firewire-lib: prevent deadlock between process and softIRQ
context
sound/firewire/amdtp-stream.c | 36 ++++++++++++++++++++++-------------
sound/firewire/amdtp-stream.h | 1 +
2 files changed, 24 insertions(+), 13 deletions(-)
--
2.45.2
Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event
in process context") removed the process context workqueue from
amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove
its overhead.
With RME Fireface 800, this lead to a regression since Kernels 5.14.0,
causing a deadlock with eventual system freeze under ALSA operation:
A. In the process context
* (lock A) Acquiring spin_lock by snd_pcm_stream_lock_irq() in
snd_pcm_status64()
* (lock B) Then attempt to enter tasklet
B. In the softIRQ context
* (lock B) Enter tasklet
* (lock A) Attempt to acquire spin_lock by
snd_pcm_stream_lock_irqsave() in snd_pcm_period_elapsed()
? tasklet_unlock_spin_wait
</NMI>
<TASK>
ohci_flush_iso_completions firewire_ohci
amdtp_domain_stream_pcm_pointer snd_firewire_lib
snd_pcm_update_hw_ptr0 snd_pcm
snd_pcm_status64 snd_pcm
? native_queued_spin_lock_slowpath
</NMI>
<IRQ>
_raw_spin_lock_irqsave
snd_pcm_period_elapsed snd_pcm
process_rx_packets snd_firewire_lib
irq_target_callback snd_firewire_lib
handle_it_packet firewire_ohci
context_tasklet firewire_ohci
Restore the process context work queue to prevent deadlock
between ALSA substream process context spin_lock of
snd_pcm_stream_lock_irq() in snd_pcm_status64()
and OHCI 1394 IT softIRQ context spin_lock of
snd_pcm_stream_lock_irqsave() in snd_pcm_period_elapsed().
Cc: stable(a)vger.kernel.org
Fixes: 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event in process context")
Link: https://lore.kernel.org/r/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simn…
Reported-by: edmund.raile <edmund.raile(a)proton.me>
Signed-off-by: Edmund Raile <edmund.raile(a)protonmail.com>
---
sound/firewire/amdtp-stream.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)
diff --git a/sound/firewire/amdtp-stream.c b/sound/firewire/amdtp-stream.c
index 31201d506a21..c037d7de1fdc 100644
--- a/sound/firewire/amdtp-stream.c
+++ b/sound/firewire/amdtp-stream.c
@@ -615,16 +615,9 @@ static void update_pcm_pointers(struct amdtp_stream *s,
// The program in user process should periodically check the status of intermediate
// buffer associated to PCM substream to process PCM frames in the buffer, instead
// of receiving notification of period elapsed by poll wait.
- if (!pcm->runtime->no_period_wakeup) {
- if (in_softirq()) {
- // In software IRQ context for 1394 OHCI.
- snd_pcm_period_elapsed(pcm);
- } else {
- // In process context of ALSA PCM application under acquired lock of
- // PCM substream.
- snd_pcm_period_elapsed_under_stream_lock(pcm);
- }
- }
+ if (!pcm->runtime->no_period_wakeup)
+ // wq used with amdtp_domain_stream_pcm_pointer() to prevent deadlock
+ queue_work(system_highpri_wq, &s->period_work);
}
}
@@ -1866,9 +1859,11 @@ unsigned long amdtp_domain_stream_pcm_pointer(struct amdtp_domain *d,
// Process isochronous packets queued till recent isochronous cycle to handle PCM frames.
if (irq_target && amdtp_stream_running(irq_target)) {
- // In software IRQ context, the call causes dead-lock to disable the tasklet
- // synchronously.
- if (!in_softirq())
+ // use wq to prevent deadlock between process context spin_lock
+ // of snd_pcm_stream_lock_irq() in snd_pcm_status64() and
+ // softIRQ context spin_lock of snd_pcm_stream_lock_irqsave()
+ // in snd_pcm_period_elapsed()
+ if ((!in_softirq()) && (current_work() != &s->period_work))
fw_iso_context_flush_completions(irq_target->context);
}
--
2.45.2
Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event
in process context") removed the process context workqueue from
amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove
its overhead.
With RME Fireface 800, this lead to a regression since Kernels 5.14.0,
causing a deadlock with eventual system freeze under ALSA operation:
A. In the process context
* (lock A) Acquiring spin_lock by snd_pcm_stream_lock_irq() in
snd_pcm_status64()
* (lock B) Then attempt to enter tasklet
B. In the softIRQ context
* (lock B) Enter tasklet
* (lock A) Attempt to acquire spin_lock by
snd_pcm_stream_lock_irqsave() in snd_pcm_period_elapsed()
? tasklet_unlock_spin_wait
</NMI>
<TASK>
ohci_flush_iso_completions firewire_ohci
amdtp_domain_stream_pcm_pointer snd_firewire_lib
snd_pcm_update_hw_ptr0 snd_pcm
snd_pcm_status64 snd_pcm
? native_queued_spin_lock_slowpath
</NMI>
<IRQ>
_raw_spin_lock_irqsave
snd_pcm_period_elapsed snd_pcm
process_rx_packets snd_firewire_lib
irq_target_callback snd_firewire_lib
handle_it_packet firewire_ohci
context_tasklet firewire_ohci
Restore the process context work queue to prevent deadlock
between ALSA substream process context spin_lock of
snd_pcm_stream_lock_irq() in snd_pcm_status64()
and OHCI 1394 IT softIRQ context spin_lock of
snd_pcm_stream_lock_irqsave() in snd_pcm_period_elapsed().
Cc: stable(a)vger.kernel.org
Fixes: 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event in process context")
Link: https://lore.kernel.org/r/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simn…
Reported-by: edmund.raile <edmund.raile(a)proton.me>
Signed-off-by: Edmund Raile <edmund.raile(a)protonmail.com>
---
sound/firewire/amdtp-stream.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)
diff --git a/sound/firewire/amdtp-stream.c b/sound/firewire/amdtp-stream.c
index 31201d506a21..c037d7de1fdc 100644
--- a/sound/firewire/amdtp-stream.c
+++ b/sound/firewire/amdtp-stream.c
@@ -615,16 +615,9 @@ static void update_pcm_pointers(struct amdtp_stream *s,
// The program in user process should periodically check the status of intermediate
// buffer associated to PCM substream to process PCM frames in the buffer, instead
// of receiving notification of period elapsed by poll wait.
- if (!pcm->runtime->no_period_wakeup) {
- if (in_softirq()) {
- // In software IRQ context for 1394 OHCI.
- snd_pcm_period_elapsed(pcm);
- } else {
- // In process context of ALSA PCM application under acquired lock of
- // PCM substream.
- snd_pcm_period_elapsed_under_stream_lock(pcm);
- }
- }
+ if (!pcm->runtime->no_period_wakeup)
+ // wq used with amdtp_domain_stream_pcm_pointer() to prevent deadlock
+ queue_work(system_highpri_wq, &s->period_work);
}
}
@@ -1866,9 +1859,11 @@ unsigned long amdtp_domain_stream_pcm_pointer(struct amdtp_domain *d,
// Process isochronous packets queued till recent isochronous cycle to handle PCM frames.
if (irq_target && amdtp_stream_running(irq_target)) {
- // In software IRQ context, the call causes dead-lock to disable the tasklet
- // synchronously.
- if (!in_softirq())
+ // use wq to prevent deadlock between process context spin_lock
+ // of snd_pcm_stream_lock_irq() in snd_pcm_status64() and
+ // softIRQ context spin_lock of snd_pcm_stream_lock_irqsave()
+ // in snd_pcm_period_elapsed()
+ if ((!in_softirq()) && (current_work() != &s->period_work))
fw_iso_context_flush_completions(irq_target->context);
}
--
2.45.2
From: Heiner Kallweit <hkallweit1(a)gmail.com>
[ Upstream commit 982300c115d229565d7af8e8b38aa1ee7bb1f5bd ]
This early RTL8168b version was the first PCIe chip version, and it's
quite quirky. Last sign of life is from more than 15 yrs ago.
Let's remove detection of this chip version, we'll see whether anybody
complains. If not, support for this chip version can be removed a few
kernel versions later.
Signed-off-by: Heiner Kallweit <hkallweit1(a)gmail.com>
Link: https://lore.kernel.org/r/875cdcf4-843c-420a-ad5d-417447b68572@gmail.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/ethernet/realtek/r8169_main.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 319f8d7a502da..32ea1d902a173 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -2185,7 +2185,9 @@ static void rtl8169_get_mac_version(struct rtl8169_private *tp)
/* 8168B family. */
{ 0x7cf, 0x380, RTL_GIGA_MAC_VER_12 },
{ 0x7c8, 0x380, RTL_GIGA_MAC_VER_17 },
- { 0x7c8, 0x300, RTL_GIGA_MAC_VER_11 },
+ /* This one is very old and rare, let's see if anybody complains.
+ * { 0x7c8, 0x300, RTL_GIGA_MAC_VER_11 },
+ */
/* 8101 family. */
{ 0x7c8, 0x448, RTL_GIGA_MAC_VER_39 },
--
2.43.0
From: Heiner Kallweit <hkallweit1(a)gmail.com>
[ Upstream commit 982300c115d229565d7af8e8b38aa1ee7bb1f5bd ]
This early RTL8168b version was the first PCIe chip version, and it's
quite quirky. Last sign of life is from more than 15 yrs ago.
Let's remove detection of this chip version, we'll see whether anybody
complains. If not, support for this chip version can be removed a few
kernel versions later.
Signed-off-by: Heiner Kallweit <hkallweit1(a)gmail.com>
Link: https://lore.kernel.org/r/875cdcf4-843c-420a-ad5d-417447b68572@gmail.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/ethernet/realtek/r8169_main.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index d24eb5ee152a5..7d3443ad8e797 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -2018,7 +2018,9 @@ static enum mac_version rtl8169_get_mac_version(u16 xid, bool gmii)
/* 8168B family. */
{ 0x7cf, 0x380, RTL_GIGA_MAC_VER_12 },
{ 0x7c8, 0x380, RTL_GIGA_MAC_VER_17 },
- { 0x7c8, 0x300, RTL_GIGA_MAC_VER_11 },
+ /* This one is very old and rare, let's see if anybody complains.
+ * { 0x7c8, 0x300, RTL_GIGA_MAC_VER_11 },
+ */
/* 8101 family. */
{ 0x7c8, 0x448, RTL_GIGA_MAC_VER_39 },
--
2.43.0
From: Heiner Kallweit <hkallweit1(a)gmail.com>
[ Upstream commit 982300c115d229565d7af8e8b38aa1ee7bb1f5bd ]
This early RTL8168b version was the first PCIe chip version, and it's
quite quirky. Last sign of life is from more than 15 yrs ago.
Let's remove detection of this chip version, we'll see whether anybody
complains. If not, support for this chip version can be removed a few
kernel versions later.
Signed-off-by: Heiner Kallweit <hkallweit1(a)gmail.com>
Link: https://lore.kernel.org/r/875cdcf4-843c-420a-ad5d-417447b68572@gmail.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/ethernet/realtek/r8169_main.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 76d820c4e6eef..c0ad5cd963d08 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -2063,7 +2063,9 @@ static enum mac_version rtl8169_get_mac_version(u16 xid, bool gmii)
/* 8168B family. */
{ 0x7cf, 0x380, RTL_GIGA_MAC_VER_12 },
{ 0x7c8, 0x380, RTL_GIGA_MAC_VER_17 },
- { 0x7c8, 0x300, RTL_GIGA_MAC_VER_11 },
+ /* This one is very old and rare, let's see if anybody complains.
+ * { 0x7c8, 0x300, RTL_GIGA_MAC_VER_11 },
+ */
/* 8101 family. */
{ 0x7c8, 0x448, RTL_GIGA_MAC_VER_39 },
--
2.43.0