Hi,
On Mon, Jun 24, 2024 at 2:23 PM Doug Anderson dianders@chromium.org wrote:
Hi,
On Mon, Jun 24, 2024 at 6:31 AM Johan Hovold johan+linaro@kernel.org wrote:
The stop_tx() callback is used to implement software flow control and must not discard data as the Qualcomm GENI driver is currently doing when there is an active TX command.
Cancelling an active command can also leave data in the hardware FIFO, which prevents the watermark interrupt from being enabled when TX is later restarted. This results in a soft lockup and is easily triggered by stopping TX using software flow control in a serial console but this can also happen after suspend.
Fix this by only stopping any active command, and effectively clearing the hardware fifo, when shutting down the port. Make sure to temporarily raise the watermark level so that the interrupt fires when TX is restarted.
Nice! I did quite a few experiments, but it sounds like you found something that I wasn't able to find. Specifically once I cancelled an ongoing command I could never manage to get it started back up, but it must have just been that data was still in the FIFO and thus the watermark never fired again.
When I was experimenting, I also swore that there were cases where geni would sometimes fully drop bytes when I tried to "cancel" a command, but maybe I was mistaken. Everything I figured out was essentially by running experiments and I could easily have had a bug in my experiment.
Fixes: c4f528795d1a ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP") Cc: stable@vger.kernel.org # 4.17 Signed-off-by: Johan Hovold johan+linaro@kernel.org
drivers/tty/serial/qcom_geni_serial.c | 28 +++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-)
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c index 1d5d6045879a..72addeb9f461 100644 --- a/drivers/tty/serial/qcom_geni_serial.c +++ b/drivers/tty/serial/qcom_geni_serial.c @@ -651,13 +651,8 @@ static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport) { u32 irq_en;
if (qcom_geni_serial_main_active(uport) ||
!qcom_geni_serial_tx_empty(uport))
return;
irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN); irq_en |= M_TX_FIFO_WATERMARK_EN | M_CMD_DONE_EN;
writel(DEF_TX_WM, uport->membase + SE_GENI_TX_WATERMARK_REG); writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
} @@ -665,16 +660,28 @@ static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport) static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport) { u32 irq_en;
struct qcom_geni_serial_port *port = to_dev_port(uport); irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN); irq_en &= ~(M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN); writel(0, uport->membase + SE_GENI_TX_WATERMARK_REG); writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
/* Possible stop tx is called multiple times. */
If qcom_geni_serial_stop_tx_fifo() is supposed to be used for UART flow control and you have a way to stop the transfer immediately without losing data (by using geni_se_cancel_m_cmd), maybe we should do that? If the other side wants us to stop transferring data and we can stop it right away that would be ideal...
+}
+static void qcom_geni_serial_clear_tx_fifo(struct uart_port *uport) +{
struct qcom_geni_serial_port *port = to_dev_port(uport);
if (!qcom_geni_serial_main_active(uport)) return;
/*
* Increase watermark level so that TX can be restarted and wait for
* sequencer to start to prevent lockups.
*/
writel(port->tx_fifo_depth, uport->membase + SE_GENI_TX_WATERMARK_REG);
qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS,
M_TX_FIFO_WATERMARK_EN, true);
Oh, maybe this "wait for sequencer to start to prevent lockups." is the part that I was missing? Can you explain more about what's going on here? Why does waiting for the watermark interrupt to fire prevent lockups? I would have imagined that the watermark interrupt would be part of the geni hardware and have nothing to do with the firmware running on the other end, so I'm not sure why it firing somehow would prevent a lockup. Was this just by trial and error?
Actually, the more I look at it the more confused I am about your qcom_geni_serial_clear_tx_fifo(). Can you explain and maybe add some inline comments in the function since it's not obvious? Specifically, things I'm confused about with your patch:
1. The function is named qcom_geni_serial_clear_tx_fifo() which implies that when it finishes that the hardware FIFO will have nothing in it. ...but how does your code ensure this?
2. If the function is really clearing the FIFOs then why do we need to adjust the watermark level? The fact that you need to adjust the watermark levels implies (to me) that there are things stuck in the FIFO still. ...but then what happens to those characters? When are they sent?
3. On my hardware you're setting the FIFO level to 16 here. The docs I have say that if the FIFO level is "less than" the value you set here then the interrupt will go off and further clarifies that if you set the register to 1 here then you'll get interrupted when the FIFO is empty. So what happens with your solution if the FIFO is completely full? In that case you'd have to set this to 17, right? ...but then I could believe that might confuse the interrupt handler which would get told to start transmitting when there is no room for anything.
Maybe something is missing in my mental model here and testing your patch and hitting Ctrl-C seems to work, but I don't really understand why so hopefully you can clarify! :-)
-Doug