On Mon, 2 Mar 2020 at 17:50, Ulf Hansson ulf.hansson@linaro.org wrote:
On Mon, 2 Mar 2020 at 14:11, Faiz Abbas faiz_abbas@ti.com wrote:
Uffe,
On 26/02/20 8:51 pm, Ulf Hansson wrote:
- Anders, Kishon
On Tue, 25 Feb 2020 at 17:24, Jon Hunter jonathanh@nvidia.com wrote:
On 25/02/2020 14:26, Ulf Hansson wrote:
...
However, from the core's point of view, the response is still requested; it's just that we don't want the driver to wait for the card to stop signaling busy. Instead, we want to deal with that via "polling" from the core.
This is rather worrying behaviour, as it seems like the host driver doesn't really follow these expectations from the core's point of view. And mmc_flush_cache() is not the only case; we also have erase, bkops, sanitize, etc. Are all of these working, or are they not really well tested?
I don't believe that they are well tested. We have a simple test to mount an eMMC partition, create a file, check the contents, remove the file and unmount. The timeouts always occur during unmounting.
Earlier, before my three patches, if the timeout_ms parameter provided to __mmc_switch() was zero, which was the case for mmc_flush_cache(), __mmc_switch() simply skipped validating against host->max_busy_timeout, which was wrong. In any case, this also meant that an R1B response was always used for mmc_flush_cache(), as you also indicated above. Perhaps this is the critical part where things can go wrong.
BTW, have you tried erase commands for sdhci tegra driver? If those are working fine, do you have any special treatments for these?
That I am not sure, but I will check.
Great, thanks. Looking forward to your report.
So, from my side, Anders Roxell and I have been collaborating on testing the behaviour on a TI Beagleboard x15 (remotely, with limited debug options), which uses the sdhci-omap variant. I am trying to get hold of an Nvidia Jetson TX2, but haven't found one yet. These are the conclusions from the observed behaviour on the Beagleboard for the CMD6 cache flush command.
First, the reported host->max_busy_timeout is 2581 (ms) for the sdhci-omap driver in this configuration.
- As we all know by now, the cache flush command (CMD6) currently fails with -110. This is when MMC_CACHE_FLUSH_TIMEOUT_MS is set to 30 * 1000 (30s), which means __mmc_switch() drops the MMC_RSP_BUSY flag from the command.
- Changing MMC_CACHE_FLUSH_TIMEOUT_MS to 2000 (2s) means that the MMC_RSP_BUSY flag becomes set by __mmc_switch(), because the timeout_ms parameter is less than max_busy_timeout (2000 < 2581). Then everything works fine.
- Updating the code to again use 30s as MMC_CACHE_FLUSH_TIMEOUT_MS, but instead forcing MMC_RSP_BUSY to be set even when timeout_ms is greater than max_busy_timeout, also works fine.
Clearly this indicates a problem that I think needs to be addressed in the sdhci driver. Of course, I could revert the three discussed patches to fix the problem, but that would only hide the issue, and I am sure we would get back to it sooner or later.
To fix the problem in the sdhci driver, I would appreciate if someone from TI and Nvidia can step in to help, as I don't have the HW on my desk.
Comments or other ideas of how to move forward?
Sorry I missed this earlier.
I don't have an X15 with me here, but I'm trying to set one up in our remote farm. In the meantime, I tried to reproduce this issue on two platforms (dra72-evm and am57xx-evm) and wasn't able to, because those eMMCs don't even have a cache. I will keep you updated when I get a board with an eMMC that has a cache.
Is there a way to reproduce this CMD6 issue with another operation?
Yes, most definitely.
Let me cook a debug patch for you that should trigger the problem for another CMD6 operation. I will post something later this evening or in the morning (Swedish timezone).
A bit later than promised; I am clearly an optimist. In any case, here's the patch I had in mind to trigger the problem for other CMD6 operations. Please give it a shot and see what happens.
-------
From: Ulf Hansson <ulf.hansson@linaro.org>
Date: Tue, 3 Mar 2020 22:11:05 +0100
Subject: [PATCH] mmc: core: DEBUG: Force a long timeout for all CMD6
This is to test sdhci-omap, for example, to see what happens when using a longer timeout. My guess is that it triggers __mmc_switch() to disable the MMC_RSP_BUSY flag for the command. If so, it is likely to make the host driver fail in one way or another.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 drivers/mmc/core/mmc_ops.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
index da425ee2d9bf..f0d2563961f6 100644
--- a/drivers/mmc/core/mmc_ops.c
+++ b/drivers/mmc/core/mmc_ops.c
@@ -532,6 +532,9 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,
 
 	mmc_retune_hold(host);
 
+	/* Force a long timeout to likely make use_r1b_resp to become false. */
+	timeout_ms = MMC_CACHE_FLUSH_TIMEOUT_MS;
+
 	if (!timeout_ms) {
 		pr_warn("%s: unspecified timeout for CMD6 - use generic\n",
 			mmc_hostname(host));
@@ -544,8 +547,11 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,
 	 * the host to avoid HW busy detection, by converting to a R1 response
 	 * instead of a R1B.
 	 */
-	if (host->max_busy_timeout && (timeout_ms > host->max_busy_timeout))
+	if (host->max_busy_timeout && (timeout_ms > host->max_busy_timeout)) {
+		pr_warn("%s:Disable MMC_RSP_BUSY. timeout_ms(%u) > max_busy_timeout(%u)\n",
+			mmc_hostname(host), timeout_ms, host->max_busy_timeout);
 		use_r1b_resp = false;
+	}
 
 	cmd.opcode = MMC_SWITCH;
 	cmd.arg = (MMC_SWITCH_MODE_WRITE_BYTE << 24) |