On Mon, 2 Mar 2020 at 17:50, Ulf Hansson ulf.hansson@linaro.org wrote:
On Mon, 2 Mar 2020 at 14:11, Faiz Abbas faiz_abbas@ti.com wrote:
Uffe,
On 26/02/20 8:51 pm, Ulf Hansson wrote:
- Anders, Kishon
On Tue, 25 Feb 2020 at 17:24, Jon Hunter jonathanh@nvidia.com wrote:
On 25/02/2020 14:26, Ulf Hansson wrote:
...
However, from the core's point of view, the response is still requested; it's just that we don't want the driver to wait for the card to stop signaling busy. Instead, we want to deal with that via "polling" from the core.
This is rather worrying behaviour, as it seems like the host driver doesn't really follow these expectations from the core's point of view. And mmc_flush_cache() is not the only case; we also have erase, bkops, sanitize, etc. Are all of these working, or are they not really well tested?
I don't believe that they are well tested. We have a simple test to mount an eMMC partition, create a file, check the contents, remove the file and unmount. The timeouts always occur during unmounting.
Earlier, before my three patches, if the timeout_ms parameter provided to __mmc_switch() was zero, which was the case for mmc_flush_cache(), __mmc_switch() simply skipped validating against host->max_busy_timeout, which was wrong. In any case, this also meant that an R1B response was always used for mmc_flush_cache(), as you also indicated above. Perhaps this is the critical part where things can go wrong.
BTW, have you tried erase commands for sdhci tegra driver? If those are working fine, do you have any special treatments for these?
That I am not sure, but I will check.
Great, thanks. Looking forward to your report.
So, from my side, Anders Roxell and I have been collaborating on testing the behaviour on a TI Beagleboard x15 (remotely, with limited debug options), which uses the sdhci-omap variant. I am trying to get hold of an Nvidia Jetson TX2, but haven't found one yet. These are the conclusions from the observed behaviour on the Beagleboard for the CMD6 cache flush command.
First, the reported host->max_busy_timeout is 2581 (ms) for the sdhci-omap driver in this configuration.
- As we all know by now, the cache flush command (CMD6) currently fails with -110. This is when MMC_CACHE_FLUSH_TIMEOUT_MS is set to 30 * 1000 (30s), which means __mmc_switch() drops the MMC_RSP_BUSY flag from the command.
- Changing MMC_CACHE_FLUSH_TIMEOUT_MS to 2000 (2s) means that the MMC_RSP_BUSY flag becomes set by __mmc_switch(), because the timeout_ms parameter is less than max_busy_timeout (2000 < 2581). Then everything works fine.
- Updating the code to again use 30s as MMC_CACHE_FLUSH_TIMEOUT_MS, but instead forcing MMC_RSP_BUSY to be set even when timeout_ms is greater than max_busy_timeout, also works fine.
Clearly this indicates a problem that I think needs to be addressed in the sdhci driver. Of course, I could revert the three discussed patches to fix the problem, but that would only hide the issue, and I am sure we would get back to it sooner or later.
To fix the problem in the sdhci driver, I would appreciate if someone from TI and Nvidia can step in to help, as I don't have the HW on my desk.
Comments or other ideas of how to move forward?
Sorry I missed this earlier.
I don't have an X15 with me here, but I'm trying to set one up in our remote farm. In the meantime, I tried to reproduce this issue on two platforms (dra72-evm and am57xx-evm) and wasn't able to, because those eMMCs don't even have a cache. I will keep you updated when I get a board with an eMMC that has a cache.
Is there a way to reproduce this CMD6 issue with another operation?
Yes, most definitely.
Let me cook a debug patch for you that should trigger the problem for another CMD6 operation. I will post something later this evening or in the morning (Swedish timezone).
A bit later than promised; I am clearly an optimist. In any case, here's the patch I had in mind to trigger the problem for other CMD6 operations. Please give it a shot and see what happens.
-------
From: Ulf Hansson <ulf.hansson@linaro.org>
Date: Tue, 3 Mar 2020 22:11:05 +0100
Subject: [PATCH] mmc: core: DEBUG: Force a long timeout for all CMD6
This is to test sdhci-omap, for example, to see what happens when using a longer timeout. My guess is that it triggers __mmc_switch() to disable the MMC_RSP_BUSY flag for the command. If so, it is likely to make the host driver fail in one way or another.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 drivers/mmc/core/mmc_ops.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
index da425ee2d9bf..f0d2563961f6 100644
--- a/drivers/mmc/core/mmc_ops.c
+++ b/drivers/mmc/core/mmc_ops.c
@@ -532,6 +532,9 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,
 
 	mmc_retune_hold(host);
 
+	/* Force a long timeout to likely make use_r1b_resp to become false. */
+	timeout_ms = MMC_CACHE_FLUSH_TIMEOUT_MS;
+
 	if (!timeout_ms) {
 		pr_warn("%s: unspecified timeout for CMD6 - use generic\n",
 			mmc_hostname(host));
@@ -544,8 +547,11 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,
 	 * the host to avoid HW busy detection, by converting to a R1 response
 	 * instead of a R1B.
 	 */
-	if (host->max_busy_timeout && (timeout_ms > host->max_busy_timeout))
+	if (host->max_busy_timeout && (timeout_ms > host->max_busy_timeout)) {
+		pr_warn("%s:Disable MMC_RSP_BUSY. timeout_ms(%u) > max_busy_timeout(%u)\n",
+			mmc_hostname(host), timeout_ms, host->max_busy_timeout);
 		use_r1b_resp = false;
+	}
 
 	cmd.opcode = MMC_SWITCH;
 	cmd.arg = (MMC_SWITCH_MODE_WRITE_BYTE << 24) |