+ Faiz Abbas
On Tue, 25 Feb 2020 at 12:41, Jon Hunter <jonathanh@nvidia.com> wrote:
On 25/02/2020 10:04, Jon Hunter wrote:
...
I find that, of the changes in the commit, the change to mmc_flush_cache() below is the cause.
@@ -961,7 +963,8 @@ int mmc_flush_cache(struct mmc_card *card)
 			(card->ext_csd.cache_size > 0) &&
 			(card->ext_csd.cache_ctrl & 1)) {
 		err = mmc_switch(card, EXT_CSD_CMD_SET_NORMAL,
-				EXT_CSD_FLUSH_CACHE, 1, 0);
+				EXT_CSD_FLUSH_CACHE, 1,
+				MMC_CACHE_FLUSH_TIMEOUT_MS);
I no longer see the issue on reverting the above hunk as Bitan suggested but now I see the following (which is expected) ...
WARNING KERN mmc1: unspecified timeout for CMD6 - use generic
For Tegra, the default timeout used when no timeout is specified for CMD6 is 100 ms. So hard-coding the following also appears to work around the problem on Tegra ...
Interesting.
diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
index 868653bc1555..5155e0240fca 100644
--- a/drivers/mmc/core/mmc_ops.c
+++ b/drivers/mmc/core/mmc_ops.c
@@ -992,7 +992,7 @@ int mmc_flush_cache(struct mmc_card *card)
 			(card->ext_csd.cache_size > 0) &&
 			(card->ext_csd.cache_ctrl & 1)) {
 		err = mmc_switch(card, EXT_CSD_CMD_SET_NORMAL,
-				EXT_CSD_FLUSH_CACHE, 1, 0);
+				EXT_CSD_FLUSH_CACHE, 1, 100);
 		if (err)
 			pr_err("%s: cache flush error %d\n",
 					mmc_hostname(card->host), err);
So the problem appears to be caused by the timeout being too long rather than not long enough.
Looking more at the code, I think now that we are hitting the condition ...
diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
index 868653bc1555..feae82b1ff35 100644
--- a/drivers/mmc/core/mmc_ops.c
+++ b/drivers/mmc/core/mmc_ops.c
@@ -579,8 +579,10 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,
 	 * the host to avoid HW busy detection, by converting to a R1 response
 	 * instead of a R1B.
 	 */
-	if (host->max_busy_timeout && (timeout_ms > host->max_busy_timeout))
+	if (host->max_busy_timeout && (timeout_ms > host->max_busy_timeout)) {
+		pr_warn("%s: timeout (%d) > max busy timeout (%d)",
+			mmc_hostname(host), timeout_ms, host->max_busy_timeout);
 		use_r1b_resp = false;
+	}
With the above I see ...
WARNING KERN mmc1: timeout (1600) > max busy timeout (672)
So with the longer timeout we are not using/requesting the response.
You are most likely correct.
However, from the core point of view, the response is still requested, only that we don't want the driver to wait for the card to stop signaling busy. Instead we want to deal with that via "polling" from the core.
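To illustrate the decision being discussed, here is a minimal user-space sketch in C. It is not the kernel code itself: the struct sketch_host and the function sketch_use_r1b() are hypothetical names introduced only for illustration, and only the comparison mirrors the condition quoted in the diff above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the one struct mmc_host field used here. */
struct sketch_host {
	unsigned int max_busy_timeout;	/* in ms; 0 means no limit is reported */
};

/*
 * Returns true if an R1B response (hardware busy detection) would be kept,
 * false if the command would be converted to R1 so that the core has to
 * poll the card for busy completion instead.
 */
static bool sketch_use_r1b(const struct sketch_host *host,
			   unsigned int timeout_ms)
{
	if (host->max_busy_timeout && timeout_ms > host->max_busy_timeout)
		return false;

	return true;
}
```

With the values from the warning (1600 ms requested, 672 ms maximum) the sketch falls back to R1 and leaves busy handling to polling, whereas a 100 ms request stays on R1B.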
This is rather worrying behaviour, as it seems the host driver doesn't really follow these expectations from the core. And mmc_flush_cache() is not the only case, as we have erase, bkops, sanitize, etc. Are all of these working, or are they just not well tested?
Earlier, before my three patches, if the timeout_ms parameter provided to __mmc_switch() was zero, which was the case for mmc_flush_cache(), __mmc_switch() simply skipped validating it against host->max_busy_timeout, which was wrong. In any case, this also meant that an R1B response was always used for mmc_flush_cache(), as you also indicated above. Perhaps this is the critical part where things can go wrong.
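The difference between the old and the new handling of a zero timeout can be sketched as follows. This is a simplified user-space model, not the actual __mmc_switch() code. The function names are hypothetical, and the generic_ms parameter (standing in for the generic CMD6 timeout that gets substituted when no timeout is specified) is an assumption made for illustration.

```c
#include <stdbool.h>

/*
 * Old behaviour as described above: a zero timeout_ms skips the
 * max_busy_timeout check entirely, so R1B is always kept.
 */
static bool sketch_old_use_r1b(unsigned int max_busy_timeout,
			       unsigned int timeout_ms)
{
	if (timeout_ms && max_busy_timeout && timeout_ms > max_busy_timeout)
		return false;

	return true;
}

/*
 * Assumed new behaviour: a zero timeout_ms is first replaced by the
 * generic CMD6 timeout, and the result is then always validated.
 */
static bool sketch_new_use_r1b(unsigned int max_busy_timeout,
			       unsigned int timeout_ms,
			       unsigned int generic_ms)
{
	if (!timeout_ms)
		timeout_ms = generic_ms;

	if (max_busy_timeout && timeout_ms > max_busy_timeout)
		return false;

	return true;
}
```

Using the Tegra numbers from this thread (max busy timeout 672 ms, generic timeout 100 ms), the old model keeps R1B for a zero timeout, the new model keeps R1B for a zero or 100 ms timeout, and the new model drops to R1 for the 1600 ms cache flush timeout.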
BTW, have you tried erase commands with the sdhci-tegra driver? If those are working fine, do you have any special treatment for them?
I have looped in Faiz, as sdhci-omap seems to suffer from very similar problems. One thing I noted for sdhci-omap is that MMC_ERASE commands are treated in a special manner in sdhci_omap_set_timeout(). This indicates that there is something fishy going on.
Faiz, can you please comment on this?
Kind regards
Uffe