On Tue, 8 May 2018 23:12:59 +0200 Boris Brezillon <boris.brezillon@bootlin.com> wrote:
On Fri, 4 May 2018 11:58:35 +0200 Miquel Raynal <miquel.raynal@bootlin.com> wrote:
Hi Boris,
On Thu, 3 May 2018 09:49:08 +0200, Boris Brezillon <boris.brezillon@bootlin.com> wrote:
It looks like the NAND_STATUS_FAIL bit is sticky after an ECC failure, which causes every READ operation following the failing one to report an ECC failure as well. Reset the chip to clear the NAND_STATUS_FAIL bit.
Note that this behavior is not documented in the datasheet, but resetting the chip is the only solution we found to fix the problem.
Fixes: 9748e1d87573 ("mtd: nand: add support for Micron on-die ECC")
Cc: <stable@vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Bean Huo <beanhuo@micron.com>
Cc: Peter Pan <peterpandong@micron.com>
Reviewed-by: Miquel Raynal miquel.raynal@bootlin.com
Queued to mtd/master.
I'm dropping this patch because I'm no longer sure this is the correct way to fix the bug. It seems that nand_set_features_op() is checking the FAIL bit, while the ONFI spec clearly says that the FAIL bit is only valid after a PROGRAM, ERASE or READ-with-on-die-ECC-enabled op. That might explain why ->set_features() fails with -EIO after an ECC failure (apparently Micron only clears the FAIL bit when launching a PROGRAM, ERASE or READ-with-on-die-ECC-enabled op, not on a SET_FEATURES op).
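
To make the point about FAIL-bit validity concrete, here is a rough, standalone sketch of a status check that only honours the FAIL bit after the operations for which it is defined. This is not the kernel code and not a proposed patch: the sketch_* names and the enum are hypothetical, made up for illustration. Only NAND_STATUS_FAIL (bit 0 of the status register) and -EIO come from the discussion above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* FAIL is bit 0 of the NAND status register. */
#define NAND_STATUS_FAIL 0x01

/* Hypothetical operation tags, for illustration only. */
enum sketch_nand_op {
	SKETCH_OP_PROGRAM,
	SKETCH_OP_ERASE,
	SKETCH_OP_READ_ON_DIE_ECC,
	SKETCH_OP_SET_FEATURES,
};

/*
 * The FAIL bit is only meaningful after a PROGRAM, an ERASE, or a READ
 * with on-die ECC enabled. After any other op it may be stale.
 */
static bool sketch_fail_bit_is_valid(enum sketch_nand_op op)
{
	switch (op) {
	case SKETCH_OP_PROGRAM:
	case SKETCH_OP_ERASE:
	case SKETCH_OP_READ_ON_DIE_ECC:
		return true;
	default:
		return false;
	}
}

/*
 * Return -EIO only when the FAIL bit is set AND it is valid for the op
 * that was just issued. A stale FAIL bit left over from an earlier ECC
 * failure is ignored for SET_FEATURES.
 */
static int sketch_check_status(enum sketch_nand_op op, uint8_t status)
{
	if (sketch_fail_bit_is_valid(op) && (status & NAND_STATUS_FAIL))
		return -EIO;

	return 0;
}
```

With a check along these lines, a SET_FEATURES issued right after a failed on-die-ECC READ would not report -EIO just because the FAIL bit is still set, which is the situation described above.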