The NAND core complies with the ONFI specification, which itself mentions that after any program or erase operation, a status check should be performed to see whether the operation was finished *and* successful.
The NAND core offers helpers to finish a page write (sending the "PAGE PROG" command, waiting for the NAND chip to be ready again, and checking the operation status). But in some cases, advanced controller drivers might want to optimize this and craft their own page write helper to leverage additional hardware capabilities, thus not always using the core facilities.
Some drivers, like this one, do not use the core helper to finish a page write because the final cycles are automatically managed by the hardware. In this case, additional care must be taken to manually perform the final status check.
Let's read the NAND chip status at the end of the page write helper and return -EIO upon error.
Cc: stable@vger.kernel.org
Fixes: 02f26ecf8c77 ("mtd: nand: add reworked Marvell NAND controller driver")
Reported-by: Aviram Dali <aviramd@marvell.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
Hello Aviram,
I have not tested this, but based on your report I believe the status check is indeed missing here and could sometimes lead to unnoticed partial writes.
Please test on your side and reply with your Tested-by if you validate the change.
Any backport on kernels predating v4.17 will likely fail because of a folder rename, so you will have to do the backport manually if needed.
Thanks,
Miquèl
---
 drivers/mtd/nand/raw/marvell_nand.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/nand/raw/marvell_nand.c b/drivers/mtd/nand/raw/marvell_nand.c
index 30c15e4e1cc0..576441095012 100644
--- a/drivers/mtd/nand/raw/marvell_nand.c
+++ b/drivers/mtd/nand/raw/marvell_nand.c
@@ -1162,6 +1162,7 @@ static int marvell_nfc_hw_ecc_hmg_do_write_page(struct nand_chip *chip,
 		.ndcb[2] = NDCB2_ADDR5_PAGE(page),
 	};
 	unsigned int oob_bytes = lt->spare_bytes + (raw ? lt->ecc_bytes : 0);
+	u8 status;
 	int ret;

 	/* NFCv2 needs more information about the operation being executed */
@@ -1195,7 +1196,18 @@ static int marvell_nfc_hw_ecc_hmg_do_write_page(struct nand_chip *chip,

 	ret = marvell_nfc_wait_op(chip,
 				  PSEC_TO_MSEC(sdr->tPROG_max));
-	return ret;
+	if (ret)
+		return ret;
+
+	/* Check write status on the chip side */
+	ret = nand_status_op(chip, &status);
+	if (ret)
+		return ret;
+
+	if (status & NAND_STATUS_FAIL)
+		return -EIO;
+
+	return 0;
 }

 static int marvell_nfc_hw_ecc_hmg_write_page_raw(struct nand_chip *chip,
@@ -1624,6 +1636,7 @@ static int marvell_nfc_hw_ecc_bch_write_page(struct nand_chip *chip,
 	int data_len = lt->data_bytes;
 	int spare_len = lt->spare_bytes;
 	int chunk, ret;
+	u8 status;

 	marvell_nfc_select_target(chip, chip->cur_cs);

@@ -1660,6 +1673,14 @@ static int marvell_nfc_hw_ecc_bch_write_page(struct nand_chip *chip,
 	if (ret)
 		return ret;

+	/* Check write status on the chip side */
+	ret = nand_status_op(chip, &status);
+	if (ret)
+		return ret;
+
+	if (status & NAND_STATUS_FAIL)
+		return -EIO;
+
 	return 0;
 }
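For readers who want to see the added check in isolation, here is a minimal user-space sketch of the same pattern. It is not driver code: `struct mock_chip`, `mock_status_op()` and `finish_page_write()` are hypothetical stand-ins invented for this illustration, with `mock_status_op()` playing the role of `nand_status_op()`. Only the `NAND_STATUS_FAIL` value (bit 0 of the status byte) mirrors the kernel definition.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as the kernel's NAND_STATUS_FAIL: bit 0 of the status byte. */
#define NAND_STATUS_FAIL 0x01

/* Hypothetical stand-in for struct nand_chip, holding canned results. */
struct mock_chip {
	uint8_t status;  /* status byte the chip would report */
	int bus_error;   /* non-zero simulates a failed status read */
};

/* Mock of nand_status_op(): 0 on success with *status filled, else -errno. */
static int mock_status_op(struct mock_chip *chip, uint8_t *status)
{
	assert(status != NULL);
	if (chip->bus_error)
		return -ETIMEDOUT;
	*status = chip->status;
	return 0;
}

/* Tail of a page write helper, run once the controller reports completion. */
static int finish_page_write(struct mock_chip *chip)
{
	uint8_t status;
	int ret;

	/* Reading the status itself can fail: propagate that error as-is. */
	ret = mock_status_op(chip, &status);
	if (ret)
		return ret;

	/* The operation finished, but the chip says the program failed. */
	if (status & NAND_STATUS_FAIL)
		return -EIO;

	return 0;
}
```

Under these assumptions, a chip reporting a status byte of 0xE0 yields 0, one reporting 0xE1 yields -EIO, and a failed status read is returned unchanged, which matches the two-step structure of the hunks above.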
The NAND core complies with the ONFI specification, which itself mentions that after any program or erase operation, a status check should be performed to see whether the operation was finished *and* successful.
The NAND core offers helpers to finish a page write (sending the "PAGE PROG" command, waiting for the NAND chip to be ready again, and checking the operation status). But in some cases, advanced controller drivers might want to optimize this and craft their own page write helper to leverage additional hardware capabilities, thus not always using the core facilities.
Some drivers, like this one, do not use the core helper to finish a page write because the final cycles are automatically managed by the hardware. In this case, additional care must be taken to manually perform the final status check.
Let's read the NAND chip status at the end of the page write helper and return -EIO upon error.
Cc: Michal Simek <michal.simek@amd.com>
Cc: stable@vger.kernel.org
Fixes: 88ffef1b65cf ("mtd: rawnand: arasan: Support the hardware BCH ECC engine")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
Hello Michal,
I have not tested this, but based on a report on another driver, I believe the status check is also missing here and could sometimes lead to unnoticed partial writes.
Please test on your side that everything still works and let me know how it goes.
Thanks a lot.
Miquèl
---
 drivers/mtd/nand/raw/arasan-nand-controller.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/mtd/nand/raw/arasan-nand-controller.c b/drivers/mtd/nand/raw/arasan-nand-controller.c
index 906eef70cb6d..487c139316fe 100644
--- a/drivers/mtd/nand/raw/arasan-nand-controller.c
+++ b/drivers/mtd/nand/raw/arasan-nand-controller.c
@@ -515,6 +515,7 @@ static int anfc_write_page_hw_ecc(struct nand_chip *chip, const u8 *buf,
 	struct mtd_info *mtd = nand_to_mtd(chip);
 	unsigned int len = mtd->writesize + (oob_required ? mtd->oobsize : 0);
 	dma_addr_t dma_addr;
+	u8 status;
 	int ret;
 	struct anfc_op nfc_op = {
 		.pkt_reg =
@@ -561,10 +562,21 @@ static int anfc_write_page_hw_ecc(struct nand_chip *chip, const u8 *buf,
 	}

 	/* Spare data is not protected */
-	if (oob_required)
+	if (oob_required) {
 		ret = nand_write_oob_std(chip, page);
+		if (ret)
+			return ret;
+	}

-	return ret;
+	/* Check write status on the chip side */
+	ret = nand_status_op(chip, &status);
+	if (ret)
+		return ret;
+
+	if (status & NAND_STATUS_FAIL)
+		return -EIO;
+
+	return 0;
 }

 static int anfc_sel_write_page_hw_ecc(struct nand_chip *chip, const u8 *buf,
Hi Michal,
Any news from the testing team about patches 2/3 and 3/3?
Thanks, Miquèl
Hi Miquel,
On 9/11/23 17:52, Miquel Raynal wrote:
> Any news from the testing team about patches 2/3 and 3/3?
I asked Amit to test, but he didn't get back to me even though I asked a couple of times. Can you please tell me how to test it? I will set up the HW myself, test it, and get back to you.
Thanks, Michal
Hi Michal,
michal.simek@amd.com wrote on Tue, 12 Sep 2023 15:55:23 +0200:
> I asked Amit to test, but he didn't get back to me even though I asked a couple of times.
Ok.
> Can you please tell me how to test it? I will set up the HW myself, test it, and get back to you.
I believe setting up the board to use the hardware BCH engine, performing basic erase/write/read testing with a known file, and checking that it still behaves correctly would work. You can also run
nandbiterrs -i /dev/mtdx
as a second step and verify there is no difference with and without the patch, and finally check the performance impact:
flash_speed -d -c 10 /dev/mtdx (be careful: this is a destructive operation)
Thanks, Miquèl
Hi Miquel,
On 9/12/23 16:17, Miquel Raynal wrote:
> I believe setting up the board to use the hardware BCH engine, performing basic erase/write/read testing with a known file, and checking that it still behaves correctly would work. You can also run
> nandbiterrs -i /dev/mtdx
> as a second step and verify there is no difference with and without the patch, and finally check the performance impact:
> flash_speed -d -c 10 /dev/mtdx (be careful: this is a destructive operation)
I ran this myself.
pl353 test log before the patch:
# cat /proc/mtd
dev: size erasesize name
mtd0: 10000000 00020000 "pl35x-nand-controller"
# nandbiterrs -i /dev/mtd0
incremental biterrors test
Successfully corrected 0 bit errors per subpage
Inserted biterror @ 0/5
Read reported 1 corrected bit errors
Successfully corrected 1 bit errors per subpage
Inserted biterror @ 0/2
Failed to recover 1 bitflips
Read error after 2 bit errors per page
# flash_speed -d -c 10 /dev/mtd0
scanning for bad eraseblocks
scanned 10 eraseblocks, 0 are bad
testing eraseblock write speed
eraseblock write speed is 4555 KiB/s
testing eraseblock read speed
eraseblock read speed is 5765 KiB/s
testing page write speed
page write speed is 4383 KiB/s
testing page read speed
page read speed is 5614 KiB/s
testing 2 page write speed
2 page write speed is 4444 KiB/s
testing 2 page read speed
2 page read speed is 5688 KiB/s
Testing erase speed
erase speed is 320000 KiB/s
Testing 2x multi-block erase speed
2x multi-block erase speed is 320000 KiB/s
Testing 4x multi-block erase speed
4x multi-block erase speed is 320000 KiB/s
Testing 8x multi-block erase speed
8x multi-block erase speed is 320000 KiB/s
Testing 16x multi-block erase speed
16x multi-block erase speed is 320000 KiB/s
Testing 32x multi-block erase speed
32x multi-block erase speed is 320000 KiB/s
Testing 64x multi-block erase speed
64x multi-block erase speed is 320000 KiB/s
finished
# dmesg | grep nand
[ 2.876719] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
[ 2.883130] nand: Micron MT29F2G08ABAEAWP
[ 2.887230] nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
#
With the patch applied:
# cat /proc/mtd
dev: size erasesize name
mtd0: 10000000 00020000 "pl35x-nand-controller"
# nandbiterrs -i /dev/mtd0
incremental biterrors test
Successfully corrected 0 bit errors per subpage
Inserted biterror @ 0/5
Read reported 1 corrected bit errors
Successfully corrected 1 bit errors per subpage
Inserted biterror @ 0/2
Failed to recover 1 bitflips
Read error after 2 bit errors per page
# flash_speed -d -c 10 /dev/mtd0
scanning for bad eraseblocks
scanned 10 eraseblocks, 0 are bad
testing eraseblock write speed
eraseblock write speed is 4522 KiB/s
testing eraseblock read speed
eraseblock read speed is 5765 KiB/s
testing page write speed
page write speed is 4383 KiB/s
testing page read speed
page read speed is 5638 KiB/s
testing 2 page write speed
2 page write speed is 4444 KiB/s
testing 2 page read speed
2 page read speed is 5714 KiB/s
Testing erase speed
erase speed is 320000 KiB/s
Testing 2x multi-block erase speed
2x multi-block erase speed is 320000 KiB/s
Testing 4x multi-block erase speed
4x multi-block erase speed is 320000 KiB/s
Testing 8x multi-block erase speed
8x multi-block erase speed is 320000 KiB/s
Testing 16x multi-block erase speed
16x multi-block erase speed is 320000 KiB/s
Testing 32x multi-block erase speed
32x multi-block erase speed is 320000 KiB/s
Testing 64x multi-block erase speed
64x multi-block erase speed is 320000 KiB/s
finished
# dmesg | grep nand
[ 2.896206] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
[ 2.902648] nand: Micron MT29F2G08ABAEAWP
[ 2.906667] nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
Behavior is the same. The speed figures vary slightly from run to run.
I don't have a ZynqMP board here, but I will try to get data ASAP.
Thanks, Michal
On 9/12/23 16:17, Miquel Raynal wrote:
> I believe setting up the board to use the hardware BCH engine, performing basic erase/write/read testing with a known file, and checking that it still behaves correctly would work. You can also run
> nandbiterrs -i /dev/mtdx
> as a second step and verify there is no difference with and without the patch, and finally check the performance impact:
> flash_speed -d -c 10 /dev/mtdx (be careful: this is a destructive operation)
The testing team didn't see any issue, so feel free to add my
Acked-by: Michal Simek <michal.simek@amd.com>
Thanks, Michal
Hi Michal,
michal.simek@amd.com wrote on Thu, 21 Sep 2023 12:25:10 +0200:
> The testing team didn't see any issue, so feel free to add my
> Acked-by: Michal Simek <michal.simek@amd.com>
I think you told me in the last e-mail you tested the pl353 patch, not the one for the Arasan controller. Shall I add your Acked-by here and your Tested-by in the other?
Thanks, Miquèl
On 9/22/23 11:14, Miquel Raynal wrote:
> I think you told me in the last e-mail you tested the pl353 patch, not the one for the Arasan controller. Shall I add your Acked-by here and your Tested-by in the other?
Yes, exactly. I tested pl353 myself. If that log looks good, feel free to add my Tested-by tag. I also got confirmation from the testing team that they tested the Arasan one, hence only the Ack there.
Thanks, Michal
Hi Michal,
michal.simek@amd.com wrote on Fri, 22 Sep 2023 11:16:20 +0200:
> Yes, exactly. I tested pl353 myself. If that log looks good, feel free to add my Tested-by tag. I also got confirmation from the testing team that they tested the Arasan one, hence only the Ack there.
Perfect. Thanks a lot!
Miquèl
On Mon, 2023-07-17 at 19:42:20 UTC, Miquel Raynal wrote:
> The NAND core complies with the ONFI specification, which itself mentions that after any program or erase operation, a status check should be performed to see whether the operation was finished *and* successful.
>
> The NAND core offers helpers to finish a page write (sending the "PAGE PROG" command, waiting for the NAND chip to be ready again, and checking the operation status). But in some cases, advanced controller drivers might want to optimize this and craft their own page write helper to leverage additional hardware capabilities, thus not always using the core facilities.
>
> Some drivers, like this one, do not use the core helper to finish a page write because the final cycles are automatically managed by the hardware. In this case, additional care must be taken to manually perform the final status check.
>
> Let's read the NAND chip status at the end of the page write helper and return -EIO upon error.
>
> Cc: Michal Simek <michal.simek@amd.com>
> Cc: stable@vger.kernel.org
> Fixes: 88ffef1b65cf ("mtd: rawnand: arasan: Support the hardware BCH ECC engine")
> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> Acked-by: Michal Simek <michal.simek@amd.com>
Applied to https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git mtd/fixes.
Miquel
The NAND core complies with the ONFI specification, which itself mentions that after any program or erase operation, a status check should be performed to see whether the operation was finished *and* successful.
The NAND core offers helpers to finish a page write (sending the "PAGE PROG" command, waiting for the NAND chip to be ready again, and checking the operation status). But in some cases, advanced controller drivers might want to optimize this and craft their own page write helper to leverage additional hardware capabilities, thus not always using the core facilities.
Some drivers, like this one, do not use the core helper to finish a page write because the final cycles are automatically managed by the hardware. In this case, additional care must be taken to manually perform the final status check.
Let's read the NAND chip status at the end of the page write helper and return -EIO upon error.
Cc: Michal Simek <michal.simek@amd.com>
Cc: stable@vger.kernel.org
Fixes: 08d8c62164a3 ("mtd: rawnand: pl353: Add support for the ARM PL353 SMC NAND controller")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
---
Hello Michal,
Same as for the Arasan controller, this is not tested, but I believe it is required. Let me know how testing goes.
Thanks,
Miquèl
---
 drivers/mtd/nand/raw/pl35x-nand-controller.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/mtd/nand/raw/pl35x-nand-controller.c b/drivers/mtd/nand/raw/pl35x-nand-controller.c
index 28b7bd7e22eb..9dd06eeb021e 100644
--- a/drivers/mtd/nand/raw/pl35x-nand-controller.c
+++ b/drivers/mtd/nand/raw/pl35x-nand-controller.c
@@ -513,6 +513,7 @@ static int pl35x_nand_write_page_hwecc(struct nand_chip *chip,
 	u32 addr1 = 0, addr2 = 0, row;
 	u32 cmd_addr;
 	int i, ret;
+	u8 status;

 	ret = pl35x_smc_set_ecc_mode(nfc, chip, PL35X_SMC_ECC_CFG_MODE_APB);
 	if (ret)
@@ -565,6 +566,14 @@ static int pl35x_nand_write_page_hwecc(struct nand_chip *chip,
 	if (ret)
 		goto disable_ecc_engine;

+	/* Check write status on the chip side */
+	ret = nand_status_op(chip, &status);
+	if (ret)
+		goto disable_ecc_engine;
+
+	if (status & NAND_STATUS_FAIL)
+		ret = -EIO;
+
 disable_ecc_engine:
 	pl35x_smc_set_ecc_mode(nfc, chip, PL35X_SMC_ECC_CFG_MODE_BYPASS);
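The pl35x variant differs from the other two patches in that a failed status must still fall through the `disable_ecc_engine` label instead of returning early. A small user-space sketch of that shape follows. Everything in it is an assumption made for illustration: `struct mock_nfc`, `mock_status_op()` and `write_page_hwecc()` are hypothetical names, and the `ecc_enabled` flag merely stands in for the real ECC mode switch. Only the `NAND_STATUS_FAIL` value mirrors the kernel definition.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as the kernel's NAND_STATUS_FAIL: bit 0 of the status byte. */
#define NAND_STATUS_FAIL 0x01

/* Hypothetical controller state used only by this sketch. */
struct mock_nfc {
	uint8_t chip_status;  /* status byte the chip would report */
	int bus_error;        /* non-zero simulates a failed status read */
	int ecc_enabled;      /* stands in for the hardware ECC mode */
};

/* Mock of nand_status_op(): 0 on success with *status filled, else -errno. */
static int mock_status_op(struct mock_nfc *nfc, uint8_t *status)
{
	assert(status != NULL);
	if (nfc->bus_error)
		return -ETIMEDOUT;
	*status = nfc->chip_status;
	return 0;
}

static int write_page_hwecc(struct mock_nfc *nfc)
{
	uint8_t status;
	int ret;

	nfc->ecc_enabled = 1;  /* engine on for the duration of the write */

	/* ... the data transfer and wait-for-ready would happen here ... */

	ret = mock_status_op(nfc, &status);
	if (ret)
		goto disable_ecc_engine;

	/* Record the failure, then fall through to the common cleanup. */
	if (status & NAND_STATUS_FAIL)
		ret = -EIO;

disable_ecc_engine:
	nfc->ecc_enabled = 0;  /* cleanup runs on every path */
	return ret;
}
```

The point of setting `ret = -EIO` rather than returning directly is that the engine is switched back off on success, on a failed status read, and on a reported program failure alike.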
On Mon, 2023-07-17 at 19:42:21 UTC, Miquel Raynal wrote:
The NAND core complies with the ONFI specification, which itself mentions that after any program or erase operation, a status check should be performed to see whether the operation was finished *and* successful.
The NAND core offers helpers to finish a page write (sending the "PAGE PROG" command, waiting for the NAND chip to be ready again, and checking the operation status). But in some cases, advanced controller drivers might want to optimize this and craft their own page write helper to leverage additional hardware capabilities, thus not always using the core facilities.
Some drivers, like this one, do not use the core helper to finish a page write because the final cycles are automatically managed by the hardware. In this case, the additional care must be taken to manually perform the final status check.
Let's read the NAND chip status at the end of the page write helper and return -EIO upon error.
Cc: Michal Simek <michal.simek@amd.com>
Cc: stable@vger.kernel.org
Fixes: 08d8c62164a3 ("mtd: rawnand: pl353: Add support for the ARM PL353 SMC NAND controller")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Applied to https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git mtd/fixes.
Miquel
On 7/17/23 12:42, Miquel Raynal wrote:
The NAND core complies with the ONFI specification, which itself mentions that after any program or erase operation, a status check should be performed to see whether the operation was finished *and* successful.
The NAND core offers helpers to finish a page write (sending the "PAGE PROG" command, waiting for the NAND chip to be ready again, and checking the operation status). But in some cases, advanced controller drivers might want to optimize this and craft their own page write helper to leverage additional hardware capabilities, thus not always using the core facilities.
Some drivers, like this one, do not use the core helper to finish a page write because the final cycles are automatically managed by the hardware. In this case, additional care must be taken to manually perform the final status check.
Let's read the NAND chip status at the end of the page write helper and return -EIO upon error.
Cc: stable@vger.kernel.org
Fixes: 02f26ecf8c77 ("mtd: nand: add reworked Marvell NAND controller driver")
Reported-by: Aviram Dali <aviramd@marvell.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Hello Aviram,
I have not tested this, but based on your report I believe the status check is indeed missing here and could sometimes lead to unnoticed partial writes.
Please test on your side and reply with your Tested-by if you validate the change.
Any backport on kernels predating v4.17 will likely fail because of a folder rename, so you will have to do the backport manually if needed.
Thanks, Miquèl
 drivers/mtd/nand/raw/marvell_nand.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/nand/raw/marvell_nand.c b/drivers/mtd/nand/raw/marvell_nand.c
index 30c15e4e1cc0..576441095012 100644
--- a/drivers/mtd/nand/raw/marvell_nand.c
+++ b/drivers/mtd/nand/raw/marvell_nand.c
@@ -1162,6 +1162,7 @@ static int marvell_nfc_hw_ecc_hmg_do_write_page(struct nand_chip *chip,
 		.ndcb[2] = NDCB2_ADDR5_PAGE(page),
 	};
 	unsigned int oob_bytes = lt->spare_bytes + (raw ? lt->ecc_bytes : 0);
+	u8 status;
 	int ret;

 	/* NFCv2 needs more information about the operation being executed */
@@ -1195,7 +1196,18 @@ static int marvell_nfc_hw_ecc_hmg_do_write_page(struct nand_chip *chip,

 	ret = marvell_nfc_wait_op(chip,
 				  PSEC_TO_MSEC(sdr->tPROG_max));
-	return ret;
+	if (ret)
+		return ret;
+
+	/* Check write status on the chip side */
+	ret = nand_status_op(chip, &status);
+	if (ret)
+		return ret;
+
+	if (status & NAND_STATUS_FAIL)
+		return -EIO;
+
+	return 0;
 }

 static int marvell_nfc_hw_ecc_hmg_write_page_raw(struct nand_chip *chip,
@@ -1624,6 +1636,7 @@ static int marvell_nfc_hw_ecc_bch_write_page(struct nand_chip *chip,
 	int data_len = lt->data_bytes;
 	int spare_len = lt->spare_bytes;
 	int chunk, ret;
+	u8 status;

 	marvell_nfc_select_target(chip, chip->cur_cs);

@@ -1660,6 +1673,14 @@ static int marvell_nfc_hw_ecc_bch_write_page(struct nand_chip *chip,
 	if (ret)
 		return ret;

+	/* Check write status on the chip side */
+	ret = nand_status_op(chip, &status);
+	if (ret)
+		return ret;
+
+	if (status & NAND_STATUS_FAIL)
+		return -EIO;
+
 	return 0;
 }
Patch working as expected. Tested on 3 different NAND chips.
Tested-by: Ravi Chandra Minnikanti <rminnikanti@marvell.com>
On Mon, 2023-07-17 at 19:42:19 UTC, Miquel Raynal wrote:
The NAND core complies with the ONFI specification, which itself mentions that after any program or erase operation, a status check should be performed to see whether the operation was finished *and* successful.
The NAND core offers helpers to finish a page write (sending the "PAGE PROG" command, waiting for the NAND chip to be ready again, and checking the operation status). But in some cases, advanced controller drivers might want to optimize this and craft their own page write helper to leverage additional hardware capabilities, thus not always using the core facilities.
Some drivers, like this one, do not use the core helper to finish a page write because the final cycles are automatically managed by the hardware. In this case, additional care must be taken to manually perform the final status check.
Let's read the NAND chip status at the end of the page write helper and return -EIO upon error.
Cc: stable@vger.kernel.org
Fixes: 02f26ecf8c77 ("mtd: nand: add reworked Marvell NAND controller driver")
Reported-by: Aviram Dali <aviramd@marvell.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Tested-by: Ravi Chandra Minnikanti <rminnikanti@marvell.com>
Applied to https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git mtd/fixes.
Miquel
linux-stable-mirror@lists.linaro.org