Entire disabling NCQ for Samsugn 860/870 SSD will cause I/O performance drop. In this case, a flag ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL is introduced to used to performe additional check for these SSDs. If it finds it's parent ATA controller is AsMedia/AMD/Marvell, the NCQ will be disabled. Otherwise, the NCQ is kept to enable.
Kate Hsuan (1): libata: libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870 SSDs
drivers/ata/libata-core.c | 37 ++++++++++++++++++++++++++++++++----- include/linux/libata.h | 3 +++ 2 files changed, 35 insertions(+), 5 deletions(-)
A flag ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL is added to disable NCQ on AMD/MAREL/ASMEDIA chipsets.
Samsung 860/870 SSD are disabled to use NCQ trim functions but it will lead to performace drop. From the bugzilla, we could realize the issues only appears on those chipset mentioned above. So this flag could be used to only disable NCQ on specific chipsets.
Fixes: ca6bfcb2f6d9 ("libata: Enable queued TRIM for Samsung SSD 860") BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=203475 Signed-off-by: Hans de Goede hdegoede@redhat.com Reviewed-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Kate Hsuan hpa@redhat.com --- drivers/ata/libata-core.c | 37 ++++++++++++++++++++++++++++++++----- include/linux/libata.h | 3 +++ 2 files changed, 35 insertions(+), 5 deletions(-)
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index cc459ce90018..50f635669dd4 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -2119,6 +2119,8 @@ static inline u8 ata_dev_knobble(struct ata_device *dev) static void ata_dev_config_ncq_send_recv(struct ata_device *dev) { struct ata_port *ap = dev->link->ap; + struct device *parent = NULL; + struct pci_dev *pcidev = NULL; unsigned int err_mask;
if (!ata_log_supported(dev, ATA_LOG_NCQ_SEND_RECV)) { @@ -2138,9 +2140,32 @@ static void ata_dev_config_ncq_send_recv(struct ata_device *dev) memcpy(cmds, ap->sector_buf, ATA_LOG_NCQ_SEND_RECV_SIZE);
if (dev->horkage & ATA_HORKAGE_NO_NCQ_TRIM) { - ata_dev_dbg(dev, "disabling queued TRIM support\n"); - cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &= - ~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM; + if (dev->horkage & ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL) + { + // get parent device for the controller vendor ID + for(parent = dev->tdev.parent; parent != NULL; parent = parent->parent) + { + if(dev_is_pci(parent)) + { + pcidev = to_pci_dev(parent); + if (pcidev->vendor == PCI_VENDOR_ID_MARVELL || + pcidev->vendor == PCI_VENDOR_ID_AMD || + pcidev->vendor == PCI_VENDOR_ID_ASMEDIA ) + { + ata_dev_dbg(dev, "Disable NCQ -> vendor ID %x product ID %x\n", + pcidev->vendor, pcidev->device); + cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &= + ~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM; + } + break; + } + } + }else + { + ata_dev_dbg(dev, "disabling queued TRIM support\n"); + cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &= + ~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM; + } } } } @@ -3951,9 +3976,11 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = { { "Samsung SSD 850*", NULL, ATA_HORKAGE_NO_NCQ_TRIM | ATA_HORKAGE_ZERO_AFTER_TRIM, }, { "Samsung SSD 860*", NULL, ATA_HORKAGE_NO_NCQ_TRIM | - ATA_HORKAGE_ZERO_AFTER_TRIM, }, + ATA_HORKAGE_ZERO_AFTER_TRIM | + ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL, }, { "Samsung SSD 870*", NULL, ATA_HORKAGE_NO_NCQ_TRIM | - ATA_HORKAGE_ZERO_AFTER_TRIM, }, + ATA_HORKAGE_ZERO_AFTER_TRIM | + ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL, }, { "FCCT*M500*", NULL, ATA_HORKAGE_NO_NCQ_TRIM | ATA_HORKAGE_ZERO_AFTER_TRIM, },
diff --git a/include/linux/libata.h b/include/linux/libata.h index 3fcd24236793..ec17f1f3fbf6 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -422,6 +422,9 @@ enum { ATA_HORKAGE_NOTRIM = (1 << 24), /* don't use TRIM */ ATA_HORKAGE_MAX_SEC_1024 = (1 << 25), /* Limit max sects to 1024 */ ATA_HORKAGE_MAX_TRIM_128M = (1 << 26), /* Limit max trim size to 128M */ + ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL = (1 << 27), /*Disable NCQ only on + ASMeida, AMD, and Marvell + Chipset*/
/* DMA mask for user DMA control: User visible values; DO NOT renumber */
On 2021/08/27 14:34, Kate Hsuan wrote:
A flag ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL is added to disable NCQ on AMD/MAREL/ASMEDIA chipsets.
Samsung 860/870 SSD are disabled to use NCQ trim functions but it will lead to performace drop. From the bugzilla, we could realize the issues only appears on those chipset mentioned above. So this flag could be used to only disable NCQ on specific chipsets.
Fixes: ca6bfcb2f6d9 ("libata: Enable queued TRIM for Samsung SSD 860") BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=203475 Signed-off-by: Hans de Goede hdegoede@redhat.com Reviewed-by: Martin K. Petersen martin.petersen@oracle.com
Since this is a v2, you should not keep the review tag here.
Signed-off-by: Kate Hsuan hpa@redhat.com
This is a v2 patch, so please add the changelog from v1 here.
But I think that v1 introduced ATA_HORKAGE_NO_NCQ_TRIM. Yet, here you introduce a completely different flag on top of the patch that introduced ATA_HORKAGE_NO_NCQ_TRIM. So this patch is not version 2 of the previous one. It is a completely different patch. Unless I missed v1 on the list...
drivers/ata/libata-core.c | 37 ++++++++++++++++++++++++++++++++----- include/linux/libata.h | 3 +++ 2 files changed, 35 insertions(+), 5 deletions(-)
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index cc459ce90018..50f635669dd4 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -2119,6 +2119,8 @@ static inline u8 ata_dev_knobble(struct ata_device *dev) static void ata_dev_config_ncq_send_recv(struct ata_device *dev) { struct ata_port *ap = dev->link->ap;
- struct device *parent = NULL;
- struct pci_dev *pcidev = NULL; unsigned int err_mask;
if (!ata_log_supported(dev, ATA_LOG_NCQ_SEND_RECV)) { @@ -2138,9 +2140,32 @@ static void ata_dev_config_ncq_send_recv(struct ata_device *dev) memcpy(cmds, ap->sector_buf, ATA_LOG_NCQ_SEND_RECV_SIZE); if (dev->horkage & ATA_HORKAGE_NO_NCQ_TRIM) {
ata_dev_dbg(dev, "disabling queued TRIM support\n");
cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &=
~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM;
if (dev->horkage & ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL)
{
Please follow the kernel coding style: do not break line before "{". This comment applies to all your modifications below too.
// get parent device for the controller vendor ID
for(parent = dev->tdev.parent; parent != NULL; parent = parent->parent)
{
if(dev_is_pci(parent))
{
pcidev = to_pci_dev(parent);
if (pcidev->vendor == PCI_VENDOR_ID_MARVELL ||
pcidev->vendor == PCI_VENDOR_ID_AMD ||
pcidev->vendor == PCI_VENDOR_ID_ASMEDIA )
Please align the conditions.
{
ata_dev_dbg(dev, "Disable NCQ -> vendor ID %x product ID %x\n",
pcidev->vendor, pcidev->device);
Weird alignment here too. Move the arguments aligned with "dev" at the beginning of the line.
cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &=
~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM;
}
break;
}
}
}else
{
ata_dev_dbg(dev, "disabling queued TRIM support\n");
cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &=
~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM;
} }}
} @@ -3951,9 +3976,11 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = { { "Samsung SSD 850*", NULL, ATA_HORKAGE_NO_NCQ_TRIM | ATA_HORKAGE_ZERO_AFTER_TRIM, }, { "Samsung SSD 860*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
ATA_HORKAGE_ZERO_AFTER_TRIM, },
ATA_HORKAGE_ZERO_AFTER_TRIM |
ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL, },
You named your flag ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL but you are applying it to a Samsung device. This is rather confusing. I do not think you need to have the vendor in the flag name, e.g. ATA_HORKAGE_NO_NCQ. Whatever device in ata_device_blacklist has the flag will have NCQ disabled.
{ "Samsung SSD 870*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
ATA_HORKAGE_ZERO_AFTER_TRIM, },
ATA_HORKAGE_ZERO_AFTER_TRIM |
ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL, },
If you disable NCQ entirely for this device, then what is the point of keeping ATA_HORKAGE_NO_NCQ_TRIM ? TRIM commands will not be NCQ anymore, similarly to all other commands.
{ "FCCT*M500*", NULL, ATA_HORKAGE_NO_NCQ_TRIM | ATA_HORKAGE_ZERO_AFTER_TRIM, }, diff --git a/include/linux/libata.h b/include/linux/libata.h index 3fcd24236793..ec17f1f3fbf6 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -422,6 +422,9 @@ enum { ATA_HORKAGE_NOTRIM = (1 << 24), /* don't use TRIM */ ATA_HORKAGE_MAX_SEC_1024 = (1 << 25), /* Limit max sects to 1024 */ ATA_HORKAGE_MAX_TRIM_128M = (1 << 26), /* Limit max trim size to 128M */
- ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL = (1 << 27), /*Disable NCQ only on
ASMeida, AMD, and Marvell
Chipset*/
See above. You do not need to have the vendor name in the flag name.
/* DMA mask for user DMA control: User visible values; DO NOT renumber */
On Fri, Aug 27, 2021 at 01:33:44AM -0400, Kate Hsuan wrote:
A flag ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL is added to disable NCQ on AMD/MAREL/ASMEDIA chipsets.
Samsung 860/870 SSD are disabled to use NCQ trim functions but it will lead to performace drop. From the bugzilla, we could realize the issues only appears on those chipset mentioned above. So this flag could be used to only disable NCQ on specific chipsets.
Fixes: ca6bfcb2f6d9 ("libata: Enable queued TRIM for Samsung SSD 860") BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=203475 Signed-off-by: Hans de Goede hdegoede@redhat.com Reviewed-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Kate Hsuan hpa@redhat.com
drivers/ata/libata-core.c | 37 ++++++++++++++++++++++++++++++++----- include/linux/libata.h | 3 +++ 2 files changed, 35 insertions(+), 5 deletions(-)
<formletter>
This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html for how to do this properly.
</formletter>
Hi Kate,
Some background info for other people following this thread, as discussed here: https://bugzilla.kernel.org/show_bug.cgi?id=201693 https://bugzilla.kernel.org/show_bug.cgi?id=203475
There are a lot of users who are reporting disk issues (including data corruption with Samsung 860 and 870 SSDs. Coming up with fixes for this has taken longer then it should because I failed to realize for a long time that there really are 2 separate issues here:
https://bugzilla.kernel.org/show_bug.cgi?id=203475#c34
""" So after completely re-reading / analyzing both this bug as well as bug 201693 with a fresh pair of eyes (since the last time I did this was a long time ago) I agree. After careful reading / analysis it seems that there really are 2 different bugs here impacting both the 860 EVO and the 870 EVO:
1. Queued Trim commands are causing issues on Intel + ASmedia + Marvell controllers
2. Things are seriously broken on AMD controllers and only completely disabling NCQ altogether helps there.
I will submit a kernel patch (with a Fixes tag so that it gets backported to stable series) for 1. right away; and I've asked a colleague to start working on a new ATA horkage flag which disables NCQ on AMD SATA controllers only, so that we can add that flag (together with the ATA_HORKAGE_NO_NCQ_TRIM flag which my patch adds) to the 860 EVO and the 870 EVO to also resolve 2. """
I asked Kate to write this patch to address 2., note this patch is to be applied on top of my " libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870 SSDs" patch.
Kate, thank you for your patch. There are several issues which need to be addressed before this patch can be accepted, starting with the commit message.
It seems that you used the commit message as my patch as a template, but you forgot to change the Subject (the first line) for the next version please change the subject to something correctly describing this patch.
I also see that you gave this patch a version of 2, but since this patch does not replace my patch, in other words it is a different patch you should have just made it v1. Anyways lets just make the next version v3 to avoid confusion.
The rest of the commit message should have 1 paragraph describing the reason why the patch is necessary + a second paragraph describing what the patch is doing to address this. Your cover-letter would be a good candidate for the second paragraph, resulting in for example something like this as body of the commit message:
""" Many users are reporting that the Samsung 860 and 870 SSDs are having various issues when combined with AMD SATA controllers and only completely disabling NCQ helps to avoid these issues.
Entirely disabling NCQ for Samsung 860/870 SSD will cause I/O performance drop. In this case, a flag ATA_HORKAGE_NONCQ_ON_AMD is introduced to used to perform additional check for these SSDs. If it finds it's parent ATA controller is AMD then NCQ will be disabled. Otherwise the NCQ is kept to enable. """
On 8/27/21 7:33 AM, Kate Hsuan wrote:
A flag ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL is added to disable NCQ on AMD/MAREL/ASMEDIA chipsets.
Samsung 860/870 SSD are disabled to use NCQ trim functions but it will lead to performace drop. From the bugzilla, we could realize the issues only appears on those chipset mentioned above. So this flag could be used to only disable NCQ on specific chipsets.
Fixes: ca6bfcb2f6d9 ("libata: Enable queued TRIM for Samsung SSD 860") BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=203475 Signed-off-by: Hans de Goede hdegoede@redhat.com Reviewed-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Kate Hsuan hpa@redhat.com
drivers/ata/libata-core.c | 37 ++++++++++++++++++++++++++++++++----- include/linux/libata.h | 3 +++ 2 files changed, 35 insertions(+), 5 deletions(-)
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index cc459ce90018..50f635669dd4 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -2119,6 +2119,8 @@ static inline u8 ata_dev_knobble(struct ata_device *dev) static void ata_dev_config_ncq_send_recv(struct ata_device *dev) { struct ata_port *ap = dev->link->ap;
- struct device *parent = NULL;
- struct pci_dev *pcidev = NULL; unsigned int err_mask;
if (!ata_log_supported(dev, ATA_LOG_NCQ_SEND_RECV)) { @@ -2138,9 +2140,32 @@ static void ata_dev_config_ncq_send_recv(struct ata_device *dev) memcpy(cmds, ap->sector_buf, ATA_LOG_NCQ_SEND_RECV_SIZE); if (dev->horkage & ATA_HORKAGE_NO_NCQ_TRIM) {
ata_dev_dbg(dev, "disabling queued TRIM support\n");
cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &=
~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM;
if (dev->horkage & ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL)
{
// get parent device for the controller vendor ID
for(parent = dev->tdev.parent; parent != NULL; parent = parent->parent)
{
if(dev_is_pci(parent))
{
pcidev = to_pci_dev(parent);
if (pcidev->vendor == PCI_VENDOR_ID_MARVELL ||
pcidev->vendor == PCI_VENDOR_ID_AMD ||
pcidev->vendor == PCI_VENDOR_ID_ASMEDIA )
{
ata_dev_dbg(dev, "Disable NCQ -> vendor ID %x product ID %x\n",
pcidev->vendor, pcidev->device);
cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &=
~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM;
}
break;
}
}
}else
{
ata_dev_dbg(dev, "disabling queued TRIM support\n");
cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &=
~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM;
}
Please don't nest the handling of the new ATA_HORKAGE_NONCQ_ON_AMD flag with the handling of other flags.
Also you are just disabling queued-trims now, which my patch already does, instead the new check should completely disable NCQ, this means moving the check to ata_dev_config_ncq() adding the new check after this check:
if (dev->horkage & ATA_HORKAGE_NONCQ) { snprintf(desc, desc_sz, "NCQ (not used)"); return 0; }
And then do the same, but only if pcidev->vendor == PCI_VENDOR_ID_AMD.
}
} } @@ -3951,9 +3976,11 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = { { "Samsung SSD 850*", NULL, ATA_HORKAGE_NO_NCQ_TRIM | ATA_HORKAGE_ZERO_AFTER_TRIM, }, { "Samsung SSD 860*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
ATA_HORKAGE_ZERO_AFTER_TRIM, },
ATA_HORKAGE_ZERO_AFTER_TRIM |
{ "Samsung SSD 870*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL, },
ATA_HORKAGE_ZERO_AFTER_TRIM, },
ATA_HORKAGE_ZERO_AFTER_TRIM |
{ "FCCT*M500*", NULL, ATA_HORKAGE_NO_NCQ_TRIM | ATA_HORKAGE_ZERO_AFTER_TRIM, },ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL, },
diff --git a/include/linux/libata.h b/include/linux/libata.h index 3fcd24236793..ec17f1f3fbf6 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -422,6 +422,9 @@ enum { ATA_HORKAGE_NOTRIM = (1 << 24), /* don't use TRIM */ ATA_HORKAGE_MAX_SEC_1024 = (1 << 25), /* Limit max sects to 1024 */ ATA_HORKAGE_MAX_TRIM_128M = (1 << 26), /* Limit max trim size to 128M */
- ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL = (1 << 27), /*Disable NCQ only on
ASMeida, AMD, and Marvell
Chipset*/
When we initially discussed this I know I said that we would need to disable NCQ on AMD + ASMEDIA + Marvell hosts, but after carefully reading both bugs again I've come to the conclusion that for Asmedia and Marvell SATA hosts just disabling trimmed queue as my patch does is enough. So please rename this to:
ATA_HORKAGE_NONCQ_ON_AMD
(and only check for an AMD vendor-id when doing the check).
/* DMA mask for user DMA control: User visible values; DO NOT renumber */
Regards,
Hans
linux-stable-mirror@lists.linaro.org