Hannes,
On 4/4/18 16:35, Hannes Reinecke wrote:
On Wed, 4 Apr 2018 07:06:58 +0000 Damien Le Moal Damien.LeMoal@wdc.com wrote:
Hannes,
On 4/4/18 15:57, Hannes Reinecke wrote:
On Wed, 4 Apr 2018 15:51:38 +0900 Damien Le Moal damien.lemoal@wdc.com wrote:
With the introduction of commit e39a97353e53 ("scsi: core: return BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()"), a command that failed with hostbyte=DID_OK and driverbyte=DRIVER_SENSE but lacking additional sense information will have a return code set to BLK_STS_OK. This results in the request issuer to see successful request execution despite the failure. An example of such case is an unaligned write on a host managed ZAC disk connected to a SAS HBA with a malfunctioning SAT. The unaligned write command gets aborted but has no additional sense information.
sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE sd 10:0:0:0: [sde] tag#3905 Sense Key : Aborted Command [current] sd 10:0:0:0: [sde] tag#3905 Add. Sense: No additional sense information sd 10:0:0:0: [sde] tag#3905 CDB: Write(16) 8a 00 00 00 00 00 02 0c 00 01 00 00 00 01 00 00 print_req_error: I/O error, dev sde, sector 274726920
In scsi_io_completion(), sense key handling to not change the request error code and success being reported to the issuer.
Fix this by making sure that the error code always indicates an error if scsi_io_completion() decide that the action to be taken for a failed command is to not retry it and terminate it immediately (ACTION_FAIL) .
Signed-off-by: Damien Le Moal damien.lemoal@wdc.com Fixes: e39a97353e53 ("scsi: core: return BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()") Cc: Hannes Reinecke hare@suse.com Cc: stable@vger.kernel.org
drivers/scsi/scsi_lib.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index c84f931388f2..87579bfcc186 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1002,6 +1002,15 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes) scsi_print_command(cmd); } }
/*
* The command failed and should not be retried.
If the host
* byte is DID_OK, then
__scsi_error_from_host_byte() returned
* BLK_STS_OK and error indicates a success. Make
sure to not
* use that as the completion code and always
return an
* I/O error.
*/
if (error == BLK_STS_OK)
if (!scsi_end_request(req, error,error = BLK_STS_IOERR;
blk_rq_err_bytes(req), 0)) return; /*FALLTHRU*/
That looks wrong. Shouldn't __scsi_error_from_host_byte() return the correct status here?
My drive said:
sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE sd 10:0:0:0: [sde] tag#3905 Sense Key : Aborted Command [current] sd 10:0:0:0: [sde] tag#3905 Add. Sense: No additional sense information sd 10:0:0:0: [sde] tag#3905 CDB: Write(16) 8a 00 00 00 00 00 02 0c 00 01 00 00 00 01 00 00
Since hostbyte is DID_OK, __scsi_error_from_host_byte() returns BLK_STS_OK. The HBA fails to give sense data, so the ABORTED_COMMAND case in scsi_io_completion() "switch (sshdr.sense_key)" does nothing and error stays equal to success. scsi_end_request() gets called with that and dd sees a success...
There are also plenty of other sense keys cases where error is not changed despite the fact that error can be BLK_STS_SUCCESS (in fact, I think this is likely the most common case since an command failure with hostbyte=DID_OK and driverbyte=DRIVER_SENSE is probably the most common one.
My patch is a bit of a hammer and makes sure that an ACTION_FAIL request is completed as a failure... Am I getting all this wrong ?
Just checked with the code, and send a new patch. Which I find a bit clearer.
Fact is that __scsi_error_from_host_byte() is not sufficient to evaluate the return code, so taking its return value with any further checks is indeed wrong. But then it never said so on the boilerplate of __scsi_error_from_host_byte() ...
So please do check the 'v2' version of the patch.
It fixes the particular problem I am seeing. But looking more at this code, isn't there a problem with sshdr.sense_key == RECOVERED_ERROR ?
if (sense_valid && (sshdr.sense_key == RECOVERED_ERROR)) { ... result = 0; /* for passthrough error may be set */ error = BLK_STS_OK; }
Then the error will be changed wrongly to BLK_STS_IOERR.
My original change had the same problem.