When running raidconfig from Dom0, we found that the Xen DMA heap shrinks
while the Dom Heap grows by the same size. Tracing raidconfig, we found
that the related ioctl() in megaraid_sas calls dma_alloc_coherent() to
allocate memory. If the memory allocated by Dom0 is not in the DMA area,
it is exchanged with Xen to meet the requirement. Later, the driver calls
dma_free_coherent() to free the memory, but in xen_swiotlb_free_coherent()
the check condition (dev_addr + size - 1 > dma_mask) is always false for
such memory, which prevents calling xen_destroy_contiguous_region() to
return the memory to the Xen DMA heap.
This issue was introduced by commit 6810df88dcfc2 ("xen-swiotlb: When doing
coherent alloc/dealloc check before swizzling the MFNs.").
Signed-off-by: Joe Jin <joe.jin(a)oracle.com>
Tested-by: John Sobecki <john.sobecki(a)oracle.com>
Reviewed-by: Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Cc: stable(a)vger.kernel.org
---
drivers/xen/swiotlb-xen.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index e1c60899fdbc..a6f9ba85dc4b 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -351,7 +351,7 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
* physical address */
phys = xen_bus_to_phys(dev_addr);
- if (((dev_addr + size - 1 > dma_mask)) ||
+ if (((dev_addr + size - 1 <= dma_mask)) ||
range_straddles_page_boundary(phys, size))
xen_destroy_contiguous_region(phys, order);
--
2.14.3 (Apple Git-98)
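The effect of flipping the comparison can be sketched in user space. This is a minimal model, assuming that memory exchanged with Xen by the alloc path always lies entirely below the device's DMA mask; the helpers old_should_destroy() and new_should_destroy() are hypothetical, and range_straddles_page_boundary() is reduced to a plain boolean flag. Only the comparison itself mirrors the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's dma_addr_t. */
typedef uint64_t model_dma_addr_t;

/* Condition before the fix: the region is only returned to Xen when the
 * buffer extends past the DMA mask. Memory that was exchanged to satisfy
 * the mask never does, so this stays false for exactly those buffers. */
static bool old_should_destroy(model_dma_addr_t dev_addr, uint64_t size,
			       uint64_t dma_mask, bool straddles)
{
	return (dev_addr + size - 1 > dma_mask) || straddles;
}

/* Condition after the fix: the region is returned to Xen when the
 * buffer lies within the DMA mask. */
static bool new_should_destroy(model_dma_addr_t dev_addr, uint64_t size,
			       uint64_t dma_mask, bool straddles)
{
	return (dev_addr + size - 1 <= dma_mask) || straddles;
}
```

For example, with a 32-bit mask (0xffffffff) and an 8 KiB buffer at bus address 0x10000000, the last byte is at 0x10001fff, so old_should_destroy() returns false and new_should_destroy() returns true.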
v4.4.y:
drivers/net/ethernet/ti/cpsw.c: In function 'cpsw_add_dual_emac_def_ale_entries':
drivers/net/ethernet/ti/cpsw.c:1112:23: error: 'cpsw' undeclared
Guenter
On Thu, 2017-10-19 at 16:21 -0400, Josef Bacik wrote:
> + blk_mq_start_request(req);
> if (unlikely(nsock->pending && nsock->pending != req)) {
> blk_mq_requeue_request(req, true);
> ret = 0;
(replying to an e-mail from seven months ago)
Hello Josef,
Are you aware that the nbd driver is one of the very few block drivers that
calls blk_mq_requeue_request() after a request has been started? I think that
can lead the block layer core to undesired behavior, e.g. the timeout
handler firing concurrently with a request being reinserted. Could you or a
colleague have a look at this? I would like to add the following code to the
block layer core, and I think that the nbd driver would trigger this warning:
void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
{
+ WARN_ON_ONCE(old_state != MQ_RQ_COMPLETE);
+
__blk_mq_requeue_request(rq);
Thanks,
Bart.
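Which call order the proposed WARN_ON_ONCE() would flag can be shown with a toy user-space model. It assumes a request with three states named after blk-mq's MQ_RQ_IDLE, MQ_RQ_IN_FLIGHT and MQ_RQ_COMPLETE; the struct, toy_requeue() and the warning counter are hypothetical and ignore blk-mq's real locking and timeout handling.

```c
#include <stdio.h>

/* Simplified request states, named after blk-mq's request states. */
enum toy_rq_state { TOY_RQ_IDLE, TOY_RQ_IN_FLIGHT, TOY_RQ_COMPLETE };

struct toy_request {
	enum toy_rq_state state;
};

static int toy_warnings; /* counts would-be WARN_ON_ONCE() hits */

/* Like blk_mq_start_request(): the request is now in flight. */
static void toy_start(struct toy_request *rq)
{
	rq->state = TOY_RQ_IN_FLIGHT;
}

static void toy_complete(struct toy_request *rq)
{
	rq->state = TOY_RQ_COMPLETE;
}

/* Requeue with the proposed check: the request is expected to have
 * completed first. Requeueing a started but uncompleted request, as nbd
 * does when another request is pending on the socket, trips it. */
static void toy_requeue(struct toy_request *rq)
{
	if (rq->state != TOY_RQ_COMPLETE) {
		toy_warnings++;
		fprintf(stderr, "warning: requeue in state %d\n", rq->state);
	}
	rq->state = TOY_RQ_IDLE;
}
```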
For problem determination, we always want to see whether we performed any
action when we were invoked on the terminate_rport_io callback.
Temporal event sequence of interest with a long fast_io_fail_tmo of 27 sec:
lose remote port
t workqueue
[s] zfcp_q_<dev> IRQ zfcperp<dev>
=== ================== =================== ============================
0 recv RSCN
q p.test_link_work
block rport
start fast_io_fail_tmo
send ADISC ELS
4 recv ADISC fail
block zfcp_port
port forced reopen
send open port
12 recv open port fail
q p.gid_pn_work
zfcp_erp_wakeup
(zfcp_erp_wait would return)
GID_PN fail
Before this point, we got a SCSI trace with tag "sctrpi1" on fast_io_fail,
e.g. with the typical 5 sec setting.
port.status |= ERP_FAILED
If fast_io_fail_tmo triggers after this point, we missed a SCSI trace.
workqueue
fc_dl_<host>
==================
27 fc_timeout_fail_rport_io
fc_terminate_rport_io
zfcp_scsi_terminate_rport_io
zfcp_erp_port_forced_reopen
_zfcp_erp_port_forced_reopen
if (port.status & ERP_FAILED)
return;
Therefore, write a trace before the above early return.
Example trace record formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : REC
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1 ZFCP_DBF_REC_TRIG
Tag : sctrpi1 SCSI terminate rport I/O
LUN : 0xffffffffffffffff none (invalid)
WWPN : 0x<wwpn>
D_ID : 0x<n_port_id>
Adapter status : 0x...
Port status : 0x...
LUN status : 0x00000000 none (invalid)
Ready count : 0x...
Running count : 0x...
ERP want : 0x03 ZFCP_ERP_ACTION_REOPEN_PORT_FORCED
ERP need : 0xe0 ZFCP_ERP_ACTION_FAILED
Signed-off-by: Steffen Maier <maier(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock(a)linux.ibm.com>
---
drivers/s390/scsi/zfcp_erp.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/s390/scsi/zfcp_erp.c b/drivers/s390/scsi/zfcp_erp.c
index 3489b1bc9121..5c368cdfc455 100644
--- a/drivers/s390/scsi/zfcp_erp.c
+++ b/drivers/s390/scsi/zfcp_erp.c
@@ -42,9 +42,13 @@ enum zfcp_erp_steps {
* @ZFCP_ERP_ACTION_REOPEN_PORT_FORCED: Forced port recovery.
* @ZFCP_ERP_ACTION_REOPEN_ADAPTER: Adapter recovery.
* @ZFCP_ERP_ACTION_NONE: Eyecatcher pseudo flag to bitwise or-combine with
- * either of the other enum values.
+ * either of the first four enum values.
* Used to indicate that an ERP action could not be
* set up despite a detected need for some recovery.
+ * @ZFCP_ERP_ACTION_FAILED: Eyecatcher pseudo flag to bitwise or-combine with
+ * either of the first four enum values.
+ * Used to indicate that ERP not needed because
+ * the object has ZFCP_STATUS_COMMON_ERP_FAILED.
*/
enum zfcp_erp_act_type {
ZFCP_ERP_ACTION_REOPEN_LUN = 1,
@@ -52,6 +56,7 @@ enum zfcp_erp_act_type {
ZFCP_ERP_ACTION_REOPEN_PORT_FORCED = 3,
ZFCP_ERP_ACTION_REOPEN_ADAPTER = 4,
ZFCP_ERP_ACTION_NONE = 0xc0,
+ ZFCP_ERP_ACTION_FAILED = 0xe0,
};
enum zfcp_erp_act_state {
@@ -379,8 +384,12 @@ static void _zfcp_erp_port_forced_reopen(struct zfcp_port *port, int clear,
zfcp_erp_port_block(port, clear);
zfcp_scsi_schedule_rport_block(port);
- if (atomic_read(&port->status) & ZFCP_STATUS_COMMON_ERP_FAILED)
+ if (atomic_read(&port->status) & ZFCP_STATUS_COMMON_ERP_FAILED) {
+ zfcp_dbf_rec_trig(id, port->adapter, port, NULL,
+ ZFCP_ERP_ACTION_REOPEN_PORT_FORCED,
+ ZFCP_ERP_ACTION_FAILED);
return;
+ }
zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_PORT_FORCED,
port->adapter, port, NULL, id, 0);
--
2.16.3
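The patched early-return path can be modeled in user space to show which trigger record it produces. This is a simplified sketch, assuming a port with a single status word and a trace sink that stores the (want, need) pair like one zfcp_dbf_rec_trig() entry; the ERP_FAILED bit value, the structs and model_port_forced_reopen() are hypothetical. Only the values 0x03 and 0xe0 come from the patch.

```c
#include <stdint.h>
#include <string.h>

/* Values copied from the patch. */
#define MODEL_ERP_ACTION_REOPEN_PORT_FORCED 0x03
#define MODEL_ERP_ACTION_FAILED             0xe0

/* Hypothetical bit standing in for ZFCP_STATUS_COMMON_ERP_FAILED. */
#define MODEL_STATUS_ERP_FAILED 0x00800000u

struct model_port {
	uint32_t status;
};

/* Last recovery trigger record written. */
struct model_rec_trace {
	char tag[8];
	int want;
	int need;
};

static struct model_rec_trace last_rec;
static int rec_count;     /* number of trigger records written */
static int enqueue_count; /* number of ERP actions enqueued */

static void model_rec_trig(const char *tag, int want, int need)
{
	strncpy(last_rec.tag, tag, sizeof(last_rec.tag) - 1);
	last_rec.tag[sizeof(last_rec.tag) - 1] = '\0';
	last_rec.want = want;
	last_rec.need = need;
	rec_count++;
}

/* Control flow of the patched _zfcp_erp_port_forced_reopen(), without
 * the port and rport blocking: trace before the early return. */
static void model_port_forced_reopen(struct model_port *port, const char *id)
{
	if (port->status & MODEL_STATUS_ERP_FAILED) {
		model_rec_trig(id, MODEL_ERP_ACTION_REOPEN_PORT_FORCED,
			       MODEL_ERP_ACTION_FAILED);
		return;
	}
	/* Simplified stand-in for zfcp_erp_action_enqueue(), which writes
	 * its own trigger record; here we assume need == want. */
	model_rec_trig(id, MODEL_ERP_ACTION_REOPEN_PORT_FORCED,
		       MODEL_ERP_ACTION_REOPEN_PORT_FORCED);
	enqueue_count++;
}
```

Calling model_port_forced_reopen() with the tag "sctrpi1" on a port whose status has the failed bit set yields want 0x03 and need 0xe0, matching the example record above, and enqueues nothing.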
If a SCSI device is deleted during scsi_eh host reset, we cannot get a
reference to the SCSI device anymore, since scsi_device_get() returns
nonzero by design. Assuming the recovery of adapter and port(s) was successful,
zfcp_erp_strategy_followup_success() attempts to trigger a LUN reset for
the half-gone SCSI device. Unfortunately, this causes the following confusing
trace record, which states that zfcp will do a LUN recovery because "ERP need"
is ZFCP_ERP_ACTION_REOPEN_LUN == 1 and equals "ERP want".
Old example trace record formatted with zfcpdbf from s390-tools:
Tag : ersfs_3 ERP, trigger, unit reopen, port reopen succeeded
LUN : 0x<FCP_LUN>
WWPN : 0x<WWPN>
D_ID : 0x<N_Port-ID>
Adapter status : 0x5400050b
Port status : 0x54000001
LUN status : 0x40000000 ZFCP_STATUS_COMMON_RUNNING
but not ZFCP_STATUS_COMMON_UNBLOCKED as it
was closed during the close part of the adapter reopen
ERP want : 0x01
ERP need : 0x01 misleading
However, zfcp_erp_setup_act() returns NULL as it cannot get the reference.
Hence, zfcp_erp_action_enqueue() takes an early goto out and _NO_ recovery
actually happens.
We always want the recovery trigger trace record, even if no erp_action
could be enqueued as in this case. For other cases where we did not enqueue
an erp_action, 'need' has always been zero to indicate this. In order to
indicate the above goto out, introduce an eyecatcher "flag" to mark
"ERP need" as 'not needed' while still keeping the information about which
erp_action type zfcp_erp_required_act() had decided was needed.
0xc_ is chosen to be visibly different from 0x0_ in "ERP want".
New example trace record formatted with zfcpdbf from s390-tools:
Tag : ersfs_3 ERP, trigger, unit reopen, port reopen succeeded
LUN : 0x<FCP_LUN>
WWPN : 0x<WWPN>
D_ID : 0x<N_Port-ID>
Adapter status : 0x5400050b
Port status : 0x54000001
LUN status : 0x40000000
ERP want : 0x01
ERP need : 0xc1 would need LUN ERP, but no action set up
^
Before v2.6.38 commit ae0904f60fab ("[SCSI] zfcp: Redesign of the debug
tracing for recovery actions.") we could detect this case because the
"erp_action" field in the trace was NULL. The rework removed erp_action
as argument and field from the trace.
This patch only improves tracing. A fix to allow LUN recovery in the case at
hand is a topic for a separate patch.
See also commit fdbd1c5e27da ("[SCSI] zfcp: Allow running unit/LUN shutdown
without acquiring reference") for a similar case and background info.
Signed-off-by: Steffen Maier <maier(a)linux.ibm.com>
Fixes: ae0904f60fab ("[SCSI] zfcp: Redesign of the debug tracing for recovery actions.")
Cc: <stable(a)vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock(a)linux.ibm.com>
---
drivers/s390/scsi/zfcp_erp.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/s390/scsi/zfcp_erp.c b/drivers/s390/scsi/zfcp_erp.c
index 1d91a32db08e..d9cd25b56cfa 100644
--- a/drivers/s390/scsi/zfcp_erp.c
+++ b/drivers/s390/scsi/zfcp_erp.c
@@ -35,11 +35,23 @@ enum zfcp_erp_steps {
ZFCP_ERP_STEP_LUN_OPENING = 0x2000,
};
+/**
+ * enum zfcp_erp_act_type - Type of ERP action object.
+ * @ZFCP_ERP_ACTION_REOPEN_LUN: LUN recovery.
+ * @ZFCP_ERP_ACTION_REOPEN_PORT: Port recovery.
+ * @ZFCP_ERP_ACTION_REOPEN_PORT_FORCED: Forced port recovery.
+ * @ZFCP_ERP_ACTION_REOPEN_ADAPTER: Adapter recovery.
+ * @ZFCP_ERP_ACTION_NONE: Eyecatcher pseudo flag to bitwise or-combine with
+ * either of the other enum values.
+ * Used to indicate that an ERP action could not be
+ * set up despite a detected need for some recovery.
+ */
enum zfcp_erp_act_type {
ZFCP_ERP_ACTION_REOPEN_LUN = 1,
ZFCP_ERP_ACTION_REOPEN_PORT = 2,
ZFCP_ERP_ACTION_REOPEN_PORT_FORCED = 3,
ZFCP_ERP_ACTION_REOPEN_ADAPTER = 4,
+ ZFCP_ERP_ACTION_NONE = 0xc0,
};
enum zfcp_erp_act_state {
@@ -257,8 +269,10 @@ static int zfcp_erp_action_enqueue(int want, struct zfcp_adapter *adapter,
goto out;
act = zfcp_erp_setup_act(need, act_status, adapter, port, sdev);
- if (!act)
+ if (!act) {
+ need |= ZFCP_ERP_ACTION_NONE; /* marker for trace */
goto out;
+ }
atomic_or(ZFCP_STATUS_ADAPTER_ERP_PENDING, &adapter->status);
++adapter->erp_total_count;
list_add_tail(&act->list, &adapter->erp_ready_head);
--
2.16.3
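A reader of the trace could decode the "ERP need" byte as sketched below. This assumes the values defined in the two zfcp patches in this thread (action types 1 to 4, ZFCP_ERP_ACTION_NONE = 0xc0, ZFCP_ERP_ACTION_FAILED = 0xe0); describe_erp_need() is a hypothetical helper, not part of zfcp or zfcpdbf.

```c
#include <stdio.h>

/* Values from the zfcp patches in this thread. */
#define ERP_NEED_NONE   0xc0 /* ZFCP_ERP_ACTION_NONE */
#define ERP_NEED_FAILED 0xe0 /* ZFCP_ERP_ACTION_FAILED */

static const char *erp_type_name(int type)
{
	switch (type) {
	case 1: return "LUN";         /* ZFCP_ERP_ACTION_REOPEN_LUN */
	case 2: return "port";        /* ZFCP_ERP_ACTION_REOPEN_PORT */
	case 3: return "forced port"; /* ZFCP_ERP_ACTION_REOPEN_PORT_FORCED */
	case 4: return "adapter";     /* ZFCP_ERP_ACTION_REOPEN_ADAPTER */
	default: return "unknown";
	}
}

/* Hypothetical decoder for the "ERP need" trace field. 0xe0 must be
 * tested before 0xc0 because it has the 0xc0 bits set as well; the low
 * nibble keeps the action type that was decided upon. */
static void describe_erp_need(int need, char *buf, size_t len)
{
	int type = need & 0x0f;

	if (need == 0)
		snprintf(buf, len, "no ERP action needed");
	else if ((need & ERP_NEED_FAILED) == ERP_NEED_FAILED)
		snprintf(buf, len, "ERP not needed, object is ERP_FAILED");
	else if ((need & ERP_NEED_NONE) == ERP_NEED_NONE)
		snprintf(buf, len, "would need %s ERP, but no action set up",
			 erp_type_name(type));
	else
		snprintf(buf, len, "%s ERP needed", erp_type_name(type));
}
```

With need = 0x01 | 0xc0 = 0xc1 this prints the same annotation as the new example record above, while a plain 0x01 still reads as an ordinary LUN recovery.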
We already have a SCSI trace for the end of abort and scsi_eh TMF. Due to
zfcp_erp_wait() and fc_block_scsi_eh(), time can pass between the
start of our eh callback and an actual send/recv of an abort / TMF request.
In order to see the temporal sequence including any abort / TMF send
retries, add a trace before the above two blocking functions.
This supports problem determination with scsi_eh and parallel zfcp ERP.
No need to explicitly trace the beginning of our eh callback, since we
typically can send an abort / TMF and see its HBA response (in the worst
case, it's a pseudo response on dismiss all of adapter recovery, e.g. due
to an FSF request timeout [fsrth_1] of the abort / TMF). If we cannot send,
we now get a trace record for the first "abrt_wt" or "[lt]r_wait" which
denotes almost the beginning of the callback.
No need to explicitly trace the wakeup after the above two blocking
functions because the next retry loop causes another trace in any case
and that is sufficient.
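Where the new trace lands in the retry loop can be modeled as follows. This is a simplified sketch; model_try_send(), model_block_until_recovered(), the trace log and the retry budget are hypothetical stand-ins for zfcp_fsf_abort_fcp_cmnd(), zfcp_erp_wait() plus fc_block_scsi_eh(), and zfcp_dbf_scsi_abort(). The real handler also checks the fc_block_scsi_eh() result and the adapter status, which this model omits.

```c
#include <stdio.h>

#define MODEL_MAX_TRIES 3 /* hypothetical retry budget */
#define MODEL_LOG_SIZE  8

static char model_log[MODEL_LOG_SIZE][8];
static int model_log_len;

static void model_trace(const char *tag)
{
	if (model_log_len < MODEL_LOG_SIZE) {
		snprintf(model_log[model_log_len], sizeof(model_log[0]), "%s", tag);
		model_log_len++;
	}
}

/* Stand-in for sending the abort: fails until attempt 'succeed_on'
 * (1-based), succeeds from then on. */
static int model_try_send(int attempt, int succeed_on)
{
	return attempt >= succeed_on;
}

/* Stand-in for the two calls that may block for a long time. */
static void model_block_until_recovered(void)
{
}

/* Returns 1 if the abort request was sent, 0 if all tries failed. */
static int model_eh_abort(int succeed_on)
{
	for (int attempt = 1; attempt <= MODEL_MAX_TRIES; attempt++) {
		if (model_try_send(attempt, succeed_on))
			return 1; /* the response trace follows later */
		/* New trace point: record the wait before blocking. */
		model_trace("abrt_wt");
		model_block_until_recovered();
	}
	return 0;
}
```

If the send succeeds on the first try, no "abrt_wt" record appears; each failed try before a block adds exactly one, which is what makes the temporal sequence visible.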
Example trace records formatted with zfcpdbf from s390-tools:
Timestamp : ...
Area : SCSI
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1
Tag : abrt_wt abort, before zfcp_erp_wait()
Request ID : 0x0000000000000000 none (invalid)
SCSI ID : 0x<scsi_id>
SCSI LUN : 0x<scsi_lun>
SCSI LUN high : 0x<scsi_lun_high>
SCSI result : 0x<scsi_result_of_cmd_to_be_aborted>
SCSI retries : 0x<retries_of_cmd_to_be_aborted>
SCSI allowed : 0x<allowed_retries_of_cmd_to_be_aborted>
SCSI scribble : 0x<req_id_of_cmd_to_be_aborted>
SCSI opcode : <CDB_of_cmd_to_be_aborted>
FCP rsp inf cod: 0x.. none (invalid)
FCP rsp IU : ... none (invalid)
Timestamp : ...
Area : SCSI
Subarea : 00
Level : 1
Exception : -
CPU ID : ..
Caller : 0x...
Record ID : 1
Tag : lr_wait LUN reset, before zfcp_erp_wait()
Request ID : 0x0000000000000000 none (invalid)
SCSI ID : 0x<scsi_id>
SCSI LUN : 0x<scsi_lun>
SCSI LUN high : 0x<scsi_lun_high>
SCSI result : 0x... unrelated
SCSI retries : 0x.. unrelated
SCSI allowed : 0x.. unrelated
SCSI scribble : 0x... unrelated
SCSI opcode : ... unrelated
FCP rsp inf cod: 0x.. none (invalid)
FCP rsp IU : ... none (invalid)
Signed-off-by: Steffen Maier <maier(a)linux.ibm.com>
Fixes: 63caf367e1c9 ("[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp")
Fixes: af4de36d911a ("[SCSI] zfcp: Block scsi_eh thread for rport state BLOCKED")
Cc: <stable(a)vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock(a)linux.ibm.com>
---
drivers/s390/scsi/zfcp_scsi.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c
index a62357f5e8b4..4fdb1665b0e6 100644
--- a/drivers/s390/scsi/zfcp_scsi.c
+++ b/drivers/s390/scsi/zfcp_scsi.c
@@ -181,6 +181,7 @@ static int zfcp_scsi_eh_abort_handler(struct scsi_cmnd *scpnt)
if (abrt_req)
break;
+ zfcp_dbf_scsi_abort("abrt_wt", scpnt, NULL);
zfcp_erp_wait(adapter);
ret = fc_block_scsi_eh(scpnt);
if (ret) {
@@ -277,6 +278,7 @@ static int zfcp_task_mgmt_function(struct scsi_cmnd *scpnt, u8 tm_flags)
if (fsf_req)
break;
+ zfcp_dbf_scsi_devreset("wait", scpnt, tm_flags, NULL);
zfcp_erp_wait(adapter);
ret = fc_block_scsi_eh(scpnt);
if (ret) {
--
2.16.3
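The diff passes only "wait" to zfcp_dbf_scsi_devreset(), while the example record shows "lr_wait"; hence the "[lt]r_wait" notation in the commit message. A sketch of how such a tag could be composed, assuming a 3-character prefix "lr_" or "tr_" selected by the TMF flag is prepended to a 4-character suffix in an 8-byte tag buffer. The function name and flag macros below are hypothetical stand-ins, not the zfcp implementation.

```c
#include <string.h>

#define MODEL_TAG_LEN       8    /* 7 characters plus NUL */
#define MODEL_TMF_LUN_RESET 0x10 /* stand-in for the LUN reset TMF flag */
#define MODEL_TMF_TGT_RESET 0x20 /* stand-in for the target reset TMF flag */

/* Builds "lr_<suffix>" or "tr_<suffix>" from a suffix of up to 4
 * characters, e.g. "lr_wait" for a LUN reset. */
static void model_devreset_tag(char out[MODEL_TAG_LEN], const char *suffix,
			       unsigned char tm_flags)
{
	memcpy(out, tm_flags == MODEL_TMF_TGT_RESET ? "tr_" : "lr_", 3);
	strncpy(&out[3], suffix, 4);
	out[MODEL_TAG_LEN - 1] = '\0';
}
```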