On Wed 04 Nov 10:16 CST 2020, Bjorn Andersson wrote:
The reliance on the remoteproc's state for determining when to send sysmon notifications to a remote processor is racy with regard to concurrent remoteproc operations.
Furthermore, the advertisement of the state of other remote processors to a newly started remote processor might not only send the wrong state, but might result in a stream of state changes that are out of order.
Address this by introducing state tracking within the sysmon instances themselves and extending the locking to ensure that the notifications are consistent with this state.
The use of a big lock for all instances will cause contention for concurrent remote processor state transitions, but the correctness of the remote processors' view of their peers is more important.
Fixes: 1f36ab3f6e3b ("remoteproc: sysmon: Inform current rproc about all active rprocs")
Fixes: 1877f54f75ad ("remoteproc: sysmon: Add notifications for events")
Fixes: 1fb82ee806d1 ("remoteproc: qcom: Introduce sysmon")
Cc: stable@vger.kernel.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
---
 drivers/remoteproc/qcom_sysmon.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/drivers/remoteproc/qcom_sysmon.c b/drivers/remoteproc/qcom_sysmon.c
index 9eb2f6bccea6..1e507b66354a 100644
--- a/drivers/remoteproc/qcom_sysmon.c
+++ b/drivers/remoteproc/qcom_sysmon.c
@@ -22,6 +22,8 @@ struct qcom_sysmon {
 	struct rproc_subdev subdev;
 	struct rproc *rproc;
 
+	int state;
+
 	struct list_head node;
 
 	const char *name;
@@ -448,7 +450,10 @@ static int sysmon_prepare(struct rproc_subdev *subdev)
 		.ssr_event = SSCTL_SSR_EVENT_BEFORE_POWERUP
 	};
 
+	mutex_lock(&sysmon_lock);
This doesn't work, because taking the big lock prevents a concurrently failing remote processor from reaching smd or glink to indicate that the remote is dead and that the first remote's notifications should be aborted or fail fast.
The result is in most cases that we're stuck here waiting for a timeout, but there are extreme corner cases where the notification might be waiting for the dead remote to drain the communication fifo.
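
To make the stall concrete, here is a minimal userspace model of it, not driver code. It assumes pthreads in place of kernel mutexes; big_lock stands in for sysmon_lock, and try_report_remote_dead() is a hypothetical helper representing the crashing remote's path towards the transport. It only shows that the crash report cannot make progress while the notifying path holds the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the global sysmon_lock from the patch. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical crash path of a second remote: it needs the big lock
 * before it can mark the transport as dead. Returns false when the
 * lock is busy, i.e. when the crash report cannot make progress.
 */
static bool try_report_remote_dead(bool *transport_dead)
{
	if (pthread_mutex_trylock(&big_lock) != 0)
		return false;
	*transport_dead = true;
	pthread_mutex_unlock(&big_lock);
	return true;
}

/* Walk through the sequence described above in a single thread. */
static void demo_stalled_crash_report(void)
{
	bool transport_dead = false;

	/* First remote: takes the big lock, then would block while sending. */
	pthread_mutex_lock(&big_lock);

	/* Second remote crashes now, but cannot report it. */
	assert(!try_report_remote_dead(&transport_dead));
	assert(!transport_dead);

	/* Only once the sender gives up (e.g. on timeout) does it get through. */
	pthread_mutex_unlock(&big_lock);
	assert(try_report_remote_dead(&transport_dead));
	assert(transport_dead);
}
```

In the model the sender releases the lock immediately; in the driver the equivalent point is reached only after the notification times out, or not at all if it is waiting on the dead remote to drain the fifo.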
Will send a new version that doesn't rely on the big lock, but still keeps state information consistent.
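
As a rough illustration of the direction only, and not the follow-up patch itself, state can be tracked per instance under a per-instance lock, so that one remote's transition never holds a lock that another remote's path needs. Everything below is a hypothetical userspace sketch: struct model_sysmon, the model_* helpers and the MODEL_* values are made up for the example, and the ordering of notifications is not addressed.

```c
#include <pthread.h>

/* Hypothetical states; the driver itself uses SSCTL_SSR_EVENT_* values. */
enum model_state { MODEL_OFFLINE = 0, MODEL_RUNNING = 1 };

struct model_sysmon {
	pthread_mutex_t state_lock;	/* protects only this instance's state */
	enum model_state state;
};

static void model_init(struct model_sysmon *s)
{
	pthread_mutex_init(&s->state_lock, NULL);
	s->state = MODEL_OFFLINE;
}

/* Record a state transition for one instance under its own lock. */
static void model_set_state(struct model_sysmon *s, enum model_state st)
{
	pthread_mutex_lock(&s->state_lock);
	s->state = st;
	pthread_mutex_unlock(&s->state_lock);
}

/*
 * Count the peers that would be advertised to 'self' as running.
 * Each peer's lock is held only while its state is read, so no lock
 * is shared between unrelated instances.
 */
static int model_count_running_peers(struct model_sysmon *all, int n,
				     const struct model_sysmon *self)
{
	int running = 0;

	for (int i = 0; i < n; i++) {
		if (&all[i] == self)
			continue;
		pthread_mutex_lock(&all[i].state_lock);
		if (all[i].state == MODEL_RUNNING)
			running++;
		pthread_mutex_unlock(&all[i].state_lock);
	}
	return running;
}
```

With three instances of which the first two are running, the first would be told about one running peer and the third about two.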
Regards,
Bjorn