This patch mainly reverts what commit b92a4e3f86b1 ("fs: dlm: change posix lock sigint handling") introduced, with two exceptions: it keeps the check under ops_lock whether op->done became true after the wait was interrupted, and it changes the "no op" messages to debug printouts.
There are currently problems with cleaning up pending operations. The main idea of commit b92a4e3f86b1 ("fs: dlm: change posix lock sigint handling") was to keep waiting for a reply and, if the wait had been interrupted, to run the cleanup routine (e.g. list_del(), do_unlock_close()) once that reply arrived in dev_write().
This requires that every dlm op request gets an answer back in dev_write(). However, the dlm user space software does not handle do_unlock_close() on a per-request basis: it cleans up everything that matches certain plock op fields, so we no longer get a result back for every request. This leaves entries behind in the dlm plock recv_list that are never deleted.
This was confirmed with a new debugfs entry that shows whether any plock lists still have entries left when there is no posix lock activity anymore (checked with dlm_tool plocks $LS). In the specific test case, the following command was executed on a gfs2 mountpoint:
stress-ng --fcntl 32
and the stress-ng program was killed after a certain time.
do_unlock_close() cleans up more than just one specific operation, and by the time it runs the dlm operation has already been removed by list_del(). This list_del() can operate on either send_list or recv_list. If it hits recv_list, answers can still come back for an ongoing operation, because do_unlock_close() is not synchronized with the list_del(). Such an answer ends in a "no op ..." log_print(). To avoid confusing the user with messages that appear by design, move this logging to pr_debug(), as those are expected log messages.
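For illustration only, and not part of this patch: a minimal userspace sketch in C of the reply matching described above. All model_* names and the fixed-size array are hypothetical and do not exist in fs/dlm/plock.c; only the compared fields (fsid, number, owner) are taken from dev_write(). It assumes a single thread and ignores ops_lock entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_OPS 8

/* Hypothetical stand-in for a plock op waiting for a user space reply. */
struct model_op {
	uint32_t fsid;
	uint64_t number;
	uint64_t owner;
	bool pending;	/* true while the op sits on the modelled recv_list */
};

/* Hypothetical stand-in for recv_list. */
struct model_recv_list {
	struct model_op ops[MODEL_MAX_OPS];
	size_t count;
};

/* Queue a request; returns its slot index, or -1 if the model is full. */
static int model_send(struct model_recv_list *l, uint32_t fsid,
		      uint64_t number, uint64_t owner)
{
	assert(l != NULL);
	if (l->count == MODEL_MAX_OPS)
		return -1;
	l->ops[l->count] = (struct model_op){ fsid, number, owner, true };
	return (int)l->count++;
}

/* Interrupted waiter removes its own op right away (the list_del() step). */
static void model_interrupt(struct model_recv_list *l, int slot)
{
	assert(l != NULL && slot >= 0 && (size_t)slot < l->count);
	l->ops[slot].pending = false;
}

/*
 * Reply from user space. Returns true if it matched a pending op and
 * false for the "no op" case that this patch logs with pr_debug().
 */
static bool model_reply(struct model_recv_list *l, uint32_t fsid,
			uint64_t number, uint64_t owner)
{
	assert(l != NULL);
	for (size_t i = 0; i < l->count; i++) {
		struct model_op *op = &l->ops[i];

		if (op->pending && op->fsid == fsid &&
		    op->number == number && op->owner == owner) {
			op->pending = false;
			return true;
		}
	}
	return false;
}

/* Number of ops still pending, i.e. leftovers if no reply ever arrives. */
static size_t model_leftovers(const struct model_recv_list *l)
{
	size_t n = 0;

	assert(l != NULL);
	for (size_t i = 0; i < l->count; i++)
		n += l->ops[i].pending ? 1 : 0;
	return n;
}
```

In this model, calling model_interrupt() followed by a late model_reply() for the same fields takes the "no op" path and leaves nothing pending. If the op were instead left pending to wait for a reply that user space never sends, model_leftovers() would stay non-zero, which corresponds to the stale recv_list entries described above.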
Cc: stable@vger.kernel.org
Fixes: b92a4e3f86b1 ("fs: dlm: change posix lock sigint handling")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/plock.c | 25 ++++++-------------------
 1 file changed, 6 insertions(+), 19 deletions(-)
diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index ff364901f22b..fea2157fac5b 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -30,8 +30,6 @@ struct plock_async_data {
 struct plock_op {
 	struct list_head list;
 	int done;
-	/* if lock op got interrupted while waiting dlm_controld reply */
-	bool sigint;
 	struct dlm_plock_info info;
 	/* if set indicates async handling */
 	struct plock_async_data *data;
@@ -167,12 +165,14 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 			spin_unlock(&ops_lock);
 			goto do_lock_wait;
 		}
-
-		op->sigint = true;
+		list_del(&op->list);
 		spin_unlock(&ops_lock);
+
 		log_debug(ls, "%s: wait interrupted %x %llx pid %d",
 			  __func__, ls->ls_global_id,
 			  (unsigned long long)number, op->info.pid);
+		do_unlock_close(&op->info);
+		dlm_release_plock_op(op);
 		goto out;
 	}
 
@@ -434,19 +434,6 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
 		if (iter->info.fsid == info.fsid &&
 		    iter->info.number == info.number &&
 		    iter->info.owner == info.owner) {
-			if (iter->sigint) {
-				list_del(&iter->list);
-				spin_unlock(&ops_lock);
-
-				pr_debug("%s: sigint cleanup %x %llx pid %d",
-					  __func__, iter->info.fsid,
-					  (unsigned long long)iter->info.number,
-					  iter->info.pid);
-				do_unlock_close(&iter->info);
-				memcpy(&iter->info, &info, sizeof(info));
-				dlm_release_plock_op(iter);
-				return count;
-			}
 			list_del_init(&iter->list);
 			memcpy(&iter->info, &info, sizeof(info));
 			if (iter->data)
@@ -465,8 +452,8 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
 		else
 			wake_up(&recv_wq);
 	} else
-		log_print("%s: no op %x %llx", __func__,
-			  info.fsid, (unsigned long long)info.number);
+		pr_debug("%s: no op %x %llx", __func__,
+			 info.fsid, (unsigned long long)info.number);
 	return count;
 }
 