This patch fixes the pid value represented inside struct flock for local lock holders when F_GETLK is used and there is a conflict with another process.
Currently, every pid in struct flock's l_pid field is set to a negative pid number. This behaviour was introduced by commit 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks"). There is still the question of how to represent remote lock holders, which are possible with DLM posix handling, in the flock structure. This patch, however, only changes local process holders to be represented as positive pids.
Further patches may change the behaviour for remote lock holders; for now, this patch leaves their behaviour unchanged, so they are still represented as negative pids.
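As an illustration, a minimal userspace sketch (not the LTP reproducer referenced below) of what a caller sees after this change; the file name argument is only for the example:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	struct flock fl = {
		.l_type   = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start  = 0,
		.l_len    = 0,	/* 0 means "test the whole file" */
	};
	int fd;

	if (argc < 2)
		return 1;
	fd = open(argv[1], O_RDWR);
	if (fd < 0 || fcntl(fd, F_GETLK, &fl) < 0) {
		perror("F_GETLK");
		return 1;
	}
	if (fl.l_type == F_UNLCK)
		printf("no conflicting lock\n");
	else if (fl.l_pid > 0)
		printf("conflict with local pid %d\n", fl.l_pid);
	else
		printf("conflict with remote holder, reported as %d\n",
		       fl.l_pid);
	close(fd);
	return 0;
}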
A simple LTP test case exists as a reproducer [0].
[0] https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/ke...
Cc: stable@vger.kernel.org
Fixes: 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/plock.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index ed4357e62f35..ff364901f22b 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -360,7 +360,9 @@ int dlm_posix_get(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 		locks_init_lock(fl);
 		fl->fl_type = (op->info.ex) ? F_WRLCK : F_RDLCK;
 		fl->fl_flags = FL_POSIX;
-		fl->fl_pid = -op->info.pid;
+		fl->fl_pid = op->info.pid;
+		if (op->info.nodeid != dlm_our_nodeid())
+			fl->fl_pid = -fl->fl_pid;
 		fl->fl_start = op->info.start;
 		fl->fl_end = op->info.end;
 		rv = 0;
This patch mostly reverts what commit b92a4e3f86b1 ("fs: dlm: change posix lock sigint handling") introduced, except for two things: checking whether op->done became true under ops_lock after the wait got interrupted, and changing the "no op" messages to debug printouts.
There are currently problems with cleaning up pending operations. The main idea of commit b92a4e3f86b1 ("fs: dlm: change posix lock sigint handling") was to wait for a reply and, if the wait got interrupted, to run the cleanup routines, e.g. list_del() and do_unlock_close(), afterwards.
This assumes that for every dlm op request an answer comes back in dev_write(). However, the cleanup routine do_unlock_close() in the dlm user space software does not operate on a per-request basis; it cleans up everything that matches certain plock op fields. The consequence is that we no longer get a result back for every request, leaving entries inside the dlm plock recv_list that will never be deleted.
This was confirmed with a new debugfs entry that shows whether the plock lists still have entries left while no posix lock activity (checked with dlm_tool plocks $LS) is ongoing anymore. In the specific test case, the following command was executed on a gfs2 mountpoint:
stress-ng --fcntl 32
and the stress-ng program was killed after a certain time.
Because do_unlock_close() cleans up more than just one specific operation, the dlm operation may already have been removed by list_del(). This list_del() can operate on either send_list or recv_list. If it hits recv_list, answers can still come back for an ongoing operation, and do_unlock_close() is not synchronized with the list_del(). This ends in "no op ..." log_print() messages. To not confuse the user about issues that seem to be there by design, we move this logging information to pr_debug(), as those are expected log messages.
Cc: stable@vger.kernel.org
Fixes: b92a4e3f86b1 ("fs: dlm: change posix lock sigint handling")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/plock.c | 25 ++++++-------------------
 1 file changed, 6 insertions(+), 19 deletions(-)
diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index ff364901f22b..fea2157fac5b 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -30,8 +30,6 @@ struct plock_async_data {
 struct plock_op {
 	struct list_head list;
 	int done;
-	/* if lock op got interrupted while waiting dlm_controld reply */
-	bool sigint;
 	struct dlm_plock_info info;
 	/* if set indicates async handling */
 	struct plock_async_data *data;
@@ -167,12 +165,14 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 			spin_unlock(&ops_lock);
 			goto do_lock_wait;
 		}
-
-		op->sigint = true;
+		list_del(&op->list);
 		spin_unlock(&ops_lock);
+
 		log_debug(ls, "%s: wait interrupted %x %llx pid %d",
 			  __func__, ls->ls_global_id,
 			  (unsigned long long)number, op->info.pid);
+		do_unlock_close(&op->info);
+		dlm_release_plock_op(op);
 		goto out;
 	}

@@ -434,19 +434,6 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
 		if (iter->info.fsid == info.fsid &&
 		    iter->info.number == info.number &&
 		    iter->info.owner == info.owner) {
-			if (iter->sigint) {
-				list_del(&iter->list);
-				spin_unlock(&ops_lock);
-
-				pr_debug("%s: sigint cleanup %x %llx pid %d",
-					 __func__, iter->info.fsid,
-					 (unsigned long long)iter->info.number,
-					 iter->info.pid);
-				do_unlock_close(&iter->info);
-				memcpy(&iter->info, &info, sizeof(info));
-				dlm_release_plock_op(iter);
-				return count;
-			}
 			list_del_init(&iter->list);
 			memcpy(&iter->info, &info, sizeof(info));
 			if (iter->data)
@@ -465,8 +452,8 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
 		else
 			wake_up(&recv_wq);
 	} else
-		log_print("%s: no op %x %llx", __func__,
-			  info.fsid, (unsigned long long)info.number);
+		pr_debug("%s: no op %x %llx", __func__,
+			 info.fsid, (unsigned long long)info.number);
 	return count;
 }

This patch reverts commit a6b1533e9a57 ("dlm: make posix locks interruptible"). That commit was probably introduced to meet the fcntl() F_SETLKW requirement that such calls be interruptible, see:
"F_SETLKW (struct flock *):
As for F_SETLK, but if a conflicting lock is held on the file, then wait for that lock to be released. If a signal is caught while waiting, then the call is interrupted and (after the signal handler has returned) returns immediately (with return value -1 and errno set to EINTR; see signal(7))."
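For illustration, a minimal sketch of that semantic, assuming a conflicting lock held by a forked child and a made-up file name: a handler installed without SA_RESTART should let a blocked F_SETLKW fail with EINTR.

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static void on_alarm(int sig) { (void)sig; /* only interrupt fcntl() */ }

int main(void)
{
	struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
	struct sigaction sa = { .sa_handler = on_alarm };
	int fd = open("lockfile", O_RDWR | O_CREAT, 0600);
	pid_t child = fork();

	if (child == 0) {		/* child: acquire and hold the lock */
		fcntl(fd, F_SETLK, &fl);
		pause();
		_exit(0);
	}
	sleep(1);			/* give the child time to lock */
	sigemptyset(&sa.sa_mask);
	sigaction(SIGALRM, &sa, NULL);	/* no SA_RESTART: allow EINTR */
	alarm(2);
	if (fcntl(fd, F_SETLKW, &fl) < 0 && errno == EINTR)
		printf("F_SETLKW interrupted, as the man page requires\n");
	kill(child, SIGKILL);
	waitpid(child, NULL, 0);
	return 0;
}

Before this patch, DLM honoured such an interrupt, but with the harmful do_unlock_close() side effect described below; after it, only a fatal signal interrupts the wait, so the alarm in this sketch would no longer do so on a DLM filesystem.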
Meeting this requirement means interrupting only the plock operation in question, which sits in wait_event_interruptible() waiting for op->done to become true. That is not what currently happens: do_unlock_close() acts as if the process was killed and unlocks all previously acquired locks, which can result in data corruption because the process still believes it holds those locks.
To test this behaviour, an LTP test case was created [0]. After this patch the process can no longer be interrupted, which is against the API but is currently considered a limitation of DLM. However, DLM will stop unlocking all previously acquired locks behind the back of the user process.
Supporting this feature as intended requires more work in DLM: a lock request cancellation operation, which does not yet exist for the dlm plock user space communication. As this feature never worked as intended and has the side effects mentioned above, this patch moves the wait back to killable.
[0] https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/ke...
Cc: stable@vger.kernel.org
Fixes: a6b1533e9a57 ("dlm: make posix locks interruptible")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/plock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index fea2157fac5b..31bc601ee3d8 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -155,7 +155,7 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,

 	send_op(op);

-	rv = wait_event_interruptible(recv_wq, (op->done != 0));
+	rv = wait_event_killable(recv_wq, (op->done != 0));
 	if (rv == -ERESTARTSYS) {
 		spin_lock(&ops_lock);
 		/* recheck under ops_lock if we got a done != 0,
This patch changes the behaviour so that only F_SETLKW is killable. The fcntl() man page points out that F_SETLKW is the only interruptible cmd, I suppose because it can block for an unknown amount of time when it hits contention. We use killable for the same reason, just for the case that the process isn't alive anymore.
The F_SETLK command is a trylock command; a result can be expected shortly after the operation is sent to user space, just like the unlock and get operations, which use wait_event() only.
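For contrast, a minimal userspace sketch of the trylock nature of F_SETLK (the file name argument is illustrative); it returns immediately instead of blocking:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
	int fd;

	if (argc < 2)
		return 1;
	fd = open(argv[1], O_RDWR);
	if (fd < 0)
		return 1;
	if (fcntl(fd, F_SETLK, &fl) == 0)
		printf("lock acquired without blocking\n");
	else if (errno == EAGAIN || errno == EACCES)
		printf("conflict; F_SETLKW would block here\n");
	close(fd);
	return 0;
}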
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/plock.c | 38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)
diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index 31bc601ee3d8..c9e1d5f54194 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -155,25 +155,29 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,

 	send_op(op);

-	rv = wait_event_killable(recv_wq, (op->done != 0));
-	if (rv == -ERESTARTSYS) {
-		spin_lock(&ops_lock);
-		/* recheck under ops_lock if we got a done != 0,
-		 * if so this interrupt case should be ignored
-		 */
-		if (op->done != 0) {
+	if (op->info.wait) {
+		rv = wait_event_killable(recv_wq, (op->done != 0));
+		if (rv == -ERESTARTSYS) {
+			spin_lock(&ops_lock);
+			/* recheck under ops_lock if we got a done != 0,
+			 * if so this interrupt case should be ignored
+			 */
+			if (op->done != 0) {
+				spin_unlock(&ops_lock);
+				goto do_lock_wait;
+			}
+			list_del(&op->list);
 			spin_unlock(&ops_lock);
-			goto do_lock_wait;
-		}
-		list_del(&op->list);
-		spin_unlock(&ops_lock);

-		log_debug(ls, "%s: wait interrupted %x %llx pid %d",
-			  __func__, ls->ls_global_id,
-			  (unsigned long long)number, op->info.pid);
-		do_unlock_close(&op->info);
-		dlm_release_plock_op(op);
-		goto out;
+			log_debug(ls, "%s: wait interrupted %x %llx pid %d",
+				  __func__, ls->ls_global_id,
+				  (unsigned long long)number, op->info.pid);
+			do_unlock_close(&op->info);
+			dlm_release_plock_op(op);
+			goto out;
+		}
+	} else {
+		wait_event(recv_wq, (op->done != 0));
 	}

 do_lock_wait:
This patch fixes possible plock op collisions when using F_SETLKW lock requests, where fsid, number and owner are not enough to identify a result for a pending request. The LTP test cases [0] and [1] are examples where this is not enough: classic posix locks used from multiple threads, and open file description posix locks.
The idea of the fix is to split recv_list, which contains plock ops expecting a result from user space, into recv_setlkw_list for F_SETLKW ops and recv_list for all other commands. Due to DLM user space behaviour (e.g. dlm_controld), reading a request and writing its result back can only happen in an ordered way for the non-waiting commands. That means no lookup iterating over recv_list is required; to keep the right plock op as the first entry of recv_list, a change to list_move_tail() was made.
This is not possible for F_SETLKW, as multiple waiters can get their results back in random order. To avoid collisions in cases like [0] or [1], this patch adds more fields to compare the plock operations when the lock requests otherwise look the same. The same is done in NFS to find the result for an asynchronous F_SETLKW lock request [2][3]. We still cannot match the exact lock request to a specific result if the lock requests are identical, but in that case we do not care in which order the identical lock requests get their results back to grant the lock.
[0] https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/ke... [1] https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/ke... [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/incl... [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/l...
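To make the matching rule explicit, here is a hypothetical helper (not part of the patch, which open-codes the comparison in dev_write(); the helper name is made up) using the fields of struct dlm_plock_info:

#include <linux/types.h>
#include <linux/dlm_plock.h>

/* hypothetical: pair an F_SETLKW result from user space with a
 * pending op by comparing the fields of the original request
 */
static bool setlkw_result_matches(const struct dlm_plock_info *a,
				  const struct dlm_plock_info *b)
{
	return a->fsid == b->fsid &&
	       a->number == b->number &&
	       a->owner == b->owner &&
	       a->pid == b->pid &&
	       a->start == b->start &&
	       a->end == b->end;
}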
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
 fs/dlm/plock.c | 47 ++++++++++++++++++++++++++++++++++-------------
 1 file changed, 34 insertions(+), 13 deletions(-)
diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index c9e1d5f54194..540a30a342f0 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -17,6 +17,7 @@
 static DEFINE_SPINLOCK(ops_lock);
 static LIST_HEAD(send_list);
 static LIST_HEAD(recv_list);
+static LIST_HEAD(recv_setlkw_list);
 static DECLARE_WAIT_QUEUE_HEAD(send_wq);
 static DECLARE_WAIT_QUEUE_HEAD(recv_wq);

@@ -392,10 +393,14 @@ static ssize_t dev_read(struct file *file, char __user *u, size_t count,
 	spin_lock(&ops_lock);
 	if (!list_empty(&send_list)) {
 		op = list_first_entry(&send_list, struct plock_op, list);
-		if (op->info.flags & DLM_PLOCK_FL_CLOSE)
+		if (op->info.flags & DLM_PLOCK_FL_CLOSE) {
 			list_del(&op->list);
-		else
-			list_move(&op->list, &recv_list);
+		} else {
+			if (op->info.wait)
+				list_move(&op->list, &recv_setlkw_list);
+			else
+				list_move_tail(&op->list, &recv_list);
+		}
 		memcpy(&info, &op->info, sizeof(info));
 	}
 	spin_unlock(&ops_lock);
@@ -434,18 +439,34 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
 		return -EINVAL;

 	spin_lock(&ops_lock);
-	list_for_each_entry(iter, &recv_list, list) {
-		if (iter->info.fsid == info.fsid &&
-		    iter->info.number == info.number &&
-		    iter->info.owner == info.owner) {
-			list_del_init(&iter->list);
-			memcpy(&iter->info, &info, sizeof(info));
-			if (iter->data)
+	if (info.wait) {
+		list_for_each_entry(iter, &recv_setlkw_list, list) {
+			if (iter->info.fsid == info.fsid &&
+			    iter->info.number == info.number &&
+			    iter->info.owner == info.owner &&
+			    iter->info.pid == info.pid &&
+			    iter->info.start == info.start &&
+			    iter->info.end == info.end) {
+				list_del_init(&iter->list);
+				memcpy(&iter->info, &info, sizeof(info));
+				if (iter->data)
+					do_callback = 1;
+				else
+					iter->done = 1;
+				op = iter;
+				break;
+			}
+		}
+	} else {
+		op = list_first_entry_or_null(&recv_list, struct plock_op,
+					      list);
+		if (op) {
+			list_del_init(&op->list);
+			memcpy(&op->info, &info, sizeof(info));
+			if (op->data)
 				do_callback = 1;
 			else
-				iter->done = 1;
-			op = iter;
-			break;
+				op->done = 1;
 		}
 	}
 	spin_unlock(&ops_lock);
Hi,
On Fri, May 19, 2023 at 11:21 AM Alexander Aring <aahringo@redhat.com> wrote:
> [...]
> @@ -434,18 +439,34 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
>  		return -EINVAL;
>
>  	spin_lock(&ops_lock);
> -	list_for_each_entry(iter, &recv_list, list) {
> -		if (iter->info.fsid == info.fsid &&
> -		    iter->info.number == info.number &&
> -		    iter->info.owner == info.owner) {
> -			list_del_init(&iter->list);
> -			memcpy(&iter->info, &info, sizeof(info));
> -			if (iter->data)
> +	if (info.wait) {
> +		list_for_each_entry(iter, &recv_setlkw_list, list) {
> +			if (iter->info.fsid == info.fsid &&
> +			    iter->info.number == info.number &&
> +			    iter->info.owner == info.owner &&
> +			    iter->info.pid == info.pid &&
> +			    iter->info.start == info.start &&
> +			    iter->info.end == info.end) {
There is a missing condition for info.ex; otherwise a lock request for F_WRLCK and one for F_RDLCK could be evaluated as the same request. NFS does this check as well by comparing fl1->fl_type == fl2->fl_type. We don't have fl_type, but info.ex, which is the only field in a F_SETLKW request that distinguishes F_WRLCK from F_RDLCK.
I will send a v2.
- Alex