On 24 Jun 2025, at 11:22, David Hildenbrand david@redhat.com wrote:
On 24.06.25 10:07, David Hildenbrand wrote:
Is that actually required? The man page explicitly documents: " EINVAL A previous UFFDIO_API call already enabled one or more features for this userfaultfd. Calling UFF‐ DIO_API twice, the first time with no features set, is explicitly allowed as per the two-step feature detection handshake. " So if that doesn't work, something might be broken.
CCing Nadav and Peter:
Could it be that
commit 22e5fe2a2a279d9a6fcbdfb4dffe73821bef1c90 Author: Nadav Amit nadav.amit@gmail.com Date: Thu Sep 2 14:58:59 2021 -0700
userfaultfd: prevent concurrent API initialization userfaultfd assumes that the enabled features are set once and never changed after UFFDIO_API ioctl succeeded. However, currently, UFFDIO_API can be called concurrently from two different threads, succeed on both threads and leave userfaultfd's features in non-deterministic state. Theoretically, other uffd operations (ioctl's and page-faults) can be dispatched while adversely affected by such changes of features. Moreover, the writes to ctx->state and ctx->features are not ordered, which can - theoretically, again - let userfaultfd_ioctl() think that userfaultfd API completed, while the features are still not initialized. To avoid races, it is arguably best to get rid of ctx->state. Since there are only 2 states, record the API initialization in ctx->features as the uppermost bit and remove ctx->state.
Accidentally broke the documented two-step handshake in the man page where we can avoid closing + reopening the fd?
I agree the code is not correct (and my patch didn’t address this issue), but I don’t see it broke it either.
Unless I’m missing something the code before my patch, when uffdio_api.features == 0, also set ctx->state to UFFD_STATE_RUNNING, which meant another invocation would see (ctx->state != UFFD_STATE_WAIT_API) and fail.
Without testing, the following might fix it if I am right:
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 22f4bf956ba1c..f03e7c980e1c5 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -1944,9 +1944,9 @@ static int userfaultfd_move(struct userfaultfd_ctx *ctx, static int userfaultfd_api(struct userfaultfd_ctx *ctx, unsigned long arg) {
unsigned int new_features, old_features = 0; struct uffdio_api uffdio_api; void __user *buf = (void __user *)arg;
unsigned int ctx_features; int ret; __u64 features;
@@ -1990,9 +1990,12 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx, goto out; /* only enable the requested features for this uffd context */
ctx_features = uffd_ctx_features(features);
new_features = uffd_ctx_features(features);
/* allow two-step handshake */
if (userfaultfd_is_initialized(ctx))
old_features = UFFD_FEATURE_INITIALIZED; ret = -EINVAL;
if (cmpxchg(&ctx->features, 0, ctx_features) != 0)
if (cmpxchg(&ctx->features, old_features, new_features) != old_features) goto err_out; ret = 0;
I am not sure it is right since you would return EINVAL in this case. It also looks a bit overly complicated - are you concerned about a race? My whole concern about race was that somebody would exploit it to overcome non-cooperative UFFD (IIRC).
So perhaps just add a check for the case features if 0 and be done with it? Something like adding:
ret = 0; if (ctx->features == 0 && features == 0) goto err_out; /* no error but copying of uffdio_api required */
before enabling the requested features for this uffd context.