Hi all,
This series is based on previous RFCs/discussions:
Tech topic: https://lore.kernel.org/linux-iommu/20250918214425.2677057-1-amastro@fb.com/
RFCv1: https://lore.kernel.org/all/20260226202211.929005-1-mattev@meta.com/
RFCv2: https://lore.kernel.org/kvm/20260312184613.3710705-1-mattev@meta.com/
The background/rationale is covered in more detail in the RFC cover
letters. The TL;DR is:
The goal is to enable userspace driver designs that use VFIO to export
DMABUFs representing subsets of PCI device BARs, and "vend" those
buffers from a primary process to other subordinate processes by fd.
These processes then mmap() the buffers and their access to the device
is isolated to the exported ranges. This is an improvement on sharing
the VFIO device fd to subordinate processes, which would allow
unfettered access.
This is achieved in two parts. First, mmap() is enabled for vfio-pci
DMABUFs, which are passed by fd to subordinate processes. Second, a
new revocation mechanism is added
to allow the primary process to forcibly revoke access to
previously-shared BAR spans, even if the subordinate processes haven't
cleanly exited.
(The related topic of safe delegation of iommufd control to the
subordinate processes is not addressed here, and is follow-up work.)
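For readers unfamiliar with "vending" an fd, the handoff itself can be sketched with the standard SCM_RIGHTS mechanism on an AF_UNIX socket. This is only an illustration of fd passing in general: send_fd() and recv_fd() are hypothetical helper names, and the fd being passed would in practice be a DMABUF fd exported from vfio-pci, which the receiver would then mmap().

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Control buffer sized and aligned for exactly one fd. */
union fd_cmsg {
	struct cmsghdr align;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* Hypothetical helper: send one fd over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	assert(fd >= 0);
	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	/* The fd travels as ancillary data, not in the byte stream. */
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd; returns the new fd or -1. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The receiver gets a new descriptor number that refers to the same open file, so it only gains access to whatever that one fd exposes.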
As well as isolation and revocation, another advantage to accessing a
BAR through a VMA backed by a DMABUF is that it's straightforward to
mmap() the buffer with access attributes, such as write-combining.
Feedback on the RFCs requested that, instead of creating
DMABUF-specific vm_ops and .fault paths, we go the whole way and
migrate the existing VFIO PCI BAR mmap() to be backed by a DMABUF too,
resulting in a common vm_ops and fault handler for mmap()s of both the
VFIO device and explicitly-exported DMABUFs. This will help future
iommufd emulation of VFIO Type1 peer-to-peer, making it easier to get
a DMABUF for a VFIO BAR as a DMA target.
mmap() conversion to use DMABUF underneath has been done for vfio-pci,
but not sub-drivers:
nvgrace-gpu's mmap() override path is unchanged; I kept this out of
scope for now not least because I don't have a thorough test setup
for this system. I would prefer to help the nvgrace-gpu maintainers
enable BAR mmap() DMABUFs themselves.
Notes on patches
================
PCI/P2PDMA: Split pool-related cleanup out of pci_p2pdma_release()
PCI/P2PDMA: Add CONFIG_PCI_P2PDMA_CORE
Later in the series, vfio-pci's mmap() is going to depend on
pcim_p2pdma_provider() which depended on CONFIG_PCI_P2PDMA, which
in turn depended on ZONE_DEVICE. That isn't available on 32-bit
and some archs, because they lack MEMORY_HOTPLUG and friends.
VFIO does _not_ require actual P2P to be present for basic mmap()
functionality, only for the optional DMABUF export feature
(CONFIG_VFIO_PCI_DMABUF).
These split out p2pdma_core.c under CONFIG_PCI_P2PDMA_CORE (which
currently contains pcim_p2pdma_provider()), and an optional
CONFIG_PCI_P2PDMA which depends on ZONE_DEVICE etc. providing
P2P functionality in the existing p2pdma.c. The first splits
out pool cleanup from the release path, and the second does the
refactor/code move to the new file.
vfio/pci: Add a helper to look up PFNs for DMABUFs
vfio/pci: Add a helper to create a DMABUF for a BAR-map VMA
The first adds a helper for the DMABUF VMA fault handler that
determines arbitrary-sized PFNs from ranges in a DMABUF. The second
refactors DMABUF export for use by the existing export feature, and
adds a helper that creates a DMABUF corresponding to a VFIO BAR
mmap() request.
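As a rough illustration of the lookup idea only (not the helper in the patch), resolving a fault offset against a buffer made of several ranges might look like the sketch below. struct span and find_pfn() are hypothetical names, and the sketch resolves a single page, leaving out the huge-fault (larger order) handling the real helper covers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one physically contiguous span of a BAR. */
struct span {
	unsigned long start_pfn;	/* first page frame of the span */
	unsigned long nr_pages;		/* length of the span in pages */
};

/*
 * Translate a page offset within the buffer into a PFN by walking the
 * spans in order. Returns false if the offset lies beyond the buffer.
 */
static bool find_pfn(const struct span *spans, size_t nr_spans,
		     unsigned long pgoff, unsigned long *pfn)
{
	size_t i;

	assert(spans && pfn);
	for (i = 0; i < nr_spans; i++) {
		if (pgoff < spans[i].nr_pages) {
			*pfn = spans[i].start_pfn + pgoff;
			return true;
		}
		/* Not in this span: make the offset relative to the next. */
		pgoff -= spans[i].nr_pages;
	}
	return false;
}
```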
vfio/pci: Convert BAR mmap() to use a DMABUF
The vfio-pci core mmap() creates a DMABUF with the helper, and the
vm_ops fault handler uses the other helper to resolve the fault.
Because this depends on DMABUF structs/code, CONFIG_VFIO_PCI_CORE
needs to depend on CONFIG_DMA_SHARED_BUFFER. The
CONFIG_VFIO_PCI_DMABUF still conditionally enables the export
support code.
NOTE: The user mmap()s a device fd, but the resulting VMA's vm_file
becomes that of the DMABUF. The DMABUF takes ownership of the
device file and put()s it on release, which maintains the existing
behaviour of a VMA keeping the VFIO device open.
BAR zapping then happens via the existing vfio_pci_dma_buf_move()
path, which now needs to unmap PTEs in the DMABUF's address_space.
vfio/pci: Provide a user-facing name for BAR mappings
There was a request for decent debug naming in /proc/<pid>/maps
etc. comparable to the existing VFIO names: since the VMAs are
DMABUFs, they have a "dmabuf:" prefix and can't be 100% identical
to before. This is a user-visible change, but this patch at least
now gives us extra info on the BDF & BAR being mapped.
vfio/pci: Clean up BAR zap and revocation
In general (see NOTE!) vfio_pci_zap_bars() is now obsolete,
since it unmaps PTEs in the VFIO device address_space which is now
unused. This consolidates all calls (e.g. around reset) with the
neighbouring vfio_pci_dma_buf_move()s into new functions, to
revoke/unrevoke (making the steps clearer).
NOTE: Because drivers can use their own vm_ops and override .mmap,
the core must conservatively assume an overridden .mmap might still
add PTEs to the VFIO device address_space and therefore still does
the zap. A new flag, zap_bars_on_revoke, enables the zap when
.mmap is overridden. A driver that does not need the zap can clear
this to opt-out, e.g. if the driver calls down to the common mmap
(and so uses DMABUFs).
vfio/pci: Support mmap() of a VFIO DMABUF
Adds mmap() for a DMABUF fd exported from vfio-pci.
It was a goal to keep the VFIO device fd lifetime behaviour
unchanged with respect to the DMABUFs. An application can close
all device fds, and this will revoke/clean up all DMABUFs; no
mappings or other access can be performed now. When enabling
mmap() of the DMABUFs, this means access through the VMA is also
revoked. This complicates the fault handler because whilst the
DMABUF exists, it has no guarantee that the corresponding VFIO
device is still alive. Adds synchronisation ensuring the vdev is
available before vdev->memory_lock is touched; this holds the
device registration so that even if the buffer has been cleaned up,
vdev hasn't been freed and so the lock can be safely taken.
vfio/pci: Permanently revoke a DMABUF on request
By weight, this is mostly a rename of the 'revoked' flag to an enum,
'status'. There are now three states for a buffer: usable, temporarily
revoked, and permanently revoked. A new VFIO feature is added,
VFIO_DEVICE_FEATURE_DMA_BUF_REVOKE, which takes a DMABUF (exported
from the same device) and permanently revokes it. Thus a userspace
driver can guarantee any downstream consumers of a shared fd are
prevented from accessing a BAR range, and that range can be reused.
NOTE: This might block userspace, waiting on importers to detach.
The code doing revocation in vfio_pci_dma_buf_move() is moved, to a
common function for use by ..._move() and this new feature.
(NOTE: See changelog; by request, v4 added a condition to the
existing code to elide the unnecessary invalidation/sync on the
un-revoke path.)
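To make the three-state idea concrete, here is a small userspace sketch. The enum and function names are hypothetical (the patch only says 'revoked' became a status enum), and the transition rules are my reading of the description above: a permanent revoke is sticky, and invalidate/wait is only needed when a usable buffer becomes revoked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three buffer states. */
enum buf_status {
	BUF_USABLE,
	BUF_REVOKED_TEMP,	/* e.g. around reset; can be un-revoked */
	BUF_REVOKED_PERM,	/* requested by the user; never usable again */
};

/*
 * Apply a requested status. Once a buffer is permanently revoked, later
 * requests (including un-revoke) leave it revoked. Returns true when the
 * caller would need to invalidate mappings and wait for importers, which
 * in this sketch happens only on a usable -> revoked transition.
 */
static bool buf_set_status(enum buf_status *cur, enum buf_status want)
{
	bool was_usable;

	assert(cur);
	was_usable = (*cur == BUF_USABLE);

	if (*cur == BUF_REVOKED_PERM)
		return false;

	*cur = want;
	return was_usable && want != BUF_USABLE;
}
```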
vfio/pci: Add mmap() attributes to DMABUF feature
Adds a new VFIO feature, VFIO_DEVICE_FEATURE_DMA_BUF_MEMATTR.
After a DMABUF is exported, this feature is used to set a memory
attribute that will be used by future mmap()s of the DMABUF fd. It
doesn't affect existing maps.
The default is UC, and via the feature one can specify CPU access
as WC. The attribute is an enum/scalar rather than
bitmap/cumulative. The attributes follow a "try-fail" model where
a client can request an attribute and either succeed or fail with
ENOENT if it's unknown; if future attributes are platform-specific
then their support can be probed.
Since it's just UC/WC for now, there is no reservation or numeric
structure to the namespace yet, but we could support
system/arch-specific values in future by carving out base +
arch-specific + IMPDEF ranges.
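The "try-fail" model can be sketched as follows. The enum values and set_memattr() are hypothetical stand-ins rather than the UAPI added by the patch; the only behaviour taken from the text is that UC is the default, WC can be requested, and an unknown value fails with ENOENT.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical attribute values; the real UAPI names are not shown here. */
enum memattr {
	MEMATTR_UC = 0,	/* default: uncached */
	MEMATTR_WC = 1,	/* write-combining */
};

/*
 * Store the attribute to be used by future mmap()s. Unknown values fail
 * with -ENOENT and leave the current setting untouched, so a client can
 * probe for support simply by trying.
 */
static int set_memattr(int *slot, int requested)
{
	assert(slot);
	switch (requested) {
	case MEMATTR_UC:
	case MEMATTR_WC:
		*slot = requested;
		return 0;
	default:
		return -ENOENT;
	}
}
```

Because the attribute is a scalar rather than a bitmap, a later successful call simply replaces the earlier value.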
Testing
=======
(The [RFC ONLY] userspace test program, for QEMU edu-plus, can be
found in the GitHub branch below. It at least illustrates how the
export, map, revoke, attribute, and close semantics interoperate.)
This code has been tested in mapping DMABUFs of single/multiple ranges
from multiple BARs, aliasing mmap()s, aliasing ranges across DMABUFs,
vm_pgoff > 0, revocation, shutdown/cleanup scenarios, and hugepage
mappings. I've lightly tested WC mappings also (by observing
resulting PTEs as having the correct attributes...). No regressions
observed on the VFIO selftests, or on our internal vfio-pci
applications. VFIO on i386 has been build-tested.
Dear Reviewers,
===============
I was grateful for the reviews and Reviewed-Bys on previous versions
from several of you; I've added some Reviewed-Bys.
But, various changes were also requested and I'm erring on the
conservative side: I have NOT included your Reviewed-Bys where the
patch has changed after your review (or where requested changes ended
up more than super-trivial). I hope that's okay.
End
===
This is based on v7.1.
These commits are on GitHub for easier browsing, along with
"[RFC ONLY] selftests: vfio: Add standalone vfio_dmabuf_mmap_test":
https://github.com/metamev/linux/compare/8cd9520d35a6...dev/mev/vfio-dmabuf…
Thanks for reading,
Matt
================================================================================
Changelog:
v4:
- Rebased on v7.1
- 1/9: Split the p2pdma.c pool release code into a new patch, making
the second patch a pure code-move exercise. Reworded the commit
messages, comment cleanups.
- 2/9 Look up PFNs helper: renamed DMABUF range search loop variables
for clarity, and simplified search loop and fallthrough/exit logic.
Moved WARN to ratelimited warnings. Rearranged pfn arithmetic to
avoid potential overflow. Clarified comments, better explanation
of vma_pgoff_adjust, spelling.
- 4/9 convert BAR mmap(): Trivial comment change, move. Used 'true'
instead of '1' for unmap_mapping_range() arg, consistent with
elsewhere in vfio-pci.
- 5/9 User-facing name for mappings: Uses kasprintf() instead.
- 6/9 Clean up BAR zap: Renamed functions to simplify/shorten names,
emphasising "revoke/unrevoke" actions. (Then internally this will
do a DMABUF revoke and possibly a PTE zap.)
NOTE: We debated ordering of a previous zap before a transition
to D0 before reset. The conclusion was the current patch is OK.
- 7/9 mmap() of a DMABUF: Added helper vfio_pci_set_vma_ops() to keep
ops struct static/local. Squashed vfio_pci_dma_buf_mmap() comments
for space/clarity.
- 8/9 DMABUF revoke: Fix typos in commit message. Implement request
to add a condition to revocation path to only invalidate/wait when
a buffer is being revoked (avoiding it on un-revoke).
NOTE: This means a (small) change to the code moved from
vfio_pci_dma_buf_move.
NOTE: Also, we discussed adding warnings for setting a state
matching the existing state; I didn't add them after all, because
these situations can occur in normal usage (e.g. a revoke of a
buffer from a device in D3, or a cleanup of a lingering
user-revoked buffer).
v3:
https://lore.kernel.org/all/20260610154327.37758-1-matt@ozlabs.org/
- Refactor p2pdma.c: split out pcim_p2pdma_provider() into a new
p2pdma_core.c under CONFIG_PCI_P2PDMA_CORE.
- vfio_pci_dma_buf_find_pfn() cleanups: Rename parameter to priv,
remove bad WARN, move unnecessary addition out of inner loop.
- vfio_pci_core_mmap_prep_dmabuf() cleanups: Remove uint32_t, remove
unnecessary const variable.
- Conversion of BAR mmap() to DMABUF: VFIO_PCI_DMABUF depends on
VFIO_PCI_CORE. vfio_pci_mmap_huge_fault(): move dev_dbg() outside
of lock (argh), remove READ_ONCE(vdev)/move priv->vdev read and
improve comment explanation.
- On revoke, BAR zap defaults to on if .mmap is overridden by a
driver (and implements an opt-out for the hisi_acc_vfio_pci driver,
which overrides mmap() with a simple wrapper that ends up using the
common DMABUF mmap() rather than custom mappings).
- Reworded commit "vfio/pci: Support mmap() of a VFIO DMABUF" message
for clarity. Reworded vfio_pci_mmap_huge_fault() comment for
accuracy (vdev validity depends on not being revoked).
Added comment in mmap() explaining belt-and-braces approach for
early detecting a map of a revoked buffer.
- Revoke now uses VFIO_DEVICE_FEATURE_DMA_BUF rather than a new
ioctl(); instead of the revoke helper taking 'revoked/permanently'
bools, it's become vfio_pci_dma_buf_set_status() taking a single
status enum. Added a READ_ONCE() for the lockless test of
priv->vdev (flags it as intentional, even if it's in practice going
to be a single-copy atomic read).
- Removed GET on vfio_pci_core_feature_dma_buf_memattr(), removed
unnecessary taking of memory_lock, fixed error return values. In
particular, removes ENOTSUPP, and uses ENOENT to indicate an
unknown attribute enum value was passed to SET. In the discussion
here,
https://lore.kernel.org/all/20260602131417.41366391@shazbot.org/
we'd agreed on EOPNOTSUPP before I realised that's already used
elsewhere. ENOENT uniquely indicates an unknown attribute.
v2:
https://lore.kernel.org/all/20260527102319.100128-1-mattev@meta.com/
- Rebase on VFIO next, picking up Alex's
vfio_pci_dma_buf_move()/vfio_pci_dma_buf_cleanup() fixes, and
dropping "vfio/pci: Fix vfio_pci_dma_buf_cleanup() double-put"
- Added "PCI/P2PDMA: Add CONFIG_PCI_P2PDMA_CORE" so that the
newly-added vfio-pci hard dependency on the P2PDMA provider instead
pulls in the _CORE variant and not the full-fat CONFIG_PCI_P2PDMA.
This means that the core of vfio-pci does not need ZONE_DEVICE, but
if it's available then enabling P2PDMA in turn enables DMABUF
export. Fixes basic VFIO operation on 32b or other platforms without
ZONE_DEVICE.
- Fixed comment inaccuracy in vfio_pci_dma_buf_revoke() and cleaned
up vdev validity test.
- vfio_pci_dma_buf_find_pfn(): use PAGE_ALIGN(), better span variable
naming, OVF check
- Made vm_pgoffs use consistent (keeping the resource index at the
top and masking where offset is used). For BAR mmap, use new
vma_pgoff_adjust to create the DMABUF with the exact mmap()ed span
instead of from the start of the BAR with an invisible portion
before the mapping.
- Added VFIO_DEVICE_FEATURE_DMA_BUF_MEMATTR to set memory attributes,
instead of using the export `flags` field.
- vfio_pci_ioctl_reset: Moved vfio_pci_zap_revoke_bars()
(effectively, vfio_pci_dma_buf_move()) back after D0 transition.
Note, if a BAR zap is needed, it's done in this function so now
happens after this D0 transition with the _move; it was done before
it at the time of the memory_lock taking.
- Minimised vfio_pci_dma_buf_mmap() (removed redundant span check),
added READ_ONCE for memattr
- Misc fixes: comment in DMABUF name generation, removed superfluous
READ_ONCE from faulthandler
v1:
https://lore.kernel.org/kvm/20260416131815.2729131-1-mattev@meta.com/
- Cleanup of the common DMABUF-aware VMA vm_ops fault handler and
export code.
- Fixed a lot of races, particularly faults racing with DMABUF
cleanup (if the VFIO device fds close, for example).
- Added nicer human-readable names for VFIO mmap() VMAs
RFCv2: Respin based on the feedback/suggestions:
https://lore.kernel.org/kvm/20260312184613.3710705-1-mattev@meta.com/
- Transform the existing VFIO BAR mmap path to also use DMABUFs
behind the scenes, and then simply share that code for
explicitly-mapped DMABUFs. Jason wanted to go that direction to
enable iommufd VFIO type 1 emulation to pick up a DMABUF for an IO
mapping.
- Revoke buffers using a VFIO device fd ioctl
RFCv1:
https://lore.kernel.org/all/20260226202211.929005-1-mattev@meta.com/
Matt Evans (10):
PCI/P2PDMA: Split pool-related cleanup out of pci_p2pdma_release()
PCI/P2PDMA: Add CONFIG_PCI_P2PDMA_CORE
vfio/pci: Add a helper to look up PFNs for DMABUFs
vfio/pci: Add a helper to create a DMABUF for a BAR-map VMA
vfio/pci: Convert BAR mmap() to use a DMABUF
vfio/pci: Provide a user-facing name for BAR mappings
vfio/pci: Clean up BAR zap and revocation
vfio/pci: Support mmap() of a VFIO DMABUF
vfio/pci: Permanently revoke a DMABUF on request
vfio/pci: Add mmap() attributes to DMABUF feature
MAINTAINERS | 2 +-
drivers/pci/Kconfig | 10 +-
drivers/pci/Makefile | 1 +
drivers/pci/p2pdma.c | 109 +---
drivers/pci/p2pdma.h | 29 +
drivers/pci/p2pdma_core.c | 118 ++++
drivers/vfio/pci/Kconfig | 5 +-
drivers/vfio/pci/Makefile | 3 +-
.../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 8 +
drivers/vfio/pci/vfio_pci_config.c | 30 +-
drivers/vfio/pci/vfio_pci_core.c | 211 +++++--
drivers/vfio/pci/vfio_pci_dmabuf.c | 568 +++++++++++++++---
drivers/vfio/pci/vfio_pci_priv.h | 63 +-
include/linux/pci-p2pdma.h | 24 +-
include/linux/pci.h | 2 +-
include/linux/vfio_pci_core.h | 1 +
include/uapi/linux/vfio.h | 47 ++
17 files changed, 960 insertions(+), 271 deletions(-)
create mode 100644 drivers/pci/p2pdma.h
create mode 100644 drivers/pci/p2pdma_core.c
--
2.50.1 (Apple Git-155)
On Wed, 01 Jul 2026 18:08:13 +0200, Thierry Reding wrote:
> From: Thierry Reding <treding(a)nvidia.com>
>
> Add the memory-region and memory-region-names properties to the bindings
> for the display controllers and the host1x engine found on various Tegra
> generations. These memory regions are used to access firmware-provided
> framebuffer memory as well as the video protection region.
>
> Signed-off-by: Thierry Reding <treding(a)nvidia.com>
> ---
> Changes in v3:
> - document properties for VIC
> ---
> .../devicetree/bindings/display/tegra/nvidia,tegra124-vic.yaml | 8 ++++++++
> .../devicetree/bindings/display/tegra/nvidia,tegra186-dc.yaml | 10 ++++++++++
> .../devicetree/bindings/display/tegra/nvidia,tegra20-dc.yaml | 10 +++++++++-
> .../bindings/display/tegra/nvidia,tegra20-host1x.yaml | 7 +++++++
> 4 files changed, 34 insertions(+), 1 deletion(-)
>
My bot found errors running 'make dt_binding_check' on your patch:
yamllint warnings/errors:
dtschema/dtc warnings/errors:
/builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-dc.yaml: properties:memory-region-names: 'anyOf' conditional failed, one must be fixed:
'maxitems' is not one of ['$ref', 'additionalItems', 'additionalProperties', 'allOf', 'anyOf', 'const', 'contains', 'default', 'dependencies', 'dependentRequired', 'dependentSchemas', 'deprecated', 'description', 'else', 'enum', 'exclusiveMaximum', 'exclusiveMinimum', 'items', 'if', 'minItems', 'minimum', 'maxItems', 'maximum', 'multipleOf', 'not', 'oneOf', 'pattern', 'patternProperties', 'properties', 'required', 'then', 'typeSize', 'unevaluatedProperties', 'uniqueItems']
'type' was expected
from schema $id: http://devicetree.org/meta-schemas/keywords.yaml
/builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-dc.yaml: properties:memory-region-names:items: {'enum': ['framebuffer', 'protected']} is not of type 'array'
from schema $id: http://devicetree.org/meta-schemas/string-array.yaml
/builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-dc.yaml: properties:memory-region-names: Additional properties are not allowed ('maxitems' was unexpected)
from schema $id: http://devicetree.org/meta-schemas/string-array.yaml
/builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-dc.yaml: properties:memory-region-names:items: {'enum': ['framebuffer', 'protected']} is not of type 'array'
from schema $id: http://devicetree.org/meta-schemas/string-array.yaml
/builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-dc.yaml: properties:memory-region-names: Additional properties are not allowed ('maxitems' was unexpected)
from schema $id: http://devicetree.org/meta-schemas/string-array.yaml
doc reference errors (make refcheckdocs):
See https://patchwork.kernel.org/project/devicetree/patch/20260701-tegra-vpr-v3…
The base for the series is generally the latest rc1. A different dependency
should be noted in *this* patch.
If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:
pip3 install dtschema --upgrade
Please check and re-submit after running the above command yourself. Note
that DT_SCHEMA_FILES can be set to your schema file to speed up checking
your schema. However, it must be unset to test all examples with your schema.
On Wed, 01 Jul 2026 18:08:12 +0200, Thierry Reding wrote:
> From: Thierry Reding <treding(a)nvidia.com>
>
> The Video Protection Region (VPR) found on NVIDIA Tegra chips is a
> region of memory that is protected from CPU accesses. It is used to
> decode and play back DRM protected content.
>
> It is a standard reserved memory region that can exist in two forms:
> static VPR where the base address and size are fixed (uses the "reg"
> property to describe the memory) and a resizable VPR where only the
> size is known upfront and the OS can allocate it wherever it can be
> accommodated.
>
> Reviewed-by: Rob Herring (Arm) <robh(a)kernel.org>
> Signed-off-by: Thierry Reding <treding(a)nvidia.com>
> ---
> Changes in v2:
> - add examples for fixed and resizable VPR
> ---
> .../nvidia,tegra-video-protection-region.yaml | 76 ++++++++++++++++++++++
> 1 file changed, 76 insertions(+)
>
My bot found errors running 'make dt_binding_check' on your patch:
yamllint warnings/errors:
dtschema/dtc warnings/errors:
/builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/reserved-memory/nvidia,tegra-video-protection-region.example.dtb: protected@2a8000000 (nvidia,tegra-video-protection-region): reg: [[2, 2818572288], [0, 1879048192]] is too long
from schema $id: http://devicetree.org/schemas/reserved-memory/nvidia,tegra-video-protection…
/builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/reserved-memory/nvidia,tegra-video-protection-region.example.dtb: protected@2a8000000 (nvidia,tegra-video-protection-region): Unevaluated properties are not allowed ('no-map', 'reg' were unexpected)
from schema $id: http://devicetree.org/schemas/reserved-memory/nvidia,tegra-video-protection…
doc reference errors (make refcheckdocs):
See https://patchwork.kernel.org/project/devicetree/patch/20260701-tegra-vpr-v3…
Every devmem dmabuf binding hands the page_pool PAGE_SIZE niovs today.
On NICs that consume one descriptor per netmem, this caps a single RX
descriptor at PAGE_SIZE and burns CPU on buffer churn.
In this series, we add a bind-time netlink attribute,
NETDEV_A_DMABUF_RX_BUF_SIZE, that lets userspace request a larger niov size
(power of two >= PAGE_SIZE). Drivers must opt in via
queue_mgmt_ops.QCFG_RX_PAGE_SIZE.
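The size constraint above (power of two, at least PAGE_SIZE) amounts to a check like the one below. This is a standalone sketch, not the kernel's validation code; rx_buf_size_ok() is a hypothetical name and the page size is passed in explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if v is a non-zero power of two. */
static bool is_pow2(size_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Hypothetical check mirroring the rule in the cover letter: the
 * requested niov size must be a power of two and at least one page.
 */
static bool rx_buf_size_ok(size_t size, size_t page_size)
{
	assert(is_pow2(page_size));
	return is_pow2(size) && size >= page_size;
}
```

With a 4K page size, 64K passes, while 2K (too small) and 12K (not a power of two) are rejected.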
Selftests use udmabuf, whose sgtables were previously hardcoded to
PAGE_SIZE entries. udmabuf now respects folio sizes in its exported sgtable
(as of v4 that fix is already in net-next, so the patch is no longer part of
this series). The result is that when backing udmabuf with MFD_HUGETLB 2MB
pages, the sgtable is populated with 2MB entries, allowing devmem's gen_pool
to carve out large (e.g. 64K) niovs.
Measurements
------------
Setup: kperf devmem RX/TX cuda, 4 flows, 64 MB messages, 60s, dctcp,
num-rx-queues=4, dmabuf-rx/tx-size-mb=2048, 10 runs per niov size,
mlx5.
niov   RX dev Gbps        RX flow avg Gbps   app sys %
-----  -----------------  -----------------  ----------------
4K     300.63 +/- 53.21   75.16 +/- 13.30    54.15 +/- 10.23
16K    321.35 +/- 28.20   80.34 +/-  7.05    41.05 +/-  8.87
32K    347.63 +/-  2.20   86.91 +/-  0.55    44.54 +/-  3.51
64K    332.11 +/- 14.26   83.03 +/-  3.56    35.47 +/-  3.11
RX app sys % drops ~19 percentage points from 4K to 64K (54.15% to 35.47%).
kperf support (not yet merged):
https://github.com/facebookexperimental/kperf/commit/8837577f920876bce6986e…
Signed-off-by: Bobby Eshleman <bobbyeshleman(a)meta.com>
---
Changes in v4:
- ncdevmem: fix the possible overflow in ncdevmem (Sashiko)
- drop the udmabuf patch because the fix is now already in net-next
- silenced two pylint complaints in devmem_lib.py
- Link to v3: https://lore.kernel.org/r/20260612-tcpdm-large-niovs-v3-0-a3b693e76fcb@meta…
Changes in v3:
- fix a bunch of non-reverse christmas tree declarations (Stan)
- remove extra uint32 cast for getpagesize() (Stan)
- remove overzealous strtoul checking (Stan)
- remove value checks that the kernel already performs on rx_buf_size
(Stan)
- Link to v2: https://lore.kernel.org/r/20260611-tcpdm-large-niovs-v2-0-ee2bf15e7523@meta…
Changes in v2:
- Use NL_SET_ERR_MSG_FMT for sg alignment failure details (Stan)
- Keep -E2BIG (not a direct ask, but seemed preferred, Stan)
- Update udmabuf commit message and comments explaining why
"one sg ent per folio" is useful (Christian)
- Set/restore nr_hugepages in py harness (Stan)
- Link to v1: https://lore.kernel.org/r/20260603-tcpdm-large-niovs-v1-0-f37a4ac6726c@meta…
---
Bobby Eshleman (3):
net: devmem: allow rx-buf-size > PAGE_SIZE per dmabuf binding
selftests/net: ncdevmem: add -b option to set rx-buf-size on bind
selftests/net: devmem.py: add check_rx_large_niov
Documentation/netlink/specs/netdev.yaml | 8 +++
include/uapi/linux/netdev.h | 1 +
net/core/devmem.c | 55 +++++++++++---------
net/core/devmem.h | 13 +++--
net/core/netdev-genl-gen.c | 5 +-
net/core/netdev-genl.c | 19 ++++++-
tools/include/uapi/linux/netdev.h | 1 +
tools/testing/selftests/drivers/net/hw/devmem.py | 12 ++++-
.../testing/selftests/drivers/net/hw/devmem_lib.py | 59 +++++++++++++++++++++-
tools/testing/selftests/drivers/net/hw/ncdevmem.c | 36 +++++++++++--
.../testing/selftests/drivers/net/hw/nk_devmem.py | 11 +++-
11 files changed, 180 insertions(+), 40 deletions(-)
---
base-commit: 805185b7c7a1069e407b6f7b3bc98e44d415f484
change-id: 20260602-tcpdm-large-niovs-56523a3a1077
Best regards,
--
Bobby Eshleman <bobbyeshleman(a)meta.com>
Both Tvrtko [1] and I [2] have recently proposed some improvements for
drm_sched.
While taking Tvrtko's feedback into account for my patch, I realized
that both his and my patch can be fully replaced with a bigger and far
more beautiful series.
If I am not mistaken, it turns out that the entire entity->entity_idle
completion is also nothing but a workaround for the grave mistake of
not using the greatest helper for parallel programming that exists in
computer science: Locking.
This series adds locking to the last_scheduled field and all checks
related to detecting the idleness of the entity. As before, the
job_scheduled event queue causes the periodic checks.
This way, we can get rid of memory barriers, RCU, a few lines of code,
make things more readable, understandable...
Tested with drm-sched-unit tests. I'm a bit busy right now, but wanted
to show you guys the idea. Before merging I'd test it more exhaustively
with Nouveau.
Greetings,
Philipp
[1] https://lore.kernel.org/dri-devel/20260611123423.39819-1-tvrtko.ursulin@iga…
[2] https://lore.kernel.org/dri-devel/20260626081942.2122144-2-phasta@kernel.or…
Philipp Stanner (5):
drm/sched: Protect entity->last_scheduled with spinlock
drm/sched: Lock spsc_queue in drm_sched_entity_pop_job()
drm/sched: Avoid lock cycle for sched_entity
drm/sched: Lock drm_sched_entity_is_idle()
drm/sched: Remove entity->entity_idle
drivers/gpu/drm/scheduler/sched_entity.c | 75 +++++++++++-------------
drivers/gpu/drm/scheduler/sched_main.c | 2 -
drivers/gpu/drm/scheduler/sched_rq.c | 5 +-
include/drm/gpu_scheduler.h | 16 ++---
4 files changed, 41 insertions(+), 57 deletions(-)
base-commit: be4f10d44757211fd656fa57f37034657f26c883
--
2.54.0
On Tue, 2026-06-30 at 12:04 -0400, Shahyan Soltani wrote:
> The num_fences, count, i, and j variables in dma_fence_dedup_array() and
> __dma_fence_unwrap_merge() have inconsistent integer types, mixing both
> unsigned int and int.
>
> Use type size_t consistently for these instead, and update the return
> type of dma_fence_dedup_array() accordingly.
>
> Signed-off-by: Shahyan Soltani <shahyan.soltani(a)amd.com>
> Suggested-by: Philipp Stanner <phasta(a)mailbox.org>
Thx for fixing this, cool work
Reviewed-by: Philipp Stanner <phasta(a)kernel.org>
> ---
> The rest of the subsystems (dma_resv_reserve_fences, drm_exec, drm_gpuvm,
> xe, nouveau, etc) uses "unsigned int" for num_fences, for example the
> amdgpu caller in amdgpu_userq_fence.c.
You mention that because you can't / won't change them?
My suggestion actually was to go for `unsigned int`. Christian
opined that it should be size_t. Shouldn't be a big deal, though; my
issue was just the possibility of negative numbers.
Christian, would it be a bit better to be consistent with the parties
Shahyan mentions?
P.
>
>  drivers/dma-buf/dma-fence-unwrap.c | 8 ++++----
>  include/linux/dma-fence-unwrap.h  | 6 ++++--
>  2 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence-unwrap.c b/drivers/dma-buf/dma-fence-unwrap.c
> index 53bb40e70b27..65e87d263c3a 100644
> --- a/drivers/dma-buf/dma-fence-unwrap.c
> +++ b/drivers/dma-buf/dma-fence-unwrap.c
> @@ -93,9 +93,9 @@ static int fence_cmp(const void *_a, const void *_b)
>  *
>  * Return: Number of unique fences remaining in the array.
>  */
> -int dma_fence_dedup_array(struct dma_fence **fences, int num_fences)
> +size_t dma_fence_dedup_array(struct dma_fence **fences, size_t num_fences)
>  {
> - int i, j;
> + size_t i, j;
>
>  sort(fences, num_fences, sizeof(*fences), fence_cmp, NULL);
>
> @@ -115,14 +115,14 @@ int dma_fence_dedup_array(struct dma_fence **fences, int num_fences)
>  EXPORT_SYMBOL_GPL(dma_fence_dedup_array);
>
>  /* Implementation for the dma_fence_merge() marco, don't use directly */
> -struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
> +struct dma_fence *__dma_fence_unwrap_merge(size_t num_fences,
>    struct dma_fence **fences,
>    struct dma_fence_unwrap *iter)
>  {
>  struct dma_fence *tmp, *unsignaled = NULL, **array;
>  struct dma_fence_array *result;
>  ktime_t timestamp;
> - int i, count;
> + size_t i, count;
>
>  count = 0;
>  timestamp = ns_to_ktime(0);
> diff --git a/include/linux/dma-fence-unwrap.h b/include/linux/dma-fence-unwrap.h
> index 62df222fe0f1..7bfacdf79de2 100644
> --- a/include/linux/dma-fence-unwrap.h
> +++ b/include/linux/dma-fence-unwrap.h
> @@ -8,6 +8,8 @@
>  #ifndef __LINUX_DMA_FENCE_UNWRAP_H
>  #define __LINUX_DMA_FENCE_UNWRAP_H
>
> +#include <linux/types.h>
> +
>  struct dma_fence;
>
>  /**
> @@ -48,11 +50,11 @@ struct dma_fence *dma_fence_unwrap_next(struct dma_fence_unwrap *cursor);
>  for (fence = dma_fence_unwrap_first(head, cursor); fence; \
>      fence = dma_fence_unwrap_next(cursor))
>
> -struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
> +struct dma_fence *__dma_fence_unwrap_merge(size_t num_fences,
>    struct dma_fence **fences,
>    struct dma_fence_unwrap *cursors);
>
> -int dma_fence_dedup_array(struct dma_fence **array, int num_fences);
> +size_t dma_fence_dedup_array(struct dma_fence **array, size_t num_fences);
>
>  /**
>  * dma_fence_unwrap_merge - unwrap and merge fences
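As a side note on the signed-vs-unsigned point, here is a tiny standalone sketch of an in-place dedup over a sorted array using size_t indices. It is not the kernel function: dedup_sorted() is a hypothetical name and plain ints stand in for fence pointers. With unsigned counts a negative length can no longer be passed in, but the empty case needs care because expressions like `n - 1` would wrap.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Remove adjacent duplicates from a sorted array in place and return the
 * number of unique elements. ints stand in for fence pointers here.
 */
static size_t dedup_sorted(int *v, size_t n)
{
	size_t i, j;

	assert(v || n == 0);
	if (n == 0)
		return 0;	/* nothing to keep; also avoids reading v[0] */

	/* j is the index of the last unique element kept so far. */
	for (i = 1, j = 0; i < n; i++) {
		if (v[i] != v[j])
			v[++j] = v[i];
	}
	return j + 1;
}
```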
In a recent discussion with Philipp and Danilo the question came up of what
had already been tried, but never finished, to clean up the dma_fence framework.
So here are the different ideas I came up with but never fully finished,
with the patches themselves modernized and rebased on top of drm-misc-next.
The main goal of those changes is to make it easier to implement dma_fence
backends and to not enforce unnecessary constraints on implementations.
As a first step, the locking around the dma_fence_ops.signaled callback is
made consistent by removing the dma_fence_is_signaled_locked() function.
This was mostly used by the backends themselves, but if polling the HW is
desired the backends can call their own functions for this directly without
going through the dma-fence layer.
XE actually seems to be the only driver which makes use of that for a bit
more handling. For all other cases testing the signaled flag should be enough.
Then forcefully calling dma_fence_signal() is removed from the dma-fence
layer and moved into the backend implementations.
This allows the backend implementations to clean up after they have
signaled the fence. Such cleanup can include removing now-signaled fences
from lists, dropping references, starting work, etc.
Especially nouveau seems to have some really messy workarounds because of
that, involving the DMA_FENCE_FLAG_USER_BITS and installing callbacks,
because the reference to the context couldn't be dropped directly after
signaling. This can now be cleaned up as far as I can see.
In the long term this should also allow reworking the error handling, e.g.
removing dma_fence_set_error() and instead giving the error as a mandatory
parameter to dma_fence_signal().
Then the last piece is to stop calling the enable_signaling callback with
the dma_fence lock held. This makes it possible for backends to acquire
locks which are semantically ordered outside of the dma_fence lock.
This is necessary to allow using the dma_fence inline lock in more cases;
previously backends used some common external lock for their dma_fences to,
for example, make it possible to remove fences from linked lists.
Please comment and review,
Christian.
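To illustrate the second step of the cover letter above (signaling moved out of the core layer and into the backend, so that the backend can clean up right after signaling), here is a userspace toy model. It is only a sketch: struct model_fence, struct model_backend and all function names are hypothetical and are not the dma_fence API, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified fence: a flag, a seqno and a list link. */
struct model_fence {
	bool signaled;
	unsigned int seqno;
	struct model_fence *next;	/* link in the backend's pending list */
};

/* Hypothetical backend state: last seqno the HW completed, pending fences. */
struct model_backend {
	unsigned int hw_seqno;
	struct model_fence *pending;
};

/* Core-layer style check: only tests the flag, never signals on its own. */
static bool model_fence_test_signaled(const struct model_fence *f)
{
	return f->signaled;
}

/*
 * Backend-side poll: the backend signals completed fences itself and can
 * therefore clean up (here: unlink from the pending list) right afterwards.
 * Returns the number of fences signaled by this call.
 */
static unsigned int model_backend_poll(struct model_backend *b)
{
	struct model_fence **link = &b->pending;
	unsigned int n = 0;

	while (*link) {
		struct model_fence *f = *link;

		if (f->seqno <= b->hw_seqno) {
			f->signaled = true;	/* signal ... */
			*link = f->next;	/* ... then clean up */
			f->next = NULL;
			n++;
		} else {
			link = &f->next;
		}
	}
	return n;
}
```

In this model a caller that only wants to know the state uses model_fence_test_signaled(), while the backend decides when to poll the hardware and what to tear down afterwards.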
Currently, `fill_sg_entry()` splits the scatterlist using `UINT_MAX`.
This creates a non-page-aligned DMA length (`0xFFFFFFFF`) for the
first entry, resulting in non-page-aligned DMA addresses for all
subsequent entries.
While the underlying IOMMU mapping may be contiguous, hardware
DMA engines often require explicit address alignment (e.g., page,
cacheline, or storage sector boundaries). Passing unaligned
addresses and lengths can cause explicit failures in DMA descriptor
creation or silent data corruption if lower unaligned bits are
truncated.
Fix this by splitting the scatterlist by the largest possible page
aligned chunk within `UINT_MAX` (`ALIGN_DOWN(UINT_MAX, PAGE_SIZE)`).
This ensures all scatterlist DMA addresses and lengths remain page
aligned and satisfy hardware constraints.
Page-aligned entries also divide evenly into PCIe MaxPayloadSize (MPS)
units (e.g., 128, 256, or 512 bytes).
As a result, this may help reduce TLP fragmentation in P2P transfers
and alleviate potential congestion within a logical PCIe switch
partition, especially when Relaxed Ordering is not possible due to
hardware constraints.
Reported-by: sashiko-bot <sashiko-bot(a)kernel.org>
Closes: https://lore.kernel.org/all/20260609165431.778061F00893@smtp.kernel.org/
Fixes: 3aa31a8bb11e ("dma-buf: provide phys_vec to scatter-gather mapping routine")
Cc: stable(a)vger.kernel.org
Signed-off-by: David Hu <xuehaohu(a)google.com>
---
drivers/dma-buf/dma-buf-mapping.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/dma-buf/dma-buf-mapping.c b/drivers/dma-buf/dma-buf-mapping.c
index 794acff2546a..f2bde38fdb1f 100644
--- a/drivers/dma-buf/dma-buf-mapping.c
+++ b/drivers/dma-buf/dma-buf-mapping.c
@@ -5,6 +5,9 @@
*/
#include <linux/dma-buf-mapping.h>
#include <linux/dma-resv.h>
+#include <linux/align.h>
+
+#define MAX_ENT_SZ ALIGN_DOWN(UINT_MAX, PAGE_SIZE)
static struct scatterlist *fill_sg_entry(struct scatterlist *sgl, size_t length,
dma_addr_t addr)
@@ -12,9 +15,9 @@ static struct scatterlist *fill_sg_entry(struct scatterlist *sgl, size_t length,
unsigned int len, nents;
int i;
- nents = DIV_ROUND_UP(length, UINT_MAX);
+ nents = DIV_ROUND_UP(length, MAX_ENT_SZ);
for (i = 0; i < nents; i++) {
- len = min_t(size_t, length, UINT_MAX);
+ len = min_t(size_t, length, MAX_ENT_SZ);
length -= len;
/*
* DMABUF abuses scatterlist to create a scatterlist
@@ -24,7 +27,7 @@ static struct scatterlist *fill_sg_entry(struct scatterlist *sgl, size_t length,
* does not require the CPU list for mapping or unmapping.
*/
sg_set_page(sgl, NULL, 0, 0);
- sg_dma_address(sgl) = addr + (dma_addr_t)i * UINT_MAX;
+ sg_dma_address(sgl) = addr + (dma_addr_t)i * MAX_ENT_SZ;
sg_dma_len(sgl) = len;
sgl = sg_next(sgl);
}
@@ -41,14 +44,14 @@ static unsigned int calc_sg_nents(struct dma_iova_state *state,
if (!state || !dma_use_iova(state)) {
for (i = 0; i < nr_ranges; i++)
- nents += DIV_ROUND_UP(phys_vec[i].len, UINT_MAX);
+ nents += DIV_ROUND_UP(phys_vec[i].len, MAX_ENT_SZ);
} else {
/*
* In IOVA case, there is only one SG entry which spans
* for whole IOVA address space, but we need to make sure
* that it fits sg->length, maybe we need more.
*/
- nents = DIV_ROUND_UP(size, UINT_MAX);
+ nents = DIV_ROUND_UP(size, MAX_ENT_SZ);
}
return nents;
--
2.55.0.rc0.738.g0c8ab3ebcc-goog
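The alignment arithmetic behind the patch above can be checked in userspace. The following is a minimal sketch, assuming a 32-bit unsigned int (UINT_MAX == 0xFFFFFFFF) and a caller-supplied page size; align_down(), max_ent_sz(), split_nents() and split_addr() are illustrative helpers, not kernel functions, although they mirror ALIGN_DOWN(), MAX_ENT_SZ, DIV_ROUND_UP() and the address computation in fill_sg_entry().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for ALIGN_DOWN(); align must be a power of two. */
static uint64_t align_down(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/* Largest page-aligned length that still fits in an unsigned int. */
static uint64_t max_ent_sz(uint64_t page_size)
{
	return align_down(UINT_MAX, page_size);
}

/* Number of entries needed for length bytes, as DIV_ROUND_UP() computes. */
static uint64_t split_nents(uint64_t length, uint64_t chunk)
{
	return (length + chunk - 1) / chunk;
}

/* DMA address of entry i when a range starting at addr is split by chunk. */
static uint64_t split_addr(uint64_t addr, uint64_t i, uint64_t chunk)
{
	return addr + i * chunk;
}
```

With 4 KiB pages, max_ent_sz(4096) is 0xFFFFF000. Splitting a range that starts at 0x100000000 by UINT_MAX puts the second entry at 0x1FFFFFFFF, which is not page aligned, while splitting by 0xFFFFF000 puts it at 0x1FFFFF000, which is.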
On Tue, 2026-06-30 at 10:23 +0100, Tvrtko Ursulin wrote:
>
> On 26/06/2026 09:19, Philipp Stanner wrote:
> > The entity->last_scheduled field has always been set and read with
> > special RCU functions in addition to memory barriers. There is no
> > obvious reason for that, since the entity lock is available and taken at
> > all places that evaluate the last_scheduled field. The only exception is
> > drm_sched_entity_error(), which is not performance critical in any way.
>
> I agree this looks odd since all call sites apart from
> drm_sched_entity_error() use
> "rcu_dereference_check(entity->last_scheduled, true);" ie. "ignore" the RCU.
>
> Btw this was added in:
>
> commit 70102d77ff22dd88a0111b1c3bac5099ac5d0425
> Author: Christian König <christian.koenig(a)amd.com>
> Date:   Mon Apr 17 17:32:11 2023 +0200
>
>     drm/scheduler: add drm_sched_entity_error and use rcu for last_scheduled
>
> You may want to add this as a reference in the commit message.
I did git-blame for that commit. It looks like this:

    drm/scheduler: add drm_sched_entity_error and use rcu for last_scheduled

    Switch to using RCU handling for the last scheduled job and add a
    function to return the error code of it.

It's a good example of why I think it's so vital to write verbose
commit messages. The only way to find out why this was added is to ask
the author, if he's still around [which is the case in this case].
I can't see the value of adding a link? That commit says "add foo" and
my commit says "remove foo because it achieves nothing".
> I guess it relied on dma-fence RCU destruction to enable lockless
> lookups from the AMD submit path. Given how many other locks we have in
> those paths it is probably noise to have one more so maybe it is a win
> to remove some barriers and those rcu_dereference_check-true lines. I
> think Christian will need to comment.
My argument is more that locks are the right tool to use unless there
is proof to the contrary.
>
> > Improve robustness, readability and maintainability by replacing RCU and
> > barriers with the lock.
> >
> > As a preparational step, while at it, also guard spsc_queue_pop() with
> > the lock, since spsc_queue is deprecated and supposed to be replaced
> > with a locked list.
>
> You would have said to split the logical changes into separate patches.
Me? :D
In this case, a lock that did not exist is added from nowhere. But I
tend to think that you are right. We could leave spsc_queue lockless
for now. That's cleaner.
>
> >
[…]
> >
> >  struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
> >  {
> > + /* Helper to avoid dropping the reference while the entity lock is held,
> > + * just to have some more robustness.
> > + */
> > + struct dma_fence *prev_last_scheduled;
> >   struct drm_sched_job *sched_job;
> >
> >   sched_job = drm_sched_entity_queue_peek(entity);
> > @@ -523,19 +532,20 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
> >   if (entity->guilty && atomic_read(entity->guilty))
> >   dma_fence_set_error(&sched_job->s_fence->finished, -ECANCELED);
> >
> > - dma_fence_put(rcu_dereference_check(entity->last_scheduled, true));
> > - rcu_assign_pointer(entity->last_scheduled,
> > -   dma_fence_get(&sched_job->s_fence->finished));
> > + spin_lock(&entity->lock);
> > + prev_last_scheduled = entity->last_scheduled;
> > + entity->last_scheduled = dma_fence_get(&sched_job->s_fence->finished);
> >
> > - /*
> > - * If the queue is empty we allow drm_sched_entity_select_rq() to
> > - * locklessly access ->last_scheduled. This only works if we set the
> > - * pointer before we dequeue and if we a write barrier here.
> > + /* A recent rework required taking the spinlock above. Since spsc_queue
> > + * is scheduled for removal as per the DRM-TODO-list, we access it here
> > + * locked already to prepare for that cleanup.
> > + *
> > + * TODO: Fully replace spsc_queue with a locked (h)list.
> >   */
> > - smp_wmb();
> > -
> >   spsc_queue_pop(&entity->job_queue);
> > + spin_unlock(&entity->lock);
> >
> > + dma_fence_put(prev_last_scheduled);
> >   drm_sched_rq_pop_entity(entity);
>
> Notice the entity->lock ends up cycled twice for no good reason (second
Getting rid of hard-to-understand barriers + RCU *is* a _very_ good
reason.
> is in drm_sched_rq_pop_entity()). So I would suggest you somehow reduce
> that to once. Probably just pull out entity->lock out of the
> drm_sched_rq_pop_entity() to drm_sched_entity_pop_job()?
Do you see a danger of a significant performance regression because of
that?
>
> I guess if you do that then the "while at it" part of the commit message
> can be "upgraded" to "spsc_queue_pop() being under the lock as a
> consequence of the rework" and then no need to split it.
I agree with you that it should be *downgraded* instead.
>
> >
> >   /* Jobs and entities might have different lifecycles. Since we're
> > @@ -561,21 +571,15 @@ void drm_sched_entity_select_rq(struct drm_sched_entity *entity)
> >   if (spsc_queue_count(&entity->job_queue))
> >   return;
> >
> > - /*
> > - * Only when the queue is empty are we guaranteed that
> > - * drm_sched_run_job_work() cannot change entity->last_scheduled. To
> > - * enforce ordering we need a read barrier here. See
> > - * drm_sched_entity_pop_job() for the other side.
> > - */
> > - smp_rmb();
> > -
> > - fence = rcu_dereference_check(entity->last_scheduled, true);
> > + spin_lock(&entity->lock);
> > + fence = entity->last_scheduled;
> >
> >   /* stay on the same engine if the previous job hasn't finished */
> > - if (fence && !dma_fence_is_signaled(fence))
> > + if (fence && !dma_fence_is_signaled(fence)) {
> > + spin_unlock(&entity->lock);
>
> Have you tried with lockdep to see if there are any hidden lock
> inversions with this?
As far as I could grep, really no one touches the entity lock (which is
not surprising, since the entire drm_sched design revolves around the
central philosophy: "NEVER use a spinlock unless you absolutely have
to"). When you look at the old code and documentation, you see that
locks were really only ever used to protect lists.
Anyways. This is the scheduler's fence. It can never implement any
callback to someone who might interfere with the entity lock, can it?
>
> I also wonder if we could demote this to a flag check only and remove
> any doubt. I don't think opportunistic signalling matter in this code path.
With the new fence API, where we can bypass the ops, that would
probably be the more canonical code. But that's then indeed something
for a separate patch.
P.
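
The pattern discussed in this reply (swap the pointer while holding the lock, drop the old reference only after unlocking) can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the scheduler code: struct toy_fence, struct toy_entity and the toy_*() helpers are hypothetical, a pthread mutex stands in for the entity spinlock, and the plain int refcount stands in for the atomic kref that a real dma_fence uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted fence (non-atomic count). */
struct toy_fence {
	int refcount;
};

/* Hypothetical stand-in for the entity: one lock guarding one pointer. */
struct toy_entity {
	pthread_mutex_t lock;
	struct toy_fence *last_scheduled;
};

/* Take a reference; NULL is passed through unchanged. */
static struct toy_fence *toy_fence_get(struct toy_fence *f)
{
	if (f)
		f->refcount++;
	return f;
}

/* Drop a reference; NULL is ignored. */
static void toy_fence_put(struct toy_fence *f)
{
	if (f)
		f->refcount--;
}

/* Swap the pointer under the lock; drop the old reference after unlocking. */
static void toy_entity_set_last_scheduled(struct toy_entity *e,
					  struct toy_fence *next)
{
	struct toy_fence *prev;

	pthread_mutex_lock(&e->lock);
	prev = e->last_scheduled;
	e->last_scheduled = toy_fence_get(next);
	pthread_mutex_unlock(&e->lock);

	/* Outside the lock, so a release path can never run with it held. */
	toy_fence_put(prev);
}
```

Keeping the put outside the critical section is what the prev_last_scheduled helper variable in the quoted patch is for.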