Hi,
On Mon, Oct 23, 2023 at 10:25:50AM -0700, Doug Anderson wrote:
> On Mon, Oct 23, 2023 at 9:31 AM Yuran Pereira <yuran.pereira(a)hotmail.com> wrote:
> >
> > Since "Clean up checks for already prepared/enabled in panels" has
> > already been done and merged [1], I think there is no longer a need
> > for this item to be in the gpu TODO.
> >
> > [1] https://patchwork.freedesktop.org/patch/551421/
> >
> > Signed-off-by: Yuran Pereira <yuran.pereira(a)hotmail.com>
> > ---
> > Documentation/gpu/todo.rst | 25 -------------------------
> > 1 file changed, 25 deletions(-)
>
> It's not actually all done. It's in a bit of a limbo state right now,
> unfortunately. I landed all of the "simple" cases where panels were
> needlessly tracking prepare/enable, but the less simple cases are
> still outstanding.
>
> Specifically the issue is that many panels have code to properly power
> cycle themselves off at shutdown time and in order to do that they
> need to keep track of the prepare/enable state. After a big, long
> discussion [1] it was decided that we could get rid of all the panel
> code handling shutdown if only all relevant DRM KMS drivers would
> properly call drm_atomic_helper_shutdown().
>
> I made an attempt to get DRM KMS drivers to call
> drm_atomic_helper_shutdown() [2] [3] [4]. I was able to land the
> patches that went through drm-misc, but currently many of the
> non-drm-misc ones are blocked waiting for attention.
>
> ...so things that could be done to help out:
>
> a) Could review patches that haven't landed in [4]. Maybe adding a
> Reviewed-by tag would help wake up maintainers?
>
> b) Could see if you can identify panels that are exclusively used w/
> DRM drivers that have already been converted and then we could post
> patches for just those panels. I have no idea how easy this task would
> be. Is it enough to look at upstream dts files by "compatible" string?
I think it is, yes.
Maxime
Feeling the need for speed and a bit of winter fun, even when the weather outside is frightful? Then maybe it’s time to check out Snow Rider 3D. This simple but surprisingly addictive game offers a thrill of downhill skiing and snowboarding right from your browser, no downloads required. Let’s break down how to jump in and start enjoying this surprisingly engaging title.
https://snowriderfree.com/
Gameplay: Simple Controls, Endless Possibilities
The core gameplay of Snow Rider 3D is deceptively straightforward. You control your character's direction using the left and right arrow keys (or A and D). Your objective? Navigate through a series of procedurally generated slopes littered with obstacles. These obstacles range from simple ramps and rails to more challenging hazards like trees, snowdrifts, and even abandoned shacks.
The beauty of Snow Rider 3D lies in its physics. While simple, they feel surprisingly realistic. You'll need to anticipate turns, adjust your speed, and time your jumps to successfully navigate the terrain. A crash will reset you to the beginning of the course, so precision and patience are key.
The game offers different levels, each presenting a unique challenge. Some focus on speed and long jumps, while others demand skillful maneuvering through tight spaces. As you progress, you unlock new skins and sleds, adding a touch of customization to your experience. Think of it as a casual time-killer that can quickly turn into an hour-long obsession!
Tips for Mastering the Mountain:
Alright, so you're ready to hit the slopes. Here are a few tips to help you improve your runs and avoid those frustrating wipeouts:
Practice Makes Perfect: Don't get discouraged by early crashes. The more you play, the better you'll understand the physics and learn to anticipate the terrain.
Master the Turns: Smooth, controlled turns are essential for maintaining speed and avoiding obstacles. Practice feathering the arrow keys to make subtle adjustments.
Timing is Everything: When approaching jumps and ramps, pay close attention to your speed and angle. A well-timed jump can make all the difference.
Don't Be Afraid to Slow Down: Sometimes, the fastest route isn't the safest. Don't be afraid to ease off the gas and navigate tricky sections with caution. Consider looking up guides for specific levels of Snow Rider 3D at websites like Snow Rider 3D if you’re really struggling.
Experiment with Sleds and Skins: Different sleds may offer slight variations in handling. Try out different options to find one that suits your playstyle.
Conclusion: A Fun and Accessible Winter Escape
Snow Rider 3D is a surprisingly addictive and accessible game that’s perfect for a quick dose of winter fun. It's simple controls and challenging gameplay make it easy to pick up and play, while its procedural generation ensures that each run is a unique experience. So, whether you're looking for a casual time-killer or a challenging skill-based game, Snow Rider 3D is definitely worth checking out.
Ready to unleash your inner fruit ninja without the mess? Then get ready to dive into the addictively simple, yet surprisingly challenging world of Slice Master. This game, readily available online, is perfect for a quick burst of fun or a more extended gaming session. It’s a testament to the fact that gameplay doesn't need to be complex to be engaging.
https://slicemasterfree.com
Gameplay: Simple Mechanics, Endless Fun
The core concept of Slice Master is refreshingly straightforward. Colorful fruits are launched into the air, and your mission is to slice them into pieces before they fall off the screen. You control a virtual blade with your mouse or finger (depending on the platform), and drawing lines through the fruit initiates the slicing action.
The catch? You have limited lives, and letting too many fruits fall untouched will result in a game over. Occasionally, you'll also encounter bombs mixed in with the fruit barrage. Accidentally slicing a bomb will end your run instantly, adding a layer of strategic thinking to the rapid-fire action.
As you progress, the game throws different types of fruit at you, some requiring multiple slices, and the speed increases gradually, demanding faster reflexes and more precise movements. Special fruits might offer score multipliers or other benefits, adding further depth to the gameplay. It’s a game where practice truly makes perfect, and mastering the art of fruit slicing is incredibly satisfying. You can try it out now by clicking on Slice Master.
Tips for Achieving Fruit-Slicing Mastery
While the game seems simple on the surface, a few strategies can significantly improve your score and extend your gameplay.
• Focus on Efficiency: Instead of frantically slashing at individual fruits, try to slice multiple fruits with a single, well-aimed swipe. This not only increases your score but also conserves your limited slicing time.
• Prioritize High-Value Fruits: Keep an eye out for special fruits that offer bonus points or multipliers. Slicing these at the right moment can dramatically boost your score.
• Be Mindful of Bombs: This one is crucial! Always be aware of the position of the bombs and avoid them at all costs. A moment of carelessness can instantly end your game. Try to train yourself to recognize them early and plan your slices accordingly.
• Practice Makes Perfect: Like any skill-based game, practice is essential for improving your reflexes and accuracy. The more you play, the better you'll become at predicting fruit trajectories and executing precise slices. So, keep practicing and you'll be reaching new high scores in no time!
In Conclusion: A Slice of Addictive Fun
Slice Master offers a surprisingly addictive and engaging gaming experience, despite its simple premise. Its accessible gameplay, combined with the escalating challenge, makes it a perfect choice for a quick dose of entertainment or a more extended gaming session. Whether you're looking for a casual distraction or a skill-based challenge, Slice Master provides a satisfying and fun way to test your reflexes and accuracy. So, grab your virtual blade and prepare to unleash your inner fruit-slicing ninja!
When dumping IB contents from a hung job, amdgpu_devcoredump_format()
acquires the VM root PD's reservation lock via amdgpu_vm_lock_by_pasid()
and then, for each IB referenced by the job, calls amdgpu_bo_reserve()
on the BO that backs the IB. Both reservations are taken on
reservation_ww_class_mutex objects but neither uses a ww_acquire_ctx,
which trips lockdep:
WARNING: possible recursive locking detected
--------------------------------------------
kworker/u128:0 is trying to acquire lock:
ffff88838b16e1f0 (reservation_ww_class_mutex){+.+.}-{4:4},
at: amdgpu_devcoredump_format+0x1594/0x23f0 [amdgpu]
but task is already holding lock:
ffff8882f82681f0 (reservation_ww_class_mutex){+.+.}-{4:4},
at: amdgpu_devcoredump_format+0x1594/0x23f0 [amdgpu]
Possible unsafe locking scenario:
CPU0
----
lock(reservation_ww_class_mutex);
lock(reservation_ww_class_mutex);
*** DEADLOCK ***
May be due to missing lock nesting notation
Workqueue: events_unbound amdgpu_devcoredump_deferred_work [amdgpu]
Call Trace:
__ww_mutex_lock.constprop.0
ww_mutex_lock
amdgpu_bo_reserve
amdgpu_devcoredump_format+0x1594 [amdgpu]
amdgpu_devcoredump_deferred_work+0xea [amdgpu]
process_one_work
worker_thread
kthread
The two reservations are on different BOs in the captured trace, so the
splat is a lockdep-correctness warning, not an observed deadlock. It
becomes a real self-deadlock whenever the IB BO shares its dma_resv
with the root PD (the always-valid case, see
amdgpu_vm_is_bo_always_valid()): amdgpu_bo_reserve(abo) re-acquires the
same ww_mutex without a ticket and blocks forever.
With amdgpu.gpu_recovery=0 the timeout handler refires every ~2 s and
each invocation produces this splat, drowning the kernel ring buffer.
Fix it by collecting the per-IB BO references under the root PD's
reservation, then releasing the root before reserving each IB BO
individually. The walk over the VM mapping tree must remain under the
root lock (mappings can be torn down without it), but the actual
content copies do not need to nest inside it. Each per-IB reservation
is now an independent top-level acquire, eliminating the nested
ww_mutex.
The collect/release logic is factored out into two small helpers
(amdgpu_devcoredump_collect_ib_refs / amdgpu_devcoredump_release_ib_refs)
to keep the main function's indentation reasonable.
This also fixes a BO refcount leak in the original code: when
amdgpu_bo_reserve() failed, control jumped to free_ib_content without
running amdgpu_bo_unref(). In the new structure the per-IB BO refs
are released unconditionally in the cleanup helper.
Reproducer (~150 LoC libdrm_amdgpu): submit a single GFX IB containing
PACKET3_INDIRECT_BUFFER chained at GPU VA 0 and wait for the fence.
The TDR fires within ~10 s and the deferred coredump worker produces
the splat above on every invocation.
Fixes: 7b15fc2d1f1a ("drm/amdgpu: dump job ibs in the devcoredump")
Cc: stable(a)vger.kernel.org # 7.1
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov(a)gmail.com>
---
.../gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c | 147 +++++++++++++-----
1 file changed, 110 insertions(+), 37 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
index d386bc775d03..f6bb968de756 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
@@ -207,6 +207,72 @@ static void amdgpu_devcoredump_fw_info(struct amdgpu_device *adev,
}
}
+struct amdgpu_devcoredump_ib_ref {
+ struct amdgpu_bo *bo;
+ u64 offset;
+};
+
+/*
+ * Walk the VM's mapping tree under the root PD's reservation to obtain the BO
+ * that backs each IB and pin it with a refcount. The root PD reservation is
+ * dropped before this function returns; the caller can then reserve each IB
+ * BO individually without nesting ww_mutex acquires on
+ * reservation_ww_class_mutex.
+ *
+ * Returns an array of num_ibs entries (each ib_refs[i].bo may be NULL if its
+ * mapping was not found), or NULL on allocation failure / VM lookup failure.
+ * The caller must release the BO refs and free the array.
+ */
+static struct amdgpu_devcoredump_ib_ref *
+amdgpu_devcoredump_collect_ib_refs(struct amdgpu_device *adev,
+ struct amdgpu_coredump_info *coredump)
+{
+ struct amdgpu_devcoredump_ib_ref *ib_refs;
+ struct amdgpu_bo_va_mapping *mapping;
+ struct amdgpu_bo *root;
+ struct amdgpu_vm *vm;
+ u64 va_start;
+
+ ib_refs = kcalloc(coredump->num_ibs, sizeof(*ib_refs), GFP_KERNEL);
+ if (!ib_refs)
+ return NULL;
+
+ vm = amdgpu_vm_lock_by_pasid(adev, &root, coredump->pasid);
+ if (!vm) {
+ kfree(ib_refs);
+ return NULL;
+ }
+
+ for (int i = 0; i < coredump->num_ibs; i++) {
+ va_start = coredump->ibs[i].gpu_addr & AMDGPU_GMC_HOLE_MASK;
+ mapping = amdgpu_vm_bo_lookup_mapping(vm, va_start / AMDGPU_GPU_PAGE_SIZE);
+ if (!mapping)
+ continue;
+
+ ib_refs[i].bo = amdgpu_bo_ref(mapping->bo_va->base.bo);
+ ib_refs[i].offset = va_start -
+ mapping->start * AMDGPU_GPU_PAGE_SIZE;
+ }
+
+ amdgpu_bo_unreserve(root);
+ amdgpu_bo_unref(&root);
+
+ return ib_refs;
+}
+
+static void
+amdgpu_devcoredump_release_ib_refs(struct amdgpu_devcoredump_ib_ref *ib_refs,
+ int num_ibs)
+{
+ if (!ib_refs)
+ return;
+
+ for (int i = 0; i < num_ibs; i++)
+ if (ib_refs[i].bo)
+ amdgpu_bo_unref(&ib_refs[i].bo);
+ kfree(ib_refs);
+}
+
static ssize_t
amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_info *coredump)
{
@@ -214,13 +280,11 @@ amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_inf
struct drm_printer p;
struct drm_print_iterator iter;
struct amdgpu_vm_fault_info *fault_info;
- struct amdgpu_bo_va_mapping *mapping;
struct amdgpu_ip_block *ip_block;
struct amdgpu_res_cursor cursor;
- struct amdgpu_bo *abo, *root;
- uint64_t va_start, offset;
+ struct amdgpu_bo *abo;
+ uint64_t offset;
struct amdgpu_ring *ring;
- struct amdgpu_vm *vm;
u32 *ib_content;
uint8_t *kptr;
int ver, i, j, r;
@@ -343,43 +407,52 @@ amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_inf
drm_printf(&p, "VRAM is lost due to GPU reset!\n");
if (coredump->num_ibs) {
- /* Don't try to lookup the VM or map the BOs when calculating the
- * size required to store the devcoredump.
+ struct amdgpu_devcoredump_ib_ref *ib_refs = NULL;
+
+ /*
+ * Snapshot per-IB BO references under the root PD's reservation,
+ * then release the root before reserving each IB BO individually
+ * to copy its contents.
+ *
+ * Reserving an IB BO while the root PD is still reserved would
+ * be a nested ww_mutex acquire on reservation_ww_class_mutex
+ * without a ww_acquire_ctx, which trips lockdep's recursive-
+ * locking check and self-deadlocks for IB BOs that share their
+ * dma_resv with the root PD (always-valid BOs).
+ *
+ * Skip lookup/reservation entirely on the sizing pass: it does
+ * not write IB content, and the size estimate doesn't depend on
+ * whether the BOs are reachable.
*/
- if (sizing_pass)
- vm = NULL;
- else
- vm = amdgpu_vm_lock_by_pasid(adev, &root, coredump->pasid);
+ if (!sizing_pass)
+ ib_refs = amdgpu_devcoredump_collect_ib_refs(adev, coredump);
- for (int i = 0; i < coredump->num_ibs && (sizing_pass || vm); i++) {
+ for (int i = 0; i < coredump->num_ibs; i++) {
ib_content = kvmalloc_array(coredump->ibs[i].ib_size_dw, 4,
GFP_KERNEL);
if (!ib_content)
continue;
- /* vm=NULL can only happen when 'sizing_pass' is true. Skip to the
- * drm_printf() calls (ib_content doesn't need to be initialized
- * as its content won't be written anywhere).
- */
- if (!vm)
+ if (sizing_pass)
goto output_ib_content;
- va_start = coredump->ibs[i].gpu_addr & AMDGPU_GMC_HOLE_MASK;
- mapping = amdgpu_vm_bo_lookup_mapping(vm, va_start / AMDGPU_GPU_PAGE_SIZE);
- if (!mapping)
- goto free_ib_content;
+ if (!ib_refs || !ib_refs[i].bo)
+ goto output_ib_content;
+
+ abo = ib_refs[i].bo;
+ offset = ib_refs[i].offset;
- offset = va_start - (mapping->start * AMDGPU_GPU_PAGE_SIZE);
- abo = amdgpu_bo_ref(mapping->bo_va->base.bo);
r = amdgpu_bo_reserve(abo, false);
if (r)
- goto free_ib_content;
+ goto output_ib_content;
if (abo->flags & AMDGPU_GEM_CREATE_NO_CPU_ACCESS) {
off = 0;
- if (abo->tbo.resource->mem_type != TTM_PL_VRAM)
- goto unreserve_abo;
+ if (abo->tbo.resource->mem_type != TTM_PL_VRAM) {
+ amdgpu_bo_unreserve(abo);
+ goto output_ib_content;
+ }
amdgpu_res_first(abo->tbo.resource, offset,
coredump->ibs[i].ib_size_dw * 4,
@@ -395,8 +468,10 @@ amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_inf
r = ttm_bo_kmap(&abo->tbo, 0,
PFN_UP(abo->tbo.base.size),
&abo->kmap);
- if (r)
- goto unreserve_abo;
+ if (r) {
+ amdgpu_bo_unreserve(abo);
+ goto output_ib_content;
+ }
kptr = amdgpu_bo_kptr(abo);
kptr += offset;
@@ -406,21 +481,19 @@ amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_inf
amdgpu_bo_kunmap(abo);
}
+ amdgpu_bo_unreserve(abo);
+
output_ib_content:
drm_printf(&p, "\nIB #%d 0x%llx %d dw\n",
i, coredump->ibs[i].gpu_addr, coredump->ibs[i].ib_size_dw);
- for (int j = 0; j < coredump->ibs[i].ib_size_dw; j++)
- drm_printf(&p, "0x%08x\n", ib_content[j]);
-unreserve_abo:
- if (vm)
- amdgpu_bo_unreserve(abo);
-free_ib_content:
+ if (!sizing_pass && ib_refs && ib_refs[i].bo) {
+ for (int j = 0; j < coredump->ibs[i].ib_size_dw; j++)
+ drm_printf(&p, "0x%08x\n", ib_content[j]);
+ }
kvfree(ib_content);
}
- if (vm) {
- amdgpu_bo_unreserve(root);
- amdgpu_bo_unref(&root);
- }
+
+ amdgpu_devcoredump_release_ib_refs(ib_refs, coredump->num_ibs);
}
return count - iter.remain;
--
2.54.0
The patch set allows to register a dmabuf to an io_uring instance for
a specified file and use it with io_uring read / write requests. The
infrastructure is not tied to io_uring and there could be more users
in the future. A similar idea was attempted some years ago by Keith [1],
from where I borrowed a good number of changes, and later was brough up
by Tushar and Vishal from Intel.
It's an opt-in feature for files, and they need to implement a new
file operation to use it. Only NVMe block devices are supported in this
series. The user API is built on top of io_uring's "registered buffers",
where a dmabuf is registered in a special way, but after it can be used
as any other "registered buffer" with IORING_OP_{READ,WRITE}_FIXED
requests. It's created via a new file operation and the resulted map is
then passed through the I/O stack in a new iterator type. There is some
additional infrastructure to bind it all, which also counts requests
using a dmabuf map and managing lifetimes, which is used to implement
map invalidation.
It was tested for GPU <-> NVMe transfers. Also, as it maintains a
long-term dma mapping, it helps with the IOMMU cost. The numbers
below are for udmabuf reads previously run by Anuj for different
IOMMU modes:
- STRICT: before = 570 KIOPS, after = 5.01 MIOPS
- LAZY: before = 1.93 MIOPS, after = 5.01 MIOPS
- PASSTHROUGH: before = 5.01 MIOPS, after = 5.01 MIOPS
There are some liburing tests that can serve as an example:
git: https://github.com/isilence/liburing.git rw-dmabuf-tests-v3
url: https://github.com/isilence/liburing/tree/rw-dmabuf-tests-v3
[1] https://lore.kernel.org/io-uring/20220805162444.3985535-1-kbusch@fb.com/
v3: - Rework io_uring registration
- Move token/map infrastructure code out of blk-mq
- Simplify callbacks: remove a separate blk-mq table, which was
mostly just forwarding calls (to nvme).
- Don't skip dma sync depending on request direction
- Fix a couple of hangs
- Rename s/dma/dmabuf/
- Other small changes
v2: - Don't pass raw dma addresses, wrap it into a driver specific object
- Split into two objects: token and map
- Implement move_notify
Pavel Begunkov (10):
file: add callback for creating long-term dmabuf maps
iov_iter: add iterator type for dmabuf maps
block: move bvec init into __bio_clone
block: introduce dma map backed bio type
lib: add dmabuf token infrastructure
block: forward create_dmabuf_token to drivers
nvme-pci: implement dma_token backed requests
io_uring/rsrc: introduce buf registration structure
io_uring/rsrc: extend buffer update
io_uring/rsrc: add dmabuf backed registered buffers
block/bio.c | 28 +++-
block/blk-merge.c | 14 ++
block/blk.h | 3 +-
block/fops.c | 16 ++
drivers/nvme/host/pci.c | 282 ++++++++++++++++++++++++++++++++
include/linux/bio.h | 19 ++-
include/linux/blk-mq.h | 9 +
include/linux/blk_types.h | 8 +-
include/linux/fs.h | 2 +
include/linux/io_dmabuf_token.h | 92 +++++++++++
include/linux/io_uring_types.h | 5 +
include/linux/uio.h | 11 ++
include/uapi/linux/io_uring.h | 31 +++-
io_uring/io_uring.c | 3 +-
io_uring/rsrc.c | 266 +++++++++++++++++++++++++-----
io_uring/rsrc.h | 30 +++-
io_uring/rw.c | 4 +-
lib/Kconfig | 4 +
lib/Makefile | 2 +
lib/io_dmabuf_token.c | 272 ++++++++++++++++++++++++++++++
lib/iov_iter.c | 29 +++-
21 files changed, 1071 insertions(+), 59 deletions(-)
create mode 100644 include/linux/io_dmabuf_token.h
create mode 100644 lib/io_dmabuf_token.c
--
2.53.0
From: Jiri Pirko <jiri(a)nvidia.com>
Confidential computing (CoCo) VMs/guests, such as AMD SEV and Intel TDX,
run with private/encrypted memory which creates a challenge
for devices that do not support DMA to it (no TDISP support).
For kernel-only DMA operations, swiotlb bounce buffering provides a
transparent solution by copying data through shared memory.
However, the only way to get this memory into userspace is via the DMA
API's dma_alloc_pages()/dma_mmap_pages() type interfaces which limits
the use of the memory to a single DMA device, and is incompatible with
pin_user_pages().
These limitations are particularly problematic for the RDMA subsystem
which makes heavy use of pin_user_pages() and expects flexible memory
usage between many different DMA devices.
This patch series enables userspace to explicitly request shared
(decrypted) memory allocations from new dma-buf system_cc_shared heap.
Userspace can mmap this memory and pass the dma-buf fd to other
existing importers such as RDMA or DRM devices to access the
memory. The DMA API is improved to allow the dma heap exporter to DMA
map the shared memory to each importing device.
Based on dma-mapping-for-next e7442a68cd1ee797b585f045d348781e9c0dde0d
Jiri Pirko (2):
dma-mapping: introduce DMA_ATTR_CC_SHARED for shared memory
dma-buf: heaps: system: add system_cc_shared heap for explicitly
shared memory
drivers/dma-buf/heaps/system_heap.c | 103 ++++++++++++++++++++++++++--
include/linux/dma-mapping.h | 10 +++
include/trace/events/dma.h | 3 +-
kernel/dma/direct.h | 14 +++-
kernel/dma/mapping.c | 13 +++-
5 files changed, 132 insertions(+), 11 deletions(-)
--
2.51.1
Hi,
The recent introduction of heaps in the optee driver [1] made possible
the creation of heaps as modules.
It's generally a good idea if possible, including for the already
existing system and CMA heaps.
The system one is pretty trivial, the CMA is now easy too with the
reworks we got in 7.1-r1.
Let me know what you think,
Maxime
1: https://lore.kernel.org/dri-devel/20250911135007.1275833-4-jens.wiklander@l…
Signed-off-by: Maxime Ripard <mripard(a)kernel.org>
---
Changes in v5:
- Rebase on 7.1-rc1
- Add a patch to enable the heaps in arm64 defconfig
- Link to v4: https://lore.kernel.org/r/20260331-dma-buf-heaps-as-modules-v4-0-e18fda5044…
Changes in v4:
- Fix compilation failure
- Rework to take into account OF_RESERVED_MEM
- Fix regression making the default CMA area disappear if not created
through the DT
- Added some documentation and comments
- Link to v3: https://lore.kernel.org/r/20260303-dma-buf-heaps-as-modules-v3-0-24344812c7…
Changes in v3:
- Squashed cma_get_name and cma_alloc/release patches
- Fixed typo in Export dev_get_cma_area commit title
- Fixed compilation failure with DMA_CMA but not OF_RESERVED_MEM
- Link to v2: https://lore.kernel.org/r/20260227-dma-buf-heaps-as-modules-v2-0-454aee7e06…
Changes in v2:
- Collect tags
- Don't export dma_contiguous_default_area anymore, but export
dev_get_cma_area instead
- Mentioned that heap modules can't be removed
- Link to v1: https://lore.kernel.org/r/20260225-dma-buf-heaps-as-modules-v1-0-2109225a09…
---
Maxime Ripard (4):
dma-buf: heaps: Export mem_accounting parameter
dma-buf: heaps: cma: Turn the heap into a module
dma-buf: heaps: system: Turn the heap into a module
arm64: defconfig: Enable dma-buf heaps
arch/arm64/configs/defconfig | 3 +++
drivers/dma-buf/dma-heap.c | 1 +
drivers/dma-buf/heaps/Kconfig | 4 ++--
drivers/dma-buf/heaps/cma_heap.c | 3 +++
drivers/dma-buf/heaps/system_heap.c | 5 +++++
5 files changed, 14 insertions(+), 2 deletions(-)
---
base-commit: 5e9b7d093f3f77cb0af4409559e3d139babfb443
change-id: 20260225-dma-buf-heaps-as-modules-1034b3ec9f2a
Best regards,
--
Maxime Ripard <mripard(a)kernel.org>
Hi all,
This series is based on previous RFCs/discussions:
Tech topic: https://lore.kernel.org/linux-iommu/20250918214425.2677057-1-amastro@fb.com/
RFCv1: https://lore.kernel.org/all/20260226202211.929005-1-mattev@meta.com/
RFCv2: https://lore.kernel.org/kvm/20260312184613.3710705-1-mattev@meta.com/
The background/rationale is covered in more detail in the RFC cover
letters. The TL;DR is:
The goal is to enable userspace driver designs that use VFIO to export
DMABUFs representing subsets of PCI device BARs, and "vend" those
buffers from a primary process to other subordinate processes by fd.
These processes then mmap() the buffers and their access to the device
is isolated to the exported ranges. This is an improvement on sharing
the VFIO device fd to subordinate processes, which would allow
unfettered access .
This is achieved by enabling mmap() of vfio-pci DMABUFs. Second, a
new ioctl()-based revocation mechanism is added to allow the primary
process to forcibly revoke access to previously-shared BAR spans, even
if the subordinate processes haven't cleanly exited.
(The related topic of safe delegation of iommufd control to the
subordinate processes is not addressed here, and is follow-up work.)
As well as isolation and revocation, another advantage to accessing a
BAR through a VMA backed by a DMABUF is that it's straightforward to
create the buffer with access attributes, such as write-combining.
Notes on patches
================
Feedback from the RFCs requested that, instead of creating
DMABUF-specific vm_ops and .fault paths, to go the whole way and
migrate the existing VFIO PCI BAR mmap() to be backed by a DMABUF too,
resulting in a common vm_ops and fault handler for mmap()s of both the
VFIO device and explicitly-exported DMABUFs. This has been done for
vfio-pci, but not sub-drivers (nvgrace-gpu's special-case mappings are
unchanged).
vfio/pci: Fix vfio_pci_dma_buf_cleanup() double-put
A bug fix to a related are, whose context is a depdency for later
patches.
vfio/pci: Add a helper to look up PFNs for DMABUFs
vfio/pci: Add a helper to create a DMABUF for a BAR-map VMA
The first is for a DMABUF VMA fault handler to determine
arbitrary-sized PFNs from ranges in DMABUF. Secondly, refactor
DMABUF export for use by the existing export feature and a new
helper that creates a DMABUF corresponding to a VFIO BAR mmap()
request.
vfio/pci: Convert BAR mmap() to use a DMABUF
The vfio-pci core mmap() creates a DMABUF with the helper, and the
vm_ops fault handler uses the other helper to resolve the fault.
Because this depends on DMABUF structs/code, CONFIG_VFIO_PCI_CORE
needs to depend on CONFIG_DMA_SHARED_BUFFER. The
CONFIG_VFIO_PCI_DMABUF still conditionally enables the export
support code.
NOTE: The user mmap()s a device fd, but the resulting VMA's vm_file
becomes that of the DMABUF which takes ownership of the device and
puts it on release. This maintains the existing behaviour of a VMA
keeping the VFIO device open.
BAR zapping then happens via the existing vfio_pci_dma_buf_move()
path, which now needs to unmap PTEs in the DMABUF's address_space.
vfio/pci: Provide a user-facing name for BAR mappings
There was a request for decent debug naming in /proc/<pid>/maps
etc. comparable to the existing VFIO names: since the VMAs are
DMABUFs, they have a "dmabuf:" prefix and can't be 100% identical
to before. This is a user-visible change, but this patch at least
now gives us extra info on the BDF & BAR being mapped.
vfio/pci: Clean up BAR zap and revocation
In general (see NOTE!) the vfio_pci_zap_bars() is now obsolete,
since it unmaps PTEs in the VFIO device address_space which is now
unused. This consolidates all calls (e.g. around reset) with the
neighbouring vfio_pci_dma_buf_move()s into new functions, to
revoke-zap/unrevoke.
NOTE: the nvgrace-gpu driver continues to use its own private
vm_ops, fault handler, etc. for its special memregions, and these
DO still add PTEs to the VFIO device address_space. So, a
temporary flag, vdev->bar_needs_zap, maintains the old behaviour
for this use. At least this patch's consolidation makes it easy
to remove the remaining zap when this need goes away.
A FIXME is added: if nvgrace-gpu is converted to DMABUFs, remove
the flag and final zap.
vfio/pci: Support mmap() of a VFIO DMABUF
Adds mmap() for a DMABUF fd exported from vfio-pci.
It was a goal to keep the VFIO device fd lifetime behaviour
unchanged with respect to the DMABUFs. An application can close
all device fds, and this will revoke/clean up all DMABUFs; no
mappings or other access can be performed now. When enabling
mmap() of the DMABUFs, this means access through the VMA is also
revoked. This complicates the fault handler because whilst the
DMABUF exists, it has no guarantee that the corresponding VFIO
device is still alive. Adds synchronisation ensuring the vdev is
available before vdev->memory_lock is touched.
(I decided against the alternative of preventing cleanup by holding
the VFIO device open if any DMABUFs exist, because it's both a
change of behaviour and less clean overall.)
I've added a chonky comment in place, happy to clarify more if you
have ideas.
vfio/pci: Permanently revoke a DMABUF on request
By weight, this is mostly a rename of revoked to an enum, status.
There are now 3 states for a buffer, usable and revoked
temporary/permanent. A new VFIO device ioctl is added,
VFIO_DEVICE_PCI_DMABUF_REVOKE, which passes a DMABUF (exported from
that device) and permanently revokes it. Thus a userspace driver
can guarantee any downstream consumers of a shared fd are prevented
from accessing a BAR range, and that range can be reused.
The code doing revocation in vfio_pci_dma_buf_move() is moved,
unchanged, to a common function for use by _move() and the new
ioctl path.
Q: I can't think of a good reason to temporarily revoke/unrevoke
buffers from userspace, so didn't add a 'flags' field to the ioctl
struct. Easy to add if people think it's worthwhile for future
use.
vfio/pci: Add mmap() attributes to DMABUF feature
Reserves bits [31:28] in vfio_device_feature_dma_buf to allow a
(CPU) mapping attribute to be specified for an exported set of
ranges. The default is the current UC, and a new flag can specify
CPU access as WC.
Q: I've taken 4 bits; the intention is for this field to be a
scalar not a bitmap (i.e. mutually-exclusive access properties).
Perhaps 4 is a bit too many?
Testing
=======
(The [RFC ONLY] userspace test program, for QEMU edu-plus, has been
dropped, but can be found in the GitHub branch below.)
This code has been tested in mapping DMABUFs of single/multiple
ranges, aliasing mmap()s, aliasing ranges across DMABUFs, vm_pgoff >
0, revocation, shutdown/cleanup scenarios, and hugepage mappings seem
to work correctly. I've lightly tested WC mappings also (by observing
resulting PTEs as having the correct attributes...). No regressions
observed on the VFIO selftests, or on our internal vfio-pci
applications.
End
===
This is based on -next (next-20260414 but will merge earlier), as it
depends on Leon's series "vfio: Wait for dma-buf invalidation to
complete":
https://lore.kernel.org/linux-iommu/20260205-nocturnal-poetic-chamois-f566a…
These commits are on GitHub, along with "[RFC ONLY] selftests: vfio: Add
standalone vfio_dmabuf_mmap_test":
https://github.com/metamev/linux/compare/next-20260414...metamev:linux:dev/…
Thanks for reading,
Matt
================================================================================
Change log:
v1:
- Cleanup of the common DMABUF-aware VMA vm_ops fault handler and
export code.
- Fixed a lot of races, particularly faults racing with DMABUF
cleanup (if the VFIO device fds close, for example).
- Added nicer human-readable names for VFIO mmap() VMAs
RFCv2: Respin based on the feedback/suggestions:
https://lore.kernel.org/kvm/20260312184613.3710705-1-mattev@meta.com/
- Transform the existing VFIO BAR mmap path to also use DMABUFs
behind the scenes, and then simply share that code for
explicitly-mapped DMABUFs. Jason wanted to go that direction to
enable iommufd VFIO type 1 emulation to pick up a DMABUF for an IO
mapping.
- Revoke buffers using a VFIO device fd ioctl
RFCv1:
https://lore.kernel.org/all/20260226202211.929005-1-mattev@meta.com/
Matt Evans (9):
vfio/pci: Fix vfio_pci_dma_buf_cleanup() double-put
vfio/pci: Add a helper to look up PFNs for DMABUFs
vfio/pci: Add a helper to create a DMABUF for a BAR-map VMA
vfio/pci: Convert BAR mmap() to use a DMABUF
vfio/pci: Provide a user-facing name for BAR mappings
vfio/pci: Clean up BAR zap and revocation
vfio/pci: Support mmap() of a VFIO DMABUF
vfio/pci: Permanently revoke a DMABUF on request
vfio/pci: Add mmap() attributes to DMABUF feature
drivers/vfio/pci/Kconfig | 3 +-
drivers/vfio/pci/Makefile | 3 +-
drivers/vfio/pci/nvgrace-gpu/main.c | 5 +
drivers/vfio/pci/vfio_pci_config.c | 30 +-
drivers/vfio/pci/vfio_pci_core.c | 224 ++++++++++---
drivers/vfio/pci/vfio_pci_dmabuf.c | 500 +++++++++++++++++++++++-----
drivers/vfio/pci/vfio_pci_priv.h | 49 ++-
include/linux/vfio_pci_core.h | 1 +
include/uapi/linux/vfio.h | 42 ++-
9 files changed, 690 insertions(+), 167 deletions(-)
--
2.47.3
The kerneldoc comment on dma_fence_init() and dma_fence_init64() describe
the legacy reason to pass an external lock as a need to prevent multiple
fences "from signaling out of order". However, this wording is a bit
misleading: a shared spinlock does not (and cannot) prevent the signaler
from signaling out of order. Signaling order is the driver's responsibility
regardless of whether the lock is shared or per-fence.
What a shared lock actually provides is serialization of signaling and
observation across fences in a given context, so that observers never
see a later fence signaled while an earlier one is not.
Reword both comments to describe this more accurately.
Signed-off-by: Maíra Canal <mcanal(a)igalia.com>
---
v1 -> v2: https://lore.kernel.org/dri-devel/20260411185756.1887119-4-mcanal@igalia.co…
- Be more explicit about not allowing new users to use an external lock.
- De-duplicate the explanation in dma_fence_init64() by pointing to the
dma_fence_init() documentation.
---
drivers/dma-buf/dma-fence.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 1c1eaecaf1b0..63b349ba9a34 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -1102,9 +1102,11 @@ __dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
* context and seqno are used for easy comparison between fences, allowing
* to check which fence is later by simply using dma_fence_later().
*
- * It is strongly discouraged to provide an external lock because this couples
- * lock and fence life time. This is only allowed for legacy use cases when
- * multiple fences need to be prevented from signaling out of order.
+ * External locks are a relic from legacy use cases that needed a shared lock
+ * to serialize signaling and observation of fences within a context, so that
+ * observers never see a later fence signaled while an earlier one isn't. New
+ * users MUST NOT use external locks, as they force the issuer to outlive all
+ * fences that reference the lock.
*/
void
dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
@@ -1129,9 +1131,8 @@ EXPORT_SYMBOL(dma_fence_init);
* Context and seqno are used for easy comparison between fences, allowing
* to check which fence is later by simply using dma_fence_later().
*
- * It is strongly discouraged to provide an external lock because this couples
- * lock and fence life time. This is only allowed for legacy use cases when
- * multiple fences need to be prevented from signaling out of order.
+ * New users MUST NOT use external locks. Check the documentation in
+ * dma_fence_init() to understand the motives behind the legacy use cases.
*/
void
dma_fence_init64(struct dma_fence *fence, const struct dma_fence_ops *ops,
--
2.53.0