Most of this patch series has already been pushed upstream; this is just
the second half of the series that has not been pushed yet, plus some
additional changes which were required to implement changes requested on
the mailing list. This patch series is originally from Asahi, previously
posted by Daniel Almeida.
The previous version of the patch series can be found here:
https://patchwork.freedesktop.org/series/164580/
Branch with patches applied available here:
https://gitlab.freedesktop.org/lyudess/linux/-/commits/rust/gem-shmem
This patch series applies on top of drm-rust-next.
Patch-series wide changes since V15:
* Fix some major rebasing errors I somehow didn't notice :(
* Drop the dependency on LazyInit, use the trick that Alice suggested
instead.
* Fix dependency ordering so that Tyr can get the vmap stuff first
without the other bits.
Patch-series wide changes since V16:
* Fix ordering one more time (SetOnce::reset() doesn't need to come
before adding vmap functions)
* Rebase against the latest DeviceContext changes from me that got
pushed.
Patch-series wide changes since V20:
* Lots of Sashiko fixes, excluding the comments that I couldn't prove
weren't just bogus.
Lyude Paul (4):
rust: drm: gem: shmem: Add DmaResvGuard helper
rust: drm: gem: shmem: Add vmap functions
rust: faux: Allow retrieving a bound Device
rust: drm: gem: Introduce shmem::Object::sg_table()
rust/kernel/drm/gem/shmem.rs | 547 ++++++++++++++++++++++++++++++++++-
rust/kernel/faux.rs | 18 +-
2 files changed, 549 insertions(+), 16 deletions(-)
base-commit: 550dc7536644db2d67c6f8cf525bba682fba08d9
--
2.54.0
dma_fence_timeline_name() incorrectly invokes ops->get_driver_name()
instead of ops->get_timeline_name(), so every caller receives the
driver name where the timeline name was expected.
This is a copy-paste regression that has resurfaced twice. It was
originally introduced by commit 62918542b7bf ("dma-fence: Fix sparse
warnings due __rcu annotations") when adding the __rcu casts, fixed
by commit 033559473dd3 ("dma-fence: Fix safe access wrapper to call
timeline name method"), and then accidentally reintroduced by commit
e58b4dea9054 ("dma-buf/dma-fence: Add dma_fence_test_signaled_flag()")
when both wrappers were refactored to use the new helper.
Signed-off-by: Baineng Shou <shoubaineng(a)gmail.com>
---
drivers/dma-buf/dma-fence.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index b3bfa6943a8e..5292d714419b 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -1202,7 +1202,7 @@ const char __rcu *dma_fence_timeline_name(struct dma_fence *fence)
/* RCU protection is required for safe access to returned string */
ops = rcu_dereference(fence->ops);
if (!dma_fence_test_signaled_flag(fence))
- return (const char __rcu *)ops->get_driver_name(fence);
+ return (const char __rcu *)ops->get_timeline_name(fence);
else
return (const char __rcu *)"signaled-timeline";
}
--
2.34.1
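The mix-up fixed above is easy to make because both callbacks share one signature, so the compiler cannot flag the wrong one. A minimal userspace sketch of that situation follows; `struct fence`, `struct fence_ops`, the `demo_*` callbacks and both `timeline_name_*` wrappers are hypothetical stand-ins invented for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's struct dma_fence. */
struct fence;

struct fence_ops {
	const char *(*get_driver_name)(struct fence *f);
	const char *(*get_timeline_name)(struct fence *f);
};

struct fence {
	const struct fence_ops *ops;
	int signaled;
};

static const char *demo_driver_name(struct fence *f)
{
	(void)f;
	return "demo-driver";
}

static const char *demo_timeline_name(struct fence *f)
{
	(void)f;
	return "demo-timeline";
}

static const struct fence_ops demo_ops = {
	.get_driver_name = demo_driver_name,
	.get_timeline_name = demo_timeline_name,
};

/* Buggy wrapper: compiles cleanly because both callbacks have one type. */
static const char *timeline_name_buggy(struct fence *f)
{
	assert(f && f->ops);
	if (!f->signaled)
		return f->ops->get_driver_name(f);
	return "signaled-timeline";
}

/* Fixed wrapper: calls the callback that matches the wrapper's name. */
static const char *timeline_name_fixed(struct fence *f)
{
	assert(f && f->ops);
	if (!f->signaled)
		return f->ops->get_timeline_name(f);
	return "signaled-timeline";
}
```

With an unsignaled fence using `demo_ops`, the buggy wrapper hands back the driver name while the fixed one hands back the timeline name; only a test that compares the two strings would catch the difference.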
From: Bryam Vargas <hexlabsecurity(a)proton.me>
begin_cpu_udmabuf() builds and caches ubuf->sg with an unserialised
check-then-set, and end_cpu_udmabuf() reads the same field unlocked. The
core invokes both cpu-access hooks without holding the reservation lock and
DMA_BUF_IOCTL_SYNC is unlocked, so concurrent SYNC ioctls on a shared
udmabuf fd race on ubuf->sg: two begins can both observe NULL and both call
get_sg_table(), and the later store orphans the earlier table and its DMA
mapping, which release_udmabuf() never frees. Each won race permanently
leaks an sg_table and an unbalanced DMA mapping.
Serialize both hooks under the buffer's reservation lock, as panfrost and
panthor do. Take it interruptibly: the lock can be held across a wait for
hardware to finish, so an uninterruptible acquire would park a SYNC
ioctl in TASK_UNINTERRUPTIBLE. dma_buf_begin/end_cpu_access() already
annotate might_lock() on that lock, so taking it here matches the
documented contract. Single-threaded callers are unaffected.
Fixes: 284562e1f348 ("udmabuf: implement begin_cpu_access/end_cpu_access hooks")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bryam Vargas <hexlabsecurity(a)proton.me>
---
v2: Take the reservation lock interruptibly (dma_resv_lock_interruptible())
in both hooks instead of the uninterruptible dma_resv_lock(), and return
the error; the lock can be held across a wait for hardware to finish, so
an uninterruptible acquire could park a SYNC ioctl in
TASK_UNINTERRUPTIBLE. With a NULL ww_acquire_ctx the call returns only 0
or -EINTR, so a single error check is enough. (Christian König)
v1: https://lore.kernel.org/all/20260625-b4-disp-67d1f3db-v1-1-a47fb9edab9e@pro…
Same leak-with-dangling-pointer class as CVE-2024-56712 (export_udmabuf()
error path) -- a distinct site the 2024 fix does not cover.
udmabuf is the only exporter that lazily builds its sg_table cache inside the
cpu-access hook without serialising the check-then-set. The exporters that do
comparable in-hook cache work all take a lock first: panfrost and panthor
dma_resv_lock() (both hooks), omapdrm omap_obj->lock around its lazy page-get,
the dma-heaps buffer->lock, and the TTM/GEM exporters (amdgpu, i915, xe) their
object's reservation lock. tegra and videobuf2 take no lock here because they
only sync an sg_table built earlier, so there is nothing to serialise.
Confirmed with an out-of-tree A/B exercising the begin/begin race: this driver
built as a module with get_sg_table()/put_sg_table() counting allocations
against frees, driven by a userspace racer that creates 3000 udmabufs and fires
DMA_BUF_IOCTL_SYNC(SYNC_START) from N threads on each shared fd. The lock
serialises the check-then-set identically whether it is taken interruptibly or
not; the run below used the reservation lock:
  arm                               leaked sg_tables (of 3000 buffers)
  vulnerable, 4 threads             4761
  control, 1 thread                 0
  patched (resv lock), 4 threads    0
One sg_table and its DMA mapping leak per won race; the single-thread control
does not leak, isolating the race; with the lock the lazy-init runs once per
buffer (3000 allocations, zero leaked). end_cpu_udmabuf() is locked for the
same field too: an unlocked end could otherwise observe the transient IS_ERR
store begin makes before resetting ubuf->sg to NULL, and dereference it. In a
tighter 5000-iteration loop the unpatched leak runs around 15-20 MB/s of slab.
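The serialised check-then-set described above can be modelled in userspace. This is only a sketch under stated assumptions: `struct buffer`, `buffer_begin_access()` and `race_once()` are hypothetical names, a pthread mutex plays the role of the reservation lock, and `malloc()` stands in for get_sg_table(). It is not the out-of-tree test harness mentioned above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a buffer with a lazily built cache. */
struct buffer {
	pthread_mutex_t lock;	/* plays the role of the reservation lock */
	void *cache;		/* plays the role of ubuf->sg */
	int allocations;	/* how many caches were ever built */
};

/* The NULL check and the store both happen under the lock. */
static int buffer_begin_access(struct buffer *b)
{
	int ret = 0;

	pthread_mutex_lock(&b->lock);
	if (!b->cache) {
		b->cache = malloc(64);
		if (b->cache)
			b->allocations++;
		else
			ret = -1;
	}
	pthread_mutex_unlock(&b->lock);
	return ret;
}

static void *racer(void *arg)
{
	buffer_begin_access(arg);
	return NULL;
}

/* Run nthreads concurrent begins on one buffer; return allocations made. */
static int race_once(int nthreads)
{
	struct buffer b = { .cache = NULL, .allocations = 0 };
	pthread_t tids[16];
	int i, allocations;

	assert(nthreads > 0 && nthreads <= 16);
	pthread_mutex_init(&b.lock, NULL);
	for (i = 0; i < nthreads; i++)
		if (pthread_create(&tids[i], NULL, racer, &b) != 0)
			abort();
	for (i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	allocations = b.allocations;
	free(b.cache);
	pthread_mutex_destroy(&b.lock);
	return allocations;
}
```

Because the check and the store sit in one critical section, `race_once()` builds exactly one cache per buffer regardless of thread count. Dropping the lock/unlock pair would let two threads both observe NULL, which is the leak pattern the numbers above measure.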
---
drivers/dma-buf/udmabuf.c | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index bced421c0d65..d6a137f0de1f 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -226,6 +226,10 @@ static int begin_cpu_udmabuf(struct dma_buf *buf,
struct device *dev = ubuf->device->this_device;
int ret = 0;
+ ret = dma_resv_lock_interruptible(buf->resv, NULL);
+ if (ret)
+ return ret;
+
if (!ubuf->sg) {
ubuf->sg = get_sg_table(dev, buf, direction);
if (IS_ERR(ubuf->sg)) {
@@ -238,6 +242,8 @@ static int begin_cpu_udmabuf(struct dma_buf *buf,
dma_sync_sgtable_for_cpu(dev, ubuf->sg, direction);
}
+ dma_resv_unlock(buf->resv);
+
return ret;
}
@@ -246,12 +252,20 @@ static int end_cpu_udmabuf(struct dma_buf *buf,
{
struct udmabuf *ubuf = buf->priv;
struct device *dev = ubuf->device->this_device;
+ int ret = 0;
+
+ ret = dma_resv_lock_interruptible(buf->resv, NULL);
+ if (ret)
+ return ret;
if (!ubuf->sg)
- return -EINVAL;
+ ret = -EINVAL;
+ else
+ dma_sync_sgtable_for_device(dev, ubuf->sg, direction);
- dma_sync_sgtable_for_device(dev, ubuf->sg, direction);
- return 0;
+ dma_resv_unlock(buf->resv);
+
+ return ret;
}
static const struct dma_buf_ops udmabuf_ops = {
---
base-commit: 7eed1fb17959e721031555e5b5654083fe6a7d02
change-id: 20260625-b4-disp-a9216ef0-d068373aff05
Best regards,
--
Bryam Vargas <hexlabsecurity(a)proton.me>
From: Bryam Vargas <hexlabsecurity(a)proton.me>
begin_cpu_udmabuf() builds and caches ubuf->sg with an unserialised
check-then-set, and end_cpu_udmabuf() reads the same field unlocked. The
core invokes both cpu-access hooks without holding the reservation lock and
DMA_BUF_IOCTL_SYNC is unlocked, so concurrent SYNC ioctls on a shared
udmabuf fd race on ubuf->sg: two begins can both observe NULL and both call
get_sg_table(), and the later store orphans the earlier table and its DMA
mapping, which release_udmabuf() never frees. Each won race permanently
leaks an sg_table and an unbalanced DMA mapping.
Serialize both hooks under the buffer's reservation lock, as panfrost and
panthor do. dma_buf_begin/end_cpu_access() already annotate might_lock() on
that lock, so taking it here matches the documented contract.
Single-threaded callers are unaffected.
Fixes: 284562e1f348 ("udmabuf: implement begin_cpu_access/end_cpu_access hooks")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bryam Vargas <hexlabsecurity(a)proton.me>
---
Same leak-with-dangling-pointer class as CVE-2024-56712 (export_udmabuf()
error path) -- a distinct site the 2024 fix does not cover.
udmabuf is the only exporter that lazily builds its sg_table cache inside the
cpu-access hook without serialising the check-then-set. The exporters that do
comparable in-hook cache work all take a lock first: panfrost and panthor
dma_resv_lock() (both hooks), omapdrm omap_obj->lock around its lazy page-get,
the dma-heaps buffer->lock, and the TTM/GEM exporters (amdgpu, i915, xe) their
object's reservation lock. tegra and videobuf2 take no lock here because they
only sync an sg_table built earlier, so there is nothing to serialise.
Confirmed with an out-of-tree A/B exercising the begin/begin race: this driver
built as a module with get_sg_table()/put_sg_table() counting allocations
against frees, driven by a userspace racer that creates 3000 udmabufs and fires
DMA_BUF_IOCTL_SYNC(SYNC_START) from N threads on each shared fd.
  arm                               leaked sg_tables (of 3000 buffers)
  vulnerable, 4 threads             4761
  control, 1 thread                 0
  patched (resv lock), 4 threads    0
One sg_table and its DMA mapping leak per won race; the single-thread control
does not leak, isolating the race; with the lock the lazy-init runs once per
buffer (3000 allocations, zero leaked). end_cpu_udmabuf() is locked for the
same field too: an unlocked end could otherwise observe the transient IS_ERR
store begin makes before resetting ubuf->sg to NULL, and dereference it. In a
tighter 5000-iteration loop the unpatched leak runs around 15-20 MB/s of slab.
---
drivers/dma-buf/udmabuf.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index bced421c0d65..702ae13b97d1 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -226,6 +226,8 @@ static int begin_cpu_udmabuf(struct dma_buf *buf,
struct device *dev = ubuf->device->this_device;
int ret = 0;
+ dma_resv_lock(buf->resv, NULL);
+
if (!ubuf->sg) {
ubuf->sg = get_sg_table(dev, buf, direction);
if (IS_ERR(ubuf->sg)) {
@@ -238,6 +240,8 @@ static int begin_cpu_udmabuf(struct dma_buf *buf,
dma_sync_sgtable_for_cpu(dev, ubuf->sg, direction);
}
+ dma_resv_unlock(buf->resv);
+
return ret;
}
@@ -246,12 +250,18 @@ static int end_cpu_udmabuf(struct dma_buf *buf,
{
struct udmabuf *ubuf = buf->priv;
struct device *dev = ubuf->device->this_device;
+ int ret = 0;
+
+ dma_resv_lock(buf->resv, NULL);
if (!ubuf->sg)
- return -EINVAL;
+ ret = -EINVAL;
+ else
+ dma_sync_sgtable_for_device(dev, ubuf->sg, direction);
- dma_sync_sgtable_for_device(dev, ubuf->sg, direction);
- return 0;
+ dma_resv_unlock(buf->resv);
+
+ return ret;
}
static const struct dma_buf_ops udmabuf_ops = {
---
base-commit: 7eed1fb17959e721031555e5b5654083fe6a7d02
change-id: 20260625-b4-disp-67d1f3db-0082918fdcb5
Best regards,
--
Bryam Vargas <hexlabsecurity(a)proton.me>
UDMABUF_CREATE_LIST copies an array whose element count comes from
userspace. The count is compared against list_limit, but list_limit is a
signed module parameter while the count is u32.
If the limit is raised too far or made negative, that comparison no
longer bounds the count to a range where sizeof(*list) * count fits in
the u32 temporary used for the copy length. A wrapped copy length lets
memdup_user() copy fewer entries than udmabuf_create() subsequently
walks, leading to out-of-bounds reads from the copied list.
Take a positive snapshot of the module limit and use memdup_array_user()
so the multiplication is checked before copying.
Signed-off-by: Yousef Alhouseen <alhouseenyousef(a)gmail.com>
---
drivers/dma-buf/udmabuf.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index bced421c0..b4078ec84 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -469,14 +469,15 @@ static long udmabuf_ioctl_create_list(struct file *filp, unsigned long arg)
struct udmabuf_create_list head;
struct udmabuf_create_item *list;
int ret = -EINVAL;
- u32 lsize;
+ int limit;
if (copy_from_user(&head, (void __user *)arg, sizeof(head)))
return -EFAULT;
- if (head.count > list_limit)
+ limit = READ_ONCE(list_limit);
+ if (!head.count || limit <= 0 || head.count > limit)
return -EINVAL;
- lsize = sizeof(struct udmabuf_create_item) * head.count;
- list = memdup_user((void __user *)(arg + sizeof(head)), lsize);
+ list = memdup_array_user((void __user *)(arg + sizeof(head)),
+ head.count, sizeof(*list));
if (IS_ERR(list))
return PTR_ERR(list);
--
2.54.0
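The two hazards named in the commit message above (a signed limit compared against an unsigned count, and a 32-bit copy length that wraps) can be reproduced in userspace. The sketch below is an illustration only: `struct item` is a stand-in assumed to share the 24-byte layout of the UAPI item, all function names are hypothetical, and `__builtin_mul_overflow()` is a GCC/Clang builtin used to model a checked multiply rather than to mirror the internals of memdup_array_user().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in assumed to match the 24-byte layout of the UAPI list item. */
struct item {
	uint32_t memfd;
	uint32_t pad;
	uint64_t offset;
	uint64_t size;
};
_Static_assert(sizeof(struct item) == 24, "layout assumption");

/* Old check: C converts the signed limit to unsigned before comparing. */
static bool count_allowed_buggy(uint32_t count, int limit)
{
	return !(count > (uint32_t)limit);	/* the cast is implicit in C */
}

/* New check: reject zero counts and non-positive limits first. */
static bool count_allowed_fixed(uint32_t count, int limit)
{
	return count != 0 && limit > 0 && count <= (uint32_t)limit;
}

/* Unchecked: the product is truncated to 32 bits, like a u32 temporary. */
static uint32_t copy_len_unchecked(uint32_t count)
{
	return (uint32_t)(sizeof(struct item) * count);
}

/* Checked: report overflow instead of returning a wrapped length. */
static bool copy_len_checked(uint32_t count, uint32_t *len)
{
	return !__builtin_mul_overflow((uint32_t)sizeof(struct item),
				       count, len);
}
```

For example, with a limit of -1 the old check accepts a count of 0x0AAAAAAB, and 24 * 0x0AAAAAAB equals 2^32 + 8, so the unchecked length wraps to 8 bytes while later code would still walk 0x0AAAAAAB entries.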
In a recent discussion with Philip and Danilo the question came up of what
had already been tried, but never finished, to clean up the dma_fence
framework.
So here are the different ideas I came up with but never fully finished,
with the patches themselves modernized and rebased on top of drm-misc-next.
The main goal of those changes is to make it easier to implement dma_fence
backends and to not enforce unnecessary constraints on implementations.
As a first step, the locking around the dma_fence_ops.signaled callback is
made consistent by removing the dma_fence_is_signaled_locked() function.
This was mostly used by the backends themselves, but if polling the HW is
desired the backends can call their own functions for this directly without
going through the dma-fence layer.
XE actually seems to be the only driver which makes use of that for a bit
more handling. For all other cases testing the signaled flag should be enough.
Then forcefully calling dma_fence_signal() is removed from the dma-fence
layer and moved into the backend implementations.
This allows the backend implementations to clean up after they have
signaled the fence. Such cleanup can include removing now-signaled fences
from lists, dropping references, starting work, etc.
Nouveau in particular seems to have a really messy workaround because of
that, involving DMA_FENCE_FLAG_USER_BITS and installing callbacks,
because the reference to the context couldn't be dropped directly after
signaling. This can now be cleaned up as far as I can see.
In the long term this should also allow reworking the error handling, e.g.
removing dma_fence_set_error() and instead giving the error as a mandatory
parameter to dma_fence_signal().
Then the last piece is to stop calling the enable_signaling callback with
the dma_fence lock held. This makes it possible for backends to acquire
locks which are semantically ordered outside of the dma_fence lock.
This is necessary to allow using the dma_fence inline lock in more cases;
previously backends used some common external lock for their dma_fences to,
for example, make it possible to remove fences from linked lists.
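To illustrate the lock ordering this enables, here is a rough userspace sketch, not taken from the patches: `struct backend`, `struct fence` and both functions are hypothetical, and pthread mutexes stand in for the backend's list lock and the fence's inline lock. It only assumes what is stated above, namely that the callback runs without the fence lock held.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical backend state; list_lock is ordered OUTSIDE the fence lock. */
struct backend {
	pthread_mutex_t list_lock;
	int pending;		/* number of fences on the pending list */
};

/* Hypothetical fence with its own inline lock. */
struct fence {
	pthread_mutex_t lock;
	int signaling_enabled;
	struct backend *be;
};

/*
 * Backend callback. It is entered with no fence lock held, so it is free
 * to take the outer list lock first and the fence lock second.
 */
static void backend_enable_signaling(struct fence *f)
{
	assert(f && f->be);
	pthread_mutex_lock(&f->be->list_lock);	/* outer lock first */
	pthread_mutex_lock(&f->lock);		/* inner fence lock second */
	if (!f->signaling_enabled) {
		f->signaling_enabled = 1;
		f->be->pending++;
	}
	pthread_mutex_unlock(&f->lock);
	pthread_mutex_unlock(&f->be->list_lock);
}

/* Core-side wrapper: no fence lock is held across the callback. */
static void fence_enable_signaling(struct fence *f)
{
	backend_enable_signaling(f);
}
```

If the wrapper held `f->lock` while invoking the callback, taking `list_lock` inside it would invert the order and could deadlock against a path that takes `list_lock` first, which is why backends would otherwise fall back to one shared external lock.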
Please comment and review,
Christian.