On Mon, May 13, 2024 at 01:51:23PM +0000, Simon Ser wrote:
On Wednesday, May 8th, 2024 at 17:49, Daniel Vetter <daniel@ffwll.ch> wrote:
On Wed, May 08, 2024 at 09:38:33AM +0100, Daniel Stone wrote:
On Wed, 8 May 2024 at 09:33, Daniel Vetter <daniel@ffwll.ch> wrote:
On Wed, May 08, 2024 at 06:46:53AM +0100, Daniel Stone wrote:
That would have the unfortunate side effect of making sandboxed apps less efficient on some platforms, since they wouldn't be able to do direct scanout anymore ...
I was assuming that everyone goes through PipeWire, and ideally that PipeWire is the only thing that can even get at these special chardevs.
If PipeWire is only for sandboxed apps then yeah, this ain't great :-/
No, PipeWire is fine, I mean graphical apps.
Right now, if your platform requires CMA for display, then the app needs access to the GPU render node and the display node too, in order to allocate buffers which the compositor can scan out directly. If it only has access to the render nodes and not the display node, it won't be able to allocate correctly, so its content will need a composition pass, i.e. a performance penalty for sandboxing. But if it can allocate correctly, then hey, it can exhaust CMA just like heaps can.
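For reference, the app-side dance today is roughly this (a sketch with error handling elided; whether GBM_BO_USE_SCANOUT actually lands the buffer in CMA is entirely device- and driver-specific):

    #include <fcntl.h>
    #include <gbm.h>

    int alloc_scanout_buffer(void)
    {
        /* Needs the display-capable node: on these platforms a pure
         * render node can't guarantee scanout/CMA placement. */
        int fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);
        struct gbm_device *gbm = gbm_create_device(fd);
        struct gbm_bo *bo = gbm_bo_create(gbm, 1920, 1080,
                                          GBM_FORMAT_XRGB8888,
                                          GBM_BO_USE_SCANOUT |
                                          GBM_BO_USE_RENDERING);

        /* dma-buf fd the compositor can import and put on a plane */
        return bo ? gbm_bo_get_fd(bo) : -1;
    }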
Personally I think we'd be better off just allowing access and figuring out cgroups later. It's not like the OOM story is great generally, and hey, you can get there with just render nodes ...
Imo the right fix is to have the compositor allocate the buffers in this case, and then maybe have some kind of revoke/purge behaviour on these buffers. The compositor has an actual idea of who's a candidate for direct scanout after all, not the app. Or, well, at least force-migrate the memory from CMA to shmem.
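To sketch what that could look like from the compositor side (the heap ioctl below is the real dma-buf heaps uapi, but the heap name varies per platform, and the revoke step is completely hypothetical, nothing like it exists today):

    #include <fcntl.h>
    #include <stddef.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/dma-heap.h>

    /* Compositor allocates CMA on the client's behalf, keeping the
     * option to pull the memory back when it needs it itself. */
    int alloc_cma_for_client(size_t len)
    {
        /* heap name is platform-specific; "reserved" is a common
         * name for the default CMA region */
        int heap = open("/dev/dma_heap/reserved", O_RDWR | O_CLOEXEC);
        struct dma_heap_allocation_data data = {
            .len = len,
            .fd_flags = O_RDWR | O_CLOEXEC,
        };
        int ret;

        if (heap < 0)
            return -1;
        ret = ioctl(heap, DMA_HEAP_IOCTL_ALLOC, &data);
        close(heap);
        if (ret < 0)
            return -1;

        /* Hypothetical revoke hook, does not exist today: e.g. an
         * ioctl on data.fd marking the buffer revocable, so the
         * kernel can later yank the pages or migrate them to shmem
         * when the compositor asks for the memory back. */

        return data.fd; /* hand to the client over the wayland socket */
    }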
If you only whack cgroups on this issue you're still stuck in a world where either all apps together can DoS the display or no one can realistically do direct scanout.
So yeah on the display side the problem isn't solved either, but we knew that already.
What makes scanout memory so special?
The way I see it, any kind of memory will always be a limited resource: regular programs can exhaust system memory, just as they can exhaust GPU VRAM or scanout memory. I think we need to have ways to limit/control/arbitrate the allocations regardless, and I don't think scanout memory should be a special case here.
(Long weekend, and I caught a cold.)
It's not scanout that's special, it's CMA memory that's special: once you've allocated it, it's gone, since it cannot be swapped out, and there's not a lot of it to go around. Which means that even if we had cgroups for all the various GPU allocation heaps, you couldn't use cgroups to manage CMA in a meaningful way:
- You set the cgroup limits for apps so low that the compositor is guaranteed to always be able to allocate enough scanout memory for its needs. That limit will be low enough that apps can never allocate scanout buffers themselves.
- Or you set the limit high enough that apps can allocate what they need, which means (as soon as you have more than one app and not a totally bonkers amount of CMA) that the compositor might not be able to allocate anymore. Rough numbers below.
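To make that concrete (the pool size is made up, but in the right ballpark for such platforms; buffer sizes are just width * height * 4 bytes):

    one 4k XRGB8888 buffer:              3840 * 2160 * 4 = ~32 MiB
    compositor, double-buffered:                  2 * 32 = ~64 MiB
    one fullscreen app, triple-buffered:          3 * 32 = ~96 MiB

On a 128 MiB CMA pool the compositor alone needs ~64 MiB, leaving ~64 MiB for all apps combined; with just two apps, a guaranteed-safe per-app limit is 32 MiB, which doesn't even cover a single 4k buffer, and anything higher means the apps together can starve the compositor.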
It's a kinda shit situation, which is also why you need the compositor to be able to revoke CMA allocations it has handed out to clients (like with DRM leases).
Or we just keep the current yolo situation.
For any memory type other than CMA, most of the popular drivers at least implement swapping, which gives you a ton more flexibility to set up limits in a way that actually works. But even there we'd need cgroups first to make sure things don't go wrong too badly in the face of evil apps ...

-Sima