On Mon, Mar 21, 2022 at 10:37 AM Michal Koutný <mkoutny@suse.com> wrote:
> Hello.
> On Wed, Mar 09, 2022 at 04:52:11PM +0000, "T.J. Mercier" <tjmercier@google.com> wrote:
> > +The new cgroup controller would:
> > +* Allow setting per-cgroup limits on the total size of buffers charged to it.
> What is the meaning of the total? (I only have very naïve understanding of the device buffers.)
So "total" is used twice here in two different contexts. The first one is the global "GPU" cgroup context. As in any buffer that any exporter claims is a GPU buffer, regardless of where/how it is allocated. So this refers to the sum of all gpu buffers of any type/source. An exporter contributes to this total by registering a corresponding gpucg_device and making charges against that device when it exports. The second one is in a per device context. This allows us to make a distinction between different types of GPU memory based on who exported the buffer. A single process can make use of several different types of dma buffers (for example cached and uncached versions of the same type of memory), and it would be useful to have different limits for each. These are distinguished by the device name string chosen when the gpucg_device is first registered.
> Is it like a) there's a global pool of memory that is partitioned among individual devices, or b) each device has its own specific type of memory, so adding across two devices is adding apples and oranges, or c) there can be various devices of both the a) and b) type?
So I guess the most accurate answer to this question is c).
> (Apologies for not replying to previous versions and possibly missing anything.)
> Thanks,
> Michal