On Mon, Mar 21, 2022 at 04:54:26PM -0700, "T.J. Mercier" <tjmercier@google.com> wrote:
> Since the charge is duplicated in two cgroups for a short period before it is uncharged from the source cgroup, I guess the situation you're thinking about is a global (or common ancestor) limit?

The common ancestor was on my mind (after the self-shortcut).

> I can see how that would be a problem for transfers done this way. An alternative would be to swap the order of the charge operations: first uncharge, then try_charge. To be certain the uncharge is reversible if the try_charge fails, I think I'd need either a mutex used at all gpucg_*charge call sites or access to the gpucg_mutex,

Yes, that'd provide safe conditions for such operations, although I'm not sure these special types of memory can afford a global lock on their fast paths.
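
(Purely for illustration, a minimal sketch of the "uncharge first, then try_charge, roll back on failure" ordering described above. The gpucg_try_charge()/gpucg_uncharge() prototypes, the transfer helper, and charge_transfer_lock are assumptions about the shape of the series' API, not code from it.)

/*
 * Sketch: move a charge from src to dst without ever double-counting it.
 * charge_transfer_lock stands in for whatever lock is taken at every
 * gpucg charge/uncharge call site (or for gpucg_mutex itself).
 */
#include <linux/mutex.h>
#include <linux/bug.h>
#include <linux/types.h>

struct gpucg;                   /* per-cgroup GPU accounting state */
struct gpucg_device;            /* per-device accounting bucket */

int gpucg_try_charge(struct gpucg *cg, struct gpucg_device *dev, u64 size);
void gpucg_uncharge(struct gpucg *cg, struct gpucg_device *dev, u64 size);

static DEFINE_MUTEX(charge_transfer_lock);

static int sketch_transfer_charge(struct gpucg *src, struct gpucg *dst,
                                  struct gpucg_device *dev, u64 size)
{
        int ret;

        mutex_lock(&charge_transfer_lock);
        gpucg_uncharge(src, dev, size);         /* no double-count window */
        ret = gpucg_try_charge(dst, dev, size); /* may hit dst's limit */
        if (ret)
                /* Nobody could consume src's freed headroom while we hold
                 * the lock, so re-charging the same amount must succeed. */
                WARN_ON(gpucg_try_charge(src, dev, size));
        mutex_unlock(&charge_transfer_lock);

        return ret;
}

(The rollback is only guaranteed to succeed if every other charge path takes the same lock, which is exactly the condition T.J. mentions above.)
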
> which implies adding transfer support to gpu.c as part of the gpucg_* API itself and calling it here. Am I following correctly here?

My idea was to provide a special API (apart from gpucg_{try_charge,uncharge}) to facilitate transfers...

> This series doesn't actually add limit support, just accounting, but I'd like to get it right here.

...which could be implemented (or changed) depending on how the charging is realized internally.
Michal
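
(One possible shape for the dedicated transfer API Michal suggests, again only as an illustration: the controller owns the locking and the rollback, and the dma-buf side calls a single primitive. The function names, the gpucg/gpucg_dev fields on struct dma_buf, and the internals are all assumptions, not the series' code.)

#include <linux/dma-buf.h>
#include <linux/types.h>

struct gpucg;
struct gpucg_device;

/* Hypothetical gpu.c export: atomically move a charge between cgroups. */
int gpucg_transfer_charge(struct gpucg *src, struct gpucg *dst,
                          struct gpucg_device *dev, u64 size);

/*
 * Hypothetical dma-buf call site. The gpucg/gpucg_dev members are assumed
 * to have been recorded on the buffer when it was first charged.
 */
static int dma_buf_transfer_sketch(struct dma_buf *dmabuf, struct gpucg *dst)
{
        int ret;

        if (dmabuf->gpucg == dst)       /* the "self" shortcut */
                return 0;

        ret = gpucg_transfer_charge(dmabuf->gpucg, dst,
                                    dmabuf->gpucg_dev, dmabuf->size);
        if (!ret)
                dmabuf->gpucg = dst;    /* record the new owner */

        return ret;
}

(Whether gpucg_transfer_charge() serializes on gpucg_mutex, on per-cgroup locks, or on something else is then invisible to dma-buf, which is what "implemented (or changed) depending on how the charging is realized internally" buys.)
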
On Tue, Mar 22, 2022 at 2:52 AM Michal Koutný <mkoutny@suse.com> wrote:
> Yes, that'd provide safe conditions for such operations, although I'm not sure these special types of memory can afford a global lock on their fast paths.

I have a benchmark I think is suitable, so let me try this change to the transfer implementation and see how it compares.
On Tue, Mar 22, 2022 at 9:47 AM T.J. Mercier <tjmercier@google.com> wrote:
> I have a benchmark I think is suitable, so let me try this change to the transfer implementation and see how it compares.

I added a mutex to struct gpucg which is locked when charging the cgroup initially during allocation, and also only for the source cgroup during dma_buf_charge_transfer. Then I used a multithreaded benchmark where each thread allocates 4, 8, 16, or 32 DMA buffers and then sends them through Binder to another process with charge transfer enabled. This was intended to generate contention for the mutex in dma_buf_charge_transfer.

The results of this benchmark show that the difference between a mutex-protected charge transfer and an unprotected charge transfer is within measurement noise; the worst data point shows about 3% overhead for the mutex.
So I'll prep this change for the next revision. Thanks for pointing it out.
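
(For reference, a rough sketch of the arrangement described above: a mutex embedded in struct gpucg, taken for the cgroup being charged at allocation time and for the source cgroup only during a transfer. Field and function names here are guesses for illustration, not the benchmarked patch.)

#include <linux/cgroup.h>
#include <linux/mutex.h>
#include <linux/bug.h>
#include <linux/types.h>

struct gpucg_device;

struct gpucg {
        struct cgroup_subsys_state css;
        struct mutex charge_lock;       /* assumed name for the per-cgroup lock */
        /* ... per-device usage tracking as in the series ... */
};

int gpucg_try_charge(struct gpucg *cg, struct gpucg_device *dev, u64 size);
void gpucg_uncharge(struct gpucg *cg, struct gpucg_device *dev, u64 size);

/* Initial charge at allocation time: lock only the cgroup being charged. */
static int charge_at_alloc(struct gpucg *cg, struct gpucg_device *dev, u64 size)
{
        int ret;

        mutex_lock(&cg->charge_lock);
        ret = gpucg_try_charge(cg, dev, size);
        mutex_unlock(&cg->charge_lock);

        return ret;
}

/* Transfer: per the description above, only the source cgroup is locked. */
static int transfer_sketch(struct gpucg *src, struct gpucg *dst,
                           struct gpucg_device *dev, u64 size)
{
        int ret;

        mutex_lock(&src->charge_lock);
        gpucg_uncharge(src, dev, size);
        ret = gpucg_try_charge(dst, dev, size);
        if (ret)
                WARN_ON(gpucg_try_charge(src, dev, size));      /* roll back */
        mutex_unlock(&src->charge_lock);

        return ret;
}

(This is only meant to pin down where the per-cgroup lock sits relative to the numbers quoted above; the contention the benchmark targets is on src->charge_lock in the Binder transfer path.)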