On Wed, Aug 14, 2019 at 9:46 AM Mike Kravetz mike.kravetz@oracle.com wrote:
On 8/13/19 4:54 PM, Mike Kravetz wrote:
On 8/8/19 4:13 PM, Mina Almasry wrote:
For shared mappings, the pointer to the hugetlb_cgroup to uncharge lives in the resv_map entries, in file_region->reservation_counter.
When a file_region entry is added to the resv_map via region_add, we also charge the appropriate hugetlb_cgroup and put the pointer to that in file_region->reservation_counter. This is slightly delicate since we need to not modify the resv_map until we know that charging the reservation has succeeded. If charging doesn't succeed, we report the error to the caller, so that the kernel fails the reservation.
I wish we did not need to modify these region_() routines as they are already difficult to understand. However, I see no other way with the desired semantics.
I suspect you have considered this, but what about using the return value from region_chg() in hugetlb_reserve_pages() to charge reservation limits? There is a VERY SMALL race where the value could be too large, but that can be checked and adjusted at region_add time as is done with normal accounting today.
I have not actually until now; I didn't consider doing stuff with the resv_map while not holding onto the resv_map->lock. I guess that's the small race you're talking about. Seems fine to me, but I'm more worried about hanging off the vma below.
If the question is, where would we store the information to uncharge?, then we can hang a structure off the vma. This would be similar to what is done for private mappings. In fact, I would suggest making them both use a new cgroup reserve structure hanging off the vma.
I actually did consider hanging off the info to uncharge off the vma, but I didn't for a couple of reasons:
1. region_del is called from hugetlb_unreserve_pages, and I don't have access to the vma there. Maybe there is a way to query the proper vma I don't know about? 2. hugetlb_reserve_pages seems to be able to conduct a reservation with a NULL *vma. Not sure what to do in that case.
Is there a way to get around these that I'm missing here?
FWIW I think tracking is better in resv_map since the reservations are in resv_map themselves. If I do another structure, then for each reservation there will be an entry in resv_map and an entry in the new structure and they need to be kept in sync and I need to handle errors for when they get out of sync.
One issue I see is what to do if a vma is split? The private mapping case 'should' handle this today, but I would not be surprised if such code is missing or incorrect.
-- Mike Kravetz