On 9/3/19 10:57 AM, Mike Kravetz wrote:
On 8/29/19 12:18 AM, Michal Hocko wrote:
[Cc cgroups maintainers]
On Wed 28-08-19 10:58:00, Mina Almasry wrote:
On Wed, Aug 28, 2019 at 4:23 AM Michal Hocko mhocko@kernel.org wrote:
On Mon 26-08-19 16:32:34, Mina Almasry wrote:
mm/hugetlb.c | 493 ++++++++++++------ mm/hugetlb_cgroup.c | 187 +++++--
This is a lot of changes to an already subtle code which hugetlb reservations undoubly are.
For what it's worth, I think this patch series is a net decrease in the complexity of the reservation code, especially the region_* functions, which is where a lot of the complexity lies. I removed the race between region_del and region_{add|chg}, refactored the main logic into smaller code, moved common code to helpers and deleted the duplicates, and finally added lots of comments to the hard to understand pieces. I hope that when folks review the changes they will see that! :)
Post those improvements as standalone patches and sell them as improvements. We can talk about the net additional complexity of the controller much easier then.
All such changes appear to be in patch 4 of this series. The commit message says "region_add() and region_chg() are heavily refactored to in this commit to make the code easier to understand and remove duplication.". However, the modifications were also added to accommodate the new cgroup reservation accounting. I think it would be helpful to explain why the existing code does not work with the new accounting. For example, one change is because "existing code coalesces resv_map entries for shared mappings. new cgroup accounting requires that resv_map entries be kept separate for proper uncharging."
I am starting to review the changes, but it would help if there was a high level description. I also like Michal's idea of calling out the region_* changes separately. If not a standalone patch, at least the first patch of the series. This new code will be exercised even if cgroup reservation accounting not enabled, so it is very important than no subtle regressions be introduced.
While looking at the region_* changes, I started thinking about this no coalesce change for shared mappings which I think is necessary. Am I mistaken, or is this a requirement?
If it is a requirement, then think about some of the possible scenarios such as: - There is a hugetlbfs file of size 10 huge pages. - Task A has reservations for pages at offset 1 3 5 7 and 9 - Task B then mmaps the entire file which should result in reservations at 0 2 4 6 and 8. - region_chg will return 5, but will also need to allocate 5 resv_map entries for the subsequent region_add which can not fail. Correct? The code does not appear to handle this.
BTW, this series will BUG when running libhugetlbfs test suite. It will hit this in resv_map_release().
VM_BUG_ON(resv_map->adds_in_progress);