On Fri, Aug 9, 2019 at 4:27 AM Michal Koutný mkoutny@suse.com wrote:
(+CC cgroups@vger.kernel.org)
On Thu, Aug 08, 2019 at 12:40:02PM -0700, Mina Almasry almasrymina@google.com wrote:
We have developers interested in using hugetlb_cgroups, and they have expressed dissatisfaction regarding this behavior.
I assume you still want to enforce a limit on a particular group and the application must be able to handle resource scarcity (but better notified than SIGBUS).
Alternatives considered: [...]
(I did not try that but) have you considered: 3) MAP_POPULATE while you're making the reservation,
I have tried this, and the behaviour is not great. Basically, if userspace mmaps more memory than its cgroup limit allows with MAP_POPULATE, the kernel reserves the total amount requested by userspace, faults in pages up to the cgroup limit, and then sends SIGBUS to the task when it tries to access the rest of its 'reserved' memory.
So for example:
- if /proc/sys/vm/nr_hugepages == 10, and
- your cgroup limit is 5 pages, and
- you mmap(MAP_POPULATE) 7 pages.
Then the kernel will reserve 7 pages, fault in 5 of those 7 pages, and SIGBUS you when you try to access the remaining 2 pages. So the problem persists. Folks would still like to know at mmap time that they are crossing the limit.
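The arithmetic of that example can be sketched as a toy model. This is not kernel code: the function name model_map_populate and the PopulateOutcome type are hypothetical, and the model assumes the hugetlb pool is otherwise unused and that a reservation larger than the global pool makes mmap fail outright.

```python
from dataclasses import dataclass


@dataclass
class PopulateOutcome:
    reserved: int  # pages reserved at mmap time
    faulted: int   # pages faulted in by MAP_POPULATE
    unbacked: int  # reserved pages whose first access would SIGBUS


def model_map_populate(nr_hugepages: int, cgroup_limit: int, requested: int) -> PopulateOutcome:
    """Toy model of the behaviour described above (hypothetical helper)."""
    # Assumption: the reservation is checked only against the global pool,
    # not against the cgroup limit.
    if requested > nr_hugepages:
        raise MemoryError("reservation exceeds the global pool; mmap would fail")
    # Faulting is what gets charged to the cgroup, so it stops at the limit.
    faulted = min(requested, cgroup_limit)
    return PopulateOutcome(reserved=requested, faulted=faulted, unbacked=requested - faulted)


outcome = model_map_populate(nr_hugepages=10, cgroup_limit=5, requested=7)
print(outcome)
```

With the numbers from the example, the model reports 7 pages reserved, 5 faulted in, and 2 left unbacked. Those 2 are the pages whose first access would raise SIGBUS.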
- Using multiple hugetlbfs mounts with respective limits.
I assume you mean the size=<value> option on the hugetlbfs mount. This would only limit hugetlb memory usage through that hugetlbfs mount. Tasks can still allocate hugetlb memory without any mount via mmap(MAP_HUGETLB) and the shmget/shmat APIs, and all these calls will deplete the global, shared hugetlb memory pool.
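One way to watch the global pool being depleted, whichever API the allocation came through, is to read the HugePages_* counters in /proc/meminfo. The following is a minimal sketch; the helper names are hypothetical and the SAMPLE text is illustrative, not output from a real system.

```python
import re


def parse_hugepage_counters(meminfo_text: str) -> dict:
    """Extract the HugePages_* counters from /proc/meminfo-style text."""
    counters = {}
    for line in meminfo_text.splitlines():
        m = re.match(r"^(HugePages_\w+):\s+(\d+)$", line.strip())
        if m:
            counters[m.group(1)] = int(m.group(2))
    return counters


def read_hugepage_counters(path: str = "/proc/meminfo") -> dict:
    """Read the live counters (Linux only)."""
    with open(path) as f:
        return parse_hugepage_counters(f.read())


# Illustrative sample, not taken from a real machine.
SAMPLE = """\
MemTotal:       16384000 kB
HugePages_Total:      10
HugePages_Free:        8
HugePages_Rsvd:        3
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""

counters = parse_hugepage_counters(SAMPLE)
```

Calling read_hugepage_counters() before and after a mmap(MAP_HUGETLB) or shmget allocation should show HugePages_Rsvd or HugePages_Free changing even when no hugetlbfs mount is involved.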
Caveats:
- This support is implemented for cgroups-v1. I have not tried hugetlb_cgroups with cgroups v2, and AFAICT it's not supported yet. This is largely because we use cgroups-v1 for now.
Adding something new to v1 without a v2 counterpart makes migration harder; that's one of the reasons why the v1 API is rather frozen now. (I'm not sure whether the current hugetlb controller fits into v2 at all, though.)
In my estimation it's maybe fine to make this change in v1 because, as far as I understand, hugetlb_cgroups are a little-used feature of the kernel (although we see it getting requested), hugetlb_cgroups aren't supported in v2 yet, and I don't *think* this change makes it any harder to port hugetlb_cgroups to v2.
But, like I said, if there is consensus that this must not be checked in unless hugetlb_cgroups v2 support is added alongside, I can take a look at that.
Michal