On Fri, 2021-01-29 at 09:23 +0100, Michal Hocko wrote:
On Thu 28-01-21 13:05:02, James Bottomley wrote:
Obviously the API choice could be revisited but do you have anything to add over the previous discussion, or is this just to get your access control?
Well, access control is certainly one thing which I still believe is missing. But if there is a general agreement that the direct map manipulation is not that critical then this will become much less of a problem of course.
The secret memory is a scarce resource but it's not a facility that should only be available to some users.
It all boils down to whether secret memory is a scarce resource. With the existing implementation it really is. It is effectively repeating the same design errors as hugetlb did. And look now, we have a subtle and convoluted reservation code to track mmap requests and we have a cgroup controller to, guess what, have at least some control over distribution of the preallocated pool. See where I am coming from?
I'm fairly sure rlimit is the correct way to control this. The subtlety in both rlimit and memcg tracking comes from deciding to account under an existing category rather than having our own new one. People don't like new stuff in accounting because it requires modifications to everything in userspace. Accounting under an existing limit keeps userspace the same but leads to endless arguments about which limit it should be under. It took us several patch set iterations to get to a fragile consensus on this, which you're now disrupting for reasons you're not making clear.
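For illustration, an existing limit can already be read from userspace with no new interface. This is a minimal sketch, assuming the limit under discussion is RLIMIT_MEMLOCK (the text above does not name it); print_memlock_limit() is a hypothetical helper name:

```c
#include <stdio.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: print the soft locked-memory limit of the
 * calling process. Assumes RLIMIT_MEMLOCK is the limit being
 * accounted against. Returns 0 on success, -1 if getrlimit() fails.
 */
static int print_memlock_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
		perror("getrlimit");
		return -1;
	}

	if (rl.rlim_cur == RLIM_INFINITY)
		printf("soft limit: unlimited\n");
	else
		printf("soft limit: %llu bytes\n",
		       (unsigned long long)rl.rlim_cur);

	return 0;
}
```

Existing tooling that already inspects or sets this limit would keep working unchanged, which is the point of accounting under an existing category.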
If the secret memory is more in line with mlock without any imposed limit (other than available memory) in the end then, sure, using the same access control as mlock sounds reasonable. Btw. if this is really just a more restrictive mlock then is there any reason to not hook this into the existing mlock infrastructure (e.g. MCL_EXCLUSIVE)? Implications would be that direct map would be handled on instantiation/tear down paths, migration would deal with the same (if possible). Other than that it would be mlock like.
In the very first patch set we proposed a mmap flag to do this. Under detailed probing it emerged that this suffers from several design problems: the KVM people want VMM to be able to remove the secret memory range from the process; there may be situations where sharing is useful and some people want to be able to seal the operations. All of this ended up convincing everyone that a file descriptor based approach was better than a mmap one.
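To make the file descriptor model concrete, here is a rough userspace sketch. It assumes a memfd_secret()-style system call that returns a file descriptor which is then sized and mapped like any other memfd; the fallback syscall number 447 and the helper name try_secret_buffer() are assumptions for illustration only, and the call may simply fail on kernels that lack or disable the feature:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: fallback syscall number when the headers do not define it. */
#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447
#endif

/*
 * Hypothetical helper: create a secret memory area of 'len' bytes,
 * store a few bytes in it, wipe it and tear it down again.
 * Returns 0 on success, -1 if any step fails or is unsupported.
 */
static int try_secret_buffer(size_t len)
{
	static const char key[] = "key";
	char *p;
	int fd;

	if (len < sizeof(key))
		return -1;

	/* The fd is the handle: it can be passed, closed or left unmapped. */
	fd = (int)syscall(SYS_memfd_secret, 0U);
	if (fd < 0) {
		perror("memfd_secret");
		return -1;
	}

	/* Size the area, then map it into this process. */
	if (ftruncate(fd, (off_t)len) != 0) {
		perror("ftruncate");
		close(fd);
		return -1;
	}

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return -1;
	}

	memcpy(p, key, sizeof(key));	/* use the secret area */
	memset(p, 0, len);		/* wipe before releasing */

	munmap(p, len);
	close(fd);
	return 0;
}
```

Because the range lives behind a descriptor rather than an mmap flag, whoever holds the fd decides whether it is mapped at all, which is what makes the removal and sharing cases above expressible.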
James