Re: [PATCH v2 00/28] The new cgroup slab memory controller

2 Sep 2020


...
> > This is how we are using it at Microsoft: there is a very large
> > number of small-memory machines (8G each) with low downtime
> > requirements (a reboot must take under a second). There is also a
> > large state, ~2G of memory, that we need to transfer across the
> > reboot; otherwise it is very expensive to recreate the state. We have
> > 2G of system memory reserved as pmem in the device tree, and use it
> > to pass information across reboots. Once the information is no longer
> > needed, we hot-add that memory and use it during runtime. Before
> > shutdown we hot-remove the 2G, save the program state on it, and do
> > the reboot.

> I still do not get it. So what guarantees that the memory is
> offlineable in the first place?

It is in a movable zone, and we have more than 2G of free memory for
successful migrations.
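
A rough sketch of how one might check the first half of this from userspace, assuming the standard memory-hotplug sysfs layout under /sys/devices/system/memory (the `state` and `valid_zones` files per memory block). The helper name `movable_blocks` and the `root` parameter are made up for illustration, and the exact content of `valid_zones` has varied between kernel versions, so treat the "first entry is Movable" test as an assumption:

```python
from pathlib import Path

# Standard sysfs location of memory blocks on Linux with memory hotplug.
SYSFS_MEMORY = Path("/sys/devices/system/memory")


def movable_blocks(root=SYSFS_MEMORY):
    """Return names of online memory blocks whose first valid zone is Movable.

    `root` is a parameter only so the function can be pointed at a test
    directory; on a real system the default is what you want.
    """
    blocks = sorted(
        Path(root).glob("memory[0-9]*"),
        key=lambda p: int(p.name[len("memory"):]),
    )
    found = []
    for block in blocks:
        try:
            state = (block / "state").read_text().strip()
            zones = (block / "valid_zones").read_text().split()
        except OSError:
            # Block vanished or the file is not readable; skip it.
            continue
        # Assumption: for an online block the first entry names its zone.
        if state == "online" and zones[:1] == ["Movable"]:
            found.append(block.name)
    return found


if __name__ == "__main__":
    print(movable_blocks())
```

This only says where the blocks sit. Whether offlining actually succeeds still depends on there being enough free memory elsewhere to migrate the pages to.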
...
> Also, what is the difference between
> offlining and simply shutting the system down so that the memory is not
> used in the first place? In other words, what difference does
> hot-remove make?

For performance reasons during system updates/reboots we do not erase
memory content. The memory content is erased only on power cycle,
which we do not do in production.

Once we hot-remove the memory, we convert it back into a PMEM DAX
device, format it as ext4, mount it as a DAX file system, and allow
programs to serialize their state to it so they can read it back
after the reboot.
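
To make the "serialize their state" step concrete, here is a minimal sketch of what a program might do, assuming the DAX file system is mounted at some known path. The mount point /mnt/pmem, the file name, the JSON format, and the helper names are all hypothetical; the real programs may use an entirely different format:

```python
import json
import os


def save_state(state, path):
    """Write `state` as JSON to `path`, replacing any previous copy atomically."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        # Push the data down to the backing device before the rename.
        os.fsync(f.fileno())
    # rename() is atomic within one file system, so a reader after the
    # reboot sees either the old file or the complete new one.
    os.replace(tmp, path)


def load_state(path):
    """Read the state back, e.g. right after the reboot."""
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__":
    # Hypothetical location on the DAX-mounted ext4 file system.
    target = "/mnt/pmem/program-state.json"
    save_state({"example": True}, target)
    print(load_state(target))
```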

During startup we mount the pmem, programs read their state back, and
after that we hotplug the PMEM DAX device as a movable zone. This way,
during normal runtime we have the full 8G available to programs.

Pasha
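
For readers unfamiliar with the hot-add and hot-remove steps mentioned above: once memory blocks exist, the kernel's sysfs interface lets userspace online a block into the movable zone, or offline it again, by writing to the block's `state` file. The following is only a sketch of that one step, not the setup described in this mail. The helper name `set_block_state` and the block number in the usage example are hypothetical, and turning the pmem device into memory blocks in the first place is a separate step not shown here:

```python
from pathlib import Path

# Values the memory-hotplug sysfs `state` file accepts on write.
ALLOWED_STATES = {"online", "online_kernel", "online_movable", "offline"}


def set_block_state(block_dir, state):
    """Write `state` to <block_dir>/state.

    `block_dir` would normally be something like
    /sys/devices/system/memory/memoryN and the caller needs root.
    The write fails with OSError if the kernel refuses the transition,
    for example when pages cannot be migrated away during offlining.
    """
    if state not in ALLOWED_STATES:
        raise ValueError(f"unsupported memory block state: {state!r}")
    with open(Path(block_dir) / "state", "w") as f:
        f.write(state)


if __name__ == "__main__":
    # Hypothetical block number; online it into ZONE_MOVABLE so that it
    # can be offlined again before the next reboot.
    set_block_state("/sys/devices/system/memory/memory32", "online_movable")
```

Onlining with `online_movable` rather than plain `online` is what keeps the later offline step likely to succeed, since only movable allocations land in that zone.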
