On Wed 02-09-20 08:42:13, Pavel Tatashin wrote:
On 02.09.2020 at 11:53, Vlastimil Babka <vbabka@suse.cz> wrote:
On 8/28/20 6:47 PM, Pavel Tatashin wrote:
There appears to be another problem that is related to the cgroup_mutex -> mem_hotplug_lock deadlock described above.
In the original deadlock that I described, the workaround is to switch the crash dump from piping to the traditional Linux save-to-file method. However, after trying this workaround, I still observed hardware watchdog resets during machine shutdown.
The new problem occurs for the following reason: upon shutdown systemd calls a service that hot-removes memory, and if hot-removing fails for
Why is that hotremove even needed if we're shutting down? Are there any (virtualization?) platforms where it makes some difference over plain shutdown/restart?
If all it's doing is offlining random memory that sounds unnecessary and dangerous. Any pointers to this service so we can figure out what it's doing and why? (Arch? Hypervisor?)
Hi David,
This is how we are using it at Microsoft: there is a very large number of small-memory machines (8G each) with low downtime requirements (reboot must be under a second). There is also a large state (~2G of memory) that we need to transfer during reboot; otherwise it is very expensive to recreate the state. We have 2G of system memory reserved as pmem in the device tree, and use it to pass information across reboots. Once the information is not needed, we hot-add that memory and use it during runtime. Before shutdown we hot-remove the 2G, save the program state on it, and do the reboot.
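For readers unfamiliar with the mechanism, the offlining half of a hot-remove can be driven from userspace through the memory block `state` files in sysfs. The following is only a minimal sketch of that step, not the service discussed in this thread: the function name, the block numbers, and the idea of returning a failure list are assumptions made for illustration. It also shows why the step can fail, since the kernel may reject the write (typically with EBUSY) when pages in the block cannot be migrated.

```python
import os


def offline_memory_blocks(block_ids, sysfs_root="/sys/devices/system/memory"):
    """Ask the kernel to offline each memory block; return the ones that failed.

    Hypothetical helper for illustration only. block_ids are the N in
    memoryN; which blocks back the reserved 2G region is platform-specific
    and is not stated in the thread.
    """
    failed = []
    for block_id in block_ids:
        state_path = os.path.join(sysfs_root, f"memory{block_id}", "state")
        try:
            # Writing "offline" triggers page migration out of the block.
            # On real sysfs the error, if any, surfaces when the buffered
            # write is flushed on close, which still happens inside this try.
            with open(state_path, "w") as f:
                f.write("offline")
        except OSError as err:
            # EBUSY means the block holds unmovable or pinned pages;
            # a caller must decide whether to retry or give up.
            failed.append((block_id, err.errno))
    return failed
```

A caller running as root might invoke `offline_memory_blocks([32, 33])` and treat a non-empty result as "hot-remove did not complete". Nothing in this sketch guarantees success, which is the concern raised in the reply below.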
I still do not get it. So what guarantees that the memory is offlineable in the first place? Also, what is the difference between offlining and simply shutting the system down so that the memory is not used in the first place? In other words, what kind of difference does hotremove make?