On Thu, Sep 28, 2023 at 10:42:39AM +0800, Yuanhe Shu wrote:
In public cloud scenarios, if the kdump service works abnormally, users cannot get a vmcore. Without a vmcore, the user has no idea why the kernel crashed. Meanwhile, there is no additional information to find out why the kdump service is abnormal.
One way is to obtain console messages through VNC. The drawback is that VNC is real-time: if the user misses the window to capture the VNC output, the crash has to be retriggered.
Another way is to enable the console frontend of pstore and record the console messages to the pstore backend. However, the console logs only contain kernel printk output and do not cover user-mode output. Although we can redirect user-mode logs to the pmsg frontend provided by pstore, the user-mode information related to booting and the kdump service comes from several sources (systemd, kdump.sh, and so on), which makes redirection troublesome. So we added a tty frontend that saves all logs of the tty driver to the pstore backend.
This is a clever solution!
Another problem is that pstore currently supports only a single backend. For debugging kdump problems, we hope to save the console logs and tty logs to the ramoops backend of pstore, as they will not be lost after rebooting. If the user has enabled another backend, the ramoops backend will not be registered. To this end, we add multi-backend support so that multiple backends can be registered simultaneously.
Ah very cool; I really like this idea. I'd wanted to do it for a while just to make testing easier, but I hadn't had time to attempt it.
Based on the above changes, we can enable pstore in the crashdump kernel and save the console logs and tty logs to the ramoops backend of pstore. After rebooting, we can view the relevant logs by mounting the pstore file system.
So, before I do a line-at-a-time review of this code, I'd like to address some design issues first.
I really don't want to make behavioral differences when we don't have to:
- The multi-backend will enable _all possible_ backends, and that's a big change that will do weird things for some pstore users. I would prefer a pstore option to opt-in to enabling all backends. Perhaps have "pstore.backend=" be parsed with commas, so a list of backends can be provided, or "all" for the "all backends" behavior.
- Moving the pstorefs files into a subdirectory will break userspace immediately (e.g. systemd-pstore expects very specifically named files). Using subdirectories seems like a good idea, but perhaps we need hardlinks into the root pstorefs for the "first" backend, or some other creative solution here.
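To make the "pstore.backend=" suggestion above concrete, here is a rough userspace sketch of the matching logic I have in mind. The helper name backend_selected() is hypothetical and the backend names used with it are only illustrative; in the kernel this would be checked against the backend's registered name at registration time, and this is not code from the series.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not an existing pstore function): decide whether
 * the backend called `name` is selected by the comma-separated `list`
 * given as "pstore.backend=". The literal value "all" selects every
 * backend; an empty or missing list selects none.
 */
bool backend_selected(const char *list, const char *name)
{
	size_t name_len = strlen(name);
	const char *p = list;

	if (!list || !*list)
		return false;
	if (strcmp(list, "all") == 0)
		return true;

	while (*p) {
		/* Length of the current token, up to the next comma or NUL. */
		size_t tok_len = strcspn(p, ",");

		/* Empty tokens (e.g. "a,,b") never match anything. */
		if (tok_len > 0 && tok_len == name_len &&
		    strncmp(p, name, tok_len) == 0)
			return true;

		p += tok_len;
		if (*p == ',')
			p++;
	}
	return false;
}
```

With something like this, a single name keeps today's single-backend behavior, and enabling everything is strictly opt-in via "all".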
Then some technical thoughts about the TTY frontend's behavior:
- That 2 pstore records are created for every line of TTY output feels kind of inefficient, though I don't have a better idea. This is really only doable as you have it because the ramoops and zone backends treat the single prz as a circular buffer. I wonder about supporting this on other backends like EFI, but perhaps it's just not going to happen.
- I'd like to check with the TTY folks to see if this is the "right" place to hook to get a copy of what's being written.
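For anyone following along, the circular-buffer behavior I mean above is roughly the following. This is a simplified userspace sketch with made-up names (struct ring, ring_write(), ring_read(), RING_SIZE), not the actual persistent_ram_zone code: new writes overwrite the oldest bytes, so a stream of small TTY writes stays bounded by the size of the zone.

```c
#include <stddef.h>

#define RING_SIZE 16	/* tiny on purpose, for illustration only */

struct ring {
	char data[RING_SIZE];
	size_t start;	/* offset where the next byte will be written */
	size_t size;	/* number of valid bytes, at most RING_SIZE */
};

/* Append len bytes, overwriting the oldest data once the ring is full. */
void ring_write(struct ring *r, const char *s, size_t len)
{
	/* If the write is larger than the ring, only its tail can survive. */
	if (len > RING_SIZE) {
		s += len - RING_SIZE;
		len = RING_SIZE;
	}
	for (size_t i = 0; i < len; i++) {
		r->data[r->start] = s[i];
		r->start = (r->start + 1) % RING_SIZE;
	}
	r->size = (r->size + len > RING_SIZE) ? RING_SIZE : r->size + len;
}

/*
 * Copy the contents, oldest byte first, into out and NUL-terminate it.
 * out must have room for RING_SIZE + 1 bytes. Returns the byte count.
 */
size_t ring_read(const struct ring *r, char *out)
{
	size_t oldest = (r->start + RING_SIZE - r->size) % RING_SIZE;

	for (size_t i = 0; i < r->size; i++)
		out[i] = r->data[(oldest + i) % RING_SIZE];
	out[r->size] = '\0';
	return r->size;
}
```

A backend that stores each record as a separate object has no cheap equivalent of this overwrite, which is why I doubt per-line TTY records would map well onto it.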
Thanks and let me know what you think!
-Kees