On Sat, Nov 16, 2024 at 05:59:16PM +0000, Pasha Tatashin wrote:
Page Detective is a new kernel debugging tool that provides detailed information about the usage and mapping of physical memory pages.
It is often known that a particular page is corrupted, but it is hard to extract more information about such a page from a live system. Examples are:
- Checksum failure during live migration
- Filesystem journal failure
- dump_page warnings on the console log
- Unexpected segfaults
Page Detective helps to extract more information from the kernel, so it can be used by developers to root cause the associated problem.
I like the _concept_ of providing more information like this.
But you've bizarrely gone to great lengths to expose mm internal implementation details to drivers in order to implement this as a driver.
This is _very clearly_ an mm thing, and _very clearly_ subject to change depending on how mm changes. It should live under mm/ and not be a loadable driver.
I am also very very much not in favour of re-implementing yet another page table walker, this time in driver code (!). Please no.
So NACK in its current form. This has to be implemented within mm if we are to take it.
I'm also concerned about its scalability and impact on the system, as it takes every single mm lock in the system at once, which seems kinda unwise or at least problematic, and not something we want happening outside of mm, at any rate.
It operates through the Linux debugfs interface, with two files: "virt" and "phys".
The "virt" file takes a virtual address and PID and outputs information about the corresponding page.
The "phys" file takes a physical address and outputs information about that page.
The output is presented via kernel log messages (can be accessed with dmesg), and includes information such as the page's reference count, mapping, flags, and memory cgroup. It also shows whether the page is mapped in the kernel page table, and if so, how many times.
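For illustration only, here is a minimal userspace sketch of how the two debugfs files described above might be driven. The debugfs directory, the request formats (a hexadecimal physical address for "phys", "PID address" for "virt") and all helper names are assumptions made for this example; the mail itself only states that "virt" takes a virtual address and PID and that "phys" takes a physical address.

```c
#include <stdio.h>
#include <sys/types.h>

/* ASSUMPTION: directory name under debugfs; the mail only names "virt" and "phys". */
#define PD_PHYS_PATH "/sys/kernel/debug/page_detective/phys"
#define PD_VIRT_PATH "/sys/kernel/debug/page_detective/virt"

/*
 * Build a "phys" request.
 * ASSUMPTION: a single hexadecimal physical address followed by a newline.
 * Returns the number of characters snprintf() would have written.
 */
int pd_format_phys(char *buf, size_t len, unsigned long long paddr)
{
	return snprintf(buf, len, "0x%llx\n", paddr);
}

/*
 * Build a "virt" request.
 * ASSUMPTION: PID first, then the virtual address, separated by a space.
 */
int pd_format_virt(char *buf, size_t len, pid_t pid, unsigned long long vaddr)
{
	return snprintf(buf, len, "%d 0x%llx\n", (int)pid, vaddr);
}

/*
 * Write one request string to a debugfs file.
 * Returns 0 on success and -1 if the file cannot be opened or written
 * (for example when debugfs is not mounted or the caller lacks privileges).
 */
int pd_submit(const char *path, const char *req)
{
	FILE *f = fopen(path, "w");
	int rc;

	if (!f)
		return -1;
	rc = (fputs(req, f) == EOF) ? -1 : 0;
	if (fclose(f) == EOF)
		rc = -1;
	return rc;
}
```

A caller would format a request with pd_format_virt() or pd_format_phys(), pass it to pd_submit() with the matching path, and then read the result from the kernel log, since the tool reports through dmesg rather than through the debugfs file itself.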
I mean, even though I'm not a huge fan of kernel pointer hashing etc., this is obviously leaking as much information as you might want about kernel internal state, to the point of maybe making the whole kernel pointer hashing thing moot.
I know this requires CAP_SYS_ADMIN, but there are things that also require that which _still_ obscure kernel pointers.
And you're outputting it all to dmesg.
So yeah, a security person (Jann?) would be better placed to comment on this than me, but are we sure we want to do this when not in a CONFIG_DEBUG_VM* kernel?
Pasha Tatashin (6):
  mm: Make get_vma_name() function public
  pagewalk: Add a page table walker for init_mm page table
  mm: Add a dump_page variant that accept log level argument
  misc/page_detective: Introduce Page Detective
  misc/page_detective: enable loadable module
  selftests/page_detective: Introduce self tests for Page Detective
 Documentation/misc-devices/index.rst          |   1 +
 Documentation/misc-devices/page_detective.rst |  78 ++
 MAINTAINERS                                   |   8 +
 drivers/misc/Kconfig                          |  11 +
 drivers/misc/Makefile                         |   1 +
 drivers/misc/page_detective.c                 | 808 ++++++++++++++++++
 fs/inode.c                                    |  18 +-
 fs/kernfs/dir.c                               |   1 +
 fs/proc/task_mmu.c                            |  61 --
 include/linux/fs.h                            |   5 +-
 include/linux/mmdebug.h                       |   1 +
 include/linux/pagewalk.h                      |   2 +
 kernel/pid.c                                  |   1 +
 mm/debug.c                                    |  53 +-
 mm/memcontrol.c                               |   1 +
 mm/oom_kill.c                                 |   1 +
 mm/pagewalk.c                                 |  32 +
 mm/vma.c                                      |  60 ++
 tools/testing/selftests/Makefile              |   1 +
 .../selftests/page_detective/.gitignore       |   1 +
 .../testing/selftests/page_detective/Makefile |   7 +
 tools/testing/selftests/page_detective/config |   4 +
 .../page_detective/page_detective_test.c      | 727 ++++++++++++++++
 23 files changed, 1787 insertions(+), 96 deletions(-)
 create mode 100644 Documentation/misc-devices/page_detective.rst
 create mode 100644 drivers/misc/page_detective.c
 create mode 100644 tools/testing/selftests/page_detective/.gitignore
 create mode 100644 tools/testing/selftests/page_detective/Makefile
 create mode 100644 tools/testing/selftests/page_detective/config
 create mode 100644 tools/testing/selftests/page_detective/page_detective_test.c

--
2.47.0.338.g60cca15819-goog