Hi, Marc! It's been a while; hope you're well.
On Sun, Aug 24, 2025 at 1:55 AM Marc Zyngier <maz@kernel.org> wrote:
Hi Sam,
On Sun, 24 Aug 2025 04:05:05 +0100, Sam Edwards <cfsworks@gmail.com> wrote:
On Sat, Aug 23, 2025 at 5:29 PM Ard Biesheuvel <ardb@kernel.org> wrote:
[...]
Under which conditions would PGD_SIZE assume a value greater than PAGE_SIZE?
I might be doing my math wrong, but wouldn't 52-bit VA with 4K granules and 5 levels result in this?
No. 52bit VA at 4kB granule results in levels 0-3 each resolving 9 bits, and level -1 resolving 4 bits. That's a total of 40 bits, plus the 12 bits coming directly from the VA making for the expected 52.
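For anyone following the arithmetic, the split described above can be sketched in C. This is only an illustration under the stated configuration (52-bit VA, 4kB granule); `split_va`, `struct va_split`, and the `SKETCH_*` macros are made-up names, not kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u                       /* 4kB granule */
#define SKETCH_STRIDE     (SKETCH_PAGE_SHIFT - 3u)  /* 9 bits per full level */
#define SKETCH_VA_BITS    52u
#define SKETCH_NLEVELS    5u                        /* levels -1, 0, 1, 2, 3 */

/* Hypothetical container: idx[0] is level -1, idx[4] is level 3. */
struct va_split {
	unsigned int idx[SKETCH_NLEVELS];
	unsigned int offset;	/* low 12 bits, taken from the VA untranslated */
};

static struct va_split split_va(uint64_t va)
{
	struct va_split s;
	unsigned int i;

	/* The sketch only handles a bare 52-bit value (no upper tag/sign bits). */
	assert((va >> SKETCH_VA_BITS) == 0);

	s.offset = (unsigned int)(va & ((1u << SKETCH_PAGE_SHIFT) - 1u));
	for (i = 0; i < SKETCH_NLEVELS; i++) {
		/* Level 3 sits just above the page offset; each level above adds 9. */
		unsigned int shift = SKETCH_PAGE_SHIFT +
				     (SKETCH_NLEVELS - 1u - i) * SKETCH_STRIDE;

		/* For level -1 only 4 bits remain below bit 52, so idx[0] < 16. */
		s.idx[i] = (unsigned int)((va >> shift) &
					  ((1u << SKETCH_STRIDE) - 1u));
	}
	return s;
}
```

The shifts come out as 48, 39, 30, 21 and 12, so the indices use 4 + 9 + 9 + 9 + 9 = 40 bits and the offset supplies the remaining 12.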
Thank you, that makes it clear: I made an off-by-one mistake in my counting of the levels.
Each PTE represents 4K of virtual memory, so covers VA bits [11:0] (this is level 3)
That's where you got it wrong. The architecture is pretty clear that each level resolves PAGE_SHIFT-3 bits, hence the computation above. The bottom PAGE_SHIFT bits are directly extracted from the VA, without any translation.
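The PAGE_SHIFT-3 rule can be turned into a small calculation of how many levels a walk needs and how large the root table ends up. This is a rough sketch, assuming 8-byte descriptors (which is where the "-3" comes from); the function names are invented for illustration and are not the kernel's macros.

```c
#include <assert.h>

/* Bits that must be resolved by table lookups once the page offset is removed. */
static unsigned int translated_bits(unsigned int va_bits, unsigned int page_shift)
{
	assert(page_shift > 3 && va_bits > page_shift);
	return va_bits - page_shift;
}

/* Number of lookup levels: each full level resolves page_shift - 3 bits. */
static unsigned int walk_levels(unsigned int va_bits, unsigned int page_shift)
{
	unsigned int stride = page_shift - 3;

	return (translated_bits(va_bits, page_shift) + stride - 1) / stride;
}

/* Bits left over for the top-most level after the lower levels take a full stride. */
static unsigned int root_index_bits(unsigned int va_bits, unsigned int page_shift)
{
	unsigned int stride = page_shift - 3;

	return translated_bits(va_bits, page_shift) -
	       (walk_levels(va_bits, page_shift) - 1) * stride;
}

/* Root table size in bytes, assuming 8-byte descriptors. */
static unsigned long root_table_bytes(unsigned int va_bits, unsigned int page_shift)
{
	return (1ul << root_index_bits(va_bits, page_shift)) * 8ul;
}
```

With va_bits = 52 and page_shift = 12 this gives 5 levels, 4 root index bits, and a 128-byte root table. Because the root never resolves more than one full stride, the result cannot exceed one page in this model.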
Bear with me a moment while I unpack which part of that I got wrong: A PTE is the terminal entry of the MMU walk, so I believe I'm correct (in this example, and assuming no hugepages) that each PTE represents 4K of virtual memory: that means the final step of computing a PA takes a (valid) PTE and the low 12 bits of the VA, then just adds those bits to the physical frame address.
It sounds like what you're saying is "That isn't a *level* though: that's just concatenation. A 'level' always takes a bitslice of the VA and uses it as an index into a table of word-sized entries. PTEs don't point to a further table: they have all of the final information encoded directly."
That makes a lot more sense to me, but contradicts how I read this comment from pgtable-hwdef.h:
 * Level 3 descriptor (PTE).
I took this as, "a PTE describes how to perform level 3 of the translation." But because in fact there are no "levels" after a PTE, it must actually be saying "Level 3 of the translation is a lookup into an array of PTEs."? The problem with that latter reading is that this comment...
 * Level -1 descriptor (PGD).
...when read the same way, is saying "Level -1 of the translation is a lookup into an array of PGDs." An "array of PGDs" is nonsense, so I reverted back to my earlier readings: "PGD describes how to do level -1." and "PTE describes how to do level 3."
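The "just concatenation" step at the end of the walk can be written out as follows. This is a minimal sketch for the 4kB case: `frame_pa` stands in for the output address already extracted from a valid PTE (the attribute bits of a real descriptor are not modelled), and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_PAGE_MASK  ((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1)

/*
 * frame_pa: hypothetical, already-extracted output address of a valid PTE.
 * The low 12 bits of the VA are carried over unchanged; no table is indexed.
 */
static uint64_t final_pa(uint64_t frame_pa, uint64_t va)
{
	/* A 4kB frame address has its low 12 bits clear, so OR equals add here. */
	assert((frame_pa & SKETCH_PAGE_MASK) == 0);
	return frame_pa | (va & SKETCH_PAGE_MASK);
}
```

No lookup happens in this step, which is why it does not count as a level.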
This smells like a classic "fencepost problem": The "PXX" Linuxisms refer to the *nodes* along the MMU walk, while the "levels" in ARM parlance are the actual steps of the walk taken by hardware -- edges, not nodes, getting us from fencepost to fencepost. A fence with five segments needs six posts, but we only have five currently.
So: where do the terms P4D, PUD, and PMD fit in here? And which one's our missing fencepost?
PGD ----> ??? ----> ??? ----> ??? ----> ??? ----> PTE (|| low VA bits = final PA)
Note that at stage 1, arm64 does not support page table concatenation, and so the root page table is never larger than a page.
Doesn't PGD_SIZE refer to the size used for userspace PGDs after the boot progresses beyond stage 1? (What do you mean by "never" here? "Under no circumstances is it larger than a page at stage 1"? Or "during the entire lifecycle of the system, there is no time at which it's larger than a page"?)
Never, ever, is a S1 table bigger than a page. This concept doesn't exist in the architecture. Only S2 tables can use concatenation at the top-most level, for up to 16 pages (in order to skip a level when possible).
The top-level can be smaller than a page, with some alignment constraints, but that's about the only degree of freedom you have for S1 page tables.
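To make the S2 case concrete, here is a rough sketch of how concatenation lets a walk skip a level. It assumes 8-byte descriptors and only models the "up to 16 tables" limit mentioned above; it ignores any further architectural constraints on permitted starting levels. `s2_root_layout` and `struct s2_root` are invented names.

```c
#include <assert.h>

/* Hypothetical result type for the sketch. */
struct s2_root {
	unsigned int levels;	/* lookup levels actually walked */
	unsigned int tables;	/* concatenated page-sized tables at the root (1 = none) */
};

static struct s2_root s2_root_layout(unsigned int ipa_bits, unsigned int page_shift)
{
	unsigned int stride, bits, rem;
	struct s2_root r;

	assert(page_shift > 3 && ipa_bits > page_shift);
	stride = page_shift - 3;
	bits = ipa_bits - page_shift;
	rem = bits % stride;

	if (rem != 0 && rem <= 4 && bits > stride) {
		/*
		 * The would-be top level resolves at most 4 bits: fold it into
		 * the next level down by concatenating 2^rem tables there.
		 */
		r.levels = bits / stride;
		r.tables = 1u << rem;
	} else {
		/* No concatenation: a single root table, possibly partial. */
		r.levels = (bits + stride - 1) / stride;
		r.tables = 1;
	}
	return r;
}
```

For a 40-bit IPA with a 4kB granule, 28 bits need resolving: three levels cover 27 of them, and the one leftover bit becomes 2 concatenated tables rather than a fourth level with two entries. At S1 there is no such option, so the leftover bits always form a (possibly sub-page) root table of their own.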
Okay, that clicked for me: I was reading "stage" in the context of the boot process. These explanations make a lot more sense when reading "stage" in the context of the MMU.
So PGD_SIZE <= PAGE_SIZE, the PAGE_SIZE spacing in vmlinux.lds.S is for alignment, and I should be looking at cases where PGDs are assumed to be PAGE_SIZE to make those consistent instead. Thanks!
Cheers, Sam
Thanks,
M.
-- Jazz isn't dead. It just smells funny.