On Thu, Aug 21, 2025 at 4:29 PM David Hildenbrand <david@redhat.com> wrote:
> > Because doing a 64-bit shift on x86-32 is like three cycles. Doing a 64-bit signed division by a simple constant is something like ten strange instructions even if the end result is only 32-bit.
>
> I would have thought that the compiler is smart enough to optimize that? PAGE_SIZE is a constant.
Oh, the compiler optimizes things. But dividing a 64-bit signed value by a constant is still quite complicated.
It doesn't generate a 'div' instruction, but it generates something like this:
        movl    %ebx, %edx
        sarl    $31, %edx
        movl    %edx, %eax
        xorl    %edx, %edx
        andl    $4095, %eax
        addl    %ecx, %eax
        adcl    %ebx, %edx
and that's certainly a lot faster than an actual 64-bit divide would be.
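For readers who want the same thing in C: the instructions above appear to build a bias of PAGE_SIZE - 1 for negative values only and add it to the 64-bit value, so that a following arithmetic shift rounds toward zero the way C division must. A rough sketch of that idea follows. The helper names are made up for illustration, PAGE_SHIFT is assumed to be 12 (matching the 4095 mask), and it relies on '>>' of a negative value being an arithmetic shift, which gcc and clang guarantee but ISO C does not.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12				/* assumed: 4 KiB pages */
#define PAGE_SIZE  (INT64_C(1) << PAGE_SHIFT)

/* Plain signed division: C requires rounding toward zero. */
static inline int64_t pages_div(int64_t bytes)
{
	return bytes / PAGE_SIZE;
}

/*
 * Hypothetical helper: the bias-then-shift form of the same division.
 * (bytes >> 63) is all ones for negative values and zero otherwise,
 * so the bias is PAGE_SIZE - 1 only when bytes is negative.
 */
static inline int64_t pages_bias_shift(int64_t bytes)
{
	int64_t bias = (bytes >> 63) & (PAGE_SIZE - 1);

	return (bytes + bias) >> PAGE_SHIFT;
}

/* Spot-check that both forms agree, including around page boundaries. */
static inline void check_signed(void)
{
	static const int64_t v[] = {
		0, 1, 4095, 4096, 4097, -1, -4095, -4096, -4097,
		INT64_MAX, INT64_MIN,
	};

	for (unsigned int i = 0; i < sizeof(v) / sizeof(v[0]); i++)
		assert(pages_div(v[i]) == pages_bias_shift(v[i]));
}
```

Without the bias, a bare arithmetic shift would round -1 down to -1 instead of to 0, which is why the signed case cannot be a single shift.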
An unsigned divide - or a shift - results in just
shrdl $12, %ecx, %eax
which is still not the fastest instruction (I think shrdl gets split into two uops), but it's certainly simpler and easier to read.
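The unsigned case needs no fix-up at all, because there is no negative range to round differently. As a small sketch (again with made-up helper names and PAGE_SHIFT assumed to be 12), the division and the shift are the same operation for every input:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12				/* assumed: 4 KiB pages */
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)

/* Unsigned division by a power of two: no sign handling is needed. */
static inline uint64_t pages_udiv(uint64_t bytes)
{
	return bytes / PAGE_SIZE;
}

/* The explicit shift says the same thing directly. */
static inline uint64_t pages_shift(uint64_t bytes)
{
	return bytes >> PAGE_SHIFT;
}

/* Spot-check that division and shift agree for unsigned values. */
static inline void check_unsigned(void)
{
	static const uint64_t v[] = {
		0, 1, 4095, 4096, 4097, UINT32_MAX, UINT64_MAX,
	};

	for (unsigned int i = 0; i < sizeof(v) / sizeof(v[0]); i++)
		assert(pages_udiv(v[i]) == pages_shift(v[i]));
}
```

So if the value is known to be non-negative, making the type unsigned (or writing the shift) lets the compiler skip the bias sequence entirely.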
Linus