Hi all,
(Sent to cross-distro with debian-arm on cc.)
We have an 'interesting' situation ahead of us, or indeed some of us have already fallen into it:
ARM64 platforms with > 512GB between the lowest and highest RAM addresses end up getting their amount of usable memory truncated if the kernel is built for 39-bit VA (which is what currently happens for Debian kernels). For 4.7, the arm64 defconfig was changed to enable 48-bit VA by default.
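As a rough sanity check of the 512GB figure: 39 bits of virtual address cover 2^39 bytes, which is 512 GiB. The sketch below is only a simplified model and ignores how the kernel actually divides its address space. The function names and the example RAM layout are hypothetical and are not taken from any real platform.

```python
GIB = 1 << 30

def va_span_bytes(va_bits):
    """Number of bytes addressable with va_bits of virtual address."""
    return 1 << va_bits

def ram_span_fits(lowest, highest, va_bits):
    """True if the distance between the lowest and highest RAM address
    fits in the span covered by va_bits (simplified model)."""
    return (highest - lowest) <= va_span_bytes(va_bits)

# Hypothetical layout: RAM starts at 2 GiB, last bank ends at 544 GiB.
lowest, highest = 0x8000_0000, 0x88_0000_0000

fits_39 = ram_span_fits(lowest, highest, 39)  # span is 542 GiB
fits_48 = ram_span_fits(lowest, highest, 48)
```

In this model the hypothetical layout does not fit a 39-bit configuration but fits a 48-bit one with a large margin.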
While this is not a critical error in itself (just really annoying), combined with GRUB placing the initrd near the top of available RAM it leaves us with systems that do not boot. We think we've also seen issues with ACPI tables above this waterline.
Simple - all we need to do then is enable 48-bit VA in the arm64 kernel config? Well, yes. I know Fedora are already doing this, and I have raised a bug[1] for Debian to do the same.
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=834505
The problem is - some pieces of software have had time to be written in a ... let's charitably call it a "focused on amd64" fashion ... with the embedded assumption that anything above virtual address bit 44 is a pointer-tag free-for-all.
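To make the failure mode concrete, here is a minimal sketch of a runtime that packs a type tag into the upper bits of a 64-bit word. The 47-bit payload width is an assumption chosen for illustration (the paragraph above mentions bit 44, and real runtimes differ); `box`, `unbox` and the example addresses are hypothetical names and values.

```python
PAYLOAD_BITS = 47  # assumption for illustration only

def box(addr, tag, payload_bits=PAYLOAD_BITS):
    """Pack a small tag above the low payload_bits of an address."""
    return (tag << payload_bits) | addr

def unbox(word, payload_bits=PAYLOAD_BITS):
    """Split a boxed word back into (addr, tag)."""
    mask = (1 << payload_bits) - 1
    return word & mask, word >> payload_bits

low_addr = 0x7F_FFFF_F000      # fits below 2^39
high_addr = 0xFFFF_FFFF_F000   # needs 48 bits, bit 47 is set

# Round trip is lossless while the address stays inside the payload.
low_ok = unbox(box(low_addr, 0x5)) == (low_addr, 0x5)

# With bit 47 set, the address bit collides with the tag field:
# both the recovered address and the recovered tag are wrong.
bad_addr, bad_tag = unbox(box(high_addr, 0x4))
```

The point of the sketch is that nothing fails at packing time; the corruption only shows up later, when the unpacked address is dereferenced or the tag is interpreted.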
On the Debian-ish side, we're coming up on both Ubuntu 16.10 and the freeze for Stretch, leaving a pretty short window to resolve this unholy kernel->initrd->userland triangle.
The applications we know are affected are luajit and mozjs (libv8 is not a problem). But this has a follow-on cost: both of these are used by other packages. Other jit/runtime packages could have their own issues.
The mozjs bug is fixed on trunk, and will hopefully make it into release 49[2], but it remains to be seen if that's too late for some distributions.
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=1143022
For luajit, I'm told this has been fixed on 2.1 branch, but not merged to master?
Now, Jeremy (cc:d) tells me the list of currently-known Fedora packages affected by this is: couchdb elinks erlang-js freewrl libEAI libproxy-mozjs mediatomb pacrunner plowshare polkit cinnamon cjs cjs cjs-tests gjs gjs gjs-tests gnome-shell 0ad mongodb mongodb-server
Some of these may only need an updated luajit/mozjs package, but some may need more invasive changes.
Anyway, this is just a heads up - anyone sitting on more information than I've put into this email is very welcome to share it.
/ Leif
On 17 Aug 2016, at 16:39, Leif Lindholm leif.lindholm@linaro.org wrote:
Hi all,
(Sent to cross-distro with debian-arm on cc.)
We have an 'interesting' situation ahead of us, or indeed some of us have already fallen into it:
ARM64 platforms with > 512GB between the lowest and highest RAM addresses end up getting their amount of usable memory truncated if the kernel is built for 39-bit VA (which is what currently happens for Debian kernels). For 4.7, the arm64 defconfig was changed to enable 48-bit VA by default.
While this is not a critical error in itself (just really annoying), combined with GRUB placing the initrd near the top of available RAM it leaves us with systems that do not boot. We think we've also seen issues with ACPI tables above this waterline.
Simple - all we need to do then is enable 48-bit VA in the arm64 kernel config? Well, yes. I know Fedora are already doing this, and I have raised a bug[1] for Debian to do the same.
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=834505
The problem is - some pieces of software have had time to be written in a ... let's charitably call it a "focused on amd64" fashion ... with the embedded assumption that anything above virtual address bit 44 is a pointer-tag free-for-all.
On the Debian-ish side, we're coming up on both Ubuntu 16.10 and the freeze for Stretch, leaving a pretty short window to resolve this unholy kernel->initrd->userland triangle.
The applications we know are affected are luajit and mozjs (libv8 is not a problem). But this has a follow-on cost: both of these are used by other packages. Other jit/runtime packages could have their own issues.
The mozjs bug is fixed on trunk, and will hopefully make it into release 49[2], but it remains to be seen if that's too late for some distributions.
Since mozjs is “special” when it comes to API, you have to depend on a specific version of it, so policykit hard-depends on mozjs17, which has no fixes available for the VA problem :( I’ve seen reports of systems unable to boot because of the systemd->polkit->mozjs chain. Is anyone working on fixing that?
regards,
Koen
Hi,
On 08/17/2016 09:51 AM, Koen Kooi wrote:
On 17 Aug 2016, at 16:39, Leif Lindholm leif.lindholm@linaro.org wrote:
Hi all,
(Sent to cross-distro with debian-arm on cc.)
We have an 'interesting' situation ahead of us, or indeed some of us have already fallen into it:
(trimming)
The applications we know are affected are luajit and mozjs (libv8 is not a problem). But this has a follow-on cost: both of these are used by other packages. Other jit/runtime packages could have their own issues.
The mozjs bug is fixed on trunk, and will hopefully make it into release 49[2], but it remains to be seen if that's too late for some distributions.
Since mozjs is “special” when it comes to API, you have to depend on a specific version of it, so policykit hard-depends on mozjs17, which has no fixes available for the VA problem :( I’ve seen reports of systems unable to boot because of the systemd->polkit->mozjs chain. Is anyone working on fixing that?
Special might be an understatement.
But for polkit, I've posted patches pulling it forward to mozjs24 (needs testing!)
https://lists.freedesktop.org/archives/polkit-devel/2016-August/000487.html
With mozjs24 the mainline fix applies fairly cleanly without breaking the ABI.
On 17 Aug 2016, at 16:58, Jeremy Linton jeremy.linton@arm.com wrote:
Hi,
On 08/17/2016 09:51 AM, Koen Kooi wrote:
On 17 Aug 2016, at 16:39, Leif Lindholm leif.lindholm@linaro.org wrote:
Hi all,
(Sent to cross-distro with debian-arm on cc.)
We have an 'interesting' situation ahead of us, or indeed some of us have already fallen into it:
(trimming)
The applications we know are affected are luajit and mozjs (libv8 is not a problem). But this has a follow-on cost: both of these are used by other packages. Other jit/runtime packages could have their own issues.
The mozjs bug is fixed on trunk, and will hopefully make it into release 49[2], but it remains to be seen if that's too late for some distributions.
Since mozjs is “special” when it comes to API, you have to depend on a specific version of it, so policykit hard-depends on mozjs17, which has no fixes available for the VA problem :( I’ve seen reports of systems unable to boot because of the systemd->polkit->mozjs chain. Is anyone working on fixing that?
Special might be an understatement.
But for polkit, I've posted patches pulling it forward to mozjs24 (needs testing!)
Given that mozjs24 has been out of upstream support for quite some time now, wouldn't it make more sense to port it to something more recent?
Alex
On 08/17/2016 05:16 PM, Alexander Graf wrote:
On 17 Aug 2016, at 16:58, Jeremy Linton jeremy.linton@arm.com wrote:
Hi,
On 08/17/2016 09:51 AM, Koen Kooi wrote:
On 17 Aug 2016, at 16:39, Leif Lindholm leif.lindholm@linaro.org wrote:
Hi all,
(Sent to cross-distro with debian-arm on cc.)
We have an 'interesting' situation ahead of us, or indeed some of us have already fallen into it:
(trimming)
The applications we know are affected are luajit and mozjs (libv8 is not a problem). But this has a follow-on cost: both of these are used by other packages. Other jit/runtime packages could have their own issues.
The mozjs bug is fixed on trunk, and will hopefully make it into release 49[2], but it remains to be seen if that's too late for some distributions.
Since mozjs is “special” when it comes to API, you have to depend on a specific version of it, so policykit hard-depends on mozjs17, which has no fixes available for the VA problem :( I’ve seen reports of systems unable to boot because of the systemd->polkit->mozjs chain. Is anyone working on fixing that?
Special might be an understatement.
But for polkit, I've posted patches pulling it forward to mozjs24 (needs testing!)
Given that mozjs24 has been out of upstream support for quite some time now, wouldn't it make more sense to port it to something more recent?
My plan originally was to pull everything forward to mozjs45, but the JSAPI churn is such that moving it that far forward without breaking mozjs17/1.8.5 builds was more effort than I was willing to spend. OTOH, the maintainers seem open to dropping support for the older mozjs, so it's probably worth revisiting. For the time being, aligning it with the rest of gnome/cinnamon (which is built against mozjs24 on fedora) seemed like a good first step. I'm focused on the packages built against 1.8.5, and should be publishing a few more changes to pull those packages forward to 24 real soon now. It seems to me that maintaining one or two versions of mozjs is better than the 6 currently in fedora rawhide.
BTW, I've actually got a version of polkit running against 45, but it fails the abort script unit test right now, and I'm not confident it doesn't leak memory due to some error in rooting.
On 17 Aug 2016, at 16:51, Koen Kooi koen@dominion.thruhere.net wrote:
On 17 Aug 2016, at 16:39, Leif Lindholm leif.lindholm@linaro.org wrote:
Hi all,
(Sent to cross-distro with debian-arm on cc.)
We have an 'interesting' situation ahead of us, or indeed some of us have already fallen into it:
ARM64 platforms with > 512GB between the lowest and highest RAM addresses end up getting their amount of usable memory truncated if the kernel is built for 39-bit VA (which is what currently happens for Debian kernels). For 4.7, the arm64 defconfig was changed to enable 48-bit VA by default.
While this is not a critical error in itself (just really annoying), combined with GRUB placing the initrd near the top of available RAM it leaves us with systems that do not boot. We think we've also seen issues with ACPI tables above this waterline.
Simple - all we need to do then is enable 48-bit VA in the arm64 kernel config? Well, yes. I know Fedora are already doing this, and I have raised a bug[1] for Debian to do the same.
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=834505
The problem is - some pieces of software have had time to be written in a ... let's charitably call it a "focused on amd64" fashion ... with the embedded assumption that anything above virtual address bit 44 is a pointer-tag free-for-all.
On the Debian-ish side, we're coming up on both Ubuntu 16.10 and the freeze for Stretch, leaving a pretty short window to resolve this unholy kernel->initrd->userland triangle.
The applications we know are affected are luajit and mozjs (libv8 is not a problem). But this has a follow-on cost: both of these are used by other packages. Other jit/runtime packages could have their own issues.
The mozjs bug is fixed on trunk, and will hopefully make it into release 49[2], but it remains to be seen if that's too late for some distributions.
Since mozjs is “special” when it comes to API, you have to depend on a specific version of it, so policykit hard-depends on mozjs17, which has no fixes available for the VA problem :( I’ve seen reports of systems unable to boot because of the systemd->polkit->mozjs chain. Is anyone working on fixing that?
If all you need is a backport of the upstream patch to mozjs17, feel free to take mine:
https://build.opensuse.org/package/view_file/openSUSE:Factory/mozjs17/mozjs-...
The hardest nut to crack for me right now is the couchdb issue. It relies on a SpiderMonkey version that predates mozjs17, and the dynamic allocation patch alone doesn’t actually fix it, as it pregenerates bits into its .rodata section, which may get mapped high in the address space again.
Alex
Hi,
On Wed, Aug 17, 2016 at 03:39:36PM +0100, Leif Lindholm wrote:
We have an 'interesting' situation ahead of us, or indeed some of us have already fallen into it:
ARM64 platforms with > 512GB between the lowest and highest RAM addresses end up getting their amount of usable memory truncated if the kernel is built for 39-bit VA (which is what currently happens for Debian kernels). For 4.7, the arm64 defconfig was changed to enable 48-bit VA by default.
The problem is - some pieces of software have had time to be written in a ... let's charitably call it a "focused on amd64" fashion ... with the embedded assumption that anything above virtual address bit 44 is a pointer-tag free-for-all.
I suspect that we're likely to fall further into this.
ARMv8.2 bumps the maximum address limit to 52 bits [1]. Architecturally, only the upper 8 bits of address are reserved for tagging (and this has been the case since the original ARMv8-A release), and all other bits are reserved.
Given the above, it seems possible/likely that we may see address spaces of up to 56 bits in future.
So shuffling bits along a few places is only likely to buy us some time, and won't solve the problem entirely.
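The bit budget behind this argument can be tabulated with a small sketch. It treats a pointer as a flat 64-bit word and simply counts the bits left above the address; this is a simplification, and `spare_tag_bits`, `scheme_is_safe` and the 17-bit example scheme are hypothetical, not taken from any particular runtime.

```python
POINTER_BITS = 64

def spare_tag_bits(va_bits):
    """Bits left above the virtual address in a 64-bit pointer."""
    return POINTER_BITS - va_bits

def scheme_is_safe(tag_bits, va_bits):
    """True if a scheme using tag_bits upper bits never overlaps
    a va_bits-wide address (simplified model)."""
    return tag_bits <= spare_tag_bits(va_bits)

# Spare bits for the address widths mentioned in this thread.
budget = {va: spare_tag_bits(va) for va in (39, 48, 52, 56)}

# A hypothetical scheme that claims the top 17 bits works at 39-bit VA
# but not at 48 bits or beyond; one confined to the top 8 bits still
# fits at 56 bits.
wide_ok_39 = scheme_is_safe(17, 39)
wide_ok_48 = scheme_is_safe(17, 48)
top_byte_ok_56 = scheme_is_safe(8, 56)
```

In this model only a scheme confined to the top byte survives all the way to a 56-bit address space.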
Thanks, Mark.
On Wed, Aug 17, 2016 at 04:03:07PM +0100, Mark Rutland wrote:
ARM64 platforms with > 512GB between the lowest and highest RAM addresses end up getting their amount of usable memory truncated if the kernel is built for 39-bit VA (which is what currently happens for Debian kernels). For 4.7, the arm64 defconfig was changed to enable 48-bit VA by default.
The problem is - some pieces of software have had time to be written in a ... let's charitably call it a "focused on amd64" fashion ... with the embedded assumption that anything above virtual address bit 44 is a pointer-tag free-for-all.
I suspect that we're likely to fall further into this.
ARMv8.2 bumps the maximum address limit to 52 bits [1]. Architecturally, only the upper 8 bits of address are reserved for tagging (and this has been the case since the original ARMv8-A release), and all other bits are reserved.
Given the above, it seems possible/likely that we may see address spaces of up to 56 bits in future.
So shuffling bits along a few places is only likely to buy us some time, and won't solve the problem entirely.
Absolutely - but it's still something we need to do, now.
And I'm hoping that by the time > 48-bit VA becomes an issue, it will be an issue for Intel also, and we won't need to do all of the lifting on the ARM side.
/ Leif
On Wed, Aug 17, 2016 at 04:16:37PM +0100, Leif Lindholm wrote:
On Wed, Aug 17, 2016 at 04:03:07PM +0100, Mark Rutland wrote:
ARM64 platforms with > 512GB between the lowest and highest RAM addresses end up getting their amount of usable memory truncated if the kernel is built for 39-bit VA (which is what currently happens for Debian kernels). For 4.7, the arm64 defconfig was changed to enable 48-bit VA by default.
The problem is - some pieces of software have had time to be written in a ... let's charitably call it a "focused on amd64" fashion ... with the embedded assumption that anything above virtual address bit 44 is a pointer-tag free-for-all.
I suspect that we're likely to fall further into this.
ARMv8.2 bumps the maximum address limit to 52 bits [1]. Architecturally, only the upper 8 bits of address are reserved for tagging (and this has been the case since the original ARMv8-A release), and all other bits are reserved.
Given the above, it seems possible/likely that we may see address spaces of up to 56 bits in future.
So shuffling bits along a few places is only likely to buy us some time, and won't solve the problem entirely.
Absolutely - but it's still something we need to do, now.
And I'm hoping that by the time > 48-bit VA becomes an issue, it will be an issue for Intel also, and we won't need to do all of the lifting on the ARM side.
Sure. I just wanted to make clear that there are already things in the pipe beyond 48-bit.
w.r.t. future lifting it depends on how quickly those ARMv8.2 systems appear.
Thanks, Mark.
On Wed, Aug 17, 2016 at 04:40:29PM +0100, Mark Rutland wrote:
ARMv8.2 bumps the maximum address limit to 52 bits [1]. Architecturally, only the upper 8 bits of address are reserved for tagging (and this has been the case since the original ARMv8-A release), and all other bits are reserved.
Given the above, it seems possible/likely that we may see address spaces of up to 56 bits in future.
So shuffling bits along a few places is only likely to buy us some time, and won't solve the problem entirely.
Absolutely - but it's still something we need to do, now.
And I'm hoping that by the time > 48-bit VA becomes an issue, it will be an issue for Intel also, and we won't need to do all of the lifting on the ARM side.
Sure. I just wanted to make clear that there are already things in the pipe beyond 48-bit.
Indeed. And we need to stay on the ball with that, and try to ensure any changes we do from this point onwards are at least 56-bit safe. And start agitating against pointer tagging in general.
w.r.t. future lifting it depends on how quickly those ARMv8.2 systems appear.
Well, I would say it depends on:
- how quickly those ARMv8.2 systems start implementing ginormous++ physical address space ranges for DRAM.
- whether we can separate kernel/user VA handling before that happens.
/ Leif
On 17 August 2016 at 17:06, Leif Lindholm leif.lindholm@linaro.org wrote:
And start agitating against pointer tagging in general.
Why would you want to do that when the architecture has specific support for it?
thanks -- PMM
On Wed, Aug 17, 2016 at 06:34:31PM +0100, Peter Maydell wrote:
On 17 August 2016 at 17:06, Leif Lindholm leif.lindholm@linaro.org wrote:
And start agitating against pointer tagging in general.
Why would you want to do that when the architecture has specific support for it?
Apart from the situation we find ourselves in, the guarantee that it will hit us again even within our own architecture, and the certainty that we will at some point grow beyond the 56-bit limit as well?
No reason.
It would be just about manageable if we had only one architecture to worry about, or if there was some sort of industry consortium to ensure alignment. But we have ~3 64-bit architectures, and there is no such consortium.
Unless someone can think of a way to sneak it into c++17.
/ Leif
On Wed, Aug 17, 2016 at 3:39 PM, Leif Lindholm leif.lindholm@linaro.org wrote:
Hi all,
(Sent to cross-distro with debian-arm on cc.)
We have an 'interesting' situation ahead of us, or indeed some of us have already fallen into it:
ARM64 platforms with > 512GB between the lowest and highest RAM addresses end up getting their amount of usable memory truncated if the kernel is built for 39-bit VA (which is what currently happens for Debian kernels). For 4.7, the arm64 defconfig was changed to enable 48-bit VA by default.
While this is not a critical error in itself (just really annoying), combined with GRUB placing the initrd near the top of available RAM it leaves us with systems that do not boot. We think we've also seen issues with ACPI tables above this waterline.
Simple - all we need to do then is enable 48-bit VA in the arm64 kernel config? Well, yes. I know Fedora are already doing this, and I have raised a bug[1] for Debian to do the same.
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=834505
The problem is - some pieces of software have had time to be written in a ... let's charitably call it a "focused on amd64" fashion ... with the embedded assumption that anything above virtual address bit 44 is a pointer-tag free-for-all.
On the Debian-ish side, we're coming up on both Ubuntu 16.10 and the freeze for Stretch, leaving a pretty short window to resolve this unholy kernel->initrd->userland triangle.
The applications we know are affected are luajit and mozjs (libv8 is not a problem). But this has a follow-on cost: both of these are used by other packages. Other jit/runtime packages could have their own issues.
The mozjs bug is fixed on trunk, and will hopefully make it into release 49[2], but it remains to be seen if that's too late for some distributions.
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=1143022
For luajit, I'm told this has been fixed on 2.1 branch, but not merged to master?
Now, Jeremy (cc:d) tells me the list of currently-known Fedora packages affected by this is: couchdb elinks erlang-js freewrl libEAI libproxy-mozjs mediatomb pacrunner plowshare polkit cinnamon cjs cjs cjs-tests gjs gjs gjs-tests gnome-shell 0ad mongodb mongodb-server
Some of these may only need an updated luajit/mozjs package, but some may need more invasive changes.
Actually, that list doesn't include any luajit-based packages in Fedora, because there is not yet any aarch64 support in luajit, at least not upstream.
Peter