This is a summary of discussions we had on IRC between kernel and toolchain engineers regarding support for JITs and 52-bit virtual address space (mostly in the context of LuaJIT, but this concerns other JITs too).
The short version is that we need to consider ways of reducing the VA size for a given process or container on a Linux system.
The high-level problem is that JITs tend to use the upper bits of addresses to encode various pieces of data, and the number of available bits is shrinking as VA sizes grow. With the usual 42-bit VA (which is what most JITs assume) they have 22 bits in which to encode performance-critical data. With a 48-bit VA (e.g., the ThunderX world) things start to get complicated, and JITs need non-trivial source-level patches to keep working with fewer bits available for their performance-critical storage. With the upcoming 52-bit VA things may get dire enough for some JITs to declare such configurations unsupported.
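As an illustration (not LuaJIT's actual representation), a JIT built around a 42-bit
VA can pack metadata into the spare upper pointer bits roughly like this in C:

#include <assert.h>
#include <stdint.h>

#define VA_BITS   42                        /* what the JIT was designed for */
#define ADDR_MASK ((1ULL << VA_BITS) - 1)   /* low 42 bits hold the address  */

static inline uint64_t pack(void *p, uint64_t tag)
{
    uint64_t addr = (uint64_t)(uintptr_t)p;
    assert(addr <= ADDR_MASK);        /* breaks once mmap() hands out addresses above 4TB    */
    return (tag << VA_BITS) | addr;   /* 22 tag bits with a 42-bit VA, only 12 with 52 bits  */
}

static inline void *unpack_ptr(uint64_t w)    { return (void *)(uintptr_t)(w & ADDR_MASK); }
static inline uint64_t unpack_tag(uint64_t w) { return w >> VA_BITS; }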
On the other hand, most JITs are not expected to require terabytes of RAM or a huge VA for their applications. Most JIT applications will happily live in a 42-bit world with the mere 4 terabytes of address space it provides. Therefore, what JITs need in the modern world is a way to make mmap() return addresses below a certain threshold, and error out with ENOMEM when the "lower" memory is exhausted. This is very similar to the ADDR_LIMIT_32BIT personality, but extended to the common VA sizes on 64-bit systems: 39-bit, 42-bit, 48-bit, 52-bit, etc.
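There is no such knob today; the closest a JIT can get is passing a low address hint to
mmap(), which the kernel is free to ignore. A rough sketch of that workaround (the 42-bit
threshold is only an example) and of where it falls short:

#include <stdio.h>
#include <sys/mman.h>

#define VA_LIMIT (1UL << 42)   /* the JIT's assumed address-space ceiling */

int main(void)
{
    /* Ask politely for memory below the threshold; MAP_FIXED would clobber
     * existing mappings, so a plain hint is all a JIT can safely use today. */
    void *hint = (void *)(1UL << 32);
    void *p = mmap(hint, 1 << 20, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;
    if ((unsigned long)p >= VA_LIMIT)
        fprintf(stderr, "high address returned - tag bits would be clobbered\n");
    munmap(p, 1 << 20);
    return 0;
}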
Since we do not want to penalize the whole system (by using an artificially small VA), it would be best to have a way to enable a VA limit on a per-process basis (similar to the ADDR_LIMIT_32BIT personality). If that's not possible -- then on a per-container / cgroup basis. If that's not possible -- then at the system level (similar to vm.mmap_min_addr, but from the other end).
Dear kernel people, what can be done to address the JITs' need to reduce the effective VA size?
--
Maxim Kuvyrkov
www.linaro.org
Changes in cpufreq_06.sh to calculate the sum and
average of the frequency measurements.
Bug: The tests for frequency deviation always
fail with 'Err'.
Cause and solution:
In the function 'compute_freq_ratio' the frequency
is stored in an array of variables by indirect reference
(http://www.tldp.org/LDP/abs/html/ivr.html),
but the variable 'index', which points to the next element
in the array, is reset to 0 on every call,
overwriting the stored values. The same happens in
'compute_freq_ratio_sum', where 'index' is always
reset to 0 and 'sum' is always set to 0, so subsequent
values are never accumulated. These initializations must
therefore be removed from those functions and moved into
'check_deviation', so that the frequency values are kept
and the sum and average are computed correctly.
With these changes the test now calculates
the real average of the frequencies,
the deviations can be checked, and the tests
now pass.
Signed-off-by: Saul Romero <saul.romero(a)arm.com>
---
cpufreq/cpufreq_06.sh | 17 +++--------------
1 file changed, 3 insertions(+), 14 deletions(-)
diff --git a/cpufreq/cpufreq_06.sh b/cpufreq/cpufreq_06.sh
index b323dc8..8f1dc22 100755
--- a/cpufreq/cpufreq_06.sh
+++ b/cpufreq/cpufreq_06.sh
@@ -31,7 +31,6 @@ CPUCYCLE=../utils/cpucycle
freq_results_array="results"
compute_freq_ratio() {
- index=0
cpu=$1
freq=$2
@@ -49,25 +48,18 @@ compute_freq_ratio() {
}
compute_freq_ratio_sum() {
- index=0
- sum=0
-
res=$(eval echo \$$freq_results_array$index)
sum=$(echo $sum $res | awk '{ printf "%f", $1 + $2 }')
index=$((index + 1))
-
}
__check_freq_deviation() {
res=$(eval echo \$$freq_results_array$index)
-
if [ ! -z "$res" ]; then
# compute deviation
dev=$(echo $res $avg | awk '{printf "%.3f", (($1 - $2) / $2) * 100}')
-
# change to absolute
dev=$(echo $dev | awk '{ print ($1 >= 0) ? $1 : 0 - $1}')
-
index=$((index + 1))
res=$(echo $dev | awk '{printf "%f", ($dev > 5.0)}')
@@ -85,23 +77,20 @@ check_freq_deviation() {
cpu=$1
freq=$2
-
check "deviation for frequency $(frequnit $freq)" __check_freq_deviation
-
}
check_deviation() {
cpu=$1
-
set_governor $cpu userspace
-
+ index=0
for_each_frequency $cpu compute_freq_ratio
-
+ index=0;sum=0
for_each_frequency $cpu compute_freq_ratio_sum
avg=$(echo $sum $index | awk '{ printf "%.3f", $1 / $2}')
-
+ index=0
for_each_frequency $cpu check_freq_deviation
}
--
2.5.0
Changes in cpufreq_06.sh to calculate the sum and
average of the frequency measurements.
In the function 'compute_freq_ratio' the frequency
is stored in an array of variables by indirect reference
(http://www.tldp.org/LDP/abs/html/ivr.html),
but the variable 'index', which points to the next element
in the array, is reset to 0 on every call,
overwriting the stored values. The same happens in
'compute_freq_ratio_sum', where 'index' is always
reset to 0 and 'sum' is always set to 0, so subsequent
values are never accumulated. These initializations must
therefore be removed from those functions and moved into
'check_deviation', to keep the correct values and
compute the average correctly.
With these changes the tests now produce the
correct results and pass.
Signed-off-by: Saul Romero <saul.romero(a)arm.com>
---
cpufreq/cpufreq_06.sh | 19 ++++++-------------
1 file changed, 6 insertions(+), 13 deletions(-)
diff --git a/cpufreq/cpufreq_06.sh b/cpufreq/cpufreq_06.sh
index b323dc8..a25bbd2 100755
--- a/cpufreq/cpufreq_06.sh
+++ b/cpufreq/cpufreq_06.sh
@@ -31,7 +31,7 @@ CPUCYCLE=../utils/cpucycle
freq_results_array="results"
compute_freq_ratio() {
- index=0
+ #index=0
cpu=$1
freq=$2
@@ -49,25 +49,21 @@ compute_freq_ratio() {
}
compute_freq_ratio_sum() {
- index=0
- sum=0
+ #index=0
+ #sum=0
res=$(eval echo \$$freq_results_array$index)
sum=$(echo $sum $res | awk '{ printf "%f", $1 + $2 }')
index=$((index + 1))
-
}
__check_freq_deviation() {
res=$(eval echo \$$freq_results_array$index)
-
if [ ! -z "$res" ]; then
# compute deviation
dev=$(echo $res $avg | awk '{printf "%.3f", (($1 - $2) / $2) * 100}')
-
# change to absolute
dev=$(echo $dev | awk '{ print ($1 >= 0) ? $1 : 0 - $1}')
-
index=$((index + 1))
res=$(echo $dev | awk '{printf "%f", ($dev > 5.0)}')
@@ -85,23 +81,20 @@ check_freq_deviation() {
cpu=$1
freq=$2
-
check "deviation for frequency $(frequnit $freq)" __check_freq_deviation
-
}
check_deviation() {
cpu=$1
-
set_governor $cpu userspace
-
+ index=0
for_each_frequency $cpu compute_freq_ratio
-
+ index=0;sum=0
for_each_frequency $cpu compute_freq_ratio_sum
avg=$(echo $sum $index | awk '{ printf "%.3f", $1 / $2}')
-
+ index=0
for_each_frequency $cpu check_freq_deviation
}
--
2.5.0
"No amount of careful planning will ever replace dumb luck."
~ Anonymous
The Linaro 16.05 release is now available for download!
Using the Android-based images
=======================
The Android-based images come in three parts: system, userdata and boot.
These need to be combined to form a complete Android install. For an
explanation of how to do this please see:
http://wiki.linaro.org/Platform/Android/ImageInstallation
If you are interested in getting the source and building these images
yourself please see the following pages:
http://wiki.linaro.org/Platform/Android/GetSource
http://wiki.linaro.org/Platform/Android/BuildSource
Using the OpenEmbedded-based images
=======================
With the Linaro provided downloads and with ARM’s Fast Models virtual
platform, you can boot a virtual ARMv8 system and run 64-bit binaries.
For more information please see:
http://www.linaro.org/engineering/armv8
Using the Debian-based images
=======================
The Debian-based images consist of two parts. The first part is a
hardware pack, which can be found under the hwpacks directory and
contains hardware specific packages (such as the kernel and bootloader).
The second part is the rootfs, which is combined with the hardware pack
to create a complete image. For more information on how to create an
image please see:
http://wiki.linaro.org/Platform/DevPlatform/Ubuntu/ImageInstallation
Getting involved
============
More information on Linaro can be found on our websites:
* Homepage:
http://www.linaro.org
* Wiki:
http://wiki.linaro.org
Also subscribe to the important Linaro mailing lists and join our IRC
channels to stay on top of Linaro developments:
* Announcements:
http://lists.linaro.org/mailman/listinfo/linaro-announce
* Development:
http://lists.linaro.org/mailman/listinfo/linaro-dev
* IRC:
#linaro on irc.linaro.org or irc.freenode.net
#linaro-android on irc.linaro.org or irc.freenode.net
Known issues with this release
=====================
Bug reports for this release should be filed in Bugzilla
(http://bugs.linaro.org) against the
individual packages or projects that are affected.
On behalf of the release team,
Koen Kooi
Builds and Baselines | Release Manager
Linaro.org | Open source software for ARM SoCs
On Fri, May 06, 2016 at 02:03:42PM -0400, Jon Masters wrote:
> On 05/06/2016 01:10 PM, Mark Brown wrote:
> > On Fri, May 06, 2016 at 12:20:40PM -0400, Jon Masters wrote:
> > > But is it really worth trying after so long of the right thing not
> > > happening? If anyone really cared about making general purpose distros boot
> > > on embedded boards, efforts to compel standards would have happened years
> > > ago. To do it right, we would need to have a couple of vendors involved who
> > > could compel vendors to comply.
> Note: by standards above, I specifically mean "separate platform flash" in
> addition to all of the other associated things. Actually, to do it right you
Oh, right. In that case this is all irrelevant anyway, there's no need
to worry about magic areas of the disk and distros can just do whatever.
> > Distros care and currently do ship on such systems - Debian stable lists
> > a bunch of boards (something like 20 IIRC) as actively tested for
> > example. The board and SoC vendors are to an extent irrelevant here,
> I get your point, but for separate flash and getting vendors to ship
> firmware, they very much need to be involved. Today, we can't agree as an
> industry who is on the hook for this. With an enterprise hat on, I get to
Personally I'm not convinced it's particularly worth worrying about -
there are enough sensible ways to build systems where separate storage
for bootloaders just isn't solving a problem people have that such things
are going to be around for a while, and inevitably people will end up
wanting to run distros on them, so at least the community distros need to
work with them.
> compel vendors to do the only sane thing (in my opinion) which is "thou
> shalt ship EFI, on flash that we don't touch". And those who screwed up and
> put EFI parameters on hidden disk partitions, or thought EFI variables were
> a place to store MAC and platform parameters are slowly found and forced to
> comply with the way the industry works. But on embedded, spending a few
> cents to do the "right" thing is something that isn't going to happen unless
> everyone mandates and pushes for it.
It's not just cents, it's also things like board area, manufacturing
process, usage models and, for some use cases, customers who want full
control over the software stack. For some kinds of system, like the
enterprise market, what you're describing is absolutely the right way to
go, but there are other segments where it either isn't solving problems
people have or is currently actively worse, so there's no compelling
reason to adopt it. But this is a bit off topic...
On Fri, May 06, 2016 at 12:20:40PM -0400, Jon Masters wrote:
> But is it really worth trying after so long of the right thing not
> happening? If anyone really cared about making general purpose distros boot
> on embedded boards, efforts to compel standards would have happened years
> ago. To do it right, we would need to have a couple of vendors involved who
> could compel vendors to comply.
Distros care and currently do ship on such systems - Debian stable lists
a bunch of boards (something like 20 IIRC) as actively tested for
example. The board and SoC vendors are to an extent irrelevant here,
when people do this they often don't use any of the software the board
vendors provide and just use things like upstream u-boot. Coming up
with something that covers all the cases with minimal code for the
distros will make life easier for them. If board vendors want to pick it
up as well in what they're shipping (and ideally in how they spec their
media usage), that's great, but we're already winning even if all the
board vendors totally ignore it.
On Fri, May 6, 2016 at 2:10 AM, Tom Rini <trini(a)konsulko.com> wrote:
> On Thu, May 05, 2016 at 10:21:25PM +0200, Alexander Graf wrote:
>> On 05.05.16 17:21, Grant Likely wrote:
>> > On Thu, May 5, 2016 at 12:45 PM, Marcin Juszkiewicz
>> > <marcin.juszkiewicz(a)linaro.org> wrote:
>> >> Recently my angry post on Google+ [1] got so many comments that it was clear
>> >> that it would be better to move to some mailing list with discussion.
>> >>
>> >> As it is about boot loaders, and Linaro has engineers from most of the SoC vendor
>> >> companies, I thought that this would be the best one.
>> >>
>> >> 1. https://plus.google.com/u/0/+MarcinJuszkiewicz/posts/J79qhndV6FY
>> >>
>> >>
>> >> It all started when I got a Pine64 board (based on the Allwinner A64 SoC) and had
>> >> the same issue as on several boards in the past - the boot loader written at some random
>> >> place on the SD card.
>> >>
>> >> The days when people used Texas Instruments SoC chips were great - the in-CPU boot
>> >> loader knew how to read the MBR partition table and was able to load the 1st stage
>> >> boot loader (called MLO) from it as long as it was a FAT filesystem.
>> >>
>> >> The GPU used by the Raspberry Pi is able to read the MBR, find the 1st partition and read
>> >> firmware files from there as long as it is FAT.
>> >>
>> >> Chromebooks have some SPI flash to keep boot loaders and use GPT
>> >> partitioning to find where from load kernel (or another boot loader).
>> >>
>> >> And then we have all those boards where vendors decided that SPI flash for
>> >> the boot loader is too expensive, so it will be read from the SD card instead. From
>> >> any random place, of course...
>> >>
>> >>
>> >> Then we have distributions. And instead of generating a bunch of images per
>> >> board, they want to make one clean image which will be able to handle as much
>> >> as possible.
>> >>
>> >> If there are UEFI machines on the list of supported ones, then GPT partitioning
>> >> will be used, the boot loader will be stored in the "EFI system area", and it boots.
>> >> This is how AArch64 SBSA/SBBR machines work.
>> >>
>> >> But there are also all those U-Boot (or fastboot/redboot/whateverboot) ones.
>> >> They are usually handled by taking the image from the previous stage and adding the boot
>> >> loader(s) with a separate script. And this is where the "fun" starts...
>> >>
>> >> GPT takes the first 17KB of the storage media, as it allows storing information about
>> >> 128 partitions. Sure, no one uses that many on such devices, but the
>> >> space is still reserved.
>> >>
>> >> But most chips expect the boot loader(s) to be stored:
>> >>
>> >> - right after MBR
>> >> - from 1KB
>> >> - from 8KB
>> >> - any other random place
>> >>
>> >> So scripts start to be sets of magic written to handle all those SoCs...
>> >>
>> >> The solution for existing SoCs is usually to add 1MB of SPI flash during the design
>> >> phase of the device and store the boot loader(s) there. But it is so expensive,
>> >> someone would say, when it is in the 10-30 cent range...
>> >>
>> >
>> > To try and summarize, what you're asking for is to define the usage
>> > model for eMMC/SD when both the firmware* and OS are stored on the
>> > same media. Some argue that these things should always be on separate
>> > devices, but while the debate is interesting, it doesn't match the
>> > reality of how hardware is being built. In which case, the derived
>> > requirements are:
>> >
>> > 1) Co-exist with MBR partitioning
>> > 2) Co-exist with GPT partitioning
>> > 3) Be detectable --- partitioning tools must respect it
>> > 4) Be speced. Write it down so that tool and SoC developers can see it
>> > as a requirement
>> > 5) Be usable regardless of firmware type (UEFI, U-Boot, Little Kernel, etc)
>> > 6) Support some form of firmware non-volatile storage (variable storage)
>> >
>> > It would be really nice if we could also have:
>> > 7) Support SoCs that hardcode boot code to specific locations
>> > (after-MBR, 1K, 8K, random)
>> > - May not be able to support all variants, but it is a worthy design goal.
>> >
>> > Agreed?
>> >
>> > * I'm ignoring eMMC's separate boot area because that solution has
>> > firmware and OS logically separated. The strong recommendation is for SoCs
>> > to boot from the boot area. Then normal GPT/MBR partitioning works just
>> > fine. The rest of this discussion only applies if the SoC cannot do
>> > that.
>> >
>> > (For the following discussion, I refer to the UEFI spec because that
>> > is where GPT is defined, but the expectation is that anything
>> > described here can equally be used by non-UEFI platforms)
>> >
>> > I've just read through the UEFI GPT spec, and here are the constraints:
>> > - MBR must be at the start of LBA0 (0 - 0.5k)
>> > - Primary GPT must be at the start of LBA1 (0.5k to 4k, but may
>> > collide with fw),
>> > - It /seems/ like the GPT Header and GPT table can be separated by
>> > some blocks. The GPT header has a PartitionEntryLBA field which
>> > describes where the actual table of partitions starts.
>> > - GPTHeader is only 92 bytes.
>> > - It should be possible to have: GPTHeader @ start of LBA1 and
>> > GPTPartitionTable @ an LBA that doesn't conflict with firmware.
>> >
>> > I think we have everything we need to work around the location of the
>> > FW boot image without breaking the UEFI spec. The biggest problem is
>> > making sure partitioning tools don't go stomping over required
>> > firmware data and rendering systems unbootable. I *think* we can solve
>> > that problem by extending the MBR definition to block out a required
>> > region and then work around that. Tools can generically see the
>> > special region in the MBR and work around it accordingly.
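For reference, the 92-byte header described above looks roughly like this in C (field
names follow the UEFI spec, but the struct is only illustrative):

#include <stdint.h>

struct gpt_header {
    uint64_t signature;               /* "EFI PART"                          */
    uint32_t revision;
    uint32_t header_size;             /* 92                                  */
    uint32_t header_crc32;
    uint32_t reserved;
    uint64_t my_lba;                  /* 1 for the primary header            */
    uint64_t alternate_lba;
    uint64_t first_usable_lba;
    uint64_t last_usable_lba;
    uint8_t  disk_guid[16];
    uint64_t partition_entry_lba;     /* may point past a firmware region    */
    uint32_t num_partition_entries;
    uint32_t partition_entry_size;    /* usually 128                         */
    uint32_t partition_entry_crc32;
} __attribute__((packed));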
>>
>> So what's the goal here? Are we trying to force GPT on systems whose
>> vendors never intended them to run with GPT?
>>
>> It really shouldn't matter at the end of the day whether we use GPT or
>> MBR. All uEFI firmware implementations I'm aware of support both. So if
>> you have a device whose bootloader collides with the GPT, just use MBR.
>>
>> As for the "protection" mechanism, I don't think it's a problem either.
>> IIRC parted starts to create partitions with a sensible alignment (1 or
>> 2MB). Most boot loaders fit perfectly fine within that gap.
>>
>> So this really boils down to
>>
>> - use GPT for systems that were designed for it or
>> - use MBR with alignment gap to first partition
>>
>> end of story. There shouldn't be any middle ground. At least I haven't
>> seen any so far :).
That is a good point. MBR is a pain to deal with, but I don't think
there is anything that it is absolutely required for in UEFI land.
> This, for many use cases, is also true. A reason that various SoCs pick
> a magic location that is MBR-compatible and not strictly GPT-compatible
> is that they don't see a use case for GPT-without-MBR being used.
> By-and-large saying your SD card shall have an MBR and shall leave a gap
> that is also well aligned for the card anyhow is enough for (enough)
> firmware to reside in the magic locations.
I think the key issue is how do the tools know what they are allowed
to do. If (for example) the system is booted into a recovery/install
image and needs to repartition and install onto the eMMC, can we get
away from the tools requiring board specific knowledge? A generic
partitioning/install tool needs to know:
- Is an MBR required (ie. SoC reads MBR to find firmware)
- Is FW location at a fixed offset?
- Is GPT supported?
If the tools can get that information, then they can make good decisions
when reimaging a device, and it will make Marcin happier I'm sure. :-)
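As a sketch of what such a descriptor could look like (the structure and field names
below are hypothetical, not from any existing spec):

#include <stdbool.h>
#include <stdint.h>

struct boot_media_desc {
    bool     mbr_required;     /* SoC reads the MBR to find firmware          */
    bool     gpt_supported;
    uint64_t fw_offset;        /* fixed firmware offset in bytes, 0 if none   */
    uint64_t fw_size;          /* region partitioning tools must not touch    */
};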
I /don't/ think the general tools should need to know how to install
the firmware itself. That is still a nasty board-specific problem...
(although even here we could make things better if we had a spec for
managing the SoC's FW partition). I would like to see separate steps
for FW provisioning and OS install, i.e. a board-specific tool to prep
an SD card with the bare minimum for FW, and then generic tools to
complete partitioning and install.
For on-board devices (eMMC), FW provisioning should only be needed once.
For removable devices (SD), FW provisioning is only needed when FW
must be on SD (i.e. no on-board eMMC, or for FW recovery/upgrade).
g.
On Fri, May 6, 2016 at 2:10 AM, Tom Rini <trini(a)konsulko.com> wrote:
> On Thu, May 05, 2016 at 05:30:55PM +0100, Grant Likely wrote:
>> On Thu, May 5, 2016 at 4:59 PM, Mark Brown <broonie(a)kernel.org> wrote:
>> > On Thu, May 05, 2016 at 09:01:05PM +0530, Amit Kucheria wrote:
>> >> On Thu, May 5, 2016 at 5:15 PM, Marcin Juszkiewicz
>> >
>> >> > The solution for existing SoCs is usually to add 1MB of SPI flash during the design
>> >> > phase of the device and store the boot loader(s) there. But it is so expensive,
>> >> > someone would say, when it is in the 10-30 cent range...
>> >
>> >> > Even the 96boards CE specification totally ignored that fact, while it could have been a
>> >> > way of showing how to make a popular board. Instead it became
>> >> > yet-another-board-to-laugh-at (the EE spec did not improve much).
>> >
>> >> > Is there a way to get it improved? At least for new designs?
>> >
>> >> Yes! I've added this suggestion to a list of suggestions for evolution
>> >> of the 96boards spec.
>> >
>> > We already went round the houses repeatedly on that one :(
>>
>> It's not so dismal as all that! With the board size and price point
>> the CE spec aims at, it is very unlikely that vendors will build
>> boards with a separate storage device for FW.* However, by and large
>> these platforms will be built with eMMC and eMMC has a separate boot
>> area that is logically separate from the main pool. We can add a
>> requirement for future versions to boot from the boot area when
>> booting from eMMC.
>
> Saying that the eMMC boot partitions shall be bootable, AND
> documenting publicly how the various relevant EXT_CSD flags need to be
> set, is important. It won't do anyone any good if you can't program the
> board to boot from the boot partition.
Yes, of course. This is definitely an aspirational item. It doesn't do
any good for current SoCs that already have the boot scheme fixed, but
it is a place where we can put pressure on vendors for what to do in
future SoCs.
g.
On 05/05/2016 12:05 PM, Bill Gatliff wrote:
>
> On Thu, May 5, 2016 at 11:50 AM Martin Stadtler
> <martin.stadtler(a)linaro.org> wrote:
>
> Specifically for the 96boards, the spec is a recommended view; it's
> not meant to be constraining. However, it does allow one to
> then show a best practice that others can adopt. That's where
> the RPB comes into play, again to demonstrate and not restrict.
>
>
> Sorry to jump in on this, since my horse in this race is pretty small...
>
> That whole "best practice" point is REALLY important. But it's more
> nuanced than just "do it this way":
>
> A. Vendors will do what they do, and they'll have their reasons. If
> the community offers them a "best practices" guideline, especially one
> that's easy to adopt (in full or in part), then hopefully they'll be
> less likely to stray.
>
> B. If that best-practices document offers more than a
> one-size-fits-all recommendation, then so much the better. Again, it
> keeps necessary variants closer to the community than they might have
> been; a partial victory isn't a total loss.
>
> C. (This is my main point.) When I see a document that says "best
> practices", then I understand that there's no binding requirement per
> se, and that it's likely that there will be deviations.
>
> D. When that's the case, I'm more likely to produce code that's easily
> adapted to the various permutations of best-practices that I
> encounter, often with the blessing of my customer. Doubly so if those
> variations are predictable.
>
> Otherwise, a customer will tell me to just "code to the
> requirement"---and that's all they'll pay for. Then my solution isn't
> likely to be as widely deployable, which cuts off opportunities to
> recycle that solution back to my point (A), even if the code becomes
> public.
>
> I can't always code for every possibility. But, with a best-practices
> guideline that says "if you can do it X way, then do so; if you can't,
> but you can do Y, that's almost as good; else, for gods' sake don't do
> Z", I can better plan for where the changes might arise later.
>
> Maybe I won't code the solution for everyone, but I'm likely to get a
> lot closer than I would have.
One thing I like to see in a Best Practices guide is what the benefit
of following the practice is. From that it is much easier to
assess whether the promised result is worth following
the guidance. All of the best practices people here are talking about
appear to be geared toward a frictionless connection to the ARM Linux
ecosystem. That's something many software-focused Linaro participants
care about, but is that something manufacturers care about? Usually I
only hear about saving pennies leading to profits at scale being a
priority. So if we can talk up gaining scale by following the
practices, there's a better chance members will listen.
--
Brendan Conoboy / RHEL Development Coordinator / Red Hat, Inc.