Single zImage at Linaro Connect - linaro-dev

28 Jul 2011


To everyone, and especially to those who are expected to work on this 
topic next week, please find below a list of tasks that need to be 
investigated and/or accomplished.  I'll coordinate the work and collect 
patches for the team.
If you have comments on this, or if you know about some omissions, 
please feel free to provide them as a reply to this message.
I'd like to know if people are particularly interested in one (or more :-)) 
items they would like to work on.  If so please say so as well.
Without further ado, here it is:
<><><><><>
0) The so called "single zImage" project
We wish to provide the ability to build as many ARM platforms as 
possible into a single kernel binary image. This will greatly simplify 
the archive packaging and maintenance effort by having only one kernel 
that could be built and booted on multiple ARM targets.  A side effect 
of this is also to enforce better source code architecture even if the 
resulting binaries are not always supporting multiple targets.
This work started a while ago.  Some initial description can be found 
here:
https://wiki.ubuntu.com/Specs/ARMSingleKernel
Part of it has been implemented already, namely the runtime determined 
PHYS_OFFSET, the AUTO_ZRELADDR and some other items referenced below.  
But there is still a large amount of work remaining.
1) Removal of any dependencies on <mach/*.h> from generic header files
To see the current culprits:
$ git grep "#include <mach/.*.h>" arch/arm/include/
arch/arm/include/asm/clkdev.h:#include <mach/clkdev.h>
arch/arm/include/asm/dma.h:#include <mach/isa-dma.h>
arch/arm/include/asm/floppy.h:#include <mach/floppy.h>
arch/arm/include/asm/gpio.h:#include <mach/gpio.h>
arch/arm/include/asm/hardware/dec21285.h:#include <mach/hardware.h>
arch/arm/include/asm/hardware/iop3xx-adma.h:#include <mach/hardware.h>
arch/arm/include/asm/hardware/iop3xx-gpio.h:#include <mach/hardware.h>
arch/arm/include/asm/hardware/sa1111.h:#include <mach/bitfield.h>
arch/arm/include/asm/io.h:#include <mach/io.h>
arch/arm/include/asm/irq.h:#include <mach/irqs.h>
arch/arm/include/asm/mc146818rtc.h:#include <mach/irqs.h>
arch/arm/include/asm/memory.h:#include <mach/memory.h>
arch/arm/include/asm/mtd-xip.h:#include <mach/mtd-xip.h>
arch/arm/include/asm/pci.h:#include <mach/hardware.h> /* for PCIBIOS_MIN_* */
arch/arm/include/asm/pgtable.h:#include <mach/vmalloc.h>
arch/arm/include/asm/system.h:#include <mach/barriers.h>
arch/arm/include/asm/timex.h:#include <mach/timex.h>
arch/arm/include/asm/vga.h:#include <mach/hardware.h>
1.1) mach/memory.h
This may contain the following defines:
1.1.1) ARM_DMA_ZONE_SIZE
This can be eliminated by moving that value into struct machine_desc.
The work is done already, but presented as an example for other tasks: 
http://git.linaro.org/gitweb?p=people/nico/linux.git%3Ba=shortlog%3Bh=refs/h...
And as of now this is merged in mainline already for v3.1-rc1.
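As a rough illustration of the pattern (a userspace mock-up, not the 
actual patch), a compile-time constant becomes a per-machine field that 
is consulted at run time.  The reduced struct layout, the board names 
and the dma_zone_bytes() helper below are assumptions made for the 
example only:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct machine_desc (assumption). */
struct machine_desc {
	const char	*name;
	unsigned long	dma_zone_size;	/* 0 means "no DMA zone restriction" */
};

/* Two hypothetical boards built into the same image. */
const struct machine_desc board_a = {
	.name		= "board-a",
	.dma_zone_size	= 64UL << 20,	/* 64 MB */
};

const struct machine_desc board_b = {
	.name		= "board-b",
	/* .dma_zone_size left at 0 */
};

/*
 * Size of the DMA zone within a bank of bank_size bytes, decided at run
 * time from the machine descriptor instead of a per-platform #define.
 */
unsigned long dma_zone_bytes(const struct machine_desc *mdesc,
			     unsigned long bank_size)
{
	assert(mdesc != NULL);
	if (mdesc->dma_zone_size == 0 || mdesc->dma_zone_size > bank_size)
		return bank_size;
	return mdesc->dma_zone_size;
}
```

With 256 MB of RAM, board_a would get a 64 MB DMA zone while board_b 
would treat all of it as DMA-able, without either needing its own 
<mach/memory.h>.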
1.1.2) PLAT_PHYS_OFFSET
Most occurrences can be eliminated.  With CONFIG_ARM_PATCH_PHYS_VIRT, it 
is possible to determine PHYS_OFFSET at run time.  What remains is to 
remove the direct uses, mostly by mdesc->boot_params initializers.  
Changing boot_params into atag_offset has two effects: it makes it 
clearer that it is only about ATAGs and not DT, and a relative offset 
plays more nicely with a runtime determined PHYS_OFFSET.
This work is done but not yet accepted:
http://news.gmane.org/group/gmane.linux.ports.arm.kernel/thread=123480
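To show why a relative offset fits a runtime PHYS_OFFSET, here is a 
minimal userspace sketch.  The reduced struct, the example board, the 
0x100 offset and the atag_phys_addr() helper are illustrative 
assumptions, not code from the patch series:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct machine_desc (assumption). */
struct machine_desc {
	const char	*name;
	uint32_t	atag_offset;	/* ATAG list offset from start of RAM */
};

const struct machine_desc example_board = {
	.name		= "example-board",
	.atag_offset	= 0x100,
};

/*
 * phys_offset stands for the start of RAM as discovered at boot time,
 * rather than a compile-time PLAT_PHYS_OFFSET.
 */
uint32_t atag_phys_addr(const struct machine_desc *mdesc, uint32_t phys_offset)
{
	assert(mdesc != NULL);
	return phys_offset + mdesc->atag_offset;
}
```

The same descriptor then works whether RAM starts at 0x00000000 or at 
0x80000000, which an absolute boot_params address could not do.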
1.1.3) FLUSH_BASE, FLUSH_BASE_PHYS, FLUSH_BASE_MINICACHE, UNCACHEABLE_ADDR
Those are StrongARM related constants, and they differ for each variant.
Fixing this involves making the virtual addresses constant for all 
variants, and hiding the differences in the physical addresses during 
the actual mapping.
The solution is here:
http://news.gmane.org/group/gmane.linux.ports.arm.kernel/thread=123477/force...
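The idea of a constant virtual address with a per-variant physical 
address can be sketched as follows.  This is a userspace mock-up: the 
variant names and every address value are made up, and the reduced 
struct map_desc is only a stand-in for the kernel's:

```c
#include <assert.h>
#include <stdint.h>

/* One virtual address shared by every variant (value is made up). */
#define FLUSH_BASE	0xf5000000u

enum cpu_variant { VARIANT_A, VARIANT_B };	/* hypothetical variants */

struct map_desc {
	uint32_t virt;
	uint32_t phys;
	uint32_t length;
};

/* Per-variant physical address, looked up only while building the mapping. */
static uint32_t flush_base_phys(enum cpu_variant v)
{
	switch (v) {
	case VARIANT_A:
		return 0xe0000000u;	/* made-up value */
	case VARIANT_B:
		return 0x50000000u;	/* made-up value */
	}
	assert(0 && "unknown variant");
	return 0;
}

/* Generic code only ever sees FLUSH_BASE; the difference stays in here. */
struct map_desc flush_mapping(enum cpu_variant v)
{
	struct map_desc md = {
		.virt	= FLUSH_BASE,
		.phys	= flush_base_phys(v),
		.length	= 1u << 20,
	};
	return md;
}
```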
1.1.4) CONSISTENT_DMA_SIZE
Maybe the CMA work will make this obsolete and the consistent DMA area 
could be dynamically adjusted.  In the meantime, the easiest solution 
is probably to store this in the machine_desc structure just like with 
ARM_DMA_ZONE_SIZE.
This has not been addressed yet.
1.1.5) Other weird things
Some machines have non linear memory maps or bus address translations, 
sparsemem, etc. Examples of that are:
arch/arm/mach-realview/include/mach/memory.h
arch/arm/mach-integrator/include/mach/memory.h
I think solving this is out of scope for this round.  Deleting 
arch/arm/mach-*/include/mach/memory.h can't be done universally.  So a 
new Kconfig symbol (NO_MACH_MEMORY_H) is introduced to indicate which 
machine class has its legacy <mach/memory.h> file removed.  The single 
zImage for multiple targets will be restricted, amongst other things, to 
those machines or SOCs with that symbol selected.  Partial result here:
http://git.linaro.org/gitweb?p=people/nico/linux.git%3Ba=shortlog%3Bh=refs/h...
1.2) mach/io.h
This contains things like IO_SPACE_LIMIT, __io(), __mem_pci(), and 
sometimes __arch_ioremap()/__arch_iounmap().  But in most cases, the 
definitions here are pretty similar from one machine class to another.
Arnd says:
|I have a plan. When CONFIG_PCI is disabled (along with CONFIG_ISA and
|CONFIG_PCMCIA), we should have neither of IO_SPACE_LIMIT, __io()
|and get no inb/outb functions as a result.
|
|When it is enabled, the 'common' platforms need only one I/O window
|of 64KB, so we should find a common place in the virtual address space
|for that and hardcode __io, while the platform specific PCI initialization
|code (or map_io for that matter) ensures that the window is pointing
|to the physical window.
|
|__arch_ioremap()/__arch_iounmap() are not really needed as far as I can
|tell but are used as an optimization to redirect ioremap to the
|hardcoded virtual address mapping. In the first step we can disable
|this for combined kernels, later we can find a generic way so
|__arch_ioremap walks the list of static mappings.
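A minimal userspace sketch of the single common I/O window idea 
follows.  The PCI_IO_VIRT_BASE value, struct io_window and both helper 
names are assumptions chosen for illustration; only the 64KB window 
size comes from the plan above:

```c
#include <assert.h>
#include <stdint.h>

#define IO_SPACE_LIMIT		0xffffu		/* one 64KB window */
#define PCI_IO_VIRT_BASE	0xfee00000u	/* hypothetical common address */

struct io_window {
	uint32_t virt;
	uint32_t phys;
	uint32_t size;
};

/* Hardcoded port-to-virtual translation, identical on every platform. */
uint32_t io_port_to_virt(uint32_t port)
{
	assert(port <= IO_SPACE_LIMIT);
	return PCI_IO_VIRT_BASE + port;
}

/*
 * What platform PCI init (or map_io) would do: point the common virtual
 * window at wherever this platform decodes I/O space physically.
 */
struct io_window platform_io_window(uint32_t io_phys_base)
{
	struct io_window w = {
		.virt = PCI_IO_VIRT_BASE,
		.phys = io_phys_base,
		.size = IO_SPACE_LIMIT + 1u,
	};
	return w;
}
```

Only platform_io_window() takes a platform specific argument, so the 
translation itself no longer needs anything from <mach/io.h>.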
1.3) mach/timex.h
Most instances simply define a dummy CLOCK_TICK_RATE value. This can 
probably be removed altogether, or simply have a common value in 
arch/arm/include/asm/timex.h, as nothing seriously uses that anymore.
Reference: http://lkml.org/lkml/2011/2/21/323
1.4) mach/vmalloc.h
This universally contains only a definition for VMALLOC_END, but the 
value itself is not universal.  It would be nice to have VMALLOC_END 
dynamically determined from the static IO mappings, but the highmem 
threshold 
depends on the value of VMALLOC_END, and memory has to be initialized 
before the static IO mappings can be processed.
Therefore the best solution so far appears to be another field in
struct machine_desc for it so it can be set at run time.  This is a 
mechanical conversion that has to be done.
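The ordering constraint can be sketched in userspace as follows.  The 
vmalloc_end field, the 128 MB minimum vmalloc area, the board values 
and the lowmem_bytes() helper are all assumptions, and the arithmetic 
is deliberately simplified compared to what the real memory setup does:

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_OFFSET		0xc0000000UL	/* typical 3G/1G split */
#define VMALLOC_MIN_BYTES	(128UL << 20)	/* assumed minimum vmalloc area */

struct machine_desc {
	const char	*name;
	unsigned long	vmalloc_end;	/* hypothetical new field */
};

const struct machine_desc board_a = { .name = "a", .vmalloc_end = 0xf8000000UL };
const struct machine_desc board_b = { .name = "b", .vmalloc_end = 0xe0000000UL };

/*
 * Bytes of RAM that can be mapped as lowmem; anything above becomes
 * highmem.  Only mdesc->vmalloc_end is needed, so this can run before
 * the static IO mappings are processed.
 */
unsigned long lowmem_bytes(const struct machine_desc *mdesc,
			   unsigned long ram_bytes)
{
	unsigned long limit;

	assert(mdesc != NULL);
	assert(mdesc->vmalloc_end > PAGE_OFFSET + VMALLOC_MIN_BYTES);
	limit = mdesc->vmalloc_end - VMALLOC_MIN_BYTES - PAGE_OFFSET;
	return ram_bytes < limit ? ram_bytes : limit;
}
```

With 1 GB of RAM, the two hypothetical boards end up with 768 MB and 
384 MB of lowmem respectively, from the same binary.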
1.5) mach/irqs.h
The only information globally required from those files is the value of 
NR_IRQS.  Yet there is already a nr_irqs member in the machine_desc 
structure for this (used by arch_probe_nr_irqs() in 
arch/arm/kernel/irq.c).
So the first step would be to add
.nr_irqs	= NR_IRQS,
to all machine_desc instances, making sure that <mach/irqs.h> is 
included in those files.  Then <mach/irqs.h> should be removed from 
arch/arm/include/asm/irq.h, and things adjusted so everything still 
compiles.
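The run-time selection this relies on can be mocked up in userspace 
like this.  It loosely follows the shape of the existing code, but the 
reduced struct, the setup_machine() helper, the board names and the 
NR_IRQS value of 128 are assumptions for the example:

```c
#include <assert.h>
#include <stddef.h>

#define NR_IRQS	128	/* stand-in for a legacy compile-time value */

struct machine_desc {
	const char	*name;
	unsigned int	nr_irqs;
};

unsigned int nr_irqs;		/* what the generic IRQ core would use */
unsigned int arch_nr_irqs;	/* copied from the machine_desc at boot */

const struct machine_desc board_small = { .name = "small", .nr_irqs = 64 };
const struct machine_desc board_legacy = { .name = "legacy" };

/* Done once the booting machine has been identified. */
void setup_machine(const struct machine_desc *mdesc)
{
	assert(mdesc != NULL);
	arch_nr_irqs = mdesc->nr_irqs;
}

/* Falls back to the compile-time value when the machine sets nothing. */
int arch_probe_nr_irqs(void)
{
	nr_irqs = arch_nr_irqs ? arch_nr_irqs : NR_IRQS;
	return nr_irqs;
}
```

Once every machine_desc carries its own value, the NR_IRQS fallback is 
the only thing left that still needs a global definition.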
1.6) mach/gpio.h
This is a tough one.  This depends on CONFIG_GENERIC_GPIO which is 
selected by many machine types.  They should all be converted to (or 
configurable with) CONFIG_GPIOLIB so each SOC's specific GPIO handling 
is made into runtime code instead of static inline functions.  
Preserving the ability to not use gpiolib might be desirable in some 
cases for performance reasons.
Definitely in need of serious investigation.
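To make the "runtime code instead of static inline functions" point 
concrete, here is a toy userspace dispatcher.  It is loosely modeled on 
the gpiolib idea of a per-controller descriptor with callbacks, but 
every toy_* and fake_* name, the registry and the fake register are 
assumptions, not the gpiolib API:

```c
#include <assert.h>
#include <stddef.h>

/* Toy descriptor loosely modeled on gpiolib's struct gpio_chip. */
struct gpio_chip {
	unsigned int	base;	/* first global GPIO number handled */
	unsigned int	ngpio;	/* how many GPIOs this chip handles */
	int		(*get)(struct gpio_chip *chip, unsigned int offset);
	void		(*set)(struct gpio_chip *chip, unsigned int offset,
			       int value);
	unsigned long	state;	/* fake data register for the example */
};

#define TOY_MAX_CHIPS	4
static struct gpio_chip *toy_chips[TOY_MAX_CHIPS];
static unsigned int toy_nchips;

/* Each SOC registers its controller at run time. */
int toy_gpiochip_add(struct gpio_chip *chip)
{
	assert(chip != NULL && chip->get != NULL && chip->set != NULL);
	if (toy_nchips == TOY_MAX_CHIPS)
		return -1;
	toy_chips[toy_nchips++] = chip;
	return 0;
}

static struct gpio_chip *toy_gpio_to_chip(unsigned int gpio)
{
	unsigned int i;

	for (i = 0; i < toy_nchips; i++) {
		struct gpio_chip *c = toy_chips[i];

		if (gpio >= c->base && gpio < c->base + c->ngpio)
			return c;
	}
	return NULL;
}

/* Generic accessors: an indirect call replaces the per-SOC inline. */
int toy_gpio_get_value(unsigned int gpio)
{
	struct gpio_chip *c = toy_gpio_to_chip(gpio);

	return c ? c->get(c, gpio - c->base) : -1;
}

void toy_gpio_set_value(unsigned int gpio, int value)
{
	struct gpio_chip *c = toy_gpio_to_chip(gpio);

	if (c)
		c->set(c, gpio - c->base, value);
}

/* A hypothetical SOC's GPIO block. */
static int fake_soc_get(struct gpio_chip *chip, unsigned int offset)
{
	return (chip->state >> offset) & 1;
}

static void fake_soc_set(struct gpio_chip *chip, unsigned int offset, int value)
{
	if (value)
		chip->state |= 1UL << offset;
	else
		chip->state &= ~(1UL << offset);
}

struct gpio_chip fake_soc_gpio = {
	.base = 32, .ngpio = 16, .get = fake_soc_get, .set = fake_soc_set,
};
```

The lookup plus indirect call on every access is the cost that 
motivates keeping a non-gpiolib fast path available in some cases.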
1.7) mach/mtd-xip.h
No need to care about those.  This is for running the kernel XIP from 
ROM memory.  An XIP kernel is already incompatible with the notion of a 
single kernel image since it obviously can't be modified at run time (as 
needed by CONFIG_ARM_PATCH_PHYS_VIRT).
1.8) mach/isa-dma.h, mach/floppy.h
Those are used by old targets we might not care much about.
1.9) mach/entry-macro.S
This one gets included directly from arch/arm/kernel/entry-armv.S.
The only relevant macros still widely used are get_irqnr_preamble and 
get_irqnr_and_base.  They can be overridden by CONFIG_MULTI_IRQ_HANDLER
and the equivalent code hooked to the handle_irq member of the 
machine_desc structure.
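In C terms, the hook amounts to something like the following userspace 
mock-up.  The reduced structs, the two SOC handlers, setup_irq_entry() 
and irq_entry() are assumptions for illustration; the real entry path 
is assembly in entry-armv.S:

```c
#include <assert.h>
#include <stddef.h>

struct pt_regs { unsigned long pc; };	/* placeholder register frame */

struct machine_desc {
	const char	*name;
	void		(*handle_irq)(struct pt_regs *regs);
};

int last_controller;	/* records which handler ran, for illustration */

/* Each hypothetical SOC decodes its own interrupt controller here. */
void soc_a_handle_irq(struct pt_regs *regs)
{
	(void)regs;
	last_controller = 1;
}

void soc_b_handle_irq(struct pt_regs *regs)
{
	(void)regs;
	last_controller = 2;
}

const struct machine_desc soc_a_board = {
	.name = "soc-a-board", .handle_irq = soc_a_handle_irq,
};
const struct machine_desc soc_b_board = {
	.name = "soc-b-board", .handle_irq = soc_b_handle_irq,
};

/* Single global hook used by the common IRQ entry path. */
void (*handle_arch_irq)(struct pt_regs *regs);

void setup_irq_entry(const struct machine_desc *mdesc)
{
	assert(mdesc != NULL && mdesc->handle_irq != NULL);
	handle_arch_irq = mdesc->handle_irq;
}

/* Stand-in for the common low-level IRQ entry. */
void irq_entry(struct pt_regs *regs)
{
	handle_arch_irq(regs);
}
```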
1.10) mach/debug-macro.S
This is used when CONFIG_DEBUG_LL is set.  Supporting that option with a 
single kernel image might prove very difficult with a rapidly 
diminishing return on the investment.
This code is in need of some refactoring already:
http://article.gmane.org/gmane.linux.ports.arm.kernel/118525
To still benefit from the most likely needed debugging aid, we might
consider allowing the selection of one amongst the existing
implementations when building a kernel with support for many SOCs.
Obviously that would only work on the one hardware platform for which
the selected printch implementation was designed, but that should be
good enough for debugging purposes.
1.11) mach/system.h
This is included from arch/arm/kernel/process.c and expected to provide 
the following static inline functions or equivalent:
1.11.1) arch_idle()
Called when system is idle.  Most of them just call cpu_do_idle().
The call to cpu_do_idle() should be moved to default_idle() and the exception
cases moved out of line where they can be hooked to the pm_idle callback.
1.11.2) arch_reset()
Used to reset the system.  This is far from being a hot path and doesn't 
justify a static inline function.  An out-of-line version hooked to a 
global arch_reset function pointer would work just fine.
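Both hooks in 1.11 could follow the same out-of-line pattern, sketched 
here in userspace.  The counters, quirky_soc_idle(), quirky_soc_reset(), 
cpu_idle_once() and machine_restart() are assumptions for the example, 
and cpu_do_idle() is only a stand-in for the real one:

```c
#include <assert.h>
#include <stddef.h>

int idle_calls;		/* how often the cpu_do_idle() stand-in ran */
int quirk_calls;	/* how often the SOC specific idle path ran */
int reset_mode;		/* last mode passed to the reset hook */

void cpu_do_idle(void) { idle_calls++; }

/* Common case: no per-machine arch_idle() needed at all. */
void default_idle(void) { cpu_do_idle(); }

void (*pm_idle)(void) = default_idle;

/* Exception case, out of line, installed by the SOC's init code. */
void quirky_soc_idle(void)
{
	quirk_calls++;
	cpu_do_idle();
}

void cpu_idle_once(void)
{
	assert(pm_idle != NULL);
	pm_idle();
}

/* Global reset hook; NULL until a machine installs one. */
void (*arch_reset)(char mode, const char *cmd);

void quirky_soc_reset(char mode, const char *cmd)
{
	(void)cmd;
	reset_mode = mode;
}

/* Returns -1 when no machine has provided a reset method. */
int machine_restart(char mode, const char *cmd)
{
	if (!arch_reset)
		return -1;
	arch_reset(mode, cmd);
	return 0;
}
```

Neither path is hot enough for the indirect call to matter, and 
process.c no longer needs to see any <mach/system.h>.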
1.12) mach/uncompress.h
This is used to define per SOC methods to output some progress feedback 
from the kernel decompressor over a serial port.  Once again, supporting 
this with a single kernel image might prove very difficult with a 
rapidly diminishing return on the investment.  So it is probably best to 
simply use generic empty stubs whenever more than one SOC family is 
configured in a common kernel image.
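The generic empty stubs could look roughly like this userspace sketch.  
The NR_SOC_FAMILIES switch and the decomp_* names are hypothetical 
(prefixed to avoid clashing with libc), and the counter exists only so 
the effect can be observed:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build-time switch: number of SOC families configured. */
#ifndef NR_SOC_FAMILIES
#define NR_SOC_FAMILIES 2
#endif

int chars_emitted;	/* stands in for bytes written to a UART */

#if NR_SOC_FAMILIES > 1
/* Combined image: no way to know which UART to poke, so do nothing. */
static inline void decomp_putc(int c) { (void)c; }
static inline void decomp_flush(void) { }
#else
/* Single SOC family: the per-SOC implementation would go here. */
static inline void decomp_putc(int c) { (void)c; chars_emitted++; }
static inline void decomp_flush(void) { }
#endif

/* The decompressor's progress output is unchanged either way. */
void decomp_putstr(const char *s)
{
	assert(s != NULL);
	while (*s)
		decomp_putc(*s++);
	decomp_flush();
}
```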
2) Removal of any dependencies on <mach/*.h> from driver code
A couple of possibilities:
a) We move the required header files next to the driver code.  In many 
cases, having a .h file with only the defines relevant to the concerned 
driver is best.  But this is a _lot_ of work.
b) We change those <mach/foo.h> into something more absolute, such as 
<mach/omap2/foo.h>.  This can be done on a per SOC basis, first by 
moving the header files one level deeper, and then fixing up all 
affected drivers.
c) We change those <mach/foo.h> files into something more precise, e.g. 
<mach/omap2_foo.h> and fix concerned drivers.
I think the best solution here is (b) which doesn't preclude (a) 
eventually or if it is trivial.  But (c) is dangerous as files might be 
added easily without paying too much attention to the file prefix.
3) Changes to the build system
We need to move towards the ability to actually build more than one SOC 
family at the same time.
3.1) Kconfig
This involves changes to Kconfig where currently only one out of all the 
different architectures is selected through the big "ARM system type" 
choice prompt.  We need to determine a good way to move some of them 
into simple bool prompts and keep track of which architecture can be 
built concurrently with which.  We know for instance that it is unlikely 
that pre-ARMv6 and ARMv6/7 will ever be buildable together.  Today we 
know that nothing can be built with anything else and therefore this 
should be the starting default.  This needs investigating.
3.2) Makefile
Currently the arch/arm/Makefile is organized so the lowest instruction 
set level and the highest optimization level are selected from all the 
configured options.  So this part should already be fine.
However the machine-$(*), plat-$(*), machdirs and platdirs variables 
must go.  In (2) above we should have removed the need for adding to the 
global KBUILD_CPPFLAGS to add a path to some specific architecture 
includes already.  Keeping them only for the code under each 
architecture subdirectory should be sufficient.
For example, this might be all that is needed:
obj-$(CONFIG_ARCH_MSM) += mach-msm/
or
obj-$(CONFIG_ARCH_KIRKWOOD) += mach-kirkwood/ plat-orion/
obj-$(CONFIG_ARCH_ORION5X) += mach-orion5x/ plat-orion/
Etc.
And within each of these directories, using the subdir-ccflags-y 
variable to include the locally needed architecture specific include 
files will do the trick.
3.3) defconfig
We need a defconfig file adding as many architectures to it as possible 
for build coverage.  Ideally the resulting binary should be boot tested 
on as many of the targets it supports as possible.
4) Picking up broken pieces
Things will certainly break along the way.  There are certainly issues 
that I didn't foresee.  My experience so far tends to indicate that 
this is a somewhat recursive process where the tackling of one work item 
reveals a few more which are prerequisite to the first one, etc.  So any 
estimate for this work needs to consider a large fudge factor.
Nicolas