To everyone, and especially to those who are expected to work on this topic next week, please find below a list of tasks that need to be investigated and/or accomplished. I'll coordinate the work and collect patches for the team.
If you have comments on this, or if you know about some omissions, please feel free to provide them as a reply to this message.
I'd like to know if people are particularly interested in one (or more :-)) items they would like to work on. If so please say so as well.
Without further ado, here it is:
<><><><><>
0) The so called "single zImage" project
We wish to provide the ability to build as many ARM platforms as possible into a single kernel binary image. This will greatly simplify the archive packaging and maintenance effort by having only one kernel that could be built and booted on multiple ARM targets. A side effect of this is also to enforce better source code architecture even if the resulting binaries are not always supporting multiple targets.
This work started a while ago. Some initial description can be found here:
https://wiki.ubuntu.com/Specs/ARMSingleKernel
Part of it has been implemented already, namely the runtime determined PHYS_OFFSET, the AUTO_ZRELADDR and some other items referenced below. But there is still a large amount of work remaining.
1) Removal of any dependencies on <mach/*.h> from generic header files
To see the current culprits:
$ git grep "#include <mach/.*.h>" arch/arm/include/
arch/arm/include/asm/clkdev.h:#include <mach/clkdev.h>
arch/arm/include/asm/dma.h:#include <mach/isa-dma.h>
arch/arm/include/asm/floppy.h:#include <mach/floppy.h>
arch/arm/include/asm/gpio.h:#include <mach/gpio.h>
arch/arm/include/asm/hardware/dec21285.h:#include <mach/hardware.h>
arch/arm/include/asm/hardware/iop3xx-adma.h:#include <mach/hardware.h>
arch/arm/include/asm/hardware/iop3xx-gpio.h:#include <mach/hardware.h>
arch/arm/include/asm/hardware/sa1111.h:#include <mach/bitfield.h>
arch/arm/include/asm/io.h:#include <mach/io.h>
arch/arm/include/asm/irq.h:#include <mach/irqs.h>
arch/arm/include/asm/mc146818rtc.h:#include <mach/irqs.h>
arch/arm/include/asm/memory.h:#include <mach/memory.h>
arch/arm/include/asm/mtd-xip.h:#include <mach/mtd-xip.h>
arch/arm/include/asm/pci.h:#include <mach/hardware.h> /* for PCIBIOS_MIN_* */
arch/arm/include/asm/pgtable.h:#include <mach/vmalloc.h>
arch/arm/include/asm/system.h:#include <mach/barriers.h>
arch/arm/include/asm/timex.h:#include <mach/timex.h>
arch/arm/include/asm/vga.h:#include <mach/hardware.h>
1.1) mach/memory.h
This may contain the following defines:
1.1.1) ARM_DMA_ZONE_SIZE
This can be eliminated by moving that value into struct machine_desc. The work is already done, but it is presented here as an example for other tasks: http://git.linaro.org/gitweb?p=people/nico/linux.git%3Ba=shortlog%3Bh=refs/h... As of now this is already merged in mainline for v3.1-rc1.
1.1.2) PLAT_PHYS_OFFSET
Most occurrences can be eliminated. With CONFIG_ARM_PATCH_PHYS_VIRT, it is possible to determine PHYS_OFFSET at run time. What remains is to remove the direct uses, mostly in mdesc->boot_params initializers. Changing boot_params into atag_offset has two effects: it makes it clearer that this is only about ATAGs and not DT, and a relative offset plays more nicely with a runtime determined PHYS_OFFSET.
This work is done but not yet accepted: http://news.gmane.org/group/gmane.linux.ports.arm.kernel/thread=123480
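To illustrate why a relative offset suits a runtime determined PHYS_OFFSET, here is a minimal user-space sketch. The struct and helper names (machine_desc_sketch, atags_phys_addr) are hypothetical stand-ins rather than the kernel's definitions, and the 0x100 offset is only an assumed example value.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct machine_desc. */
struct machine_desc_sketch {
	const char *name;
	uint32_t atag_offset;	/* ATAG list location, relative to start of RAM */
};

/*
 * With a relative offset the same descriptor works wherever RAM starts:
 * the absolute address is only formed once PHYS_OFFSET is known at run time.
 */
static inline uint32_t atags_phys_addr(const struct machine_desc_sketch *mdesc,
				       uint32_t phys_offset)
{
	return phys_offset + mdesc->atag_offset;
}

/* Example board; the offset value is an assumption for illustration. */
static const struct machine_desc_sketch example_board = {
	.name = "example-board",
	.atag_offset = 0x100,
};
```

The same example_board descriptor then yields a different absolute address depending on the phys_offset passed in, with nothing hardcoded at build time.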
1.1.3) FLUSH_BASE, FLUSH_BASE_PHYS, FLUSH_BASE_MINICACHE, UNCACHEABLE_ADDR
Those are StrongARM related constants, and they differ for each variant. Fixing this involves making the virtual addresses constant for all variants, and hiding the differences in the physical addresses during the actual mapping.
The solution is here: http://news.gmane.org/group/gmane.linux.ports.arm.kernel/thread=123477/force...
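The idea can be sketched as follows. This is not the code from the patch linked above; the descriptor type, the variant names and every address are made up for illustration.

```c
#include <stdint.h>

/* One virtual address shared by every variant (value is illustrative only). */
#define FLUSH_BASE_VIRT		0xf5000000u

/* Hypothetical mapping descriptor, loosely modelled on struct map_desc. */
struct flush_map_sketch {
	uint32_t virt;
	uint32_t phys;
	uint32_t length;
};

enum sa_variant { VARIANT_A, VARIANT_B };

/* Only the physical side differs; generic code keeps using FLUSH_BASE_VIRT. */
static struct flush_map_sketch flush_mapping_for(enum sa_variant v)
{
	struct flush_map_sketch m = {
		.virt = FLUSH_BASE_VIRT,
		.length = 0x00100000u,
	};

	/* Made-up physical addresses, chosen per variant at mapping time. */
	m.phys = (v == VARIANT_A) ? 0xe0000000u : 0xe8000000u;
	return m;
}
```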
1.1.4) CONSISTENT_DMA_SIZE
Maybe the CMA work will make this obsolete and the consistent DMA area could be dynamically adjusted. In the meantime, the easiest solution is probably to store this in the machine_desc structure just like with ARM_DMA_ZONE_SIZE.
This has not been addressed yet.
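A possible shape for this, as a user-space sketch only: the consistent_dma_size member, the helper name and the 2 MiB default are all assumptions, not existing kernel code.

```c
#include <stddef.h>

#define DEFAULT_CONSISTENT_DMA_SIZE	(2u << 20)	/* assumed 2 MiB default */

/*
 * Hypothetical subset of struct machine_desc. consistent_dma_size is an
 * assumed new member, meant to mirror how the DMA zone size was handled.
 */
struct machine_desc_sketch {
	const char *name;
	unsigned long dma_zone_size;
	unsigned long consistent_dma_size;	/* 0 means "use the default" */
};

/* Boards that need a bigger area set the member; everyone else gets the default. */
static unsigned long consistent_dma_size(const struct machine_desc_sketch *mdesc)
{
	if (mdesc && mdesc->consistent_dma_size)
		return mdesc->consistent_dma_size;
	return DEFAULT_CONSISTENT_DMA_SIZE;
}
```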
1.1.5) Other weird things
Some machines have non linear memory maps or bus address translations, sparsemem, etc. Examples of that are:
arch/arm/mach-realview/include/mach/memory.h
arch/arm/mach-integrator/include/mach/memory.h
I think solving this is out of scope for this round. Deleting arch/arm/mach-*/include/mach/memory.h can't be done universally. So a new Kconfig symbol (NO_MACH_MEMORY_H) is introduced to indicate which machine class has its legacy <mach/memory.h> file removed. The single zImage for multiple targets will be restricted, amongst other things, to those machines or SOCs with that symbol selected. Partial result here: http://git.linaro.org/gitweb?p=people/nico/linux.git%3Ba=shortlog%3Bh=refs/h...
1.2) mach/io.h
This contains things like IO_SPACE_LIMIT, __io(), __mem_pci(), and sometimes __arch_ioremap()/__arch_unmap(). But in most cases, the definitions here are pretty similar from one machine class to another.
Arnd says:
|I have a plan. When CONFIG_PCI is disabled (along with CONFIG_ISA and
|CONFIG_PCMCIA), we should have neither IO_SPACE_LIMIT nor __io(),
|and get no inb/outb functions as a result.
|
|When it is enabled, the 'common' platforms need only one I/O window
|of 64KB, so we should find a common place in the virtual address space
|for that and hardcode __io, while the platform specific PCI initialization
|code (or map_io for that matter) ensures that the window is pointing
|to the physical window.
|
|__arch_ioremap()/__arch_unmap() are not really needed as far as I can
|tell but are used as an optimization to redirect ioremap to the
|hardcoded virtual address mapping. In the first step we can disable
|this for combined kernels, later we can find a generic way so
|__arch_ioremap walks the list of static mappings.
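A rough user-space model of the hardcoded single window follows. The virtual base address, the io_virt() name (standing in for __io) and the helper functions are assumptions for illustration; a real implementation would set up a page-table mapping instead of remembering a number.

```c
#include <stdint.h>

/* Assumed common virtual location and size of the single I/O window. */
#define IO_WINDOW_VIRT_BASE	0xfee00000u
#define IO_SPACE_LIMIT		0xffffu		/* one 64KB window */

/* Hardcoded translation: identical for every platform in the image. */
#define io_virt(port) \
	(IO_WINDOW_VIRT_BASE + ((uint32_t)(port) & IO_SPACE_LIMIT))

/* Records which physical window the platform pointed the mapping at. */
static uint32_t io_window_phys;

/*
 * Stand-in for what platform PCI init (or map_io) would do with a real
 * mapping; here it only remembers the physical base.
 */
static void platform_map_io_window(uint32_t phys_base)
{
	io_window_phys = phys_base;
}

/* Physical address a port access ends up at, for illustration only. */
static uint32_t io_port_to_phys(unsigned int port)
{
	return io_window_phys + (io_virt(port) - IO_WINDOW_VIRT_BASE);
}
```

Generic code only ever sees io_virt(); the platform-specific part is confined to the one call that points the window at its physical location.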
1.3) mach/timex.h
Most instances simply define a dummy CLOCK_TICK_RATE value. This can probably be removed altogether, or simply have a common value in arch/arm/include/asm/timex.h, as nothing seriously uses that anymore.
Reference: http://lkml.org/lkml/2011/2/21/323
1.4) mach/vmalloc.h
This universally contains only a definition for VMALLOC_END, but not a universal definition. It would be nice to have VMALLOC_END dynamically determined from the static IO mappings, but the highmem threshold depends on the value of VMALLOC_END, and memory has to be initialized before the static IO mappings can be processed.
Therefore the best solution so far appears to be to use another value in struct machine_desc for it so it can be set at run time. This is a mechanical conversion that has to be done.
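As a sketch of what such a conversion could look like in principle: the vmalloc_end member, the default value and the 128MB reserve below are assumptions made for the example, not existing definitions.

```c
#include <stdint.h>

#define DEFAULT_VMALLOC_END	0xf8000000u	/* assumed fallback value */
#define VMALLOC_RESERVE		(128u << 20)	/* assumed minimum vmalloc area */

struct machine_desc_sketch {
	const char *name;
	uint32_t vmalloc_end;	/* hypothetical member; 0 selects the default */
};

static uint32_t vmalloc_end_for(const struct machine_desc_sketch *mdesc)
{
	return mdesc->vmalloc_end ? mdesc->vmalloc_end : DEFAULT_VMALLOC_END;
}

/*
 * Lowest virtual address that must stay free for vmalloc. RAM that would
 * map at or above it has to become highmem, which is why this value must
 * be known before memory is initialized.
 */
static uint32_t lowmem_virt_limit(const struct machine_desc_sketch *mdesc)
{
	return vmalloc_end_for(mdesc) - VMALLOC_RESERVE;
}
```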
1.5) mach/irqs.h
The only information globally required from those files is the value of NR_IRQS. Yet there is already a nr_irqs member in the machine_desc structure for this, used by arch_probe_nr_irqs() in arch/arm/kernel/irq.c.
So the first step would be to add
.nr_irqs = NR_IRQS,
to all machine_desc instances, making sure that <mach/irqs.h> is included in those files. Then <mach/irqs.h> should be removed from arch/arm/include/asm/irq.h, and things adjusted so everything still compiles.
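The run-time selection this relies on can be modelled in user space as below. It is only modelled on the idea behind arch_probe_nr_irqs(); the type, the variable names and the fallback value are hypothetical.

```c
#define NR_IRQS_FALLBACK	128	/* stand-in for a build-time NR_IRQS */

struct machine_desc_sketch {
	const char *name;
	unsigned int nr_irqs;	/* 0 means "not set by the board" */
};

/* Descriptor of the machine actually booted, chosen at run time. */
static const struct machine_desc_sketch *machine_desc_ptr;
static unsigned int nr_irqs_runtime;

/* Prefer the value from the machine descriptor, else fall back. */
static unsigned int probe_nr_irqs(void)
{
	nr_irqs_runtime = (machine_desc_ptr && machine_desc_ptr->nr_irqs) ?
			  machine_desc_ptr->nr_irqs : NR_IRQS_FALLBACK;
	return nr_irqs_runtime;
}
```

Once every board sets its own value, generic headers no longer need a single global NR_IRQS from <mach/irqs.h>.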
1.6) mach/gpio.h
This is a tough one. This depends on CONFIG_GENERIC_GPIO which is selected by many machine types. They should all be converted to (or configurable with) CONFIG_GPIOLIB so each SOC's specific GPIO handling is made into runtime code instead of static inline functions. Preserving the ability to not use gpiolib might be desirable in some cases for performance reasons.
Definitely in need of serious investigation.
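To make the "runtime code instead of static inline functions" point concrete, here is a heavily reduced user-space model of run-time GPIO dispatch. It is not the gpiolib API: every name carries a _sketch suffix or is otherwise made up, and the "hardware" is just a variable.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced analogue of a GPIO chip descriptor. */
struct gpio_chip_sketch {
	unsigned int base;	/* first global GPIO number handled */
	unsigned int ngpio;	/* how many GPIOs this chip provides */
	int (*get)(struct gpio_chip_sketch *chip, unsigned int offset);
	void (*set)(struct gpio_chip_sketch *chip, unsigned int offset, int value);
	unsigned long state;	/* fake register contents for the example */
};

#define MAX_CHIPS 4
static struct gpio_chip_sketch *chips[MAX_CHIPS];
static unsigned int nchips;

static int gpiochip_add_sketch(struct gpio_chip_sketch *chip)
{
	if (nchips == MAX_CHIPS)
		return -1;
	chips[nchips++] = chip;
	return 0;
}

static struct gpio_chip_sketch *gpio_to_chip_sketch(unsigned int gpio)
{
	for (unsigned int i = 0; i < nchips; i++)
		if (gpio >= chips[i]->base &&
		    gpio < chips[i]->base + chips[i]->ngpio)
			return chips[i];
	return NULL;
}

/* Run-time dispatch replaces the per-SOC static inline from <mach/gpio.h>. */
static int gpio_get_value_sketch(unsigned int gpio)
{
	struct gpio_chip_sketch *chip = gpio_to_chip_sketch(gpio);

	return chip ? chip->get(chip, gpio - chip->base) : -1;
}

static void gpio_set_value_sketch(unsigned int gpio, int value)
{
	struct gpio_chip_sketch *chip = gpio_to_chip_sketch(gpio);

	if (chip)
		chip->set(chip, gpio - chip->base, value);
}

/* Example "SOC" implementation backed by a plain variable. */
static int fake_get(struct gpio_chip_sketch *c, unsigned int off)
{
	return (int)((c->state >> off) & 1UL);
}

static void fake_set(struct gpio_chip_sketch *c, unsigned int off, int value)
{
	if (value)
		c->state |= 1UL << off;
	else
		c->state &= ~(1UL << off);
}
```

The lookup and the indirect call are exactly the overhead the performance remark above is about: a static inline version compiles down to a register access, while this one cannot.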
1.7) mach/mtd-xip.h
No need to care about those. This is for running the kernel XIP from ROM memory. An XIP kernel is already incompatible with the notion of a single kernel image since it obviously can't be modified at run time (as needed by CONFIG_ARM_PATCH_PHYS_VIRT).
1.8) mach/isa-dma.h, mach/floppy.h
Those are used by old targets we might not care much about.
1.9) mach/entry-macro.S
This one gets included directly from arch/arm/kernel/entry-armv.S. The only relevant macros still widely used are get_irqnr_preamble and get_irqnr_and_base. They can be overridden by CONFIG_MULTI_IRQ_HANDLER and the equivalent code hooked to the handle_irq member of the machine_desc structure.
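The hook mechanism can be sketched in user space as follows. The types and function names are simplified stand-ins (assumptions), and the "handlers" only count invocations instead of talking to an interrupt controller.

```c
#include <stddef.h>

struct pt_regs_sketch { unsigned long pc; };

struct machine_desc_sketch {
	const char *name;
	void (*handle_irq)(struct pt_regs_sketch *regs);
};

/* Global hook the low-level entry code would branch through. */
static void (*handle_arch_irq)(struct pt_regs_sketch *regs);

/* Install the handler of the machine that was identified at boot. */
static int setup_irq_handler(const struct machine_desc_sketch *mdesc)
{
	if (!mdesc->handle_irq)
		return -1;	/* board still relies on the entry-macro.S path */
	handle_arch_irq = mdesc->handle_irq;
	return 0;
}

static int irqs_seen_a, irqs_seen_b;

/* Two made-up SOC handlers that can coexist in one binary. */
static void soc_a_handle_irq(struct pt_regs_sketch *regs)
{
	(void)regs;
	irqs_seen_a++;
}

static void soc_b_handle_irq(struct pt_regs_sketch *regs)
{
	(void)regs;
	irqs_seen_b++;
}
```

Because both handlers are ordinary functions selected through a pointer, neither needs its own build-time copy of the entry macros.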
1.10) mach/debug-macro.S
This is used when CONFIG_DEBUG_LL is set. Supporting that option with a single kernel image might prove very difficult with a rapidly diminishing return on the investment.
This code is in need of some refactoring already: http://article.gmane.org/gmane.linux.ports.arm.kernel/118525
To still benefit from the most likely needed debugging aid, we might consider allowing the selection of one amongst the existing implementations when building a kernel with support for many SOCs. Obviously that would only work on the one hardware platform for which the selected printch implementation was designed, but that should be good enough for debugging purposes.
1.11) mach/system.h
This is included from arch/arm/kernel/process.c and expected to provide the following static inline functions or equivalent:
1.11.1) arch_idle()
Called when system is idle. Most of them just call cpu_do_idle(). The call to cpu_do_idle() should be moved to default_idle() and the exception cases moved out of line where they can be hooked to the pm_idle callback.
1.11.2) arch_reset()
Used to reset the system. This is far from being a hot path and doesn't justify a static inline function. An out-of-line version hooked to a global arch_reset function pointer would work just fine.
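Both 1.11.1 and 1.11.2 come down to replacing static inlines with function pointers. A minimal user-space model, with every name a hypothetical stand-in and the "hardware" actions reduced to counters:

```c
#include <stddef.h>

static int idle_calls, custom_idle_calls, reset_mode_seen = -1;

/* Stand-in for the CPU-level idle primitive. */
static void cpu_do_idle_sketch(void)
{
	idle_calls++;
}

/* Common case: the default idle routine calls the CPU primitive directly. */
static void default_idle_sketch(void)
{
	cpu_do_idle_sketch();
}

/* Exception cases override this pointer with an out-of-line routine. */
static void (*pm_idle_hook)(void) = default_idle_sketch;

static void cpu_idle_once(void)
{
	pm_idle_hook();
}

/* Global reset hook; NULL means the board registered nothing. */
static void (*arch_reset_hook)(char mode, const char *cmd);

static int machine_restart_sketch(char mode, const char *cmd)
{
	if (!arch_reset_hook)
		return -1;
	arch_reset_hook(mode, cmd);
	return 0;
}

/* Made-up SOC specific implementations. */
static void soc_idle(void)
{
	custom_idle_calls++;
}

static void soc_reset(char mode, const char *cmd)
{
	(void)cmd;
	reset_mode_seen = mode;
}
```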
1.12) mach/uncompress.h
This is used to define per SOC methods to output some progress feedback from the kernel decompressor over a serial port. Once again, supporting this with a single kernel image might prove very difficult with a rapidly diminishing return on the investment. So it is probably best to simply use generic empty stubs whenever more than one SOC family is configured in a common kernel image.
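A sketch of the "generic empty stubs" idea, written as plain user-space C. The MULTI_SOC_IMAGE macro and the decomp_* names are assumptions; they stand in for whatever configuration symbol and hooks a real <mach/uncompress.h> replacement would use.

```c
/* Define to model a kernel configured for more than one SOC family. */
#define MULTI_SOC_IMAGE 1

/* Counts characters that reached a (pretend) serial port. */
static unsigned int decomp_chars_sent;

#ifdef MULTI_SOC_IMAGE
/* Generic empty stubs: no serial port can be assumed at this point. */
static inline void decomp_putc(int c) { (void)c; }
static inline void decomp_flush(void) { }
static inline void arch_decomp_setup(void) { }
#else
/* Single-SOC build: a real implementation would poke that SOC's UART. */
static inline void decomp_putc(int c) { (void)c; decomp_chars_sent++; }
static inline void decomp_flush(void) { }
static inline void arch_decomp_setup(void) { }
#endif

/* Decompressor-style string output; harmless when the stubs are in use. */
static void decomp_putstr(const char *s)
{
	while (*s)
		decomp_putc(*s++);
	decomp_flush();
}
```

With the stubs selected, callers do not change at all; the progress messages simply go nowhere.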
2) Removal of any dependencies on <mach/*.h> from driver code
A couple possibilities:
a) We move the required header files next to the driver code. In many cases, having a .h file with only the defines relevant to the concerned driver is best. But this is a _lot_ of work.
b) We change those <mach/foo.h> into something more absolute, such as <mach/omap2/foo.h>. This can be done on a per SOC basis, first by moving the header files one level deeper, and then fixing up all affected drivers.
c) We change those <mach/foo.h> files into something more precise, e.g. <mach/omap2_foo.h> and fix concerned drivers.
I think the best solution here is (b) which doesn't preclude (a) eventually or if it is trivial. But (c) is dangerous as files might be added easily without paying too much attention to the file prefix.
3) Changes to the build system
We need to move towards the ability to actually build more than one SOC family at the same time.
3.1) Kconfig
This involves changes to Kconfig where currently only one out of all the different architectures is selected through the big "ARM system type" choice prompt. We need to determine a good way to move some of them into simple bool prompts and keep track of which architecture can be built concurrently with which. We know for instance that it is unlikely that pre-ARMv6 and ARMv6/7 will ever be buildable together. Today we know that nothing can be built with anything else and therefore this should be the starting default. This needs investigating.
3.2) Makefile
Currently the arch/arm/Makefile is organized so the lowest instruction set level and the highest optimization level are selected from all the configured options. So this part should already be fine.
However the machine-$(*), plat-$(*), machdirs and platdirs variables must go. In (2) above we should have removed the need for adding to the global KBUILD_CPPFLAGS to add a path to some specific architecture includes already. Keeping them only for the code under each architecture subdirectory should be sufficient.
For example, this might be all that is needed:
obj-$(CONFIG_ARCH_MSM) += mach-msm/
or
obj-$(CONFIG_ARCH_KIRKWOOD) += mach-kirkwood/ plat-orion/
obj-$(CONFIG_ARCH_ORION5X) += mach-orion5x/ plat-orion/
Etc.
And within each of these directories, using the subdir-ccflags-y variable to include the locally needed architecture specific include files will do the trick.
3.3) defconfig
We need a defconfig file adding as many architectures to it as possible for build coverage. Ideally the resulting binary should be boot tested on as many of the targets it supports as possible.
4) Picking up broken pieces
Things will certainly break along the way. There are certainly issues that I didn't foresee. My experience so far tends to indicate that this is a somewhat recursive process where the tackling of one work item reveals a few more which are prerequisite to the first one, etc. So any estimate for this work needs to consider a large fudge factor.
Nicolas
On Wed, Jul 27, 2011 at 10:58:36PM -0400, Nicolas Pitre wrote:
To everyone, and especially to those who are expected to work on this topic next week, please find below a list of tasks that need to be investigated and/or accomplished. I'll coordinate the work and collect patches for the team.
If you have comments on this, or if you know about some omissions, please feel free to provide them as a reply to this message.
I'd like to know if people are particularly interested in one (or more :-)) items they would like to work on. If so please say so as well.
[...]
Currently, device tree is not fully supported upstream for vexpress. Lorenzo Pieralisi wrote some patches, but there were a few outstanding issues and these weren't merged yet.
It could make sense for me to take a look at this, since vexpress is also the base for our initial Cortex-A15 reference platform. With the right people around next week, we have a chance to get any issues thrashed out more quickly.
Is this a good idea for next week, or should we be focusing more on core issues?
Some more thoughts:
1.1.5) Other weird things
Some machines have non linear memory maps or bus address translations, sparsemem, etc. Examples of that are:
arch/arm/mach-realview/include/mach/memory.h
arch/arm/mach-integrator/include/mach/memory.h
I think solving this is out of scope for this round. Deleting arch/arm/mach-*/include/mach/memory.h can't be done universally. So a new Kconfig symbol (NO_MACH_MEMORY_H) is introduced to indicate which machine class has its legacy <mach/memory.h> file removed. The single zImage for multiple targets will be restricted, amongst other things, to those machines or SOCs with that symbol selected. Partial result here: http://git.linaro.org/gitweb?p=people/nico/linux.git%3Ba=shortlog%3Bh=refs/h...
One possibility here is to enable any non-sparsemem platform to be built with sparsemem enabled by just defining a single memory block in this case to encompass the platform's RAM. I believe that platforms which have small I/O holes in their memory maps can continue to use memblock_reserve techniques as before -- there's no need to represent the holes via sparsemem (which would be expensive).
Having talked to Will Deacon about this, it sounds like the feasibility and performance impact may depend on how often things like pfn_valid get called when sparsemem is enabled.
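For a feel of what each pfn_valid call would cost, here is a simplified user-space model of a section-based validity check. It is not the kernel's sparsemem code; the section size, the address width and all names ending in _sketch are assumptions.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT		12
#define SECTION_SIZE_BITS	28	/* assumed: 256MB sections */
#define MAX_PHYSMEM_BITS	32
#define PFN_SECTION_SHIFT	(SECTION_SIZE_BITS - PAGE_SHIFT)
#define NR_MEM_SECTIONS		(1u << (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS))

static bool section_present[NR_MEM_SECTIONS];

/* Mark every section overlapped by [start, start + size) as present. */
static void memory_present_sketch(uint64_t start, uint64_t size)
{
	uint64_t first = start >> SECTION_SIZE_BITS;
	uint64_t last = (start + size - 1) >> SECTION_SIZE_BITS;

	for (uint64_t s = first; s <= last && s < NR_MEM_SECTIONS; s++)
		section_present[s] = true;
}

/* The per-call cost under discussion: a shift plus a table lookup. */
static bool pfn_valid_sketch(uint64_t pfn)
{
	uint64_t nr = pfn >> PFN_SECTION_SHIFT;

	return nr < NR_MEM_SECTIONS && section_present[nr];
}
```

A platform with one contiguous RAM bank would call memory_present_sketch() once. Small holes inside a present section still look valid in this model, which is why they would have to be handled by reservation rather than by the section table.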
Is this worth looking into?
I have a slightly biased interest in this, since ARM seems to like funky memory maps for many of its newer boards, and it would be unfortunate for these to get left out of the whole single effort.
Of course we could include those platforms in non-sparsemem kernels, but since that will often mean sacrificing half the RAM, this is far from ideal.
LPAE (Seems to fit neatly under "weird things" for now ;)
Do we care strongly about supporting LPAE and non-LPAE platforms in a single kernel?
I think this is probably out of scope for now, because it could end up being controversial if trying to support both of these touches arch-independent code. Also, for reasons which should be obvious, LPAE is not too useful without HIGHMEM turned on.
(LPAE is the large physical address extension, which adds the new 3-level page-table format for accessing physical memory > 4GB on Cortex-A15)
Catalin's LPAE patches are still not upstream, since Russell has a few outstanding issues with them.
Anyone interested can take a look at:
http://git.kernel.org/?p=linux/kernel/git/maz/ael-kernel.git%3Ba=shortlog%3B...
or
http://git.linaro.org/gitweb?p=people/dmart/linux-2.6-arm.git%3Ba=shortlog%3... (which is my merge of the AEL patches on top of the current linaro tree)
Note that so long as hardware designers are sensible and keep their peripherals < 4GB, it's possible to run a non-LPAE kernel on a Cortex-A15 platform; but the implication is that you will probably lose access to a lot of RAM.
[...]
1.10) mach/debug-macro.S
This is used when CONFIG_DEBUG_LL is set. Supporting that option with a single kernel image might prove very difficult with a rapidly diminishing return on the investment.
This code is in need of some refactoring already: http://article.gmane.org/gmane.linux.ports.arm.kernel/118525
To still benefit from the most likely needed debugging aid, we might consider allowing the selection of one amongst the existing implementations when building a kernel with support for many SOCs. Obviously that would only work on the one hardware platform for which the selected printch implementation was designed, but that should be good enough for debugging purposes.
DT is precisely for solving this kind of problem... We would be dependent on examining the device tree very early, however. It looks like the code in drivers/of/ won't work in the zImage loader environment without a lot of modifications; so we might need to create a separate, very minimal lightweight parser for this.
Then we can build all the relevant printch() implementations into the kernel and also into the zImage loader, and pick the right one based on the DT? The DT could define a special alias for the UART available for low-level debug.
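The selection step could look roughly like this, assuming some early parser has already extracted the "compatible" string of the node the debug alias points at. The compatible strings, the table and the function names are invented for the example, and the printch routines write to buffers instead of UART registers.

```c
#include <string.h>
#include <stddef.h>

typedef void (*printch_fn)(int c);

/* Fake "UARTs": each implementation logs into its own small buffer. */
static char uart_a_log[16], uart_b_log[16];
static size_t uart_a_len, uart_b_len;

static void printch_uart_a(int c)
{
	if (uart_a_len < sizeof(uart_a_log))
		uart_a_log[uart_a_len++] = (char)c;
}

static void printch_uart_b(int c)
{
	if (uart_b_len < sizeof(uart_b_log))
		uart_b_log[uart_b_len++] = (char)c;
}

/*
 * All implementations are built in; the table maps a "compatible" string
 * (the strings here are made up) to the matching routine.
 */
static const struct {
	const char *compatible;
	printch_fn printch;
} debug_uarts[] = {
	{ "vendor-a,uart", printch_uart_a },
	{ "vendor-b,uart", printch_uart_b },
};

/* 'compatible' would come from the node a debug alias in the DT points at. */
static printch_fn select_printch(const char *compatible)
{
	for (size_t i = 0; i < sizeof(debug_uarts) / sizeof(debug_uarts[0]); i++)
		if (strcmp(debug_uarts[i].compatible, compatible) == 0)
			return debug_uarts[i].printch;
	return NULL;	/* unknown UART: stay silent */
}
```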
We could even embed the printch() function body into the DT if we wanted to make the kernel's job even simpler. Realistic implementations of this function are tiny, so this shouldn't be too cumbersome. That really seems an abuse of the DT though, since this goes beyond just describing the hardware.
[...]
1.12) mach/uncompress.h
This is used to define per SOC methods to output some progress feedback from the kernel decompressor over a serial port. Once again, supporting this with a single kernel image might prove very difficult with a rapidly diminishing return on the investment. So it is probably best to simply use generic empty stubs whenever more than one SOC family is configured in a common kernel image.
Is this a separate problem from low-level debug? It feels like if we solve one problem, we'll have created all the infrastructure to solve the other problem trivially.
Neither of these is needed for single zImage to work though, so as you suggest it may be best to postpone these until later.
[...]
3) Changes to the build system
[...]
One other change I think we'll eventually want is the ability to discard all the unused parts of the kernel at runtime, after the platform has been identified.
If we build many platforms into a single zImage, we potentially end up with rather a lot of unused code when booting the kernel on any specific platform; this applies to any board support code as well as built-in drivers for hardware which is only applicable to a certain board family.
We ought to think about the implications of this, and what metadata would be needed in order for the kernel to work out what can be discarded.
This also feels like something we can consider later though; it isn't required for single zImage to work.
Cheers ---Dave
On Thu, 28 Jul 2011, Dave Martin wrote:
On Wed, Jul 27, 2011 at 10:58:36PM -0400, Nicolas Pitre wrote:
To everyone, and especially to those who are expected to work on this topic next week, please find below a list of tasks that need to be investigated and/or accomplished. I'll coordinate the work and collect patches for the team.
If you have comments on this, or if you know about some omissions, please feel free to provide them as a reply to this message.
I'd like to know if people are particularly interested in one (or more :-)) items they would like to work on. If so please say so as well.
[...]
Currently, device tree is not fully supported upstream for vexpress. Lorenzo Pieralisi wrote some patches, but there were a few outstanding issues and these weren't merged yet.
It could make sense for me to take a look at this, since vexpress is also the base for our initial Cortex-A15 reference platform. With the right people around next week, we have a chance to get any issues thrashed out more quickly.
Is this a good idea for next week, or should we be focusing more on core issues?
DT enablement will also be an important topic next week. I'm sure your help will be appreciated whatever you work on.
Some more thoughts:
1.1.5) Other weird things
Some machines have non linear memory maps or bus address translations, sparsemem, etc. Examples of that are:
arch/arm/mach-realview/include/mach/memory.h
arch/arm/mach-integrator/include/mach/memory.h
I think solving this is out of scope for this round. Deleting arch/arm/mach-*/include/mach/memory.h can't be done universally. So a new Kconfig symbol (NO_MACH_MEMORY_H) is introduced to indicate which machine class has its legacy <mach/memory.h> file removed. The single zImage for multiple targets will be restricted, amongst other things, to those machines or SOCs with that symbol selected. Partial result here: http://git.linaro.org/gitweb?p=people/nico/linux.git%3Ba=shortlog%3Bh=refs/h...
One possibility here is to enable any non-sparsemem platform to be built with sparsemem enabled by just defining a single memory block in this case to encompass the platform's RAM. I believe that platforms which have small I/O holes in their memory maps can continue to use memblock_reserve techniques as before -- there's no need to represent the holes via sparsemem (which would be expensive).
Having talked to Will Deacon about this, it sounds like the feasibility and performance impact may depend on how often things like pfn_valid get called when sparsemem is enabled.
Is this worth looking into?
Certainly, especially given your bias. ;-)
The sparsemem might get pretty inefficient if the build time constants such as SECTION_SIZE_BITS can't be relied upon though.
I have a slightly biased interest in this, since ARM seems to like funky memory maps for many of its newer boards, and it would be unfortunate for these to get left out of the whole single effort.
Of course we could include those platforms in non-sparsemem kernels, but since that will often mean sacrificing half the RAM, this is far from ideal.
Maybe that half could simply be pushed to home.
LPAE (Seems to fit neatly under "weird things" for now ;)
Do we care strongly about supporting LPAE and non-LPAE platforms in a single kernel?
No. I think that would be rather insane. The page table definitions have deep repercussions all over the place, often in performance critical paths down in core kernel code. Trying to introduce variable elements which are otherwise build time optimized constants is probably not going to find many followers.
1.10) mach/debug-macro.S
This is used when CONFIG_DEBUG_LL is set. Supporting that option with a single kernel image might prove very difficult with a rapidly diminishing return on the investment.
This code is in need of some refactoring already: http://article.gmane.org/gmane.linux.ports.arm.kernel/118525
To still benefit from the most likely needed debugging aid, we might consider allowing the selection of one amongst the existing implementations when building a kernel with support for many SOCs. Obviously that would only work on the one hardware platform for which the selected printch implementation was designed, but that should be good enough for debugging purposes.
DT is precisely for solving this kind of problem... We would be dependent on examining the device tree very early, however. It looks like the code in drivers/of/ won't work in the zImage loader environment without a lot of modifications; so we might need to create a separate, very minimal lightweight parser for this.
Then we can build all the relevant printch() implementations into the kernel and also into the zImage loader, and pick the right one based on the DT? The DT could define a special alias for the UART available for low-level debug.
We could even embed the printch() function body into the DT if we wanted to make the kernel's job even simpler. Realistic implementations of this function are tiny, so this shouldn't be too cumbersome. That really seems an abuse of the DT though, since this goes beyond just describing the hardware.
Well... I'm not entirely against the idea. This could be seen as the most efficient way to describe how to output a character over some serial device for the given hardware. The danger is that some vendors might be tempted to abuse that idea by stuffing BIOS-like services in there that the kernel would have to cope with.
1.12) mach/uncompress.h
This is used to define per SOC methods to output some progress feedback from the kernel decompressor over a serial port. Once again, supporting this with a single kernel image might prove very difficult with a rapidly diminishing return on the investment. So it is probably best to simply use generic empty stubs whenever more than one SOC family is configured in a common kernel image.
Is this a separate problem from low-level debug? It feels like if we solve one problem, we'll have created all the infrastructure to solve the other problem trivially.
Yes.
Neither of these is needed for single zImage to work though, so as you suggest it may be best to postpone these until later.
Right. For now we should be realistic and simply do the groundwork to be able to produce a common binary for more than two platforms. Achieving only that is already hard enough even if we sidestep a couple of non-critical issues. Once the basic concept is workable, refinements will come naturally afterward.
Nicolas
On Thursday 28 July 2011 14:44:17 Nicolas Pitre wrote:
We could even embed the printch() function body into the DT if we wanted to make the kernel's job even simpler. Realistic implementations of this function are tiny, so this shouldn't be too cumbersome. That really seems an abuse of the DT though, since this goes beyond just describing the hardware.
Well... I'm not entirely against the idea. This could be seen as the most efficient way to describe how to output a character over some serial device for the given hardware. The danger is that some vendors might be tempted to abuse that idea by stuffing BIOS-like services in there that the kernel would have to cope with.
I thought about this before, but then remembered the horrors of RTAS on powerpc and quickly discarded the idea. There are so many ways in which this can backfire:
* People putting entire closed source device drivers into the firmware and letting the kernel call that
* changing linkage convention in the kernel in a way that doesn't work with the code: thumb2, big-endian, oabi, ...
* sharing (part of) a device tree on SoC platforms that have multiple CPU architectures: TI C6x, qualcomm hexagon, freescale ppc, xilinx microblaze, openrisc
* subtle resource conflicts between the embedded code and other device drivers, e.g. using the same IRQ, registers or pins.
Doing it just for printch sounds harmless at first, but I think we should say no to this solution whenever it comes up, so it doesn't have the chance to grow into something evil.
Arnd
On Thu, Jul 28, 2011 at 09:42:15PM +0200, Arnd Bergmann wrote:
On Thursday 28 July 2011 14:44:17 Nicolas Pitre wrote:
We could even embed the printch() function body into the DT if we wanted to make the kernel's job even simpler. Realistic implementations of this function are tiny, so this shouldn't be too cumbersome. That really seems an abuse of the DT though, since this goes beyond just describing the hardware.
Well... I'm not entirely against the idea. This could be seen as the most efficient way to describe how to output a character over some serial device for the given hardware. The danger is that some vendors might be tempted to abuse that idea by stuffing BIOS-like services in there that the kernel would have to cope with.
I thought about this before, but then remembered the horrors of RTAS on powerpc and quickly discarded the idea. There are so many ways in which this can backfire:
- People putting entire closed source device drivers into the firmware and letting the kernel call that
- changing linkage convention in the kernel in a way that doesn't work with the code: thumb2, big-endian, oabi, ...
- sharing (part of) a device tree on SoC platforms that have multiple CPU architectures: TI C6x, qualcomm hexagon, freescale ppc, xilinx microblaze, openrisc
- subtle resource conflicts between the embedded code and other device drivers, e.g. using the same IRQ, registers or pins.
Doing it just for printch sounds harmless at first, but I think we should say no to this solution whenever it comes up, so it doesn't have the chance to grow into something evil.
That sounds sensible to me; I was just throwing ideas about, really. The embedded code idea would work for this, but doesn't feel like the correct solution -- as you say, it invites future breakages and abuses, as well as going against the basic design principles of devicetree.
I guess the flipside is that it would be good to have a way to extract specific bits of information from the device tree really early during boot. We don't need this yet, but it could be worth exploring it next week.
Cheers ---Dave
On Friday 29 July 2011 10:09:46 Dave Martin wrote:
That sounds sensible to me; I was just throwing ideas about, really.
Yes, and it was good that you brought it up. It's not obvious at all why we shouldn't put code into the device tree and I'm sure people will have the same idea in the future, so we will need to explain this to everybody who suggests doing it.
I guess the flipside is that it would be good to have a way to extract specific bits of information from the device tree really early during boot. We don't need this yet, but it could be worth exploring it next week.
Agreed.
Arnd
On Thu, Jul 28, 2011 at 02:44:17PM -0400, Nicolas Pitre wrote:
On Thu, 28 Jul 2011, Dave Martin wrote:
[...]
Currently, device tree is not fully supported upstream for vexpress. Lorenzo Pieralisi wrote some patches, but there were a few outstanding issues and these weren't merged yet.
It could make sense for me to take a look at this, since vexpress is also the base for our initial Cortex-A15 reference platform. With the right people around next week, we have a chance to get any issues thrashed out more quickly.
Is this a good idea for next week, or should we be focusing more on core issues?
DT enablement will also be an important topic next week. I'm sure your help will be appreciated whatever you work on.
OK, I may spend some time on that then. Lorenzo will be around for some of the time too.
Some more thoughts:
1.1.5) Other weird things
Some machines have non linear memory maps or bus address translations, sparsemem, etc. Examples of that are:
arch/arm/mach-realview/include/mach/memory.h
arch/arm/mach-integrator/include/mach/memory.h
I think solving this is out of scope for this round. Deleting arch/arm/mach-*/include/mach/memory.h can't be done universally. So a new Kconfig symbol (NO_MACH_MEMORY_H) is introduced to indicate which machine class has its legacy <mach/memory.h> file removed. The single zImage for multiple targets will be restricted, amongst other things, to those machines or SOCs with that symbol selected. Partial result here: http://git.linaro.org/gitweb?p=people/nico/linux.git%3Ba=shortlog%3Bh=refs/h...
One possibility here is to enable any non-sparsemem platform to be built with sparsemem enabled by just defining a single memory block in this case to encompass the platform's RAM. I believe that platforms which have small I/O holes in their memory maps can continue to use memblock_reserve techniques as before -- there's no need to represent the holes via sparsemem (which would be expensive)
Having talked to Will Deacon about this, it sounds like the feasibility and performance impact may depend on how often things like pfn_valid get called when sparsemem is enabled.
Is this worth looking into?
Certainly, especially given your bias. ;-)
The sparsemem might get pretty inefficient if the build time constants such as SECTION_SIZE_BITS can't be relied upon though.
I will try and take a look.
We may find that a lot of platforms can be supported with a single, sane definition of SECTION_SIZE_BITS. If they can't all be supported by the same definition, this at least grows the single zImage kernel scope to include more platforms.
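To make the pfn_valid cost concern concrete, here is a rough user-space sketch (not kernel code) of a sparsemem-style validity check. The constant values, the section_present table and the helper names mark_ram_present() and sketch_pfn_valid() are assumptions chosen for illustration; only SECTION_SIZE_BITS and pfn_valid are named in the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only; not taken from any real platform. */
#define PAGE_SHIFT         12
#define SECTION_SIZE_BITS  28   /* 256MB sections (assumed) */
#define MAX_PHYSMEM_BITS   32
#define PFN_SECTION_SHIFT  (SECTION_SIZE_BITS - PAGE_SHIFT)
#define NR_MEM_SECTIONS    (1UL << (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS))

/* One flag per section: does this section contain any RAM? */
static bool section_present[NR_MEM_SECTIONS];

/* Mark every section overlapping [start, start + size) as present. */
static void mark_ram_present(unsigned long long start, unsigned long long size)
{
	assert(size > 0);

	unsigned long first = (unsigned long)(start >> SECTION_SIZE_BITS);
	unsigned long last = (unsigned long)((start + size - 1) >> SECTION_SIZE_BITS);

	for (unsigned long s = first; s <= last && s < NR_MEM_SECTIONS; s++)
		section_present[s] = true;
}

/*
 * With a build-time SECTION_SIZE_BITS this is one shift plus one table
 * lookup. If the section size had to become a run-time variable, the
 * shift amount would turn into a memory load on every call.
 */
static bool sketch_pfn_valid(unsigned long pfn)
{
	unsigned long section = pfn >> PFN_SECTION_SHIFT;

	return section < NR_MEM_SECTIONS && section_present[section];
}
```

A platform without holes would simply register one block covering all of its RAM, which is the "single memory block" case suggested above.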
I have a slightly biased interest in this, since ARM seems to like funky memory maps for many of its newer boards, and it would be unfortunate for these to get left out of the whole single effort.
Of course we could include those platforms in non-sparsemem kernels, but since that will often mean sacrificing half the RAM, this is far from ideal.
Maybe that half could simply be pushed to home.
Eh?
LPAE (Seems to fit neatly under "weird things" for now ;)
Do we care strongly about supporting LPAE and non-LPAE platforms in a single kernel?
No. I think that would be rather insane. The page table definitions have deep repercussions all over the place, often in performance critical paths down in core kernel code. Trying to introduce variable elements which are otherwise build time optimized constants is probably not going to find many followers.
Subtle hint taken ;)
[...]
We could even embed the printch() function body into the DT if we wanted to make the kernel's job even simpler. Realistic implementations of this function are tiny, so this shouldn't be too cumbersome. That really seems an abuse of the DT though, since this goes beyond just describing the hardware.
Well... I'm not entirely against the idea. This could be seen as the most efficient way to describe how to output a character over some serial device for the given hardware. The danger is that some vendors might be tempted to abuse that idea by stuffing BIOS-like services in there that the kernel would have to cope with.
See my reply to Arnd...
It's not high priority anyway.
Cheers ---Dave
On Fri, 29 Jul 2011, Dave Martin wrote:
On Thu, Jul 28, 2011 at 02:44:17PM -0400, Nicolas Pitre wrote:
On Thu, 28 Jul 2011, Dave Martin wrote:
I have a slightly biased interest in this, since ARM seems to like funky memory maps for many of its newer boards, and it would be unfortunate for these to get left out of the whole single effort.
Of course we could include those platforms in non-sparsemem kernels, but since that will often mean sacrificing half the RAM, this is far from ideal.
Maybe that half could simply be pushed to home.
Eh?
Euh... I meant "himem". No idea how my fingers turned that into "home".
We could even embed the printch() function body into the DT if we wanted to make the kernel's job even simpler. Realistic implementations of this function are tiny, so this shouldn't be too cumbersome. That really seems an abuse of the DT though, since this goes beyond just describing the hardware.
Well... I'm not entirely against the idea. This could be seen as the most efficient way to describe how to output a character over some serial device for the given hardware. The danger is that some vendors might be tempted to abuse that idea by stuffing BIOS-like services in there that the kernel would have to cope with.
See my reply to Arnd...
What might be done is to describe the data FIFO register location, and the FIFO full bit mask, and the FIFO empty bit mask. That's all we need really.
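As a rough illustration of that idea, the sketch below drives a character output routine purely from a data register location and two status masks. The struct layout, the separate status register pointer and the function names are assumptions made for this example; nothing here is an existing kernel or DT interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of an early debug UART. */
struct early_uart_desc {
	volatile uint32_t *data_reg;    /* data FIFO register */
	volatile uint32_t *status_reg;  /* status register (assumed to be separate) */
	uint32_t fifo_full_mask;        /* set while the TX FIFO cannot accept data */
	uint32_t fifo_empty_mask;       /* set once the TX FIFO has drained */
};

/* Wait for room in the FIFO, then write one character. */
static void early_putc(const struct early_uart_desc *u, char c)
{
	assert(u && u->data_reg && u->status_reg);

	while (*u->status_reg & u->fifo_full_mask)
		;	/* busy-wait: FIFO full */
	*u->data_reg = (uint8_t)c;
}

/* Wait until everything queued so far has left the FIFO. */
static void early_flush(const struct early_uart_desc *u)
{
	assert(u && u->status_reg);

	while (!(*u->status_reg & u->fifo_empty_mask))
		;	/* busy-wait: FIFO not yet empty */
}
```

A UART whose "full" or "empty" condition is signalled by a cleared bit rather than a set bit would need an extra polarity field, so this covers only the simple case.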
Nicolas
On 07/29/2011 07:40 AM, Nicolas Pitre wrote:
What might be done is to describe the data FIFO register location, and the FIFO full bit mask, and the FIFO empty bit mask. That's all we need really.
Except for RPC which outputs to video memory. We don't care about that one for single kernel, but that may limit a new common solution.
BTW, I did implement a conversion to use debug macro code for uncompress, but it doesn't work for RPC, OMAP and Davinci. So while it shrinks the code by over 2K lines, we probably need a new approach like you suggest. The patches are here if you want to take a look:
git://git.jdl.com/software/linux-3.0.git http://git.jdl.com/gitweb/?p=linux-3.0.git%3Ba=shortlog%3Bh=refs/heads/uncom...
Rob
On Fri, 29 Jul 2011, Rob Herring wrote:
On 07/29/2011 07:40 AM, Nicolas Pitre wrote:
What might be done is to describe the data FIFO register location, and the FIFO full bit mask, and the FIFO empty bit mask. That's all we need really.
Except for RPC which outputs to video memory. We don't care about that one for single kernel, but that may limit a new common solution.
There will always be exceptions. Since RPC is so different and unique, it better be left alone and we should not try to find a solution that covers it as well.
I doubt it will ever use DT either.
BTW, I did implement a conversion to use debug macro code for uncompress, but it doesn't work for RPC, OMAP and Davinci. So while it shrinks the code by over 2K lines, we probably need a new approach like you suggest. The patches are here if you want to take a look:
git://git.jdl.com/software/linux-3.0.git http://git.jdl.com/gitweb/?p=linux-3.0.git%3Ba=shortlog%3Bh=refs/heads/uncom...
Will try to have a look.
Nicolas
On 07/27/2011 09:58 PM, Nicolas Pitre wrote:
To everyone, and especially to those who are expected to work on this topic next week, please find below a list of tasks that needs to be investigated and/or accomplished. I'll coordinate the work and collect patches for the team.
If you have comments on this, or if you know about some omissions, please feel free to provide them as a reply to this message.
I'd like to know if people are particularly interested in one (or more :-)) items they would like to work on. If so please say so as well.
Without further ado, here it is:
<><><><><>
0) The so called "single zImage" project
We wish to provide the ability to build as many ARM platforms as possible into a single kernel binary image. This will greatly simplify the archive packaging and maintenance effort by having only one kernel that could be built and booted on multiple ARM targets. A side effect of this is also to enforce better source code architecture even if the resulting binaries are not always supporting multiple targets.
This work started a while ago. Some initial description can be found here:
https://wiki.ubuntu.com/Specs/ARMSingleKernel
Part of it has been implemented already, namely the runtime determined PHYS_OFFSET, the AUTO_ZRELADDR and some other items referenced below. But there is still a large amount of work remaining.
1) Removal of any dependencies on <mach/*.h> from generic header files
To see the current culprits:
$ git grep "#include <mach/.*.h>" arch/arm/include/
arch/arm/include/asm/clkdev.h:#include <mach/clkdev.h>
arch/arm/include/asm/dma.h:#include <mach/isa-dma.h>
arch/arm/include/asm/floppy.h:#include <mach/floppy.h>
arch/arm/include/asm/gpio.h:#include <mach/gpio.h>
arch/arm/include/asm/hardware/dec21285.h:#include <mach/hardware.h>
arch/arm/include/asm/hardware/iop3xx-adma.h:#include <mach/hardware.h>
arch/arm/include/asm/hardware/iop3xx-gpio.h:#include <mach/hardware.h>
arch/arm/include/asm/hardware/sa1111.h:#include <mach/bitfield.h>
arch/arm/include/asm/io.h:#include <mach/io.h>
arch/arm/include/asm/irq.h:#include <mach/irqs.h>
arch/arm/include/asm/mc146818rtc.h:#include <mach/irqs.h>
arch/arm/include/asm/memory.h:#include <mach/memory.h>
arch/arm/include/asm/mtd-xip.h:#include <mach/mtd-xip.h>
arch/arm/include/asm/pci.h:#include <mach/hardware.h> /* for PCIBIOS_MIN_* */
arch/arm/include/asm/pgtable.h:#include <mach/vmalloc.h>
arch/arm/include/asm/system.h:#include <mach/barriers.h>
arch/arm/include/asm/timex.h:#include <mach/timex.h>
arch/arm/include/asm/vga.h:#include <mach/hardware.h>
1.1) mach/memory.h
This may contain the following defines:
1.1.1) ARM_DMA_ZONE_SIZE
This can be eliminated by moving that value into struct machine_desc. The work is done already, but presented as an example for other tasks: http://git.linaro.org/gitweb?p=people/nico/linux.git%3Ba=shortlog%3Bh=refs/h... And as of now this is merged in mainline already for v3.1-rc1.
1.1.2) PLAT_PHYS_OFFSET
Most occurrences can be eliminated. With CONFIG_ARM_PATCH_PHYS_VIRT, it is possible to determine PHYS_OFFSET at run time. What remains is to remove the direct uses, mostly by mdesc->boot_params initializers. Changing boot_params into atag_offset has two effects: it makes it clearer that this is only about ATAGs and not DT, and a relative offset plays more nicely with a runtime determined PHYS_OFFSET.
This work is done but not yet accepted: http://news.gmane.org/group/gmane.linux.ports.arm.kernel/thread=123480
1.1.3) FLUSH_BASE, FLUSH_BASE_PHYS, FLUSH_BASE_MINICACHE, UNCACHEABLE_ADDR
Those are StrongARM related constants, and different for each variant. Fixing this involves making the virtual addresses constant for all variants, and hiding the differences in the physical addresses during the actual mapping.
The solution is here: http://news.gmane.org/group/gmane.linux.ports.arm.kernel/thread=123477/force...
1.1.4) CONSISTENT_DMA_SIZE
Maybe the CMA work will make this obsolete and the consistent DMA area could be dynamically adjusted. In the meantime, the easiest solution is probably to store this in the machine_desc structure just like with ARM_DMA_ZONE_SIZE.
This has not been addressed yet.
1.1.5) Other weird things
Some machines have non linear memory maps or bus address translations, sparsemem, etc. Examples of that are:
arch/arm/mach-realview/include/mach/memory.h arch/arm/mach-integrator/include/mach/memory.h
I think solving this is out of scope for this round. Deleting arch/arm/mach-*/include/mach/memory.h can't be done universally. So a new Kconfig symbol (NO_MACH_MEMORY_H) is introduced to indicate which machine class has its legacy <mach/memory.h> file removed. The single zImage for multiple targets will be restricted, amongst other things, to those machines or SOCs with that symbol selected. Partial result here: http://git.linaro.org/gitweb?p=people/nico/linux.git%3Ba=shortlog%3Bh=refs/h...
1.2) mach/io.h
This contains things like IO_SPACE_LIMIT, __io(), __mem_pci(), and sometimes __arch_ioremap()/__arch_unmap(). But in most cases, the definitions here are pretty similar from one machine class to another.
Arnd says:
|I have a plan. When CONFIG_PCI is disabled (along with CONFIG_ISA and
|CONFIG_PCMCIA), we should have neither IO_SPACE_LIMIT nor __io()
|and get no inb/outb functions as a result.
|
|When it is enabled, the 'common' platforms need only one I/O window
|of 64KB, so we should find a common place in the virtual address space
|for that and hardcode __io, while the platform specific PCI initialization
|code (or map_io for that matter) ensures that the window is pointing
|to the physical window.
|
|__arch_ioremap()/__arch_unmap() are not really needed as far as I can
|tell but are used as an optimization to redirect ioremap to the
|hardcoded virtual address mapping. In the first step we can disable
|this for combined kernels, later we can find a generic way so
|__arch_ioremap walks the list of static mappings.
1.3) mach/timex.h
Most instances simply define a dummy CLOCK_TICK_RATE value. This can probably be removed altogether, or simply have a common value in arch/arm/include/asm/timex.h, as nothing seriously uses that anymore.
Reference: http://lkml.org/lkml/2011/2/21/323
1.4) mach/vmalloc.h
This universally contains only a definition for VMALLOC_END, but not a universal definition. It would be nice to have VMALLOC_END dynamically determined from the static IO mappings, but the highmem threshold depends on the value of VMALLOC_END, and memory has to be initialized before the static IO mappings can be processed.
Therefore the best solution so far appears to be to use another value in struct machine_desc for it so it can be set at run time. This is a mechanical conversion that has to be done.
1.5) mach/irqs.h
The only information globally required from those files is the value of NR_IRQS. Yet there is already a nr_irqs member in the machine_desc structure for this, used by arch_probe_nr_irqs() in arch/arm/kernel/irq.c.
So the first step would be to add
.nr_irqs = NR_IRQS,
to all machine_desc instances, making sure that <mach/irqs.h> is included in those files. Then <mach/irqs.h> should be removed from arch/arm/include/asm/irq.h, with things adjusted so everything still compiles.
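The effect of that conversion can be sketched in user space as follows. The struct, the fallback constant and the function name are simplified stand-ins invented for this example; the real arch_probe_nr_irqs() lives in arch/arm/kernel/irq.c and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fallback for descriptors that do not set nr_irqs. */
#define NR_IRQS_FALLBACK 128

/* Simplified stand-in for the kernel's struct machine_desc. */
struct irq_machine_desc {
	const char *name;
	unsigned int nr_irqs;	/* 0 means "not specified" */
};

/*
 * Once every machine descriptor carries its own count, generic code
 * can read it at run time instead of relying on a per-machine
 * compile-time NR_IRQS pulled in through <mach/irqs.h>.
 */
static unsigned int sketch_probe_nr_irqs(const struct irq_machine_desc *mdesc)
{
	assert(mdesc != NULL);
	return mdesc->nr_irqs ? mdesc->nr_irqs : NR_IRQS_FALLBACK;
}
```

Two machines with different interrupt counts can then coexist in one image, each reporting its own value.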
1.6) mach/gpio.h
This is a tough one. This depends on CONFIG_GENERIC_GPIO which is selected by many machine types. They should all be converted to (or configurable with) CONFIG_GPIOLIB so each SOC's specific GPIO handling is made into runtime code instead of static inline functions. Care to preserve the ability to not use gpiolib might be desirable in some cases for performance reasons.
Definitely in need of serious investigation.
1.7) mach/mtd-xip.h
No need to care about those. This is for running the kernel XIP from ROM memory. An XIP kernel is already incompatible with the notion of a single kernel image since it obviously can't be modified at run time (as needed by CONFIG_ARM_PATCH_PHYS_VIRT).
1.8) mach/isa-dma.h, mach/floppy.h
Those are used by old targets we might not care much about.
1.9) mach/entry-macro.S
This one gets included directly from arch/arm/kernel/entry-armv.S. The only relevant macros still widely used are get_irqnr_preamble and get_irqnr_and_base. They can be overridden by CONFIG_MULTI_IRQ_HANDLER and the equivalent code hooked to the handle_irq member of the machine_desc structure.
1.10) mach/debug-macro.S
This is used when CONFIG_DEBUG_LL is set. Supporting that option with a single kernel image might prove very difficult with a rapidly diminishing return on the investment.
This code is in need of some refactoring already: http://article.gmane.org/gmane.linux.ports.arm.kernel/118525
To still benefit from the most likely needed debugging aid, we might consider keeping the ability to select one of the existing implementations when building a kernel with support for many SOCs. Obviously that would only work on the one hardware platform for which the selected printch implementation was designed, but that should be good enough for debugging purposes.
1.11) mach/system.h
This is included from arch/arm/kernel/process.c and expected to provide the following static inline functions or equivalent:
1.11.1) arch_idle()
Called when system is idle. Most of them just call cpu_do_idle(). The call to cpu_do_idle() should be moved to default_idle() and the exception cases moved out of line where they can be hooked to the pm_idle callback.
1.11.2) arch_reset()
Used to reset the system. This is far from being a hot path and doesn't justify a static inline function. An out-of-line version hooked to a global arch_reset function pointer would work just fine.
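One possible shape for such a hook, as a user-space sketch: the pointer name arch_reset_hook, the registration helper and the example board function are all hypothetical, and the (char mode, const char *cmd) signature is assumed to mirror the existing inline arch_reset().

```c
#include <assert.h>
#include <stddef.h>

typedef void (*reset_fn)(char mode, const char *cmd);

/* Hypothetical global hook, set by whichever machine actually booted. */
static reset_fn arch_reset_hook;

/* Called from a machine's init code to install its reset method. */
static void set_arch_reset(reset_fn fn)
{
	assert(fn != NULL);
	arch_reset_hook = fn;
}

/* Example out-of-line implementation for one imaginary board. */
static int board_a_reset_calls;

static void board_a_reset(char mode, const char *cmd)
{
	(void)mode;
	(void)cmd;
	board_a_reset_calls++;	/* real code would poke a watchdog or reset register */
}

/* Generic restart path: one indirect call, no <mach/system.h> needed. */
static int machine_restart_sketch(char mode, const char *cmd)
{
	if (!arch_reset_hook)
		return -1;	/* no reset method registered */
	arch_reset_hook(mode, cmd);
	return 0;
}
```

The indirect call costs a little more than an inline function, which does not matter on a path that runs once per reboot.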
1.12) mach/uncompress.h
This is used to define per SOC methods to output some progress feedback from the kernel decompressor over a serial port. Once again, supporting this with a single kernel image might prove very difficult with a rapidly diminishing return on the investment. So it is probably best to simply use generic empty stubs whenever more than one SOC family is configured in a common kernel image.
2) Removal of any dependencies on <mach/*.h> from driver code
A couple possibilities:
a) We move the required header files next to the driver code. In many cases, having a .h file with only the defines relevant to the concerned driver is best. But this is a _lot_ of work.
b) We change those <mach/foo.h> into something more absolute, such as <mach/omap2/foo.h>. This can be done on a per SOC basis, first by moving the header files one level deeper, and then fixing up all affected drivers.
c) We change those <mach/foo.h> files into something more precise, e.g. <mach/omap2_foo.h> and fix concerned drivers.
I think the best solution here is (b) which doesn't preclude (a) eventually or if it is trivial. But (c) is dangerous as files might be added easily without paying too much attention to the file prefix.
3) Changes to the build system
We need to move towards the ability to actually build more than one SOC family at the same time.
3.1) Kconfig
This involves changes to Kconfig where currently only one out of all the different architectures is selected through the big "ARM system type" choice prompt. We need to determine a good way to move some of them into simply bool prompts and keep track of which architecture can be built concurrently with which. We know for instance that it is unlikely that pre-ARMv6 and ARMv6/7 will ever be buildable together. Today we know that nothing can be built with anything else and therefore this should be the starting default. This needs investigating.
3.2) Makefile
Currently the arch/arm/Makefile is organized so the lowest instruction set level and the highest optimization level are selected from all the configured options. So this part should already be fine.
However the machine-$(*), plat-$(*), machdirs and platdirs variables must go. In (2) above we should have removed the need for adding to the global KBUILD_CPPFLAGS to add a path to some specific architecture includes already. Keeping them only for the code under each architecture subdirectory should be sufficient.
For example, this might be all that is needed:
obj-$(CONFIG_ARCH_MSM) += mach-msm/
or
obj-$(CONFIG_ARCH_KIRKWOOD) += mach-kirkwood/ plat-orion/
obj-$(CONFIG_ARCH_ORION5X) += mach-orion5x/ plat-orion/
Etc.
And within each of these directories, using the subdir-ccflags-y variable to include the locally needed architecture specific include files will do the trick.
3.3) defconfig
We need a defconfig file adding as many architectures to it as possible for build coverage. Ideally the resulting binary should be boot tested on as many targets it supports as possible.
4) Picking up broken pieces
Things will certainly break along the way. There are certainly issues that I didn't foresee. My experience so far tends to indicate that this is a somewhat recursive process where the tackling of one work item reveals a few more which are prerequisite to the first one, etc. So any estimate for this work needs to consider a large fudge factor.
There are also collisions with the platform SMP and localtimer code. Things like platform_secondary_init, platform_smp_prepare_cpus, etc. need to be converted over to something like smp_ops on PowerPC. There's been some work on the local timer code by Marc Z.
Rob
Deepak, Nicolas,
On 07/27/2011 09:58 PM, Nicolas Pitre wrote:
To everyone, and especially to those who are expected to work on this topic next week, please find below a list of tasks that needs to be investigated and/or accomplished. I'll coordinate the work and collect patches for the team.
If you have comments on this, or if you know about some omissions, please feel free to provide them as a reply to this message.
I'd like to know if people are particularly interested in one (or more :-)) items they would like to work on. If so please say so as well.
Are you going to publish a summary of the week? For example, are there any refinements to this list in terms of additional items or approach on how to fix. Who is working each item and which ones need help?
Rob
Without further ado, here it is:
<><><><><>
- The so called "single zImage" project
We wish to provide the ability to build as many ARM platforms as possible into a single kernel binary image. This will greatly simplify the archive packaging and maintenance effort by having only one kernel that could be built and booted on multiple ARM targets. A side effect of this is also to enforce better source code architecture even if the resulting binaries are not always supporting multiple targets.
This work started a while ago. Some initial description can be found here:
https://wiki.ubuntu.com/Specs/ARMSingleKernel
Part of it has been implemented already, namely the runtime determined PHYS_OFFSET, the AUTO_ZRELADDR and some other items referenced below. But there is still a large amount of work remaining.
- Removal of any dependencies on <mach/*.h> from generic header files
To see the current culprits:
$ git grep "#include <mach/.*.h>" arch/arm/include/ arch/arm/include/asm/clkdev.h:#include <mach/clkdev.h> arch/arm/include/asm/dma.h:#include <mach/isa-dma.h> arch/arm/include/asm/floppy.h:#include <mach/floppy.h> arch/arm/include/asm/gpio.h:#include <mach/gpio.h> arch/arm/include/asm/hardware/dec21285.h:#include <mach/hardware.h> arch/arm/include/asm/hardware/iop3xx-adma.h:#include <mach/hardware.h> arch/arm/include/asm/hardware/iop3xx-gpio.h:#include <mach/hardware.h> arch/arm/include/asm/hardware/sa1111.h:#include <mach/bitfield.h> arch/arm/include/asm/io.h:#include <mach/io.h> arch/arm/include/asm/irq.h:#include <mach/irqs.h> arch/arm/include/asm/mc146818rtc.h:#include <mach/irqs.h> arch/arm/include/asm/memory.h:#include <mach/memory.h> arch/arm/include/asm/mtd-xip.h:#include <mach/mtd-xip.h> arch/arm/include/asm/pci.h:#include <mach/hardware.h> /* for PCIBIOS_MIN_* */ arch/arm/include/asm/pgtable.h:#include <mach/vmalloc.h> arch/arm/include/asm/system.h:#include <mach/barriers.h> arch/arm/include/asm/timex.h:#include <mach/timex.h> arch/arm/include/asm/vga.h:#include <mach/hardware.h>
1.1) mach/memory.h
This may contain the following defines:
1.1.1) ARM_DMA_ZONE_SIZE
This can be eliminated by moving that value into struct machine_desc. The work is done already, but presented as an example for other tasks: http://git.linaro.org/gitweb?p=people/nico/linux.git%3Ba=shortlog%3Bh=refs/h... And as of now this is merged in mainline already for v3.1-rc1.
1.1.2) PLAT_PHYS_OFFSET
Most occurrences can be eliminated. With CONFIG_ARM_PATCH_PHYS_VIRT, it is possible to determine PHYS_OFFSET at run time. Remains to remove the direct uses, mostly by mdesc->boot_params initializers. Changing boot_params into atag_offset has two effects: that makes it clearer that it is only about ATAGs and not DT, and a relative offset plays more nicely with a runtime determined PHYS_OFFSET.
This work is done but not yet accepted: http://news.gmane.org/group/gmane.linux.ports.arm.kernel/thread=123480
1.1.3) FLUSH_BASE, FLUSH_BASE_PHYS, FLUSH_BASE_MINICACHE, UNCACHEABLE_ADDR
Those are StrongARM related constants, and different for each variants. Fixing this involves making the virtual addresses constant for all variants, and hiding the differences in the physical addresses during the actual mapping.
The solution is here: http://news.gmane.org/group/gmane.linux.ports.arm.kernel/thread=123477/force...
1.1.4) CONSISTENT_DMA_SIZE
Maybe the CMA work will make this obsolete and the consistent DMA area could be dynamically adjusted. In the mean time, the easiest solution is probably to store this in the machine_desc structure just like with ARM_DMA_ZONE_SIZE.
This has not been addressed yet.
1.1.5) Other weird things
Some machines have non linear memory maps or bus address translations, sparsemem, etc. Examples of that are:
arch/arm/mach-realview/include/mach/memory.h arch/arm/mach-integrator/include/mach/memory.h
I think solving this is out of scope for this round. Deleting arch/arm/mach-*/include/mach/memory.h can't be done universally. So a new Kconfig symbol (NO_MACH_MEMORY_H) is introduced to indicate which machine class has its legacy <mach/memory.h> file removed. The single zImage for multiple targets will be restricted, amongst other things, to those machines or SOCs with that symbol selected. Partial result here: http://git.linaro.org/gitweb?p=people/nico/linux.git%3Ba=shortlog%3Bh=refs/h...
1.2) mach/io.h
This contains things like IO_SPACE_LIMIT, __io(), __mem_pci(), and sometimes __arch_ioremap()/__arch_unmap(). but in most cases, the definitions here are pretty similar from one machine class to another.
Arnd says:
|I have a plan. When CONFIG_PCI is disabled (along with CONFIG_ISA and |CONFIG_PCMCIA), we should have neither of IO_SPACE_LIMIT, __io() |and get no inb/outb functions as a result. | |When it is enabled, the 'common' platforms need only one I/O window |of 64KB, so we should find a common place in the virtual address space |for that and hardcode __io, while the platform specific PCI initialization |code (or map_io for that matter) ensures that the window is pointing |to the physical window. | |__arch_ioremap()/__arch_unmap() are not really needed as far as I can |tell but are used as an optimization to redirect ioremap to the |hardcoded virtal address mapping. In the first step we can disable |this for combined kernels, later we can find a generic way so |__arch_ioremap walks the list of static mappings.
1.3) mach/timex.h
Most instances simply define a dummy CLOCK_TICK_RATE value. This can probably be removed altogether, or simply have a common value in arch/arm/include/asm/timex.h, as nothing seriously uses that anymore.
Reference: http://lkml.org/lkml/2011/2/21/323
1.4) mach/vmalloc.h
This universally contains only a definition for VMALLOC_END, but not an universal definition. Would be nice to have VMALLOC_eND dynamically determined from the static IO mappings, but the highmem threshold depends on the value of VMALLOC_END, and memory has to be initialized before the static IO mappings can be processed.
Therefore the best solution so far appears to use another value in struct machine_desc for it so it can be set at run time. this is a mechanical conversion that has to be done.
1.5) mach/irqs.h
The only information globally required from those files is the value of NR_IRQS. Yet there is already a nr_irqs member in the machine_desc structure for this, used by arch_probe_nr_irqs() in arch/arm/kernel/irq.c).
So the first step would be to add
.nr_irqs = NR_IRQS,
to all machine_desc instances, making sure that <mach/irqs.h> is included in those files. Then, <mach/irqs.h> should be removed from arch/arm/include/asm/irq.h, and adjust things so everything still compiles.
1.6) mach/gpio.h
This is a tough one. This depends on CONFIG_GENERIC_GPIO which is selected by many machine types. They should all be converted to (or configurable with) CONFIG_GPIOLIB so each SOC's specific GPIO handling is made into runtime code instead of static inline functions. Care to preserve the ability to not use gpiolib might be desireable in some cases for performance reasons.
Definitely in need of serious investigation.
1.7) mach/mtd-xip.h
No need to care about those. This is for running the kernel XIP from ROM memory. A XIP kernel is already incompatible with the notion of a single kernel image since it obviously can't be modified at run time (as needed by CONFIG_ARM_PATCH_PHYS_VIRT).
1.8) mach/isa-dma.h, mach/floppy.h
Those are used by old targets we might not care much about.
1.9) mach/entry-macro.S
This one gets included directly from arch/arm/kernel/entry-armv.S. The only relevant macro still widely used is get_irqnr_preamble and get_irqnr_and_base. They can be overridden by CONFIG_MULTI_IRQ_HANDLER and the equivalent code hooked to the handle_irq member of the machine_desc structure.
1.10) mach/debug-macro.S
This is used when CONFIG_DEBUG_LL is set. Supporting that option with a single kernel image might prove very difficult with a rapidly diminishing return on the investment.
This code is in need of some refactoring already: http://article.gmane.org/gmane.linux.ports.arm.kernel/118525
To still benefit from the most likely needed debugging aid, we might consider the ability to still allow the selection of one amongst the existing implementation when building a kernel with many SOC support. Obviously that would only work on the one hardware platform for which the selected printch implementation was designed, but that should be good enough for debugging purposes.
1.11) mach/system.h
This is included from arch/arm/kernel/process.c and expected to provide the following static inline functions or equivalent:
1.11.1) arch_idle()
Called when system is idle. Most of them just call cpu_do_idle(). The call to cpu_do_idle() should be moved to default_idle() and the exception cases moved out of line where they can be hooked to the pm_idle callback.
1.11.2) arch_reset()
Used to reset the system. This is far from being a hot path and doesn't justify a static inline function. An out-of-line version hooked to a global arch_reset function pointer would work just fine.
1.12) mach/uncompress.h
This is used to define per SOC methods to output some progress feedback from the kernel decompressor over a serial port. Once again, supporting this with a single kernel image might prove very difficult with a rapidly diminishing return on the investment. So it is probably best to simply use generic empty stubs whenever more than one SOC family is configured in a common kernel image.
- Removal of any dependencies on <mach/*.h> from driver code
A couple possibilities:
a) We move the required header files next to the driver code. In many cases, having a .h file with only the defines relevant to the concerned driver is best. But this is a _lot_ of work.
b) We change those <mach/foo.h> into something more absolute, such as <mach/omap2/foo.h>. This can be done on a per SOC basis, first by moving the header files one level deeper, and then fixing up all affected drivers.
c) We change those <mach/foo.h> files into something more precise, e.g. <mach/omap2_foo.h> and fix concerned drivers.
I think the best solution here is (b) which doesn't preclude (a) eventually or if it is trivial. But (c) is dangerous as files might be added easily without paying too much attention to the file prefix.
3) Changes to the build system
We need to move towards the ability to actually build more than one SOC family at the same time.
3.1) Kconfig
This involves changes to Kconfig where currently only one out of all the different architectures is selected through the big "ARM system type" choice prompt. We need to determine a good way to move some of them into simply bool prompts and keep track of which architecture can be built concurrently with which. We know for instance that it is unlikely that pre-ARMv6 and ARMv6/7 will ever be buildable together. Today we know that nothing can be built with anything else and therefore this should be the starting default. This needs investigating.
3.2) Makefile
Currently the arch/arm/Makefile is organized so the lowest instruction set level and the highest optimization level are selected from all the configured options. So this part should already be fine.
However the machine-$(*), plat-$(*), machdirs and platdirs variables must go. With (2) above done, there should no longer be any need to add paths to architecture-specific includes to the global KBUILD_CPPFLAGS. Keeping them only for the code under each architecture subdirectory should be sufficient.
For example, this might be all that is needed:
obj-$(CONFIG_ARCH_MSM) += mach-msm/
or
obj-$(CONFIG_ARCH_KIRKWOOD) += mach-kirkwood/ plat-orion/
obj-$(CONFIG_ARCH_ORION5X) += mach-orion5x/ plat-orion/
Etc.
And within each of these directories, using the subdir-ccflags-y variable to include the locally needed architecture specific include files will do the trick.
3.3) defconfig
We need a defconfig file that enables as many architectures as possible for build coverage. Ideally the resulting binary should be boot tested on as many of the targets it supports as possible.
4) Picking up broken pieces
Things will certainly break along the way. There are certainly issues that I didn't foresee. My experience so far tends to indicate that this is a somewhat recursive process where the tackling of one work item reveals a few more which are prerequisite to the first one, etc. So any estimate for this work needs to consider a large fudge factor.
Nicolas
linaro-kernel mailing list linaro-kernel@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-kernel
On 5 August 2011 14:40, Rob Herring <robherring2@gmail.com> wrote:
> Deepak, Nicolas,
>
> On 07/27/2011 09:58 PM, Nicolas Pitre wrote:
>> To everyone, and especially to those who are expected to work on this topic next week, please find below a list of tasks that needs to be investigated and/or accomplished. I'll coordinate the work and collect patches for the team.
>> If you have comments on this, or if you know about some omissions, please feel free to provide them as a reply to this message.
>> I'd like to know if people are particularly interested in one (or more :-)) items they would like to work on. If so please say so as well.
>
> Are you going to publish a summary of the week? For example, are there any refinements to this list in terms of additional items or approach on how to fix them? Who is working on each item, and which ones need help?
Hi Rob,
I have a bunch of meeting notes that I'm translating into a high-level blog post for the Linaro site, and I'll do a more technical summary and send it in email, including a breakdown of who's doing what. Nicolas is out on vacation this week, but once he's back we'll go through and re-factor the existing list.
~Deepak