linaro-dev May 2011

linaro-dev@lists.linaro.org

122 participants
116 discussions

Booting linux-linaro-2.6.37 on Beagle Board

by Avik Sil

Hi,

I tried booting the linux-linaro-2.6.37 kernel on my BeagleBoard C4. I did the following:

1. Installed Linaro on a 4 GB SD card using linaro-image-tools 0.4.1 with the daily snapshot hwpack hwpack_linaro-omap3_20110125-0_armel_supported.tar.gz and linaro-natty-headless-tar-20101202-1.tar.gz. It booted properly on my BB.
2. Cloned linux-linaro-2.6.37 and changed to the source directory.
3. make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- omap2plus_defconfig
4. make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- menuconfig (enabled EARLY_PRINTK)
5. make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- uImage
6. make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- modules
7. make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- modules_install INSTALL_MOD_PATH=/media/rootfs
8. cp arch/arm/boot/uImage /media/boot; sync

Everything went smoothly. Then I put the SD card in the BB and powered it on, and I got a kernel panic: http://paste.ubuntu.com/560562

Please help me figure out the problem. Is it because I didn't create a uInitrd? If so, how do I create one for ARM?

Regards,
Avik

Optimized kernel memcpy/memset

by Christian Robottom Reis

Hey there,

I was asked today in the board meeting about the use of NEON routines in the kernel. I said we had looked into this but hadn't done it because a) it wasn't conclusively better, and b) if it were better, it would need to be done conditionally per platform. But I wanted to double-check that's actually true (and I'm copying Vijay to keep me honest). I have some references:

http://lists.linaro.org/pipermail/linaro-toolchain/2011-January/000722.html
http://groups.google.com/group/beagleboard/browse_thread/thread/12c7bd415fb…
http://www.spinics.net/lists/arm-kernel/msg106503.html
http://dev.gentoo.org/~armin76/arm/memcpy-neon_result.txt
https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/InitialMemcpy?hi…
https://wiki.linaro.org/WorkingGroups/ToolChain/StringRoutines?highlight=%2…

Incidentally, this ties into the question sent earlier this week, which had to do with Nico's work item in:

https://blueprints.launchpad.net/linux-linaro/+spec/other-kernel-thumb2

which IIRC Nico says probably isn't worth it, right?

--
Christian Robottom Reis | [+55] 16 9112 6430 | http://launchpad.net/~kiko
Linaro Engineering VP | [ +1] 612 216 4935 | http://async.com.br/~kiko
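As a minimal sketch of what "conditionally per platform" could mean in practice, assume the choice is made once at boot from a CPU feature flag plus a benchmark result. The names `memcpy_fn`, `memcpy_neon()`, `memcpy_generic()` and `select_memcpy()` are hypothetical, and the "NEON" variant below only calls the C library's memcpy() so that the example builds on any host.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-platform dispatch: none of these names are kernel symbols. */
typedef void *(*memcpy_fn)(void *dst, const void *src, size_t n);

/* Generic fallback: a plain byte-by-byte copy. */
static void *memcpy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Stand-in for a NEON-optimized routine; it defers to the C library
 * here so the sketch compiles on non-ARM hosts. */
static void *memcpy_neon(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Default to the generic routine until platform setup decides otherwise. */
static memcpy_fn platform_memcpy = memcpy_generic;

/* Called once during platform setup: switch only when the CPU has NEON
 * and measurements showed the NEON routine is actually faster there. */
static void select_memcpy(bool cpu_has_neon, bool neon_is_faster)
{
    platform_memcpy = (cpu_has_neon && neon_is_faster) ? memcpy_neon
                                                       : memcpy_generic;
}
```

Keeping the generic routine as the default means platforms where NEON was not conclusively better are left untouched.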

gcc: Thumb interworking and weakly linked functions

by Aneesh V

Hi,

I have an observation that I thought might interest the toolchain team.

I was trying to build u-boot in Thumb-2 for OMAP4. Everything was fine until I recently added some patches. One of these patches introduced an API (let's say foo()) that has a weakly linked alias (let's say __foo()) and a strongly linked implementation (the real foo()) in an assembly file. Although I pass -mthumb and -mthumb-interwork for all the files, GCC apparently generates ARM code for assembly files.

In the final image, a Thumb function foobar() calls foo() using a BL. Since foobar() is Thumb and foo() is ARM, it ends up crashing. It looks like foobar() assumed foo() to be Thumb because __foo() is Thumb. I also see that 'objdump -S' aborts when it tries to parse foo().

I could work around this problem by also having foo() in a C file that in turn calls into the assembly file. I tried Linaro GCC 4.5.2 and CodeSourcery Lite GCC 4.4.1; both seem to have the issue. Isn't this an issue with GCC, or am I missing something?

-Aneesh
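To make the symbol arrangement concrete, here is a minimal C sketch of the pattern described above, assuming GCC's `weak` and `alias` function attributes on an ELF target. `foo`, `__foo` and `foobar` are the placeholder names from the message; the assembly override itself is left out because it cannot be shown portably. When only this file is linked, foo() resolves to __foo(); a strong foo() elsewhere in the link replaces it.

```c
/* Default implementation, compiled as C (and therefore as Thumb when
 * the file is built with -mthumb). */
int __foo(void)
{
    return 0;
}

/* foo() is a weak alias for __foo(): any strong definition of foo()
 * elsewhere in the link (C or assembly) overrides it. */
int foo(void) __attribute__((weak, alias("__foo")));

/* The caller only sees the name foo(); the linker picks the definition
 * and must emit an interworking call if that definition is ARM code. */
int foobar(void)
{
    return foo() + 1;
}
```

Aneesh's workaround corresponds to supplying that strong foo() in C and having it call the assembly routine. If the override stays in assembly, one thing worth checking is whether the symbol carries GAS's `.type foo, %function` directive, since the ARM linker generally relies on a symbol being typed as a function before it applies interworking (BL to BLX conversion or a veneer). Whether that was the cause in this case is not established by the message above.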

Usefulness of GCC's 64bit __sync_* ops on ARM

by Ken Werner

Hi,

We've been thinking about adding support for the built-in functions for 64-bit atomic memory access, and I'd like to know if this is of any interest. Currently the main use of these functions seems to be implementing (SMP-safe) locking mechanisms, for which the existing 32-bit memory ops are sufficient. However, there might be code out there that implements a parallel algorithm using 64-bit atomic memory operations.

Currently the GCC ARM backend doesn't provide a pattern to inline the 64-bit __sync_* functions, so the compiler emits __sync_*_8 function calls instead [1]. libgcc does not provide these symbols via the usual thin wrapper around the kernel helper [2], because the ARM Linux __kernel_cmpxchg supports 32-bit operands only.

My understanding is that for ARMv7 the GCC backend could be enhanced to inline the __sync_* functions using the LDREXD and STREXD instructions. For ARMv5, however, we would still need a new kernel helper.

Any ideas/thoughts are appreciated.

[1] https://wiki.linaro.org/WorkingGroups/ToolChain/AtomicMemoryOperations#GCC
[2] https://wiki.linaro.org/WorkingGroups/ToolChain/AtomicMemoryOperations#impl…

Regards
Ken
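For illustration, this is the kind of code that depends on the 64-bit builtins: an add-and-return on a `uint64_t` built from GCC's `__sync_val_compare_and_swap()`. The helper name `atomic_add_return64()` is made up for this sketch. On a target with an inline 64-bit pattern (x86-64, for instance) the builtin expands in place; on the ARM configurations described above, GCC would instead emit a call to `__sync_val_compare_and_swap_8`, the kind of symbol libgcc does not provide.

```c
#include <stdint.h>

/* Atomically add delta to *p and return the new value, using a
 * compare-and-swap retry loop on a 64-bit location. */
static uint64_t atomic_add_return64(uint64_t *p, uint64_t delta)
{
    /* The initial plain read may tear on 32-bit targets; that is
     * harmless because the 64-bit CAS below fails on any mismatch. */
    uint64_t old = *p;

    for (;;) {
        uint64_t seen = __sync_val_compare_and_swap(p, old, old + delta);
        if (seen == old)
            return old + delta;
        old = seen; /* another writer got in first: retry with its value */
    }
}
```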

ST-E STM Driver Review

by Deao, Douglas

Sorry it took a while to get back to you guys; I was visiting customers last week. Most of my comments just highlight the differences between TI's STM 1.0 driver and ST-E's STM 1.0 driver, but there are a few questions, observations and suggestions. At the end I included some discussion of TI's meta data and OST header requirements.

I have not had a chance to look at your actual implementation yet. Did you do anything to abstract the actual HW transport ports and control registers from the higher-level driver functions?

I realize there is a lot here to work through, so if you would rather schedule a conference call to talk through the differences I can do that. I would like to start work on a Linaro (Unified) STM Spec next week if I can get feedback from everybody over the next few days. I will be out of the office on 5/27 and 5/31.

I am especially interested in details of what you guys have in mind for a "common trace framework to receive STM drivers". If by framework you mean well-defined APIs that are implemented for specific devices, then I think we are in agreement. What Michael and I have talked about is a common STM user-mode experience across all Linaro-supported devices, making Linux user-mode code 100% portable between our devices.

ST-E STM Driver stm-trace.txt review:

1. Software Overview
Your "Software Overview" states: "The end of data packet is marked by a time stamp on latest byte(s) only." I assume that user messages can be made up of any number of bytes, half-words, words or longs (whatever is most efficient) and that you simply terminate the last element of the message with a time-stamp, right? In the TI STM implementation a message can be any number and combination of byte, half-word, or word transfers, terminated with a time-stamp on the last element. In addition, we also add an OST header to each message (see below for the discussion of OST headers).

2. Lossless/Lossy modes
TI only supports lossless mode for SW-generated messages, and this is enforced in our HW implementation. Lossy mode is reserved for true HW messages. I did not notice that you documented a way to modify this through the debugfs API or IOCTLs. I think that may be OK, since this is really a HW configuration choice in your case, but in the TI case the user does not get to make that choice.

3. Channel Assignment
TI makes the assignment with mknod, using the minor number to assign a fixed channel. This allows the user-mode application to overload the channel usage for categorizing data (not my idea). I think we see the error of our ways here and will be OK with dynamic channel allocation.

I am thinking that a channel should be assigned for each unique pid when the device is opened. I would guess you are keeping a channel table around and write() just checks the table for a pid assignment (I have not had time to look at your implementation yet); if none is found, the first free channel is used. If you moved this function back to open(), then you could do the IOCTL STM_GET_CHANNEL_NO at any time, not just after the first write. In write(), how do you flag an error if you exhaust the number of available channels?

4. Kernel API
TI does not support a Kernel API (yet). I can see that the Alloc/Free and File IO type functions are useful and should be standard. I am not sure what you mean by "lockless" trace functions. It looks like your "low level atomic trace functions for 1, 2, 4 or 8 bytes" are similar to TI's binary library functions (not supported by the TI STM Driver). This is what we use the OST header for: it allows our tool chain to differentiate between different message formats, rather than just assuming the data is a simple stream of bytes.

5. Debugfs APIs
TI used a different approach. The tool chain on the host provides all the transport setup through JTAG, so our driver does not support setting up the actual STM data export (number of pins and clock rate).
In our case the device transport parameters must match the host receiver's collection setup. With your approach the user can change the clock rate and export pin width effectively at any time. Our tools actually go through a calibration process during initialization, so any change to the device's transport setup (clock rate, number of pins the data is exported on) would cause the TI tool chain a lot of grief.

There are some parameters we know we need to add (like master enables). These are currently also handled by the host tools. TI's STM module allows up to 4 SW masters to be enabled (with ID masks that can be used to enable multiple masters from the same group) and 4 HW masters that can be enabled at the same time as the SW masters. If the user tries to enable more than the HW allows, do you have a mechanism to flag an error?

I don't have a lot of experience with debugfs, but I am assuming it's primarily used to allow scripts to configure a driver (like in your example) or to extract information. We may want to define a standard set of debugfs options whose implementation is vendor-specific. But that raises some questions:
- How do we deal with options that don't make sense for a specific vendor? Is doing nothing acceptable, or do we want to provide a discovery mechanism?
- Would user scripts then also be vendor-specific? We should probably make an effort to avoid this. A discovery mechanism may allow user-mode scripts to be generic.

6. Mapped Channels
I believe the TI HW transport channel mapping is compatible. In the TI case a channel is mapped into two spaces: the first half is for non-timestamped transfers and the second half is for time-stamped transfers. When we write a message (from a user-mode write call, for example), we simply write all the data except the last element through the non-timestamp port, and then write the last element to the time-stamped port. So I think we could be compatible here.
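The per-pid channel assignment from item 3 and the split non-timestamp/timestamp write from item 6 can be modeled in a few lines of C. This is a hypothetical user-space sketch, not the TI or ST-E driver code: the channel count, the `stm_alloc_channel()` and `stm_write_msg()` names, and the byte-wide simulated ports are all assumptions. A real driver would also hold a mutex around both operations (compare item 9).

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical model: the channel count and all names are illustrative. */
#define NUM_CHANNELS 4
#define NO_OWNER ((pid_t)0)

static pid_t channel_owner[NUM_CHANNELS]; /* NO_OWNER marks a free channel */

/* Item 3: assign a channel per pid at open() time. Returns the channel
 * the pid already owns, else the first free one, else -1 when all
 * channels are exhausted (a driver might report an error such as -EBUSY). */
static int stm_alloc_channel(pid_t pid)
{
    int free_ch = -1;

    for (int ch = 0; ch < NUM_CHANNELS; ch++) {
        if (channel_owner[ch] == pid)
            return ch;
        if (free_ch < 0 && channel_owner[ch] == NO_OWNER)
            free_ch = ch;
    }
    if (free_ch >= 0)
        channel_owner[free_ch] = pid;
    return free_ch;
}

/* Simulated channel ports: in hardware these would be two MMIO
 * addresses, one in each half of the channel's mapping. */
struct stm_port {
    volatile uint8_t data;    /* non-timestamped port */
    volatile uint8_t data_ts; /* time-stamped port */
    size_t plain_writes;      /* counters so the model can be checked */
    size_t ts_writes;
};

/* Item 6: write every byte except the last through the non-timestamp
 * port, then write the last byte through the time-stamped port to mark
 * the end of the message. */
static void stm_write_msg(struct stm_port *port, const uint8_t *buf, size_t len)
{
    if (len == 0)
        return;
    for (size_t i = 0; i + 1 < len; i++) {
        port->data = buf[i];
        port->plain_writes++;
    }
    port->data_ts = buf[len - 1];
    port->ts_writes++;
}
```

Doing the allocation in open() rather than on the first write() means the channel number is known, and exhaustion can be reported, before any data is sent.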
With that said, I am not sure about exposing all channels to a user-mode library. You are relying on the library to follow the convention of getting a free channel from the driver to make sure there are no conflicts. If the channel assignment is made when you open the device, you could conceivably map just the address space needed for that single channel, eliminating the need to get a free channel from the driver. In the TI case a single channel's transport mapping is 4K bytes, which matches the typical PAGE_SIZE. I realize not all HW implementations will match up with PAGE_SIZE, which may be why you simply map all the channels back to user space.

Since free channels can become busy rapidly, maybe a better convention would be to simply use another device node if the user wants the library's STM data to be transmitted on a different STM channel than the current process. This may be a case where providing a mechanism to allow channels to be named for the tool chain (providing the task name and process ID) may be a good idea (see the meta data discussion below).

7. 8-byte Writes
TI does not support 64-bit writes with our STM 1.0 module. We may need an IOCTL to get the largest transfer size supported for the mmap case. For all other cases this should just be hidden in the device-dependent code.

8. Kernel Internal Usage
I like the idea of having dedicated support in the driver for common kernel logging. Any ideas on how you would support kernel STM channel assignments without hard-coding? We may need a mechanism to communicate the definition of each hard-coded channel to our tools.

The following are TI-specific:

9. Data protection
In SMP systems, if the processor is switched, a new master is generated (in some TI devices). So we protect the data with a mutex to guarantee that a complete message is generated by the same master.

10. Meta Data
Our user-mode HW libraries use meta data to transport the data needed to process the HW profiling STM messages.
Items like processor speed, sampling rate, processing options, ... (just a predefined byte buffer our tool chain understands). The meta data is currently broadcast on a dedicated channel (255), which conflicts with your hard-coded channel for logging printk output, so we will need to resolve hard-coded conflicts.

We need the driver to support registration and transport of the meta data on demand from the library (when the HW master is disabled, in case the collection buffer is small and circular). I am thinking an IOCTL could be used to register meta data, and the data would then simply be broadcast on an STM channel (we will need to figure out which one) when the HW master is enabled and disabled.

Meta data transmission is problematic for circular buffers (like ETBs), which is why we also send meta data when a HW master is disabled. SW masters are not typically disabled, and our HW does not provide a transmission byte count (remember there are HW messages also being generated in the TI case). So there is no way for the driver to tell when the recording buffer will wrap, even if the user told us the buffer size. I am thinking the best solution would be to force the user to gracefully disable the channel to get any SW channel meta data provided by the driver.

TI supports three cases of data capture:
- DTC/Host collection (stop on buffer full)
- DTC/Host collection (circular buffer)
- ETB/on-chip collection (circular buffer)

Or, if the user is at a point in their code where they know they will stop recording on the host or ETB, we provide an IOCTL that simply disables all channels. In the ETB case we may want to simply disable any open STM channels when the user decides to stop recording, as a fail-safe mechanism.

Note: Periodic transmission of meta data into a small circular buffer will not work well. In cases where the data is sparse, the buffer will simply be filled with meta data rather than useful data.

11. OST Headers
Adding an OST header to each message is a requirement for compatibility with TI's tool chain. There are a couple of ways to approach this:
- Completely hidden from the user: the device-specific code will know if the header is necessary. On a write, prior to the copy from user space, the device-independent code would have to call into the device-dependent code to get a properly sized memory buffer that includes the header.
- User enabled: provide an IOCTL that allows the user to put the driver in a tool-chain-specific mode (like adding OST headers).

Regards,
Doug Deao

________________________________________
From: Philippe Langlais [mailto:philippe.langlais@linaro.org]
Sent: Wednesday, May 04, 2011 3:08 AM
To: Deao, Douglas
Cc: Linus Walleij
Subject: Re: STM at UDS-Budapest

Hi Doug,

On STE ux500 platforms we have the same STM module (following MIPI STP 1.0). I have already posted our current implementation to the LKML and the Linaro ML; it's very similar to your proposal. I can't be present at the Linaro summit, but Linus Walleij can replace me for this topic; he proposes to write a common trace framework to receive STM drivers. Attached are our current proposal and work around STM.

Regards,
Philippe Langlais
ST-Ericsson

On 3 May 2011 00:42, Deao, Douglas <d-deao(a)ti.com> wrote:
I am hosting an introductory session on System Trace at the summit. TI's System Trace Module (STM) provides a common protocol for instrumentation messages across multiple cores and system-level hardware profiling in complex SoCs. Attached is a whitepaper for background reading. Looking forward to meeting you at the summit.

Regards,
Doug Deao
Texas Instruments

_______________________________________________
linaro-dev mailing list
linaro-dev(a)lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

[PATCH android/toolchain/build] Fix host-libbfd installation problem caused by undefined $(INSTALL)

by Jim Huang

While executing the target install-host-libbfd, the build system complains:

make -C libbfd-binutils-2.20.1/bfd install \
  bfdlibdir=/tmp/android-toolchain-eabi/lib bfdincludedir=/tmp/android-toolchain-eabi/include && \
-m 644 libbfd-binutils-2.20.1/intl/libintl.a \
  /tmp/android-toolchain-eabi/lib && \
-m 644 libbfd-binutils-2.20.1/libiberty/libiberty.a \
  /tmp/android-toolchain-eabi/lib
/bin/sh: line 2: -m: command not found

The problem is caused by an undefined $(INSTALL): the install command expands to a bare "-m 644 ...". This patch uses autotools to configure the `install' program so that $(INSTALL) is set properly, and replaces $(INSTALL) -m 644 with the more portable $(INSTALL_DATA).

Code Review: https://review.source.android.com/#change,23179

[PATCH v4 00/12] mmc: use nonblock mmc requests to minimize latency

by Per Forlin

How significant is the cache maintenance overhead?
It depends. eMMC devices are much faster now than they were a few years ago, while cache maintenance costs more due to multiple cache levels and speculative cache prefetch. Relative to the transfer itself, the cost of handling the caches has increased and is now a bottleneck when dealing with fast eMMC together with DMA.

The intention of introducing non-blocking mmc requests is to minimize the time between the end of one mmc request and the start of the next. In the current implementation the MMC controller is idle while dma_map_sg and dma_unmap_sg are running. Non-blocking mmc requests make it possible to prepare the caches for the next job in parallel with an active mmc request. This is done by making issue_rw_rq() non-blocking.

The increase in throughput is proportional to the time it takes to prepare a request (the major part of the preparation is dma_map_sg and dma_unmap_sg) and to how fast the memory is: the faster the MMC/SD is, the more significant the request preparation time becomes. Measurements on U5500 and Panda on eMMC and SD show a significant performance gain for large reads when running in DMA mode. In the PIO case the performance is unchanged.

There are two optional hooks, pre_req() and post_req(), that the host driver may implement in order to move work to before and after the actual mmc_request function is called. In the DMA case pre_req() may do dma_map_sg() and prepare the DMA descriptor, and post_req() runs dma_unmap_sg().

Details on measurements from IOZone and mmc_test:
https://wiki.linaro.org/WorkingGroups/Kernel/Specs/StoragePerfMMC-async-req

Changes since v3:
* Based on 2.6.39-rc7
* Add error check for testlist in mmc_test.c
* Resolve an issue in mmc-queue-thread that caused the mmc-thread to miss a wakeup.
* Move parallel request handling to core.c. This simplifies the interface from 4 public functions to 1. This also gives SDIO access to the same functionality, even though the function is not yet tuned for the SDIO execution flow.

Per Forlin (12):
  mmc: add none blocking mmc request function
  omap_hsmmc: use original sg_len for dma_unmap_sg
  omap_hsmmc: add support for pre_req and post_req
  mmci: implement pre_req() and post_req()
  mmc: mmc_test: add debugfs file to list all tests
  mmc: mmc_test: add test for none blocking transfers
  mmc: add member in mmc queue struct to hold request data
  mmc: add a block request prepare function
  mmc: move error code in mmc_block_issue_rw_rq to a separate function.
  mmc: add a second mmc queue request member
  mmc: test: add random fault injection in core.c
  mmc: add handling for two parallel block requests in issue_rw_rq

 drivers/mmc/card/block.c      |  452 +++++++++++++++++++++++++----------------
 drivers/mmc/card/mmc_test.c   |  361 ++++++++++++++++++++++++++++++++-
 drivers/mmc/card/queue.c      |  184 +++++++++++------
 drivers/mmc/card/queue.h      |   32 +++-
 drivers/mmc/core/core.c       |  165 ++++++++++++++-
 drivers/mmc/core/debugfs.c    |    5 +
 drivers/mmc/host/mmci.c       |  146 ++++++++++++--
 drivers/mmc/host/mmci.h       |    8 +
 drivers/mmc/host/omap_hsmmc.c |   90 ++++++++-
 include/linux/mmc/core.h      |    6 +-
 include/linux/mmc/host.h      |   19 ++
 lib/Kconfig.debug             |   11 +
 12 files changed, 1187 insertions(+), 292 deletions(-)

--
1.7.4.1
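A rough way to see the overlap the cover letter describes is to model the issue loop in plain C. This is a hypothetical sketch: the `host_ops` struct, its four callbacks and `issue_nonblocking()` are simplified stand-ins, not the actual mmc core or host driver interfaces from the series, and the "controller" is only a trace of events. The point is the ordering: request i is prepared before request i-1 is waited on, and request i-1 is cleaned up after request i has started, so the controller is idle only between wait_done() and start().

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified hooks; not the real mmc_host_ops. */
struct host_ops {
    void (*pre_req)(int req);   /* prepare: e.g. map the sg list, build descriptors */
    void (*start)(int req);     /* hand the request to the controller */
    void (*wait_done)(int req); /* block until the controller finishes it */
    void (*post_req)(int req);  /* clean up: e.g. unmap the sg list */
};

/* Issue n requests so that preparing request i and cleaning up request
 * i-1 both overlap with a transfer that is already in flight. */
static void issue_nonblocking(const struct host_ops *ops, int n)
{
    if (n <= 0)
        return;
    ops->pre_req(0);
    ops->start(0);
    for (int i = 1; i < n; i++) {
        ops->pre_req(i);      /* request i-1 is still transferring */
        ops->wait_done(i - 1);
        ops->start(i);        /* controller idle only between these two calls */
        ops->post_req(i - 1); /* request i is now transferring */
    }
    ops->wait_done(n - 1);
    ops->post_req(n - 1);
}

/* Tracing callbacks: P = pre_req, S = start, W = wait_done, U = post_req. */
static char trace[256];

static void log_event(const char *tag, int req)
{
    char buf[16];

    snprintf(buf, sizeof buf, "%s%d ", tag, req);
    strncat(trace, buf, sizeof trace - strlen(trace) - 1);
}

static void t_pre(int r)   { log_event("P", r); }
static void t_start(int r) { log_event("S", r); }
static void t_wait(int r)  { log_event("W", r); }
static void t_post(int r)  { log_event("U", r); }

static const struct host_ops trace_ops = { t_pre, t_start, t_wait, t_post };
```

For two requests the trace reads `P0 S0 P1 W0 S1 U0 W1 U1`. In a blocking implementation, both P1 and U0 would instead sit in the gap between W0 and S1, with the controller idle.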

Likely gcc-linaro-4.5-2011.05 misoptimization in libgui (Android)

by Jim Huang

Hello list,

If you build Android using gcc-linaro-4.5-2011.05 [1], you will find that the boot animation runs endlessly. This results from a misoptimization in libgui, which handles the operations in Android's SensorManager. To work around this problem, you can apply the following patch:

--- a/libs/gui/Android.mk
+++ b/libs/gui/Android.mk
@@ -18,6 +18,8 @@ LOCAL_SHARED_LIBRARIES := \
 
 LOCAL_MODULE:= libgui
 
+LOCAL_CFLAGS += -O0
+
 ifeq ($(TARGET_SIMULATOR),true)
 	LOCAL_LDLIBS += -lpthread
 endif

Then replace /system/lib/libgui.so with the rebuilt one. I haven't looked into the details, but at least Android is able to respond to user interaction again.

Related bug: https://bugs.launchpad.net/linaro-android/+bug/787072

Sincerely,
-jserv

[1] Prebuilt x86 toolchain for Android: http://people.linaro.org/~jserv/toolchain/

[PATCH 0/2] sdio: make sdio_single_irq optional due to suprious IRQ

by Per Forlin

Daniel Drake reported an issue in the libertas SDIO client that was triggered by the sdio_single_irq functionality. His SDIO device seems to raise an interrupt even though no bits are set in the CCCR_INTx register. This behaviour is supported neither by the sdio_single_irq feature nor by the SDIO spec.

The purpose of the sdio_single_irq feature is to avoid the overhead of checking the CCCR_INTx registers; as a result, there is no error handling for the case where an IRQ is pending but no CCCR_INTx bits are set.

This patchset intends to resolve the libertas issue by making the sdio_single_irq feature configurable, and also reports a warning if an SDIO interrupt is raised but no CCCR_INTx bits are set.

Per Forlin (2):
  sdio: add function to enable and disable sdio_single_irq optimization
  sdio: report error if pending IRQ but none function bits

 drivers/mmc/core/sdio_irq.c |   22 +++++++++++++++++++++-
 include/linux/mmc/card.h    |    1 +
 2 files changed, 22 insertions(+), 1 deletions(-)

--
1.7.4.1
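The dispatch decision described above can be sketched as follows. This is a hypothetical, self-contained model rather than the mmc core code: `struct sdio_card_model`, `sdio_irq_dispatch()` and the in-memory `cccr_int_pending` byte are assumptions made for illustration, with bits 1 to 7 standing for functions 1 to 7. With the single-IRQ optimization on, the handler runs without reading the pending bits, which is why a device that interrupts with no bits set goes unnoticed; with it off, the empty case can be detected and reported.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of SDIO interrupt dispatch; not the mmc core API. */
struct sdio_card_model {
    bool single_irq_enabled;        /* sdio_single_irq-style fast path on/off */
    int single_irq_func;            /* the only function with a handler */
    unsigned char cccr_int_pending; /* simulated pending bits, bit n = function n */
    unsigned handled[8];            /* per-function handler call counts */
    unsigned spurious;              /* IRQs seen with no pending bits */
};

static void call_handler(struct sdio_card_model *c, int func)
{
    c->handled[func]++;
}

static void sdio_irq_dispatch(struct sdio_card_model *c)
{
    if (c->single_irq_enabled) {
        /* Fast path: skip reading the pending register entirely, so a
         * spurious IRQ still reaches the function's handler. */
        call_handler(c, c->single_irq_func);
        return;
    }

    /* On real hardware this would be a register read over the bus. */
    unsigned char pending = c->cccr_int_pending;
    if (pending == 0) {
        /* The case from the libertas report: warn instead of ignoring it. */
        c->spurious++;
        fprintf(stderr, "sdio: IRQ pending but no function bits set\n");
        return;
    }
    for (int func = 1; func <= 7; func++)
        if (pending & (1u << func))
            call_handler(c, func);
}
```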

Android kernel git update frequency

by Paul Sokolovsky

Hello John,

I'd like to follow up on the IRC conversation we had during "Android Common Tree & Upstreaming" at LDS (https://blueprints.launchpad.net/linux-linaro/+spec/linaro-kernel-o-android):

May 11 10:49:08 <pfalcon> jstultz_vm: what about "continuous merging" of linaro tree with android patches (that's exactly why I asked about maintaining a separate patch ;-) )
May 11 10:49:46 <jstultz_vm> pfalcon: so i can do that, but part of it is the time required to validate that the combination didn't break anything.
May 11 10:49:52 <jstultz_vm> (which has happened in the past)
May 11 10:50:06 <jstultz_vm> that load causes me to not update constantly..
May 11 10:50:58 <jstultz_vm> i could just update it every week/few days. but i'm hesitant to push out a tree that breaks folks.

I agree that no immediate changes to the process should be made; we should get the 11.05 release out, hopefully with fixes for known regressions or missing features. I'm just trying to wrap my mind around how we'll bootstrap Continuous Integration in the next cycle, which is not far away.

So I would just like to make sure that all teams involved are on the same page to make CI work effectively, i.e. that the kernel team is able to update the Android kernel regularly, the infrastructure team has a build system that produces Android images reliably, and the validation team has all the hooks needed to start testing them as soon as they are built.

So I guess the validation team would lead the CI start-up sequence, once they have all the needed infrastructure and test suites integrated. In particular, I'm waiting for details from them on how build notifications should be communicated to the LAVA system.

--
Best Regards,
Paul
