linaro-kernel April 2012

linaro-kernel@lists.linaro.org

36 participants
36 discussions

by Arnd Bergmann

We've had a discussion in the Linaro storage team (Saugata, Venkat and me, with Luca joining in on the discussion) about swapping to flash-based media such as eMMC. This is a summary of what we found and what we think should be done. If people agree that this is a good idea, we can start working on it.

The basic problem is that Linux without swap is sort of crippled: some things either don't work at all (hibernate) or are not as efficient as they should be (e.g. tmpfs). At the same time, the swap code seems to be rather inappropriate for the algorithms used in most flash media today, causing system performance to suffer drastically and wearing out the flash hardware much faster than necessary. In order to change that, we would implement the following changes:

1) Try to swap out multiple pages at once, in a single write request. My reading of the current code is that we always send pages one by one to the swap device, while most flash devices have an optimum write size of 32 or 64 KB and some require an alignment of more than a page. Ideally we would try to write an aligned 64 KB block all the time. Writing aligned 64 KB chunks often gives us ten times the throughput of linear 4 KB writes, and going beyond 64 KB usually does not give any better performance.

2) Make variable-sized swap clusters. Right now, the swap space is organized in clusters of 256 pages (1 MB), which is less than the typical erase block size of 4 or 8 MB. We should try to align the swap clusters to erase blocks and make their size match, to avoid garbage collection in the drive. The cluster size would typically be set by mkswap as a new option and interpreted at swapon time.

3) As Luca points out, some eMMC media would benefit significantly from having discard requests issued for every page that gets freed from the swap cache, rather than just before we reuse a swap cluster. This would probably have to become a configurable option as well, to avoid the overhead of sending the discard requests on media that don't benefit from this.

Does this all sound appropriate for the Linux memory management people?

Also, does this sound useful to the Android developers? Would you start using swap if we make it perform well and not destroy the drives?

Finally, does this plan match up with the capabilities of the various eMMC devices? I know more about SD and USB devices and I'm quite convinced that it would help there, but eMMC can be more like an SSD in some ways, and the current code should be fine for real SSDs.

Arnd

[PATCH v3 0/9] Fixes for common mistakes w/ for_each_process and task->mm

by Anton Vorontsov

Hi all,

This is another resend of several task->mm fixes, the bugs I found during the LMK code audit. Architectures were traversing the tasklist in an unsafe manner, plus there are a few cases of unsafe access to task->mm in general.

There were no objections on the previous resend, and the final words were somewhere along the lines of "the patches are fine".

In v3:
- Dropped a controversial 'Make find_lock_task_mm() sparse-aware' patch;
- Reworded arm and sh commit messages, per Oleg Nesterov's suggestions;
- Added an optimization trick in clear_tasks_mm_cpumask(): take only the RCU read lock, no need for the whole tasklist_lock. Suggested by Peter Zijlstra.

In v2:
- Introduced a small helper in cpu.c: most arches duplicate the same [buggy] code snippet, so it's better to fix it and move the logic into a common function.

Thanks,

--
Anton Vorontsov
Email: cbouatmailru(a)gmail.com

[RESEND PATCH 1/2] ARM: OMAP2+: nand: Make board_onenand_init() visible to board code

by Javier Martinez Canillas

board_onenand_init() and board_nand_init() are used to initialize OneNAND and NAND memories, respectively, but only board_nand_init() was visible to board code.

This patch makes it possible to initialize a OneNAND flash memory within platform code.

Signed-off-by: Javier Martinez Canillas <javier(a)dowhile0.org>
---
 arch/arm/mach-omap2/board-flash.c |    4 ++--
 arch/arm/mach-omap2/board-flash.h |   11 +++++++++++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mach-omap2/board-flash.c b/arch/arm/mach-omap2/board-flash.c
index 0349fd2..70a81f9 100644
--- a/arch/arm/mach-omap2/board-flash.c
+++ b/arch/arm/mach-omap2/board-flash.c
@@ -87,7 +87,7 @@ static struct omap_onenand_platform_data board_onenand_data = {
 	.dma_channel	= -1,	/* disable DMA in OMAP OneNAND driver */
 };
 
-static void
+void
 __init board_onenand_init(struct mtd_partition *onenand_parts,
 				u8 nr_parts, u8 cs)
 {
@@ -98,7 +98,7 @@ __init board_onenand_init(struct mtd_partition *onenand_parts,
 	gpmc_onenand_init(&board_onenand_data);
 }
 #else
-static void
+void
 __init board_onenand_init(struct mtd_partition *nor_parts, u8 nr_parts, u8 cs)
 {
 }
diff --git a/arch/arm/mach-omap2/board-flash.h b/arch/arm/mach-omap2/board-flash.h
index d25503a..c44b70d 100644
--- a/arch/arm/mach-omap2/board-flash.h
+++ b/arch/arm/mach-omap2/board-flash.h
@@ -47,3 +47,14 @@ static inline void board_nand_init(struct mtd_partition *nand_parts,
 {
 }
 #endif
+
+#if defined(CONFIG_MTD_ONENAND_OMAP2) || \
+	defined(CONFIG_MTD_ONENAND_OMAP2_MODULE)
+extern void board_onenand_init(struct mtd_partition *nand_parts,
+					u8 nr_parts, u8 cs);
+#else
+static inline void board_onenand_init(struct mtd_partition *nand_parts,
+					u8 nr_parts, u8 cs)
+{
+}
+#endif
--
1.7.7.6

[PATCH v2 0/2] vmevent: Greater-than attribute + one-shot mode + a bugfix

by Anton Vorontsov

Hi all,

This is a respin of the previous patchset that tried to add a new 'cross' event type, which would trigger whenever a value crosses a user-specified threshold in either direction, i.e. from below the threshold to above it, and vice versa.

We use the event type in a userspace low-memory killer: we get a notification when memory becomes low, so we start freeing memory by killing unneeded processes, and we get a notification when memory crosses the threshold in the other direction, so we know that we have freed enough memory.

There's also a fix for a bug that makes the kernel upset about sleeping in atomic context.

Per Pekka's comments, here comes v2. Changes:
- Added a one-shot mode plus a greater-than attribute; together, the two additions make the equivalent of the cross event type.
- In the bugfix patch I added some comments about implementation details of the lock-free logic. Also, in the previous version of the fix I forgot to remove 'struct mutex' from 'struct vmevent_watch'; this is now cleaned up.

As usual, the patches are against git://github.com/penberg/linux.git vmevent/core

Thanks!

--
Anton Vorontsov
Email: cbouatmailru(a)gmail.com

[ACTIVITY] ... for the last 2 months (!)

by Nicolas Pitre

[ Yeah OK... I'm really bad with this stuff. ]

January 23 to February 03
- More ongoing study of the standalone b.L switcher code.
- Discussion with Le.chi Thu, Paul Larson and others about big.LITTLE switcher testing requirements.
- Involved in the review cycle for a patch about the new generic ioremap optimization from Pawel Moll, which turned out to be bad and needed a subsequent revert.
- Obtaining and setting up the b.L Fast Model license.
- Review of some RCU changes pushed down the platform idle code path by Paul E. McKenney, which I ended up NAKing.
- Review of a patch from Stephen Boyd to disable preemption when reading CCSIDR on ARMv7, to which I suggested a simpler alternative.
- Review of the initial test plan for the big.LITTLE switcher.
- Review of the SA11x0 cleanups from RMK.
- Investigation and prodding sent to Andrew Lunn for fixing a few Kirkwood breakages from recent consolidation changes.
- More design discussions around the b.L switcher with people from ARM Ltd, notably Robin Randhawa.
- Preparing for Linaro Connect.

February 06 to 10
- Attending Linaro Connect.
- Review of Rob Herring's series cleaning up and removing IRQ and FIQ related macros from the kernel.
- Wrote an article about the big.LITTLE switcher for LWN.
- Review of Marc Zyngier's series to add per-SoC SMP and CPU hotplug operations.
- Quickstart session with Dave Martin to run the ARM Fast Model for b.L.

February 13 to 24
- Refinement of my LWN article about b.L before publication.
- Review of a patch series by Dave Martin preparing the kernel for being entered in hypervisor mode.
- Discussion (on IRC) between Dave Martin and myself about design changes brought to the in-kernel b.L switcher.
- Started experimenting with the b.L Fast Model.
- Read and digested various documents about ARM virtualization, the GIC, etc.
- Produced some b.L project status to help the project management transition from Usman to Mounir.
- Posted a patch to add support for early console output via semihosting.
- Wrote the first part of the big.LITTLE write-up for the monthly member report (Paul McKenney did the second part).

February 27 to March 02
- Away on vacation.

March 05 to 16
- Review of the initial Kirkwood conversion to FDT by jason(a)lakedaemon.net.
- Review of a patch series by Rob Herring removing most instances of io.h.
- Comments/suggestions on how to deal with unresponsive maintainers, prompted by Amit Kucheria.
- Review of a patch by Stephen Warren to generalize U-Boot's uImage wrapping in the kernel build.
- More experiments with the b.L software model, attempting to boot an 8-core SMP system and running into cross-cluster cache coherency problems. Finally got it to boot, thanks to the ARM guys who provided the missing clue.
- Looked at the multi-cluster aware boot protocol patches by Lorenzo Pieralisi. Some of it might be directly useful for the b.L switcher.
- Review of Dave Martin's patch series to facilitate custom opcode injection.
- Improved a patch I posted months ago to remove the debugging restrictions inside the devicemaps_init() function, and pushed it upstream. Recent changes to the kernel make this patch very useful for people debugging their own kernel.
- Quick review of the Cortex-M3 support by Uwe Kleine-König.
- Moved to the arm-soc tree to implement the in-kernel switcher, as it contains everything needed to boot a vexpress config with device tree on the software model.

Nicolas

[ACTIVITY] (Dong Aisheng) April-23- April-27

by Dong Aisheng

=== Pinctrl ===
* Sent out the pinctrl core patch adding deferred probe for GPIO; merged by Linus.
* Sent out the pinctrl-imx v4 patch series. Already got acks from Stephen Warren and Shawn Guo, and no more comments for a few days, so I assume Linus Walleij may pick it up soon.
* Implemented a common API to handle the pinctrl dummy state; merged by Linus.
* Reviewed some other pinctrl patches.
* Implementing per-pin mux and config for the pinctrl subsystem. IN PROGRESS

=== Plan ===
* Send out a draft pinctrl per-pin mux and config patch and discuss it with Linus and Stephen.
* Since the GPIO base is dynamically allocated from DT, the existing pinctrl GPIO support, which is implemented on top of a fixed GPIO base map, may no longer fit with DT. Will think more about the solution and discuss it with Linus.

[ACTIVITY] (Anton Vorontsov) 2012-04-23 - 2012-04-27

by Anton Vorontsov

After a big move and other things, I can finally focus on the Linux work. Now I have high-speed internet in my new place, so using 'mumble' for conferencing is no longer a problem.

== Highlights ==
* We've got some 'looks fine' feedback on the userland LMK, and that's reassuring. The "bad news" is that there's not much enthusiasm overall from the Android folks. That's understandable, as the kernel LMK driver works and is already in the mainline^Wstaging kernel, so why bother. Well, it can't live in staging/ forever, so we'd better hurry up.
* ulmkd's Makefile is again suitable for GNU/Linux builds (in addition to Android/Linux). This makes it easier for me to test, plus maybe there will be other users for the daemon.
* The for_each_process and task->mm fixes were finally merged into -mm. I will need a small documentation update for the series, but overall the series seems to be fine.
* Prepared a few fixes for the memcg slab accounting. The proposed slab accounting feature looks like exactly what was needed, except that it doesn't account slab for the root cgroup. If that's not a design decision, then it can be improved. If it is, there are two ways forward:
  a) drop cgroups support and go solely w/ the vmevent infrastructure;
  b) try to push something like a 'memory.available' attribute for memcg.
  'a)' is easy, and 'b)' is probably what I'll try to implement tomorrow. Once implemented, we'll have all options ready, and so can mark cgroups as either fully suitable for lowmem notifications or not suitable by design.

== Plans ==
* I wonder if I need to do a deep dive into the Android build system and try to integrate ulmkd into the Android image myself?
* Back to interactive governor improvements? Well, as far as I recall, the story behind the interactive governor is very similar to LMK: nobody likes cpufreq overall, and everyone wants generic power management improvements in the scheduler. At least, we need to get 'interactive vs. ondemand' cpufreq latency numbers. That would be a good starting point for any other improvements. The problem with cpufreq latency measurements was that it takes ages for the benchmark to complete.

--
Anton Vorontsov
Email: cbouatmailru(a)gmail.com

[PATCH 0/3] A few fixes for '[PATCH 00/23] slab+slub accounting for memcg' series

by Anton Vorontsov

Hello Glauber,

On Fri, Apr 20, 2012 at 06:57:08PM -0300, Glauber Costa wrote:
> This is my current attempt at getting the kmem controller
> into a mergeable state. IMHO, all the important bits are there, and it shouldn't
> change *that* much from now on. I am, however, expecting at least a couple more
> interactions before we sort all the edges out.
>
> This series works for both the slub and the slab. One of my main goals was to
> make sure that the interfaces we are creating actually makes sense for both
> allocators.
>
> I did some adaptations to the slab-specific patches, but the bulk of it
> comes from Suleiman's patches. I did the best to use his patches
> as-is where possible so to keep authorship information. When not possible,
> I tried to be fair and quote it in the commit message.
>
> In this series, all existing caches are created per-memcg after its first hit.
> The main reason is, during discussions in the memory summit we came into
> agreement that the fragmentation problems that could arise from creating all
> of them are mitigated by the typically small quantity of caches in the system
> (order of a few megabytes total for sparsely used caches).
> The lazy creation from Suleiman is kept, although a bit modified. For instance,
> I now use a locked scheme instead of cmpxchg to make sure cache creation won't
> fail due to duplicates, which simplifies things by quite a bit.
>
> The slub is a bit more complex than what I came up with in my slub-only
> series. The reason is we did not need to use the cache-selection logic
> in the allocator itself - it was done by the cache users. But since now
> we are lazy creating all caches, this is simply no longer doable.
>
> I am leaving destruction of caches out of the series, although most
> of the infrastructure for that is here, since we did it in earlier
> series. This is basically because right now Kame is reworking it for
> user memcg, and I like the new proposed behavior a lot more. We all seemed
> to have agreed that reclaim is an interesting problem by itself, and
> is not included in this already too complicated series. Please note
> that this is still marked as experimental, so we have some room. A proper
> shrinker implementation is a hard requirement to take the kmem controller
> out of the experimental state.
>
> I am also not including documentation, but it should only be a matter
> of merging what we already wrote in earlier series plus some additions.

The patches look great, thanks a lot for your work!

I finally tried them, and after a few fixes the kmem accounting seems to work fine with slab. The fixes will follow this email, and if they're fine, feel free to fold them into your patches.

However, with slub I'm getting kernel hangs and various traces[1]. It seems that kernel memcg recurses when trying to call memcg_create_cache_enqueue() -- it calls kmalloc_no_account(), which was introduced to not recurse into memcg, but looking into the 'slub: provide kmalloc_no_account' patch, I don't see any difference between _no_account and ordinary kmalloc. Hm.

OK, slub apart... the accounting works with slab, which is great.

There's another, more generic question: is there any particular reason why you don't want to account slab memory for the root cgroup?

Personally I'm interested in kmem accounting because I use memcg for low-memory notifications. I'm installing events on the root's memory.usage_in_bytes, and the threshold values are calculated like this:

  total_ram - wanted_threshold

So, if we want to get a notification when there's 64 MB of memory left on a 256 MB machine, we'd install an event on the 192 MB mark (the good thing about usage_in_bytes is that it does account file caches, so the formula is simple).

Obviously, without kmem accounting the formula can be very imprecise when the kernel itself (e.g. hw drivers) starts using a lot of memory. With root's slab accounting the problem would be solved, but for some reason you deliberately do not want to account it for the root cgroup. I suspect that there are some performance concerns?..

Thanks,

[1]
BUG: unable to handle kernel paging request at ffffffffb2e80900
IP: [<ffffffff8105940c>] check_preempt_wakeup+0x3c/0x210
PGD 160d067 PUD 1611063 PMD 0
Thread overran stack, or stack corrupted
Oops: 0000 [#1] SMP
CPU 0
Pid: 943, comm: bash Not tainted 3.4.0-rc4+ #34 Bochs Bochs
RIP: 0010:[<ffffffff8105940c>]  [<ffffffff8105940c>] check_preempt_wakeup+0x3c/0x210
RSP: 0018:ffff880006305ee8  EFLAGS: 00010006
RAX: 00000000000109c0 RBX: ffff8800071b4e20 RCX: ffff880006306000
RDX: 0000000000000000 RSI: 0000000006306028 RDI: ffff880007c109c0
RBP: ffff880006305f28 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880007c109c0
R13: ffff88000644ddc0 R14: ffff8800071b4e68 R15: 0000000000000000
FS:  00007fad1244c700(0000) GS:ffff880007c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffb2e80900 CR3: 00000000063b8000 CR4: 00000000000006b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process bash (pid: 943, threadinfo ffff880006306000, task ffff88000644ddc0)
Stack:
 0000000000000000 ffff88000644de08 ffff880007c109c0 ffff880007c109c0
 ffff8800071b4e20 0000000000000000 0000000000000000 0000000000000000
 ffff880006305f48 ffffffff81053304 ffff880007c109c0 ffff880007c109c0
Call Trace:
Code: 76 48 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 18 4c 8b af e0 07 00 00 49 8d 4d 48 48 89 4d c8 49 8b 4d 08 4c 3b 75 c8 8b 71 18 <48> 8b 34 f5 c0 07 65 81 48 8b bc 30 a8 00 00 00 8b 35 3a 3f 5c
RIP  [<ffffffff8105940c>] check_preempt_wakeup+0x3c/0x210
 RSP <ffff880006305ee8>
CR2: ffffffffb2e80900
---[ end trace 78fa9c86bebb1214 ]---

--
Anton Vorontsov
Email: cbouatmailru(a)gmail.com

[ACTIVITY] (Rajanikanth H V) 2012-04-18 to 2012-04-27

by Rajanikanth H.V.

=== Highlights ===
* Interview with Deepak and a one-on-one introductory discussion
  - Deepak pointed me to the relevant Linaro wiki pages with the appropriate usage information
* Discussion with Lee Jones and Deepak about the DT work; the ab8500 power work has been assigned
* Onboarding is nearing completion
* Spent some time reading through DT documents
* Lee Jones helped me get build and test ready; the setup to start the work is ready.
* Attended the session on "platform perimeter" hosted by Linus Walleij - http://www.df.lth.se/~triad/papers/ESC-400Slides_Walleij.pdf
* Spent a good length of time with Niklas to get the DeviceTree work transferred

=== Plans ===
* Complete the ab8500 power DeviceTree assignments
* Spend some time on DT spec study
* Carry out the IT/admin work needed to get the new laptop with the Ubuntu distro
* Complete the pending onboarding activity

=== Issues ===
* The 2011-09 version of the linaro-media-create Python application did not succeed in preparing a bundled image for flashing, so I migrated to 2012.04-1, where the issue is fixed. https://launchpad.net/ubuntu/+source/linaro-image-tools/2012.04-1/+build/34…

[ACTIVITY] (John Stultz) Apr 23-27

by John Stultz

=== Highlights ===
* Tested Rafael's wakelock interface patches. Found a bug and sent a fix, which he included.
* Submitted the volatile ranges patch for inclusion. Got some minor feedback. Dave Chinner suggested I rework the patch so that it uses fallocate rather than fadvise. I pushed back a bit to make sure that is a consensus opinion, but will likely try to switch things over next week.
* After getting positive feedback from Arve on my patch to convert ashmem to use wakeup sources instead of the stubbed-out wakelocks, I submitted it and Greg included it in staging-next for 3.5.
* Got a small RTC null pointer fix merged into tip/timers/urgent for 3.4.
* Pinged the Android team on Anton's ulmkd proposal; got some interesting feedback, and no outright objections.
* Submitted a talk to Linux Plumbers.
* Reviewed some patches to introduce CLOCK_TAI functionality. Queued a few community cleanups.

=== Plans ===
* Rework volatile ranges to use fallocate & resubmit to lkml

=== Issues ===
NA
