linaro-kernel October 2012

linaro-kernel@lists.linaro.org

27 participants
28 discussions

A15 and A7 big.Little support in different ways

by chao xie

Hi,

For A15 and A7, there are three kinds of usage model:

1. Cluster switch
   I found that there is reference code for the switcher. Does Linaro have any development based on it?
2. CPU migration
   It seems that only members can get the early code. Is that right?
3. MP mode
   I found a git tree, "http://git.linaro.org/git/arm/big.LITTLE/mp.git". Does it contain the latest development of the big.LITTLE MP mode?

Thanks.

12 years, 8 months

[ACTIVITY] (Anton Vorontsov) 2012-09-24 - 2012-10-05

by Anton Vorontsov

== Highlights ==

* Prepared and sent out v9 of KGDB NMI core patches, nowadays they all are in the mainline kernel;
* All previously submitted persistent storage reworks and fixes are also in mainline now;
* Sent out and discussed vmevents pressure factor. So far only Mel and John commented;
* Prepared and sent refactorings for vmevents, another step to make it closer to the upstream. Also merged with the latest Linus' tree and requested Pekka to pull;

== Plans ==

* Address Mel Gorman's implementation notes on vmevent's pressure factor.
* Send out vm_stat enhancements for vmevent (need to prepare a good explanation);
* Perform vmevents pressure factor use-case tests on a desktop, tune statistics (try Mel's idea of using multiple window sizes);
* I wonder if it makes sense to resend the KGDB FIQ ARM and kiosk patches now, or maybe I should wait for the end of the merge window. I guess resending them now would be OK, we're halfway to the -rc1 anyways;

12 years, 8 months

[RFC] vmevent: Implement pressure attribute

by Anton Vorontsov

Hi all,

This is just an RFC so far. It's an attempt to implement Mel Gorman's
idea of detecting and measuring memory pressure by calculating the
ratio of scanned vs. reclaimed pages in a given time frame.

The implemented approach can notify userland about two things:

- Constantly rising number of scanned pages shows that Linux is busy w/
  rehashing pages in general. The more we scan, the more it's obvious
  that we're out of unused pages, and we're draining caches. By itself
  it's not critical, but for apps that want to maintain caches level
  (like Android) it's quite useful. The notifications are ratelimited
  by a specified amount of scanned pages.

- Next, we calculate pressure using '100 - reclaimed/scanned * 100'
  formula. The value shows (in percents) how efficiently the kernel
  reclaims pages. If we take number of scanned pages and think of them
  as a time scale, then these percents basically would show us how much
  of the time Linux is spending to find reclaimable pages. 0% means
  that every page is a candidate for reclaim, 100% means that MM is not
  reclaiming at all, it spends all the time scanning and desperately
  trying to find something to reclaim. The more time we're at the high
  percentage level, the more chances that we'll OOM soon.

So, if we fail to find a page in a reasonable time frame, we're
obviously in trouble, no matter how much reclaimable memory we actually
have -- we're too slow, and so we'd better free something.

Although it must be noted that the pressure factor might be affected by
reclaimable vs. non-reclaimable pages "fragmentation" in an LRU. If
there's a "hole" of reclaimable memory in an almost-OOMed system, the
factor will drop temporarily. On the other hand, it just shows how
efficiently Linux is keeping the lists, it might be pretty inefficient,
and the factor will show it.

Some more notes:

- Although the scheme sounds good, I noticed that reclaimer 'priority'
  level (i.e. scanning depth) better responds to pressure (it's more
  smooth), and so far I'm not sure how to make the original idea to
  work on a par w/ sc->priority level.

- I have an idea, which I might want to try some day. Currently, the
  pressure callback is hooked into the inactive list reclaim path, it's
  the last step in the 'to be reclaimed' page's life time. But we could
  measure 'active -> inactive' migration speed, i.e. pages deactivation
  rate. Or we could measure inactive/active LRU size ratio, ideally
  behaving system would try to keep the ratio near 1, and it'll be
  close to 0 when inactive list is getting short (for anon LRU it'd be
  not 1, but zone->inactive_ratio actually).

Thanks,
Anton.

---
 include/linux/vmevent.h |  36 ++++++++++
 mm/vmevent.c            | 179 +++++++++++++++++++++++++++++++++++++++++++++++-
 mm/vmscan.c             |   4 ++
 3 files changed, 218 insertions(+), 1 deletion(-)

diff --git a/include/linux/vmevent.h b/include/linux/vmevent.h
index b1c4016..1397ade 100644
--- a/include/linux/vmevent.h
+++ b/include/linux/vmevent.h
@@ -10,6 +10,7 @@ enum {
 	VMEVENT_ATTR_NR_AVAIL_PAGES	= 1UL,
 	VMEVENT_ATTR_NR_FREE_PAGES	= 2UL,
 	VMEVENT_ATTR_NR_SWAP_PAGES	= 3UL,
+	VMEVENT_ATTR_PRESSURE		= 4UL,
 
 	VMEVENT_ATTR_MAX		/* non-ABI */
 };
@@ -46,6 +47,11 @@ struct vmevent_attr {
 	__u64			value;
 
 	/*
+	 * Some attributes accept two configuration values.
+	 */
+	__u64			value2;
+
+	/*
 	 * Type of profiled attribute from VMEVENT_ATTR_XXX
 	 */
 	__u32			type;
@@ -97,4 +103,34 @@ struct vmevent_event {
 	struct vmevent_attr	attrs[];
 };
 
+#ifdef __KERNEL__
+
+struct mem_cgroup;
+
+extern void __vmevent_pressure(struct mem_cgroup *memcg,
+			       ulong scanned,
+			       ulong reclaimed);
+
+static inline void vmevent_pressure(struct mem_cgroup *memcg,
+				    ulong scanned,
+				    ulong reclaimed)
+{
+	if (!scanned)
+		return;
+
+	if (IS_BUILTIN(CONFIG_MEMCG) && memcg) {
+		/*
+		 * The vmevent API reports system pressure, for per-cgroup
+		 * pressure, we'll chain cgroups notifications, this is to
+		 * be implemented.
+		 *
+		 * memcg_vm_pressure(target_mem_cgroup, scanned, reclaimed);
+		 */
+		return;
+	}
+	__vmevent_pressure(memcg, scanned, reclaimed);
+}
+
+#endif
+
 #endif /* _LINUX_VMEVENT_H */
diff --git a/mm/vmevent.c b/mm/vmevent.c
index d643615..12d0131 100644
--- a/mm/vmevent.c
+++ b/mm/vmevent.c
@@ -4,6 +4,7 @@
 #include <linux/vmevent.h>
 #include <linux/syscalls.h>
 #include <linux/workqueue.h>
+#include <linux/interrupt.h>
 #include <linux/file.h>
 #include <linux/list.h>
 #include <linux/poll.h>
@@ -30,6 +31,25 @@ struct vmevent_watch {
 	wait_queue_head_t	waitq;
 };
 
+struct vmevent_pwatcher {
+	struct vmevent_watch *watch;
+	struct vmevent_attr *attr;
+	struct vmevent_attr *samp;
+	struct list_head node;
+
+	uint scanned;
+	uint reclaimed;
+	uint window;
+};
+
+static LIST_HEAD(vmevent_pwatchers);
+static DEFINE_SPINLOCK(vmevent_pwatchers_lock);
+
+static uint vmevent_scanned;
+static uint vmevent_reclaimed;
+static uint vmevent_minwin = UINT_MAX; /* Smallest window in the list. */
+static DEFINE_SPINLOCK(vmevent_pressure_lock);
+
 typedef u64 (*vmevent_attr_sample_fn)(struct vmevent_watch *watch,
 				      struct vmevent_attr *attr);
 
@@ -141,6 +161,10 @@ static bool vmevent_match(struct vmevent_watch *watch)
 		struct vmevent_attr *samp = &watch->sample_attrs[i];
 		u64 val;
 
+		/* Pressure is event-driven, not polled */
+		if (attr->type == VMEVENT_ATTR_PRESSURE)
+			continue;
+
 		val = vmevent_sample_attr(watch, attr);
 		if (!ret && vmevent_match_attr(attr, val))
 			ret = 1;
@@ -204,6 +228,94 @@ static void vmevent_start_timer(struct vmevent_watch *watch)
 	vmevent_schedule_watch(watch);
 }
 
+static ulong vmevent_calc_pressure(struct vmevent_pwatcher *pw)
+{
+	uint win = pw->window;
+	uint s = pw->scanned;
+	uint r = pw->reclaimed;
+	ulong p;
+
+	/*
+	 * We calculate the ratio (in percents) of how many pages were
+	 * scanned vs. reclaimed in a given time frame (window). Note that
+	 * time is in VM reclaimer's "ticks", i.e. number of pages
+	 * scanned. This makes it possible set desired reaction time and
+	 * serves as a ratelimit.
+	 */
+	p = win - (r * win / s);
+	p = p * 100 / win;
+
+	pr_debug("%s: %3lu (s: %6u r: %6u)\n", __func__, p, s, r);
+
+	return p;
+}
+
+static void vmevent_match_pressure(struct vmevent_pwatcher *pw)
+{
+	struct vmevent_watch *watch = pw->watch;
+	struct vmevent_attr *attr = pw->attr;
+	ulong val;
+
+	val = vmevent_calc_pressure(pw);
+
+	/* Next round. */
+	pw->scanned = 0;
+	pw->reclaimed = 0;
+
+	if (!vmevent_match_attr(attr, val))
+		return;
+
+	pw->samp->value = val;
+
+	atomic_set(&watch->pending, 1);
+	wake_up(&watch->waitq);
+}
+
+static void vmevent_pressure_tlet_fn(ulong data)
+{
+	struct vmevent_pwatcher *pw;
+	uint s;
+	uint r;
+
+	if (!vmevent_scanned)
+		return;
+
+	spin_lock(&vmevent_pressure_lock);
+	s = vmevent_scanned;
+	r = vmevent_reclaimed;
+	vmevent_scanned = 0;
+	vmevent_reclaimed = 0;
+	spin_unlock(&vmevent_pressure_lock);
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(pw, &vmevent_pwatchers, node) {
+		pw->scanned += s;
+		pw->reclaimed += r;
+		if (pw->scanned >= pw->window)
+			vmevent_match_pressure(pw);
+	}
+	rcu_read_unlock();
+}
+static DECLARE_TASKLET(vmevent_pressure_tlet, vmevent_pressure_tlet_fn, 0);
+
+void __vmevent_pressure(struct mem_cgroup *memcg,
+			ulong scanned,
+			ulong reclaimed)
+{
+	if (vmevent_minwin == UINT_MAX)
+		return;
+
+	spin_lock_bh(&vmevent_pressure_lock);
+
+	vmevent_scanned += scanned;
+	vmevent_reclaimed += reclaimed;
+
+	if (vmevent_scanned >= vmevent_minwin)
+		tasklet_schedule(&vmevent_pressure_tlet);
+
+	spin_unlock_bh(&vmevent_pressure_lock);
+}
+
 static unsigned int vmevent_poll(struct file *file, poll_table *wait)
 {
 	struct vmevent_watch *watch = file->private_data;
@@ -259,12 +371,40 @@ out:
 	return ret;
 }
 
+static void vmevent_release_pwatcher(struct vmevent_watch *watch)
+{
+	struct vmevent_pwatcher *pw;
+	struct vmevent_pwatcher *tmp;
+	struct vmevent_pwatcher *del = NULL;
+	int last = 1;
+
+	spin_lock(&vmevent_pwatchers_lock);
+
+	list_for_each_entry_safe(pw, tmp, &vmevent_pwatchers, node) {
+		if (pw->watch != watch) {
+			vmevent_minwin = min(pw->window, vmevent_minwin);
+			last = 0;
+			continue;
+		}
+		WARN_ON(del);
+		list_del_rcu(&pw->node);
+		del = pw;
+	}
+
+	if (last)
+		vmevent_minwin = UINT_MAX;
+
+	spin_unlock(&vmevent_pwatchers_lock);
+	synchronize_rcu();
+	kfree(del);
+}
+
 static int vmevent_release(struct inode *inode, struct file *file)
 {
 	struct vmevent_watch *watch = file->private_data;
 
 	cancel_delayed_work_sync(&watch->work);
-
+	vmevent_release_pwatcher(watch);
 	kfree(watch);
 
 	return 0;
@@ -289,6 +429,36 @@ static struct vmevent_watch *vmevent_watch_alloc(void)
 	return watch;
 }
 
+static int vmevent_setup_pwatcher(struct vmevent_watch *watch,
+				  struct vmevent_attr *attr,
+				  struct vmevent_attr *samp)
+{
+	struct vmevent_pwatcher *pw;
+
+	if (attr->type != VMEVENT_ATTR_PRESSURE)
+		return 0;
+
+	if (!attr->value2)
+		return -EINVAL;
+
+	pw = kzalloc(sizeof(*pw), GFP_KERNEL);
+	if (!pw)
+		return -ENOMEM;
+
+	pw->watch = watch;
+	pw->attr = attr;
+	pw->samp = samp;
+	pw->window = (attr->value2 + PAGE_SIZE - 1) / PAGE_SIZE;
+
+	vmevent_minwin = min(pw->window, vmevent_minwin);
+
+	spin_lock(&vmevent_pwatchers_lock);
+	list_add_rcu(&pw->node, &vmevent_pwatchers);
+	spin_unlock(&vmevent_pwatchers_lock);
+
+	return 0;
+}
+
 static int vmevent_setup_watch(struct vmevent_watch *watch)
 {
 	struct vmevent_config *config = &watch->config;
@@ -302,6 +472,7 @@ static int vmevent_setup_watch(struct vmevent_watch *watch)
 		struct vmevent_attr *attr = &config->attrs[i];
 		size_t size;
 		void *new;
+		int ret;
 
 		if (attr->type >= VMEVENT_ATTR_MAX)
 			continue;
@@ -322,6 +493,12 @@ static int vmevent_setup_watch(struct vmevent_watch *watch)
 
 		watch->config_attrs[nr] = attr;
 
+		ret = vmevent_setup_pwatcher(watch, attr, &attrs[nr]);
+		if (ret) {
+			kfree(attrs);
+			return ret;
+		}
+
 		nr++;
 	}
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 99b434b..f4dd1e0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -20,6 +20,7 @@
 #include <linux/init.h>
 #include <linux/highmem.h>
 #include <linux/vmstat.h>
+#include <linux/vmevent.h>
 #include <linux/file.h>
 #include <linux/writeback.h>
 #include <linux/blkdev.h>
@@ -1334,6 +1335,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 		nr_scanned, nr_reclaimed,
 		sc->priority,
 		trace_shrink_flags(file));
+
+	vmevent_pressure(sc->target_mem_cgroup, nr_scanned, nr_reclaimed);
+
 	return nr_reclaimed;
 }
 
-- 
1.7.12.1
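
For readers who want to check the numbers, the pressure arithmetic and the window accounting from the RFC can be sketched in plain userspace C. This is only an illustration under stated assumptions, not part of the patch: struct pressure_window, calc_pressure() and window_account() are hypothetical names, the product is widened to unsigned long long, and reclaimed is clamped to scanned, neither of which the posted code does.

```c
/* Hypothetical userspace stand-in for the per-watcher counters. */
struct pressure_window {
	unsigned int scanned;	/* pages scanned in the current window */
	unsigned int reclaimed;	/* pages reclaimed in the current window */
	unsigned int window;	/* window size, in pages */
};

/*
 * Integer form of '100 - reclaimed/scanned * 100', scaled through the
 * window size in the same two steps as the RFC. Returns 0..100.
 */
static unsigned long calc_pressure(unsigned int win, unsigned int s,
				   unsigned int r)
{
	unsigned long p;

	if (!s || !win)
		return 0;	/* nothing scanned: no pressure to report */
	if (r > s)
		r = s;		/* assumption: clamp to avoid underflow */

	p = win - (unsigned long)((unsigned long long)r * win / s);
	p = p * 100 / win;

	return p;
}

/*
 * Accumulate one reclaim pass. Returns -1 while the window is not yet
 * full; once at least 'window' pages were scanned, returns the pressure
 * for that window and resets the counters for the next round.
 */
static int window_account(struct pressure_window *pw,
			  unsigned int scanned, unsigned int reclaimed)
{
	int p;

	pw->scanned += scanned;
	pw->reclaimed += reclaimed;

	if (pw->scanned < pw->window)
		return -1;

	p = (int)calc_pressure(pw->window, pw->scanned, pw->reclaimed);
	pw->scanned = 0;
	pw->reclaimed = 0;

	return p;
}
```

With a 256-page window, three passes of 100 scanned pages that reclaim 50, 50 and 20 pages fill the window on the third pass: 300 pages scanned, 120 reclaimed, pressure 60. Because the window is counted in scanned pages rather than in wall-clock time, an idle reclaimer produces no events at all, which is the ratelimiting effect described in the cover letter.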

12 years, 8 months

Kernel Storage Team weekly meeting minutes (Oct 5, 2012)

by Venkatraman S

Attendees: Arnd Bergmann, Balaji TK, Jakub Pavelek, Luca Porzio, Venkatraman S
(#linaro-storage: arnd, balajitk, Xruxa, lupo, svenkatr)

Discussion notes:-

1. New members joining storage team - Balaji TK and Ulf Hansson
2. Linaro Connect - attendance:- All above are planning to attend except Venkatraman S
3. Proposed topics to discuss at Connect and track leads
   a) Kernel changes for eMMC4.6 support (Lupo)
   b) ext4 optimizations for Flash (Arnd)
   c) Swap on Flash optimizations (Venkatraman S)
   d) review of Samsung's F2FS file system (Samsung / Arnd)
4. Possibility to include SDIO v3 features in next cycle? To check!
5. Discussion about eMMC4.6 FOTA feature and possible ways to implement
   Is it really a Kernel feature (as eMMC will become offline while updating)?
   Need to study SCSI / block subsystem's methods and take a similar approach (IOCTL).
6. "ext4 move journal to enhanced area" BP is completed. Performance comparison to be published and the BP to be closed. - (Balaji TK to close)
7. core eMMC4.5 features to be reviewed and upstreamed with community support.
   Need to focus on core FS / Block layer optimization for flash.
   But still some features need attention (context ID and Power Off Notify)

Regards,
Venkat.

12 years, 8 months

[ACTIVITY] (Deepak Saxena) Sept 30th - Oct 5

by Deepak Saxena

== Deepak Saxena <dsaxena> ==

=== Highlights ===

* Started creating Connect sessions and fleshing out plans
* Looking at aarch64 SOC code, submitted a patch (!), and provided some feedback
* Continued work on cleaning up existing BPs
* Working on breaking down single zImage work into BPs
* More changes to https://wiki.linaro.org/WorkingGroups/Kernel/BlueprintProcess

=== Plans ===

* Finish Connect session creation
* Work with Jakub and engineers to fill in details of single zImage BPs
* Create some new roadmap cards:
  - Existing Android BPs that are actively being worked on w/o a card
  - Enabling of EMMC-4.6 Features
  - EMMC Power Management

=== Issues ===

=== Travel/Time Off ===

* Linaro US Holiday Monday, Oct 8th
* Off Friday, Oct 19th
* Connect Copenhagen

12 years, 8 months

[ACTIVITY] (John Stultz) Oct 1-5th

by John Stultz

=== Highlights ===

* Pinged tglx on maybe reducing my 3.7 queue and possibly pushing directly to -next in the future to improve testing
* Briefly chatted with Zach on arm64 android
* Had to deal with an internal infrastructure outage for some of this week.
* Responded to some limited comments on my volatile range patches that I sent out
* Sent Taras Glek some pointers to other work being done in Linaro that is relevant to Mozilla
* Reviewed some paravirt-clock patches, suggested alternative methods to implement them.
* Assisted Mathieu on some questions wrt sending his patchset to akpm
* Sent Android subteam email & moved next week's meeting to google hangouts instead of mumble
* Started looking at madvise() volatile ranges

=== Plans ===

* Hopefully paternity leave?

=== Issues ===

* NA

12 years, 8 months

[ACTIVITY] (Linus Walleij) 2012-09-29 - 2012-10-05

by Linus Walleij

== Linus Walleij linusw ==

=== Highlights ===

* Sent pull request to Torvalds for the v3.7 GPIO patch queue. Torvalds pulled it in.
* Sent pull request to Torvalds for the v3.7 pinctrl patch queue. Torvalds pulled it in.
* Rob Herring ACK:ed my patch to irqdomains to extend the simple domain. Still seeking an ACK from Grant.
* SPARSE_IRQ patch set ready to be pulled into ARM SoC when the merge window ends.
* Created and edited a few blueprints trying to follow the new process.
* Strong progress on pinctrl adaption internally at ST-Ericsson. Helped out with review and merge.

=== Plans ===

* Discussed handling for runtime PM and ordinary suspend/resume in the PL022 SPI driver, discussion ongoing with Ulf Hansson. I'm a bit confused.
* Test the PL08x patches on the Ericsson Research PB11MPCore and submit platform data for using pl08x DMA on that platform.
* Look into other Ux500 stuff in need of mainlining... using an internal tracking sheet for this.
* Look into regmap. Try something out, get to know it.

=== Issues ===

* Some internal time stealing, for example ST-Ericsson career development has taken some time this week.

Thanks,
Linus Walleij

12 years, 8 months

[ACTIVITY] (John Stultz) Sept 24-28th

by John Stultz

I've gotten into the bad habit of working until late on Friday and then just dropping offline and forgetting to send out status.

=== Highlights ===

* Sent my 3.7 queue to Thomas, unfortunately he's not responded.
* Had dinner with Taras Glek of Mozilla to discuss the volatile range changes and further details of how Mozilla uses Android functionality both in Android as well as in FirefoxOS.
* Helped Mark in summarizing my role with Android Upstreaming.
* Ran the bi-weekly Upstreaming meeting on mumble
* Walked Jakub through some of the Android Upstreaming blueprints as well as some of the history in Linaro's processes.
* Scratched out some more strategy thoughts on Linaro's role in Android upstreaming, and sent the draft to Deepak.
* Implemented SIGBUS on volatile range access.
* Wrote up a large summary document on volatile ranges
* Sent out both v7 volatile range patches (including SIGBUS functionality) along with the long summary document to lkml

=== Plans ===

* Waiting for a baby
* Look at details on madvise() based volatile ranges on anonymous memory, per Mozilla's preference
* 3.7 merge window work

=== Issues ===

* Some internal infrastructure snafus catching me atm. Hopefully won't last too long.

12 years, 9 months
