The PASR Framework brings support for the Partial Array Self-Refresh DDR power management feature. PASR was introduced with LP-DDR2 and is also present in DDR3.
PASR provides 4 modes:
* Single-Ended: Only 1/1, 1/2, 1/4 or 1/8 of the die is refreshed, with the refresh masking starting at the end of the DDR die.
* Double-Ended: Same as Single-Ended, but the refresh masking does not necessarily start at the end of the DDR die.
* Bank-Selective: Refresh of each bank of a die can be masked or unmasked via a dedicated DDR register (MR16). This mode is convenient for DDR configured in BRC (Bank-Row-Column) mode.
* Segment-Selective: Refresh of each segment of a die can be masked or unmasked via a dedicated DDR register (MR17). This mode is convenient for DDR configured in RBC (Row-Bank-Column) mode.
The role of this framework is to stop the refresh of unused memory in order to reduce DDR power consumption. It supports the Bank-Selective and Segment-Selective modes, as these are the best suited to modern OSes.
At early boot stage, a representation of the physical DDR layout is built:
Die 0
 _______________________________
| I--------------------------I |
| I    Bank or Segment 0     I |
| I--------------------------I |
| I--------------------------I |
| I    Bank or Segment 1     I |
| I--------------------------I |
| I--------------------------I |
| I   Bank or Segment ...    I |
| I--------------------------I |
| I--------------------------I |
| I    Bank or Segment n     I |
| I--------------------------I |
|______________________________|

...

Die n
 _______________________________
| I--------------------------I |
| I    Bank or Segment 0     I |
| I--------------------------I |
| I--------------------------I |
| I    Bank or Segment 1     I |
| I--------------------------I |
| I--------------------------I |
| I   Bank or Segment ...    I |
| I--------------------------I |
| I--------------------------I |
| I    Bank or Segment n     I |
| I--------------------------I |
|______________________________|
The first level is a table where each element represents a die:
* Base address,
* Number of segments,
* Table representing banks/segments,
* MR16/MR17 refresh mask,
* DDR Controller callback to update the MR16/MR17 refresh mask.
The second level is the section tables representing the banks or segments, depending on hardware configuration:
* Base address,
* Unused memory size counter,
* Possible pointer to another section it depends on (e.g. interleaving).
When some memory becomes unused, the allocator owning this memory calls the PASR Framework's pasr_put(phys_addr, size) function. The framework finds the impacted sections and updates their counters accordingly. If a section counter reaches the section size, the refresh of the section is masked. If the section has a dependency on another section (e.g. because of DDR interleaving, see figure below), the framework checks that the "paired" section is also unused before updating the refresh mask.
When some unused memory is requested, the allocator owning this memory calls the PASR Framework's pasr_get(phys_addr, size) function. The framework finds the impacted sections and updates their counters accordingly. If, before the update, a section counter was equal to the section size, the refresh of the section is unmasked. If the section has a dependency on another section, the refresh of the other section is also unmasked.
Patch 3/6 contains the modifications for the Buddy allocator. The induced overhead is very low because the PASR framework is notified only on "MAX_ORDER" page blocks. Support for other allocators (PMEM, HWMEM...) and for memory hotplug will be added in next patch set revisions.
Maxime Coquelin (6):
  PASR: Initialize DDR layout
  PASR: Add core Framework
  PASR: mm: Integrate PASR in Buddy allocator
  PASR: Call PASR initialization
  PASR: Add Documentation
  PASR: Ux500: Add PASR support
 Documentation/pasr.txt                      |  183 ++++++++++++
 arch/arm/Kconfig                            |    1 +
 arch/arm/kernel/setup.c                     |    1 +
 arch/arm/mach-ux500/include/mach/hardware.h |   11 +
 arch/arm/mach-ux500/include/mach/memory.h   |    8 +
 drivers/mfd/db8500-prcmu.c                  |   67 +++++
 drivers/staging/Kconfig                     |    2 +
 drivers/staging/Makefile                    |    1 +
 drivers/staging/pasr/Kconfig                |   19 ++
 drivers/staging/pasr/Makefile               |    6 +
 drivers/staging/pasr/core.c                 |  168 +++++++++++
 drivers/staging/pasr/helper.c               |   84 ++++++
 drivers/staging/pasr/helper.h               |   16 +
 drivers/staging/pasr/init.c                 |  403 +++++++++++++++++++++++++++
 drivers/staging/pasr/ux500.c                |   58 ++++
 include/linux/pasr.h                        |  143 ++++++++++
 include/linux/ux500-pasr.h                  |   11 +
 init/main.c                                 |    8 +
 mm/page_alloc.c                             |    9 +
 19 files changed, 1199 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/pasr.txt
 create mode 100644 drivers/staging/pasr/Kconfig
 create mode 100644 drivers/staging/pasr/Makefile
 create mode 100644 drivers/staging/pasr/core.c
 create mode 100644 drivers/staging/pasr/helper.c
 create mode 100644 drivers/staging/pasr/helper.h
 create mode 100644 drivers/staging/pasr/init.c
 create mode 100644 drivers/staging/pasr/ux500.c
 create mode 100644 include/linux/pasr.h
 create mode 100644 include/linux/ux500-pasr.h
Build the DDR layout representation at early init.
To build the PASR map, two boot parameters are provided:
* ddr_die (mandatory): Should be added for each DDR die present in the system.
  - Usage: ddr_die=xxx[M|G]@yyy[M|G] where xxx represents the size and yyy the base address of the die.
    E.g.: ddr_die=512M@0 ddr_die=512M@512M
* interleaved (optional): Should be added for each interleaved dependency.
  - Usage: interleaved=xxx[M|G]@yyy[M|G]:zzz[M|G] where xxx is the size of the interleaved area between the addresses yyy and zzz.
    E.g.: interleaved=256M@0:512M
Signed-off-by: Maxime Coquelin <maxime.coquelin@stericsson.com>
---
 drivers/staging/Kconfig       |    2 +
 drivers/staging/Makefile      |    1 +
 drivers/staging/pasr/Kconfig  |   14 ++
 drivers/staging/pasr/Makefile |    4 +
 drivers/staging/pasr/helper.c |   84 +++++++++
 drivers/staging/pasr/helper.h |   16 ++
 drivers/staging/pasr/init.c   |  403 +++++++++++++++++++++++++++++++++++++++++
 include/linux/pasr.h          |   73 ++++++++
 8 files changed, 597 insertions(+), 0 deletions(-)
 create mode 100644 drivers/staging/pasr/Kconfig
 create mode 100644 drivers/staging/pasr/Makefile
 create mode 100644 drivers/staging/pasr/helper.c
 create mode 100644 drivers/staging/pasr/helper.h
 create mode 100644 drivers/staging/pasr/init.c
 create mode 100644 include/linux/pasr.h
diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index d061318..ddb56aa 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -182,4 +182,6 @@ source "drivers/staging/nmf-cm/Kconfig"
 
 source "drivers/staging/camera_flash/Kconfig"
 
+source "drivers/staging/pasr/Kconfig"
+
 endif # STAGING
diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index f0c7417..20a692d 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_STAGING)		+= staging.o
 
 obj-y				+= tty/
 obj-y				+= generic_serial/
+obj-y				+= pasr/
 obj-$(CONFIG_ET131X)		+= et131x/
 obj-$(CONFIG_SLICOSS)		+= slicoss/
 obj-$(CONFIG_VIDEO_GO7007)	+= go7007/
diff --git a/drivers/staging/pasr/Kconfig b/drivers/staging/pasr/Kconfig
new file mode 100644
index 0000000..6bd2421
--- /dev/null
+++ b/drivers/staging/pasr/Kconfig
@@ -0,0 +1,14 @@
+config ARCH_HAS_PASR
+	bool
+
+config PASR
+	bool "DDR Partial Array Self-Refresh"
+	depends on ARCH_HAS_PASR
+	---help---
+	  PASR consists of masking the refresh of unused segments or banks
+	  when DDR is in self-refresh state.
+
+config PASR_DEBUG
+	bool "Add PASR debug prints"
+	def_bool n
+	depends on PASR
diff --git a/drivers/staging/pasr/Makefile b/drivers/staging/pasr/Makefile
new file mode 100644
index 0000000..72f7c27
--- /dev/null
+++ b/drivers/staging/pasr/Makefile
@@ -0,0 +1,4 @@
+pasr-objs := helper.o init.o
+obj-$(CONFIG_PASR) += pasr.o
+
+ccflags-$(CONFIG_PASR_DEBUG) := -DDEBUG
diff --git a/drivers/staging/pasr/helper.c b/drivers/staging/pasr/helper.c
new file mode 100644
index 0000000..7c48051
--- /dev/null
+++ b/drivers/staging/pasr/helper.c
@@ -0,0 +1,84 @@
+/*
+ * Copyright (C) ST-Ericsson SA 2011
+ * Author: Maxime Coquelin <maxime.coquelin@stericsson.com> for ST-Ericsson.
+ * License terms: GNU General Public License (GPL), version 2
+ */
+
+#include <linux/pasr.h>
+
+
+struct pasr_die *pasr_addr2die(struct pasr_map *map, phys_addr_t addr)
+{
+	unsigned int left, right, mid;
+
+	if (!map)
+		return NULL;
+
+	left = 0;
+	right = map->nr_dies;
+
+	addr &= ~(PASR_SECTION_SZ - 1);
+
+	while (left != right) {
+		struct pasr_die *d;
+		phys_addr_t start;
+
+		mid = (left + right) >> 1;
+		d = &map->die[mid];
+		start = addr & ~((PASR_SECTION_SZ * d->nr_sections) - 1);
+
+		if (start == d->start)
+			return d;
+		else if (start > d->start)
+			left = mid;
+		else
+			right = mid;
+	}
+
+	pr_err("%s: No die found for address %#x",
+			__func__, addr);
+	return NULL;
+}
+
+struct pasr_section *pasr_addr2section(struct pasr_map *map
+		, phys_addr_t addr)
+{
+	unsigned int left, right, mid;
+	struct pasr_die *die;
+
+	/* Find the die the address is located in */
+	die = pasr_addr2die(map, addr);
+	if (!die)
+		goto err;
+
+	left = 0;
+	right = die->nr_sections;
+
+	addr &= ~(PASR_SECTION_SZ - 1);
+
+	while (left != right) {
+		struct pasr_section *s;
+
+		mid = (left + right) >> 1;
+		s = &die->section[mid];
+
+		if (addr == s->start)
+			return s;
+		else if (addr > s->start)
+			left = mid;
+		else
+			right = mid;
+	}
+
+err:
+	/* Provided address isn't in any declared section */
+	pr_err("%s: No section found for address %#x",
+			__func__, addr);
+
+	return NULL;
+}
+
+phys_addr_t pasr_section2addr(struct pasr_section *s)
+{
+	return s->start;
+}
diff --git a/drivers/staging/pasr/helper.h b/drivers/staging/pasr/helper.h
new file mode 100644
index 0000000..6488f2f
--- /dev/null
+++ b/drivers/staging/pasr/helper.h
@@ -0,0 +1,16 @@
+/*
+ * Copyright (C) ST-Ericsson SA 2011
+ * Author: Maxime Coquelin <maxime.coquelin@stericsson.com> for ST-Ericsson.
+ * License terms: GNU General Public License (GPL), version 2
+ */
+
+#ifndef _PASR_HELPER_H
+#define _PASR_HELPER_H
+
+#include <linux/pasr.h>
+
+struct pasr_die *pasr_addr2die(struct pasr_map *map, phys_addr_t addr);
+struct pasr_section *pasr_addr2section(struct pasr_map *map, phys_addr_t addr);
+phys_addr_t pasr_section2addr(struct pasr_section *s);
+
+#endif /* _PASR_HELPER_H */
diff --git a/drivers/staging/pasr/init.c b/drivers/staging/pasr/init.c
new file mode 100644
index 0000000..2c7c280
--- /dev/null
+++ b/drivers/staging/pasr/init.c
@@ -0,0 +1,403 @@
+/*
+ * Copyright (C) ST-Ericsson SA 2011
+ * Author: Maxime Coquelin <maxime.coquelin@stericsson.com> for ST-Ericsson.
+ * License terms: GNU General Public License (GPL), version 2
+ */
+
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/sort.h>
+#include <linux/pasr.h>
+
+#include "helper.h"
+
+#define NR_DIES 8
+#define NR_INT 8
+
+struct ddr_die {
+	phys_addr_t addr;
+	unsigned long size;
+};
+
+struct interleaved_area {
+	phys_addr_t addr1;
+	phys_addr_t addr2;
+	unsigned long size;
+};
+
+struct pasr_info {
+	int nr_dies;
+	struct ddr_die die[NR_DIES];
+
+	int nr_int;
+	struct interleaved_area int_area[NR_INT];
+};
+
+static struct pasr_info __initdata pasr_info;
+static struct pasr_map pasr_map;
+
+static void add_ddr_die(phys_addr_t addr, unsigned long size);
+static void add_interleaved_area(phys_addr_t a1,
+		phys_addr_t a2, unsigned long size);
+
+static int __init ddr_die_param(char *p)
+{
+	phys_addr_t start;
+	unsigned long size;
+
+	size = memparse(p, &p);
+
+	if (*p != '@')
+		goto err;
+
+	start = memparse(p + 1, &p);
+
+	add_ddr_die(start, size);
+
+	return 0;
+err:
+	return -EINVAL;
+}
+early_param("ddr_die", ddr_die_param);
+
+static int __init interleaved_param(char *p)
+{
+	phys_addr_t start1, start2;
+	unsigned long size;
+
+	size = memparse(p, &p);
+
+	if (*p != '@')
+		goto err;
+
+	start1 = memparse(p + 1, &p);
+
+	if (*p != ':')
+		goto err;
+
+	start2 = memparse(p + 1, &p);
+
+	add_interleaved_area(start1, start2, size);
+
+	return 0;
+err:
+	return -EINVAL;
+}
+early_param("interleaved", interleaved_param);
+
+void __init add_ddr_die(phys_addr_t addr, unsigned long size)
+{
+	BUG_ON(pasr_info.nr_dies >= NR_DIES);
+
+	pasr_info.die[pasr_info.nr_dies].addr = addr;
+	pasr_info.die[pasr_info.nr_dies++].size = size;
+}
+
+void __init add_interleaved_area(phys_addr_t a1, phys_addr_t a2,
+		unsigned long size)
+{
+	BUG_ON(pasr_info.nr_int >= NR_INT);
+
+	pasr_info.int_area[pasr_info.nr_int].addr1 = a1;
+	pasr_info.int_area[pasr_info.nr_int].addr2 = a2;
+	pasr_info.int_area[pasr_info.nr_int++].size = size;
+}
+
+#ifdef DEBUG
+static void __init pasr_print_info(struct pasr_info *info)
+{
+	int i;
+
+	pr_info("PASR information coherent\n");
+
+	pr_info("DDR Dies layout:\n");
+	pr_info("\tid - start address - end address\n");
+	for (i = 0; i < info->nr_dies; i++)
+		pr_info("\t- %d : %#08x - %#08x\n",
+			i, (unsigned int)info->die[i].addr,
+			(unsigned int)(info->die[i].addr
+				+ info->die[i].size - 1));
+
+	if (info->nr_int == 0) {
+		pr_info("No interleaved areas declared\n");
+		return;
+	}
+
+	pr_info("Interleaving layout:\n");
+	pr_info("\tid - start @1 - end @1 : start @2 - end @2\n");
+	for (i = 0; i < info->nr_int; i++)
+		pr_info("\t-%d - %#08x - %#08x : %#08x - %#08x\n"
+			, i
+			, (unsigned int)info->int_area[i].addr1
+			, (unsigned int)(info->int_area[i].addr1
+				+ info->int_area[i].size - 1)
+			, (unsigned int)info->int_area[i].addr2
+			, (unsigned int)(info->int_area[i].addr2
+				+ info->int_area[i].size - 1));
+}
+#else
+#define pasr_print_info(info) do {} while (0)
+#endif /* DEBUG */
+
+static int __init is_in_physmem(phys_addr_t addr, struct ddr_die *d)
+{
+	return ((addr >= d->addr) && (addr <= d->addr + d->size));
+}
+
+static int __init pasr_check_interleave_in_physmem(struct pasr_info *info,
+		struct interleaved_area *i)
+{
+	struct ddr_die *d;
+	int j;
+	int err = 4;
+
+	for (j = 0; j < info->nr_dies; j++) {
+		d = &info->die[j];
+		if (is_in_physmem(i->addr1, d))
+			err--;
+		if (is_in_physmem(i->addr1 + i->size, d))
+			err--;
+		if (is_in_physmem(i->addr2, d))
+			err--;
+		if (is_in_physmem(i->addr2 + i->size, d))
+			err--;
+	}
+
+	return err;
+}
+
+static int __init ddrdie_cmp(const void *_a, const void *_b)
+{
+	const struct ddr_die *a = _a, *b = _b;
+
+	return a->addr < b->addr ? -1 : a->addr > b->addr ? 1 : 0;
+}
+
+static int __init interleaved_cmp(const void *_a, const void *_b)
+{
+	const struct interleaved_area *a = _a, *b = _b;
+
+	return a->addr1 < b->addr1 ? -1 : a->addr1 > b->addr1 ? 1 : 0;
+}
+
+static int __init pasr_info_sanity_check(struct pasr_info *info)
+{
+	int i;
+
+	/* Check at least one physical chunk is defined */
+	if (info->nr_dies == 0) {
+		pr_err("%s: No DDR dies declared in command line\n", __func__);
+		return -EINVAL;
+	}
+
+	/* Sort DDR dies areas */
+	sort(&info->die, info->nr_dies,
+			sizeof(info->die[0]), ddrdie_cmp, NULL);
+
+	/* Physical layout checking */
+	for (i = 0; i < info->nr_dies; i++) {
+		struct ddr_die *d1, *d2;
+
+		d1 = &info->die[i];
+
+		if (d1->size == 0) {
+			pr_err("%s: DDR die at %#x has 0 size\n",
+					__func__, d1->addr);
+			return -EINVAL;
+		}
+
+		/* Check die is aligned on section boundaries */
+		if (((d1->addr & ~(PASR_SECTION_SZ - 1)) != d1->addr)
+			|| ((d1->size & ~(PASR_SECTION_SZ - 1)) != d1->size)) {
+			pr_err("%s: DDR die at %#x (size %#lx) is not aligned "
+					"on section boundaries %#x\n",
+					__func__, d1->addr,
+					d1->size, PASR_SECTION_SZ);
+			return -EINVAL;
+		}
+
+		if (i == 0)
+			continue;
+
+		/* Check areas are not overlapping */
+		d2 = d1;
+		d1 = &info->die[i-1];
+		if ((d1->addr + d1->size - 1) >= d2->addr) {
+			pr_err("%s: DDR dies at %#x and %#x are overlapping\n",
+					__func__, d1->addr, d2->addr);
+			return -EINVAL;
+		}
+	}
+
+	/* Interleave layout checking */
+	if (info->nr_int == 0)
+		goto out;
+
+	/* Sort interleaved areas */
+	sort(&info->int_area, info->nr_int,
+			sizeof(info->int_area[0]), interleaved_cmp, NULL);
+
+	for (i = 0; i < info->nr_int; i++) {
+		struct interleaved_area *i1;
+
+		i1 = &info->int_area[i];
+		if (i1->size == 0) {
+			pr_err("%s: Interleaved area %#x/%#x has 0 size\n",
+					__func__, i1->addr1, i1->addr2);
+			return -EINVAL;
+		}
+
+		/* Check area is aligned on section boundaries */
+		if (((i1->addr1 & ~(PASR_SECTION_SZ - 1)) != i1->addr1)
+			|| ((i1->addr2 & ~(PASR_SECTION_SZ - 1)) != i1->addr2)
+			|| ((i1->size & ~(PASR_SECTION_SZ - 1)) != i1->size)) {
+			pr_err("%s: Interleaved area at %#x/%#x (size %#lx) is not "
+					"aligned on section boundaries %#x\n",
+					__func__, i1->addr1, i1->addr2,
+					i1->size, PASR_SECTION_SZ);
+			return -EINVAL;
+		}
+
+		/* Check interleaved areas are not overlapping */
+		if ((i1->addr1 + i1->size - 1) >= i1->addr2) {
+			pr_err("%s: Interleaved areas %#x "
+					"and %#x are overlapping\n",
+					__func__, i1->addr1, i1->addr2);
+			return -EINVAL;
+		}
+
+		/* Check the interleaved areas are in the physical areas */
+		if (pasr_check_interleave_in_physmem(info, i1)) {
+			pr_err("%s: Interleaved area %#x/%#x "
+					"not in physical memory\n",
+					__func__, i1->addr1, i1->addr2);
+			return -EINVAL;
+		}
+	}
+
+out:
+	return 0;
+}
+
+#ifdef DEBUG
+static void __init pasr_print_map(struct pasr_map *map)
+{
+	int i, j;
+
+	if (!map)
+		goto out;
+
+	pr_info("PASR map:\n");
+
+	for (i = 0; i < map->nr_dies; i++) {
+		struct pasr_die *die = &map->die[i];
+
+		pr_info("Die %d:\n", i);
+		for (j = 0; j < die->nr_sections; j++) {
+			struct pasr_section *s = &die->section[j];
+			pr_info("\tSection %d: @ = %#08x, Pair = %s\n"
+				, j, s->start, s->pair ? "Yes" : "No");
+		}
+	}
+out:
+	return;
+}
+#else
+#define pasr_print_map(map) do {} while (0)
+#endif /* DEBUG */
+
+static int __init pasr_build_map(struct pasr_info *info, struct pasr_map *map)
+{
+	int i, j;
+	struct pasr_die *die;
+
+	map->nr_dies = info->nr_dies;
+	die = map->die;
+
+	for (i = 0; i < info->nr_dies; i++) {
+		phys_addr_t addr = info->die[i].addr;
+		struct pasr_section *section = die[i].section;
+
+		die[i].start = addr;
+		die[i].idx = i;
+		die[i].nr_sections = info->die[i].size >> PASR_SECTION_SZ_BITS;
+
+		for (j = 0; j < die[i].nr_sections; j++) {
+			section[j].start = addr;
+			addr += PASR_SECTION_SZ;
+			section[j].die = &die[i];
+		}
+	}
+
+	for (i = 0; i < info->nr_int; i++) {
+		struct interleaved_area *ia = &info->int_area[i];
+		struct pasr_section *s1, *s2;
+		unsigned long offset = 0;
+
+		for (j = 0; j < (ia->size >> PASR_SECTION_SZ_BITS); j++) {
+			s1 = pasr_addr2section(map, ia->addr1 + offset);
+			s2 = pasr_addr2section(map, ia->addr2 + offset);
+			if (!s1 || !s2)
+				return -EINVAL;
+
+			offset += PASR_SECTION_SZ;
+
+			s1->pair = s2;
+			s2->pair = s1;
+		}
+	}
+	return 0;
+}
+
+int __init early_pasr_setup(void)
+{
+	int ret;
+
+	ret = pasr_info_sanity_check(&pasr_info);
+	if (ret) {
+		pr_err("PASR info sanity check failed (err %d)\n", ret);
+		return ret;
+	}
+
+	pasr_print_info(&pasr_info);
+
+	ret = pasr_build_map(&pasr_info, &pasr_map);
+	if (ret) {
+		pr_err("PASR build map failed (err %d)\n", ret);
+		return ret;
+	}
+
+	pasr_print_map(&pasr_map);
+
+	ret = pasr_init_core(&pasr_map);
+
+	pr_debug("PASR: First stage init done.\n");
+
+	return ret;
+}
+
+/*
+ * late_pasr_setup() has to be called after the Linux allocator is
+ * initialized but before other CPUs are launched.
+ */
+int __init late_pasr_setup(void)
+{
+	int i, j;
+	struct pasr_section *s;
+
+	for_each_pasr_section(i, j, pasr_map, s) {
+		if (!s->lock) {
+			s->lock = kzalloc(sizeof(spinlock_t), GFP_KERNEL);
+			BUG_ON(!s->lock);
+			spin_lock_init(s->lock);
+			if (s->pair)
+				s->pair->lock = s->lock;
+		}
+	}
+
+	pr_debug("PASR Second stage init done.\n");
+
+	return 0;
+}
diff --git a/include/linux/pasr.h b/include/linux/pasr.h
new file mode 100644
index 0000000..93867f0
--- /dev/null
+++ b/include/linux/pasr.h
@@ -0,0 +1,73 @@
+/*
+ * Copyright (C) ST-Ericsson SA 2011
+ * Author: Maxime Coquelin <maxime.coquelin@stericsson.com> for ST-Ericsson.
+ * License terms: GNU General Public License (GPL), version 2
+ */
+#ifndef _LINUX_PASR_H
+#define _LINUX_PASR_H
+
+#include <linux/mm.h>
+#include <linux/spinlock.h>
+#include <mach/memory.h>
+
+#ifdef CONFIG_PASR
+
+/**
+ * struct pasr_section - Represents either a DDR Bank or Segment depending on
+ * the DDR configuration (Bank-Row-Column or Row-Bank-Column)
+ *
+ * @start: Start address of the segment.
+ * @pair: Pointer to another segment in case of dependency (e.g. interleaving).
+ *	Masking of the dependent segments has to be done accordingly.
+ * @free_size: Represents the free memory size in the segment.
+ * @lock: Protects the free_size counter.
+ * @die: Pointer to the die the segment is part of.
+ */
+struct pasr_section {
+	phys_addr_t start;
+	struct pasr_section *pair;
+	unsigned long free_size;
+	spinlock_t *lock;
+	struct pasr_die *die;
+};
+
+/**
+ * struct pasr_die - Represents a DDR die
+ *
+ * @start: Start address of the die.
+ * @idx: Index of the die.
+ * @nr_sections: Number of Banks or Segments in the die.
+ * @section: Table of the die's sections.
+ * @mem_reg: Represents the PASR mask of the die. It is either MR16 or MR17,
+ *	depending on the addressing configuration (RBC or BRC).
+ * @apply_mask: Callback registered by the platform's PASR driver to apply the
+ *	calculated PASR mask.
+ * @cookie: Private data for the platform's PASR driver.
+ */
+struct pasr_die {
+	phys_addr_t start;
+	int idx;
+	int nr_sections;
+	struct pasr_section section[PASR_MAX_SECTION_NR_PER_DIE];
+};
+
+/**
+ * struct pasr_map - Represents the DDR physical map
+ *
+ * @nr_dies: Number of DDR dies.
+ * @die: Table of the dies.
+ */
+struct pasr_map {
+	int nr_dies;
+	struct pasr_die die[PASR_MAX_DIE_NR];
+};
+
+#define for_each_pasr_section(i, j, map, s)		\
+	for (i = 0; i < map.nr_dies; i++)		\
+		for (s = &map.die[i].section[0], j = 0;	\
+		     j < map.die[i].nr_sections;	\
+		     j++, s = &map.die[i].section[j])
+
+#endif /* CONFIG_PASR */
+
+#endif /* _LINUX_PASR_H */
This patch introduces the core of the PASR Framework, whose role is to update the section counters and self-refresh masks when sections become free or used.
Signed-off-by: Maxime Coquelin <maxime.coquelin@stericsson.com>
---
 drivers/staging/pasr/Makefile |    3 +-
 drivers/staging/pasr/core.c   |  168 +++++++++++++++++++++++++++++++++++++++++
 include/linux/pasr.h          |   70 +++++++++++++++++
 3 files changed, 240 insertions(+), 1 deletions(-)
 create mode 100644 drivers/staging/pasr/core.c
diff --git a/drivers/staging/pasr/Makefile b/drivers/staging/pasr/Makefile
index 72f7c27..d172294 100644
--- a/drivers/staging/pasr/Makefile
+++ b/drivers/staging/pasr/Makefile
@@ -1,4 +1,5 @@
-pasr-objs := helper.o init.o
+pasr-objs := helper.o init.o core.o
+
 obj-$(CONFIG_PASR) += pasr.o
 
 ccflags-$(CONFIG_PASR_DEBUG) := -DDEBUG
diff --git a/drivers/staging/pasr/core.c b/drivers/staging/pasr/core.c
new file mode 100644
index 0000000..49bacb9
--- /dev/null
+++ b/drivers/staging/pasr/core.c
@@ -0,0 +1,168 @@
+/*
+ * Copyright (C) ST-Ericsson SA 2011
+ * Author: Maxime Coquelin <maxime.coquelin@stericsson.com> for ST-Ericsson.
+ * License terms: GNU General Public License (GPL), version 2
+ */
+
+#include <linux/mm.h>
+#include <linux/spinlock.h>
+#include <linux/pasr.h>
+
+#include "helper.h"
+
+enum pasr_state {
+	PASR_REFRESH,
+	PASR_NO_REFRESH,
+};
+
+struct pasr_fw {
+	struct pasr_map *map;
+};
+
+static struct pasr_fw pasr;
+
+void pasr_update_mask(struct pasr_section *section, enum pasr_state state)
+{
+	struct pasr_die *die = section->die;
+	phys_addr_t addr = section->start - die->start;
+	u8 bit = addr >> PASR_SECTION_SZ_BITS;
+
+	if (state == PASR_REFRESH)
+		die->mem_reg &= ~(1 << bit);
+	else
+		die->mem_reg |= (1 << bit);
+
+	pr_debug("%s(): %s refresh section 0x%08x. Die%d mem_reg = 0x%02x\n"
+			, __func__, state == PASR_REFRESH ? "Start" : "Stop"
+			, section->start, die->idx, die->mem_reg);
+
+	if (die->apply_mask)
+		die->apply_mask(&die->mem_reg, die->cookie);
+
+	return;
+}
+
+void pasr_put(phys_addr_t paddr, unsigned long size)
+{
+	struct pasr_section *s;
+	unsigned long cur_sz;
+	unsigned long flags;
+
+	if (!pasr.map) {
+		WARN_ONCE(1, KERN_INFO"%s(): Map not initialized.\n"
+			"\tCommand line parameters missing or incorrect\n"
+			, __func__);
+		goto out;
+	}
+
+	do {
+		s = pasr_addr2section(pasr.map, paddr);
+		if (!s)
+			goto out;
+
+		cur_sz = ((paddr + size) < (s->start + PASR_SECTION_SZ)) ?
+			size : s->start + PASR_SECTION_SZ - paddr;
+
+		if (s->lock)
+			spin_lock_irqsave(s->lock, flags);
+
+		s->free_size += cur_sz;
+		BUG_ON(s->free_size > PASR_SECTION_SZ);
+
+		if (s->free_size < PASR_SECTION_SZ)
+			goto unlock;
+
+		if (!s->pair)
+			pasr_update_mask(s, PASR_NO_REFRESH);
+		else if (s->pair->free_size == PASR_SECTION_SZ) {
+			pasr_update_mask(s, PASR_NO_REFRESH);
+			pasr_update_mask(s->pair, PASR_NO_REFRESH);
+		}
+unlock:
+		if (s->lock)
+			spin_unlock_irqrestore(s->lock, flags);
+
+		paddr += cur_sz;
+		size -= cur_sz;
+	} while (size);
+
+out:
+	return;
+}
+
+void pasr_get(phys_addr_t paddr, unsigned long size)
+{
+	unsigned long flags;
+	unsigned long cur_sz;
+	struct pasr_section *s;
+
+	if (!pasr.map) {
+		WARN_ONCE(1, KERN_INFO"%s(): Map not initialized.\n"
+			"\tCommand line parameters missing or incorrect\n"
+			, __func__);
+		return;
+	}
+
+	do {
+		s = pasr_addr2section(pasr.map, paddr);
+		if (!s)
+			goto out;
+
+		cur_sz = ((paddr + size) < (s->start + PASR_SECTION_SZ)) ?
+			size : s->start + PASR_SECTION_SZ - paddr;
+
+		if (s->lock)
+			spin_lock_irqsave(s->lock, flags);
+
+		if (s->free_size < PASR_SECTION_SZ)
+			goto unlock;
+
+		if (!s->pair)
+			pasr_update_mask(s, PASR_REFRESH);
+		else if (s->pair->free_size == PASR_SECTION_SZ) {
+			pasr_update_mask(s, PASR_REFRESH);
+			pasr_update_mask(s->pair, PASR_REFRESH);
+		}
+unlock:
+		BUG_ON(cur_sz > s->free_size);
+		s->free_size -= cur_sz;
+
+		if (s->lock)
+			spin_unlock_irqrestore(s->lock, flags);
+
+		paddr += cur_sz;
+		size -= cur_sz;
+	} while (size);
+
+out:
+	return;
+}
+
+int pasr_register_mask_function(phys_addr_t addr, void *function, void *cookie)
+{
+	struct pasr_die *die = pasr_addr2die(pasr.map, addr);
+
+	if (!die) {
+		pr_err("%s: No DDR die corresponding to address 0x%08x\n",
+				__func__, addr);
+		return -EINVAL;
+	}
+
+	if (addr != die->start)
+		pr_warning("%s: Addresses mismatch (Die = 0x%08x, addr = 0x%08x)\n"
+				, __func__, die->start, addr);
+
+	die->cookie = cookie;
+	die->apply_mask = function;
+
+	die->apply_mask(&die->mem_reg, die->cookie);
+
+	return 0;
+}
+
+int __init pasr_init_core(struct pasr_map *map)
+{
+	pasr.map = map;
+	return 0;
+}
diff --git a/include/linux/pasr.h b/include/linux/pasr.h
index 93867f0..e85be73 100644
--- a/include/linux/pasr.h
+++ b/include/linux/pasr.h
@@ -49,6 +49,10 @@ struct pasr_die {
 	int idx;
 	int nr_sections;
 	struct pasr_section section[PASR_MAX_SECTION_NR_PER_DIE];
+	u16 mem_reg;		/* Either MR16 or MR17 */
+
+	void (*apply_mask)(u16 *mem_reg, void *cookie);
+	void *cookie;
 };
 
 /**
@@ -68,6 +72,72 @@ struct pasr_map {
 		     j < map.die[i].nr_sections;	\
 		     j++, s = &map.die[i].section[j])
 
+/**
+ * pasr_register_mask_function()
+ *
+ * @die_addr: Physical base address of the die.
+ * @function: Callback function for applying the DDR PASR mask.
+ * @cookie: Private data called with the callback function.
+ *
+ * This function is to be called by the platform specific PASR driver in
+ * charge of application of the PASR masks.
+ */
+int pasr_register_mask_function(phys_addr_t die_addr,
+		void *function, void *cookie);
+
+/**
+ * pasr_put()
+ *
+ * @paddr: Physical address of the freed memory chunk.
+ * @size: Size of the freed memory chunk.
+ *
+ * This function is to be placed in the allocators when memory chunks are
+ * inserted in the free memory pool.
+ * This function has only to be called for unused memory, otherwise retention
+ * cannot be guaranteed.
+ */
+void pasr_put(phys_addr_t paddr, unsigned long size);
+
+/**
+ * pasr_get()
+ *
+ * @paddr: Physical address of the allocated memory chunk.
+ * @size: Size of the allocated memory chunk.
+ *
+ * This function is to be placed in the allocators when memory chunks are
+ * removed from the free memory pool.
+ * If pasr_put() is used by the allocator, using this function is mandatory to
+ * guarantee retention.
+ */
+void pasr_get(phys_addr_t paddr, unsigned long size);
+
+
+static inline void pasr_kput(struct page *page, int order)
+{
+	if (order != MAX_ORDER - 1)
+		return;
+
+	pasr_put(page_to_phys(page), PAGE_SIZE << (MAX_ORDER - 1));
+}
+
+static inline void pasr_kget(struct page *page, int order)
+{
+	if (order != MAX_ORDER - 1)
+		return;
+
+	pasr_get(page_to_phys(page), PAGE_SIZE << (MAX_ORDER - 1));
+}
+
+int __init early_pasr_setup(void);
+int __init late_pasr_setup(void);
+int __init pasr_init_core(struct pasr_map *);
+
+#else
+#define pasr_kput(page, order) do {} while (0)
+#define pasr_kget(page, order) do {} while (0)
+
+#define pasr_put(paddr, size) do {} while (0)
+#define pasr_get(paddr, size) do {} while (0)
+
 #endif /* CONFIG_PASR */
 
 #endif /* _LINUX_PASR_H */
Any allocator may call the PASR Framework for DDR power savings. Currently, only the Linux Buddy allocator is patched, but the HWMEM and PMEM physically contiguous memory allocators will follow.

The Linux Buddy allocator port uses Buddy specificities to reduce the overhead induced by the PASR Framework counter updates. Indeed, the PASR Framework is called only when MAX_ORDER (4MB page blocks by default) buddies are inserted in or removed from the free lists.

To port the PASR framework into a new allocator:

* Call pasr_put(phys_addr, size) each time a memory chunk becomes unused.
* Call pasr_get(phys_addr, size) each time a memory chunk becomes used.
Signed-off-by: Maxime Coquelin <maxime.coquelin@stericsson.com>
---
 mm/page_alloc.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 03d8c48..c62fe11 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -57,6 +57,7 @@
 #include <linux/ftrace_event.h>
 #include <linux/memcontrol.h>
 #include <linux/prefetch.h>
+#include <linux/pasr.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -534,6 +535,7 @@ static inline void __free_one_page(struct page *page,
 		/* Our buddy is free, merge with it and move up one order. */
 		list_del(&buddy->lru);
 		zone->free_area[order].nr_free--;
+		pasr_kget(buddy, order);
 		rmv_page_order(buddy);
 		combined_idx = buddy_idx & page_idx;
 		page = page + (combined_idx - page_idx);
@@ -566,6 +568,7 @@ static inline void __free_one_page(struct page *page,
 	list_add(&page->lru, &zone->free_area[order].free_list[migratetype]);
 out:
 	zone->free_area[order].nr_free++;
+	pasr_kput(page, order);
 }
 
 /*
@@ -762,6 +765,7 @@ static inline void expand(struct zone *zone, struct page *page,
 		VM_BUG_ON(bad_range(zone, &page[size]));
 		list_add(&page[size].lru, &area->free_list[migratetype]);
 		area->nr_free++;
+		pasr_kput(page, high);
 		set_page_order(&page[size], high);
 	}
 }
@@ -830,6 +834,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
 		list_del(&page->lru);
 		rmv_page_order(page);
 		area->nr_free--;
+		pasr_kget(page, current_order);
 		expand(zone, page, order, current_order, area, migratetype);
 		return page;
 	}
@@ -955,6 +960,7 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype)
 			page = list_entry(area->free_list[migratetype].next,
 					struct page, lru);
 			area->nr_free--;
+			pasr_kget(page, current_order);
 
 			/*
 			 * If breaking a large block of pages, move all free
@@ -1281,6 +1287,8 @@ int split_free_page(struct page *page)
 	/* Remove page from free list */
 	list_del(&page->lru);
 	zone->free_area[order].nr_free--;
+	pasr_kget(page, order);
+
 	rmv_page_order(page);
 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
 
@@ -5692,6 +5700,7 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 		list_del(&page->lru);
 		rmv_page_order(page);
 		zone->free_area[order].nr_free--;
+		pasr_kget(page, order);
 		__mod_zone_page_state(zone, NR_FREE_PAGES,
 				      - (1UL << order));
 		for (i = 0; i < (1 << order); i++)
On Mon, Jan 30, 2012 at 02:33:53PM +0100, Maxime Coquelin wrote:
Any allocators might call the PASR Framework for DDR power savings. Currently, only Linux Buddy allocator is patched, but HWMEM and PMEM physically contiguous memory allocators will follow.
Linux Buddy allocator porting uses Buddy specificities to reduce the overhead induced by the PASR Framework counter updates. Indeed, the PASR Framework is called only when MAX_ORDER (4MB page blocs by default) buddies are inserted/removed from the free lists.
To port PASR FW into a new allocator:
- Call pasr_put(phys_addr, size) each time a memory chunk becomes unused.
- Call pasr_get(phys_addr, size) each time a memory chunk becomes used.
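As a rough illustration of the pasr_put()/pasr_get() contract described above, here is a small userspace model. This is not the framework's actual code: the section size, the field names and the assumption that a chunk never crosses a section boundary are all invented for the sketch.

```c
/* Userspace model of the per-section unused-memory counters that
 * pasr_put()/pasr_get() maintain. Names and sizes are assumptions. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTION_SZ (64UL << 20)   /* assumed 64MB bank/segment */
#define NR_SECTIONS 8

struct pasr_section {
	uint64_t free;            /* bytes currently unused */
	bool refresh_masked;
};

static struct pasr_section sections[NR_SECTIONS];

static struct pasr_section *addr_to_section(uint64_t phys)
{
	return &sections[(phys / SECTION_SZ) % NR_SECTIONS];
}

/* Memory released by the allocator: bump the counter and mask the
 * section's refresh once the whole section is unused. */
void pasr_put(uint64_t phys, uint64_t size)
{
	struct pasr_section *s = addr_to_section(phys);

	s->free += size;
	if (s->free == SECTION_SZ)
		s->refresh_masked = true;   /* MR16/MR17 update would go here */
}

/* Memory re-allocated: unmask refresh before the counter drops. */
void pasr_get(uint64_t phys, uint64_t size)
{
	struct pasr_section *s = addr_to_section(phys);

	if (s->free == SECTION_SZ)
		s->refresh_masked = false;  /* MR16/MR17 update would go here */
	s->free -= size;
}
```

The real framework additionally walks every section a chunk overlaps and takes a lock around the counter updates, which is exactly the overhead debated below.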
Signed-off-by: Maxime Coquelin maxime.coquelin@stericsson.com
mm/page_alloc.c | 9 +++++++++ 1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 03d8c48..c62fe11 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -57,6 +57,7 @@
 #include <linux/ftrace_event.h>
 #include <linux/memcontrol.h>
 #include <linux/prefetch.h>
+#include <linux/pasr.h>

 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -534,6 +535,7 @@ static inline void __free_one_page(struct page *page,
 		/* Our buddy is free, merge with it and move up one order. */
 		list_del(&buddy->lru);
 		zone->free_area[order].nr_free--;
+		pasr_kget(buddy, order);
 		rmv_page_order(buddy);
 		combined_idx = buddy_idx & page_idx;
 		page = page + (combined_idx - page_idx);
I did not review this series carefully and I know nothing about how you implemented PASR support, but driver hooks like this in the page allocator are heavily frowned upon. It is subject to abuse and it adds overhead to the allocator, although I note that you avoided putting hooks in the per-cpu page allocator. I note that you hardcode it so only PASR can use the hook, but it looks like there is no way of avoiding that overhead on platforms that do not have PASR if it is enabled in the config. At a glance, it appears to be doing a fair amount of work too - looking up maps, taking locks etc. This potentially creates a new hot lock because in these paths we have per-zone locking, but you are adding a PASR lock into the mix that may be more coarse than zone->lock (I didn't check).
You may be able to use the existing arch_alloc_page() hook and call PASR on architectures that support it if and only if PASR is present and enabled by the administrator but even this is likely to be unpopular as it'll have a measurable performance impact on platforms with PASR (not to mention the PASR lock will be even heavier as it'll now be also used for per-cpu page allocations). To get the hook you want, you'd need to show significant benefit before they were happy with the hook.
What is more likely is that you will get pushed to doing something like periodically scanning memory as part of a separate power management module and calling into PASR if regions of memory that are found that can be powered down in some ways.
Hello Mel,
Thanks for your comments,
On 01/30/2012 04:22 PM, Mel Gorman wrote:
I did not review this series carefully and I know nothing about how you implemented PASR support, but driver hooks like this in the page allocator are heavily frowned upon. It is subject to abuse and it adds overhead to the allocator, although I note that you avoided putting hooks in the per-cpu page allocator.
I take your point. However, adding hooks in the page allocator is the only way I see to ensure that memory being accessed is refreshed.
I note that you hardcode it so only PASR can use the hook but it looks like there is no way of avoiding that overhead on platforms that do not have PASR if it is enabled in the config.
In this RFC patch set, I assumed PASR would be enabled in the config only when it is actually used. If it is not enabled, there is no overhead.
This could of course be improved in next patch set.
At a glance, it appears to be doing a fair amount of work too - looking up maps, taking locks etc.
Note that we do that work only on MAX_ORDER pages, so it limits the overhead.
This potentially creates a new hot lock because in these paths we have per-zone locking, but you are adding a PASR lock into the mix that may be more coarse than zone->lock (I didn't check).
Ok. We might fall into a deadlock if the underlying PASR driver allocates something in its apply_mask callback. However, this callback should only be used to write a register of the DDR controller.
You may be able to use the existing arch_alloc_page() hook and call PASR on architectures that support it if and only if PASR is present and enabled by the administrator but even this is likely to be unpopular as it'll have a measurable performance impact on platforms with PASR (not to mention the PASR lock will be even heavier as it'll now be also used for per-cpu page allocations). To get the hook you want, you'd need to show significant benefit before they were happy with the hook.
Your proposal sounds good. AFAIK, the per-cpu allocation maximum size is 32KB. Please correct me if I'm wrong. Since pasr_kget/kput() call the PASR framework only on MAX_ORDER allocations, we wouldn't add any locking risks or contention compared to the current patch. I will update the patch set using arch_alloc/free_page().
What is more likely is that you will get pushed to doing something like periodically scanning memory as part of a separate power management module and calling into PASR if regions of memory that are found that can be powered down in some ways.
With this solution, we need in any case to add some hooks in the allocator to ensure the pages being allocated are refreshed.
Best regards, Maxime
Hello Mel, On 01/30/2012 05:52 PM, Maxime Coquelin wrote:
On 01/30/2012 04:22 PM, Mel Gorman wrote:
You may be able to use the existing arch_alloc_page() hook and call PASR on architectures that support it if and only if PASR is present and enabled by the administrator but even this is likely to be unpopular as it'll have a measurable performance impact on platforms with PASR (not to mention the PASR lock will be even heavier as it'll now be also used for per-cpu page allocations). To get the hook you want, you'd need to show significant benefit before they were happy with the hook.
Your proposal sounds good. AFAIK, per-cpu allocation maximum size is 32KB. Please correct me if I'm wrong. Since pasr_kget/kput() calls the PASR framework only on MAX_ORDER allocations, we wouldn't add any locking risks nor contention compared to current patch. I will update the patch set using arch_alloc/free_page().
I just had a deeper look at when arch_alloc_page() is called. I think it does not fit the PASR framework's needs. pasr_kget() calls pasr_get() only for max order pages (same for pasr_kput()) to avoid overhead.
In the current patch set, pasr_kget() is called when pages are removed from the free lists, and pasr_kput() when pages are inserted in the free lists.
So, pasr_get() is called in case of:
- allocation of a max order page
- split of a max order page into lower order pages to fulfill allocation of pages smaller than max order
And pasr_put() is called in case of:
- release of a max order page
- coalescence of two "max order - 1" pages when smaller pages are released
If we call the PASR framework in arch_alloc_page(), we have two possibilities:
1) using pasr_kget(): the PASR framework will only be notified of max order allocations, so the coalesce/split of free pages case will not be taken into account.
2) using pasr_get(): the PASR framework will be called for every order of page allocation/release. The induced overhead is not acceptable.
To avoid calling pasr_kget/kput() directly in page_alloc.c, do you think adding some arch specific hooks when a page is inserted or removed from the free lists could be acceptable? Something like arch_insert_freepage(struct page *page, int order) and arch_remove_freepage(struct page *page, int order).
Regards, Maxime
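To make the trade-off discussed above concrete, here is a userspace model of the proposed arch_insert_freepage()/arch_remove_freepage() hooks. The hook names come from the mail above and are hypothetical (they do not exist in mainline); the point modeled is that only MAX_ORDER blocks reach the framework, as pasr_kget()/pasr_kput() already ensure.

```c
/* Userspace sketch: the hooks do nothing unless a whole MAX_ORDER
 * block enters or leaves the free lists, which is what keeps the
 * allocator fast paths cheap. Constants are assumptions. */
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define MAX_ORDER  11                 /* 4MB blocks with 4KB pages */

static uint64_t pasr_calls;           /* counts framework invocations */

/* Stand-ins for the real framework entry points. */
static void pasr_get(uint64_t phys, uint64_t size)
{
	(void)phys; (void)size;
	pasr_calls++;
}

static void pasr_put(uint64_t phys, uint64_t size)
{
	(void)phys; (void)size;
	pasr_calls++;
}

/* Only a MAX_ORDER buddy leaving the free lists reaches the framework. */
void arch_remove_freepage(uint64_t phys, int order)
{
	if (order == MAX_ORDER - 1)
		pasr_get(phys, 1UL << (order + PAGE_SHIFT));
}

/* Only a MAX_ORDER buddy entering the free lists reaches the framework. */
void arch_insert_freepage(uint64_t phys, int order)
{
	if (order == MAX_ORDER - 1)
		pasr_put(phys, 1UL << (order + PAGE_SHIFT));
}
```

Lower-order insertions/removals fall through without taking any PASR lock, which is why the per-cpu allocator paths stay unaffected in this scheme.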
On Tue, Jan 31, 2012 at 01:15:55PM +0100, Maxime Coquelin wrote:
I just had a deeper look at when arch_alloc_page() is called. I think it does not fit with PASR framework needs. pasr_kget() calls pasr_get() only for max order pages (same for pasr_kput()) to avoid overhead.
I see. My bad.
In current patch set, pasr_kget() is called when pages are removed from the free lists, and pasr_kput() when pages are inserted in the free lists.
So, pasr_get() is called in case of:
- allocation of a max order page
- split of a max order page into lower order pages to fulfill allocation of pages smaller than max order
And pasr_put() is called in case of:
- release of a max order page
- coalescence of two "max order - 1" pages when smaller pages are released
If we call the PASR framework in arch_alloc_page(), we have two possibilities:
1) using pasr_kget(): the PASR framework will only be notified of max order allocations, so the coalesce/split of free pages case will not be taken into account.
2) using pasr_get(): the PASR framework will be called for every order of page allocation/release. The induced overhead is not acceptable.
To avoid calling pasr_kget/kput() directly in page_alloc.c, do you think adding some arch specific hooks when a page is inserted or removed from the free lists could be acceptable?
It's not the name that is the problem; I'm strongly against any hook that can delay the page allocator for arbitrary lengths of time like this. I am open to being convinced otherwise, but for me PASR would need to demonstrate large savings for a wide variety of machines, and it would have to be explained why the alternatives are far inferior or unsuitable.
For example - it seems like this could also be done with a balloon driver instead of page allocator hooks. A governor would identify when the machine was under no memory pressure, or be triggered from userspace. To power down memory, it would use page reclaim and page migration to allocate large contiguous ranges of memory - CMA could potentially be adapted when it gets merged to save a lot of implementation work. The governor should register a slab shrinker so that it gets called under memory pressure and can shrink the balloon, power the DIMMs back up and free the memory back to the buddy allocator. This would keep all the cost out of the allocator paths and move the cost to when the machine is either idle (in the case of powering down) or under memory pressure (where the cost of powering up will be small in comparison to the overall cost of the page reclaim operation).
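The balloon-plus-shrinker scheme sketched above can be modeled in a few lines of userspace C. Everything here is illustrative: the block counts, watermark and function names are invented, and a real driver would use register_shrinker() plus page migration rather than plain counters.

```c
/* Toy model of a balloon governor for PASR: inflate with large
 * contiguous blocks while memory is idle, deflate under pressure. */
#include <assert.h>

#define TOTAL_BLOCKS   64   /* 4MB blocks in the system (assumed)      */
#define IDLE_WATERMARK 48   /* inflate only while this many stay free  */

static int free_blocks = TOTAL_BLOCKS;
static int balloon_blocks;  /* blocks held by the balloon (refresh masked) */

/* Idle/periodic governor: grab free blocks and hand them to PASR so
 * their refresh can be masked. */
void governor_inflate(void)
{
	while (free_blocks > IDLE_WATERMARK) {
		free_blocks--;
		balloon_blocks++;   /* pasr_put() would run here */
	}
}

/* Shrinker callback: memory pressure asks for nr blocks back; unmask
 * their refresh and return them to the buddy allocator. */
int balloon_shrink(int nr)
{
	int freed = nr < balloon_blocks ? nr : balloon_blocks;

	balloon_blocks -= freed;    /* pasr_get() would run here */
	free_blocks += freed;
	return freed;
}
```

The design point is visible even in the toy: the PASR cost is paid inside governor_inflate() and balloon_shrink(), never in the allocation fast path.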
On 01/31/2012 03:01 PM, Mel Gorman wrote:
To avoid calling pasr_kget/kput() directly in page_alloc.c, do you think adding some arch specific hooks when a page is inserted or removed from the free lists could be acceptable?
It's not the name that is the problem; I'm strongly against any hook that can delay the page allocator for arbitrary lengths of time like this. I am open to being convinced otherwise, but for me PASR would need to demonstrate large savings for a wide variety of machines, and it would have to be explained why the alternatives are far inferior or unsuitable.
Ok Mel, I understand your point of view.
The goal of this RFC patch set was to collect comments, so I'm glad to get your opinion. I propose to forget the patch in the Buddy allocator.
For example - it seems like this could also be done with a balloon driver instead of page allocator hooks. A governor would identify when the machine was under no memory pressure, or be triggered from userspace. To power down memory, it would use page reclaim and page migration to allocate large contiguous ranges of memory - CMA could potentially be adapted when it gets merged to save a lot of implementation work. The governor should register a slab shrinker so that it gets called under memory pressure and can shrink the balloon, power the DIMMs back up and free the memory back to the buddy allocator. This would keep all the cost out of the allocator paths and move the cost to when the machine is either idle (in the case of powering down) or under memory pressure (where the cost of powering up will be small in comparison to the overall cost of the page reclaim operation).
This is very interesting. I know Linaro plans to work on the DDR power management topic. One of the options they envisage is to use the Memory Hotplug feature. However, the main problem with Memory Hotplug is handling the memory pressure, i.e. when to re-plug the memory sections. Your proposal addresses this issue. I don't know if such a driver could be done within the Linaro scope.
Anyway, even with a balloon driver, I think the PASR framework could be suitable to keep a "hardware" view of the memory layout (dies, banks, segments...). Moreover, this framework is designed to also support some physically contiguous memory allocators (such as hwmem and pmem).
Best regards, Maxime
On Tue, Jan 31, 2012 at 07:55:47PM +0100, Maxime Coquelin wrote:
Ok Mel, I understand your point of view.
The goal of this RFC patch set was to collect comments, so I'm glad to get your opinion. I propose to forget the patch in the Buddy allocator.
Or at least tag it as "this should have an alternative"
This is very interesting. I know Linaro plans to work on DDR power management topic. One of the options they envisage is to use the Memory Hotplug feature. However, the main problem with Memory Hotplug is to handle the memory pressure, i.e. when to re-plug the memory sections.
heh, I was originally going to suggest the Memory Hotplug feature until I ran into the problem of how to bring the memory back. Technically, it could *also* use a shrinker and keep track of the memory it offlined. It would work on a similar principle to the balloon, but I was worried that the cost of offlining and onlining memory would be too high for PASR. There was also the problem that it would require SPARSEMEM and would also require that you grew/shrunk memory in ranges of the section size, which might be totally different to PASR.
Your proposal addresses this issue. I don't know if such a driver could be done within the Linaro scope.
Beats me.
Anyway, even with a balloon driver, I think the PASR framework could be suitable to keep an "hardware" view of the memory layout (dies, banks, segments...).
Oh yes, the balloon driver would still need this information!
Moreover, this framework is designed to also support some physically contiguous memory allocators (such as hwmem and pmem).
Not being familiar with hwmem or pmem, I can't be 100% certain but superficially, I would expect that the same balloon driver could be used for hwmem and pmem. The main difference between this balloon driver and others will be how it selects pages to add to the balloon.
There are existing balloon drivers that you may or may not be able to leverage. There is some talk that KVM people want to be able to balloon 2M contiguous pages. If this was ever implemented, it's possible that you could reuse it for PASR so keep an eye out for it.
On Mon, Jan 30, 2012 at 6:52 PM, Maxime Coquelin maxime.coquelin@stericsson.com wrote:
What is more likely is that you will get pushed to doing something like periodically scanning memory as part of a separate power management module and calling into PASR if regions of memory that are found that can be powered down in some ways.
With this solution, we need in any case to add some hooks in the allocator to ensure the pages being allocated are refreshed.
Why do you insist on making this happen at page level when you're only able to power off *much* larger chunks?
Signed-off-by: Maxime Coquelin <maxime.coquelin@stericsson.com>
---
 init/main.c | 8 ++++++++
 1 file changed, 8 insertions(+), 0 deletions(-)
diff --git a/init/main.c b/init/main.c
index 9fd91c3..5e0aeb7 100644
--- a/init/main.c
+++ b/init/main.c
@@ -69,6 +69,7 @@
 #include <linux/slab.h>
 #include <linux/perf_event.h>
 #include <linux/boottime.h>
+#include <linux/pasr.h>

 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -487,6 +488,9 @@ asmlinkage void __init start_kernel(void)
 	page_address_init();
 	printk(KERN_NOTICE "%s", linux_banner);
 	setup_arch(&command_line);
+#ifdef CONFIG_PASR
+	early_pasr_setup();
+#endif
 	mm_init_owner(&init_mm, &init_task);
 	mm_init_cpumask(&init_mm);
 	setup_command_line(command_line);
@@ -555,6 +559,10 @@ asmlinkage void __init start_kernel(void)

 	kmem_cache_init_late();

+#ifdef CONFIG_PASR
+	late_pasr_setup();
+#endif
+
 	/*
 	 * HACK ALERT! This is early. We're enabling the console before
 	 * we've done PCI setups etc, and console_init() must be aware of
Signed-off-by: Maxime Coquelin <maxime.coquelin@stericsson.com>
---
 Documentation/pasr.txt | 183 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 183 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/pasr.txt
diff --git a/Documentation/pasr.txt b/Documentation/pasr.txt
new file mode 100644
index 0000000..d40e3f6
--- /dev/null
+++ b/Documentation/pasr.txt
@@ -0,0 +1,183 @@
+Partial Array Self-Refresh Framework
+
+(C) 2012 Maxime Coquelin <maxime.coquelin@stericsson.com>, ST-Ericsson.
+
+CONTENT
+1. Introduction
+2. Command-line parameters
+3. Allocators patching
+4. PASR platform drivers
+
+
+1. Introduction
+
+PASR Frameworks brings support for the Partial Array Self-Refresh DDR power
+management feature. PASR has been introduced in LP-DDR2, and is also present
+in DDR3.
+
+PASR provides 4 modes:
+
+* Single-Ended: Only 1/1, 1/2, 1/4 or 1/8 are refreshed, masking starting at
+  the end of the DDR die.
+
+* Double-Ended: Same as Single-Ended, but refresh-masking does not start
+  necessairly at the end of the DDR die.
+
+* Bank-Selective: Refresh of each bank of a die can be masked or unmasked via
+  a dedicated DDR register (MR16). This mode is convenient for DDR configured
+  in BRC (Bank-Row-Column) mode.
+
+* Segment-Selective: Refresh of each segment of a die can be masked or unmasked
+  via a dedicated DDR register (MR17). This mode is convenient for DDR
+  configured in RBC (Row-Bank-Column) mode.
+
+The role of this framework is to stop the refresh of unused memory to enhance
+DDR power consumption.
+
+It supports Bank-Selective and Segment-Selective modes, as the more adapted to
+modern OSes.
+
+At early boot stage, a representation of the physical DDR layout is built:
+
+             Die 0
+_______________________________
+| I--------------------------I |
+| I   Bank or Segment 0      I |
+| I--------------------------I |
+| I--------------------------I |
+| I   Bank or Segment 1      I |
+| I--------------------------I |
+| I--------------------------I |
+| I   Bank or Segment ...    I |
+| I--------------------------I |
+| I--------------------------I |
+| I   Bank or Segment n      I |
+| I--------------------------I |
+|______________________________|
+              ...
+
+             Die n
+_______________________________
+| I--------------------------I |
+| I   Bank or Segment 0      I |
+| I--------------------------I |
+| I--------------------------I |
+| I   Bank or Segment 1      I |
+| I--------------------------I |
+| I--------------------------I |
+| I   Bank or Segment ...    I |
+| I--------------------------I |
+| I--------------------------I |
+| I   Bank or Segment n      I |
+| I--------------------------I |
+|______________________________|
+
+The first level is a table where elements represent a die:
+* Base address,
+* Number of segments,
+* Table representing banks/segments,
+* MR16/MR17 refresh mask,
+* DDR Controller callback to update MR16/MR17 refresh mask.
+
+The second level is the section tables representing the banks or segments,
+depending on hardware configuration:
+* Base address,
+* Unused memory size counter,
+* Possible pointer to another section it depends on (E.g. Interleaving)
+
+When some memory becomes unused, the allocator owning this memory calls the
+PASR Framework's pasr_put(phys_addr, size) function. The framework finds the
+sections impacted and updates their counters accordingly.
+If a section counter reach the section size, the refresh of the section is
+masked. If the corresponding section has a dependency with another section
+(E.g. because of DDR interleaving, see figure below), it checks the "paired"
+section is also unused before updating the refresh mask.
+
+When some unused memory is requested by the allocator, the allocator owning
+this memory calls the PASR Framework's pasr_get(phys_addr, size) function. The
+framework find the section impacted and updates their counters accordingly.
+If before the update, the section counter was to the section size, the refrewh
+of the section is unmasked. If the corresponding section has a dependency with
+another section, it also unmask the refresh of the other section.
+
+
+Interleaving example:
+
+             Die 0
+_______________________________
+| I--------------------------I |
+| I   Bank or Segment 0      I |<----|
+| I--------------------------I |     |
+| I--------------------------I |     |
+| I   Bank or Segment 1      I |     |
+| I--------------------------I |     |
+| I--------------------------I |     |
+| I   Bank or Segment ...    I |     |
+| I--------------------------I |     |
+| I--------------------------I |     |
+| I   Bank or Segment n      I |     |
+| I--------------------------I |     |
+|______________________________|     |
+                                     |
+             Die 1                   |
+_______________________________      |
+| I--------------------------I |     |
+| I   Bank or Segment 0      I |<----|
+| I--------------------------I |
+| I--------------------------I |
+| I   Bank or Segment 1      I |
+| I--------------------------I |
+| I--------------------------I |
+| I   Bank or Segment ...    I |
+| I--------------------------I |
+| I--------------------------I |
+| I   Bank or Segment n      I |
+| I--------------------------I |
+|______________________________|
+
+In the above example, bank 0 of die 0 is interleaved with bank0 of die 0.
+The interleaving is done in HW by inverting some addresses lines. The goal is
+to improve DDR bandwidth.
+Practically, one buffer seen as contiguous by the kernel might be spread
+into two DDR dies physically.
+
+
+2. Command-line parameters
+
+To buid the DDR physical layout representation, two parameters are requested:
+
+* ddr_die (mandatory): Should be added for every DDR dies present in the
+  system.
+	- Usage: ddr_die=xxx[M|G]@yyy[M|G] where xxx represents the size and
+	  yyy the base address of the die.
+	  E.g.: ddr_die=512M@0 ddr_die=512M@512M
+
+* interleaved (optionnal): Should be added for every interleaved dependencies.
+	- Usage: interleaved=xxx[M|G]@yyy[M|G]:zzz[M|G] where xxx is the size
+	  of the interleaved area between the adresses yyy and zzz. E.g
+	  interleaved=256M@0:512M
+
+
+3. Allocator patching
+
+Any allocators might call the PASR Framework for DDR power savings. Currently,
+only Linux Buddy allocator is patched, but HWMEM and PMEM physically
+contiguous memory allocators will follow.
+
+Linux Buddy allocator porting uses Buddy specificities to reduce the overhead
+induced by the PASR Framework counter updates. Indeed, the PASR Framework is
+called only when MAX_ORDER (4MB page blocs by default) buddies are
+inserted/removed from the free lists.
+
+To port PASR FW into a new allocator:
+
+* Call pasr_put(phys_addr, size) each time a memory chunk becomes unused.
+* Call pasr_get(phys_addr, size) each time a memory chunk becomes used.
+
+
+4. PASR platform drivers
+
+The MR16/MR17 PASR mask registers are generally accessible through the DDR
+controller. At probe time, the DDR controller driver should register the
+callback used by PASR Framework to apply the refresh mask for every DDR die
+using pasr_register_mask_function(die_addr, callback, cookie).
+
+The callback passed to apply mask must not sleep since it can me called in
+interrupt contexts.
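The "paired section" rule described in the document (with interleaving, a section's refresh may be masked only when both interleaved halves are fully unused) can be sketched in plain C. The struct layout, field names and section size below are assumptions for illustration, not the framework's real types.

```c
/* Userspace model of the interleaving dependency: masking is applied
 * only when a section AND its paired section are entirely unused. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTION_SZ (64UL << 20)   /* assumed bank/segment size */

struct section {
	uint64_t free;            /* unused bytes in this section */
	struct section *pair;     /* interleaved partner, or NULL */
	bool masked;              /* refresh currently masked?     */
};

static bool fully_unused(const struct section *s)
{
	return s->free == SECTION_SZ;
}

/* Called when 'size' bytes of this section become unused. */
void section_put(struct section *s, uint64_t size)
{
	s->free += size;
	if (!fully_unused(s))
		return;

	if (s->pair) {
		/* With interleaving, one kernel-contiguous buffer may be
		 * spread over both halves, so both must be idle first. */
		if (fully_unused(s->pair))
			s->masked = s->pair->masked = true;
	} else {
		s->masked = true;
	}
}
```

The unmasking path on pasr_get() is symmetric: if either half becomes used again, the refresh of both paired sections is unmasked before the counter drops.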
On 01/30/2012 05:33 AM, Maxime Coquelin wrote:
Signed-off-by: Maxime Coquelin maxime.coquelin@stericsson.com
 Documentation/pasr.txt | 183 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 183 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/pasr.txt
diff --git a/Documentation/pasr.txt b/Documentation/pasr.txt
new file mode 100644
index 0000000..d40e3f6
--- /dev/null
+++ b/Documentation/pasr.txt
@@ -0,0 +1,183 @@
+Partial Array Self-Refresh Framework
+(C) 2012 Maxime Coquelin maxime.coquelin@stericsson.com, ST-Ericsson.
+CONTENT +1. Introduction +2. Command-line parameters +3. Allocators patching +4. PASR platform drivers
+1. Introduction
+PASR Frameworks brings support for the Partial Array Self-Refresh DDR power
The PASR framework brings support
+management feature. PASR has been introduced in LP-DDR2, and is also present
was introduced in LP-DDR2 and is also present
+in DDR3.
+PASR provides 4 modes:
+* Single-Ended: Only 1/1, 1/2, 1/4 or 1/8 are refreshed, masking starting at
- the end of the DDR die.
+* Double-Ended: Same as Single-Ended, but refresh-masking does not start
- necessairly at the end of the DDR die.
necessarily
+* Bank-Selective: Refresh of each bank of a die can be masked or unmasked via
- a dedicated DDR register (MR16). This mode is convenient for DDR configured
- in BRC (Bank-Row-Column) mode.
+* Segment-Selective: Refresh of each segment of a die can be masked or unmasked
- via a dedicated DDR register (MR17). This mode is convenient for DDR configured
- in RBC (Row-Bank-Column) mode.
+The role of this framework is to stop the refresh of unused memory to enhance +DDR power consumption.
+It supports Bank-Selective and Segment-Selective modes, as the more adapted to +modern OSes.
huh? parse error above.
+When some memory becomes unused, the allocator owning this memory calls the PASR +Framework's pasr_put(phys_addr, size) function. The framework finds the +sections impacted and updates their counters accordingly. +If a section counter reach the section size, the refresh of the section is
reaches
+masked. If the corresponding section has a dependency with another section +(E.g. because of DDR interleaving, see figure below), it checks the "paired" section is also
it checks if the "paired" section is also
+unused before updating the refresh mask.
+When some unused memory is requested by the allocator, the allocator owning +this memory calls the PASR Framework's pasr_get(phys_addr, size) function. The +framework find the section impacted and updates their counters accordingly.
finds and updates its counter accordingly. or find the sections impacted and updates their counters accordingly.
+If before the update, the section counter was to the section size, the refrewh
was equal to the section size, the refresh
refresh of the section is unmasked. If the corresponding section has a
dependency with another section, it also unmasks the refresh of the other
section.
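The put/get accounting above can be modeled as a small, self-contained C
sketch. This is illustrative only: the names section_put/section_get, the
pair pointer and the fixed 64MB section size are assumptions made for the
example, not the framework's actual API.

```c
#include <assert.h>
#include <stddef.h>

#define SECTION_SZ (64UL << 20)	/* 64MB bank/segment, typical of LP-DDR2 dies */

struct pasr_section {
	unsigned long free_size;	/* unused memory counter */
	int refresh_masked;		/* 1 when self-refresh is masked */
	struct pasr_section *pair;	/* interleaved "paired" section, or NULL */
};

/* Model of pasr_put(): 'size' bytes inside 's' become unused. */
static void section_put(struct pasr_section *s, unsigned long size)
{
	s->free_size += size;
	if (s->free_size < SECTION_SZ)
		return;
	/* Whole section unused: mask refresh, but only if the
	 * interleaved pair (if any) is fully unused too. */
	if (!s->pair || s->pair->free_size == SECTION_SZ) {
		s->refresh_masked = 1;
		if (s->pair)
			s->pair->refresh_masked = 1;
	}
}

/* Model of pasr_get(): 'size' bytes inside 's' are allocated again. */
static void section_get(struct pasr_section *s, unsigned long size)
{
	if (s->free_size == SECTION_SZ) {
		/* Section becomes used again: unmask its refresh,
		 * and its pair's refresh as well. */
		s->refresh_masked = 0;
		if (s->pair)
			s->pair->refresh_masked = 0;
	}
	s->free_size -= size;
}
```

Note how an interleaved pair is only masked once both of its sections are
completely unused, matching the dependency rule described above.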
Interleaving example:
Die 0
 _______________________________
| I--------------------------I |
| I    Bank or Segment 0     I |<----|
| I--------------------------I |     |
| I--------------------------I |     |
| I    Bank or Segment 1     I |     |
| I--------------------------I |     |
| I--------------------------I |     |
| I   Bank or Segment ...    I |     |
| I--------------------------I |     |
| I--------------------------I |     |
| I    Bank or Segment n     I |     |
| I--------------------------I |     |
|______________________________|     |
                                     |
Die 1                                |
 _______________________________     |
| I--------------------------I |     |
| I    Bank or Segment 0     I |<----|
| I--------------------------I |
| I--------------------------I |
| I    Bank or Segment 1     I |
| I--------------------------I |
| I--------------------------I |
| I   Bank or Segment ...    I |
| I--------------------------I |
| I--------------------------I |
| I    Bank or Segment n     I |
| I--------------------------I |
|______________________________|
In the above example, bank 0 of die 0 is interleaved with bank 0 of die 1.
The interleaving is done in hardware by inverting some address lines. The goal
is to improve DDR bandwidth. In practice, one buffer seen as contiguous by the
kernel might physically be spread across two DDR dies.
2. Command-line parameters
To build the DDR physical layout representation, two parameters are required:
* ddr_die (mandatory): Should be added for each DDR die present in the system.
  Usage: ddr_die=xxx[M|G]@yyy[M|G] where xxx represents the size and yyy the
  base address of the die. E.g.: ddr_die=512M@0 ddr_die=512M@512M
* interleaved (optional): Should be added for all interleaved dependencies.
  Usage: interleaved=xxx[M|G]@yyy[M|G]:zzz[M|G] where xxx is the size of the
  interleaved area between the addresses yyy and zzz.
  E.g.: interleaved=256M@0:512M
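In the kernel, the [M|G] suffixes would typically be handled by memparse(); as
a rough user-space illustration of how a ddr_die=<size>@<base> token breaks
down (parse_mem_token and parse_ddr_die are invented names for this sketch):

```c
#include <assert.h>
#include <stdlib.h>

/* Parse "512M" or "1G" into bytes, similar in spirit to the
 * kernel's memparse() helper. */
static unsigned long long parse_mem_token(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g': v <<= 30; (*end)++; break;
	case 'M': case 'm': v <<= 20; (*end)++; break;
	}
	return v;
}

/* Parse a ddr_die=<size>@<base> value, e.g. "512M@512M".
 * Returns 0 on success, -1 on malformed input. */
static int parse_ddr_die(const char *arg,
			 unsigned long long *size, unsigned long long *base)
{
	char *end;

	*size = parse_mem_token(arg, &end);
	if (*end != '@')
		return -1;
	*base = parse_mem_token(end + 1, &end);
	return *end == '\0' ? 0 : -1;
}
```

The interleaved=<size>@<addr1>:<addr2> parameter would be parsed the same way,
with one extra address after the ':' separator.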
3. Allocator patching
Any allocator may call the PASR Framework for DDR power savings. Currently,
only the Linux Buddy allocator is patched, but the HWMEM and PMEM physically
contiguous memory allocators will follow.
The Linux Buddy allocator port uses Buddy specifics to reduce the overhead
induced by the PASR Framework counter updates: the PASR Framework is called
only when MAX_ORDER buddies (4MB page blocks by default) are inserted into or
removed from the free lists.
To port the PASR Framework into a new allocator:
* Call pasr_put(phys_addr, size) each time a memory chunk becomes unused.
* Call pasr_get(phys_addr, size) each time a memory chunk becomes used.
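A toy model of the Buddy batching described above, where only a full MAX_ORDER
buddy reaching the free list triggers a PASR notification (all names and the
pasr_put_calls counter are invented for this sketch; the real patch hooks the
kernel's free-list management):

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define MAX_ORDER  10	/* 2^10 pages * 4KB pages = 4MB blocks (assumed) */

static unsigned long pasr_put_calls;	/* test instrumentation only */

/* Stand-in for the PASR Framework entry point. */
static void pasr_put(unsigned long long phys, unsigned long size)
{
	(void)phys;
	(void)size;
	pasr_put_calls++;
}

/* Toy model of the buddy free path: the framework is only told about
 * memory once a whole MAX_ORDER buddy lands on the free list, keeping
 * counter-update overhead away from small allocations. */
static void buddy_free_one_page(unsigned long long phys, unsigned int order)
{
	if (order == MAX_ORDER)
		pasr_put(phys, 1UL << (order + PAGE_SHIFT));
}
```

Frees of lower-order pages cost nothing PASR-wise; only the coalesced
MAX_ORDER block pays the single pasr_put() call.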
4. PASR platform drivers
The MR16/MR17 PASR mask registers are generally accessible through the DDR
controller. At probe time, the DDR controller driver should register the
callback used by the PASR Framework to apply the refresh mask for every DDR
die, using pasr_register_mask_function(die_addr, callback, cookie).
The callback passed to apply the mask must not sleep, since it can be called
in interrupt context.
On 02/02/2012 04:51 AM, Randy Dunlap wrote:
On 01/30/2012 05:33 AM, Maxime Coquelin wrote:
Signed-off-by: Maxime Coquelin <maxime.coquelin@stericsson.com>
 Documentation/pasr.txt |  183 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 183 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/pasr.txt
diff --git a/Documentation/pasr.txt b/Documentation/pasr.txt
new file mode 100644
index 0000000..d40e3f6
--- /dev/null
+++ b/Documentation/pasr.txt
@@ -0,0 +1,183 @@
Partial Array Self-Refresh Framework
(C) 2012 Maxime Coquelin <maxime.coquelin@stericsson.com>, ST-Ericsson.
CONTENT
1. Introduction
2. Command-line parameters
3. Allocators patching
4. PASR platform drivers
Thanks Randy for the review.
Regards, Maxime
The MR16/MR17 PASR mask registers are generally accessible through the DDR controller. At probe time, the DDR controller driver should register the callback used by PASR Framework to apply the refresh mask for every DDR die using pasr_register_mask_function(die_addr, callback, cookie).
The callback passed to apply the mask must not sleep since it can be called in interrupt contexts.
This example creates a new PASR stubbed driver for Nova platforms.
Signed-off-by: Maxime Coquelin <maxime.coquelin@stericsson.com>
---
 arch/arm/Kconfig                            |    1 +
 arch/arm/mach-ux500/include/mach/hardware.h |   11 ++++
 arch/arm/mach-ux500/include/mach/memory.h   |    8 +++
 drivers/mfd/db8500-prcmu.c                  |   67 +++++++++++++++++++++++++++
 drivers/staging/pasr/Kconfig                |    5 ++
 drivers/staging/pasr/Makefile               |    1 +
 drivers/staging/pasr/ux500.c                |   58 +++++++++++++++++++++++
 include/linux/ux500-pasr.h                  |   11 ++++
 8 files changed, 162 insertions(+), 0 deletions(-)
 create mode 100644 drivers/staging/pasr/ux500.c
 create mode 100644 include/linux/ux500-pasr.h
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 3df3573..b8981ee 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -826,6 +826,7 @@ config ARCH_U8500
 	select HAVE_CLK
 	select ARCH_HAS_CPUFREQ
 	select NOMADIK_GPIO
+	select ARCH_HAS_PASR
 	help
 	  Support for ST-Ericsson's Ux500 architecture
diff --git a/arch/arm/mach-ux500/include/mach/hardware.h b/arch/arm/mach-ux500/include/mach/hardware.h
index d8f218b..11c23b1 100644
--- a/arch/arm/mach-ux500/include/mach/hardware.h
+++ b/arch/arm/mach-ux500/include/mach/hardware.h
@@ -39,6 +39,17 @@
 #include <mach/db5500-regs.h>
 /*
+ * DDR Dies base addresses for PASR
+ */
+#define U8500_CS0_BASE_ADDR		0x00000000
+#define U8500_CS1_BASE_ADDR		0x10000000
+
+#define U9540_DDR0_CS0_BASE_ADDR	0x00000000
+#define U9540_DDR0_CS1_BASE_ADDR	0x20000000
+#define U9540_DDR1_CS0_BASE_ADDR	0xC0000000
+#define U9540_DDR1_CS1_BASE_ADDR	0xE0000000
+
+/*
  * FIFO offsets for IPs
  */
 #define MSP_TX_RX_REG_OFFSET	0
diff --git a/arch/arm/mach-ux500/include/mach/memory.h b/arch/arm/mach-ux500/include/mach/memory.h
index ada8ad0..5f5c339 100644
--- a/arch/arm/mach-ux500/include/mach/memory.h
+++ b/arch/arm/mach-ux500/include/mach/memory.h
@@ -15,6 +15,14 @@
 #define PLAT_PHYS_OFFSET	UL(0x00000000)
 #define BUS_OFFSET	UL(0x00000000)
+
+#ifdef CONFIG_UX500_PASR
+#define PASR_SECTION_SZ_BITS	26 /* 64MB sections */
+#define PASR_SECTION_SZ		(1 << PASR_SECTION_SZ_BITS)
+#define PASR_MAX_DIE_NR		4
+#define PASR_MAX_SECTION_NR_PER_DIE	8 /* 32 * 64MB = 2GB */
+#endif
+
 #ifdef CONFIG_UX500_SOC_DB8500
 /*
  * STE NMF CM driver only used on the U8500 allocate using dma_alloc_coherent:
diff --git a/drivers/mfd/db8500-prcmu.c b/drivers/mfd/db8500-prcmu.c
index 65a644d..db4ebd8 100644
--- a/drivers/mfd/db8500-prcmu.c
+++ b/drivers/mfd/db8500-prcmu.c
@@ -30,6 +30,7 @@
 #include <linux/mfd/dbx500-prcmu.h>
 #include <linux/regulator/db8500-prcmu.h>
 #include <linux/regulator/machine.h>
+#include <linux/ux500-pasr.h>
 #include <mach/hardware.h>
 #include <mach/irqs.h>
 #include <mach/db8500-regs.h>
@@ -105,6 +106,10 @@
 #define MB0H_CONFIG_WAKEUPS_EXE 1
 #define MB0H_READ_WAKEUP_ACK 3
 #define MB0H_CONFIG_WAKEUPS_SLEEP 4
+#define MB0H_SET_PASR_DDR0_CS0 5
+#define MB0H_SET_PASR_DDR0_CS1 6
+#define MB0H_SET_PASR_DDR1_CS0 7
+#define MB0H_SET_PASR_DDR1_CS1 8
 #define MB0H_WAKEUP_EXE 2
 #define MB0H_WAKEUP_SLEEP 5
@@ -116,6 +121,8 @@
 #define PRCM_REQ_MB0_DO_NOT_WFI		(PRCM_REQ_MB0 + 0x3)
 #define PRCM_REQ_MB0_WAKEUP_8500	(PRCM_REQ_MB0 + 0x4)
 #define PRCM_REQ_MB0_WAKEUP_4500	(PRCM_REQ_MB0 + 0x8)
+#define PRCM_REQ_MB0_PASR_MR16		(PRCM_REQ_MB0 + 0x0)
+#define PRCM_REQ_MB0_PASR_MR17		(PRCM_REQ_MB0 + 0x2)
 /* Mailbox 0 ACKs */
 #define PRCM_ACK_MB0_AP_PWRSTTR_STATUS	(PRCM_ACK_MB0 + 0x0)
@@ -3909,6 +3916,52 @@ static struct mfd_cell db8500_prcmu_devs[] = {
 	},
 };
+static struct ux500_pasr_data u9540_pasr_pdata[] = {
+	{
+		.base_addr = U9540_DDR0_CS0_BASE_ADDR,
+		.mailbox = MB0H_SET_PASR_DDR0_CS0,
+	},
+	{
+		.base_addr = U9540_DDR0_CS1_BASE_ADDR,
+		.mailbox = MB0H_SET_PASR_DDR0_CS1,
+	},
+	{
+		.base_addr = U9540_DDR1_CS0_BASE_ADDR,
+		.mailbox = MB0H_SET_PASR_DDR1_CS0,
+	},
+	{
+		.base_addr = U9540_DDR1_CS1_BASE_ADDR,
+		.mailbox = MB0H_SET_PASR_DDR1_CS1,
+	},
+	{
+		/* End marker */
+		.base_addr = 0xFFFFFFFF
+	},
+};
+
+static struct ux500_pasr_data u8500_pasr_pdata[] = {
+	{
+		.base_addr = U8500_CS0_BASE_ADDR,
+		.mailbox = MB0H_SET_PASR_DDR0_CS0,
+	},
+	{
+		.base_addr = U8500_CS1_BASE_ADDR,
+		.mailbox = MB0H_SET_PASR_DDR0_CS1,
+	},
+	{
+		/* End marker */
+		.base_addr = 0xFFFFFFFF
+	},
+};
+
+static struct mfd_cell ux500_pasr_devs[] = {
+	{
+		.name = "ux500-pasr",
+	},
+};
+
 /**
  * prcmu_fw_init - arch init call for the Linux PRCMU fw init logic
  *
@@ -3951,6 +4004,20 @@ static int __init db8500_prcmu_probe(struct platform_device *pdev)
 	else
 		pr_info("DB8500 PRCMU initialized\n");
+	if (cpu_is_u9540()) {
+		ux500_pasr_devs[0].platform_data = u9540_pasr_pdata;
+		ux500_pasr_devs[0].pdata_size = sizeof(u9540_pasr_pdata);
+	} else {
+		ux500_pasr_devs[0].platform_data = u8500_pasr_pdata;
+		ux500_pasr_devs[0].pdata_size = sizeof(u8500_pasr_pdata);
+	}
+
+	err = mfd_add_devices(&pdev->dev, 0, ux500_pasr_devs,
+			ARRAY_SIZE(ux500_pasr_devs), NULL,
+			0);
+	if (err)
+		pr_err("prcmu: Failed to add PASR subdevice\n");
+
 	/*
 	 * Temporary U9540 bringup code - Enable all clock gates.
 	 * Write 1 to all bits of PRCM_YYCLKEN0_MGT_SET and
diff --git a/drivers/staging/pasr/Kconfig b/drivers/staging/pasr/Kconfig
index 6bd2421..b8145e0 100644
--- a/drivers/staging/pasr/Kconfig
+++ b/drivers/staging/pasr/Kconfig
@@ -12,3 +12,8 @@ config PASR_DEBUG
 	bool "Add PASR debug prints"
 	def_bool n
 	depends on PASR
+
+config UX500_PASR
+	bool "Ux500 Family PASR driver"
+	def_bool n
+	depends on (PASR && UX500_SOC_DB8500)
diff --git a/drivers/staging/pasr/Makefile b/drivers/staging/pasr/Makefile
index d172294..0b18a79 100644
--- a/drivers/staging/pasr/Makefile
+++ b/drivers/staging/pasr/Makefile
@@ -1,5 +1,6 @@
 pasr-objs := helper.o init.o core.o
 obj-$(CONFIG_PASR)	+= pasr.o
+obj-$(CONFIG_UX500_PASR)	+= ux500.o
 ccflags-$(CONFIG_PASR_DEBUG) := -DDEBUG
diff --git a/drivers/staging/pasr/ux500.c b/drivers/staging/pasr/ux500.c
new file mode 100644
index 0000000..ce5df0c
--- /dev/null
+++ b/drivers/staging/pasr/ux500.c
@@ -0,0 +1,58 @@
+/*
+ * Copyright (C) ST-Ericsson SA 2012
+ * Author: Maxime Coquelin <maxime.coquelin@stericsson.com> for ST-Ericsson.
+ * License terms: GNU General Public License (GPL), version 2
+ */
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/mfd/dbx500-prcmu.h>
+#include <linux/pasr.h>
+#include <linux/ux500-pasr.h>
+
+static void ux500_pasr_apply_mask(u16 *mem_reg, void *cookie)
+{
+	printk(KERN_INFO "%s: cookie = %d, mem_reg = 0x%04x\n",
+			__func__, (int)cookie, *mem_reg);
+}
+
+static int ux500_pasr_probe(struct platform_device *pdev)
+{
+	int i;
+	struct ux500_pasr_data *pasr_data = dev_get_platdata(&pdev->dev);
+
+	if (!pasr_data)
+		return -ENODEV;
+
+	for (i = 0; pasr_data[i].base_addr != 0xFFFFFFFF; i++) {
+		phys_addr_t base = pasr_data[i].base_addr;
+
+		/*
+		 * We don't have specific structure pointer to pass, but only
+		 * DDR die channel in PRCMU. This may change in future
+		 * version.
+		 */
+		void *cookie = (void *)(int)pasr_data[i].mailbox;
+
+		if (pasr_register_mask_function(base,
+					&ux500_pasr_apply_mask,
+					cookie))
+			printk(KERN_ERR "Pasr register failed\n");
+	}
+
+	return 0;
+}
+
+static struct platform_driver ux500_pasr_driver = {
+	.probe = ux500_pasr_probe,
+	.driver = {
+		.name = "ux500-pasr",
+		.owner = THIS_MODULE,
+	},
+};
+
+static int __init ux500_pasr_init(void)
+{
+	return platform_driver_register(&ux500_pasr_driver);
+}
+module_init(ux500_pasr_init);
diff --git a/include/linux/ux500-pasr.h b/include/linux/ux500-pasr.h
new file mode 100644
index 0000000..c62d961
--- /dev/null
+++ b/include/linux/ux500-pasr.h
@@ -0,0 +1,11 @@
+/*
+ * Copyright (C) ST-Ericsson SA 2012
+ * Author: Maxime Coquelin <maxime.coquelin@stericsson.com> for ST-Ericsson.
+ * License terms: GNU General Public License (GPL), version 2
+ */
+
+struct ux500_pasr_data {
+	phys_addr_t base_addr;
+	u8 mailbox;
+};
* Maxime Coquelin <maxime.coquelin@stericsson.com> wrote:
The role of this framework is to stop the refresh of unused memory to enhance DDR power consumption.
I'm wondering in what scenarios this is useful, and how consistently it is useful.
The primary concern I can see is that on most Linux systems with an uptime more than a couple of minutes RAM gets used up by the Linux page-cache:
$ uptime
 14:46:39 up 11 days,  2:04, 19 users,  load average: 0.11, 0.29, 0.80
$ free
             total       used       free     shared    buffers     cached
Mem:      12255096   12030152     224944          0     651560    6000452
-/+ buffers/cache:    5378140    6876956
Even mobile phones easily have days of uptime - quite often weeks of uptime. I'd expect the page-cache to fill up RAM on such systems.
So how will this actually end up saving power consistently? Does it have to be combined with a VM policy that more aggressively flushes cached pages from the page-cache?
A secondary concern is fragmentation: right now we fragment memory rather significantly. For the Ux500 PASR driver you've implemented the section size is 64 MB. Do I interpret the code correctly in that a continuous, 64MB physical block of RAM has to be 100% free for us to be able to turn off refresh and power for this block of RAM?
Thanks,
Ingo
Dear Ingo,
On 01/30/2012 02:53 PM, Ingo Molnar wrote:
- Maxime Coquelin <maxime.coquelin@stericsson.com> wrote:
The role of this framework is to stop the refresh of unused memory to enhance DDR power consumption.
I'm wondering in what scenarios this is useful, and how consistently it is useful.
The primary concern I can see is that on most Linux systems with an uptime more than a couple of minutes RAM gets used up by the Linux page-cache:
$ uptime
 14:46:39 up 11 days,  2:04, 19 users,  load average: 0.11, 0.29, 0.80
$ free
             total       used       free     shared    buffers     cached
Mem:      12255096   12030152     224944          0     651560    6000452
-/+ buffers/cache:    5378140    6876956
Even mobile phones easily have days of uptime - quite often weeks of uptime. I'd expect the page-cache to fill up RAM on such systems.
So how will this actually end up saving power consistently? Does it have to be combined with a VM policy that more aggressively flushes cached pages from the page-cache?
You're right Ingo, the page-cache fills up the RAM. This framework is to be used in combination with a page-cache flush governor. In the case of a mobile phone, we can imagine dropping the caches when the system's screen has been off for a while, in order to preserve the user's experience.
A secondary concern is fragmentation: right now we fragment memory rather significantly.
Yes, I think fragmentation is the main challenge. This is the same problem faced by the Memory Hotplug feature. The solution I see is to add a significant Movable zone to the system and use the Compaction feature from Mel Gorman. The problem of course remains for the Normal zone.
For the Ux500 PASR driver you've implemented the section size is 64 MB. Do I interpret the code correctly in that a continuous, 64MB physical block of RAM has to be 100% free for us to be able to turn off refresh and power for this block of RAM?
Current DDR (2Gb/4Gb dies) used in mobile platforms have 64MB banks and segments. This is the lowest granularity for Partial Array Self-Refresh.
Thanks for your comments, Maxime
Thanks,
Ingo
* Maxime Coquelin <maxime.coquelin@stericsson.com> wrote:
Dear Ingo,
You're right Ingo, page-cache fills up the RAM. This framework is to be used in combination with a page-cache flush governor. In the case of a mobile phone, we can imagine dropping the cache when system's screen is off for a while, in order to preserve user's experience.
Is this "page-cache flush governor" some existing code? How does it work and does it need upstream patches?
A secondary concern is fragmentation: right now we fragment memory rather significantly.
Yes, I think fragmentation is the main challenge. This is the same problem faced for Memory Hotplug feature. The solution I see is to add a significant Movable zone in the system and use the Compaction feature from Mel Gorman. The problem of course remains for the Normal zone.
Ok. I guess phones/appliances can generally live with a relatively large movable zone as they don't have serious memory pressure issues.
For the Ux500 PASR driver you've implemented the section size is 64 MB. Do I interpret the code correctly in that a continuous, 64MB physical block of RAM has to be 100% free for us to be able to turn off refresh and power for this block of RAM?
Current DDR (2Gb/4Gb dies) used in mobile platforms have 64MB banks and segments. This is the lowest granularity for Partial Array Self-Refresh.
Ok, so do you see real, consistent power savings with a large movable zone, with page cache governor patches applied (assuming it's a kernel mechanism) and CONFIG_COMPACTION=y enabled, on an upstream kernel with all these patches applied?
Thanks,
Ingo
On 01/31/2012 01:39 PM, Ingo Molnar wrote:
You're right Ingo, page-cache fills up the RAM. This framework is to be used in combination with a page-cache flush governor. In the case of a mobile phone, we can imagine dropping the cache when system's screen is off for a while, in order to preserve user's experience.
Is this "page-cache flush governor" some existing code? How does it work and does it need upstream patches?
For now, such a governor has not been implemented. I use the dedicated ProcFS interface to test the framework (echo 3 > /proc/sys/vm/drop_caches).
A secondary concern is fragmentation: right now we fragment memory rather significantly.
Yes, I think fragmentation is the main challenge. This is the same problem faced for Memory Hotplug feature. The solution I see is to add a significant Movable zone in the system and use the Compaction feature from Mel Gorman. The problem of course remains for the Normal zone.
Ok. I guess phones/appliances can generally live with a relatively large movable zone as they don't have serious memory pressure issues.
Actually, current high-end smartphones and tablets have 1GB of DDR. Smartphones and tablets arriving later this year should have up to 2GB. For example, my Android phone running for 2 days has only 230MB used in idle once the page caches are dropped. So I think having a 1GB movable zone on a 2GB DDR phone is conceivable.
For the Ux500 PASR driver you've implemented the section size is 64 MB. Do I interpret the code correctly in that a continuous, 64MB physical block of RAM has to be 100% free for us to be able to turn off refresh and power for this block of RAM?
Current DDR (2Gb/4Gb dies) used in mobile platform have 64MB banks and segments. This is the lower granularity for Partial Array Self-refresh.
Ok, so do you see real, consistent power savings with a large movable zone, with page cache governor patches applied (assuming it's a kernel mechanism) and CONFIG_COMPACTION=y enabled, on an upstream kernel with all these patches applied?
I don't have consistent figures for now, as this is being prototyped. From the DDR datasheet I gathered, the DDR power saving is about 33% when half of the die is in self-refresh, compared to the full die in self-refresh.
Thanks for your comments, Maxime
Thanks,
Ingo