Based on common.git, android-3.4 branch.
Patch 2: Add support for page-wise cache flush in carveout_heap. Currently there is only one nent for the carveout heap, and likewise only a single dirty bit, so a cache flush always covers the whole carveout heap.
Patch 3: Add an OOM killer. Once the heap is exhausted, SIGKILL is sent to all tasks referring to the buffer, in descending oom_score_adj order.
Zhangfei Gao (3):
  gpu: ion: update carveout_heap_ops
  gpu: ion: carveout_heap page wised cache flush
  gpu: ion: oom killer
 drivers/gpu/ion/ion.c               | 112 ++++++++++++++++++++++++++++++++++-
 drivers/gpu/ion/ion_carveout_heap.c |  23 ++++++--
 2 files changed, 127 insertions(+), 8 deletions(-)
ion_device_add_heap fails without map_dma & unmap_dma.
Signed-off-by: Zhangfei Gao <zhangfei.gao@marvell.com>
---
 drivers/gpu/ion/ion_carveout_heap.c | 2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/drivers/gpu/ion/ion_carveout_heap.c b/drivers/gpu/ion/ion_carveout_heap.c
index 5b6255b..13f6e8d 100644
--- a/drivers/gpu/ion/ion_carveout_heap.c
+++ b/drivers/gpu/ion/ion_carveout_heap.c
@@ -145,6 +145,8 @@ static struct ion_heap_ops carveout_heap_ops = {
 	.map_user = ion_carveout_heap_map_user,
 	.map_kernel = ion_carveout_heap_map_kernel,
 	.unmap_kernel = ion_carveout_heap_unmap_kernel,
+	.map_dma = ion_carveout_heap_map_dma,
+	.unmap_dma = ion_carveout_heap_unmap_dma,
 };
struct ion_heap *ion_carveout_heap_create(struct ion_platform_heap *heap_data)
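As a minimal sketch of why these ops are mandatory (paraphrased from memory of android-3.4 common.git ion.c, so it is an assumption rather than part of this patch): ion_buffer_create() maps every buffer through the heap's map_dma op unconditionally and bails out if that fails, so a carveout heap registered without map_dma/unmap_dma cannot allocate anything.

	/* sketch of the core's behaviour, assuming android-3.4 ion.c */
	table = heap->ops->map_dma(heap, buffer);
	if (IS_ERR_OR_NULL(table)) {
		heap->ops->free(buffer);
		kfree(buffer);
		return ERR_PTR(PTR_ERR(table));
	}
	buffer->sg_table = table;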
Extend the dirty bit to one per PAGE_SIZE. Page-wise cache flush is supported and only takes effect for dirty buffers.
Signed-off-by: Zhangfei Gao <zhangfei.gao@marvell.com>
---
 drivers/gpu/ion/ion_carveout_heap.c | 21 ++++++++++++++++-----
 1 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/ion/ion_carveout_heap.c b/drivers/gpu/ion/ion_carveout_heap.c
index 13f6e8d..24f8ef2 100644
--- a/drivers/gpu/ion/ion_carveout_heap.c
+++ b/drivers/gpu/ion/ion_carveout_heap.c
@@ -88,25 +88,36 @@ struct sg_table *ion_carveout_heap_map_dma(struct ion_heap *heap,
 					      struct ion_buffer *buffer)
 {
 	struct sg_table *table;
-	int ret;
+	struct scatterlist *sg;
+	int ret, i;
+	int nents = PAGE_ALIGN(buffer->size) / PAGE_SIZE;
+	struct page *page = phys_to_page(buffer->priv_phys);

 	table = kzalloc(sizeof(struct sg_table), GFP_KERNEL);
 	if (!table)
 		return ERR_PTR(-ENOMEM);
-	ret = sg_alloc_table(table, 1, GFP_KERNEL);
+
+	ret = sg_alloc_table(table, nents, GFP_KERNEL);
 	if (ret) {
 		kfree(table);
 		return ERR_PTR(ret);
 	}
-	sg_set_page(table->sgl, phys_to_page(buffer->priv_phys), buffer->size,
-		    0);
+
+	sg = table->sgl;
+	for (i = 0; i < nents; i++) {
+		sg_set_page(sg, page + i, PAGE_SIZE, 0);
+		sg = sg_next(sg);
+	}
+
 	return table;
 }

 void ion_carveout_heap_unmap_dma(struct ion_heap *heap,
 				 struct ion_buffer *buffer)
 {
-	sg_free_table(buffer->sg_table);
+	if (buffer->sg_table)
+		sg_free_table(buffer->sg_table);
+	kfree(buffer->sg_table);
 }
void *ion_carveout_heap_map_kernel(struct ion_heap *heap,
On Wed, Aug 29, 2012 at 11:52 AM, Zhangfei Gao zhangfei.gao@marvell.com wrote:
Extend dirty bit per PAGE_SIZE Page wised cache flush is supported and only takes effect for dirty buffer
Signed-off-by: Zhangfei Gao zhangfei.gao@marvell.com
drivers/gpu/ion/ion_carveout_heap.c | 21 ++++++++++++++++----- 1 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/ion/ion_carveout_heap.c b/drivers/gpu/ion/ion_carveout_heap.c index 13f6e8d..24f8ef2 100644 --- a/drivers/gpu/ion/ion_carveout_heap.c +++ b/drivers/gpu/ion/ion_carveout_heap.c @@ -88,25 +88,36 @@ struct sg_table *ion_carveout_heap_map_dma(struct ion_heap *heap, struct ion_buffer *buffer) { struct sg_table *table;
int ret;
struct scatterlist *sg;
int ret, i;
int nents = PAGE_ALIGN(buffer->size) / PAGE_SIZE;
You can't use PAGE_ALIGN() & PAGE_SIZE so simply. Look at the code below.
struct ion_heap *ion_carveout_heap_create(struct ion_platform_heap *heap_data) { struct ion_carveout_heap *carveout_heap;
carveout_heap = kzalloc(sizeof(struct ion_carveout_heap), GFP_KERNEL); if (!carveout_heap) return ERR_PTR(-ENOMEM);
carveout_heap->pool = gen_pool_create(12, -1);
Although 12 here means PAGE_SIZE, you need to keep the 12 and your PAGE_SIZE consistent, so change it to gen_pool_create(PAGE_SHIFT, -1), because PAGE_SIZE may differ across architectures.
struct page *page = phys_to_page(buffer->priv_phys); table = kzalloc(sizeof(struct sg_table), GFP_KERNEL); if (!table) return ERR_PTR(-ENOMEM);
ret = sg_alloc_table(table, 1, GFP_KERNEL);
ret = sg_alloc_table(table, nents, GFP_KERNEL); if (ret) { kfree(table); return ERR_PTR(ret); }
sg_set_page(table->sgl, phys_to_page(buffer->priv_phys), buffer->size,
0);
sg = table->sgl;
for (i = 0; i < nents; i++) {
sg_set_page(sg, page + i, PAGE_SIZE, 0);
sg = sg_next(sg);
}
return table;
}
void ion_carveout_heap_unmap_dma(struct ion_heap *heap, struct ion_buffer *buffer) {
sg_free_table(buffer->sg_table);
if (buffer->sg_table)
sg_free_table(buffer->sg_table);
kfree(buffer->sg_table);
What's this? You needn't change anything here. Please read sg_free_table().
}
void *ion_carveout_heap_map_kernel(struct ion_heap *heap,
On Wed, Aug 29, 2012 at 1:15 PM, Haojian Zhuang haojian.zhuang@gmail.com wrote:
On Wed, Aug 29, 2012 at 11:52 AM, Zhangfei Gao zhangfei.gao@marvell.com wrote:
Extend dirty bit per PAGE_SIZE Page wised cache flush is supported and only takes effect for dirty buffer
Signed-off-by: Zhangfei Gao zhangfei.gao@marvell.com
drivers/gpu/ion/ion_carveout_heap.c | 21 ++++++++++++++++----- 1 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/ion/ion_carveout_heap.c b/drivers/gpu/ion/ion_carveout_heap.c index 13f6e8d..24f8ef2 100644 --- a/drivers/gpu/ion/ion_carveout_heap.c +++ b/drivers/gpu/ion/ion_carveout_heap.c @@ -88,25 +88,36 @@ struct sg_table *ion_carveout_heap_map_dma(struct ion_heap *heap, struct ion_buffer *buffer) { struct sg_table *table;
int ret;
struct scatterlist *sg;
int ret, i;
int nents = PAGE_ALIGN(buffer->size) / PAGE_SIZE;
You can't use PAGE_ALIGN() & PAGE_SIZE so simply. Look at these code in below.
struct ion_heap *ion_carveout_heap_create(struct ion_platform_heap *heap_data) { struct ion_carveout_heap *carveout_heap;
carveout_heap = kzalloc(sizeof(struct ion_carveout_heap), GFP_KERNEL); if (!carveout_heap) return ERR_PTR(-ENOMEM); carveout_heap->pool = gen_pool_create(12, -1);
Although 12 means PAGE_SIZE, you need to make 12 & your PAGE_SIZE consistent. So you need to change it to gen_pool_create(PAGE_SHIFT, -1).
Thanks Haojian. Will change gen_pool_create(12, -1) to gen_pool_create(PAGE_SHIFT, -1) as well.
Because PAGE_SIZE may be different on different architecture.
struct page *page = phys_to_page(buffer->priv_phys); table = kzalloc(sizeof(struct sg_table), GFP_KERNEL); if (!table) return ERR_PTR(-ENOMEM);
ret = sg_alloc_table(table, 1, GFP_KERNEL);
ret = sg_alloc_table(table, nents, GFP_KERNEL); if (ret) { kfree(table); return ERR_PTR(ret); }
sg_set_page(table->sgl, phys_to_page(buffer->priv_phys), buffer->size,
0);
sg = table->sgl;
for (i = 0; i < nents; i++) {
sg_set_page(sg, page + i, PAGE_SIZE, 0);
sg = sg_next(sg);
}
return table;
}
void ion_carveout_heap_unmap_dma(struct ion_heap *heap, struct ion_buffer *buffer) {
sg_free_table(buffer->sg_table);
if (buffer->sg_table)
sg_free_table(buffer->sg_table);
kfree(buffer->sg_table);
What's this? You needn't change anything at here. Please read sg_free_table().
Could you clarify more? sg_free_table() does not kfree the table, nor does it check whether the table is valid.
}
void *ion_carveout_heap_map_kernel(struct ion_heap *heap,
On Wed, Aug 29, 2012 at 1:54 PM, zhangfei gao zhangfei.gao@gmail.com wrote:
On Wed, Aug 29, 2012 at 1:15 PM, Haojian Zhuang haojian.zhuang@gmail.com wrote:
On Wed, Aug 29, 2012 at 11:52 AM, Zhangfei Gao zhangfei.gao@marvell.com wrote:
Extend dirty bit per PAGE_SIZE Page wised cache flush is supported and only takes effect for dirty buffer
Signed-off-by: Zhangfei Gao zhangfei.gao@marvell.com
drivers/gpu/ion/ion_carveout_heap.c | 21 ++++++++++++++++----- 1 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/ion/ion_carveout_heap.c b/drivers/gpu/ion/ion_carveout_heap.c index 13f6e8d..24f8ef2 100644 --- a/drivers/gpu/ion/ion_carveout_heap.c +++ b/drivers/gpu/ion/ion_carveout_heap.c @@ -88,25 +88,36 @@ struct sg_table *ion_carveout_heap_map_dma(struct ion_heap *heap, struct ion_buffer *buffer) { struct sg_table *table;
int ret;
struct scatterlist *sg;
int ret, i;
int nents = PAGE_ALIGN(buffer->size) / PAGE_SIZE;
You can't use PAGE_ALIGN() & PAGE_SIZE so simply. Look at these code in below.
struct ion_heap *ion_carveout_heap_create(struct ion_platform_heap *heap_data) { struct ion_carveout_heap *carveout_heap;
carveout_heap = kzalloc(sizeof(struct ion_carveout_heap), GFP_KERNEL); if (!carveout_heap) return ERR_PTR(-ENOMEM); carveout_heap->pool = gen_pool_create(12, -1);
Although 12 means PAGE_SIZE, you need to make 12 & your PAGE_SIZE consistent. So you need to change it to gen_pool_create(PAGE_SHIFT, -1).
Thanks Haojian, Will change gen_pool_create(12, -1) to gen_pool_create(PAGE_SHIFT, -1) as well.
Because PAGE_SIZE may be different on different architecture.
struct page *page = phys_to_page(buffer->priv_phys); table = kzalloc(sizeof(struct sg_table), GFP_KERNEL); if (!table) return ERR_PTR(-ENOMEM);
ret = sg_alloc_table(table, 1, GFP_KERNEL);
ret = sg_alloc_table(table, nents, GFP_KERNEL); if (ret) { kfree(table); return ERR_PTR(ret); }
sg_set_page(table->sgl, phys_to_page(buffer->priv_phys), buffer->size,
0);
sg = table->sgl;
for (i = 0; i < nents; i++) {
sg_set_page(sg, page + i, PAGE_SIZE, 0);
sg = sg_next(sg);
}
return table;
}
void ion_carveout_heap_unmap_dma(struct ion_heap *heap, struct ion_buffer *buffer) {
sg_free_table(buffer->sg_table);
if (buffer->sg_table)
sg_free_table(buffer->sg_table);
kfree(buffer->sg_table);
What's this? You needn't change anything at here. Please read sg_free_table().
Could you clarify more, sg_free_table does not kfree table, either check the table.
Oh, I'm sorry; it is a real bug. I think you can use another patch to fix it. And you needn't check buffer->sg_table, since unmap_dma() won't be called by ION if map_dma() fails.
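For reference, a minimal sketch of the simplified unmap_dma being suggested here (an assumption about the follow-up, not a patch posted in this thread): keep the kfree(), since sg_free_table() only frees the scatterlist that sg_alloc_table() allocated and not the kzalloc'd struct sg_table itself, and drop the NULL check because ION only calls unmap_dma() after a successful map_dma().

void ion_carveout_heap_unmap_dma(struct ion_heap *heap,
				 struct ion_buffer *buffer)
{
	/* releases the scatterlist allocated by sg_alloc_table() */
	sg_free_table(buffer->sg_table);
	/* the sg_table itself was kzalloc'd in map_dma(), so free it here */
	kfree(buffer->sg_table);
}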
Extend the dirty bit to one per PAGE_SIZE. Page-wise cache flush is supported and only takes effect for dirty buffers.
Signed-off-by: Zhangfei Gao <zhangfei.gao@marvell.com>
---
 drivers/gpu/ion/ion_carveout_heap.c | 23 +++++++++++++++++------
 1 files changed, 17 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/ion/ion_carveout_heap.c b/drivers/gpu/ion/ion_carveout_heap.c
index 13f6e8d..60e97e5 100644
--- a/drivers/gpu/ion/ion_carveout_heap.c
+++ b/drivers/gpu/ion/ion_carveout_heap.c
@@ -88,25 +88,36 @@ struct sg_table *ion_carveout_heap_map_dma(struct ion_heap *heap,
 					      struct ion_buffer *buffer)
 {
 	struct sg_table *table;
-	int ret;
+	struct scatterlist *sg;
+	int ret, i;
+	int nents = PAGE_ALIGN(buffer->size) / PAGE_SIZE;
+	struct page *page = phys_to_page(buffer->priv_phys);

 	table = kzalloc(sizeof(struct sg_table), GFP_KERNEL);
 	if (!table)
 		return ERR_PTR(-ENOMEM);
-	ret = sg_alloc_table(table, 1, GFP_KERNEL);
+
+	ret = sg_alloc_table(table, nents, GFP_KERNEL);
 	if (ret) {
 		kfree(table);
 		return ERR_PTR(ret);
 	}
-	sg_set_page(table->sgl, phys_to_page(buffer->priv_phys), buffer->size,
-		    0);
+
+	sg = table->sgl;
+	for (i = 0; i < nents; i++) {
+		sg_set_page(sg, page + i, PAGE_SIZE, 0);
+		sg = sg_next(sg);
+	}
+
 	return table;
 }

 void ion_carveout_heap_unmap_dma(struct ion_heap *heap,
 				 struct ion_buffer *buffer)
 {
-	sg_free_table(buffer->sg_table);
+	if (buffer->sg_table)
+		sg_free_table(buffer->sg_table);
+	kfree(buffer->sg_table);
 }

 void *ion_carveout_heap_map_kernel(struct ion_heap *heap,
@@ -157,7 +168,7 @@ struct ion_heap *ion_carveout_heap_create(struct ion_platform_heap *heap_data)
 	if (!carveout_heap)
 		return ERR_PTR(-ENOMEM);

-	carveout_heap->pool = gen_pool_create(12, -1);
+	carveout_heap->pool = gen_pool_create(PAGE_SHIFT, -1);
 	if (!carveout_heap->pool) {
 		kfree(carveout_heap);
 		return ERR_PTR(-ENOMEM);
ion_shrink is called when allocation fails with ION_CARVEOUT_ALLOCATE_FAIL.
How to test:
  mount -t debugfs none /mnt
  cat /mnt/ion/carveout_heap
  echo adj > /mnt/ion/carveout_heap
This sends SIGKILL to the task with the highest adj (above the written value) that is using an ion buffer, and also kills all other tasks referring to those buffers, so that nobody is left using a freed buffer.
Example:
cat /mnt/ion/carveout_heap
          client              pid             size    oom_score_adj
        ion_test              191             4096                0
        ion_test              192             4096                0
        ion_test              193             4096                0

echo -1 > /mnt/ion/carveout_heap
[ 1333.689318] SIGKILL pid: 191
[ 1333.692192] SIGKILL pid: 192
[ 1333.695436] SIGKILL pid: 193
[ 1333.698312] SIGKILL pid: 193 size: 4096 adj: 0
[1]+  Killed                     ./ion_test 2

cat /mnt/ion/carveout_heap
          client              pid             size    oom_score_adj
Signed-off-by: Zhangfei Gao <zhangfei.gao@marvell.com>
---
 drivers/gpu/ion/ion.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 109 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/ion/ion.c b/drivers/gpu/ion/ion.c
index 34c12df..3edaab0 100644
--- a/drivers/gpu/ion/ion.c
+++ b/drivers/gpu/ion/ion.c
@@ -32,6 +32,7 @@
 #include <linux/uaccess.h>
 #include <linux/debugfs.h>
 #include <linux/dma-buf.h>
+#include <linux/oom.h>

 #include "ion_priv.h"

@@ -336,6 +337,8 @@ static void ion_handle_add(struct ion_client *client, struct ion_handle *handle)
 	rb_insert_color(&handle->node, &client->handles);
 }

+static int ion_shrink(struct ion_heap *heap, int kill_adj);
+
 struct ion_handle *ion_alloc(struct ion_client *client, size_t len,
			     size_t align, unsigned int heap_mask,
			     unsigned int flags)
@@ -367,7 +370,12 @@ struct ion_handle *ion_alloc(struct ion_client *client, size_t len,
 		/* if the caller didn't specify this heap type */
 		if (!((1 << heap->id) & heap_mask))
 			continue;
+retry:
 		buffer = ion_buffer_create(heap, dev, len, align, flags);
+		if (buffer == ERR_PTR(ION_CARVEOUT_ALLOCATE_FAIL)) {
+			if (!ion_shrink(heap, 0))
+				goto retry;
+		}
 		if (!IS_ERR_OR_NULL(buffer))
 			break;
 	}
@@ -1144,7 +1152,8 @@ static int ion_debug_heap_show(struct seq_file *s, void *unused)
 	struct ion_device *dev = heap->dev;
 	struct rb_node *n;

-	seq_printf(s, "%16.s %16.s %16.s\n", "client", "pid", "size");
+	seq_printf(s, "%16.s %16.s %16.s %16.s\n",
+			"client", "pid", "size", "oom_score_adj");

 	for (n = rb_first(&dev->clients); n; n = rb_next(n)) {
 		struct ion_client *client = rb_entry(n, struct ion_client,
@@ -1156,8 +1165,9 @@ static int ion_debug_heap_show(struct seq_file *s, void *unused)
 			char task_comm[TASK_COMM_LEN];

 			get_task_comm(task_comm, client->task);
-			seq_printf(s, "%16.s %16u %16u\n", task_comm,
-				   client->pid, size);
+			seq_printf(s, "%16.s %16u %16u %16u\n", task_comm,
+				   client->pid, size,
+				   client->task->signal->oom_score_adj);
 		} else {
 			seq_printf(s, "%16.s %16u %16u\n", client->name,
 				   client->pid, size);
@@ -1171,13 +1181,109 @@ static int ion_debug_heap_open(struct inode *inode, struct file *file)
 	return single_open(file, ion_debug_heap_show, inode->i_private);
 }

+static ssize_t
+ion_debug_heap_write(struct file *file, const char __user *ubuf,
+				size_t count, loff_t *ppos)
+{
+	struct ion_heap *heap =
+		((struct seq_file *)file->private_data)->private;
+	char buf[16];
+	long kill_adj, ret;
+
+	memset(buf, 0, sizeof(buf));
+
+	if (copy_from_user(&buf, ubuf, min_t(size_t, sizeof(buf) - 1, count)))
+		return -EFAULT;
+
+	ret = kstrtol(buf, 10, &kill_adj);
+	if (ret)
+		return ret;
+
+	ion_shrink(heap, kill_adj);
+
+	return count;
+}
 static const struct file_operations debug_heap_fops = {
 	.open = ion_debug_heap_open,
+	.write = ion_debug_heap_write,
 	.read = seq_read,
 	.llseek = seq_lseek,
 	.release = single_release,
 };

+/*
+ * ion_shrink
+ * kill all tasks referd the buffer by selected task
+ */
+static int ion_shrink(struct ion_heap *heap, int kill_adj)
+{
+	struct rb_node *n;
+	struct ion_client *client = NULL;
+	struct ion_device *dev = heap->dev;
+	struct task_struct *selected = NULL;
+	int selected_size = 0;
+	int selected_oom_score_adj = 0;
+
+	for (n = rb_first(&dev->clients); n; n = rb_next(n)) {
+		size_t size;
+		struct task_struct *p;
+
+		client = rb_entry(n, struct ion_client, node);
+		if (!client->task)
+			continue;
+
+		p = client->task;
+
+		if ((p->signal->oom_score_adj <= kill_adj) ||
+		    (p->signal->oom_score_adj < selected_oom_score_adj))
+			continue;
+
+		size = ion_debug_heap_total(client, heap->type);
+		if (!size)
+			continue;
+		if (size < selected_size)
+			continue;
+
+		selected = p;
+		selected_size = size;
+		selected_oom_score_adj = p->signal->oom_score_adj;
+	}
+
+	if (selected) {
+		/* kill all proeces refer buffer shared with this client */
+		mutex_lock(&client->lock);
+		for (n = rb_first(&client->handles); n; n = rb_next(n)) {
+			struct rb_node *r;
+			struct ion_client *c;
+			struct ion_handle *handle = rb_entry(n,
+						struct ion_handle,
+						node);
+
+			for (r = rb_first(&dev->clients); r; r = rb_next(r)) {
+				struct ion_handle *h;
+
+				c = rb_entry(r, struct ion_client, node);
+				h = ion_handle_lookup(c, handle->buffer);
+				if (!IS_ERR_OR_NULL(h)) {
+					send_sig(SIGKILL, c->task, 0);
+					pr_info("SIGKILL pid: %u\n",
+						c->task->pid);
+				}
+
+			}
+		}
+		mutex_unlock(&client->lock);
+
+		send_sig(SIGKILL, selected, 0);
+		set_tsk_thread_flag(selected, TIF_MEMDIE);
+		pr_info("SIGKILL pid: %u size: %u adj: %u\n",
+			selected->pid, selected_size,
+			selected_oom_score_adj);
+		return 0;
+	}
+	return -EAGAIN;
+}
+
 void ion_device_add_heap(struct ion_device *dev, struct ion_heap *heap)
 {
 	struct rb_node **p = &dev->heaps.rb_node;
On Wed, Aug 29, 2012 at 11:52 AM, Zhangfei Gao zhangfei.gao@marvell.com wrote:
ion_shrink is called when ION_CARVEOUT_ALLOCATE_FAIL
How to test: mount -t debugfs none /mnt cat /mnt/ion/carveout_heap echo adj > /mnt/ion/carveout_heap send_sig SIGKILL to the task with higher adj and using ion buffer Also kill all others tasks refered the buffers, in case using empty buffer
Example:
cat /mnt/ion/carveout_heap client pid size oom_score_adj ion_test 191 4096 0 ion_test 192 4096 0 ion_test 193 4096 0
echo -1 > /mnt/ion/carveout_heap [ 1333.689318] SIGKILL pid: 191 [ 1333.692192] SIGKILL pid: 192 [ 1333.695436] SIGKILL pid: 193 [ 1333.698312] SIGKILL pid: 193 size: 4096 adj: 0 [1]+ Killed ./ion_test 2
cat /mnt/ion/carveout_heap client pid size oom_score_adj
Signed-off-by: Zhangfei Gao zhangfei.gao@marvell.com
Although it's good for ensuring ION can always allocate memory, I doubt this killer will be merged into the Android kernel mainline.
Nobody cares about an OOM killer for ION, since the CMA heap may be used in the future. Although fragmentation also exists in a CMA heap, the usual solution is to declare a larger CMA area.
ion_buffer_create() is called with dev->lock acquired, so dev->lock would have to be released and the caller scheduled out before the retry, so that the process being killed can free up all its memory. Otherwise it will get blocked while freeing its first buffer in ion_buffer_destroy(), when it tries to acquire dev->lock. The debugfs view of carveout_heap might then show the client as no longer using the heap even though the memory is not freed: if the buffer was mapped to userspace, ion_share_dma_buf() would have taken a reference on the ion_buffer, so all the ion_buffer_put() calls will succeed (as long as the dma_buf fd is not released first), and thus all the ion_handle_put() calls will succeed and remove the handles from the client before ion_client_destroy() tries to acquire dev->lock.
A lowmem killer with multiple thresholds and oom_adj values, similar to the Android lowmem killer, would also be an option, but it is a matter of policy whether each carveout/CMA heap needs its own lowmem killer or not. In some cases a fallback to other heaps would be OK, and in other cases falling back to heaps like kmalloc and vmalloc, or even to other carveout/CMA areas, might not be an option.
Is there any particular reason to use send_sig() and set_tsk_thread_flag() instead of force_sig()? Just curious to know.
- Nishanth Peethambaran
On Wed, Aug 29, 2012 at 10:50 AM, Haojian Zhuang haojian.zhuang@gmail.com wrote:
On Wed, Aug 29, 2012 at 11:52 AM, Zhangfei Gao zhangfei.gao@marvell.com wrote:
ion_shrink is called when ION_CARVEOUT_ALLOCATE_FAIL
How to test: mount -t debugfs none /mnt cat /mnt/ion/carveout_heap echo adj > /mnt/ion/carveout_heap send_sig SIGKILL to the task with higher adj and using ion buffer Also kill all others tasks refered the buffers, in case using empty buffer
Example:
cat /mnt/ion/carveout_heap client pid size oom_score_adj ion_test 191 4096 0 ion_test 192 4096 0 ion_test 193 4096 0
echo -1 > /mnt/ion/carveout_heap [ 1333.689318] SIGKILL pid: 191 [ 1333.692192] SIGKILL pid: 192 [ 1333.695436] SIGKILL pid: 193 [ 1333.698312] SIGKILL pid: 193 size: 4096 adj: 0 [1]+ Killed ./ion_test 2
cat /mnt/ion/carveout_heap client pid size oom_score_adj
Signed-off-by: Zhangfei Gao zhangfei.gao@marvell.com
Although it's good for keeping ION can always allocating memory, I doubt this killer won't be merged in android kernel mainline.
Nobody care OOM killer on ION since CMA heap may be used in the future. Although fragment also exists in CMA heap, the solution is used to declare a larger CMA area.
On Thu, Aug 30, 2012 at 5:34 PM, Nishanth Peethambaran nishanth.peethu@gmail.com wrote:
Thanks Nishanth for careful review.
ion_buffer_create is called with dev->lock acquired and so dev->lock would have to be freed and schedule out before retry so that the process getting killed can free up all the memory. Else, it will get blocked while freeing the first buffer in ion_buffer_destroy() when it tries to acquire the dev->lock.
Your concern is correct; my mistake was to assume there is no problem just because the debugfs test shows no problem. Will resend an updated patch (a rough sketch follows):
1. move ion_shrink out of the mutex.
2. check the error flag ERR_PTR(-ENOMEM).
3. add msleep to allow scheduling out.
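A rough sketch of what that rework could look like (the names follow the posted patch, but the structure below is an assumption, not the resent patch): the shrink-and-retry happens with dev->lock dropped, so the killed task can reach ion_buffer_destroy() and actually free its buffers before the allocation is retried.

	/* inside ion_alloc(), sketch only; msleep() needs <linux/delay.h> */
retry:
	mutex_lock(&dev->lock);
	/* ... walk the heaps and call ion_buffer_create() ... */
	mutex_unlock(&dev->lock);

	if (IS_ERR_OR_NULL(buffer)) {
		/* covers both ERR_PTR(-ENOMEM) and ION_CARVEOUT_ALLOCATE_FAIL */
		if (!ion_shrink(heap, 0)) {
			msleep(20);	/* let the killed task run and free memory */
			goto retry;
		}
	}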
debugfs of carveout_heap might show the client not using the heap though memory is not freed. If the buffer was mapped to userspace, ion_share_dma_buf() would have taken one reference of ion_buffer() So, all the ion_buffer_put() will succeed if dma_buf fd is not released first and thus all the ion_handle_put() will succeed and will remove the handles from the client before ion_client_destroy() will try to acquire dev->lock()
Sorry, I don't fully understand. ion_shrink() will send_sig() according to the used size; does this solve your concern?
A lowmem killer with multiple threshold, omm_adj values similar to android lowmem killer would be also an option but it is a matter of policy whether each carveout/cma heap needs to do a lowmem killer or not. For some cases, fallback to other heaps would be ok and for some cases fallback to other heaps like kmalloc and vmalloc or even other carveout/cma areas might not be an option.
This simple killer follows lowmemorykiller. It is useful for us, since we have not enabled CMA yet.
Is there any particular reason to use send_sig() and set_tsk_thread_flag() instead of force_sig(). Just curious to know.
This code follows lowmemorykiller.c. set_tsk_thread_flag() is required whether send_sig() or force_sig() is used. However, during stress testing force_sig() may cause a NULL pointer error in the system, while send_sig() is much more stable.
Besides, the stress test on a non-Android system manually calls ion_shrink(heap, -1), whereas normally ion_shrink(heap, 0) is used, which does not kill tasks with adj <= 0.
- Nishanth Peethambaran
On Wed, Aug 29, 2012 at 10:50 AM, Haojian Zhuang haojian.zhuang@gmail.com wrote:
On Wed, Aug 29, 2012 at 11:52 AM, Zhangfei Gao zhangfei.gao@marvell.com wrote:
ion_shrink is called when ION_CARVEOUT_ALLOCATE_FAIL
How to test: mount -t debugfs none /mnt cat /mnt/ion/carveout_heap echo adj > /mnt/ion/carveout_heap send_sig SIGKILL to the task with higher adj and using ion buffer Also kill all others tasks refered the buffers, in case using empty buffer
Example:
cat /mnt/ion/carveout_heap client pid size oom_score_adj ion_test 191 4096 0 ion_test 192 4096 0 ion_test 193 4096 0
echo -1 > /mnt/ion/carveout_heap [ 1333.689318] SIGKILL pid: 191 [ 1333.692192] SIGKILL pid: 192 [ 1333.695436] SIGKILL pid: 193 [ 1333.698312] SIGKILL pid: 193 size: 4096 adj: 0 [1]+ Killed ./ion_test 2
cat /mnt/ion/carveout_heap client pid size oom_score_adj
Signed-off-by: Zhangfei Gao zhangfei.gao@marvell.com
Although it's good for keeping ION can always allocating memory, I doubt this killer won't be merged in android kernel mainline.
Nobody care OOM killer on ION since CMA heap may be used in the future. Although fragment also exists in CMA heap, the solution is used to declare a larger CMA area.
On Fri, Aug 31, 2012 at 2:27 PM, zhangfei gao zhangfei.gao@gmail.com wrote:
On Thu, Aug 30, 2012 at 5:34 PM, Nishanth Peethambaran
Thanks Nishanth for careful review.
nishanth.peethu@gmail.com wrote:
ion_buffer_create is called with dev->lock acquired and so dev->lock would have to be freed and schedule out before retry so that the process getting killed can free up all the memory. Else, it will get blocked while freeing the first buffer in ion_buffer_destroy() when it tries to acquire the dev->lock.
Your concern is correct. My mistake for considering no problem if debugfs test no problem. Will resend the updated the patch,
- move ion_shrink out of mutex.
- check error flag of ERR_PTR(-ENOMEM)
- add msleep to allow schedule out.
debugfs of carveout_heap might show the client not using the heap though memory is not freed. If the buffer was mapped to userspace, ion_share_dma_buf() would have taken one reference of ion_buffer() So, all the ion_buffer_put() will succeed if dma_buf fd is not released first and thus all the ion_handle_put() will succeed and will remove the handles from the client before ion_client_destroy() will try to acquire dev->lock()
Sorry, not fully understand. The ion_shrink will send_sig according to used size, is this solve your concern.
It was an explanation of how the first issue can get hidden by debugfs. The client would be removed even though the buffers may not actually get freed, and so it won't be shown in debugfs as a client of the heap.
A lowmem killer with multiple threshold, omm_adj values similar to android lowmem killer would be also an option but it is a matter of policy whether each carveout/cma heap needs to do a lowmem killer or not. For some cases, fallback to other heaps would be ok and for some cases fallback to other heaps like kmalloc and vmalloc or even other carveout/cma areas might not be an option.
This simple killer is refer lownmemorykiller. It is useful for us, since we still not enable CMA.
The lowmem killer should kick in once the free space goes below a threshold or the heap gets fragmented, but in the attached patch it kicks in only when allocation fails. Refer to the comparison of free space to lowmem_minfree[n] done in lowmem_shrink(). Even for CMA, a lowmem killer would be needed unless a huge area is reserved for CMA, which will restrict the area available for non-movable allocations.
Is there any particular reason to use send_sig() and set_tsk_thread_flag() instead of force_sig(). Just curious to know.
This staff is referring lowmemorykiller.c set_tsk_thread_flag is required no matter use send_sig() or force_sig(). However, during stress testing, force_sig() may cause system NULL pointer error. While send_sig() is much stable.
Besides, the stress test on non android system, is manually ion_shrink(heap, -1). While usually ion_shrink(heap, 0), without kill adj <= 0.
OK. I am in the process of migrating to 3.4 and so was referring to the 3.0 lowmemkiller, which uses force_sig().
- Nishanth Peethambaran
On Wed, Aug 29, 2012 at 10:50 AM, Haojian Zhuang haojian.zhuang@gmail.com wrote:
On Wed, Aug 29, 2012 at 11:52 AM, Zhangfei Gao zhangfei.gao@marvell.com wrote:
ion_shrink is called when ION_CARVEOUT_ALLOCATE_FAIL
How to test: mount -t debugfs none /mnt cat /mnt/ion/carveout_heap echo adj > /mnt/ion/carveout_heap send_sig SIGKILL to the task with higher adj and using ion buffer Also kill all others tasks refered the buffers, in case using empty buffer
Example:
cat /mnt/ion/carveout_heap client pid size oom_score_adj ion_test 191 4096 0 ion_test 192 4096 0 ion_test 193 4096 0
echo -1 > /mnt/ion/carveout_heap [ 1333.689318] SIGKILL pid: 191 [ 1333.692192] SIGKILL pid: 192 [ 1333.695436] SIGKILL pid: 193 [ 1333.698312] SIGKILL pid: 193 size: 4096 adj: 0 [1]+ Killed ./ion_test 2
cat /mnt/ion/carveout_heap client pid size oom_score_adj
Signed-off-by: Zhangfei Gao zhangfei.gao@marvell.com
Although it's good for keeping ION can always allocating memory, I doubt this killer won't be merged in android kernel mainline.
Nobody care OOM killer on ION since CMA heap may be used in the future. Although fragment also exists in CMA heap, the solution is used to declare a larger CMA area.
On Fri, Aug 31, 2012 at 8:43 PM, Nishanth Peethambaran nishanth.peethu@gmail.com wrote:
On Fri, Aug 31, 2012 at 2:27 PM, zhangfei gao zhangfei.gao@gmail.com wrote:
On Thu, Aug 30, 2012 at 5:34 PM, Nishanth Peethambaran
Thanks Nishanth for careful review.
nishanth.peethu@gmail.com wrote:
ion_buffer_create is called with dev->lock acquired and so dev->lock would have to be freed and schedule out before retry so that the process getting killed can free up all the memory. Else, it will get blocked while freeing the first buffer in ion_buffer_destroy() when it tries to acquire the dev->lock.
Your concern is correct. My mistake for considering no problem if debugfs test no problem. Will resend the updated the patch,
- move ion_shrink out of mutex.
- check error flag of ERR_PTR(-ENOMEM)
- add msleep to allow schedule out.
debugfs of carveout_heap might show the client not using the heap though memory is not freed. If the buffer was mapped to userspace, ion_share_dma_buf() would have taken one reference of ion_buffer() So, all the ion_buffer_put() will succeed if dma_buf fd is not released first and thus all the ion_handle_put() will succeed and will remove the handles from the client before ion_client_destroy() will try to acquire dev->lock()
Sorry, not fully understand. The ion_shrink will send_sig according to used size, is this solve your concern.
It was an explanation of how the first issue can get hidden by debugfs. Client would be removed though the buffers may not get actually freed and so won't be shown in debugfs as a client of the heap.
A lowmem killer with multiple threshold, omm_adj values similar to android lowmem killer would be also an option but it is a matter of policy whether each carveout/cma heap needs to do a lowmem killer or not. For some cases, fallback to other heaps would be ok and for some cases fallback to other heaps like kmalloc and vmalloc or even other carveout/cma areas might not be an option.
This simple killer is refer lownmemorykiller. It is useful for us, since we still not enable CMA.
The lowmemkiller should kick in once the frees space goes below a threshold or heap gets fragmented, but in the attached patch, it kicks it only when allocation fails. Refer to the comparison of free space to lowmem_minfree[n] done in lowmem_shrink(). Even for CMA, lowmemkiller would be needed unless a huge area is reserved for CMA which will restrict the area available for non-movable allocations.
Could you help correct my understanding?
Though lowmemorykiller provides lowmem_minfree and lowmem_adj, lowmem_shrink is registered as a shrink function and is only called when allocation fails.
3.4: mm/page_alloc.c: __alloc_pages_nodemask
  -> if (unlikely(!page)) __alloc_pages_slowpath -> wake_all_kswapd
  -> vmscan.c: kswapd -> do_shrinker_shrink
  -> lowmemorykiller.c: lowmem_shrink
Simple experiment in my environment, with the defaults:
static int lowmem_minfree[6] = {
	3 * 512,	/* 6MB */
	2 * 1024,	/* 8MB */
	4 * 1024,	/* 16MB */
	16 * 1024,	/* 64MB */
};
# cd /tmp
# dd if=/dev/zero of=del bs=1M count=350
# free
             total       used       free     shared    buffers
Mem:        384576     377588       6988          0          0
Swap:            0          0          0
Total:      384576     377588       6988
lowmem_shrink is not triggered even when free = 6M; it only happens when free = 3M, presumably when allocation fails.
The conclusion is that lowmem_shrink is only triggered when allocation fails, and lowmem_minfree and lowmem_adj only matter at that point. There is no daemon monitoring whether system memory has reached the threshold.
I am not sure whether this understanding is correct; however, in our ION OOM killer we also trigger the shrink function when allocation fails, which is simple. Otherwise we would have to start a kernel task to keep monitoring the ION size, which looks complicated.
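For comparison, a hedged sketch of what hooking an ION heap into the same shrinker path that drives lowmem_shrink() could look like on 3.4; ion_heap_pages and my_heap are illustrative assumptions, not existing ION code.

#include <linux/shrinker.h>	/* struct shrinker, register_shrinker() */

static int ion_heap_shrink(struct shrinker *s, struct shrink_control *sc)
{
	if (!sc->nr_to_scan)
		return ion_heap_pages;	/* just report how much is in use */
	/* try to free something, e.g. by killing a low-priority ION client */
	ion_shrink(my_heap, 0);
	return ion_heap_pages;
}

static struct shrinker ion_heap_shrinker = {
	.shrink = ion_heap_shrink,
	.seeks = DEFAULT_SEEKS,
};

	/* at heap creation time */
	register_shrinker(&ion_heap_shrinker);

This would let the heap killer run from kswapd/direct reclaim rather than only on an ION allocation failure, which is the trade-off being discussed above.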
Is there any particular reason to use send_sig() and set_tsk_thread_flag() instead of force_sig(). Just curious to know.
This staff is referring lowmemorykiller.c set_tsk_thread_flag is required no matter use send_sig() or force_sig(). However, during stress testing, force_sig() may cause system NULL pointer error. While send_sig() is much stable.
Besides, the stress test on non android system, is manually ion_shrink(heap, -1). While usually ion_shrink(heap, 0), without kill adj <= 0.
Ok. I am in the process of migration to 3.4 and so was referring to 3.0 lowmemkiller which uses force_sig().
drivers/staging/android/lowmemorykiller.c: 3.0 uses force_sig(); 3.4 uses send_sig(SIGKILL, selected, 0); set_tsk_thread_flag(selected, TIF_MEMDIE);
- Nishanth Peethambaran
On Mon, Sep 3, 2012 at 11:02 AM, zhangfei gao zhangfei.gao@gmail.com wrote:
On Fri, Aug 31, 2012 at 8:43 PM, Nishanth Peethambaran nishanth.peethu@gmail.com wrote:
On Fri, Aug 31, 2012 at 2:27 PM, zhangfei gao zhangfei.gao@gmail.com wrote:
On Thu, Aug 30, 2012 at 5:34 PM, Nishanth Peethambaran
Thanks Nishanth for careful review.
nishanth.peethu@gmail.com wrote:
ion_buffer_create is called with dev->lock acquired and so dev->lock would have to be freed and schedule out before retry so that the process getting killed can free up all the memory. Else, it will get blocked while freeing the first buffer in ion_buffer_destroy() when it tries to acquire the dev->lock.
Your concern is correct. My mistake for considering no problem if debugfs test no problem. Will resend the updated the patch,
- move ion_shrink out of mutex.
- check error flag of ERR_PTR(-ENOMEM)
- add msleep to allow schedule out.
debugfs of carveout_heap might show the client not using the heap though memory is not freed. If the buffer was mapped to userspace, ion_share_dma_buf() would have taken one reference of ion_buffer() So, all the ion_buffer_put() will succeed if dma_buf fd is not released first and thus all the ion_handle_put() will succeed and will remove the handles from the client before ion_client_destroy() will try to acquire dev->lock()
Sorry, not fully understand. The ion_shrink will send_sig according to used size, is this solve your concern.
It was an explanation of how the first issue can get hidden by debugfs. Client would be removed though the buffers may not get actually freed and so won't be shown in debugfs as a client of the heap.
Rebecca's new update in debugfs prints the orphaned buffers and the client which allocated/imported it last.
A lowmem killer with multiple threshold, omm_adj values similar to android lowmem killer would be also an option but it is a matter of policy whether each carveout/cma heap needs to do a lowmem killer or not. For some cases, fallback to other heaps would be ok and for some cases fallback to other heaps like kmalloc and vmalloc or even other carveout/cma areas might not be an option.
This simple killer is refer lownmemorykiller. It is useful for us, since we still not enable CMA.
The lowmemkiller should kick in once the frees space goes below a threshold or heap gets fragmented, but in the attached patch, it kicks it only when allocation fails. Refer to the comparison of free space to lowmem_minfree[n] done in lowmem_shrink(). Even for CMA, lowmemkiller would be needed unless a huge area is reserved for CMA which will restrict the area available for non-movable allocations.
Could you help correct my understand.
Though lowmem provide lowmem_minfree and lowmem_adj, lowmem_shrink registered as shrink function and only be called when allocation fail.
3.4: mm/page_alloc.c __alloc_pages_nodemask -> if (unlikely(!page)) __alloc_pages_slowpath -> wake_all_kswapd -> vmscan.c kswapd -> do_shrinker_shrink -> lowmemorykiller.c lowmem_shrink
Simple experiment in my environment. by default static int lowmem_minfree[6] = { 3 * 512, /* 6MB */ 2 * 1024, /* 8MB */ 4 * 1024, /* 16MB */ 16 * 1024, /* 64MB */ }; # cd /tmp # dd if=/dev/zero of=del bs=1M count=350 #free total used free shared buffers Mem: 384576 377588 6988 0 0 Swap: 0 0 0 Total: 384576 377588 6988
lowmem_shrink is not triggered even free=6M and only happen when free=3M, suppose allocation fail.
Conclusion is lowmem_shrink only be triggered when allocation fail. While lowmem_minfree and lowmem_adj are useful when allocation fail. And there is no dameon monitoring system memory touch to the threshold.
I am not sure whether the understand is correct, however in our ion oom killer we also triggered the shrink function when allocation fail, which is simple. Or we have to start a kernel task keep monitoring ion size? looks complicated.
I may be wrong here; I have also started looking into Linux and mm only recently. My understanding is that kswapd is a daemon and will be kicked in periodically as well, not only when alloc_page fails. So, to be more pro-active, the min-free threshold can be kept higher, which will ensure background apps get killed when free memory falls below the threshold, so the allocation won't fail. What I experimented with was implementing the OOM killer for an individual heap. Memory in use is tracked by the alloc/free functions of the heap inside a heap variable. With every allocation, this is compared to a threshold to see whether background apps should be killed. This works fine and avoids allocation failures, so the timeout-and-retry logic is hit only in rare cases. The issue here is that I haven't found a clean way to define the policy in a generic fashion. For example, if multiple heaps are used, the lowmem killer per heap should kick in when you are not relying on fallback heaps.
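A minimal sketch of the per-heap accounting described above (the added fields and the threshold are illustrative assumptions, not from any posted patch):

struct ion_carveout_heap {
	struct ion_heap heap;
	struct gen_pool *pool;
	ion_phys_addr_t base;
	/* hypothetical additions for a pro-active per-heap killer */
	size_t total_size;	/* size reserved for this carveout */
	size_t used_size;	/* updated by the allocate()/free() ops */
	size_t min_free;	/* kill background clients below this */
};

static void carveout_check_watermark(struct ion_carveout_heap *ch)
{
	/* called from the heap's allocate() op after used_size is updated */
	if (ch->total_size - ch->used_size < ch->min_free)
		ion_shrink(&ch->heap, 0);	/* would need to be reachable from heap code */
}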
Is there any particular reason to use send_sig() and set_tsk_thread_flag() instead of force_sig(). Just curious to know.
This staff is referring lowmemorykiller.c set_tsk_thread_flag is required no matter use send_sig() or force_sig(). However, during stress testing, force_sig() may cause system NULL pointer error. While send_sig() is much stable.
Besides, the stress test on non android system, is manually ion_shrink(heap, -1). While usually ion_shrink(heap, 0), without kill adj <= 0.
Ok. I am in the process of migration to 3.4 and so was referring to 3.0 lowmemkiller which uses force_sig().
drivers/staging/android/lowmemorykiller.c 3.0 use force_sig, 3.4: send_sig(SIGKILL, selected, 0); set_tsk_thread_flag(selected, TIF_MEMDIE);
- Nishanth Peethambaran
On Mon, Sep 3, 2012 at 2:38 PM, Nishanth Peethambaran nishanth.peethu@gmail.com wrote:
The lowmemkiller should kick in once the frees space goes below a threshold or heap gets fragmented, but in the attached patch, it kicks it only when allocation fails. Refer to the comparison of free space to lowmem_minfree[n] done in lowmem_shrink(). Even for CMA, lowmemkiller would be needed unless a huge area is reserved for CMA which will restrict the area available for non-movable allocations.
Could you help correct my understand.
Though lowmem provide lowmem_minfree and lowmem_adj, lowmem_shrink registered as shrink function and only be called when allocation fail.
3.4: mm/page_alloc.c __alloc_pages_nodemask -> if (unlikely(!page)) __alloc_pages_slowpath -> wake_all_kswapd -> vmscan.c kswapd -> do_shrinker_shrink -> lowmemorykiller.c lowmem_shrink
Simple experiment in my environment. by default static int lowmem_minfree[6] = { 3 * 512, /* 6MB */ 2 * 1024, /* 8MB */ 4 * 1024, /* 16MB */ 16 * 1024, /* 64MB */ }; # cd /tmp # dd if=/dev/zero of=del bs=1M count=350 #free total used free shared buffers Mem: 384576 377588 6988 0 0 Swap: 0 0 0 Total: 384576 377588 6988
lowmem_shrink is not triggered even free=6M and only happen when free=3M, suppose allocation fail.
Conclusion is lowmem_shrink only be triggered when allocation fail. While lowmem_minfree and lowmem_adj are useful when allocation fail. And there is no dameon monitoring system memory touch to the threshold.
I am not sure whether the understand is correct, however in our ion oom killer we also triggered the shrink function when allocation fail, which is simple. Or we have to start a kernel task keep monitoring ion size? looks complicated.
I may be wrong here. I have also started looking into linux and mm recently. My understanding is kswapd is a daemon and will be kicked in periodically as well, not only when alloc_page fails. So, to be more pro-active the min_free_threshold can be kept higher which will ensure background apps get killed when free memory falls below the threshold and so the allocation won't fail.
I have a different opinion on it. If we set a threshold, it means the heap size is monitored. For example, we create a 100MB heap; if the available heap size drops below 16MB, the OOM killer is triggered. But most failure cases are allocations of larger sizes.
Say we allocate 20MB from the heap while the available heap size is 50MB. Although we have enough total memory, we can't get a large enough block because of memory fragmentation.
And how do we tell it is fragmentation? If there are a lot of 2MB blocks and we allocate 512KB, there is no fragmentation problem. If there are a lot of 2MB blocks, we would fail to allocate 8MB because of fragmentation. We can't use a simple formula to judge which scenario counts as fragmentation. And the killer would kill background tasks, so we should avoid triggering it frequently.
On Mon, Sep 3, 2012 at 12:46 PM, Haojian Zhuang haojian.zhuang@gmail.com wrote:
On Mon, Sep 3, 2012 at 2:38 PM, Nishanth Peethambaran nishanth.peethu@gmail.com wrote:
The lowmemkiller should kick in once the frees space goes below a threshold or heap gets fragmented, but in the attached patch, it kicks it only when allocation fails. Refer to the comparison of free space to lowmem_minfree[n] done in lowmem_shrink(). Even for CMA, lowmemkiller would be needed unless a huge area is reserved for CMA which will restrict the area available for non-movable allocations.
Could you help correct my understand.
Though lowmem provide lowmem_minfree and lowmem_adj, lowmem_shrink registered as shrink function and only be called when allocation fail.
3.4: mm/page_alloc.c __alloc_pages_nodemask -> if (unlikely(!page)) __alloc_pages_slowpath -> wake_all_kswapd -> vmscan.c kswapd -> do_shrinker_shrink -> lowmemorykiller.c lowmem_shrink
Simple experiment in my environment. by default static int lowmem_minfree[6] = { 3 * 512, /* 6MB */ 2 * 1024, /* 8MB */ 4 * 1024, /* 16MB */ 16 * 1024, /* 64MB */ }; # cd /tmp # dd if=/dev/zero of=del bs=1M count=350 #free total used free shared buffers Mem: 384576 377588 6988 0 0 Swap: 0 0 0 Total: 384576 377588 6988
lowmem_shrink is not triggered even free=6M and only happen when free=3M, suppose allocation fail.
Conclusion is lowmem_shrink only be triggered when allocation fail. While lowmem_minfree and lowmem_adj are useful when allocation fail. And there is no dameon monitoring system memory touch to the threshold.
I am not sure whether the understand is correct, however in our ion oom killer we also triggered the shrink function when allocation fail, which is simple. Or we have to start a kernel task keep monitoring ion size? looks complicated.
I may be wrong here. I have also started looking into linux and mm recently. My understanding is kswapd is a daemon and will be kicked in periodically as well, not only when alloc_page fails. So, to be more pro-active the min_free_threshold can be kept higher which will ensure background apps get killed when free memory falls below the threshold and so the allocation won't fail.
I have different opinion on it. If we set threshold, it means that heap size is monitored. For example, we create 100MB heap. If available heap size is less than 16MB, OOM killer would be triggered. But most failure cases are allocating larger memory size.
If we allocate 20MB heap and available heap size is 50MB. Although we have enough memory size, we can't get enough memory size because of memory fragment.
And how to tell it's memory fragment? If there are a lot of 2MB blocks, we allocate 512KB. There's no memory fragment. If there are a lot of 2MB blocks, we would fail to allocate 8MB because of memory fragment. We can't use a simple formula to judge which scenario is memory fragment or not. And killer would kill background tasks. We should avoid to trigger killer frequently.
You are right. I did not mean we should avoid having an OOM killer that is more reactive and more aggressive; it could kill tasks down to oom_adj 1/0, depending on whether we want to force-kill the foreground app or want the user-space app to take action on allocation failure. I am just exploring the option of a lowmem killer that is more pro-active and need not be aggressive (maybe killing only apps above a certain oom_adj, e.g. 6). But I see a few issues with that.
1. Some of the heaps do not need an explicit killer: kmalloc/vmalloc type heaps could work with the default Android lowmem killer.
   - Having a lowmemkiller/shrink specific to each heap will solve it.
2. Keeping track of heap usage and the total size available.
   - Adding variables to store the reserved size and to track current usage of the heap, updated on alloc/free, will solve it.
3. Fragmentation, as you pointed out.
   - Solution 1 is to not let the heap fragment. We use a user-space allocator which avoids a number of system calls and always requests same-sized slabs from ION.
   - Solution 2 is to check the biggest available contiguous chunk and have a separate threshold for it (the Android lowmem killer also checks multiple items), but the major issue with this is that the entire buffer list of the heap would have to be walked each time, which is not good.
   - Solution 3 is for the lowmem killer not to take care of the fragmentation check and let the OOM killer do it.
4. Defining a general policy is where I am struggling.
   - One possibility is to treat ION heaps similar to Linux mm zones (dma, normal, highmem, cma, ...): each zone has a threshold, and once a zone goes below its threshold, shrinking starts. E.g. vmalloc is similar to highmem and has id=0, CMA/carveout is similar to normal and has id=1, a specific CMA/carveout is similar to dma and has id=2.
- Nishanth Peethambaran
On Tue, Sep 4, 2012 at 1:01 PM, Nishanth Peethambaran nishanth.peethu@gmail.com wrote:
On Mon, Sep 3, 2012 at 12:46 PM, Haojian Zhuang haojian.zhuang@gmail.com wrote:
On Mon, Sep 3, 2012 at 2:38 PM, Nishanth Peethambaran nishanth.peethu@gmail.com wrote:
The lowmemkiller should kick in once the frees space goes below a threshold or heap gets fragmented, but in the attached patch, it kicks it only when allocation fails. Refer to the comparison of free space to lowmem_minfree[n] done in lowmem_shrink(). Even for CMA, lowmemkiller would be needed unless a huge area is reserved for CMA which will restrict the area available for non-movable allocations.
Could you help correct my understand.
Though lowmem provide lowmem_minfree and lowmem_adj, lowmem_shrink registered as shrink function and only be called when allocation fail.
3.4: mm/page_alloc.c __alloc_pages_nodemask -> if (unlikely(!page)) __alloc_pages_slowpath -> wake_all_kswapd -> vmscan.c kswapd -> do_shrinker_shrink -> lowmemorykiller.c lowmem_shrink
Simple experiment in my environment. by default static int lowmem_minfree[6] = { 3 * 512, /* 6MB */ 2 * 1024, /* 8MB */ 4 * 1024, /* 16MB */ 16 * 1024, /* 64MB */ }; # cd /tmp # dd if=/dev/zero of=del bs=1M count=350 #free total used free shared buffers Mem: 384576 377588 6988 0 0 Swap: 0 0 0 Total: 384576 377588 6988
lowmem_shrink is not triggered even free=6M and only happen when free=3M, suppose allocation fail.
Conclusion is lowmem_shrink only be triggered when allocation fail. While lowmem_minfree and lowmem_adj are useful when allocation fail. And there is no dameon monitoring system memory touch to the threshold.
I am not sure whether the understand is correct, however in our ion oom killer we also triggered the shrink function when allocation fail, which is simple. Or we have to start a kernel task keep monitoring ion size? looks complicated.
I may be wrong here. I have also started looking into linux and mm recently. My understanding is kswapd is a daemon and will be kicked in periodically as well, not only when alloc_page fails. So, to be more pro-active the min_free_threshold can be kept higher which will ensure background apps get killed when free memory falls below the threshold and so the allocation won't fail.
I have different opinion on it. If we set threshold, it means that heap size is monitored. For example, we create 100MB heap. If available heap size is less than 16MB, OOM killer would be triggered. But most failure cases are allocating larger memory size.
If we allocate 20MB heap and available heap size is 50MB. Although we have enough memory size, we can't get enough memory size because of memory fragment.
And how to tell it's memory fragment? If there are a lot of 2MB blocks, we allocate 512KB. There's no memory fragment. If there are a lot of 2MB blocks, we would fail to allocate 8MB because of memory fragment. We can't use a simple formula to judge which scenario is memory fragment or not. And killer would kill background tasks. We should avoid to trigger killer frequently.
You are right. I did not mean we will avoid having a oom killer which is more of reactive and more aggressive - could kill tasks upto oom_adj of 1/0 depending on whether we want to force-kill foreground app or want the user-space app to take action on allocation faiilure. I am just exploring an option of having a lowmem killer which is more of pro-active type, need not be aggressive (maybe kill apps of oom_adj
6). But, I see few issues with that.
- Some of heaps do not need explicit killer (kmalloc/vmalloc) types
could work with default android lowmem killer. - Having lowmemkiller/shrink specific to heap will solve it.
I think we can add a flags field into struct ion_platform_data and define an ION_HEAP_OOM_KILLER flag. If the user sets ION_HEAP_OOM_KILLER in the platform data, the OOM killer is enabled for that heap; otherwise the OOM killer is skipped.
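A rough sketch of that proposal (the flags field and the macro are assumptions; struct ion_platform_heap in android-3.4 has no flags field today, and struct ion_heap would need the flag copied into it at heap creation):

/* in ion.h, hypothetical */
#define ION_HEAP_OOM_KILLER	(1 << 0)

struct ion_platform_heap {
	enum ion_heap_type type;
	unsigned int id;
	const char *name;
	ion_phys_addr_t base;
	size_t size;
	unsigned long flags;		/* e.g. ION_HEAP_OOM_KILLER */
};

	/* in ion_alloc(): only heaps that opted in get the kill-and-retry */
	if (IS_ERR_OR_NULL(buffer) && (heap->flags & ION_HEAP_OOM_KILLER) &&
	    !ion_shrink(heap, 0))
		goto retry;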
- Keeping track of heap usage and total size available.
- Add variables to store reserved size and to track current usage
of heap and updating it on alloc/free will solve it.
It's OK to show this kind of information in debugfs. It could help the user know the current memory usage. We have a similar patch that could be submitted later.
- Fragmentation as you pointed out.
- Solution1 is to not let the heap to fragment. We use a
user-space allocator which avoid number of system calls and always request ION to give same sized slabs.
It's hard to say which size is the standard size. Maybe the camera driver needs a 10MB block; maybe the LCD driver needs a 16MB block. If we define the size too large, it wastes memory; if we define it too small, it doesn't benefit us.
- Solution2 is to check the biggest available contiguous chunk
available and have a separate threshold for this. (android lowmem killer also checks multiple items) but the major issue with this is that entire buffer-list of the heap will have to be run through each time which is not good.
It could be implemented, but it's a little complex. In this case the killer would be triggered after the allocation. To implement it, we need to define a threshold to monitor chunk usage. If we only use #3, it would be much simpler.
- Solution3 is the lowmem killer not to take care of fragmentation
check and let oom killer do it.
It's simple and effective. We like this one. What's your opinion?
- Defining a general policy is where I am struggling.
- One possibility is to treat ion heaps similar to linux mm
zones(dma,normal,highmem,cma...), each zone has a threshold and once each zone goes below the threshold, shrinking starts. eg: vmalloc is similar to highmem and has id=0, CMA/carveout is similar to normal and has id=1, specific CMA/varveout is similar to dma and has id=2.
What's the definition of id? In ion_platform_data, the comment says "unique identifier for heap. When allocating (lower numbers will be allocated from first)".
How about defining two CMA or carveout heaps at the same time?
On Tue, Sep 4, 2012 at 12:03 PM, Haojian Zhuang haojian.zhuang@gmail.com wrote:
On Tue, Sep 4, 2012 at 1:01 PM, Nishanth Peethambaran nishanth.peethu@gmail.com wrote:
On Mon, Sep 3, 2012 at 12:46 PM, Haojian Zhuang haojian.zhuang@gmail.com wrote:
On Mon, Sep 3, 2012 at 2:38 PM, Nishanth Peethambaran nishanth.peethu@gmail.com wrote:
The lowmemkiller should kick in once the frees space goes below a threshold or heap gets fragmented, but in the attached patch, it kicks it only when allocation fails. Refer to the comparison of free space to lowmem_minfree[n] done in lowmem_shrink(). Even for CMA, lowmemkiller would be needed unless a huge area is reserved for CMA which will restrict the area available for non-movable allocations.
Could you help correct my understand.
Though lowmem provide lowmem_minfree and lowmem_adj, lowmem_shrink registered as shrink function and only be called when allocation fail.
3.4: mm/page_alloc.c __alloc_pages_nodemask -> if (unlikely(!page)) __alloc_pages_slowpath -> wake_all_kswapd -> vmscan.c kswapd -> do_shrinker_shrink -> lowmemorykiller.c lowmem_shrink
Simple experiment in my environment. by default static int lowmem_minfree[6] = { 3 * 512, /* 6MB */ 2 * 1024, /* 8MB */ 4 * 1024, /* 16MB */ 16 * 1024, /* 64MB */ }; # cd /tmp # dd if=/dev/zero of=del bs=1M count=350 #free total used free shared buffers Mem: 384576 377588 6988 0 0 Swap: 0 0 0 Total: 384576 377588 6988
lowmem_shrink is not triggered even free=6M and only happen when free=3M, suppose allocation fail.
Conclusion is lowmem_shrink only be triggered when allocation fail. While lowmem_minfree and lowmem_adj are useful when allocation fail. And there is no dameon monitoring system memory touch to the threshold.
I am not sure whether the understand is correct, however in our ion oom killer we also triggered the shrink function when allocation fail, which is simple. Or we have to start a kernel task keep monitoring ion size? looks complicated.
I may be wrong here. I have also started looking into linux and mm recently. My understanding is kswapd is a daemon and will be kicked in periodically as well, not only when alloc_page fails. So, to be more pro-active the min_free_threshold can be kept higher which will ensure background apps get killed when free memory falls below the threshold and so the allocation won't fail.
I have different opinion on it. If we set threshold, it means that heap size is monitored. For example, we create 100MB heap. If available heap size is less than 16MB, OOM killer would be triggered. But most failure cases are allocating larger memory size.
If we allocate 20MB heap and available heap size is 50MB. Although we have enough memory size, we can't get enough memory size because of memory fragment.
And how to tell it's memory fragment? If there are a lot of 2MB blocks, we allocate 512KB. There's no memory fragment. If there are a lot of 2MB blocks, we would fail to allocate 8MB because of memory fragment. We can't use a simple formula to judge which scenario is memory fragment or not. And killer would kill background tasks. We should avoid to trigger killer frequently.
You are right. I did not mean we will avoid having a oom killer which is more of reactive and more aggressive - could kill tasks upto oom_adj of 1/0 depending on whether we want to force-kill foreground app or want the user-space app to take action on allocation faiilure. I am just exploring an option of having a lowmem killer which is more of pro-active type, need not be aggressive (maybe kill apps of oom_adj
6). But, I see few issues with that.
- Some of heaps do not need explicit killer (kmalloc/vmalloc) types
could work with default android lowmem killer. - Having lowmemkiller/shrink specific to heap will solve it.
I think that we can add flags field into struct ion_platform_data. We can also define ION_HEAP_OOM_KILLER for flags. If user defines ION_HEAP_OOM_KILLER in platform data, the OOM killer would be enabled for this heap. Otherwise, OOM killer would be skipped.
- Keeping track of heap usage and total size available.
- Add variables to store reserved size and to track current usage
of heap and updating it on alloc/free will solve it.
It's OK to show this kind of message in debugfs. It could help user to know current memory usage. We have similar patch that could be submitted later.
I meant it not for debugfs but for the lowmem killer. What I had in mind was for the heap to track the total size and used size in its internal heap struct (e.g. struct ion_carveout_heap).
- Fragmentation as you pointed out.
- Solution1 is to not let the heap to fragment. We use a
user-space allocator which avoid number of system calls and always request ION to give same sized slabs.
It's hard to say which size is standard size. Maybe camera driver needs 10MB block. Maybe LCD driver needs 16MB block. If we define the size too large, it wastes memory. If we define the size too small, it can't benefit us.
- Solution2 is to check the biggest available contiguous chunk
available and have a separate threshold for this. (android lowmem killer also checks multiple items) but the major issue with this is that entire buffer-list of the heap will have to be run through each time which is not good.
It could be implemented, but it's a little complex. In this case, killer would be triggered after malloc. In order to implement it, we need to define threshold to monitor chunk usage. If we only use #3, it could be much simpler.
- Solution3 is the lowmem killer not to take care of fragmentation
check and let oom killer do it.
It's simple and effect. We like this one. What's your opinion?
Combination of 1 & 3 is what I am trying out.
- Defining a general policy is where I am struggling.
- One possibility is to treat ion heaps similar to linux mm
zones(dma,normal,highmem,cma...), each zone has a threshold and once each zone goes below the threshold, shrinking starts. eg: vmalloc is similar to highmem and has id=0, CMA/carveout is similar to normal and has id=1, specific CMA/varveout is similar to dma and has id=2.
What's the definition of id? In ion_platform_data, the comment says "unique identifier for heap. When allocating (lower numbers will be allocated from first)".
How about defining two CMA or carveout heaps at the same time?
If multiple CMA/carveout heaps are defined, they need to have different ids. That is where I am struggling to define a policy: usage of the heaps depends on the user code (whether it falls back or not), so a general policy cannot be defined. I was just thinking out loud to see whether there are more ideas or a cleaner way of doing the lowmem killer.