Grabbing tasklist_lock has its disadvantages: it blocks process creation and destruction. If there are lots of processes, blocking doesn't sound like a great idea.
For LMK, it is sufficient to surround the task list traversal loop with an rcu_read_{,un}lock() pair.
Suggested-by: Oleg Nesterov oleg@redhat.com
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
---
Also thanks to Eric and Paul for the ideas.
 drivers/staging/android/lowmemorykiller.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index 2d8d2b7..b3ade2f 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -34,6 +34,7 @@
 #include <linux/mm.h>
 #include <linux/oom.h>
 #include <linux/sched.h>
+#include <linux/rcupdate.h>
 #include <linux/profile.h>
 #include <linux/notifier.h>
@@ -132,7 +133,7 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	}
 	selected_oom_adj = min_adj;

-	read_lock(&tasklist_lock);
+	rcu_read_lock();
 	for_each_process(p) {
 		struct mm_struct *mm;
 		struct signal_struct *sig;
@@ -185,7 +186,7 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	}
 	lowmem_print(4, "lowmem_shrink %lu, %x, return %d\n",
		     sc->nr_to_scan, sc->gfp_mask, rem);
-	read_unlock(&tasklist_lock);
+	rcu_read_unlock();
 	return rem;
 }
On 02/01, Anton Vorontsov wrote:
@@ -132,7 +133,7 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	}
 	selected_oom_adj = min_adj;

-	read_lock(&tasklist_lock);
+	rcu_read_lock();
This has the same problem, force_sig() becomes unsafe.
Why do you need force_? Do you really want to kill /sbin/init (or sub-namespace init) ?
We could change force_sig_info() to use lock_task_sighand(), but I'd like to avoid this. Imho, this interface should be cleaned up, and it should be used for synchronous signals only.
With or without this patch, sig == NULL is not possible, but the !mm check is not right: there could be other threads with mm != NULL.
Oleg.
On Thu, Feb 02, 2012 at 01:54:41PM +0100, Oleg Nesterov wrote:
On 02/01, Anton Vorontsov wrote:
@@ -132,7 +133,7 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	}
 	selected_oom_adj = min_adj;

-	read_lock(&tasklist_lock);
+	rcu_read_lock();
This has the same problem, force_sig() becomes unsafe.
Ouch, I think I finally got it. So, lock_task_sighand() is trying to gracefully grab the lock, checking if the sighand is not NULL (which means, per __exit_signal(), that the task is halfway into the grave).
Well, it seems that this behaviour of force_sig() is not quite obvious, and there are other offenders out there. E.g. in the sysrq code I don't see anything that prevents the same race.
static void send_sig_all(int sig)
{
	struct task_struct *p;

	for_each_process(p) {
		if (p->mm && !is_global_init(p))
			/* Not swapper, init nor kernel thread */
			force_sig(sig, p);
	}
}
Would the following fix work for the sysrq?
- - - -
From: Anton Vorontsov anton.vorontsov@linaro.org
Subject: [PATCH] sysrq: Fix unsafe operations on tasks
sysrq should grab the tasklist lock, otherwise calling force_sig() is not safe, as it might race with an exiting task, whose ->sighand might be set to NULL already.
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
---
 drivers/tty/sysrq.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index 7867b7c..a1bcad7 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -322,11 +322,13 @@ static void send_sig_all(int sig)
 {
 	struct task_struct *p;

+	read_lock(&tasklist_lock);
 	for_each_process(p) {
 		if (p->mm && !is_global_init(p))
 			/* Not swapper, init nor kernel thread */
 			force_sig(sig, p);
 	}
+	read_unlock(&tasklist_lock);
 }
 static void sysrq_handle_term(int key)
- - - -
But for LMK I will use send_signal(), as LMK is special.
Plus, while I'm at it, I might review the other offenders a bit more closely and fix them as well.
Why do you need force_? Do you really want to kill /sbin/init (or sub-namespace init) ?
Nope.
We could change force_sig_info() to use lock_task_sighand(), but I'd like to avoid this. Imho, this interface should be cleaned up, and it should be used for synchronous signals only.
OK. Then we should fix the users?
With or without this patch, sig == NULL is not possible, but the !mm check is not right: there could be other threads with mm != NULL.
I'm not sure I completely follow. In the current LMK code, we check for !mm because otherwise the potential victim is useless for us (i.e. killing it will not free much memory anyway).
Thanks!
Grabbing tasklist_lock has its disadvantages: it blocks process creation and destruction. If there are lots of processes, blocking doesn't sound like a great idea.

For LMK, it is sufficient to surround the task list traversal with rcu_read_{,un}lock().
From now on using force_sig() is not safe, as it can race with an already exiting task, so we use send_sig() now. As a downside, it won't kill PID namespace init processes, but that's not what we want anyway.
Suggested-by: Oleg Nesterov oleg@redhat.com
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
---
 drivers/staging/android/lowmemorykiller.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index 2d8d2b7..63da844 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -34,6 +34,7 @@
 #include <linux/mm.h>
 #include <linux/oom.h>
 #include <linux/sched.h>
+#include <linux/rcupdate.h>
 #include <linux/profile.h>
 #include <linux/notifier.h>
@@ -132,7 +133,7 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	}
 	selected_oom_adj = min_adj;

-	read_lock(&tasklist_lock);
+	rcu_read_lock();
 	for_each_process(p) {
 		struct mm_struct *mm;
 		struct signal_struct *sig;
@@ -180,12 +181,12 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	lowmem_deathpending = selected;
 	task_handoff_register(&task_nb);
 #endif
-	force_sig(SIGKILL, selected);
+	send_sig(SIGKILL, selected, 0);
 	rem -= selected_tasksize;
 	}
 	lowmem_print(4, "lowmem_shrink %lu, %x, return %d\n",
		     sc->nr_to_scan, sc->gfp_mask, rem);
-	read_unlock(&tasklist_lock);
+	rcu_read_unlock();
 	return rem;
 }
On 02/03, Anton Vorontsov wrote:
@@ -132,7 +133,7 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	}
 	selected_oom_adj = min_adj;

-	read_lock(&tasklist_lock);
+	rcu_read_lock();
 	for_each_process(p) {
 		struct mm_struct *mm;
 		struct signal_struct *sig;
@@ -180,12 +181,12 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	lowmem_deathpending = selected;
 	task_handoff_register(&task_nb);
 #endif
-	force_sig(SIGKILL, selected);
+	send_sig(SIGKILL, selected, 0);
 	rem -= selected_tasksize;
 	}
 	lowmem_print(4, "lowmem_shrink %lu, %x, return %d\n",
		     sc->nr_to_scan, sc->gfp_mask, rem);
-	read_unlock(&tasklist_lock);
+	rcu_read_unlock();
I think this is correct. As for the ->mm check, please look at find_lock_task_mm().
You can also remove the !sig check.
And, I forgot to mention: there is another reason why the mm != NULL check is wrong (in send_sig_all too). A kernel thread can do use_mm(). You should also check PF_KTHREAD.
Oleg.
On 02/02, Anton Vorontsov wrote:
On Thu, Feb 02, 2012 at 01:54:41PM +0100, Oleg Nesterov wrote:
This has the same problem, force_sig() becomes unsafe.
Ouch, I think I finally got it. So, lock_task_sighand() is trying to gracefully grab the lock, checking if the sighand is not NULL (which means, per __exit_signal(), that the task is halfway into the grave).
Yes.
Would the following fix work for the sysrq?
From: Anton Vorontsov anton.vorontsov@linaro.org
Subject: [PATCH] sysrq: Fix unsafe operations on tasks
sysrq should grab the tasklist lock, otherwise calling force_sig() is not safe, as it might race with an exiting task, whose ->sighand might be set to NULL already.
And there are other reasons for tasklist_lock. It is not clear why for_each_process() itself is safe. OK, currently (afaics) the caller disables irqs, but in theory this doesn't imply rcu_read_lock().
And it can race with fork() and miss the task, although mostly in theory.
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -322,11 +322,13 @@ static void send_sig_all(int sig)
 {
 	struct task_struct *p;

+	read_lock(&tasklist_lock);
 	for_each_process(p) {
 		if (p->mm && !is_global_init(p))
 			/* Not swapper, init nor kernel thread */
 			force_sig(sig, p);
 	}
+	read_unlock(&tasklist_lock);
Agreed, but force_sig() should not be used anyway. It sends the signal to the thread, we need to kill the process. The main thread can exit.
With or without this patch, sig == NULL is not possible, but the !mm check is not right: there could be other threads with mm != NULL.
I'm not sure I completely follow. In the current LMK code, we check for !mm because otherwise the potential victim is useless for us (i.e. killing it will not free much memory anyway).
This is the common mistake. Consider this program:
void *thread_func(...)
{
	use_a_lot_of_memory();
}

int main(void)
{
	pthread_create(..., thread_func, ...);
	pthread_exit();
}
lowmem_shrink() will skip this task because p->mm == NULL.
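For reference, Oleg's scenario is easy to reproduce from userspace. The sketch below is only an illustration (the buffer size and messages are made up for the demo): after the main thread calls pthread_exit(), the process, and hence its memory, lives on through the worker thread, which is exactly why judging a process by its group leader alone is wrong.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Worker that keeps the process (and its address space) alive
 * after the main thread is gone. */
static void *thread_func(void *arg)
{
	/* Stand-in for use_a_lot_of_memory(): grab some heap. */
	char *buf = malloc(64 * 1024 * 1024);

	if (buf)
		buf[0] = 1;	/* touch it so the page is really ours */
	sleep(1);
	printf("worker: still alive, main thread already exited\n");
	free(buf);
	return NULL;
}

int main(void)
{
	pthread_t t;

	if (pthread_create(&t, NULL, thread_func, NULL))
		return 1;
	/*
	 * Terminate only the main thread; the process keeps running
	 * until the worker finishes. In kernel terms, the group
	 * leader is half-dead while another thread still has a live
	 * mm -- so checking only the leader's p->mm misses it.
	 */
	pthread_exit(NULL);
}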
Oleg.
This is needed so that callers would not get 'context imbalance' warnings from the sparse tool.
As a side effect, this patch fixes the following sparse warnings:
  CHECK   mm/oom_kill.c
mm/oom_kill.c:201:28: warning: context imbalance in 'oom_badness' - unexpected unlock
include/linux/rcupdate.h:249:30: warning: context imbalance in 'dump_tasks' - unexpected unlock
mm/oom_kill.c:453:9: warning: context imbalance in 'oom_kill_task' - unexpected unlock
  CHECK   mm/memcontrol.c
...
mm/memcontrol.c:1130:17: warning: context imbalance in 'task_in_mem_cgroup' - unexpected unlock
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
---
 include/linux/oom.h |   11 ++++++++++-
 mm/oom_kill.c       |    2 +-
 2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 552fba9..e64bfd2 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -21,6 +21,7 @@
#ifdef __KERNEL__
+#include <linux/compiler.h>
 #include <linux/sched.h>
 #include <linux/types.h>
 #include <linux/nodemask.h>
@@ -65,7 +66,15 @@ static inline void oom_killer_enable(void)
 	oom_killer_disabled = false;
 }
-extern struct task_struct *find_lock_task_mm(struct task_struct *p);
+extern struct task_struct *__find_lock_task_mm(struct task_struct *p);
+
+#define find_lock_task_mm(p)						\
+({									\
+	struct task_struct *__ret;					\
+									\
+	__cond_lock(&(p)->alloc_lock, __ret = __find_lock_task_mm(p));	\
+	__ret;								\
+})
 /* sysctls */
 extern int sysctl_oom_dump_tasks;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 2958fd8..0ebb383 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -136,7 +136,7 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
  * pointer. Return p, or any of its subthreads with a valid ->mm, with
  * task_lock() held.
  */
-struct task_struct *find_lock_task_mm(struct task_struct *p)
+struct task_struct *__find_lock_task_mm(struct task_struct *p)
 {
 	struct task_struct *t = p;
On Mon, Feb 06, 2012 at 08:29:30PM +0400, Anton Vorontsov wrote:
This is needed so that callers would not get 'context imbalance' warnings from the sparse tool.
As a side effect, this patch fixes the following sparse warnings:
  CHECK   mm/oom_kill.c
mm/oom_kill.c:201:28: warning: context imbalance in 'oom_badness' - unexpected unlock
include/linux/rcupdate.h:249:30: warning: context imbalance in 'dump_tasks' - unexpected unlock
mm/oom_kill.c:453:9: warning: context imbalance in 'oom_kill_task' - unexpected unlock
  CHECK   mm/memcontrol.c
...
mm/memcontrol.c:1130:17: warning: context imbalance in 'task_in_mem_cgroup' - unexpected unlock
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
include/linux/oom.h | 11 ++++++++++- mm/oom_kill.c | 2 +- 2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 552fba9..e64bfd2 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -21,6 +21,7 @@
 #ifdef __KERNEL__

+#include <linux/compiler.h>
 #include <linux/sched.h>
 #include <linux/types.h>
 #include <linux/nodemask.h>
@@ -65,7 +66,15 @@ static inline void oom_killer_enable(void)
 	oom_killer_disabled = false;
 }

-extern struct task_struct *find_lock_task_mm(struct task_struct *p);
+extern struct task_struct *__find_lock_task_mm(struct task_struct *p);
+
+#define find_lock_task_mm(p)						\
+({									\
+	struct task_struct *__ret;					\
+									\
+	__cond_lock(&(p)->alloc_lock, __ret = __find_lock_task_mm(p));	\
+	__ret;								\
+})
Please use the proper "do...while" style thing here for multi-line, complex #defines like the rest of the kernel does so that you don't end up debugging horrible problems later.
And I would need acks from the -mm developers before I could apply any of this.
thanks,
greg k-h
On Mon, Feb 06, 2012 at 08:35:42AM -0800, Greg KH wrote: [...]
-extern struct task_struct *find_lock_task_mm(struct task_struct *p);
+extern struct task_struct *__find_lock_task_mm(struct task_struct *p);
+
+#define find_lock_task_mm(p)						\
+({									\
+	struct task_struct *__ret;					\
+									\
+	__cond_lock(&(p)->alloc_lock, __ret = __find_lock_task_mm(p));	\
+	__ret;								\
+})
Please use the proper "do...while" style thing here for multi-line, complex #defines like the rest of the kernel does so that you don't end up debugging horrible problems later.
Unfortunately this isn't possible in this case. Unlike the '({ })' GCC statement-expression extension, a do-while statement does not evaluate to a value, i.e. 'x = do { 123; } while (0);' is illegal.
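The difference is easy to check in userspace; a minimal sketch (the PICK_MAX macro name here is made up for the demo, it is not the kernel's max()):

#include <assert.h>
#include <stdio.h>

/* A GCC/Clang statement expression: the block evaluates to the value
 * of its last expression, so it can sit on the right-hand side of an
 * assignment -- which is what find_lock_task_mm() needs.
 * A do { ... } while (0) block is a statement, not an expression,
 * so "x = do { ... } while (0);" would not even compile. */
#define PICK_MAX(a, b)			\
({					\
	typeof(a) _a = (a);		\
	typeof(b) _b = (b);		\
	_a > _b ? _a : _b;		\
})

int main(void)
{
	int x = ({ int t = 120; t + 3; });	/* evaluates to 123 */

	assert(x == 123);
	assert(PICK_MAX(2, 5) == 5);
	printf("statement expression value: %d\n", x);
	return 0;
}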
Thanks,
On Mon, Feb 06, 2012 at 10:59:09PM +0400, Anton Vorontsov wrote:
On Mon, Feb 06, 2012 at 08:35:42AM -0800, Greg KH wrote: [...]
-extern struct task_struct *find_lock_task_mm(struct task_struct *p);
+extern struct task_struct *__find_lock_task_mm(struct task_struct *p);
+
+#define find_lock_task_mm(p)						\
+({									\
+	struct task_struct *__ret;					\
+									\
+	__cond_lock(&(p)->alloc_lock, __ret = __find_lock_task_mm(p));	\
+	__ret;								\
+})
Please use the proper "do...while" style thing here for multi-line, complex #defines like the rest of the kernel does so that you don't end up debugging horrible problems later.
Unfortunately this isn't possible in this case. Unlike the '({ })' GCC statement-expression extension, a do-while statement does not evaluate to a value, i.e. 'x = do { 123; } while (0);' is illegal.
Ah, you are right, my bad, sorry about that.
greg k-h
2012/2/6 Greg KH gregkh@linuxfoundation.org:
On Mon, Feb 06, 2012 at 10:59:09PM +0400, Anton Vorontsov wrote:
On Mon, Feb 06, 2012 at 08:35:42AM -0800, Greg KH wrote: [...]
-extern struct task_struct *find_lock_task_mm(struct task_struct *p);
+extern struct task_struct *__find_lock_task_mm(struct task_struct *p);
+
+#define find_lock_task_mm(p)						\
+({									\
+	struct task_struct *__ret;					\
+									\
+	__cond_lock(&(p)->alloc_lock, __ret = __find_lock_task_mm(p));	\
+	__ret;								\
+})
Please use the proper "do...while" style thing here for multi-line, complex #defines like the rest of the kernel does so that you don't end up debugging horrible problems later.
Unfortunately this isn't possible in this case. Unlike the '({ })' GCC statement-expression extension, a do-while statement does not evaluate to a value, i.e. 'x = do { 123; } while (0);' is illegal.
Ah, you are right, my bad, sorry about that.
greg k-h
Some __cond_lock() callers are inline functions. Is this bad? An inline function is generally preferred over a macro.
This is needed so that callers would not get 'context imbalance' warnings from the sparse tool.
As a side effect, this patch fixes the following sparse warnings:
  CHECK   mm/oom_kill.c
mm/oom_kill.c:201:28: warning: context imbalance in 'oom_badness' - unexpected unlock
include/linux/rcupdate.h:249:30: warning: context imbalance in 'dump_tasks' - unexpected unlock
mm/oom_kill.c:453:9: warning: context imbalance in 'oom_kill_task' - unexpected unlock
  CHECK   mm/memcontrol.c
...
mm/memcontrol.c:1130:17: warning: context imbalance in 'task_in_mem_cgroup' - unexpected unlock
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
---
On Mon, Feb 06, 2012 at 04:27:32PM -0500, KOSAKI Motohiro wrote: [...]
Unfortunately this isn't possible in this case. Unlike the '({ })' GCC statement-expression extension, a do-while statement does not evaluate to a value, i.e. 'x = do { 123; } while (0);' is illegal.
Ah, you are right, my bad, sorry about that.
greg k-h
Some __cond_lock() callers are inline functions. Is this bad?
No, that's great, actually. :-) Not obvious, but seems like sparse understands __cond_lock in inline functions, so I'd better use it.
Thanks,
 include/linux/oom.h |   12 +++++++++++-
 mm/oom_kill.c       |    2 +-
 2 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 552fba9..7c8946a 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -21,6 +21,7 @@
#ifdef __KERNEL__
+#include <linux/compiler.h>
 #include <linux/sched.h>
 #include <linux/types.h>
 #include <linux/nodemask.h>
@@ -65,7 +66,16 @@ static inline void oom_killer_enable(void)
 	oom_killer_disabled = false;
 }
-extern struct task_struct *find_lock_task_mm(struct task_struct *p);
+extern struct task_struct *__find_lock_task_mm(struct task_struct *p);
+
+static inline struct task_struct *find_lock_task_mm(struct task_struct *p)
+{
+	struct task_struct *ret;
+
+	ret = __find_lock_task_mm(p);
+	(void)__cond_lock(&p->alloc_lock, ret);
+	return ret;
+}
 /* sysctls */
 extern int sysctl_oom_dump_tasks;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 2958fd8..0ebb383 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -136,7 +136,7 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
  * pointer. Return p, or any of its subthreads with a valid ->mm, with
  * task_lock() held.
  */
-struct task_struct *find_lock_task_mm(struct task_struct *p)
+struct task_struct *__find_lock_task_mm(struct task_struct *p)
 {
 	struct task_struct *t = p;
Some __cond_lock() callers are inline functions. Is this bad?
No, that's great, actually. :-) Not obvious, but seems like sparse understands __cond_lock in inline functions, so I'd better use it.
Thanks,
Great. So, to this version
Acked-by: KOSAKI Motohiro kosaki.motohiro@jp.fujitsu.com
On 02/07, Anton Vorontsov wrote:
On Mon, Feb 06, 2012 at 04:27:32PM -0500, KOSAKI Motohiro wrote:
Some __cond_lock() callers are inline functions. Is this bad?
No, that's great, actually. :-) Not obvious, but seems like sparse understands __cond_lock in inline functions, so I'd better use it.
Hmm, great...
Maybe you can update lock_task_sighand() too? (In a separate patch, of course.)
Oleg.
It appears that the sparse tool understands static inline functions for context balance checking, so let's turn the macro into an inline function.
This makes the code a little bit more robust.
Suggested-by: Oleg Nesterov oleg@redhat.com
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
---
On Wed, Feb 08, 2012 at 04:27:50PM +0100, Oleg Nesterov wrote:
On 02/07, Anton Vorontsov wrote:
On Mon, Feb 06, 2012 at 04:27:32PM -0500, KOSAKI Motohiro wrote:
Some __cond_lock() callers are inline functions. Is this bad?
No, that's great, actually. :-) Not obvious, but seems like sparse understands __cond_lock in inline functions, so I'd better use it.
Hmm, great...
Maybe you can update lock_task_sighand() too? (In a separate patch, of course.)
Sure thing. Here it goes...
 include/linux/sched.h |   15 +++++++++------
 1 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e82f721..22ae10e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2394,12 +2394,15 @@ static inline void task_unlock(struct task_struct *p)
 extern struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
							unsigned long *flags);
-#define lock_task_sighand(tsk, flags)					\
-({	struct sighand_struct *__ss;					\
-	__cond_lock(&(tsk)->sighand->siglock,				\
-		    (__ss = __lock_task_sighand(tsk, flags)));		\
-	__ss;								\
-})									\
+static inline struct sighand_struct *lock_task_sighand(struct task_struct *tsk,
+						       unsigned long *flags)
+{
+	struct sighand_struct *ret;
+
+	ret = __lock_task_sighand(tsk, flags);
+	(void)__cond_lock(&tsk->sighand->siglock, ret);
+	return ret;
+}
static inline void unlock_task_sighand(struct task_struct *tsk, unsigned long *flags)
Sparse flood makes it hard to catch newly-introduced warnings. So let's fix the sparse warnings in the OOM killer:
  CHECK   mm/oom_kill.c
mm/oom_kill.c:139:20: warning: context imbalance in '__find_lock_task_mm' - wrong count at exit
mm/oom_kill.c:771:9: warning: context imbalance in 'out_of_memory' - different lock contexts for basic block
The first one is fixed by assuring sparse that we know that we exit with the lock held.
The second one is caused by the fact that sparse isn't smart enough to handle the noreturn attribute.
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
---
 mm/oom_kill.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 0ebb383..49569b6 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -142,8 +142,14 @@ struct task_struct *__find_lock_task_mm(struct task_struct *p)

 	do {
 		task_lock(t);
-		if (likely(t->mm))
+		if (likely(t->mm)) {
+			/*
+			 * Shut up sparse: we do know that we exit w/ the
+			 * task locked.
+			 */
+			__release(&t->alloc_loc);
 			return t;
+		}
 		task_unlock(t);
 	} while_each_thread(p, t);

@@ -766,6 +772,7 @@ retry:
 		dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
 		read_unlock(&tasklist_lock);
 		panic("Out of memory and no killable processes...\n");
+		return;
 	}

 	if (oom_kill_process(p, gfp_mask, order, points, totalpages, NULL,
2012/2/6 Anton Vorontsov anton.vorontsov@linaro.org:
Sparse flood makes it hard to catch newly-introduced warnings. So let's fix the sparse warnings in the OOM killer:
  CHECK   mm/oom_kill.c
mm/oom_kill.c:139:20: warning: context imbalance in '__find_lock_task_mm' - wrong count at exit
mm/oom_kill.c:771:9: warning: context imbalance in 'out_of_memory' - different lock contexts for basic block
The first one is fixed by assuring sparse that we know that we exit with the lock held.
The second one is caused by the fact that sparse isn't smart enough to handle the noreturn attribute.
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
 mm/oom_kill.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 0ebb383..49569b6 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -142,8 +142,14 @@ struct task_struct *__find_lock_task_mm(struct task_struct *p)

 	do {
 		task_lock(t);
-		if (likely(t->mm))
+		if (likely(t->mm)) {
+			/*
+			 * Shut up sparse: we do know that we exit w/ the
+			 * task locked.
+			 */
+			__release(&t->alloc_loc);
task_struct only has alloc_lock, not alloc_loc. Moreover, we don't release the lock in this code path. Seems odd.
 			return t;
+		}
 		task_unlock(t);
 	} while_each_thread(p, t);

@@ -766,6 +772,7 @@ retry:
 		dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
 		read_unlock(&tasklist_lock);
 		panic("Out of memory and no killable processes...\n");
+		return;
 	}

 	if (oom_kill_process(p, gfp_mask, order, points, totalpages, NULL,
1.7.7.6
Sparse flood makes it hard to catch newly-introduced warnings. So let's fix the sparse warnings in the OOM killer:
  CHECK   mm/oom_kill.c
mm/oom_kill.c:139:20: warning: context imbalance in '__find_lock_task_mm' - wrong count at exit
mm/oom_kill.c:771:9: warning: context imbalance in 'out_of_memory' - different lock contexts for basic block
The first one is fixed by assuring sparse that we know that we exit with the lock held.
The second one is caused by the fact that sparse isn't smart enough to handle the noreturn attribute.
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
---
On Mon, Feb 06, 2012 at 04:33:26PM -0500, KOSAKI Motohiro wrote:
2012/2/6 Anton Vorontsov anton.vorontsov@linaro.org:
Sparse flood makes it hard to catch newly-introduced warnings. So let's fix the sparse warnings in the OOM killer:
  CHECK   mm/oom_kill.c
mm/oom_kill.c:139:20: warning: context imbalance in '__find_lock_task_mm' - wrong count at exit
mm/oom_kill.c:771:9: warning: context imbalance in 'out_of_memory' - different lock contexts for basic block
The first one is fixed by assuring sparse that we know that we exit with the lock held.
The second one is caused by the fact that sparse isn't smart enough to handle the noreturn attribute.
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
 mm/oom_kill.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 0ebb383..49569b6 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -142,8 +142,14 @@ struct task_struct *__find_lock_task_mm(struct task_struct *p)

 	do {
 		task_lock(t);
-		if (likely(t->mm))
+		if (likely(t->mm)) {
+			/*
+			 * Shut up sparse: we do know that we exit w/ the
+			 * task locked.
+			 */
+			__release(&t->alloc_loc);
task_struct only has alloc_lock, not alloc_loc.
Funnily enough, sparse does not care. :-) __release(foo) will work just as well. Seems like sparse counts locking balance globally.
This is now fixed in the patch down below, thanks for catching.
Moreover we don't release the lock in this code path. Seems odd.
Indeed. That's exactly what sparse is seeing as well. We exit without releasing the lock, which is bad (in sparse's eyes). So we lie to sparse, telling it that we do release it, so it shuts up.
Usually, we annotate functions with __releases() and __acquires() when a function releases/acquires one of its arguments, but in this case the function returns a locked value, and it seems that there's no way to properly annotate this.
 mm/oom_kill.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 0ebb383..1914367 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -142,8 +142,14 @@ struct task_struct *__find_lock_task_mm(struct task_struct *p)

 	do {
 		task_lock(t);
-		if (likely(t->mm))
+		if (likely(t->mm)) {
+			/*
+			 * Shut up sparse: we do know that we exit w/ the
+			 * task locked.
+			 */
+			__release(&t->alloc_lock);
 			return t;
+		}
 		task_unlock(t);
 	} while_each_thread(p, t);

@@ -766,6 +772,7 @@ retry:
 		dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
 		read_unlock(&tasklist_lock);
 		panic("Out of memory and no killable processes...\n");
+		return;
 	}

 	if (oom_kill_process(p, gfp_mask, order, points, totalpages, NULL,
task_struct only has alloc_lock, not alloc_loc.
Funnily enough, sparse does not care. :-) __release(foo) will work just as well. Seems like sparse counts locking balance globally.
This is now fixed in the patch down below, thanks for catching.
Moreover we don't release the lock in this code path. Seems odd.
Indeed. That's exactly what sparse is seeing as well. We exit without releasing the lock, which is bad (in sparse's eyes). So we lie to sparse, telling it that we do release it, so it shuts up.
Hmmm....
To be honest, I really dislike any lie annotation. Why? It is very fragile and easily becomes broken by unrelated but nearby line changes. Please consider enhancing sparse first.
On Tue, Feb 07, 2012 at 12:38:58AM -0500, KOSAKI Motohiro wrote:
task_struct only has alloc_lock, not alloc_loc.
Funnily enough, sparse does not care. :-) __release(foo) will work just as well. Seems like sparse counts locking balance globally.
This is now fixed in the patch down below, thanks for catching.
Moreover we don't release the lock in this code path. Seems odd.
Indeed. That's exactly what sparse is seeing as well. We exit without releasing the lock, which is bad (in sparse's eyes). So we lie to sparse, telling it that we do release it, so it shuts up.
Hmmm....
To be honest, I really dislike any lie annotation. Why? It is very fragile and easily becomes broken by unrelated but nearby line changes. Please consider enhancing sparse first.
I somewhat doubt that it is possible to "enhance it". We keep the lock held conditionally, so we need to place the hint in the code itself. I believe the best we can do would be something like this:
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 4a24354..61d91f2 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -14,6 +14,7 @@
 # define __releases(x)	__attribute__((context(x,1,0)))
 # define __acquire(x)	__context__(x,1)
 # define __release(x)	__context__(x,-1)
+# define __ret_with_lock(x)	__context__(x,-1)
 # define __cond_lock(x,c)	((c) ? ({ __acquire(x); 1; }) : 0)
 # define __percpu	__attribute__((noderef, address_space(3)))
 #ifdef CONFIG_SPARSE_RCU_POINTER
@@ -37,6 +38,7 @@ extern void __chk_io_ptr(const volatile void __iomem *);
 # define __releases(x)
 # define __acquire(x)	(void)0
 # define __release(x)	(void)0
+# define __ret_with_lock(x)
 # define __cond_lock(x,c)	(c)
 # define __percpu
 # define __rcu
And then use it instead of __release().
(2/7/12 1:23 AM), Anton Vorontsov wrote:
On Tue, Feb 07, 2012 at 12:38:58AM -0500, KOSAKI Motohiro wrote:
task_struct only has alloc_lock, not alloc_loc.
Funnily enough, sparse does not care. :-) __release(foo) will work just as well. Seems like sparse counts locking balance globally.
This is now fixed in the patch down below, thanks for catching.
Moreover we don't release the lock in this code path. Seems odd.
Indeed. That's exactly what sparse is seeing as well. We exit without releasing the lock, which is bad (in sparse's eyes). So we lie to sparse, telling it that we do release it, so it shuts up.
Hmmm....
To be honest, I really dislike any lie annotation. Why? It is very fragile and easily becomes broken by unrelated but nearby line changes. Please consider enhancing sparse first.
I somewhat doubt that it is possible to "enhance it". We keep the lock held conditionally, so we need to place the hint in the code itself. I believe the best we can do would be something like this:
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 4a24354..61d91f2 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -14,6 +14,7 @@
 # define __releases(x)	__attribute__((context(x,1,0)))
 # define __acquire(x)	__context__(x,1)
 # define __release(x)	__context__(x,-1)
+# define __ret_with_lock(x)	__context__(x,-1)
 # define __cond_lock(x,c)	((c) ? ({ __acquire(x); 1; }) : 0)
 # define __percpu	__attribute__((noderef, address_space(3)))
 #ifdef CONFIG_SPARSE_RCU_POINTER
@@ -37,6 +38,7 @@ extern void __chk_io_ptr(const volatile void __iomem *);
 # define __releases(x)
 # define __acquire(x)	(void)0
 # define __release(x)	(void)0
+# define __ret_with_lock(x)
 # define __cond_lock(x,c)	(c)
 # define __percpu
 # define __rcu
And then use it instead of __release().
Hmmm...
I still dislike this lie annotation, but maybe it's better than nothing (dunno, just a guess). Go ahead.
Grabbing tasklist_lock has its disadvantages: it blocks process creation and destruction. If there are lots of processes, blocking doesn't sound like a great idea.

For LMK, it is sufficient to surround the task list traversal with rcu_read_{,un}lock().
From now on using force_sig() is not safe, as it can race with an already exiting task, so we use send_sig() now. As a downside, it won't kill PID namespace init processes, but that's not what we want anyway.
Suggested-by: Oleg Nesterov oleg@redhat.com
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
---
 drivers/staging/android/lowmemorykiller.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index 2d8d2b7..63da844 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -34,6 +34,7 @@
 #include <linux/mm.h>
 #include <linux/oom.h>
 #include <linux/sched.h>
+#include <linux/rcupdate.h>
 #include <linux/profile.h>
 #include <linux/notifier.h>
@@ -132,7 +133,7 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	}
 	selected_oom_adj = min_adj;

-	read_lock(&tasklist_lock);
+	rcu_read_lock();
 	for_each_process(p) {
 		struct mm_struct *mm;
 		struct signal_struct *sig;
@@ -180,12 +181,12 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	lowmem_deathpending = selected;
 	task_handoff_register(&task_nb);
 #endif
-	force_sig(SIGKILL, selected);
+	send_sig(SIGKILL, selected, 0);
 	rem -= selected_tasksize;
 	}
 	lowmem_print(4, "lowmem_shrink %lu, %x, return %d\n",
		     sc->nr_to_scan, sc->gfp_mask, rem);
-	read_unlock(&tasklist_lock);
+	rcu_read_unlock();
 	return rem;
 }
On Mon, Feb 06, 2012 at 08:29:41PM +0400, Anton Vorontsov wrote:
Grabbing tasklist_lock has its disadvantages, i.e. it blocks process creation and destruction. If there are lots of processes, blocking doesn't sound like a great idea.
For LMK, it is sufficient to surround the task list traversal with rcu_read_{,un}lock().
From now on using force_sig() is not safe, as it can race with an
already exiting task, so we use send_sig() now. As a downside, it won't kill PID namespace init processes, but that's not what we want anyway.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Are these last 4 patches independent of the first 2, and can they be taken through the staging tree now?
thanks,
greg k-h
On Mon, Feb 06, 2012 at 08:36:49AM -0800, Greg KH wrote:
On Mon, Feb 06, 2012 at 08:29:41PM +0400, Anton Vorontsov wrote:
Grabbing tasklist_lock has its disadvantages, i.e. it blocks process creation and destruction. If there are lots of processes, blocking doesn't sound like a great idea.
For LMK, it is sufficient to surround the task list traversal with rcu_read_{,un}lock().
From now on using force_sig() is not safe, as it can race with an
already exiting task, so we use send_sig() now. As a downside, it won't kill PID namespace init processes, but that's not what we want anyway.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Are these last 4 patches independent of the first 2, and can they be taken through the staging tree now?
Yep, without the first two there is just a bit of sparse warnings. Not a big deal.
Thanks,
On Mon, Feb 06, 2012 at 08:42:15PM +0400, Anton Vorontsov wrote:
On Mon, Feb 06, 2012 at 08:36:49AM -0800, Greg KH wrote:
On Mon, Feb 06, 2012 at 08:29:41PM +0400, Anton Vorontsov wrote:
Grabbing tasklist_lock has its disadvantages, i.e. it blocks process creation and destruction. If there are lots of processes, blocking doesn't sound like a great idea.
For LMK, it is sufficient to surround the task list traversal with rcu_read_{,un}lock().
From now on using force_sig() is not safe, as it can race with an
already exiting task, so we use send_sig() now. As a downside, it won't kill PID namespace init processes, but that's not what we want anyway.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Are these last 4 patches independent of the first 2, and can they be taken through the staging tree now?
Yep, without the first two there is just a bit of sparse warnings. Not a big deal.
Ok, I'll take the last 4; the first 2 need to go through some other tree (-mm?).
thanks,
greg k-h
On 02/06, Anton Vorontsov wrote:
Grabbing tasklist_lock has its disadvantages, i.e. it blocks process creation and destruction. If there are lots of processes, blocking doesn't sound like a great idea.
For LMK, it is sufficient to surround the task list traversal with rcu_read_{,un}lock().
From now on using force_sig() is not safe, as it can race with an already exiting task, so we use send_sig() now. As a downside, it won't kill PID namespace init processes, but that's not what we want anyway.
Reviewed-by: Oleg Nesterov oleg@redhat.com
LMK should not directly check for task->mm. The reason is that a process's threads may exit or detach their mm via use_mm(), while other threads in the group may still have a valid mm. To catch this we use find_lock_task_mm(), which walks all the threads of a process and returns an appropriate task (with task_lock() held).
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
---
 drivers/staging/android/lowmemorykiller.c |   16 +++++++++-------
 1 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index 63da844..0755e2f 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -82,7 +82,7 @@ task_notify_func(struct notifier_block *self, unsigned long val, void *data)

 static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 {
-	struct task_struct *p;
+	struct task_struct *tsk;
 	struct task_struct *selected = NULL;
 	int rem = 0;
 	int tasksize;
@@ -134,15 +134,17 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	selected_oom_adj = min_adj;

 	rcu_read_lock();
-	for_each_process(p) {
-		struct mm_struct *mm;
+	for_each_process(tsk) {
+		struct task_struct *p;
 		struct signal_struct *sig;
 		int oom_adj;

-		task_lock(p);
-		mm = p->mm;
+		p = find_lock_task_mm(tsk);
+		if (!p)
+			continue;
+
 		sig = p->signal;
-		if (!mm || !sig) {
+		if (!sig) {
 			task_unlock(p);
 			continue;
 		}
@@ -151,7 +153,7 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 			task_unlock(p);
 			continue;
 		}
-		tasksize = get_mm_rss(mm);
+		tasksize = get_mm_rss(p->mm);
 		task_unlock(p);
 		if (tasksize <= 0)
 			continue;
 static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 {
-	struct task_struct *p;
+	struct task_struct *tsk;
 	struct task_struct *selected = NULL;
 	int rem = 0;
 	int tasksize;
@@ -134,15 +134,17 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	selected_oom_adj = min_adj;

 	rcu_read_lock();
-	for_each_process(p) {
-		struct mm_struct *mm;
+	for_each_process(tsk) {
+		struct task_struct *p;
 		struct signal_struct *sig;
 		int oom_adj;

-		task_lock(p);
-		mm = p->mm;
+		p = find_lock_task_mm(tsk);
+		if (!p)
+			continue;
+
 		sig = p->signal;
-		if (!mm || !sig) {
+		if (!sig) {
 			task_unlock(p);
 			continue;
 		}
@@ -151,7 +153,7 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 			task_unlock(p);
 			continue;
 		}
-		tasksize = get_mm_rss(mm);
+		tasksize = get_mm_rss(p->mm);
 		task_unlock(p);
 		if (tasksize <= 0)
 			continue;
ack.
On 02/06, Anton Vorontsov wrote:
LMK should not directly check for task->mm. The reason is that a process's threads may exit or detach their mm via use_mm(), while other threads in the group may still have a valid mm. To catch this we use find_lock_task_mm(), which walks all the threads of a process and returns an appropriate task (with task_lock() held).
Reviewed-by: Oleg Nesterov oleg@redhat.com
task->signal == NULL is not possible, so no need for these checks.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
---
 drivers/staging/android/lowmemorykiller.c |    8 +-------
 1 files changed, 1 insertions(+), 7 deletions(-)
diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index 0755e2f..6e800d3 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -136,19 +136,13 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	rcu_read_lock();
 	for_each_process(tsk) {
 		struct task_struct *p;
-		struct signal_struct *sig;
 		int oom_adj;

 		p = find_lock_task_mm(tsk);
 		if (!p)
 			continue;

-		sig = p->signal;
-		if (!sig) {
-			task_unlock(p);
-			continue;
-		}
-		oom_adj = sig->oom_adj;
+		oom_adj = p->signal->oom_adj;
 		if (oom_adj < min_adj) {
 			task_unlock(p);
 			continue;
 drivers/staging/android/lowmemorykiller.c |    8 +-------
 1 files changed, 1 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index 0755e2f..6e800d3 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -136,19 +136,13 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	rcu_read_lock();
 	for_each_process(tsk) {
 		struct task_struct *p;
-		struct signal_struct *sig;
 		int oom_adj;

 		p = find_lock_task_mm(tsk);
 		if (!p)
 			continue;

-		sig = p->signal;
-		if (!sig) {
-			task_unlock(p);
-			continue;
-		}
-		oom_adj = sig->oom_adj;
+		oom_adj = p->signal->oom_adj;
 		if (oom_adj < min_adj) {
 			task_unlock(p);
 			continue;
ack.
On 02/06, Anton Vorontsov wrote:
task->signal == NULL is not possible, so no need for these checks.
Reviewed-by: Oleg Nesterov oleg@redhat.com
LMK should not try killing kernel threads.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
---
 drivers/staging/android/lowmemorykiller.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index 6e800d3..ae38c39 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -138,6 +138,9 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 		struct task_struct *p;
 		int oom_adj;

+		if (tsk->flags & PF_KTHREAD)
+			continue;
+
 		p = find_lock_task_mm(tsk);
 		if (!p)
 			continue;
2012/2/6 Anton Vorontsov anton.vorontsov@linaro.org:
LMK should not try killing kernel threads.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>

 drivers/staging/android/lowmemorykiller.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index 6e800d3..ae38c39 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -138,6 +138,9 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 		struct task_struct *p;
 		int oom_adj;

+		if (tsk->flags & PF_KTHREAD)
+			continue;
+
 		p = find_lock_task_mm(tsk);
 		if (!p)
 			continue;
ack.
On 02/06, Anton Vorontsov wrote:
LMK should not try killing kernel threads.
Reviewed-by: Oleg Nesterov oleg@redhat.com