Grabbing tasklist_lock has its disadvantages: it blocks process creation and destruction. If there are lots of processes, blocking doesn't sound like a great idea.
For LMK, it is sufficient to surround the task list traversal loop with an rcu_read_{,un}lock() pair.
Suggested-by: Oleg Nesterov oleg@redhat.com
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
---
Also thanks to Eric and Paul for the ideas.
 drivers/staging/android/lowmemorykiller.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index 2d8d2b7..b3ade2f 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -34,6 +34,7 @@
 #include <linux/mm.h>
 #include <linux/oom.h>
 #include <linux/sched.h>
+#include <linux/rcupdate.h>
 #include <linux/profile.h>
 #include <linux/notifier.h>
@@ -132,7 +133,7 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	}
 	selected_oom_adj = min_adj;

-	read_lock(&tasklist_lock);
+	rcu_read_lock();
 	for_each_process(p) {
 		struct mm_struct *mm;
 		struct signal_struct *sig;
@@ -185,7 +186,7 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	}
 	lowmem_print(4, "lowmem_shrink %lu, %x, return %d\n",
		     sc->nr_to_scan, sc->gfp_mask, rem);
-	read_unlock(&tasklist_lock);
+	rcu_read_unlock();
 	return rem;
 }
On 02/01, Anton Vorontsov wrote:
@@ -132,7 +133,7 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	}
 	selected_oom_adj = min_adj;

-	read_lock(&tasklist_lock);
+	rcu_read_lock();
This has the same problem, force_sig() becomes unsafe.
Why do you need force_? Do you really want to kill /sbin/init (or sub-namespace init) ?
We could change force_sig_info() to use lock_task_sighand(), but I'd like to avoid this. Imho, this interface should be cleaned up, and it should be used for synchronous signals only.
With or without this patch, sig == NULL is not possible, but the !mm check is not right: there could be other threads with mm != NULL.
Oleg.
On Thu, Feb 02, 2012 at 01:54:41PM +0100, Oleg Nesterov wrote:
On 02/01, Anton Vorontsov wrote:
@@ -132,7 +133,7 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	}
 	selected_oom_adj = min_adj;

-	read_lock(&tasklist_lock);
+	rcu_read_lock();
This has the same problem, force_sig() becomes unsafe.
Ouch, I think I finally got it. So, lock_task_sighand() is trying to gracefully grab the lock, checking if the sighand is not NULL (which means, per __exit_signal(), that the task is halfway into the grave).
Well, it seems that this behaviour of force_sig() is not quite obvious, and there are other offenders out there. E.g. in the sysrq code I don't see anything that prevents the same race.
static void send_sig_all(int sig)
{
	struct task_struct *p;

	for_each_process(p) {
		if (p->mm && !is_global_init(p))
			/* Not swapper, init nor kernel thread */
			force_sig(sig, p);
	}
}
Would the following fix work for the sysrq?
- - - -
From: Anton Vorontsov anton.vorontsov@linaro.org
Subject: [PATCH] sysrq: Fix unsafe operations on tasks
sysrq should grab the tasklist lock, otherwise calling force_sig() is not safe, as it might race with an exiting task, whose ->sighand might be set to NULL already.
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
---
 drivers/tty/sysrq.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index 7867b7c..a1bcad7 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -322,11 +322,13 @@ static void send_sig_all(int sig)
 {
 	struct task_struct *p;

+	read_lock(&tasklist_lock);
 	for_each_process(p) {
 		if (p->mm && !is_global_init(p))
 			/* Not swapper, init nor kernel thread */
 			force_sig(sig, p);
 	}
+	read_unlock(&tasklist_lock);
 }
 static void sysrq_handle_term(int key)
- - - -
But for LMK I will use send_signal(), as LMK is special.
Plus, while I'm at it, I might review the other offenders a bit more closely and fix them as well.
Why do you need force_? Do you really want to kill /sbin/init (or sub-namespace init) ?
Nope.
We could change force_sig_info() to use lock_task_sighand(), but I'd like to avoid this. Imho, this interface should be cleaned up, and it should be used for synchronous signals only.
OK. Then we should fix the users?
With or without this patch, sig == NULL is not possible, but the !mm check is not right: there could be other threads with mm != NULL.
I'm not sure I completely follow. In the current LMK code, we check for !mm because otherwise the potential victim is useless for us (i.e. killing it will not free much memory anyway).
Thanks!
Grabbing tasklist_lock has its disadvantages: it blocks process creation and destruction. If there are lots of processes, blocking doesn't sound like a great idea.

For LMK, it is sufficient to surround the task list traversal with rcu_read_{,un}lock().
From now on using force_sig() is not safe, as it can race with an already exiting task, so we use send_sig() now. As a downside, it won't kill PID namespace init processes, but that's not what we want anyway.
Suggested-by: Oleg Nesterov oleg@redhat.com
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
---
 drivers/staging/android/lowmemorykiller.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index 2d8d2b7..63da844 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -34,6 +34,7 @@
 #include <linux/mm.h>
 #include <linux/oom.h>
 #include <linux/sched.h>
+#include <linux/rcupdate.h>
 #include <linux/profile.h>
 #include <linux/notifier.h>
@@ -132,7 +133,7 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	}
 	selected_oom_adj = min_adj;

-	read_lock(&tasklist_lock);
+	rcu_read_lock();
 	for_each_process(p) {
 		struct mm_struct *mm;
 		struct signal_struct *sig;
@@ -180,12 +181,12 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	lowmem_deathpending = selected;
 	task_handoff_register(&task_nb);
 #endif
-	force_sig(SIGKILL, selected);
+	send_sig(SIGKILL, selected, 0);
 	rem -= selected_tasksize;
 	}
 	lowmem_print(4, "lowmem_shrink %lu, %x, return %d\n",
		     sc->nr_to_scan, sc->gfp_mask, rem);
-	read_unlock(&tasklist_lock);
+	rcu_read_unlock();
 	return rem;
 }
On 02/03, Anton Vorontsov wrote:
@@ -132,7 +133,7 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	}
 	selected_oom_adj = min_adj;

-	read_lock(&tasklist_lock);
+	rcu_read_lock();
 	for_each_process(p) {
 		struct mm_struct *mm;
 		struct signal_struct *sig;
@@ -180,12 +181,12 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	lowmem_deathpending = selected;
 	task_handoff_register(&task_nb);
 #endif
-	force_sig(SIGKILL, selected);
+	send_sig(SIGKILL, selected, 0);
 	rem -= selected_tasksize;
 	}
 	lowmem_print(4, "lowmem_shrink %lu, %x, return %d\n",
		     sc->nr_to_scan, sc->gfp_mask, rem);
-	read_unlock(&tasklist_lock);
+	rcu_read_unlock();
I think this is correct. As for the ->mm check, please look at find_lock_task_mm().
You can also remove the !sig check.
And, I forgot to mention: there is another reason why the mm != NULL check is wrong (in send_sig_all too). A kernel thread can do use_mm(). You should also check PF_KTHREAD.
Oleg.
On 02/02, Anton Vorontsov wrote:
On Thu, Feb 02, 2012 at 01:54:41PM +0100, Oleg Nesterov wrote:
This has the same problem, force_sig() becomes unsafe.
Ouch, I think I finally got it. So, lock_task_sighand() is trying to gracefully grab the lock, checking if the sighand is not NULL (which means, per __exit_signal(), that the task is halfway into the grave).
Yes.
Would the following fix work for the sysrq?
From: Anton Vorontsov anton.vorontsov@linaro.org
Subject: [PATCH] sysrq: Fix unsafe operations on tasks
sysrq should grab the tasklist lock, otherwise calling force_sig() is not safe, as it might race with an exiting task, whose ->sighand might be set to NULL already.
And there are other reasons for tasklist_lock. It is not clear why for_each_process() itself is safe. OK, currently (afaics) the caller disables irqs, but in theory this doesn't imply rcu_read_lock().
And it can race with fork() and miss the task, although mostly in theory.
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -322,11 +322,13 @@ static void send_sig_all(int sig)
 {
 	struct task_struct *p;

+	read_lock(&tasklist_lock);
 	for_each_process(p) {
 		if (p->mm && !is_global_init(p))
 			/* Not swapper, init nor kernel thread */
 			force_sig(sig, p);
 	}
+	read_unlock(&tasklist_lock);
Agreed, but force_sig() should not be used anyway. It sends the signal to the thread, we need to kill the process. The main thread can exit.
With or without this patch, sig == NULL is not possible, but the !mm check is not right: there could be other threads with mm != NULL.
I'm not sure I completely follow. In the current LMK code, we check for !mm because otherwise the potential victim is useless for us (i.e. killing it will not free much memory anyway).
This is the common mistake. Consider this program:
void *thread_func(...)
{
	use_a_lot_of_memory();
}

int main(void)
{
	pthread_create(..., thread_func, ...);
	pthread_exit();
}
lowmem_shrink() will skip this task because p->mm == NULL.
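For reference, Oleg's scenario is easy to reproduce from userspace. The sketch below is only an illustration (the buffer size and messages are made up for the demo): after the main thread calls pthread_exit(), the process, and hence its memory, lives on through the worker thread, which is exactly why judging a process by its group leader alone is wrong.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Worker that keeps the process (and its address space) alive
 * after the main thread is gone. */
static void *thread_func(void *arg)
{
	/* Stand-in for use_a_lot_of_memory(): grab some heap. */
	char *buf = malloc(64 * 1024 * 1024);

	if (buf)
		buf[0] = 1;	/* touch it so the page is really ours */
	sleep(1);
	printf("worker: still alive, main thread already exited\n");
	free(buf);
	return NULL;
}

int main(void)
{
	pthread_t t;

	if (pthread_create(&t, NULL, thread_func, NULL))
		return 1;
	/*
	 * Terminate only the main thread; the process keeps running
	 * until the worker finishes. In kernel terms, the group
	 * leader is half-dead while another thread still has a live
	 * mm -- so checking only the leader's p->mm misses it.
	 */
	pthread_exit(NULL);
}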
Oleg.
This is needed so that callers would not get 'context imbalance' warnings from the sparse tool.
As a side effect, this patch fixes the following sparse warnings:
  CHECK   mm/oom_kill.c
mm/oom_kill.c:201:28: warning: context imbalance in 'oom_badness' - unexpected unlock
include/linux/rcupdate.h:249:30: warning: context imbalance in 'dump_tasks' - unexpected unlock
mm/oom_kill.c:453:9: warning: context imbalance in 'oom_kill_task' - unexpected unlock
  CHECK   mm/memcontrol.c
...
mm/memcontrol.c:1130:17: warning: context imbalance in 'task_in_mem_cgroup' - unexpected unlock
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
---
 include/linux/oom.h |   11 ++++++++++-
 mm/oom_kill.c       |    2 +-
 2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 552fba9..e64bfd2 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -21,6 +21,7 @@
#ifdef __KERNEL__
+#include <linux/compiler.h>
 #include <linux/sched.h>
 #include <linux/types.h>
 #include <linux/nodemask.h>
@@ -65,7 +66,15 @@ static inline void oom_killer_enable(void)
 	oom_killer_disabled = false;
 }
-extern struct task_struct *find_lock_task_mm(struct task_struct *p);
+extern struct task_struct *__find_lock_task_mm(struct task_struct *p);
+
+#define find_lock_task_mm(p)						\
+({									\
+	struct task_struct *__ret;					\
+									\
+	__cond_lock(&(p)->alloc_lock, __ret = __find_lock_task_mm(p));	\
+	__ret;								\
+})
 /* sysctls */
 extern int sysctl_oom_dump_tasks;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 2958fd8..0ebb383 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -136,7 +136,7 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
  * pointer. Return p, or any of its subthreads with a valid ->mm, with
  * task_lock() held.
  */
-struct task_struct *find_lock_task_mm(struct task_struct *p)
+struct task_struct *__find_lock_task_mm(struct task_struct *p)
 {
 	struct task_struct *t = p;
On Mon, Feb 06, 2012 at 08:29:30PM +0400, Anton Vorontsov wrote:
This is needed so that callers would not get 'context imbalance' warnings from the sparse tool.
As a side effect, this patch fixes the following sparse warnings:
  CHECK   mm/oom_kill.c
mm/oom_kill.c:201:28: warning: context imbalance in 'oom_badness' - unexpected unlock
include/linux/rcupdate.h:249:30: warning: context imbalance in 'dump_tasks' - unexpected unlock
mm/oom_kill.c:453:9: warning: context imbalance in 'oom_kill_task' - unexpected unlock
  CHECK   mm/memcontrol.c
...
mm/memcontrol.c:1130:17: warning: context imbalance in 'task_in_mem_cgroup' - unexpected unlock
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
include/linux/oom.h | 11 ++++++++++- mm/oom_kill.c | 2 +- 2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 552fba9..e64bfd2 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -21,6 +21,7 @@
 #ifdef __KERNEL__

+#include <linux/compiler.h>
 #include <linux/sched.h>
 #include <linux/types.h>
 #include <linux/nodemask.h>
@@ -65,7 +66,15 @@ static inline void oom_killer_enable(void)
 	oom_killer_disabled = false;
 }

-extern struct task_struct *find_lock_task_mm(struct task_struct *p);
+extern struct task_struct *__find_lock_task_mm(struct task_struct *p);
+
+#define find_lock_task_mm(p)						\
+({									\
+	struct task_struct *__ret;					\
+									\
+	__cond_lock(&(p)->alloc_lock, __ret = __find_lock_task_mm(p));	\
+	__ret;								\
+})
Please use the proper "do...while" style thing here for multi-line, complex #defines like the rest of the kernel does so that you don't end up debugging horrible problems later.
And I would need acks from the -mm developers before I could apply any of this.
thanks,
greg k-h
On Mon, Feb 06, 2012 at 08:35:42AM -0800, Greg KH wrote: [...]
-extern struct task_struct *find_lock_task_mm(struct task_struct *p);
+extern struct task_struct *__find_lock_task_mm(struct task_struct *p);
+
+#define find_lock_task_mm(p)						\
+({									\
+	struct task_struct *__ret;					\
+									\
+	__cond_lock(&(p)->alloc_lock, __ret = __find_lock_task_mm(p));	\
+	__ret;								\
+})
Please use the proper "do...while" style thing here for multi-line, complex #defines like the rest of the kernel does so that you don't end up debugging horrible problems later.
Unfortunately this isn't possible in this case. Unlike the '({ })' GCC statement-expression extension, a do-while statement does not evaluate to a value, i.e. 'x = do { 123; } while (0);' is illegal.
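The difference is easy to check in userspace; a minimal sketch (the PICK_MAX macro name here is made up for the demo, it is not the kernel's max()):

#include <assert.h>
#include <stdio.h>

/* A GCC/Clang statement expression: the block evaluates to the value
 * of its last expression, so it can sit on the right-hand side of an
 * assignment -- which is what find_lock_task_mm() needs.
 * A do { ... } while (0) block is a statement, not an expression,
 * so "x = do { ... } while (0);" would not even compile. */
#define PICK_MAX(a, b)			\
({					\
	typeof(a) _a = (a);		\
	typeof(b) _b = (b);		\
	_a > _b ? _a : _b;		\
})

int main(void)
{
	int x = ({ int t = 120; t + 3; });	/* evaluates to 123 */

	assert(x == 123);
	assert(PICK_MAX(2, 5) == 5);
	printf("statement expression value: %d\n", x);
	return 0;
}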
Thanks,
On Mon, Feb 06, 2012 at 10:59:09PM +0400, Anton Vorontsov wrote:
On Mon, Feb 06, 2012 at 08:35:42AM -0800, Greg KH wrote: [...]
-extern struct task_struct *find_lock_task_mm(struct task_struct *p);
+extern struct task_struct *__find_lock_task_mm(struct task_struct *p);
+
+#define find_lock_task_mm(p)						\
+({									\
+	struct task_struct *__ret;					\
+									\
+	__cond_lock(&(p)->alloc_lock, __ret = __find_lock_task_mm(p));	\
+	__ret;								\
+})
Please use the proper "do...while" style thing here for multi-line, complex #defines like the rest of the kernel does so that you don't end up debugging horrible problems later.
Unfortunately this isn't possible in this case. Unlike the '({ })' GCC statement-expression extension, a do-while statement does not evaluate to a value, i.e. 'x = do { 123; } while (0);' is illegal.
Ah, you are right, my bad, sorry about that.
greg k-h
2012/2/6 Greg KH gregkh@linuxfoundation.org:
On Mon, Feb 06, 2012 at 10:59:09PM +0400, Anton Vorontsov wrote:
On Mon, Feb 06, 2012 at 08:35:42AM -0800, Greg KH wrote: [...]
-extern struct task_struct *find_lock_task_mm(struct task_struct *p);
+extern struct task_struct *__find_lock_task_mm(struct task_struct *p);
+
+#define find_lock_task_mm(p)						\
+({									\
+	struct task_struct *__ret;					\
+									\
+	__cond_lock(&(p)->alloc_lock, __ret = __find_lock_task_mm(p));	\
+	__ret;								\
+})
Please use the proper "do...while" style thing here for multi-line, complex #defines like the rest of the kernel does so that you don't end up debugging horrible problems later.
Unfortunately this isn't possible in this case. Unlike the '({ })' GCC statement-expression extension, a do-while statement does not evaluate to a value, i.e. 'x = do { 123; } while (0);' is illegal.
Ah, you are right, my bad, sorry about that.
greg k-h
Some __cond_lock() callers are inline functions. Is this bad? An inline function is generally preferred over a macro.
This is needed so that callers would not get 'context imbalance' warnings from the sparse tool.
As a side effect, this patch fixes the following sparse warnings:
  CHECK   mm/oom_kill.c
mm/oom_kill.c:201:28: warning: context imbalance in 'oom_badness' - unexpected unlock
include/linux/rcupdate.h:249:30: warning: context imbalance in 'dump_tasks' - unexpected unlock
mm/oom_kill.c:453:9: warning: context imbalance in 'oom_kill_task' - unexpected unlock
  CHECK   mm/memcontrol.c
...
mm/memcontrol.c:1130:17: warning: context imbalance in 'task_in_mem_cgroup' - unexpected unlock
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
---
On Mon, Feb 06, 2012 at 04:27:32PM -0500, KOSAKI Motohiro wrote: [...]
Unfortunately this isn't possible in this case. Unlike the '({ })' GCC statement-expression extension, a do-while statement does not evaluate to a value, i.e. 'x = do { 123; } while (0);' is illegal.
Ah, you are right, my bad, sorry about that.
greg k-h
Some __cond_lock() callers are inline functions. Is this bad?
No, that's great, actually. :-) Not obvious, but seems like sparse understands __cond_lock in inline functions, so I'd better use it.
Thanks,
 include/linux/oom.h |   12 +++++++++++-
 mm/oom_kill.c       |    2 +-
 2 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 552fba9..7c8946a 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -21,6 +21,7 @@
#ifdef __KERNEL__
+#include <linux/compiler.h>
 #include <linux/sched.h>
 #include <linux/types.h>
 #include <linux/nodemask.h>
@@ -65,7 +66,16 @@ static inline void oom_killer_enable(void)
 	oom_killer_disabled = false;
 }
-extern struct task_struct *find_lock_task_mm(struct task_struct *p);
+extern struct task_struct *__find_lock_task_mm(struct task_struct *p);
+
+static inline struct task_struct *find_lock_task_mm(struct task_struct *p)
+{
+	struct task_struct *ret;
+
+	ret = __find_lock_task_mm(p);
+	(void)__cond_lock(&p->alloc_lock, ret);
+	return ret;
+}
 /* sysctls */
 extern int sysctl_oom_dump_tasks;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 2958fd8..0ebb383 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -136,7 +136,7 @@ static bool has_intersects_mems_allowed(struct task_struct *tsk,
  * pointer. Return p, or any of its subthreads with a valid ->mm, with
  * task_lock() held.
  */
-struct task_struct *find_lock_task_mm(struct task_struct *p)
+struct task_struct *__find_lock_task_mm(struct task_struct *p)
 {
 	struct task_struct *t = p;
Some __cond_lock() callers are inline functions. Is this bad?
No, that's great, actually. :-) Not obvious, but seems like sparse understands __cond_lock in inline functions, so I'd better use it.
Thanks,
Great. So, to this version
Acked-by: KOSAKI Motohiro kosaki.motohiro@jp.fujitsu.com
On 02/07, Anton Vorontsov wrote:
On Mon, Feb 06, 2012 at 04:27:32PM -0500, KOSAKI Motohiro wrote:
Some __cond_lock() callers are inline functions. Is this bad?
No, that's great, actually. :-) Not obvious, but seems like sparse understands __cond_lock in inline functions, so I'd better use it.
Hmm, great...
Maybe you can update lock_task_sighand() too? (In a separate patch, of course.)
Oleg.
It appears that the sparse tool understands static inline functions for context balance checking, so let's turn the macro into an inline function.
This makes the code a little bit more robust.
Suggested-by: Oleg Nesterov oleg@redhat.com
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
---
On Wed, Feb 08, 2012 at 04:27:50PM +0100, Oleg Nesterov wrote:
On 02/07, Anton Vorontsov wrote:
On Mon, Feb 06, 2012 at 04:27:32PM -0500, KOSAKI Motohiro wrote:
Some __cond_lock() callers are inline functions. Is this bad?
No, that's great, actually. :-) Not obvious, but seems like sparse understands __cond_lock in inline functions, so I'd better use it.
Hmm, great...
Maybe you can update lock_task_sighand() too? (In a separate patch, of course.)
Sure thing. Here it goes...
 include/linux/sched.h |   15 +++++++++------
 1 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e82f721..22ae10e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2394,12 +2394,15 @@ static inline void task_unlock(struct task_struct *p)
 extern struct sighand_struct *__lock_task_sighand(struct task_struct *tsk,
							unsigned long *flags);
-#define lock_task_sighand(tsk, flags)					\
-({	struct sighand_struct *__ss;					\
-	__cond_lock(&(tsk)->sighand->siglock,				\
-		    (__ss = __lock_task_sighand(tsk, flags)));		\
-	__ss;								\
-})									\
+static inline struct sighand_struct *lock_task_sighand(struct task_struct *tsk,
+						       unsigned long *flags)
+{
+	struct sighand_struct *ret;
+
+	ret = __lock_task_sighand(tsk, flags);
+	(void)__cond_lock(&tsk->sighand->siglock, ret);
+	return ret;
+}
static inline void unlock_task_sighand(struct task_struct *tsk, unsigned long *flags)
Sparse flood makes it hard to catch newly-introduced warnings. So let's fix the sparse warnings in the OOM killer:
  CHECK   mm/oom_kill.c
mm/oom_kill.c:139:20: warning: context imbalance in '__find_lock_task_mm' - wrong count at exit
mm/oom_kill.c:771:9: warning: context imbalance in 'out_of_memory' - different lock contexts for basic block
The first one is fixed by assuring sparse that we know that we exit with the lock held.
The second one is caused by the fact that sparse isn't smart enough to handle the noreturn attribute.
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
---
 mm/oom_kill.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 0ebb383..49569b6 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -142,8 +142,14 @@ struct task_struct *__find_lock_task_mm(struct task_struct *p)

 	do {
 		task_lock(t);
-		if (likely(t->mm))
+		if (likely(t->mm)) {
+			/*
+			 * Shut up sparse: we do know that we exit w/ the
+			 * task locked.
+			 */
+			__release(&t->alloc_loc);
 			return t;
+		}
 		task_unlock(t);
 	} while_each_thread(p, t);

@@ -766,6 +772,7 @@ retry:
 		dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
 		read_unlock(&tasklist_lock);
 		panic("Out of memory and no killable processes...\n");
+		return;
 	}

 	if (oom_kill_process(p, gfp_mask, order, points, totalpages, NULL,
2012/2/6 Anton Vorontsov anton.vorontsov@linaro.org:
Sparse flood makes it hard to catch newly-introduced warnings. So let's fix the sparse warnings in the OOM killer:
  CHECK   mm/oom_kill.c
mm/oom_kill.c:139:20: warning: context imbalance in '__find_lock_task_mm' - wrong count at exit
mm/oom_kill.c:771:9: warning: context imbalance in 'out_of_memory' - different lock contexts for basic block
The first one is fixed by assuring sparse that we know that we exit with the lock held.
The second one is caused by the fact that sparse isn't smart enough to handle the noreturn attribute.
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
 mm/oom_kill.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 0ebb383..49569b6 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -142,8 +142,14 @@ struct task_struct *__find_lock_task_mm(struct task_struct *p)

 	do {
 		task_lock(t);
-		if (likely(t->mm))
+		if (likely(t->mm)) {
+			/*
+			 * Shut up sparse: we do know that we exit w/ the
+			 * task locked.
+			 */
+			__release(&t->alloc_loc);
task_struct only has alloc_lock, not alloc_loc. Moreover, we don't release the lock in this code path. Seems odd.
 			return t;
+		}
 		task_unlock(t);
 	} while_each_thread(p, t);

@@ -766,6 +772,7 @@ retry:
 		dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
 		read_unlock(&tasklist_lock);
 		panic("Out of memory and no killable processes...\n");
+		return;
 	}

 	if (oom_kill_process(p, gfp_mask, order, points, totalpages, NULL,
1.7.7.6
Sparse flood makes it hard to catch newly-introduced warnings. So let's fix the sparse warnings in the OOM killer:
  CHECK   mm/oom_kill.c
mm/oom_kill.c:139:20: warning: context imbalance in '__find_lock_task_mm' - wrong count at exit
mm/oom_kill.c:771:9: warning: context imbalance in 'out_of_memory' - different lock contexts for basic block
The first one is fixed by assuring sparse that we know that we exit with the lock held.
The second one is caused by the fact that sparse isn't smart enough to handle the noreturn attribute.
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
---
On Mon, Feb 06, 2012 at 04:33:26PM -0500, KOSAKI Motohiro wrote:
2012/2/6 Anton Vorontsov anton.vorontsov@linaro.org:
Sparse flood makes it hard to catch newly-introduced warnings. So let's fix the sparse warnings in the OOM killer:
  CHECK   mm/oom_kill.c
mm/oom_kill.c:139:20: warning: context imbalance in '__find_lock_task_mm' - wrong count at exit
mm/oom_kill.c:771:9: warning: context imbalance in 'out_of_memory' - different lock contexts for basic block
The first one is fixed by assuring sparse that we know that we exit with the lock held.
The second one is caused by the fact that sparse isn't smart enough to handle the noreturn attribute.
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
 mm/oom_kill.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 0ebb383..49569b6 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -142,8 +142,14 @@ struct task_struct *__find_lock_task_mm(struct task_struct *p)

 	do {
 		task_lock(t);
-		if (likely(t->mm))
+		if (likely(t->mm)) {
+			/*
+			 * Shut up sparse: we do know that we exit w/ the
+			 * task locked.
+			 */
+			__release(&t->alloc_loc);
task_struct only has alloc_lock, not alloc_loc.
Funnily enough, sparse does not care. :-) __release(foo) will work just as well. Seems like sparse counts locking balance globally.
This is now fixed in the patch down below, thanks for catching.
Moreover we don't release the lock in this code path. Seems odd.
Indeed. That's exactly what sparse is seeing as well. We exit without releasing the lock, which is bad (in sparse's eyes). So we lie to sparse, telling it that we do release it, so it shuts up.
Usually, we annotate functions with __releases() and __acquires() when a function releases/acquires one of its arguments, but in this case the function returns a locked value, and it seems that there's no way to properly annotate this.
 mm/oom_kill.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 0ebb383..1914367 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -142,8 +142,14 @@ struct task_struct *__find_lock_task_mm(struct task_struct *p)

 	do {
 		task_lock(t);
-		if (likely(t->mm))
+		if (likely(t->mm)) {
+			/*
+			 * Shut up sparse: we do know that we exit w/ the
+			 * task locked.
+			 */
+			__release(&t->alloc_lock);
 			return t;
+		}
 		task_unlock(t);
 	} while_each_thread(p, t);

@@ -766,6 +772,7 @@ retry:
 		dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
 		read_unlock(&tasklist_lock);
 		panic("Out of memory and no killable processes...\n");
+		return;
 	}

 	if (oom_kill_process(p, gfp_mask, order, points, totalpages, NULL,
task_struct only has alloc_lock, not alloc_loc.
Funnily enough, sparse does not care. :-) __release(foo) will work just as well. Seems like sparse counts locking balance globally.
This is now fixed in the patch down below, thanks for catching.
Moreover we don't release the lock in this code path. Seems odd.
Indeed. That's exactly what sparse is seeing as well. We exit without releasing the lock, which is bad (in sparse's eyes). So we lie to sparse, telling it that we do release it, so it shuts up.
Hmmm....
To be honest, I really dislike any lie annotation. Why? It is very fragile and easily becomes broken by unrelated but nearby line changes. Please consider enhancing sparse first.
On Tue, Feb 07, 2012 at 12:38:58AM -0500, KOSAKI Motohiro wrote:
task_struct only has alloc_lock, not alloc_loc.
Funnily enough, sparse does not care. :-) __release(foo) will work just as well. Seems like sparse counts locking balance globally.
This is now fixed in the patch down below, thanks for catching.
Moreover we don't release the lock in this code path. Seems odd.
Indeed. That's exactly what sparse is seeing as well. We exit without releasing the lock, which is bad (in sparse's eyes). So we lie to sparse, telling it that we do release it, so it shuts up.
Hmmm....
To be honest, I really dislike any lie annotation. Why? It is very fragile and easily becomes broken by unrelated but nearby line changes. Please consider enhancing sparse first.
I somewhat doubt that it is possible to "enhance it". We keep the lock held conditionally, so we need to place the hint in the code itself. I believe the best we can do would be something like this:
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 4a24354..61d91f2 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -14,6 +14,7 @@
 # define __releases(x)	__attribute__((context(x,1,0)))
 # define __acquire(x)	__context__(x,1)
 # define __release(x)	__context__(x,-1)
+# define __ret_with_lock(x)	__context__(x,-1)
 # define __cond_lock(x,c)	((c) ? ({ __acquire(x); 1; }) : 0)
 # define __percpu	__attribute__((noderef, address_space(3)))
 #ifdef CONFIG_SPARSE_RCU_POINTER
@@ -37,6 +38,7 @@ extern void __chk_io_ptr(const volatile void __iomem *);
 # define __releases(x)
 # define __acquire(x)	(void)0
 # define __release(x)	(void)0
+# define __ret_with_lock(x)
 # define __cond_lock(x,c)	(c)
 # define __percpu
 # define __rcu
And then use it instead of __release().
(2/7/12 1:23 AM), Anton Vorontsov wrote:
On Tue, Feb 07, 2012 at 12:38:58AM -0500, KOSAKI Motohiro wrote:
task_struct only has alloc_lock, not alloc_loc.
Funnily enough, sparse does not care. :-) __release(foo) will work just as well. Seems like sparse counts locking balance globally.
This is now fixed in the patch down below, thanks for catching.
Moreover we don't release the lock in this code path. Seems odd.
Indeed. That's exactly what sparse is seeing as well. We exit without releasing the lock, which is bad (in sparse's eyes). So we lie to sparse, telling it that we do release it, so it shuts up.
Hmmm....
To be honest, I really dislike any lie annotation. Why? It is very fragile and easily becomes broken by unrelated but nearby line changes. Please consider enhancing sparse first.
I somewhat doubt that it is possible to "enhance it". We keep the lock held conditionally, so we need to place the hint in the code itself. I believe the best we can do would be something like this:
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 4a24354..61d91f2 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -14,6 +14,7 @@
 # define __releases(x)	__attribute__((context(x,1,0)))
 # define __acquire(x)	__context__(x,1)
 # define __release(x)	__context__(x,-1)
+# define __ret_with_lock(x)	__context__(x,-1)
 # define __cond_lock(x,c)	((c) ? ({ __acquire(x); 1; }) : 0)
 # define __percpu	__attribute__((noderef, address_space(3)))
 #ifdef CONFIG_SPARSE_RCU_POINTER
@@ -37,6 +38,7 @@ extern void __chk_io_ptr(const volatile void __iomem *);
 # define __releases(x)
 # define __acquire(x)	(void)0
 # define __release(x)	(void)0
+# define __ret_with_lock(x)
 # define __cond_lock(x,c)	(c)
 # define __percpu
 # define __rcu
And then use it instead of __release().
Hmmm...
I still dislike this lie annotation, but maybe it's better than nothing (dunno, just a guess). Go ahead.
Grabbing tasklist_lock has its disadvantages: it blocks process creation and destruction. If there are lots of processes, blocking doesn't sound like a great idea.

For LMK, it is sufficient to surround the task list traversal with rcu_read_{,un}lock().
From now on using force_sig() is not safe, as it can race with an already exiting task, so we use send_sig() now. As a downside, it won't kill PID namespace init processes, but that's not what we want anyway.
Suggested-by: Oleg Nesterov oleg@redhat.com
Signed-off-by: Anton Vorontsov anton.vorontsov@linaro.org
---
 drivers/staging/android/lowmemorykiller.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index 2d8d2b7..63da844 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -34,6 +34,7 @@
 #include <linux/mm.h>
 #include <linux/oom.h>
 #include <linux/sched.h>
+#include <linux/rcupdate.h>
 #include <linux/profile.h>
 #include <linux/notifier.h>
@@ -132,7 +133,7 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	}
 	selected_oom_adj = min_adj;

-	read_lock(&tasklist_lock);
+	rcu_read_lock();
 	for_each_process(p) {
 		struct mm_struct *mm;
 		struct signal_struct *sig;
@@ -180,12 +181,12 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	lowmem_deathpending = selected;
 	task_handoff_register(&task_nb);
 #endif
-	force_sig(SIGKILL, selected);
+	send_sig(SIGKILL, selected, 0);
 	rem -= selected_tasksize;
 	}
 	lowmem_print(4, "lowmem_shrink %lu, %x, return %d\n",
		     sc->nr_to_scan, sc->gfp_mask, rem);
-	read_unlock(&tasklist_lock);
+	rcu_read_unlock();
 	return rem;
 }
On Mon, Feb 06, 2012 at 08:29:41PM +0400, Anton Vorontsov wrote:
Grabbing tasklist_lock has its disadvantages, i.e. it blocks process creation and destruction. If there are lots of processes, blocking doesn't sound like a great idea.
For LMK, it is sufficient to surround the task list traversal with rcu_read_{,un}lock().
From now on using force_sig() is not safe, as it can race with an
already exiting task, so we use send_sig() now. As a downside, it won't kill PID namespace init processes, but that's not what we want anyway.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Are these last 4 patches independent of the first 2, and can they be taken through the staging tree now?
thanks,
greg k-h
On Mon, Feb 06, 2012 at 08:36:49AM -0800, Greg KH wrote:
On Mon, Feb 06, 2012 at 08:29:41PM +0400, Anton Vorontsov wrote:
Grabbing tasklist_lock has its disadvantages, i.e. it blocks process creation and destruction. If there are lots of processes, blocking doesn't sound like a great idea.
For LMK, it is sufficient to surround the task list traversal with rcu_read_{,un}lock().
From now on using force_sig() is not safe, as it can race with an
already exiting task, so we use send_sig() now. As a downside, it won't kill PID namespace init processes, but that's not what we want anyway.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Are these last 4 patches independent of the first 2, and can they be taken through the staging tree now?
Yep, without the first two there is just a bit of sparse warnings. Not a big deal.
Thanks,
On Mon, Feb 06, 2012 at 08:42:15PM +0400, Anton Vorontsov wrote:
On Mon, Feb 06, 2012 at 08:36:49AM -0800, Greg KH wrote:
On Mon, Feb 06, 2012 at 08:29:41PM +0400, Anton Vorontsov wrote:
Grabbing tasklist_lock has its disadvantages, i.e. it blocks process creation and destruction. If there are lots of processes, blocking doesn't sound like a great idea.
For LMK, it is sufficient to surround the task list traversal with rcu_read_{,un}lock().
From now on using force_sig() is not safe, as it can race with an
already exiting task, so we use send_sig() now. As a downside, it won't kill PID namespace init processes, but that's not what we want anyway.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Are these last 4 patches independent of the first 2, and can they be taken through the staging tree now?
Yep, without the first two there is just a bit of sparse warnings. Not a big deal.
Ok, I'll take the last 4; the first 2 need to go through some other tree (-mm?).
thanks,
greg k-h
On 02/06, Anton Vorontsov wrote:
Grabbing tasklist_lock has its disadvantages, i.e. it blocks process creation and destruction. If there are lots of processes, blocking doesn't sound like a great idea.
For LMK, it is sufficient to surround the task list traversal with rcu_read_{,un}lock().
From now on using force_sig() is not safe, as it can race with an already exiting task, so we use send_sig() now. As a downside, it won't kill PID namespace init processes, but that's not what we want anyway.
Reviewed-by: Oleg Nesterov oleg@redhat.com
LMK should not directly check for task->mm. The reason is that a process's threads may exit or detach their mm via use_mm(), while other threads in the group may still have a valid mm. To catch this we use find_lock_task_mm(), which walks all the threads of a process and returns an appropriate task (with task_lock() held).
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
---
 drivers/staging/android/lowmemorykiller.c |   16 +++++++++-------
 1 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index 63da844..0755e2f 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -82,7 +82,7 @@ task_notify_func(struct notifier_block *self, unsigned long val, void *data)

 static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 {
-	struct task_struct *p;
+	struct task_struct *tsk;
 	struct task_struct *selected = NULL;
 	int rem = 0;
 	int tasksize;
@@ -134,15 +134,17 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	selected_oom_adj = min_adj;

 	rcu_read_lock();
-	for_each_process(p) {
-		struct mm_struct *mm;
+	for_each_process(tsk) {
+		struct task_struct *p;
 		struct signal_struct *sig;
 		int oom_adj;

-		task_lock(p);
-		mm = p->mm;
+		p = find_lock_task_mm(tsk);
+		if (!p)
+			continue;
+
 		sig = p->signal;
-		if (!mm || !sig) {
+		if (!sig) {
 			task_unlock(p);
 			continue;
 		}
@@ -151,7 +153,7 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 			task_unlock(p);
 			continue;
 		}
-		tasksize = get_mm_rss(mm);
+		tasksize = get_mm_rss(p->mm);
 		task_unlock(p);
 		if (tasksize <= 0)
 			continue;
 static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 {
-	struct task_struct *p;
+	struct task_struct *tsk;
 	struct task_struct *selected = NULL;
 	int rem = 0;
 	int tasksize;
@@ -134,15 +134,17 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	selected_oom_adj = min_adj;

 	rcu_read_lock();
-	for_each_process(p) {
-		struct mm_struct *mm;
+	for_each_process(tsk) {
+		struct task_struct *p;
 		struct signal_struct *sig;
 		int oom_adj;

-		task_lock(p);
-		mm = p->mm;
+		p = find_lock_task_mm(tsk);
+		if (!p)
+			continue;
+
 		sig = p->signal;
-		if (!mm || !sig) {
+		if (!sig) {
 			task_unlock(p);
 			continue;
 		}
@@ -151,7 +153,7 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 			task_unlock(p);
 			continue;
 		}
-		tasksize = get_mm_rss(mm);
+		tasksize = get_mm_rss(p->mm);
 		task_unlock(p);
 		if (tasksize <= 0)
 			continue;
ack.
On 02/06, Anton Vorontsov wrote:
LMK should not directly check for task->mm. The reason is that a process's threads may exit or detach their mm via use_mm(), while other threads in the group may still have a valid mm. To catch this we use find_lock_task_mm(), which walks all the threads of a process and returns an appropriate task (with task_lock() held).
Reviewed-by: Oleg Nesterov oleg@redhat.com
task->signal == NULL is not possible, so no need for these checks.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
---
 drivers/staging/android/lowmemorykiller.c |    8 +-------
 1 files changed, 1 insertions(+), 7 deletions(-)
diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index 0755e2f..6e800d3 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -136,19 +136,13 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	rcu_read_lock();
 	for_each_process(tsk) {
 		struct task_struct *p;
-		struct signal_struct *sig;
 		int oom_adj;

 		p = find_lock_task_mm(tsk);
 		if (!p)
 			continue;

-		sig = p->signal;
-		if (!sig) {
-			task_unlock(p);
-			continue;
-		}
-		oom_adj = sig->oom_adj;
+		oom_adj = p->signal->oom_adj;
 		if (oom_adj < min_adj) {
 			task_unlock(p);
 			continue;
 drivers/staging/android/lowmemorykiller.c |    8 +-------
 1 files changed, 1 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index 0755e2f..6e800d3 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -136,19 +136,13 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 	rcu_read_lock();
 	for_each_process(tsk) {
 		struct task_struct *p;
-		struct signal_struct *sig;
 		int oom_adj;

 		p = find_lock_task_mm(tsk);
 		if (!p)
 			continue;

-		sig = p->signal;
-		if (!sig) {
-			task_unlock(p);
-			continue;
-		}
-		oom_adj = sig->oom_adj;
+		oom_adj = p->signal->oom_adj;
 		if (oom_adj < min_adj) {
 			task_unlock(p);
 			continue;
ack.
On 02/06, Anton Vorontsov wrote:
task->signal == NULL is not possible, so no need for these checks.
Reviewed-by: Oleg Nesterov oleg@redhat.com
LMK should not try killing kernel threads.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
---
 drivers/staging/android/lowmemorykiller.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index 6e800d3..ae38c39 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -138,6 +138,9 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 		struct task_struct *p;
 		int oom_adj;

+		if (tsk->flags & PF_KTHREAD)
+			continue;
+
 		p = find_lock_task_mm(tsk);
 		if (!p)
 			continue;
2012/2/6 Anton Vorontsov anton.vorontsov@linaro.org:
LMK should not try killing kernel threads.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>

 drivers/staging/android/lowmemorykiller.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index 6e800d3..ae38c39 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -138,6 +138,9 @@ static int lowmem_shrink(struct shrinker *s, struct shrink_control *sc)
 		struct task_struct *p;
 		int oom_adj;

+		if (tsk->flags & PF_KTHREAD)
+			continue;
+
 		p = find_lock_task_mm(tsk);
 		if (!p)
 			continue;
ack.
On 02/06, Anton Vorontsov wrote:
LMK should not try killing kernel threads.
Reviewed-by: Oleg Nesterov oleg@redhat.com