This series fixes a use-after-free (UAF) that occurs when running the kfence test case test_gfpzero, which is very time-consuming. The bug is easily triggered by setting CONFIG_KFENCE_NUM_OBJECTS = 65535. The series also includes some optimizations for the kunit tests.
v1->v2: Change log is updated.
Peng Liu (3):
  kunit: fix UAF when run kfence test case test_gfpzero
  kunit: make kunit_test_timeout compatible with comment
  kfence: test: try to avoid test_gfpzero trigger rcu_stall

 lib/kunit/try-catch.c   | 3 ++-
 mm/kfence/kfence_test.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)
KUnit creates a new thread to run each test case, and the main process waits for that thread to complete until a timeout expires. The variable "struct kunit test" is local to the function kunit_try_catch_run and is also used by the test case thread. When KUnit times out, the task running kunit_try_catch_run frees "struct kunit test" while the test case is still running, which triggers a UAF bug.
The problem has been observed on both a physical machine and a QEMU platform when running the kfence kunit tests. It can be triggered by setting CONFIG_KFENCE_NUM_OBJECTS = 65535. With this setting, the test case test_gfpzero takes hours and KUnit times out. The panic log follows.
BUG: unable to handle page fault for address: ffffffff82d882e9
Call Trace:
 kunit_log_append+0x58/0xd0
 ...
 test_alloc.constprop.0.cold+0x6b/0x8a [kfence_test]
 test_gfpzero.cold+0x61/0x8ab [kfence_test]
 kunit_try_run_case+0x4c/0x70
 kunit_generic_run_threadfn_adapter+0x11/0x20
 kthread+0x166/0x190
 ret_from_fork+0x22/0x30
Kernel panic - not syncing: Fatal exception
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
To solve this problem, the test case thread should be stopped when the KUnit framework times out. The stop signal is sent in kunit_try_catch_run, and test_gfpzero handles it.
Signed-off-by: Peng Liu <liupeng256@huawei.com>
---
 lib/kunit/try-catch.c   | 1 +
 mm/kfence/kfence_test.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/kunit/try-catch.c b/lib/kunit/try-catch.c
index be38a2c5ecc2..6b3d4db94077 100644
--- a/lib/kunit/try-catch.c
+++ b/lib/kunit/try-catch.c
@@ -78,6 +78,7 @@ void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context)
 	if (time_remaining == 0) {
 		kunit_err(test, "try timed out\n");
 		try_catch->try_result = -ETIMEDOUT;
+		kthread_stop(task_struct);
 	}

 	exit_code = try_catch->try_result;
diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c
index 50dbb815a2a8..caed6b4eba94 100644
--- a/mm/kfence/kfence_test.c
+++ b/mm/kfence/kfence_test.c
@@ -623,7 +623,7 @@ static void test_gfpzero(struct kunit *test)
 			break;
 		test_free(buf2);

-		if (i == CONFIG_KFENCE_NUM_OBJECTS) {
+		if (kthread_should_stop() || (i == CONFIG_KFENCE_NUM_OBJECTS)) {
 			kunit_warn(test, "giving up ... cannot get same object back\n");
 			return;
 		}
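
The fix relies on a cooperative stop: kthread_stop() sets a "should stop" flag and waits for the thread to exit, and the retry loop in test_gfpzero polls kthread_should_stop(). A minimal userspace sketch of that pattern follows. It is not kernel code. The names stop_requested, request_stop, should_stop, and run_test_loop are hypothetical stand-ins, and the model is single-threaded, so the timeout is simulated by requesting the stop at a chosen iteration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kthread "should stop" flag. */
static bool stop_requested;

/* Models the requesting side; the real kthread_stop() also waits for exit. */
void request_stop(void)
{
	stop_requested = true;
}

/* Models kthread_should_stop(): the worker polls this. */
bool should_stop(void)
{
	return stop_requested;
}

/*
 * Models the retry loop in test_gfpzero: try up to max_iters times and
 * give up early once a stop is requested. stop_at is the iteration at
 * which the simulated timeout fires. Returns the iterations completed.
 */
size_t run_test_loop(size_t max_iters, size_t stop_at)
{
	size_t i;

	stop_requested = false;
	for (i = 0; i < max_iters; i++) {
		if (i == stop_at)
			request_stop();	/* simulated timeout */
		if (should_stop())
			break;		/* give up early, as the patched test does */
		/* one alloc/free attempt would go here */
	}
	return i;
}
```

In this model, run_test_loop(65535, 100) stops after 100 iterations, while run_test_loop(10, 100) runs all 10 because no stop is ever requested. The sketch also shows why the poll matters: a loop that never checks the flag would simply keep running after the stop request.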
On Wed, 9 Mar 2022 at 09:19, 'Peng Liu' via kasan-dev <kasan-dev@googlegroups.com> wrote:
Reviewed-by: Marco Elver <elver@google.com>
Also Cc'ing more KUnit folks to double-check this is the right solution.
-- You received this message because you are subscribed to the Google Groups "kasan-dev" group. To unsubscribe from this group and stop receiving emails from it, send an email to kasan-dev+unsubscribe@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/20220309083753.1561921-2-liupeng....
On Wed, Mar 9, 2022 at 3:19 AM 'Peng Liu' via KUnit Development <kunit-dev@googlegroups.com> wrote:
Thanks for taking care of this.
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
In kunit_test_timeout, "300 * MSEC_PER_SEC" is declared to represent 5 min. However, the return value is used as a timeout in jiffies, so this holds only when HZ = 1000; it is wrong on arm64, whose default HZ is 250, and in some other configurations. Use msecs_to_jiffies to fix this, so that kunit_test_timeout works as intended.
Fixes: 5f3e06208920 ("kunit: test: add support for test abort")
Signed-off-by: Peng Liu <liupeng256@huawei.com>
---
 lib/kunit/try-catch.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/kunit/try-catch.c b/lib/kunit/try-catch.c
index 6b3d4db94077..f7825991d576 100644
--- a/lib/kunit/try-catch.c
+++ b/lib/kunit/try-catch.c
@@ -52,7 +52,7 @@ static unsigned long kunit_test_timeout(void)
 	 * If tests timeout due to exceeding sysctl_hung_task_timeout_secs,
 	 * the task will be killed and an oops generated.
 	 */
-	return 300 * MSEC_PER_SEC; /* 5 min */
+	return 300 * msecs_to_jiffies(MSEC_PER_SEC); /* 5 min */
 }

 void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context)
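
The effect of HZ on the old return value can be worked out with a small userspace sketch. It assumes that the caller treats the return value as a jiffies count. The helpers old_timeout_jiffies, new_timeout_jiffies, jiffies_to_secs, and msecs_to_jiffies_model are hypothetical names for illustration, not the kernel's implementations; in particular, the model of msecs_to_jiffies only covers HZ values that divide 1000 evenly and ignores overflow handling.

```c
#define MSEC_PER_SEC 1000UL

/* Simplified model of msecs_to_jiffies() for HZ values that divide 1000. */
unsigned long msecs_to_jiffies_model(unsigned long ms, unsigned long hz)
{
	unsigned long ms_per_jiffy = MSEC_PER_SEC / hz;

	return (ms + ms_per_jiffy - 1) / ms_per_jiffy;	/* round up */
}

/* Old return value: a millisecond count that the caller uses as jiffies. */
unsigned long old_timeout_jiffies(void)
{
	return 300 * MSEC_PER_SEC;
}

/* New return value: 300 seconds expressed in jiffies for the given HZ. */
unsigned long new_timeout_jiffies(unsigned long hz)
{
	return 300 * msecs_to_jiffies_model(MSEC_PER_SEC, hz);
}

/* Wall-clock seconds represented by a jiffies count at a given HZ. */
unsigned long jiffies_to_secs(unsigned long jiffies, unsigned long hz)
{
	return jiffies / hz;
}
```

Under these assumptions, the old value corresponds to 300 s at HZ = 1000, 1200 s (20 min) at HZ = 250, and 3000 s (50 min) at HZ = 100, while the new value corresponds to 300 s in all three cases.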
On Wed, 9 Mar 2022 at 09:19, 'Peng Liu' via kasan-dev <kasan-dev@googlegroups.com> wrote:
Reviewed-by: Marco Elver <elver@google.com>
+Cc more KUnit folks.
On Wed, Mar 9, 2022 at 2:19 AM 'Peng Liu' via KUnit Development <kunit-dev@googlegroups.com> wrote:
Reviewed-by: Daniel Latypov <dlatypov@google.com>
Thanks for catching this!
On Wed, Mar 9, 2022 at 3:19 AM 'Peng Liu' via KUnit Development <kunit-dev@googlegroups.com> wrote:
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
When CONFIG_KFENCE_NUM_OBJECTS is set to a large number, the kfence kunit test case test_gfpzero consumes nearly all of a CPU's time and an RCU stall is reported, as shown in the following log taken from a physical server.
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 68-....: (14422 ticks this GP) idle=6ce/1/0x4000000000000002 softirq=592/592 fqs=7500
 (t=15004 jiffies g=10677 q=20019)
Task dump for CPU 68:
task:kunit_try_catch state:R running task stack: 0 pid: 9728 ppid: 2 flags:0x0000020a
Call trace:
 dump_backtrace+0x0/0x1e4
 show_stack+0x20/0x2c
 sched_show_task+0x148/0x170
 ...
 rcu_sched_clock_irq+0x70/0x180
 update_process_times+0x68/0xb0
 tick_sched_handle+0x38/0x74
 ...
 gic_handle_irq+0x78/0x2c0
 el1_irq+0xb8/0x140
 kfree+0xd8/0x53c
 test_alloc+0x264/0x310 [kfence_test]
 test_gfpzero+0xf4/0x840 [kfence_test]
 kunit_try_run_case+0x48/0x20c
 kunit_generic_run_threadfn_adapter+0x28/0x34
 kthread+0x108/0x13c
 ret_from_fork+0x10/0x18
To avoid rcu_stall and unacceptable latency, a schedule point is added to test_gfpzero.
Signed-off-by: Peng Liu <liupeng256@huawei.com>
---
 mm/kfence/kfence_test.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c
index caed6b4eba94..1b50f70a4c0f 100644
--- a/mm/kfence/kfence_test.c
+++ b/mm/kfence/kfence_test.c
@@ -627,6 +627,7 @@ static void test_gfpzero(struct kunit *test)
 			kunit_warn(test, "giving up ... cannot get same object back\n");
 			return;
 		}
+		cond_resched();
 	}

 	for (i = 0; i < size; i++)
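
The idea of a scheduling point inside a long loop can be illustrated with a rough userspace analogue. This is only a sketch: busy_loop_with_yield is a hypothetical function, and POSIX sched_yield() is not equivalent to cond_resched(), which reschedules only when a reschedule is pending.

```c
#include <sched.h>	/* sched_yield(), POSIX */
#include <stddef.h>

/*
 * Hypothetical long-running loop. Each iteration does one unit of work
 * (elided here) and then offers the CPU to other runnable tasks, so the
 * loop cannot monopolize the CPU for its whole duration.
 */
size_t busy_loop_with_yield(size_t iterations)
{
	size_t done = 0;
	size_t i;

	for (i = 0; i < iterations; i++) {
		done++;			/* stand-in for one alloc/free attempt */
		(void)sched_yield();	/* scheduling point */
	}
	return done;
}
```

The loop still performs every iteration; the scheduling point changes only how the CPU is shared while it runs, which is the property the patch aims for in test_gfpzero.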
On Wed, 9 Mar 2022 at 09:19, 'Peng Liu' via kasan-dev <kasan-dev@googlegroups.com> wrote:
Reviewed-by: Marco Elver <elver@google.com>
On Wed, Mar 9, 2022 at 3:19 AM 'Peng Liu' via KUnit Development <kunit-dev@googlegroups.com> wrote:
I was able to reproduce the error you described and can confirm that I didn't see the UAF after applying your patches.
Tested-by: Brendan Higgins <brendanhiggins@google.com>