Currently, watchdog-test keep running until it gets a SIGINT. However, when watchdog-test is executed from the kselftests framework, where it launches test via timeout which will send SIGTERM in time up. This could lead to 1. watchdog haven't stop, a watchdog reset is triggered to reboot the OS in silent. 2. kselftests gets an timeout exit code, and judge watchdog-test as 'not ok'
This patch is prepare to fix above 2 issues
Signed-off-by: Li Zhijian lizhijian@fujitsu.com --- Hey, Cover letter is here.
It's notice that a OS reboot was triggerred after ran the watchdog-test in kselftests framwork 'make run_tests', that's because watchdog-test didn't stop feeding the watchdog after enable it.
In addition, current watchdog-test didn't adapt to the kselftests framework which launchs the test with /usr/bin/timeout and no timeout is expected. --- tools/testing/selftests/watchdog/watchdog-test.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/watchdog/watchdog-test.c b/tools/testing/selftests/watchdog/watchdog-test.c index bc71cbca0dde..2f8fd2670897 100644 --- a/tools/testing/selftests/watchdog/watchdog-test.c +++ b/tools/testing/selftests/watchdog/watchdog-test.c @@ -27,7 +27,7 @@
int fd; const char v = 'V'; -static const char sopts[] = "bdehp:st:Tn:NLf:i"; +static const char sopts[] = "bdehp:st:Tn:NLf:c:i"; static const struct option lopts[] = { {"bootstatus", no_argument, NULL, 'b'}, {"disable", no_argument, NULL, 'd'}, @@ -42,6 +42,7 @@ static const struct option lopts[] = { {"gettimeleft", no_argument, NULL, 'L'}, {"file", required_argument, NULL, 'f'}, {"info", no_argument, NULL, 'i'}, + {"count", required_argument, NULL, 'c'}, {NULL, no_argument, NULL, 0x0} };
@@ -95,6 +96,7 @@ static void usage(char *progname) printf(" -n, --pretimeout=T\tSet the pretimeout to T seconds\n"); printf(" -N, --getpretimeout\tGet the pretimeout\n"); printf(" -L, --gettimeleft\tGet the time left until timer expires\n"); + printf(" -c, --count\tStop after feeding the watchdog count times\n"); printf("\n"); printf("Parameters are parsed left-to-right in real-time.\n"); printf("Example: %s -d -t 10 -p 5 -e\n", progname); @@ -174,7 +176,7 @@ int main(int argc, char *argv[]) unsigned int ping_rate = DEFAULT_PING_RATE; int ret; int c; - int oneshot = 0; + int oneshot = 0, stop = 1, count = 0; char *file = "/dev/watchdog"; struct watchdog_info info; int temperature; @@ -307,6 +309,9 @@ int main(int argc, char *argv[]) else printf("WDIOC_GETTIMELEFT error '%s'\n", strerror(errno)); break; + case 'c': + stop = 0; + count = strtoul(optarg, NULL, 0); case 'f': /* Handled above */ break; @@ -336,8 +341,8 @@ int main(int argc, char *argv[])
signal(SIGINT, term);
- while (1) { - keep_alive(); + while (stop || count--) { + exit_code = keep_alive(); sleep(ping_rate); } end:
So that we can check its exit code to judge whether it succeeds or not.
Signed-off-by: Li Zhijian lizhijian@fujitsu.com --- .../testing/selftests/watchdog/watchdog-test.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/watchdog/watchdog-test.c b/tools/testing/selftests/watchdog/watchdog-test.c index 2f8fd2670897..d42fc2fd790d 100644 --- a/tools/testing/selftests/watchdog/watchdog-test.c +++ b/tools/testing/selftests/watchdog/watchdog-test.c @@ -51,7 +51,7 @@ static const struct option lopts[] = { * the PC Watchdog card to reset its internal timer so it doesn't trigger * a computer reset. */ -static void keep_alive(void) +static int keep_alive(void) { int dummy; int ret; @@ -59,6 +59,8 @@ static void keep_alive(void) ret = ioctl(fd, WDIOC_KEEPALIVE, &dummy); if (!ret) printf("."); + + return ret; }
/* @@ -175,7 +177,7 @@ int main(int argc, char *argv[]) int flags; unsigned int ping_rate = DEFAULT_PING_RATE; int ret; - int c; + int c, n; int oneshot = 0, stop = 1, count = 0; char *file = "/dev/watchdog"; struct watchdog_info info; @@ -342,7 +344,9 @@ int main(int argc, char *argv[]) signal(SIGINT, term);
while (stop || count--) { - exit_code = keep_alive(); + ret = keep_alive(); + if (ret) + goto end; sleep(ping_rate); } end: @@ -350,9 +354,10 @@ int main(int argc, char *argv[]) * Send specific magic character 'V' just in case Magic Close is * enabled to ensure watchdog gets disabled on close. */ - ret = write(fd, &v, 1); - if (ret < 0) + n = write(fd, &v, 1); + if (n < 0) printf("Stopping watchdog ticks failed (%d)...\n", errno); close(fd); - return 0; + + return ret || n < 0; }
We are ready to run the watchdog tests and check its exit code. An example output is like below: linux/tools/testing/selftests/watchdog# make run_tests TAP version 13 1..1 # timeout set to 45 # selftests: watchdog: run_watchdog-test.sh # ./run_watchdog-test.sh: Feed dog # Watchdog Ticking Away! # .. # ./run_watchdog-test.sh: Show watchdog_info # watchdog_info: # identity: iTCO_wdt # firmware_version: 2 # Support/Status: Set timeout (in seconds) # Support/Status: Supports magic close char # Support/Status: Keep alive ping reply # # ./run_watchdog-test.sh Turn off the watchdog timer # Watchdog card disabled. # Watchdog Ticking Away! # .. # ./run_watchdog-test.sh: Turn off the watchdog timer # Watchdog card enabled. # Watchdog Ticking Away! # .. # ./run_watchdog-test.sh: Set timeout to T seconds # Watchdog card disabled. # Watchdog timeout set to 10 seconds. # Watchdog card enabled. # Watchdog Ticking Away! # .. # ./run_watchdog-test.sh: Get the timeout # WDIOC_GETTIMEOUT returns 10 seconds. # # ./run_watchdog-test.sh: Get the pretimeout to T seconds # Watchdog card disabled. # WDIOC_GETPRETIMEOUT returns 0 seconds. # Watchdog card enabled. # # ./run_watchdog-test.sh Get the time left until timer expires # WDIOC_GETTIMELEFT returns 9 seconds. # # Get status & supported features # Support/Status: Keep alive ping reply # WDIOC_GETTEMP: 'Inappropriate ioctl for device' not ok 1 selftests: watchdog: run_watchdog-test.sh # exit=1
Signed-off-by: Li Zhijian lizhijian@fujitsu.com --- tools/testing/selftests/watchdog/Makefile | 3 +- .../selftests/watchdog/run_watchdog-test.sh | 41 +++++++++++++++++++ 2 files changed, 43 insertions(+), 1 deletion(-) create mode 100755 tools/testing/selftests/watchdog/run_watchdog-test.sh
diff --git a/tools/testing/selftests/watchdog/Makefile b/tools/testing/selftests/watchdog/Makefile index 6b5598b55252..4d1cb1d292a1 100644 --- a/tools/testing/selftests/watchdog/Makefile +++ b/tools/testing/selftests/watchdog/Makefile @@ -1,4 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 -TEST_GEN_PROGS := watchdog-test +TEST_GEN_FILES := watchdog-test +TEST_PROGS := run_watchdog-test.sh
include ../lib.mk diff --git a/tools/testing/selftests/watchdog/run_watchdog-test.sh b/tools/testing/selftests/watchdog/run_watchdog-test.sh new file mode 100755 index 000000000000..0ea4b7770f1d --- /dev/null +++ b/tools/testing/selftests/watchdog/run_watchdog-test.sh @@ -0,0 +1,41 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 + +echo "$0: Feed dog" +./watchdog-test -c 2 || exit + +echo +echo "$0: Show watchdog_info" +./watchdog-test -i -c 2 || exit + +echo +echo "$0 Turn off the watchdog timer" +./watchdog-test -d -c 2 || exit + +echo +echo "$0: Turn on the watchdog timer" +./watchdog-test -e -c 2 || exit + +echo +echo "$0: Set timeout to T seconds" +./watchdog-test -d -t 10 -c 2 -e || exit + +echo +echo "$0: Get the timeout" +./watchdog-test -T -c 2 || exit + +echo +echo "$0: Get the pretimeout to T seconds" +./watchdog-test -d -N -c 2 -e || exit + +echo +echo "$0 Get the time left until timer expires" +./watchdog-test -L -c 2 || exit + +echo +echo "Get status & supported features" +./watchdog-test -s || exit + +echo +echo "$0: Set the pretimeout to T seconds" +./watchdog-test -n 3 -c 2 || exit
On 10/24/24 19:39, Li Zhijian wrote:
Currently, watchdog-test keep running until it gets a SIGINT. However, when watchdog-test is executed from the kselftests framework, where it launches test via timeout which will send SIGTERM in time up. This could lead to
- watchdog haven't stop, a watchdog reset is triggered to reboot the OS in silent.
- kselftests gets an timeout exit code, and judge watchdog-test as 'not ok'
This test isn't really supposed to be run from kselftest framework. This is the reason why it isn't included in the default run.
This patch is prepare to fix above 2 issues
This series needs a separate cover letter explaining how this problem is being fixed.
Signed-off-by: Li Zhijian lizhijian@fujitsu.com
Hey, Cover letter is here.
It's notice that a OS reboot was triggerred after ran the watchdog-test in kselftests framwork 'make run_tests', that's because watchdog-test didn't stop feeding the watchdog after enable it.
In addition, current watchdog-test didn't adapt to the kselftests framework which launchs the test with /usr/bin/timeout and no timeout is expected.
tools/testing/selftests/watchdog/watchdog-test.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/watchdog/watchdog-test.c b/tools/testing/selftests/watchdog/watchdog-test.c index bc71cbca0dde..2f8fd2670897 100644 --- a/tools/testing/selftests/watchdog/watchdog-test.c +++ b/tools/testing/selftests/watchdog/watchdog-test.c @@ -27,7 +27,7 @@ int fd; const char v = 'V'; -static const char sopts[] = "bdehp:st:Tn:NLf:i"; +static const char sopts[] = "bdehp:st:Tn:NLf:c:i"; static const struct option lopts[] = { {"bootstatus", no_argument, NULL, 'b'}, {"disable", no_argument, NULL, 'd'}, @@ -42,6 +42,7 @@ static const struct option lopts[] = { {"gettimeleft", no_argument, NULL, 'L'}, {"file", required_argument, NULL, 'f'}, {"info", no_argument, NULL, 'i'},
- {"count", required_argument, NULL, 'c'}, {NULL, no_argument, NULL, 0x0} };
@@ -95,6 +96,7 @@ static void usage(char *progname) printf(" -n, --pretimeout=T\tSet the pretimeout to T seconds\n"); printf(" -N, --getpretimeout\tGet the pretimeout\n"); printf(" -L, --gettimeleft\tGet the time left until timer expires\n");
- printf(" -c, --count\tStop after feeding the watchdog count times\n"); printf("\n"); printf("Parameters are parsed left-to-right in real-time.\n"); printf("Example: %s -d -t 10 -p 5 -e\n", progname);
@@ -174,7 +176,7 @@ int main(int argc, char *argv[]) unsigned int ping_rate = DEFAULT_PING_RATE; int ret; int c;
- int oneshot = 0;
- int oneshot = 0, stop = 1, count = 0; char *file = "/dev/watchdog"; struct watchdog_info info; int temperature;
@@ -307,6 +309,9 @@ int main(int argc, char *argv[]) else printf("WDIOC_GETTIMELEFT error '%s'\n", strerror(errno)); break;
case 'c':
stop = 0;
case 'f': /* Handled above */ break;count = strtoul(optarg, NULL, 0);
@@ -336,8 +341,8 @@ int main(int argc, char *argv[]) signal(SIGINT, term);
- while (1) {
keep_alive();
- while (stop || count--) {
sleep(ping_rate); } end:exit_code = keep_alive();
On 27/10/2024 08:28, Shuah Khan wrote:
On 10/24/24 19:39, Li Zhijian wrote:
Currently, watchdog-test keep running until it gets a SIGINT. However, when watchdog-test is executed from the kselftests framework, where it launches test via timeout which will send SIGTERM in time up. This could lead to 1. watchdog haven't stop, a watchdog reset is triggered to reboot the OS in silent. 2. kselftests gets an timeout exit code, and judge watchdog-test as 'not ok'
This test isn't really supposed to be run from kselftest framework. This is the reason why it isn't included in the default run.
May I know what's the default run, is it different from `make run_tests` ?
This patch is prepare to fix above 2 issues
This series needs a separate cover letter explaining how this problem is being fixed.
Cover letter is in this patch, see below: In addition, we can get the 'How' by reading the simple change in each change.
Signed-off-by: Li Zhijian lizhijian@fujitsu.com
Hey, Cover letter is here.
It's notice that a OS reboot was triggerred after ran the watchdog-test in kselftests framwork 'make run_tests', that's because watchdog-test didn't stop feeding the watchdog after enable it.
In addition, current watchdog-test didn't adapt to the kselftests framework which launchs the test with /usr/bin/timeout and no timeout is expected.
On 10/27/24 18:50, Zhijian Li (Fujitsu) wrote:
On 27/10/2024 08:28, Shuah Khan wrote:
On 10/24/24 19:39, Li Zhijian wrote:
Currently, watchdog-test keep running until it gets a SIGINT. However, when watchdog-test is executed from the kselftests framework, where it launches test via timeout which will send SIGTERM in time up. This could lead to 1. watchdog haven't stop, a watchdog reset is triggered to reboot the OS in silent. 2. kselftests gets an timeout exit code, and judge watchdog-test as 'not ok'
This test isn't really supposed to be run from kselftest framework. This is the reason why it isn't included in the default run.
May I know what's the default run, is it different from `make run_tests` ?
No it isn't. "make kselftest" runs only the targets mentioned in the selftests Makefile. That is considered the kselftest default run.
There is a reason why watchdog isn't included in the default run. It isn't intended to be run by users by default as this is test is just for testing watchdog api
This patch is prepare to fix above 2 issues
This series needs a separate cover letter explaining how this problem is being fixed.
Cover letter is in this patch, see below: In addition, we can get the 'How' by reading the simple change in each change.
That isn't enough to understand why this change is needed. Send patch series with a cover letter explaining what you are doing.
Signed-off-by: Li Zhijian lizhijian@fujitsu.com
Hey, Cover letter is here.
It's notice that a OS reboot was triggerred after ran the watchdog-test in kselftests framwork 'make run_tests', that's because watchdog-test didn't stop feeding the watchdog after enable it.
In addition, current watchdog-test didn't adapt to the kselftests framework which launchs the test with /usr/bin/timeout and no timeout is expected.
thanks, -- Shuah
On 28/10/2024 11:29, Shuah Khan wrote:
On 10/27/24 18:50, Zhijian Li (Fujitsu) wrote:
On 27/10/2024 08:28, Shuah Khan wrote:
On 10/24/24 19:39, Li Zhijian wrote:
Currently, watchdog-test keep running until it gets a SIGINT. However, when watchdog-test is executed from the kselftests framework, where it launches test via timeout which will send SIGTERM in time up. This could lead to 1. watchdog haven't stop, a watchdog reset is triggered to reboot the OS in silent. 2. kselftests gets an timeout exit code, and judge watchdog-test as 'not ok'
This test isn't really supposed to be run from kselftest framework. This is the reason why it isn't included in the default run.
May I know what's the default run, is it different from `make run_tests` ?
No it isn't. "make kselftest" runs only the targets mentioned in the selftests Makefile. That is considered the kselftest default run.
Hey, Shuah,
Thanks for your explanation. If that is the case, I do not have an urgent need for the current patch, expect I'd like to avoid the reboot issue after an accidentally `make run_tests`
Some changes are make as below, please take a look. I will send it out we reach a consensus.
commit 2296f9d88fde4921758a45bf160a7f1b9d4678a0 (HEAD) Author: Li Zhijian lizhijian@fujitsu.com Date: Mon Oct 28 11:54:03 2024 +0800
selftests/watchdog-test: Fix system accidentally reset after watchdog-test
After `make run_tests` to run watchdog-test, a system reboot would happen due to watchdog not stop. ``` [ 1367.185172] watchdog: watchdog0: watchdog did not stop! ```
Fix it by registering a timeout signal in watchdog-test, where its signal handler will stop the watchdog.
After that # timeout 1 ./watchdog-test Watchdog Ticking Away! . Stopping watchdog ticks...
Signed-off-by: Li Zhijian lizhijian@fujitsu.com
diff --git a/tools/testing/selftests/watchdog/watchdog-test.c b/tools/testing/selftests/watchdog/watchdog-test.c index bc71cbca0dde..97acb90f8b30 100644 --- a/tools/testing/selftests/watchdog/watchdog-test.c +++ b/tools/testing/selftests/watchdog/watchdog-test.c @@ -335,6 +335,10 @@ int main(int argc, char *argv[]) printf("Watchdog Ticking Away!\n");
signal(SIGINT, term); + /* + * Register the timeout signal + */ + signal(SIGTERM, term);
while (1) { keep_alive();
There is a reason why watchdog isn't included in the default run. It isn't intended to be run by users by default as this is test is just for testing watchdog api
This patch is prepare to fix above 2 issues
This series needs a separate cover letter explaining how this problem is being fixed.
Cover letter is in this patch, see below: In addition, we can get the 'How' by reading the simple change in each change.
That isn't enough to understand why this change is needed. Send patch series with a cover letter explaining what you are doing.
Signed-off-by: Li Zhijian lizhijian@fujitsu.com
Hey, Cover letter is here.
It's notice that a OS reboot was triggerred after ran the watchdog-test in kselftests framwork 'make run_tests', that's because watchdog-test didn't stop feeding the watchdog after enable it.
In addition, current watchdog-test didn't adapt to the kselftests framework which launchs the test with /usr/bin/timeout and no timeout is expected.
thanks, -- Shuah
On 10/27/24 22:02, Zhijian Li (Fujitsu) wrote:
On 28/10/2024 11:29, Shuah Khan wrote:
On 10/27/24 18:50, Zhijian Li (Fujitsu) wrote:
On 27/10/2024 08:28, Shuah Khan wrote:
On 10/24/24 19:39, Li Zhijian wrote:
Currently, watchdog-test keep running until it gets a SIGINT. However, when watchdog-test is executed from the kselftests framework, where it launches test via timeout which will send SIGTERM in time up. This could lead to 1. watchdog haven't stop, a watchdog reset is triggered to reboot the OS in silent. 2. kselftests gets an timeout exit code, and judge watchdog-test as 'not ok'
This test isn't really supposed to be run from kselftest framework. This is the reason why it isn't included in the default run.
May I know what's the default run, is it different from `make run_tests` ?
No it isn't. "make kselftest" runs only the targets mentioned in the selftests Makefile. That is considered the kselftest default run.
Hey, Shuah,
Thanks for your explanation. If that is the case, I do not have an urgent need for the current patch, expect I'd like to avoid the reboot issue after an accidentally `make run_tests`
Some changes are make as below, please take a look. I will send it out we reach a consensus.
commit 2296f9d88fde4921758a45bf160a7f1b9d4678a0 (HEAD) Author: Li Zhijian lizhijian@fujitsu.com Date: Mon Oct 28 11:54:03 2024 +0800
selftests/watchdog-test: Fix system accidentally reset after watchdog-test After `make run_tests` to run watchdog-test, a system reboot would happen due to watchdog not stop. ```
The system shouldn't reboot just because watchdog test is left running. watchdog test keeps calling ioctl() with WDIOC_KEEPALIVE to make sure the watchdog card timer is reset.
If you are seeing reboots, that means watchdog test couldn't reset the timer. This usually mean system is unresponsive or something is wrong with the watchdog card on your system.
This is the behavior you would expect from a watchdog timer. Does your system have a watchdog card ot or you enabling softdog module?
Either way there is some other reason for the system reboot.
thanks, -- Shuah
On 28/10/2024 13:31, Shuah Khan wrote:
On 10/27/24 22:02, Zhijian Li (Fujitsu) wrote:
On 28/10/2024 11:29, Shuah Khan wrote:
On 10/27/24 18:50, Zhijian Li (Fujitsu) wrote:
On 27/10/2024 08:28, Shuah Khan wrote:
On 10/24/24 19:39, Li Zhijian wrote:
Currently, watchdog-test keep running until it gets a SIGINT. However, when watchdog-test is executed from the kselftests framework, where it launches test via timeout which will send SIGTERM in time up. This could lead to 1. watchdog haven't stop, a watchdog reset is triggered to reboot the OS in silent. 2. kselftests gets an timeout exit code, and judge watchdog-test as 'not ok'
This test isn't really supposed to be run from kselftest framework. This is the reason why it isn't included in the default run.
May I know what's the default run, is it different from `make run_tests` ?
No it isn't. "make kselftest" runs only the targets mentioned in the selftests Makefile. That is considered the kselftest default run.
Hey, Shuah,
Thanks for your explanation. If that is the case, I do not have an urgent need for the current patch, expect I'd like to avoid the reboot issue after an accidentally `make run_tests`
Some changes are make as below, please take a look. I will send it out we reach a consensus.
commit 2296f9d88fde4921758a45bf160a7f1b9d4678a0 (HEAD) Author: Li Zhijian lizhijian@fujitsu.com Date: Mon Oct 28 11:54:03 2024 +0800
selftests/watchdog-test: Fix system accidentally reset after watchdog-test After `make run_tests` to run watchdog-test, a system reboot would happen due to watchdog not stop. ```
The system shouldn't reboot just because watchdog test is left running. watchdog test keeps calling ioctl() with WDIOC_KEEPALIVE to make sure the watchdog card timer is reset.
Err..
How watchdog test keep calling ioctl() with WDIOC_KEEPALIVE after ./watchdog_test has finished?
In my understanding, the cause is that, ./watchdog_test didn't goto neither A) 347 end: 348 /* 349 * Send specific magic character 'V' just in case Magic Close is 350 * enabled to ensure watchdog gets disabled on close. 351 */ 352 ret = write(fd, &v, 1); 353 if (ret < 0) 354 printf("Stopping watchdog ticks failed (%d)...\n", errno);
nor B)
68 static void term(int sig) 69 { 70 int ret = write(fd, &v, 1); 71 72 close(fd); 73 if (ret < 0) 74 printf("\nStopping watchdog ticks failed (%d)...\n", errno); 75 else 76 printf("\nStopping watchdog ticks...\n"); 77 exit(0); 78 }
to "ensure watchdog gets disabled on close"
The timeout default signal is SIGTERM, watchdog_test only registered SIGINT handler.
Thanks Zhijian
If you are seeing reboots, that means watchdog test couldn't reset the timer. This usually mean system is unresponsive or something is wrong with the watchdog card on your system.
This is the behavior you would expect from a watchdog timer. Does your system have a watchdog card ot or you enabling softdog module?
Either way there is some other reason for the system reboot.
thanks, -- Shuah
On 10/27/24 23:45, Zhijian Li (Fujitsu) wrote:
On 28/10/2024 13:31, Shuah Khan wrote:
On 10/27/24 22:02, Zhijian Li (Fujitsu) wrote:
On 28/10/2024 11:29, Shuah Khan wrote:
On 10/27/24 18:50, Zhijian Li (Fujitsu) wrote:
On 27/10/2024 08:28, Shuah Khan wrote:
On 10/24/24 19:39, Li Zhijian wrote: > Currently, watchdog-test keep running until it gets a SIGINT. However, > when watchdog-test is executed from the kselftests framework, where it > launches test via timeout which will send SIGTERM in time up. This could > lead to > 1. watchdog haven't stop, a watchdog reset is triggered to reboot the OS > in silent. > 2. kselftests gets an timeout exit code, and judge watchdog-test as > 'not ok' > This test isn't really supposed to be run from kselftest framework. This is the reason why it isn't included in the default run.
May I know what's the default run, is it different from `make run_tests` ?
No it isn't. "make kselftest" runs only the targets mentioned in the selftests Makefile. That is considered the kselftest default run.
Hey, Shuah,
Thanks for your explanation. If that is the case, I do not have an urgent need for the current patch, expect I'd like to avoid the reboot issue after an accidentally `make run_tests`
Some changes are make as below, please take a look. I will send it out we reach a consensus.
commit 2296f9d88fde4921758a45bf160a7f1b9d4678a0 (HEAD) Author: Li Zhijian lizhijian@fujitsu.com Date: Mon Oct 28 11:54:03 2024 +0800
selftests/watchdog-test: Fix system accidentally reset after watchdog-test After `make run_tests` to run watchdog-test, a system reboot would happen due to watchdog not stop. ```
The system shouldn't reboot just because watchdog test is left running. watchdog test keeps calling ioctl() with WDIOC_KEEPALIVE to make sure the watchdog card timer is reset.
Err..
How watchdog test keep calling ioctl() with WDIOC_KEEPALIVE after ./watchdog_test has finished?
In my understanding, the cause is that, ./watchdog_test didn't goto neither A) 347 end: 348 /* 349 * Send specific magic character 'V' just in case Magic Close is 350 * enabled to ensure watchdog gets disabled on close. 351 */ 352 ret = write(fd, &v, 1); 353 if (ret < 0) 354 printf("Stopping watchdog ticks failed (%d)...\n", errno);
nor B)
Can you send strace output from "make run_tests" from your system?
thanks, -- Shuah
linux/tools/testing/selftests/watchdog# make run_tests TAP version 13 1..1 # timeout set to 45 # selftests: watchdog: watchdog-test # Watchdog Ticking Away! # .............................................# not ok 1 selftests: watchdog: watchdog-test # TIMEOUT 45 seconds
And i got warning in dmesg
[ 1953.229511] watchdog: watchdog0: watchdog did not stop!
On 28/10/2024 13:55, Shuah Khan wrote:
The system shouldn't reboot just because watchdog test is left running. watchdog test keeps calling ioctl() with WDIOC_KEEPALIVE to make sure the watchdog card timer is reset.
Err..
How watchdog test keep calling ioctl() with WDIOC_KEEPALIVE after ./watchdog_test has finished?
In my understanding, the cause is that, ./watchdog_test didn't goto neither A) 347 end: 348 /* 349 * Send specific magic character 'V' just in case Magic Close is 350 * enabled to ensure watchdog gets disabled on close. 351 */ 352 ret = write(fd, &v, 1); 353 if (ret < 0) 354 printf("Stopping watchdog ticks failed (%d)...\n", errno);
nor B)
Can you send strace output from "make run_tests" from your system?
thanks, -- Shuah
On 10/28/24 00:06, Zhijian Li (Fujitsu) wrote:
linux/tools/testing/selftests/watchdog# make run_tests TAP version 13 1..1 # timeout set to 45 # selftests: watchdog: watchdog-test # Watchdog Ticking Away! # .............................................# not ok 1 selftests: watchdog: watchdog-test # TIMEOUT 45 seconds
And i got warning in dmesg
[ 1953.229511] watchdog: watchdog0: watchdog did not stop!
Run "make run_tests" under strace and send me the output.
thanks, -- Shuah
On 28/10/2024 14:25, Shuah Khan wrote:
On 10/28/24 00:06, Zhijian Li (Fujitsu) wrote:
linux/tools/testing/selftests/watchdog# make run_tests TAP version 13 1..1 # timeout set to 45 # selftests: watchdog: watchdog-test # Watchdog Ticking Away! # .............................................# not ok 1 selftests: watchdog: watchdog-test # TIMEOUT 45 seconds
And i got warning in dmesg
[ 1953.229511] watchdog: watchdog0: watchdog did not stop!
Run "make run_tests" under strace and send me the output.
Could you share the exact command, how to 'Run "make run_tests" under strace'
thanks, -- Shuah
On 10/28/24 00:32, Zhijian Li (Fujitsu) wrote:
On 28/10/2024 14:25, Shuah Khan wrote:
On 10/28/24 00:06, Zhijian Li (Fujitsu) wrote:
linux/tools/testing/selftests/watchdog# make run_tests TAP version 13 1..1 # timeout set to 45 # selftests: watchdog: watchdog-test # Watchdog Ticking Away! # .............................................# not ok 1 selftests: watchdog: watchdog-test # TIMEOUT 45 seconds
And i got warning in dmesg
[ 1953.229511] watchdog: watchdog0: watchdog did not stop!
Run "make run_tests" under strace and send me the output.
Could you share the exact command, how to 'Run "make run_tests" under strace'
strace make run_tests > strace.out 2>&1
Send me strace.out
thanks. -- Shuah
On 10/28/24 00:44, Shuah Khan wrote:
On 10/28/24 00:32, Zhijian Li (Fujitsu) wrote:
On 28/10/2024 14:25, Shuah Khan wrote:
On 10/28/24 00:06, Zhijian Li (Fujitsu) wrote:
linux/tools/testing/selftests/watchdog# make run_tests TAP version 13 1..1 # timeout set to 45 # selftests: watchdog: watchdog-test # Watchdog Ticking Away! # .............................................# not ok 1 selftests: watchdog: watchdog-test # TIMEOUT 45 seconds
And i got warning in dmesg
[ 1953.229511] watchdog: watchdog0: watchdog did not stop!
Run "make run_tests" under strace and send me the output.
Could you share the exact command, how to 'Run "make run_tests" under strace'
strace make run_tests > strace.out 2>&1
Send me strace.out
Thank you for the strace output. kselftest uses a timeout to terminate hung tests - that timeout is 45 seconds. When you run "make run_tests" under watchdog directory, you are running into this.
Yes your fix to add SIGTERM handling makes sense. Please also handle other signals - SIGKILL, SIGQUIT.
Thanks for the find.
thanks, -- Shuah
On 29/10/2024 10:24, Shuah Khan wrote:
On 10/28/24 00:44, Shuah Khan wrote:
On 10/28/24 00:32, Zhijian Li (Fujitsu) wrote:
On 28/10/2024 14:25, Shuah Khan wrote:
On 10/28/24 00:06, Zhijian Li (Fujitsu) wrote:
linux/tools/testing/selftests/watchdog# make run_tests TAP version 13 1..1 # timeout set to 45 # selftests: watchdog: watchdog-test # Watchdog Ticking Away! # .............................................# not ok 1 selftests: watchdog: watchdog-test # TIMEOUT 45 seconds
And i got warning in dmesg
[ 1953.229511] watchdog: watchdog0: watchdog did not stop!
Run "make run_tests" under strace and send me the output.
Could you share the exact command, how to 'Run "make run_tests" under strace'
strace make run_tests > strace.out 2>&1
Send me strace.out
Thank you for the strace output. kselftest uses a timeout to terminate hung tests - that timeout is 45 seconds. When you run "make run_tests" under watchdog directory, you are running into this.
Yes your fix to add SIGTERM handling makes sense. Please also handle other signals - SIGKILL, SIGQUIT.
Understood, I will update the patch soon.
Thanks Zhijian
Thanks for the find.
thanks, -- Shuah
linux-kselftest-mirror@lists.linaro.org