LTP crypto regressions noticed on linux next 20200621.
The common case for all tests is timeout after 15 minutes which means tests got hung and LTP timers killed those test runs after timeout. The root cause of the failure is under investigation.
ltp-crypto-tests: * af_alg02 - failed * af_alg05 - failed ltp-syscalls-tests: * keyctl07 - failed * request_key03 - failed
Output log: -------------- af_alg02: af_alg02.c:52: BROK: Timed out while reading from request socket.
Test code at line number 52 is
pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL); SAFE_PTHREAD_CREATE(&thr, NULL, verify_encrypt, NULL);
TST_CHECKPOINT_WAIT(0);
while (pthread_kill(thr, 0) != ESRCH) { if (tst_timeout_remaining() <= 10) { pthread_cancel(thr); tst_brk(TBROK, "Timed out while reading from request socket.");
af_alg05: tst_test.c:1246: INFO: Timeout per run is 0h 15m 00s [ 362.599868] kworker/dying (137) used greatest stack depth: 11600 bytes left Test timeouted, sending SIGKILL! tst_test.c:1286: INFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1 tst_test.c:1287: BROK: Test killed! (timeout?)
request_key03: tst_test.c:1246: INFO: Timeout per run is 0h 15m 00s request_key03.c:65: CONF: kernel doesn't support key type 'encrypted' request_key03.c:65: CONF: kernel doesn't support key type 'trusted' Test timeouted, sending SIGKILL! tst_test.c:1286: INFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1 tst_test.c:1287: BROK: Test killed! (timeout?)
keyctl07 tst_test.c:1246: INFO: Timeout per run is 0h 15m 00s Test timeouted, sending SIGKILL! tst_test.c:1286: INFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1 tst_test.c:1287: BROK: Test killed! (timeout?)
metadata: git branch: master git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git git commit: 5a94f5bc041ea9e4d17c93b11ea6f6a2e5ad361b git describe: next-20200621 kernel-config: https://builds.tuxbuild.com/PB-45Luvlx0yYJ8MZgpijA/kernel.config
ref: https://lkft.validation.linaro.org/scheduler/job/1511938#L2211 https://lkft.validation.linaro.org/scheduler/job/1511935#L9225
- Naresh
On Tue, Jun 23, 2020 at 12:04:06AM +0530, Naresh Kamboju wrote:
LTP crypto regressions noticed on linux next 20200621.
The common case for all tests is timeout after 15 minutes which means tests got hung and LTP timers killed those test runs after timeout. The root cause of the failure is under investigation.
ltp-crypto-tests: * af_alg02 - failed * af_alg05 - failed ltp-syscalls-tests: * keyctl07 - failed * request_key03 - failed
Output log:
af_alg02: af_alg02.c:52: BROK: Timed out while reading from request socket.
Test code at line number 52 is
pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL); SAFE_PTHREAD_CREATE(&thr, NULL, verify_encrypt, NULL);
TST_CHECKPOINT_WAIT(0);
while (pthread_kill(thr, 0) != ESRCH) { if (tst_timeout_remaining() <= 10) { pthread_cancel(thr); tst_brk(TBROK, "Timed out while reading from request socket.");
af_alg05: tst_test.c:1246: INFO: Timeout per run is 0h 15m 00s [ 362.599868] kworker/dying (137) used greatest stack depth: 11600 bytes left Test timeouted, sending SIGKILL! tst_test.c:1286: INFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1 tst_test.c:1287: BROK: Test killed! (timeout?)
request_key03: tst_test.c:1246: INFO: Timeout per run is 0h 15m 00s request_key03.c:65: CONF: kernel doesn't support key type 'encrypted' request_key03.c:65: CONF: kernel doesn't support key type 'trusted' Test timeouted, sending SIGKILL! tst_test.c:1286: INFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1 tst_test.c:1287: BROK: Test killed! (timeout?)
keyctl07 tst_test.c:1246: INFO: Timeout per run is 0h 15m 00s Test timeouted, sending SIGKILL! tst_test.c:1286: INFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1 tst_test.c:1287: BROK: Test killed! (timeout?)
metadata: git branch: master git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git git commit: 5a94f5bc041ea9e4d17c93b11ea6f6a2e5ad361b git describe: next-20200621 kernel-config: https://builds.tuxbuild.com/PB-45Luvlx0yYJ8MZgpijA/kernel.config
ref: https://lkft.validation.linaro.org/scheduler/job/1511938#L2211 https://lkft.validation.linaro.org/scheduler/job/1511935#L9225
Can you try reverting:
d13ef8e10756873b0a8b7cc8f230a2d1026710ea
The patch is titled "umh: fix processed error when UMH_WAIT_PROC is used"
If this fixes the issue we have to ask ourselves then why, given that anything other than 0 is a return code and prior to this patch we were not accepting negative return codes, but we were never getting them as the invalid return code was being returned when UMH_WAIT_PROC was used and an error code was returned. So now, *any* non-zero value is a return code. The onlything I can think of is that we may want to special-case the -ENOMEM error code, or maybe some other one to not skip it as in the below patch.
If reverting the commit does not fix the issue, this is not the droid we are looking for.
diff --git a/security/keys/request_key.c b/security/keys/request_key.c index ff462f3d46ca..94ed23a8991f 100644 --- a/security/keys/request_key.c +++ b/security/keys/request_key.c @@ -193,7 +193,7 @@ static int call_sbin_request_key(struct key *authkey, void *aux) ret = call_usermodehelper_keys(request_key, argv, envp, keyring, UMH_WAIT_PROC); kdebug("usermode -> 0x%x", ret); - if (ret != 0) { + if (ret != -ENOMEM && ret != 0) { /* ret is the exit/wait code */ if (test_bit(KEY_FLAG_USER_CONSTRUCT, &key->flags) || key_validate(key) < 0)
On Tue, 23 Jun 2020 at 04:19, Luis Chamberlain mcgrof@kernel.org wrote:
On Tue, Jun 23, 2020 at 12:04:06AM +0530, Naresh Kamboju wrote:
LTP crypto regressions noticed on linux next 20200621.
The common case for all tests is timeout after 15 minutes which means tests got hung and LTP timers killed those test runs after timeout. The root cause of the failure is under investigation.
ltp-crypto-tests: * af_alg02 - failed * af_alg05 - failed ltp-syscalls-tests: * keyctl07 - failed * request_key03 - failed
<trim>
Can you try reverting:
d13ef8e10756873b0a8b7cc8f230a2d1026710ea
The patch is titled "umh: fix processed error when UMH_WAIT_PROC is used"
Thanks for the investigation. After reverting, two test cases got PASS out of four reported failure cases. ltp-crypto-tests: * af_alg02 - still failing - Hung and time out * af_alg05 - still failing - Hung and time out ltp-syscalls-tests: * keyctl07 - PASS * request_key03 - PASS
Please suggest the way to debug / fix the af_alg02 and af_alg05 failures.
- Naresh
On Tue, Jun 23, 2020 at 11:53:43AM +0530, Naresh Kamboju wrote:
Thanks for the investigation. After reverting, two test cases got PASS out of four reported failure cases. ltp-crypto-tests: * af_alg02 - still failing - Hung and time out * af_alg05 - still failing - Hung and time out ltp-syscalls-tests: * keyctl07 - PASS * request_key03 - PASS
Please suggest the way to debug / fix the af_alg02 and af_alg05 failures.
Did you clear the MSG_MORE flag in the final send(2) call before you call recv(2)?
Cheers,
On Tue, Jun 23, 2020 at 04:40:56PM +1000, Herbert Xu wrote:
On Tue, Jun 23, 2020 at 11:53:43AM +0530, Naresh Kamboju wrote:
Thanks for the investigation. After reverting, two test cases got PASS out of four reported failure cases. ltp-crypto-tests: * af_alg02 - still failing - Hung and time out * af_alg05 - still failing - Hung and time out ltp-syscalls-tests: * keyctl07 - PASS * request_key03 - PASS
Please suggest the way to debug / fix the af_alg02 and af_alg05 failures.
Did you clear the MSG_MORE flag in the final send(2) call before you call recv(2)?
The source code for the two failing AF_ALG tests is here:
https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/crypt... https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/crypt...
They use read() and write(), not send() and recv().
af_alg02 uses read() to read from a "salsa20" request socket without writing anything to it. It is expected that this returns 0, i.e. that behaves like encrypting an empty message.
af_alg05 uses write() to write 15 bytes to a "cbc(aes-generic)" request socket, then read() to read 15 bytes. It is expected that this fails with EINVAL, since the length is not aligned to the AES block size (16 bytes).
- Eric
On Tue, Jun 23, 2020 at 10:02:17AM -0700, Eric Biggers wrote:
The source code for the two failing AF_ALG tests is here:
https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/crypt... https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/crypt...
They use read() and write(), not send() and recv().
af_alg02 uses read() to read from a "salsa20" request socket without writing anything to it. It is expected that this returns 0, i.e. that behaves like encrypting an empty message.
af_alg05 uses write() to write 15 bytes to a "cbc(aes-generic)" request socket, then read() to read 15 bytes. It is expected that this fails with EINVAL, since the length is not aligned to the AES block size (16 bytes).
Thanks. Sounds like it's my introduction of the init variable that broke this. Let me investigate.
On Tue, Jun 23, 2020 at 10:02:17AM -0700, Eric Biggers wrote:
The source code for the two failing AF_ALG tests is here:
https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/crypt... https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/crypt...
They use read() and write(), not send() and recv().
af_alg02 uses read() to read from a "salsa20" request socket without writing anything to it. It is expected that this returns 0, i.e. that behaves like encrypting an empty message.
af_alg05 uses write() to write 15 bytes to a "cbc(aes-generic)" request socket, then read() to read 15 bytes. It is expected that this fails with EINVAL, since the length is not aligned to the AES block size (16 bytes).
This patch should fix the regression:
---8<--- Some user-space programs rely on crypto requests that have no control metadata. This broke when a check was added to require the presence of control metadata with the ctx->init flag.
This patch fixes the regression by removing the ctx->init flag.
This means that we do not distinguish the case of no metadata as opposed to an empty request. IOW it is always assumed that if you call recv(2) before sending metadata that you are working with an empty request.
Reported-by: Sachin Sant sachinp@linux.vnet.ibm.com Reported-by: Naresh Kamboju naresh.kamboju@linaro.org Fixes: f3c802a1f300 ("crypto: algif_aead - Only wake up when...") Signed-off-by: Herbert Xu herbert@gondor.apana.org.au
diff --git a/crypto/af_alg.c b/crypto/af_alg.c index 9fcb91ea10c4..2d391117c020 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -635,7 +635,6 @@ void af_alg_pull_tsgl(struct sock *sk, size_t used, struct scatterlist *dst,
if (!ctx->used) ctx->merge = 0; - ctx->init = ctx->more; } EXPORT_SYMBOL_GPL(af_alg_pull_tsgl);
@@ -757,8 +756,7 @@ int af_alg_wait_for_data(struct sock *sk, unsigned flags, unsigned min) break; timeout = MAX_SCHEDULE_TIMEOUT; if (sk_wait_event(sk, &timeout, - ctx->init && (!ctx->more || - (min && ctx->used >= min)), + !ctx->more || (min && ctx->used >= min), &wait)) { err = 0; break; @@ -847,7 +845,7 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size, }
lock_sock(sk); - if (ctx->init && (init || !ctx->more)) { + if (!ctx->more && ctx->used) { err = -EINVAL; goto unlock; } @@ -858,7 +856,6 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size, memcpy(ctx->iv, con.iv->iv, ivsize);
ctx->aead_assoclen = con.aead_assoclen; - ctx->init = true; }
while (size) { diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c index d48d2156e621..749fe42315be 100644 --- a/crypto/algif_aead.c +++ b/crypto/algif_aead.c @@ -106,7 +106,7 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg, size_t usedpages = 0; /* [in] RX bufs to be used from user */ size_t processed = 0; /* [in] TX bufs to be consumed */
- if (!ctx->init || ctx->more) { + if (ctx->more) { err = af_alg_wait_for_data(sk, flags, 0); if (err) return err; diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c index a51ba22fef58..5b6fa5e8c00d 100644 --- a/crypto/algif_skcipher.c +++ b/crypto/algif_skcipher.c @@ -61,7 +61,7 @@ static int _skcipher_recvmsg(struct socket *sock, struct msghdr *msg, int err = 0; size_t len = 0;
- if (!ctx->init || (ctx->more && ctx->used < bs)) { + if (ctx->more && ctx->used < bs) { err = af_alg_wait_for_data(sk, flags, bs); if (err) return err; diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h index ee6412314f8f..08c087cc89d6 100644 --- a/include/crypto/if_alg.h +++ b/include/crypto/if_alg.h @@ -135,7 +135,6 @@ struct af_alg_async_req { * SG? * @enc: Cryptographic operation to be performed when * recvmsg is invoked. - * @init: True if metadata has been sent. * @len: Length of memory allocated for this data structure. */ struct af_alg_ctx { @@ -152,7 +151,6 @@ struct af_alg_ctx { bool more; bool merge; bool enc; - bool init;
unsigned int len; };
On Fri, 26 Jun 2020 at 12:00, Herbert Xu herbert@gondor.apana.org.au wrote:
On Tue, Jun 23, 2020 at 10:02:17AM -0700, Eric Biggers wrote:
The source code for the two failing AF_ALG tests is here:
https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/crypt... https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/crypt...
They use read() and write(), not send() and recv().
af_alg02 uses read() to read from a "salsa20" request socket without writing anything to it. It is expected that this returns 0, i.e. that behaves like encrypting an empty message.
Since we are on this subject, LTP af_alg02 test case fails on stable 4.9 and stable 4.4 This is not a regression because the test case has been failing from the beginning.
Is this test case expected to fail on stable 4.9 and 4.4 ? or any chance to fix this on these older branches ?
Test output: af_alg02.c:52: BROK: Timed out while reading from request socket.
ref: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.228-191... https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.228-191...
- Naresh
On Tue, Jun 30, 2020 at 02:18:11PM +0530, Naresh Kamboju wrote:
Since we are on this subject, LTP af_alg02 test case fails on stable 4.9 and stable 4.4 This is not a regression because the test case has been failing from the beginning.
Is this test case expected to fail on stable 4.9 and 4.4 ? or any chance to fix this on these older branches ?
Test output: af_alg02.c:52: BROK: Timed out while reading from request socket.
ref: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.228-191... https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.228-191...
Actually this test really is broken. Even though empty requests are legal, they should never be done with no write(2) at all. Because this fundamentally breaks the use of a blocking read(2) to wait for more data.
Granted this has been broken since 2017 but I'm not going to reintroduce this just because of a broken test case.
So please either remove af_alg02 or fix it by adding a control message through sendmsg(2).
Thanks,
---8<--- Some user-space programs rely on crypto requests that have no control metadata. This broke when a check was added to require the presence of control metadata with the ctx->init flag.
This patch fixes the regression by setting ctx->init as long as one sendmsg(2) has been made, with or without a control message.
Reported-by: Sachin Sant sachinp@linux.vnet.ibm.com Reported-by: Naresh Kamboju naresh.kamboju@linaro.org Fixes: f3c802a1f300 ("crypto: algif_aead - Only wake up when...") Signed-off-by: Herbert Xu herbert@gondor.apana.org.au
diff --git a/crypto/af_alg.c b/crypto/af_alg.c index 9fcb91ea10c41..5882ed46f1adb 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -851,6 +851,7 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size, err = -EINVAL; goto unlock; } + ctx->init = true;
if (init) { ctx->enc = enc; @@ -858,7 +859,6 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size, memcpy(ctx->iv, con.iv->iv, ivsize);
ctx->aead_assoclen = con.aead_assoclen; - ctx->init = true; }
while (size) {
On Thu, Jul 02, 2020 at 01:32:21PM +1000, Herbert Xu wrote:
On Tue, Jun 30, 2020 at 02:18:11PM +0530, Naresh Kamboju wrote:
Since we are on this subject, LTP af_alg02 test case fails on stable 4.9 and stable 4.4 This is not a regression because the test case has been failing from the beginning.
Is this test case expected to fail on stable 4.9 and 4.4 ? or any chance to fix this on these older branches ?
Test output: af_alg02.c:52: BROK: Timed out while reading from request socket.
ref: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.228-191... https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.228-191...
Actually this test really is broken.
FWIW the patch "umh: fix processed error when UMH_WAIT_PROC is used" was dropped from linux-next for now as it was missing checking for signals. I'll be open coding iall checks for each UMH_WAIT_PROC callers next. Its not clear if this was the issue with this test case, but figured I'd let you know.
Luis