chrome-platform/for-kernelci baseline: 98 runs, 5 regressions (v6.1-rc1-5-g27b86a65cd16)
Regressions Summary
-------------------

platform                     | arch  | lab           | compiler | defconfig                  | regressions
-----------------------------+-------+---------------+----------+----------------------------+------------
mt8183-kukui-...uniper-sku16 | arm64 | lab-collabora | gcc-10   | defconfig+arm64-chromebook | 1
rk3399-gru-kevin             | arm64 | lab-collabora | gcc-10   | defconfig+arm64-chromebook | 4
Details: https://kernelci.org/test/job/chrome-platform/branch/for-kernelci/kernel/v6....
Test:     baseline
Tree:     chrome-platform
Branch:   for-kernelci
Describe: v6.1-rc1-5-g27b86a65cd16
URL:      https://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux.git
SHA:      27b86a65cd16b0e94ef69196b008d701a53feddb
Test Regressions
----------------

platform                     | arch  | lab           | compiler | defconfig                  | regressions
-----------------------------+-------+---------------+----------+----------------------------+------------
mt8183-kukui-...uniper-sku16 | arm64 | lab-collabora | gcc-10   | defconfig+arm64-chromebook | 1
Details: https://kernelci.org/test/plan/id/635f5937dc56bfd2b5e7db6b
Results:     0 PASS, 1 FAIL, 0 SKIP
Full config: defconfig+arm64-chromebook
Compiler:    gcc-10 (aarch64-linux-gnu-gcc (Debian 10.2.1-6) 10.2.1 20210110)
Plain log:   https://storage.kernelci.org//chrome-platform/for-kernelci/v6.1-rc1-5-g27b86...
HTML log:    https://storage.kernelci.org//chrome-platform/for-kernelci/v6.1-rc1-5-g27b86...
Rootfs:      http://storage.kernelci.org/images/rootfs/buildroot/buildroot-baseline/20221...
* baseline.login: https://kernelci.org/test/case/id/635f5937dc56bfd2b5e7db6c failing since 59 days (last pass: tag-chrome-platform-for-v5.20-2-gddffaa3d76750, first fail: v6.0-rc1-18-gf36a064d1483)
platform                     | arch  | lab           | compiler | defconfig                  | regressions
-----------------------------+-------+---------------+----------+----------------------------+------------
rk3399-gru-kevin             | arm64 | lab-collabora | gcc-10   | defconfig+arm64-chromebook | 4
Details: https://kernelci.org/test/plan/id/635f585b4cac8cb70ce7db4e
Results:     85 PASS, 7 FAIL, 0 SKIP
Full config: defconfig+arm64-chromebook
Compiler:    gcc-10 (aarch64-linux-gnu-gcc (Debian 10.2.1-6) 10.2.1 20210110)
Plain log:   https://storage.kernelci.org//chrome-platform/for-kernelci/v6.1-rc1-5-g27b86...
HTML log:    https://storage.kernelci.org//chrome-platform/for-kernelci/v6.1-rc1-5-g27b86...
Rootfs:      http://storage.kernelci.org/images/rootfs/buildroot/buildroot-baseline/20221...
* baseline.bootrr.rockchip-i2s1-probed: https://kernelci.org/test/case/id/635f585b4cac8cb70ce7db74 failing since 208 days (last pass: v5.17-rc1-10-g0ef49b25b7cc, first fail: v5.18-rc1-1-gbf84b10d9901d)
2022-10-31T05:08:28.264507 /lava-7786804/1/../bin/lava-test-case
2022-10-31T05:08:28.281732 <8>[ 49.835676] <LAVA_SIGNAL_TESTCASE TEST_CASE_ID=rockchip-i2s1-probed RESULT=fail>
* baseline.bootrr.cros-ec-sensors-gyro0-probed: https://kernelci.org/test/case/id/635f585b4cac8cb70ce7db97 failing since 147 days (last pass: v5.18-rc1-24-gc2d7e384924d, first fail: v5.19-rc1-5-g4a0e708bb23fc)
2022-10-31T05:08:24.155654 <8>[ 45.707785] <LAVA_SIGNAL_TESTCASE TEST_CASE_ID=cros-ec-sensors-accel1-probed RESULT=fail>
2022-10-31T05:08:25.203819 /lava-7786804/1/../bin/lava-test-case
* baseline.bootrr.cros-ec-sensors-accel1-probed: https://kernelci.org/test/case/id/635f585b4cac8cb70ce7db98 failing since 147 days (last pass: v5.18-rc1-24-gc2d7e384924d, first fail: v5.19-rc1-5-g4a0e708bb23fc)
2022-10-31T05:08:23.083107 <8>[ 44.634358] <LAVA_SIGNAL_TESTCASE TEST_CASE_ID=cros-ec-sensors-accel0-probed RESULT=fail>
2022-10-31T05:08:24.139057 /lava-7786804/1/../bin/lava-test-case
* baseline.bootrr.cros-ec-sensors-accel0-probed: https://kernelci.org/test/case/id/635f585b4cac8cb70ce7db99 failing since 147 days (last pass: v5.18-rc1-24-gc2d7e384924d, first fail: v5.19-rc1-5-g4a0e708bb23fc)
2022-10-31T05:08:21.966121 <8>[ 43.517597] <LAVA_SIGNAL_TESTCASE TEST_CASE_ID=cros-ec-sensors-driver-present RESULT=pass>
2022-10-31T05:08:23.060571 /lava-7786804/1/../bin/lava-test-case
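For anyone who wants to pull the per-test results out of a plain log rather than reading it by eye, here is a minimal sketch. It only assumes the `<LAVA_SIGNAL_TESTCASE TEST_CASE_ID=... RESULT=...>` line format visible in the excerpts above; the function names are made up for illustration and are not part of LAVA or KernelCI.

```python
import re

# Matches the LAVA test-case signal lines seen in the log excerpts above.
SIGNAL_RE = re.compile(
    r"<LAVA_SIGNAL_TESTCASE TEST_CASE_ID=(?P<case>\S+) RESULT=(?P<result>\w+)>"
)


def parse_lava_results(log_text):
    """Return a dict mapping each test case ID to its reported result."""
    return {m.group("case"): m.group("result") for m in SIGNAL_RE.finditer(log_text)}


def failed_cases(log_text):
    """Return the sorted IDs of test cases whose result is 'fail'."""
    results = parse_lava_results(log_text)
    return sorted(case for case, result in results.items() if result == "fail")
```

Feeding the plain log text to `failed_cases()` gives the list of bootrr cases to look up in the board file.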
On Sun, Oct 30, 2022 at 10:51 PM kernelci.org bot <bot@kernelci.org> wrote:
chrome-platform/for-kernelci baseline: 98 runs, 5 regressions (v6.1-rc1-5-g27b86a65cd16)
Does anybody look at these? It's a bit weird to see "5 regressions", and then to look back on the last few weeks (probably months) and see the same errors...
Regressions Summary
platform                     | arch  | lab           | compiler | defconfig                  | regressions
-----------------------------+-------+---------------+----------+----------------------------+------------
mt8183-kukui-...uniper-sku16 | arm64 | lab-collabora | gcc-10   | defconfig+arm64-chromebook | 1
rk3399-gru-kevin             | arm64 | lab-collabora | gcc-10   | defconfig+arm64-chromebook | 4
Details: https://kernelci.org/test/job/chrome-platform/branch/for-kernelci/kernel/v6....
Test:     baseline
Tree:     chrome-platform
Branch:   for-kernelci
Describe: v6.1-rc1-5-g27b86a65cd16
URL:      https://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux.git
SHA:      27b86a65cd16b0e94ef69196b008d701a53feddb
Test Regressions
...
platform                     | arch  | lab           | compiler | defconfig                  | regressions
-----------------------------+-------+---------------+----------+----------------------------+------------
rk3399-gru-kevin             | arm64 | lab-collabora | gcc-10   | defconfig+arm64-chromebook | 4
Details: https://kernelci.org/test/plan/id/635f585b4cac8cb70ce7db4e
Results:     85 PASS, 7 FAIL, 0 SKIP
Full config: defconfig+arm64-chromebook
Compiler:    gcc-10 (aarch64-linux-gnu-gcc (Debian 10.2.1-6) 10.2.1 20210110)
Plain log:   https://storage.kernelci.org//chrome-platform/for-kernelci/v6.1-rc1-5-g27b86...
HTML log:    https://storage.kernelci.org//chrome-platform/for-kernelci/v6.1-rc1-5-g27b86...
Rootfs:      http://storage.kernelci.org/images/rootfs/buildroot/buildroot-baseline/20221...
Where can I find the test cases? (i.e., what's determining a "failure" for one of these?)
As a small guess, I see that we're missing some common configs, like:
CONFIG_PHY_ROCKCHIP_DP
This sometimes means that the display subsystem (and therefore audio subsystem that relies on DP for one of its components) is not going to fully set itself up properly. Usually things can still sort of work, but I don't really know what the tests are looking for.
Brian
+Tzung-Bi Shih
On Mon, Oct 31, 2022 at 10:40 AM Brian Norris <briannorris@chromium.org> wrote:
[...]
On Mon, Oct 31, 2022 at 10:40:25AM -0700, Brian Norris wrote:
On Sun, Oct 30, 2022 at 10:51 PM kernelci.org bot <bot@kernelci.org> wrote:
chrome-platform/for-kernelci baseline: 98 runs, 5 regressions (v6.1-rc1-5-g27b86a65cd16)
Does anybody look at these? It's a bit weird to see "5 regressions", and then to look back on the last few weeks (probably months) and see the same errors...
Nobody seems to care about a lot of the Chromebook stuff AFAICT.
Details: https://kernelci.org/test/job/chrome-platform/branch/for-kernelci/kernel/v6....
Where can I find the test cases? (i.e., what's determining a "failure" for one of these?)
The first place to go is the details link above. It's a bit less clear for the non-baseline tests, but you should at least be able to see what's going on on the console, which will often help with finding the testsuite; otherwise the error messages printed are usually descriptive as to what they were looking for specifically. The kernelci-core repo contains all the stuff to map from a test name into running something (often with external assistance), but there's lots of templating in the way.
As a small guess, I see that we're missing some common configs, like:
CONFIG_PHY_ROCKCHIP_DP
This sometimes means that the display subsystem (and therefore audio subsystem that relies on DP for one of its components) is not going to fully set itself up properly. Usually things can still sort of work, but I don't really know what the tests are looking for.
The baseline tests are just making sure that the system comes up to a shell. The driver loaded tests are checking that particular devices have a driver bound to them.
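To make "a driver bound to them" concrete, here is a rough sketch of that kind of check. It assumes that a bound device appears as an entry under `/sys/bus/<bus>/drivers/<driver>/` in sysfs; the function name and the `sysfs_root` parameter are hypothetical, and this is an illustration of the idea rather than the code the test suite actually runs.

```python
import glob
import os


def bound_devices(driver, device_pattern, sysfs_root="/sys"):
    """Return the names of devices bound to `driver` that match `device_pattern`.

    Assumption: bound devices are listed under
    <sysfs_root>/bus/<bus>/drivers/<driver>/. `device_pattern` is a shell-style
    glob, and `sysfs_root` exists only so the sketch can be tried on a fake tree.
    """
    pattern = os.path.join(sysfs_root, "bus", "*", "drivers", driver, device_pattern)
    return sorted(os.path.basename(path) for path in glob.glob(pattern))
```

An empty list for a given driver and pattern would correspond to a "didn't probe" style failure.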
On Mon, Oct 31, 2022 at 09:37:28PM +0000, Mark Brown wrote:
On Mon, Oct 31, 2022 at 10:40:25AM -0700, Brian Norris wrote:
On Sun, Oct 30, 2022 at 10:51 PM kernelci.org bot <bot@kernelci.org> wrote:
chrome-platform/for-kernelci baseline: 98 runs, 5 regressions (v6.1-rc1-5-g27b86a65cd16)
Does anybody look at these? It's a bit weird to see "5 regressions", and then to look back on the last few weeks (probably months) and see the same errors...
Nobody seems to care about a lot of the Chromebook stuff AFAICT.
:(
Well, if I/we are ever going to change that, it'd be nice to get a little further on the below questions still:
Details: https://kernelci.org/test/job/chrome-platform/branch/for-kernelci/kernel/v6....
Where can I find the test cases? (i.e., what's determining a "failure" for one of these?)
The first place to go is the details link above. It's a bit less clear for the non-baseline tests, but you should at least be able to see what's going on on the console, which will often help with finding the testsuite; otherwise the error messages printed are usually descriptive as to what they were looking for specifically. The kernelci-core repo contains all the stuff to map from a test name into running something (often with external assistance), but there's lots of templating in the way.
Thanks. I already looked at most/all of those links. And I was first going for the "baseline" tests, since those seem pretty basic, and if we can't pass those, we're probably totally lost.
As a small guess, I see that we're missing some common configs, like:
CONFIG_PHY_ROCKCHIP_DP
This sometimes means that the display subsystem (and therefore audio subsystem that relies on DP for one of its components) is not going to fully set itself up properly. Usually things can still sort of work, but I don't really know what the tests are looking for.
The baseline tests are just making sure that the system comes up to a shell. The driver loaded tests are checking that particular devices have a driver bound to them.
For one specific example: I'm looking at rockchip-i2s1-probed, which fails here:
https://storage.kernelci.org/chrome-platform/for-kernelci/v6.1-rc1-5-g27b86a...
I can't find a single mention of "i2s1" or "probed" in the kernelci repo, so I must be missing something. Is there some external config file in another repo? Or else the test configs are autogenerating cases on the fly based on parsing...the device tree?
Anyway, I don't know how or why that ever passed, because AFAICT, RK3399 Chromebooks should only have a single I2S block enabled, and they're passing the 'rockchip-i2s0-probed' case. So it feels like I need to be disabling some test case.
Somewhat similar story for cros-ec-sensors-accel{0,1}-probed, although I believe the sensor driver is still working for me; I also see no cros-ec-sensors errors in the KernelCI logs. So I wonder what exactly the test is looking for (e.g., maybe the device name changed?).
Brian
On Mon, Oct 31, 2022 at 03:21:55PM -0700, Brian Norris wrote:
On Mon, Oct 31, 2022 at 09:37:28PM +0000, Mark Brown wrote:
On Mon, Oct 31, 2022 at 10:40:25AM -0700, Brian Norris wrote:
The baseline tests are just making sure that the system comes up to a shell. The driver loaded tests are checking that particular devices have a driver bound to them.
For one specific example: I'm looking at rockchip-i2s1-probed, which fails here:
https://storage.kernelci.org/chrome-platform/for-kernelci/v6.1-rc1-5-g27b86a...
Not the exact same job, but a LAVA definition for that one can be seen at
https://lava.collabora.dev/scheduler/job/7797321/definition
I can't find a single mention of "i2s1" or "probed" in the kernelci repo, so I must be missing something. Is there some external config file in another repo? Or else the test configs are autogenerating cases on the fly based on parsing...the device tree?
The KernelCI repo just says what testsuites to invoke and how; it's not got the actual testsuites. Those "X didn't probe" failures come from bootrr:
https://github.com/andersson/bootrr
forked to:
https://github.com/kernelci/bootrr
(which could use some upstreaming...) with the specific errors for gru-kevin coming from:
https://github.com/kernelci/bootrr/blob/main/boards/google%2Ckevin
which ends up in our rootfs images.
Those failures in particular come from some reorganisation of the DT for the Rockchip devices a while back, which regularly gets bisected by our bisect bot. I did report it (or something very similar) as looking like a false positive, but nobody followed up. I see there are some version-dependent checks for the accelerometers, which may not be working properly any more I guess, but nothing for the I2S.
Anyway, I don't know how or why that ever passed, because AFAICT, RK3399 Chromebooks should only have a single I2S block enabled, and they're passing the 'rockchip-i2s0-probed' case. So it feels like I need to be disabling some test case.
Yes, that was what I'd determined too - the reorganisation of the DT looked legit, I can't remember what it was exactly. I suspect it may have boiled down to adding some missing default disables, or removing an erroneous enable for the board.
Somewhat similar story for cros-ec-sensors-accel{0,1}-probed, although I believe the sensor driver is still working for me; I also see no cros-ec-sensors errors in the KernelCI logs. So I wonder what exactly the test is looking for (e.g., maybe the device name changed?).
IIRC there were some of these that were a device name change.
On Mon, Oct 31, 2022 at 10:52:52PM +0000, Mark Brown wrote:
On Mon, Oct 31, 2022 at 03:21:55PM -0700, Brian Norris wrote:
I can't find a single mention of "i2s1" or "probed" in the kernelci repo, so I must be missing something. Is there some external config file in another repo? Or else the test configs are autogenerating cases on the fly based on parsing...the device tree?
The KernelCI repo just says what testsuites to invoke and how; it's not got the actual testsuites. Those "X didn't probe" failures come from bootrr:
https://github.com/andersson/bootrr
forked to:
https://github.com/kernelci/bootrr
(which could use some upstreaming...) with the specific errors for
Neither of those looks particularly active. If I patch stuff, is it better to send PRs to the 'andersson' one or the 'kernelci' one?
gru-kevin coming from:
https://github.com/kernelci/bootrr/blob/main/boards/google%2Ckevin
which ends up in our rootfs images.
Ah, thanks. That helps. Although it hurts in other ways, see below.
Those failures in particular come from some reorganisation of the DT for the Rockchip devices a while back, which regularly gets bisected by our bisect bot. I did report it (or something very similar) as looking like a false positive, but nobody followed up. I see there are some version-dependent checks for the accelerometers, which may not be working properly any more I guess, but nothing for the I2S.
Anyway, I don't know how or why that ever passed, because AFAICT, RK3399 Chromebooks should only have a single I2S block enabled, and they're passing the 'rockchip-i2s0-probed' case. So it feels like I need to be disabling some test case.
Yes, that was what I'd determined too - the reorganisation of the DT looked legit, I can't remember what it was exactly. I suspect it may have boiled down to adding some missing default disables, or removing an erroneous enable for the board.
Ah, based off your pointers, I see the test was looking for what used to be the i2s2 alias. But then I recall we stopped using that i2s instance:
https://git.kernel.org/linus/b5fbaf7d779f5f02b7f75b080e7707222573be2a arm64: dts: rockchip: Switch RK3399-Gru DP to SPDIF output
I forgot that folks did that downstream long ago but never bothered finishing upstreaming that until I got to it this year... ...but still, it's kinda sad that we've bothered to set up all this "CI" and then nobody paid any attention :( I only noticed because I recently subscribed to chrome-platform@lists.linux.dev.
Anyway, I guess I gotta go patch the test expectations.
Somewhat similar story for cros-ec-sensors-accel{0,1}-probed, although I believe the sensor driver is still working for me; I also see no cros-ec-sensors errors in the KernelCI logs. So I wonder what exactly the test is looking for (e.g., maybe the device name changed?).
IIRC there were some of these that were a device name change.
Oh, this one makes me gag.
"assert_device_present cros-ec-sensors-accel0-probed cros-ec-sensors cros-ec-accel.11.*"
? Really, ".11"? That sounds like we're trying to test kernel implementation details, asynchronous probe race conditions, Makefile / linker ordering, and similar -- not anything that we actually expect to remain stable across kernel versions :(
I'm not sure there's a great stable way to refer to such devices, so maybe it'd be better to write this as "count the number of devices" instead? Or I think this particular driver supports an "id" sysfs attribute, which refers to a stable underlying firmware ID. But that'd involve even more device-specific logic.
I don't think I even care *why* the ID changed; that ID is far from a stable thing, if I'm reading it correctly. At least most of the others refer to hardware addresses, which are a little more reasonable to rely on (even if the device naming still isn't a stable guarantee).
Brian
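The "count the number of devices" idea above could look roughly like the sketch below. It assumes that devices bound to a driver are listed under `/sys/bus/<bus>/drivers/<driver>/` and that the accelerometer instances share a name prefix such as `cros-ec-accel`; the function names, the `sysfs_root` parameter and the expected count are all hypothetical, not something the board file defines today.

```python
import glob
import os


def count_devices(driver, name_prefix, sysfs_root="/sys"):
    """Count devices bound to `driver` whose name starts with `name_prefix` + '.'.

    Assumption: bound devices are listed under
    <sysfs_root>/bus/<bus>/drivers/<driver>/. The instance number after the
    prefix is deliberately ignored, since it is not stable across kernels.
    """
    pattern = os.path.join(
        sysfs_root, "bus", "*", "drivers", driver, name_prefix + ".*"
    )
    return len(glob.glob(pattern))


def check_device_count(driver, name_prefix, expected, sysfs_root="/sys"):
    """Return 'pass' if at least `expected` matching devices are bound, else 'fail'."""
    found = count_devices(driver, name_prefix, sysfs_root)
    return "pass" if found >= expected else "fail"
```

A check written this way only depends on how many accelerometers probed, not on which instance numbers they happened to be given.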
On Mon, Oct 31, 2022 at 04:36:32PM -0700, Brian Norris wrote: [...]
...but still, it's kinda sad that we've bothered to set up all this "CI" and then nobody paid any attention :( I only noticed because I recently subscribed to chrome-platform@lists.linux.dev.
I do notice the test failures/regressions that KernelCI reports for the chrome-platform repo. I have only raised them in some private sessions rather than on public mailing lists.
For now, I only treat build results (e.g. [1]) as a signal to move patches from the "for-kernelci" branch to the "for-next" branch, and temporarily ignore the regression reports.
The LAVA lab "lab-collabora" was set up by Collabora, and it has been unmaintained for a long while. To troubleshoot, I think we need access to the LAVA dispatcher and boards (at least via SSH), but it seems only Collabora folks have permission to access them.
In either case, we should figure out a way to get the "CI" working again. I will ask the Collabora folks for help.
[1]: https://lore.kernel.org/chrome-platform/63609caf.620a0220.bd35e.9c8e@mx.goog...
On Mon, Oct 31, 2022 at 04:36:32PM -0700, Brian Norris wrote:
On Mon, Oct 31, 2022 at 10:52:52PM +0000, Mark Brown wrote:
On Mon, Oct 31, 2022 at 03:21:55PM -0700, Brian Norris wrote:
The KernelCI repo just says what testsuites to invoke and how; it's not got the actual testsuites. Those "X didn't probe" failures come from bootrr:
forked to:
(which could use some upstreaming...) with the specific errors for
Neither of those looks particularly active. If I patch stuff, is it better to send PRs to the 'andersson' one or the 'kernelci' one?
The kernelci one is the one that gets deployed in kernelci so probably there in the short term - all the Chromebooks are there, I'm not sure how many got upstreamed at all.
Yes, that was what I'd determined too - the reorganisation of the DT looked legit, I can't remember what it was exactly. I suspect it may have boiled down to adding some missing default disables, or removing an erroneous enable for the board.
Ah, based off your pointers, I see the test was looking for what used to be the i2s2 alias. But then I recall we stopped using that i2s instance:
https://git.kernel.org/linus/b5fbaf7d779f5f02b7f75b080e7707222573be2a arm64: dts: rockchip: Switch RK3399-Gru DP to SPDIF output
I forgot that folks did that downstream long ago but never bothered finishing upstreaming that until I got to it this year... ...but still, it's kinda sad that we've bothered to set up all this "CI" and then nobody paid any attention :( I only noticed because I recently subscribed to chrome-platform@lists.linux.dev.
Ah, that's the one. I think I'd flagged it as the test looking wrong but nobody picked it up.
"assert_device_present cros-ec-sensors-accel0-probed cros-ec-sensors cros-ec-accel.11.*"
? Really, ".11"? That sounds like we're trying to test kernel implementation details, asynchronous probe race conditions, Makefile / linker ordering, and similar -- not anything that we actually expect to remain stable across kernel versions :(
I'm not sure there's a great stable way to refer to such devices, so maybe it'd be better to write this as "count the number of devices" instead? Or I think this particular driver supports an "id" sysfs attribute, which refers to a stable underlying firmware ID. But that'd involve even more device-specific logic.
It looks to me like the intent of the test is to find the device with the highest number and get a count that way but ICBW.
I don't think I even care *why* the ID changed; that ID is far from a stable thing, if I'm reading it correctly. At least most of the others refer to hardware addresses, which are a little more reasonable to rely on (even if the device naming still isn't a stable guarantee).
Yeah. My understanding is that the intent with bootrr is to be a smoke test which flags up if drivers aren't getting instantiated, taking a basic login test further forwards so it'll notice more devices. Like you say it's a bit of a fragile mechanism though.