Getting to a clean baseline with mainline

Tom Gall

16 Oct 2017

10:41 p.m.

Hi All,

I’m looking at the 4.14-rc5 results. I think it’s important that we establish clear green baselines so we can detect regressions.

Universally, I think we want to put known failures on skip lists and then work through those skip lists with fixes. As that happens, we get those fixes backported to 4.9 and 4.4.
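
To make the skip-list idea concrete, here is a minimal sketch in Python. It assumes results are available as a simple mapping of test name to "pass" or "fail"; the function name, the data layout, and the sample statuses are hypothetical and do not reflect how the real results are stored.

```python
from typing import Dict, Set


def find_regressions(
    baseline: Dict[str, str],
    current: Dict[str, str],
    skip_list: Set[str],
) -> Set[str]:
    """Return tests that passed in the baseline but fail now.

    Tests on the skip list are ignored: they are known failures that
    are tracked separately until a fix lands.
    """
    regressions = set()
    for name, status in current.items():
        if name in skip_list:
            continue
        if status == "fail" and baseline.get(name) == "pass":
            regressions.add(name)
    return regressions


if __name__ == "__main__":
    # Illustrative statuses only, not real results from any board.
    baseline = {"open12": "fail", "utime01": "pass", "linkat01": "pass"}
    current = {"open12": "fail", "utime01": "fail", "linkat01": "pass"}
    skip_list = {"open12"}
    print(sorted(find_regressions(baseline, current, skip_list)))
```

With a green baseline and the known failures on the skip list, anything this reports is a new failure worth triaging.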

I haven’t captured everything, since this requires manual copy and paste. (Being able to do command-line queries would be really awesome.)

With the triage meeting tomorrow, I’d like to focus on these areas.

x86_64

ltp-syscalls-tests - How much of this is caused by NFS? Why are we still using NFS?
  linkat01
  open12
  openat02
  renameat201
  renameat202
  sendfile09
  sendfile09_64
  utime01
  utime02
  utime06
  utimes01

ltp_containers - looks like arm64 is in better shape? Did something not get replicated on x86?
  netns_breakns_ip_ipv4_ioctl
  netns_breakns_ip_ipv4_netlink
  netns_breakns_ip_ipv6_ioctl
  netns_breakns_ip_ipv6_netlink
  netns_breakns_ns_exec_ipv4_ioc
  netns_breakns_ns_exec_ipv4_net
  netns_breakns_ns_exec_ipv6_ioc
  netns_breakns_ns_exec_ipv6_net
  netns_comm_ip_ipv4_ioctl
  netns_comm_ip_ipv4_netlink
  netns_comm_ip_ipv6_ioctl
  netns_comm_ip_ipv6_netlink
  netns_comm_ns_exec_ipv4_ioctl
  netns_comm_ns_exec_ipv4_netlin
  netns_comm_ns_exec_ipv6_ioctl
  netns_comm_ns_exec_ipv6_netlin
  netns_sysfs

kselftest
  ftracetest - fixed in kselftest-next
  pstore_tests - extra command line options needed
  run_fuse_test.sh - fixed for ARM
  run.sh
  run_vmtests
  test_align
  test_kmod.sh
  test_progs
  test_verifier

HiKey

  quotactl01 - still missing quota
  quota_remount_test01

ltp-containers is better, but:
  netns_comm_ip_ipv6_ioctl
  netns_comm_ip_ipv6_netlink
  netns_comm_ns_exec_ipv6_ioctl
  netns_comm_ns_exec_ipv6_netlin
  netns_sysfs

kselftest
  breakpoint_test_arm64
  ftracetest
  pstore_tests
  run_fuse_test.sh
  run.sh
  run_vmtests
  seccomp_bpf
  test_align
  test_kmod.sh
  test_maps
  test_progs
  test_verifier

Juno

ltp-syscall - looks like all NFS related? When can we dump NFS?
  open12
  openat02
  quotactl01
  renameat201
  renameat202
  sendfile09
  sendfile09_64
  utime01
  utime02
  utime06
  utimensat01
  utimes01
  quota_remount_test01 - missing quota

ltp-containers - same problems as on HiKey
  netns_comm_ip_ipv6_ioctl
  netns_comm_ip_ipv6_netlink
  netns_comm_ns_exec_ipv6_ioctl
  netns_comm_ns_exec_ipv6_netlin
  netns_sysfs

x15

  perf_event_open02
  quotactl01
  leapsec_timer
  22 failures in ltp-hugetlbfs - I thought we agreed we were going to put it on our skip list for now
  ltp-containers-test 18 failures - looks to be same as intel

It’s a long list, but it feels like a fix or two in the test environment could switch a number of failures to green.

Milosz Wasilewski

17 Oct 2017

11:12 a.m.

On 16 October 2017 at 23:41, Tom Gall tom.gall@linaro.org wrote:

> x86_64 ltp-syscalls-tests - How much of this caused by NFS? Why are we still using NFS? linkat01 open12 openat02 renameat201 renameat202 sendfile09 sendfile09_64 utime01 utime02 utime06 utimes01

I think all are due to NFS. We're waiting for quotes for new HW. With old HW there is no other option than NFS.

> kselftest breakpoint_test_arm64 ftracetest pstore_tests run_fuse_test.sh run.sh run_vmtests seccomp_bpf test_align test_kmod.sh test_maps test_progs test_verifier

bpf is going to be blacklisted today unless someone objects: https://review.linaro.org/#/c/21838/

> Juno ltp-syscall - looks like all NFS related? when can we dump NFS? open12 openat02 quotactl01 renameat201 renameat202 sendfile09 sendfile09_64 utime01 utime02 utime06 utimensat01 utimes01

I think all of the above are due to NFS. I can't give a timeline for that. Currently there is no queue, so there is room to run some experiments on Juno.

> It’s a long list but feels like with a fix or two in the test environment we could have a number of failures switch to green.

+1. I'll try to address NFS on juno.

milosz

Lts-dev mailing list Lts-dev@lists.linaro.org https://lists.linaro.org/mailman/listinfo/lts-dev

Greg KH

11:15 a.m.

On Tue, Oct 17, 2017 at 12:12:10PM +0100, Milosz Wasilewski wrote:

> I think all are due to NFS. We're waiting for quotes for new HW. With old HW there is no other option than NFS.

Really? You can't run off of an SD card? Or some other networked filesystem that actually works? cifs? nfsv4? lustre? :)

> bpf is going to be blacklisted today unless someone objects: https://review.linaro.org/#/c/21838/

Why would it be failing on 4.14-rc?

And don't blacklist the whole thing on older kernels please, if at all possible just don't run the ones that we "know" will fail as the feature is not present.

thanks,

greg k-h

Milosz Wasilewski

11:22 a.m.

On 17 October 2017 at 12:15, Greg KH gregkh@google.com wrote:

> Really? You can't run off of an SD card? Or some other networked filesystem that actually works? cifs? nfsv4? lustre? :)

This is an old, crappy server that already boots from a USB stick to load the kernel and mount NFS. There are hard drives, but they had to be disconnected to allow for loading kernels. And we have only one of these machines. So I'd rather get the new HW in place than patch this already broken setup.

> Why would it be failing on 4.14-rc?
>
> And don't blacklist the whole thing on older kernels please, if at all possible just don't run the ones that we "know" will fail as the feature is not present.

it's just BPF from kselftests: test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs test_align test_kmod.sh

They fail on 4.4 and 4.9 kernels. Other failures don't fall into this category. The remaining kselftests will still be executed. Is that what you are asking for?

milosz

Greg KH

11:32 a.m.

On Tue, Oct 17, 2017 at 12:22:40PM +0100, Milosz Wasilewski wrote:

> This is an old, crappy server that already boots from a USB stick to load the kernel and mount NFS. There are hard drives, but they had to be disconnected to allow for loading kernels. And we have only one of these machines. So I'd rather get the new HW in place than patch this already broken setup.

Oh wait, this is x86? Come on, you can't just get a new Dell server sent to you within days to resolve this issue? NFS should not be an issue here :(

What's the status on the new hardware?

> it's just BPF from kselftests: test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs test_align test_kmod.sh
>
> They fail on 4.4 and 4.9 kernels. Other failures don't fall in this category. Remaining kselftests will be still executed. Is that what you ask for?

This email thread was about getting everything "green" on mainline (i.e. 4.14-rc), and was not about 4.4 or 4.9, so I'm a bit confused why you would be saying it is going to be disabled on 4.14-rc...

thanks,

greg k-h

Milosz Wasilewski

12:35 p.m.

On 17 October 2017 at 12:32, Greg KH gregkh@google.com wrote:

> Oh wait, this is x86? Come on, you can't just get a new Dell server sent to you within days to resolve this issue? NFS should not be an issue here :(
>
> What's the status on the new hardware?

Dave?

> This email thread was about getting everything "green" on mainline (i.e. 4.14-rc), and was not about 4.4 or 4.9, so I'm a bit confused why you would be saying it is going to be disabled on 4.14-rc...

I take it back. It's only going to be disabled on 4.4 and 4.9 for now. I checked again on my local board, as I suspected that the old version of the tests caused the failures on the more recent kernel. Unfortunately, kselftests from the same commit fail, as do the tests from -next.

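Results with 4.14-rc5 kselftests:

Running tests in bpf
selftests: test_verifier [FAIL] - file missing
selftests: test_tag [PASS]
selftests: test_maps [FAIL]
selftests: test_lru_map [PASS]
selftests: test_lpm_map [PASS]
[ 741.106598] audit: type=1701 audit(1507803522.407:6): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=3493 comm="test_progs" exe="/opt/kselftests/default-in-kernel/bpf/test_progs" sig=6 res=1
./run_kselftest.sh: line 12: 3493 Aborted (core dumped) ./test_progs > /tmp/test_progs 2>&1
selftests: test_progs [FAIL] - missing object files that the test expects
selftests: test_align [FAIL] - file missing
selftests: test_kmod.sh [FAIL]

Results with linux-next kselftests (I'm not 100% sure, but I think it's version next-20171013):

Running tests in bpf
========================================
selftests: test_verifier [FAIL] - file missing
selftests: test_tag [PASS]
selftests: test_maps [FAIL]
selftests: test_lru_map [PASS]
[ 304.806974] audit: type=1701 audit(1507803081.379:4): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=2899 comm="test_lpm_map" exe="/opt/kselftests/next/bpf/test_lpm_map" sig=6 res=1
./run_kselftest.sh: line 11: 2899 Aborted (core dumped) ./test_lpm_map > /tmp/test_lpm_map 2>&1
selftests: test_lpm_map [FAIL]
[ 304.838407] audit: type=1701 audit(1507803081.411:5): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=2902 comm="test_progs" exe="/opt/kselftests/next/bpf/test_progs" sig=6 res=1
./run_kselftest.sh: line 12: 2902 Aborted (core dumped) ./test_progs > /tmp/test_progs 2>&1
selftests: test_progs [FAIL] - missing object files that the test expects
selftests: test_align [FAIL] - file missing
selftests: test_verifier_log [PASS]
selftests: test_kmod.sh [FAIL]
selftests: test_xdp_redirect.sh [PASS]
selftests: test_xdp_meta.sh [PASS]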

So it looks like something is wrong with building the tests.
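
For comparing runs like the two above without reading the logs by eye, a small Python sketch along these lines may help. It assumes the "selftests: NAME [PASS|FAIL]" line format shown in this thread, which may differ in other kselftest versions; the function names are hypothetical.

```python
import re
from typing import Dict, List

# Matches result lines as they appear in the logs quoted in this thread.
RESULT_RE = re.compile(r"selftests: (\S+) \[(PASS|FAIL)\]")


def parse_results(log: str) -> Dict[str, str]:
    """Map each test name found in the log to 'PASS' or 'FAIL'."""
    return {m.group(1): m.group(2) for m in RESULT_RE.finditer(log)}


def differing_tests(first: Dict[str, str], second: Dict[str, str]) -> List[str]:
    """Names of tests present in both runs whose results differ."""
    return sorted(t for t in first if t in second and first[t] != second[t])
```

Kernel messages and hand-written annotations such as "- file missing" are ignored, since only the name and the bracketed status are captured.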

milosz

Greg KH

12:48 p.m.

On Tue, Oct 17, 2017 at 01:35:22PM +0100, Milosz Wasilewski wrote:

>> This email thread was about getting everything "green" on mainline (i.e. 4.14-rc), and was not about 4.4 or 4.9, so I'm a bit confused why you would be saying it is going to be disabled on 4.14-rc...
>
> I take it back. It's only going to be disabled on 4.4 and 4.9 for now. I checked again on my local board, as I suspected that the old version of the tests caused the failures on the more recent kernel. Unfortunately, kselftests from the same commit fail, as do the tests from -next.
>
> Results with 4.14-rc5 kselftests:
>
> Running tests in bpf
> selftests: test_verifier [FAIL] - file missing

This passes for me here on Linus's tree.

> selftests: test_tag [PASS]
> selftests: test_maps [FAIL]

This fails for me; odds are I don't have the needed config option enabled.

> selftests: test_lru_map [PASS]
> selftests: test_lpm_map [PASS]
> [ 741.106598] audit: type=1701 audit(1507803522.407:6): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=3493 comm="test_progs" exe="/opt/kselftests/default-in-kernel/bpf/test_progs" sig=6 res=1
> ./run_kselftest.sh: line 12: 3493 Aborted (core dumped) ./test_progs > /tmp/test_progs 2>&1

Ugh, turn off audit... This one works for me as well.

> selftests: test_progs [FAIL] - missing object files that the test expects

Also works for me.

> selftests: test_align [FAIL] - file missing

Also succeeds for me.

> selftests: test_kmod.sh [FAIL]

Also works.

This is all from a "clean" 4.14-rc3 tree on my laptop. I can boot 4.14-rc5 to see if something broke since then, but I doubt it...

Are you sure this is set up properly on your end? This is really odd.

Try running this all on your desktop with Linus's kernel running, and see what that shows.

thanks,

greg k-h

Greg KH

1:02 p.m.

On Tue, Oct 17, 2017 at 02:48:13PM +0200, Greg KH wrote:

> This is all from a "clean" 4.14-rc3 tree on my laptop. I can boot 4.14-rc5 to see if something broke since then, but I doubt it...

I just tried 4.14-rc5 and got the same results as above, running the tests from 4.14-rc5.

Are you sure you are running these with the correct permissions?

I'll go try to figure out what is up with 'test_maps' now...

thanks,

greg k-h

Greg KH

1:05 p.m.

On Tue, Oct 17, 2017 at 03:02:50PM +0200, Greg KH wrote:

> I'll go try to figure out what is up with 'test_maps' now...

Yes, it looks like I didn't have CONFIG_BPF_SYSCALL enabled, which is required by this test. Is that the issue for you as well?

thanks,

greg k-h

Greg KH

1:15 p.m.

On Tue, Oct 17, 2017 at 03:05:58PM +0200, Greg KH wrote:

> Yes, it looks like I didn't have CONFIG_BPF_SYSCALL enabled, which is required by this test. Is that the issue for you as well?

Sorry, I meant CONFIG_STREAM_PARSER.

Milosz Wasilewski

1:30 p.m.

On 17 October 2017 at 14:15, Greg KH gregkh@google.com wrote:

> Yes, it looks like I didn't have CONFIG_BPF_SYSCALL enabled, which is required by this test. Is that the issue for you as well?
>
> Sorry, I meant CONFIG_STREAM_PARSER.

zcat /proc/config.gz | grep CONFIG_STREAM_PARSER
# CONFIG_STREAM_PARSER is not set

and

zcat /proc/config.gz | grep CONFIG_BPF
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT=y
# CONFIG_BPF_STREAM_PARSER is not set
CONFIG_BPF_EVENTS=y

OTOH the config fragment for bpf test doesn't require it and that's why it's not enabled:

~/linux/tools/testing/selftests/bpf# cat config
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_NET_CLS_BPF=m
CONFIG_BPF_EVENTS=y
CONFIG_TEST_BPF=m
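
The manual zcat/grep comparison above could be automated with something like the following Python sketch. It assumes the running kernel exposes /proc/config.gz (which needs CONFIG_IKCONFIG_PROC) and treats a fragment entry of "=m" as satisfied by "=y"; that rule and all function names are assumptions made for illustration.

```python
import gzip
from typing import Dict, List


def parse_config(text: str) -> Dict[str, str]:
    """Map CONFIG_FOO -> value for each 'CONFIG_FOO=value' line.

    Lines such as '# CONFIG_FOO is not set' are ignored, so unset
    options are simply absent from the result.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            options[key] = value
    return options


def missing_options(fragment: str, kernel_config: str) -> List[str]:
    """List fragment options that the kernel config does not satisfy."""
    required = parse_config(fragment)
    actual = parse_config(kernel_config)
    missing = []
    for key, wanted in required.items():
        have = actual.get(key)
        # Assumption: a request for a module (=m) is met by built-in (=y).
        if have == wanted or (wanted == "m" and have == "y"):
            continue
        missing.append(key)
    return missing


def read_running_config(path: str = "/proc/config.gz") -> str:
    """Read the running kernel's config; requires CONFIG_IKCONFIG_PROC."""
    with gzip.open(path, "rt") as f:
        return f.read()
```

Note that a check like this only catches options the fragment actually lists. Because the bpf fragment does not mention the stream parser option, it would not flag the problem discussed here until the fragment itself is fixed.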

milosz

Greg KH

1:54 p.m.

On Tue, Oct 17, 2017 at 02:30:49PM +0100, Milosz Wasilewski wrote:

> OTOH the config fragment for bpf test doesn't require it and that's why it's not enabled:
>
> ~/linux/tools/testing/selftests/bpf# cat config
> CONFIG_BPF=y
> CONFIG_BPF_SYSCALL=y
> CONFIG_NET_CLS_BPF=m
> CONFIG_BPF_EVENTS=y
> CONFIG_TEST_BPF=m

Ok, well that's an easy bug to fix, patches are always welcome upstream :)

But even when I do enable it, it still fails for me, but with a different error: Failed empty parser prog detach

What is the error you are getting for this failure?

thanks,

greg k-h

Milosz Wasilewski

5:20 p.m.

On 17 October 2017 at 14:54, Greg KH gregkh@google.com wrote:

> Ok, well that's an easy bug to fix, patches are always welcome upstream :)
>
> But even when I do enable it, it still fails for me, but with a different error: Failed empty parser prog detach
>
> What is the error you are getting for this failure?

root@hikey:/opt/kselftests/default-in-kernel/bpf# ./test_maps
Failed to create sockmap -1

This is the pre-built version that comes from jenkins. I tried rebuilding natively and in this case bpf doesn't even build.

milosz

Greg KH

18 Oct 2017

9:42 a.m.

On Tue, Oct 17, 2017 at 06:20:08PM +0100, Milosz Wasilewski wrote:

> root@hikey:/opt/kselftests/default-in-kernel/bpf# ./test_maps
> Failed to create sockmap -1

Ok, that should be due to the config option. Try changing that and see what happens.

> This is the pre-built version that comes from jenkins. I tried rebuilding natively and in this case bpf doesn't even build.

What is the error that you get?

Can't anyone just run this locally to test it out to verify if this is a test system issue vs. a "real" kernel issue like I did farther up this thread?

thanks,

greg k-h

Milosz Wasilewski

10:04 a.m.

On 18 October 2017 at 10:42, Greg KH gregkh@google.com wrote:

> What is the error that you get?

make[2]: Entering directory '/home/root/linux/tools/testing/selftests/bpf'
make -C ../../../lib/bpf OUTPUT=/home/root/linux/tools/testing/selftests/bpf/
make[3]: Entering directory '/home/root/linux/tools/lib/bpf'
Makefile:143: tools/build/Makefile.include: No such file or directory
make[3]: *** No rule to make target 'tools/build/Makefile.include'. Stop.
make[3]: Leaving directory '/home/root/linux/tools/lib/bpf'
make[2]: *** [Makefile:34: /home/root/linux/tools/testing/selftests/bpf/libbpf.a] Error 2
make[2]: Leaving directory '/home/root/linux/tools/testing/selftests/bpf'

I didn't debug any further.

...

Can't anyone just run this locally to test it out to verify if this is a test system issue vs. a "real" kernel issue like I did farther up this thread?

Sure, I'll have someone look into that.

milosz

Dave Pigott

8:14 a.m.

...

On 17 Oct 2017, at 13:35, Milosz Wasilewski milosz.wasilewski@linaro.org wrote:

On 17 October 2017 at 12:32, Greg KH gregkh@google.com wrote:

...
On Tue, Oct 17, 2017 at 12:22:40PM +0100, Milosz Wasilewski wrote:

...
On 17 October 2017 at 12:15, Greg KH gregkh@google.com wrote:

...
On Tue, Oct 17, 2017 at 12:12:10PM +0100, Milosz Wasilewski wrote:

...
On 16 October 2017 at 23:41, Tom Gall tom.gall@linaro.org wrote:

...
Hi All,

I’m looking at the 4.14-rc5 results. I think it’s important that we establish clear green baselines so we can detect regressions.

Universally I think we want to get stuff on skip lists and then get after those skip lists working on fixes. As that happens, we basically get those fixes back ported to 4.9 and 4.4.

I haven’t captured everything since this requires manual c/n/p: (Being able to do command line queries would be really awesome)

With the triage meeting tomorrow I’d like to focus down in these areas.

x86_64 ltp-syscalls-tests - How much of this caused by NFS? Why are we still using NFS? linkat01 open12 openat02 renameat201 renameat202 sendfile09 sendfile09_64 utime01 utime02 utime06 utimes01

I think all are due to NFS. We're waiting for quotes for new HW. With old HW there is no other option than NFS.

Really? You can't run off of an sdcard? Some other networked filesystem that actually works? cifs? nfsv4? lustre? :)

This is an old crappy server that already boots from a USB stick to load the kernel and mount NFS. There are hard drives, but they had to be disconnected to allow for loading kernels. And we have just one of them. So I'd rather get the new HW in place than patch this already broken setup.

Oh wait, this is x86? Come on, you can't just get a new Dell server sent to you within days to resolve this issue? NFS should not be an issue here :(

What's the status on the new hardware?

Dave?

Quote in. Authorisation should come today, on a two week lead time once ordered.

Dave

...

kselftest breakpoint_test_arm64 ftracetest pstore_tests run_fuse_test.sh run.sh run_vmtests seccomp_bpf test_align test_kmod.sh test_maps test_progs test_verifier

bpf is going to be blacklisted today unless someone objects: https://review.linaro.org/#/c/21838/

Why would it be failing on 4.14-rc?

And don't blacklist the whole thing on older kernels please, if at all possible just don't run the ones that we "know" will fail as the feature is not present.

it's just BPF from kselftests: test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs test_align test_kmod.sh

They fail on 4.4 and 4.9 kernels. Other failures don't fall into this category. The remaining kselftests will still be executed. Is that what you are asking for?

This email thread was about getting everything "green" on mainline (i.e. 4.14-rc), and was not about 4.4 or 4.9, so I'm a bit confused why you would be saying it is going to be disabled on 4.14-rc...

I take it back. It's only going to be disabled on 4.4 and 4.9 for now. I checked again on my local board, as I suspected that the old version of the tests was causing the failures on the more recent kernel. Unfortunately, kselftests built from the same commit fail too, as do the tests from -next.

Results with 4.14-rc5 kselftests:

Running tests in bpf

    selftests: test_verifier [FAIL] - file missing
    selftests: test_tag [PASS]
    selftests: test_maps [FAIL]
    selftests: test_lru_map [PASS]
    selftests: test_lpm_map [PASS]
    [ 741.106598] audit: type=1701 audit(1507803522.407:6): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=3493 comm="test_progs" exe="/opt/kselftests/default-in-kernel/bpf/test_progs" sig=6 res=1
    ./run_kselftest.sh: line 12: 3493 Aborted (core dumped) ./test_progs > /tmp/test_progs 2>&1
    selftests: test_progs [FAIL] - missing object files that the test expects
    selftests: test_align [FAIL] - file missing
    selftests: test_kmod.sh [FAIL]
    selftests: test_xdp_redirect.sh [PASS]

Results with linux-next kselftests (I'm not 100% sure, but I think it's version next-20171013)

Running tests in bpf

    selftests: test_verifier [FAIL] - file missing
    selftests: test_tag [PASS]
    selftests: test_maps [FAIL]
    selftests: test_lru_map [PASS]
    [ 304.806974] audit: type=1701 audit(1507803081.379:4): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=2899 comm="test_lpm_map" exe="/opt/kselftests/next/bpf/test_lpm_map" sig=6 res=1
    ./run_kselftest.sh: line 11: 2899 Aborted (core dumped) ./test_lpm_map > /tmp/test_lpm_map 2>&1
    selftests: test_lpm_map [FAIL]
    [ 304.838407] audit: type=1701 audit(1507803081.411:5): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=2902 comm="test_progs" exe="/opt/kselftests/next/bpf/test_progs" sig=6 res=1
    ./run_kselftest.sh: line 12: 2902 Aborted (core dumped) ./test_progs > /tmp/test_progs 2>&1
    selftests: test_progs [FAIL] - missing object files that the test expects
    selftests: test_align [FAIL] - file missing
    selftests: test_verifier_log [PASS]
    selftests: test_kmod.sh [FAIL]
    selftests: test_xdp_redirect.sh [PASS]
    selftests: test_xdp_meta.sh [PASS]

So it looks like something is wrong with building the tests.
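Comparing two runs like the ones above is easier once the result lines are separated from the interleaved kernel and shell messages. A minimal sketch in Python, assuming the `selftests: <name> [PASS|FAIL]` line format shown in this message; the helper names `summarize` and `failing` are hypothetical.

```python
import re

# Matches result lines such as 'selftests: test_maps [FAIL]'.
# Audit and shell messages in between do not match and are ignored.
RESULT_RE = re.compile(r"selftests:\s+(\S+)\s+\[(PASS|FAIL)\]")


def summarize(log_text):
    """Return {test name: 'PASS' or 'FAIL'} for every result line."""
    return {name: status for name, status in RESULT_RE.findall(log_text)}


def failing(log_text):
    """Return the sorted names of all tests reported as FAIL."""
    results = summarize(log_text)
    return sorted(name for name, status in results.items() if status == "FAIL")
```

Taking the set difference of `failing()` over two logs then shows which tests changed state between the two builds of the tests, for example `test_lpm_map` in the runs above.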

milosz

Greg KH

9:40 a.m.

On Wed, Oct 18, 2017 at 09:14:27AM +0100, Dave Pigott wrote:

Quote in. Authorisation should come today, on a two week lead time once ordered.

Great, so maybe a month before this can be in the system?

In the meantime, don't you have a spare laptop around somewhere that you can use to run this on? :)

thanks,

greg k-h

Dave Pigott

9:54 a.m.

...

On 18 Oct 2017, at 10:40, Greg KH gregkh@google.com wrote:

Great, so maybe a month before this can be in the system?

In the meantime, don't you have a spare laptop around somewhere that you can use to run this on? :)

I’ve asked the supplier if it’s possible to expedite. The main problem we have is that LAVA needs a serial connection for automation, and not many laptops supply those. :)

In the meantime I’ll see if I can scrounge something from somewhere else in the lab.

Dave

Greg KH

10:26 a.m.

On Wed, Oct 18, 2017 at 10:54:20AM +0100, Dave Pigott wrote:

...

...
In the meantime, don't you have a spare laptop around somewhere that you can use to run this on? :)

I’ve asked the supplier if it’s possible to expedite. The main problem we have is that LAVA needs a serial connection for automation, and not many laptops supply those. :)

Do usb-serial converters work with LAVA?

Neil Williams

10:33 a.m.

On 18 October 2017 at 11:26, Greg KH gregkh@google.com wrote:

...

On Wed, Oct 18, 2017 at 10:54:20AM +0100, Dave Pigott wrote:

In the meantime, don't you have a spare laptop around somewhere that you can use to run this on? :)

I’ve asked the supplier if it’s possible to expedite. The main problem we have is that LAVA needs a serial connection for automation, and not many laptops supply those. :)

Do usb-serial converters work with LAVA?

Yes. We've used usb-serial converters in the past with devices which have genuine ports but that's the opposite direction.

The main problem with using usb-serial for a test device is whether the test device can be persuaded to output firmware, bootloader and kernel messages on /dev/ttyUSB? - it's not just the kernel command line, LAVA will need to interact with the bootloader to specify the location of the kernel to be downloaded.

usb-serial would be supported as a second serial connection (code currently in review) but as the primary connection, I'm not sure it would work.

A PCI serial port would be much better.

As long as the device can be reconfigured to only boot PXE then LAVA could interact with grub just as with current x86. It's all a question of whether the PXE and Grub messages can be put onto /dev/ttyUSB? - there may be limitations in the BIOS/UEFI.

--
Neil Williams
=============
neil.williams@linaro.org
http://www.linux.codehelp.co.uk/

Dan Rue

17 Oct

4:36 p.m.

On Mon, Oct 16, 2017 at 10:41:43PM +0000, Tom Gall wrote:

...

Hi All,

I’m looking at the 4.14-rc5 results. I think it’s important that we establish clear green baselines so we can detect regressions.

Universally I think we want to get stuff on skip lists and then get after those skip lists working on fixes. As that happens, we basically get those fixes back ported to 4.9 and 4.4.

+1.

I support a rather strict and aggressive policy of skipping flaky or failing tests, followed by the work to make root cause fixes and get them re-introduced, and new tests introduced, but only once they are provably stable. It is not wise to try to fix tests alongside trying to find actual regressions - the test environment should be considered production/stable and dev work should happen outside it. I don't suggest that this policy be permanent, but I think it would be effective in the short and medium term to gain trust and eyeballs. This may mask legitimate problems for the time being, but it is more important to establish a stable 'baseline'.

Dan


lts-dev@lists.linaro.org

participants (6)

Dan Rue
Dave Pigott
Greg KH
Milosz Wasilewski
Neil Williams
Tom Gall