CC to linaro-kernel.
Good job, Chase!
A few comments on the testing!
1. This is performance testing, so the data is only useful when comparing kernels. But different test runs may use different benchmark versions, as the following log shows, which makes the results variable and useless for comparison.
=====
additional disk space will be used.
Get:1 http://ports.ubuntu.com/ubuntu-ports/ vivid/universe dbench arm64 4.0-2 [1921 kB]
Fetched 1921 kB in 0s (19.2 MB/s)
=====
2. Performance testing often produces variable results, which normally requires repeated runs and collecting statistics such as the average and standard deviation. We need to collect data from repeated runs and decide how many reruns are needed (see the sketch after these comments).
3. In this 'dbench' test, each operation is a performance result in its own right. We need to store all of them, including Count, AvgLat and MaxLat, not only the Throughput. The '8 clients / 8 procs' values are test parameters, so it is meaningless to store them as a test case. For each benchmark we need to tune the parameters one by one on our test machines to find typical, meaningful values; for most benchmarks it is worth re-testing with different parameters depending on the board.
$dbench 8 -c /usr/share/dbench/client.txt
....
Operation      Count    AvgLat    MaxLat
----------------------------------------
NTCreateX    2011265     0.848  1478.406
Close        1477355     0.350  1165.450
Rename         85163     1.263    62.960
Unlink        406180     0.517  1287.522
Deltree           48    57.127   186.366
Mkdir             24     0.009     0.027
Qpathinfo    1823148     0.567  1445.759
Qfileinfo     319390     0.272   486.622
Qfsinfo       334240     0.421  1161.980
Sfileinfo     163808     0.558   993.785
Find          704767     0.874  1164.246
WriteX       1002240     0.032     9.801
ReadX        3152551     0.032   662.566
LockX           6550     0.011     0.727
UnlockX         6550     0.005     0.535
Flush         140954     0.613    53.237

Throughput 105.205 MB/sec  8 clients  8 procs  max_latency=1478.412 ms
wait for background monitors: perf-profile
4. The perf tool output is very important for kernel developers to understand why we got a given performance number and where improvements are needed; it is as important as the test results themselves. So we'd better figure out how much perf data we can get from the testing and collect it.
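To make point 2 concrete, here is a minimal sketch of that kind of aggregation, assuming each run leaves behind a flat JSON file of metrics; the file layout and metric names are illustrative, not the real LKP output:

    # Sketch: aggregate repeated benchmark runs into mean / standard deviation.
    # Assumes each run produced a flat JSON file such as
    # {"dbench.throughput-MB/sec": 105.205, ...}; layout and names are illustrative.
    import json
    import statistics
    import sys

    def aggregate(result_files):
        samples = {}
        for path in result_files:
            with open(path) as f:
                for metric, value in json.load(f).items():
                    samples.setdefault(metric, []).append(float(value))
        for metric, values in sorted(samples.items()):
            mean = statistics.mean(values)
            stdev = statistics.stdev(values) if len(values) > 1 else 0.0
            print("%s: mean=%.3f stdev=%.3f runs=%d" % (metric, mean, stdev, len(values)))

    if __name__ == "__main__":
        aggregate(sys.argv[1:])   # e.g. python aggregate.py run1.json run2.json run3.json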
On 06/10/2015 02:01 PM, Riku Voipio wrote:
On 9 June 2015 at 22:33, Milosz Wasilewski milosz.wasilewski@linaro.org wrote:
On 9 June 2015 at 18:50, Mark Brown broonie@linaro.org wrote:
On 8 June 2015 at 14:39, Riku Voipio riku.voipio@linaro.org wrote:
I've pushed my version of the LKP test definition: https://review.linaro.org/#/c/6382/ So I don't expect to work on that side anymore. I'll still fix the few benchmarks that don't build on AArch64.
Mark or Kevin, can you give a spin on the tests at the current state?
I'm not sure how to get LAVA to run a test definition from a Gerrit review. Insofar as I'm able to review by looking at the code, it looks good.
It probably would be possible, but I also have no idea how to do that :)
Chase presented his attempt today. Here are the results: https://validation.linaro.org/dashboard/streams/anonymous/chase-qi/bundles/a... LKP produces a JSON file with the results. Chase took the file and translated it to LAVA results. If I understood correctly, the JSON schema is unified for all benchmarks so we will be able to run it in LAVA with just the list of benchmarks to use (similar to LTP).
Looks great, thanks Chase!
I'm not sure this is wise unless we have a realistic intention of actually running these tests; we'd need to be very clear about that.
My plan is to include them in LSK testing. If everything goes fine, it will happen in ~2 weeks from now.
Ok, I think we can wait for that - I'm escaping on vacation at the end of this month, but I think you can manage fine without me :)
Riku
On 10 June 2015 at 08:04, Alex Shi alex.shi@linaro.org wrote:
2. Performance testing often produces variable results, which normally requires repeated runs and collecting statistics such as the average and standard deviation. We need to collect data from repeated runs and decide how many reruns are needed.
I wonder if this should be done at a level up from the test definition for a given test itself, or perhaps a library that the test definitions can use - like you say this seems like something that's going to be needed for many tests (though for some the test will already have its own implementation).
4. The perf tool output is very important for kernel developers to understand why we got a given performance number and where improvements are needed; it is as important as the test results themselves. So we'd better figure out how much perf data we can get from the testing and collect it.
This might be something else that could be shared.
On 06/10/2015 06:53 PM, Mark Brown wrote:
On 10 June 2015 at 08:04, Alex Shi <alex.shi@linaro.org> wrote:
2. Performance testing often produces variable results, which normally requires repeated runs and collecting statistics such as the average and standard deviation. We need to collect data from repeated runs and decide how many reruns are needed.
I wonder if this should be done at a level up from the test definition for a given test itself, or perhaps a library that the test definitions can use - like you say this seems like something that's going to be needed for many tests (though for some the test will already have its own implementation).
If my memory is correct, LKP should already have this function in its test scripts. We just need to tune it for each benchmark and each board.
4. The perf tool output is very important for kernel developers to understand why we got a given performance number and where improvements are needed; it is as important as the test results themselves. So we'd better figure out how much perf data we can get from the testing and collect it.
This might be something else that could be shared.
Yes, the same goes for other kinds of profiling tools.
The parsing of test output is done by LKP; LKP saves the metrics to JSON files, and our test definition decodes the JSON file and sends the values to LAVA. If we want to have all the sub-metrics, I guess patching the LKP test suite and sending it upstream is the right way to go. IMHO it can be done, but not at this stage.
The data from 'time' and 'perf' will also be saved by LKP. I think 'avg.json' is the right file to parse; it includes the metrics of the benchmark, time and perf. I added a 'LOOPS' parameter to the test definition to support repeated runs. If we run the test more than once, the data in avg.json will be the average of the runs. Here is a LAVA job example: https://validation.linaro.org/scheduler/job/382401
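For illustration, the decoding step described above could look roughly like the sketch below. This is not Chase's actual test definition: the flat layout of avg.json is an assumption, and it presumes the standard lava-test-case helper available inside a LAVA test shell.

    # Sketch: read LKP's avg.json and report each metric to LAVA.
    # The flat {"metric": value} layout of avg.json is an assumption; lava-test-case
    # is the helper LAVA provides inside a test shell for recording results.
    import json
    import subprocess

    def report_to_lava(avg_json_path):
        with open(avg_json_path) as f:
            metrics = json.load(f)
        for name, value in metrics.items():
            if isinstance(value, list):          # in case a metric is stored per run
                value = sum(value) / float(len(value))
            case = name.replace("/", "-").replace(" ", "_")   # keep the case name shell-safe
            subprocess.check_call(["lava-test-case", case,
                                   "--result", "pass",
                                   "--measurement", str(value)])

    report_to_lava("results/avg.json")   # the results path is an assumption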
Hopefully, it is what we need. Would you please check and let me know your opinion?
Thanks, Chase
On 10 June 2015 at 19:41, Alex Shi alex.shi@linaro.org wrote:
On 06/10/2015 06:53 PM, Mark Brown wrote:
On 10 June 2015 at 08:04, Alex Shi <alex.shi@linaro.org> wrote:
2. Performance testing often produces variable results, which normally requires repeated runs and collecting statistics such as the average and standard deviation. We need to collect data from repeated runs and decide how many reruns are needed.
I wonder if this should be done at a level up from the test definition for a given test itself, or perhaps a library that the test definitions can use - like you say this seems like something that's going to be needed for many tests (though for some the test will already have its own implementation).
If my memory is correct, LKP should already have this function in its test scripts. We just need to tune it for each benchmark and each board.
4. The perf tool output is very important for kernel developers to understand why we got a given performance number and where improvements are needed; it is as important as the test results themselves. So we'd better figure out how much perf data we can get from the testing and collect it.
This might be something else that could be shared.
Yes, the same goes for other kinds of profiling tools.
On 06/11/2015 11:55 AM, Chase Qi wrote:
The parsing of test output is done by LKP; LKP saves the metrics to JSON files, and our test definition decodes the JSON file and sends the values to LAVA. If we want to have all the sub-metrics, I guess patching the LKP test suite and sending it upstream is the right way to go. IMHO it can be done, but not at this stage.
Maybe upstream LKP won't want our LAVA-specific parsing; we probably need to handle that ourselves. And if the test output cannot be shown clearly and appropriately, it won't be very helpful for us.
The data from 'time' and 'perf' will also be saved by LKP. I think 'avg.json' is the right file to parse; it includes the metrics of the benchmark, time and perf. I added a 'LOOPS' parameter to the test definition to support repeated runs. If we run the test more than once, the data in avg.json will be the average of the runs. Here is a LAVA job example: https://validation.linaro.org/scheduler/job/382401
It is hard to figure out anything useful from this link: https://validation.linaro.org/dashboard/streams/anonymous/chase-qi/bundles/8...
It seems it doesn't work now. Could you resend the report when everything is right?
Hopefully, it is what we need. Would you please check and let me know your opinion?
On 11 June 2015 at 08:17, Alex Shi alex.shi@linaro.org wrote:
On 06/11/2015 11:55 AM, Chase Qi wrote:
The parsing of test output is done by LKP; LKP saves the metrics to JSON files, and our test definition decodes the JSON file and sends the values to LAVA. If we want to have all the sub-metrics, I guess patching the LKP test suite and sending it upstream is the right way to go. IMHO it can be done, but not at this stage.
Maybe upstream LKP won't want our LAVA-specific parsing; we probably need to handle that ourselves. And if the test output cannot be shown clearly and appropriately, it won't be very helpful for us.
There is nothing LAVA-specific there. Chase is using the LKP output only, and LKP doesn't save the table you presented in any way. So if we want to have the data, LKP needs to be patched.
The data from 'time' and 'perf' will also be saved by LKP. I think 'avg.json' is the right file to parse; it includes the metrics of the benchmark, time and perf. I added a 'LOOPS' parameter to the test definition to support repeated runs. If we run the test more than once, the data in avg.json will be the average of the runs. Here is a LAVA job example: https://validation.linaro.org/scheduler/job/382401
It is hard to figure out anything useful from this link: https://validation.linaro.org/dashboard/streams/anonymous/chase-qi/bundles/8...
It seems it doesn't work now. Could you resend the report when everything is right?
It does work, here are detailed results: https://validation.linaro.org/dashboard/streams/anonymous/chase-qi/bundles/8... Alex, we're not kernel hackers and we don't know what's important and what is not. Chase is asking for help identifying the important bits. Complaining that what we present is not what you want without details doesn't help :(
milosz
Hopefully, it is what we need. Would you please check and let me know your opinion?
On 06/11/2015 03:26 PM, Milosz Wasilewski wrote:
On 11 June 2015 at 08:17, Alex Shi alex.shi@linaro.org wrote:
On 06/11/2015 11:55 AM, Chase Qi wrote:
The parsing of test output is done by LKP; LKP saves the metrics to JSON files, and our test definition decodes the JSON file and sends the values to LAVA. If we want to have all the sub-metrics, I guess patching the LKP test suite and sending it upstream is the right way to go. IMHO it can be done, but not at this stage.
Maybe upstream LKP won't want our LAVA-specific parsing; we probably need to handle that ourselves. And if the test output cannot be shown clearly and appropriately, it won't be very helpful for us.
There is nothing LAVA-specific there. Chase is using the LKP output only, and LKP doesn't save the table you presented in any way. So if we want to have the data, LKP needs to be patched.
It seems there is some misunderstanding here. I didn't mean we don't need a patch for the parsing; that is needed. I just don't know if LKP upstream would like to pick up this 'json file decode' script.
The data from 'time' and 'perf' will also be saved by LKP. I think 'avg.json' is the right file to parse; it includes the metrics of the benchmark, time and perf. I added a 'LOOPS' parameter to the test definition to support repeated runs. If we run the test more than once, the data in avg.json will be the average of the runs. Here is a LAVA job example: https://validation.linaro.org/scheduler/job/382401
It is hard to figure out anything useful from this link: https://validation.linaro.org/dashboard/streams/anonymous/chase-qi/bundles/8...
It seems it doesn't work now. Could you resend the report when everything is right?
It does work, here are detailed results: https://validation.linaro.org/dashboard/streams/anonymous/chase-qi/bundles/8...
Sorry for missing this.
As for how the results are shown, they are still organized as functional test results, mixing the profile data with the benchmark data and even listing setup steps like 'split job' and 'setup local dbench' as benchmarks.
We'd better split out our target, which is just the benchmark data. Also, we care about the measurement values rather than the 'pass' or 'fail' of a benchmark.
The following format would be better than the current one:

|             | kernel 1 | kernel 2 |
| benchmark 1 | value x  | value x2 |
| benchmark 2 | value y  | value y2 |
As a further step, we'd better set up an automatic comparison function to track whether some measurement regresses on a new kernel version. Only then is it worth looking into the details.
Alex, we're not kernel hackers and we don't know what's important and what is not.
I know this; that is why I am explaining what's important or useful for kernel engineers.
Chase is asking for help identifying the important bits. Complaining that what we present is not what you want without details doesn't help :(
I am sorry if the feature requests come across as just complaints. I do appreciate the work Riku and Chase have done on this!
I guess we share the same goal: making performance testing useful and reliable for kernel engineers in Linaro. Not something made in a hurry that no one likes to use because it is hard to get details from and misses useful info.
milosz
Hopefully, it is what we need. Would you please check and let me know your opinion?
On 11 June 2015 at 16:31, Alex Shi alex.shi@linaro.org wrote:
On 06/11/2015 03:26 PM, Milosz Wasilewski wrote:
On 11 June 2015 at 08:17, Alex Shi alex.shi@linaro.org wrote:
On 06/11/2015 11:55 AM, Chase Qi wrote:
The parsing of test output is done by LKP; LKP saves the metrics to JSON files, and our test definition decodes the JSON file and sends the values to LAVA. If we want to have all the sub-metrics, I guess patching the LKP test suite and sending it upstream is the right way to go. IMHO it can be done, but not at this stage.
Maybe upstream LKP won't want our LAVA-specific parsing; we probably need to handle that ourselves. And if the test output cannot be shown clearly and appropriately, it won't be very helpful for us.
There is nothing LAVA-specific there. Chase is using the LKP output only, and LKP doesn't save the table you presented in any way. So if we want to have the data, LKP needs to be patched.
It seems there is some misunderstanding here. I didn't mean we don't need a patch for the parsing; that is needed. I just don't know if LKP upstream would like to pick up this 'json file decode' script.
The data from 'time' and 'perf' will also be saved by LKP. I think 'avg.json' is the right file to parse; it includes the metrics of the benchmark, time and perf. I added a 'LOOPS' parameter to the test definition to support repeated runs. If we run the test more than once, the data in avg.json will be the average of the runs. Here is a LAVA job example: https://validation.linaro.org/scheduler/job/382401
It is hard to figure out anything useful from this link: https://validation.linaro.org/dashboard/streams/anonymous/chase-qi/bundles/8...
It seems it doesn't work now. Could you resend the report when everything is right?
It does work, here are detailed results: https://validation.linaro.org/dashboard/streams/anonymous/chase-qi/bundles/8...
Sorry for missing this.
As for how the results are shown, they are still organized as functional test results, mixing the profile data with the benchmark data and even listing setup steps like 'split job' and 'setup local dbench' as benchmarks.
We'd better split out our target, which is just the benchmark data. Also, we care about the measurement values rather than the 'pass' or 'fail' of a benchmark.
Alex,
In LAVA, if a test fails or a result is missing, we want to know what went wrong, so we design the case to check the result of each step. Even if I removed these check points, "lava-test-shell-install" and "lava-test-shell-run" are produced by LAVA for the same purpose, and they exist for all tests. IMO, it is a feature, not a problem.
If we run tests in the same lava-test-shell, then all the test results will be saved to the same test run. I don't think we can categorize them at the moment.
The following format would be better than the current one:

|             | kernel 1 | kernel 2 |
| benchmark 1 | value x  | value x2 |
| benchmark 2 | value y  | value y2 |
As a further step, we'd better set up an automatic comparison function to track whether some measurement regresses on a new kernel version. Only then is it worth looking into the details.
Alex, we're not kernel hackers and we don't know what's important and what is not.
I know this; that is why I am explaining what's important or useful for kernel engineers.
I ran dbench 3 times and attached the test log from LKP; please check the JSON files inside and let me know which values are important for you. If the numbers you are looking for are not included in the JSON files, then we need to patch LKP to pick them up. If we care about these numbers, I guess LKP should care too; otherwise, why would we care?
Chase is asking for help identifying the important bits. Complaining that what we present is not what you want without details doesn't help :(
I am sorry if the feature requests come across as just complaints. I do appreciate the work Riku and Chase have done on this!
I guess we share the same goal: making performance testing useful and reliable for kernel engineers in Linaro. Not something made in a hurry that no one likes to use because it is hard to get details from and misses useful info.
I think we can get it running first, then improve it step by step. As a saying in China goes, “You can't get fat from eating a single mouthful.”
Thanks, Chase
milosz
Hopefully, it is what we need. Would you please check and let me know your opinion?
On 11 June 2015 at 09:31, Alex Shi alex.shi@linaro.org wrote:
On 06/11/2015 03:26 PM, Milosz Wasilewski wrote:
On 11 June 2015 at 08:17, Alex Shi alex.shi@linaro.org wrote:
On 06/11/2015 11:55 AM, Chase Qi wrote:
The parsing of test output is done by LKP; LKP saves the metrics to JSON files, and our test definition decodes the JSON file and sends the values to LAVA. If we want to have all the sub-metrics, I guess patching the LKP test suite and sending it upstream is the right way to go. IMHO it can be done, but not at this stage.
Maybe upstream LKP won't want our LAVA-specific parsing; we probably need to handle that ourselves. And if the test output cannot be shown clearly and appropriately, it won't be very helpful for us.
There is nothing LAVA-specific there. Chase is using the LKP output only, and LKP doesn't save the table you presented in any way. So if we want to have the data, LKP needs to be patched.
It seems there is some misunderstanding here. I didn't mean we don't need a patch for the parsing; that is needed. I just don't know if LKP upstream would like to pick up this 'json file decode' script.
This script is a part of our LAVA integration and doesn't need to go to upstream LKP. What's missing (if I understand correctly) from LKP are these values:
Operation      Count    AvgLat    MaxLat
----------------------------------------
NTCreateX    2011265     0.848  1478.406
Close        1477355     0.350  1165.450
Rename         85163     1.263    62.960
Unlink        406180     0.517  1287.522
Deltree           48    57.127   186.366
Mkdir             24     0.009     0.027
Qpathinfo    1823148     0.567  1445.759
Qfileinfo     319390     0.272   486.622
Qfsinfo       334240     0.421  1161.980
Sfileinfo     163808     0.558   993.785
Find          704767     0.874  1164.246
WriteX       1002240     0.032     9.801
ReadX        3152551     0.032   662.566
LockX           6550     0.011     0.727
UnlockX         6550     0.005     0.535
Flush         140954     0.613    53.237
If they are important for you, LKP needs to be patched to include them in the LKP results, not LAVA results. Our LAVA integration takes only what LKP produces.
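As a rough illustration of what such an LKP patch would have to extract, here is a sketch that parses the per-operation rows out of dbench's summary table. It is written against the output quoted above only, not against LKP's actual parser.

    # Sketch: pull Count / AvgLat / MaxLat per operation out of dbench output.
    # Based on the summary table quoted above; not LKP's real parsing code.
    import re

    ROW = re.compile(r"^\s*([A-Za-z]+)\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s*$")

    def parse_dbench(output):
        results = {}
        for line in output.splitlines():
            m = ROW.match(line)
            if m:
                op, count, avg_lat, max_lat = m.groups()
                results[op] = {"count": int(count),
                               "avg_lat_ms": float(avg_lat),
                               "max_lat_ms": float(max_lat)}
        return results

    # e.g. parse_dbench(open("dbench.log").read())["NTCreateX"]["max_lat_ms"]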
The data from 'time' and 'perf' will also be saved by LKP. I think 'avg.json' is the right file to parse; it includes the metrics of the benchmark, time and perf. I added a 'LOOPS' parameter to the test definition to support repeated runs. If we run the test more than once, the data in avg.json will be the average of the runs. Here is a LAVA job example: https://validation.linaro.org/scheduler/job/382401
It is hard to figure out anything useful from this link: https://validation.linaro.org/dashboard/streams/anonymous/chase-qi/bundles/8...
It seems it doesn't work now. Could you resend the report when everything is right?
It does work, here are detailed results: https://validation.linaro.org/dashboard/streams/anonymous/chase-qi/bundles/8...
Sorry for missing this.
As for how the results are shown, they are still organized as functional test results, mixing the profile data with the benchmark data and even listing setup steps like 'split job' and 'setup local dbench' as benchmarks.
We'd better split out our target, which is just the benchmark data. Also, we care about the measurement values rather than the 'pass' or 'fail' of a benchmark.
that isn't possible at this stage. LAVA structures the results this way.
The following format would be better than the current one:

|             | kernel 1 | kernel 2 |
| benchmark 1 | value x  | value x2 |
| benchmark 2 | value y  | value y2 |
Again, this isn't possible for a single run, as we're only running on a single kernel. Such a feature requires raw data postprocessing and most likely will not be part of LAVA. LAVA will be used to run tests and store raw data; the comparison will happen somewhere else. I just had a meeting with the ARM ART (Android RunTime) team and they requested similar comparison features for their benchmarks. They are willing to share the code they're using, which includes a DB for storing the benchmark results and some scripts that do the comparison. So eventually we will get build-to-build or branch-to-branch comparison. For the moment let's focus on collecting the benchmark data and making sure we store everything you need.
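As an illustration of the kind of external postprocessing meant here, a comparison along the lines of the table Alex proposed could be a small script over two stored result sets. This is only a sketch under the assumption that each kernel's metrics are available as a flat JSON file; it is not an existing Linaro, LAVA or LKP tool.

    # Sketch: compare stored metrics from two kernels and print the table
    # format proposed above. The flat {"benchmark.metric": value} input files
    # are assumptions, not an existing results format.
    import json

    def compare(results_a, results_b, label_a="kernel 1", label_b="kernel 2"):
        a = json.load(open(results_a))
        b = json.load(open(results_b))
        print("%-30s %12s %12s %9s" % ("benchmark", label_a, label_b, "change"))
        for metric in sorted(set(a) & set(b)):
            v1, v2 = float(a[metric]), float(b[metric])
            change = "%+.1f%%" % ((v2 - v1) / v1 * 100) if v1 else "n/a"
            print("%-30s %12.3f %12.3f %9s" % (metric, v1, v2, change))

    compare("kernel1-avg.json", "kernel2-avg.json")   # file names are placeholders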
As a further step, we'd better set up an automatic comparison function to track whether some measurement regresses on a new kernel version. Only then is it worth looking into the details.
Alex, we're not kernel hackers and we don't know what's important and what is not.
I know this; that is why I am explaining what's important or useful for kernel engineers.
Chase is asking for help identifying the important bits. Complaining that what we present is not what you want without details doesn't help :(
I am sorry if the feature requests come across as just complaints. I do appreciate the work Riku and Chase have done on this!
I guess we share the same goal: making performance testing useful and reliable for kernel engineers in Linaro. Not something made in a hurry that no one likes to use because it is hard to get details from and misses useful info.
+1 on that. Let's try to identify the important data, store it, and have it ready for postprocessing. If LKP is missing something, we need to fix it upstream.
milosz
On 11 June 2015 at 15:18, Milosz Wasilewski milosz.wasilewski@linaro.org wrote:
On 11 June 2015 at 09:31, Alex Shi alex.shi@linaro.org wrote:
The following format would be better than the current one:

|             | kernel 1 | kernel 2 |
| benchmark 1 | value x  | value x2 |
| benchmark 2 | value y  | value y2 |
Again, this isn't possible for a single run, as we're only running on a single kernel. Such a feature requires raw data postprocessing and most likely will not be part of LAVA. LAVA will be used to run tests and store raw data; the comparison will happen somewhere else. I just had a meeting with the ARM ART (Android RunTime) team and they requested similar comparison features for their benchmarks. They are willing to share the code they're using, which includes a DB for storing the benchmark results and some scripts that do the comparison. So eventually we will get build-to-build or branch-to-branch comparison. For the moment let's focus on collecting the benchmark data and making sure we store everything you need.
It seems LKP itself has result-comparison tools. We can upload the raw data as a test run attachment to LAVA and extract it for post-processing with the LKP scripts. It is worth at least testing these before reinventing them.
Riku
On 11 June 2015 at 13:52, Riku Voipio riku.voipio@linaro.org wrote:
On 11 June 2015 at 15:18, Milosz Wasilewski milosz.wasilewski@linaro.org wrote:
On 11 June 2015 at 09:31, Alex Shi alex.shi@linaro.org wrote:
The following format would be better than the current one:

|             | kernel 1 | kernel 2 |
| benchmark 1 | value x  | value x2 |
| benchmark 2 | value y  | value y2 |
Again, this isn't possible for a single run, as we're only running on a single kernel. Such a feature requires raw data postprocessing and most likely will not be part of LAVA. LAVA will be used to run tests and store raw data; the comparison will happen somewhere else. I just had a meeting with the ARM ART (Android RunTime) team and they requested similar comparison features for their benchmarks. They are willing to share the code they're using, which includes a DB for storing the benchmark results and some scripts that do the comparison. So eventually we will get build-to-build or branch-to-branch comparison. For the moment let's focus on collecting the benchmark data and making sure we store everything you need.
It seems LKP itself has result-comparison tools. We can upload the raw data as a test run attachment to LAVA and extract it for post-processing with the LKP scripts. It is worth at least testing these before reinventing them.
I will try that. Thanks for pointing that out.
milosz
Riku Voipio riku.voipio@linaro.org writes:
On 11 June 2015 at 15:18, Milosz Wasilewski milosz.wasilewski@linaro.org wrote:
On 11 June 2015 at 09:31, Alex Shi alex.shi@linaro.org wrote:
The following format would be better than the current one:

|             | kernel 1 | kernel 2 |
| benchmark 1 | value x  | value x2 |
| benchmark 2 | value y  | value y2 |
Again, this isn't possible for a single run, as we're only running on a single kernel. Such a feature requires raw data postprocessing and most likely will not be part of LAVA. LAVA will be used to run tests and store raw data; the comparison will happen somewhere else. I just had a meeting with the ARM ART (Android RunTime) team and they requested similar comparison features for their benchmarks. They are willing to share the code they're using, which includes a DB for storing the benchmark results and some scripts that do the comparison. So eventually we will get build-to-build or branch-to-branch comparison. For the moment let's focus on collecting the benchmark data and making sure we store everything you need.
It seems LKP itself has result-comparison tools. We can upload the raw data as a test run attachment to LAVA and extract it for post-processing with the LKP scripts. It is worth at least testing these before reinventing them.
From a kernel developer PoV, I just want to reiterate that what's most important for these performance/benchmark "tests" is not the specific values/results, but the comparison of results to other kernels and the trends over time. The trending could be done with LAVA image reports, but the comparison to other kernels will likely need to be some external post-processing tool.
Of course, the first phase is the simple pass/fail so we know the benchmarks actually run (or why they didn't), but the more important phase is the ability to compare performance/benchmarks.
Kevin
On 06/16/2015 03:19 AM, Kevin Hilman wrote:
Riku Voipio riku.voipio@linaro.org writes:
On 11 June 2015 at 15:18, Milosz Wasilewski milosz.wasilewski@linaro.org wrote:
On 11 June 2015 at 09:31, Alex Shi alex.shi@linaro.org wrote:
The following format would be better than the current one:

|             | kernel 1 | kernel 2 |
| benchmark 1 | value x  | value x2 |
| benchmark 2 | value y  | value y2 |
Again, this isn't possible for a single run, as we're only running on a single kernel. Such a feature requires raw data postprocessing and most likely will not be part of LAVA. LAVA will be used to run tests and store raw data; the comparison will happen somewhere else. I just had a meeting with the ARM ART (Android RunTime) team and they requested similar comparison features for their benchmarks. They are willing to share the code they're using, which includes a DB for storing the benchmark results and some scripts that do the comparison. So eventually we will get build-to-build or branch-to-branch comparison. For the moment let's focus on collecting the benchmark data and making sure we store everything you need.
It seems LKP itself has result-comparison tools. We can upload the raw data as a test run attachment to LAVA and extract it for post-processing with the LKP scripts. It is worth at least testing these before reinventing them.
From a kernel developer PoV, I just want to reiterate that what's most important for these performance/benchmark "tests" is not the specific values/results, but the comparison of results to other kernels and the trends over time. The trending could be done with LAVA image reports, but the comparison to other kernels will likely need to be some external post-processing tool.
Of course, the first phase is the simple pass/fail so we know the benchmarks actually run (or why they didn't), but the more important phase is the ability to compare performance/benchmarks.
Yes, definitely!
When different performance results show up for different kernels on the same board, we know the kernel has a performance issue. Then performance monitoring tools like vmstat/iostat, or the kernel profiling data, will give some clues about where the problem lies. Furthermore, if the testing infrastructure can locate the buggy commit via bisection, that would be perfect!
This script is a part of our LAVA integration and doesn't need to go to upstream LKP. What's missing (if I understand correctly) from LKP are these values:
Operation      Count    AvgLat    MaxLat
----------------------------------------
NTCreateX    2011265     0.848  1478.406
Close        1477355     0.350  1165.450
Rename         85163     1.263    62.960
Unlink        406180     0.517  1287.522
Deltree           48    57.127   186.366
Mkdir             24     0.009     0.027
Qpathinfo    1823148     0.567  1445.759
Qfileinfo     319390     0.272   486.622
Qfsinfo       334240     0.421  1161.980
Sfileinfo     163808     0.558   993.785
Find          704767     0.874  1164.246
WriteX       1002240     0.032     9.801
ReadX        3152551     0.032   662.566
LockX           6550     0.011     0.727
UnlockX         6550     0.005     0.535
Flush         140954     0.613    53.237
If they are important for you, LKP needs to be patched to include them in the LKP results, not LAVA results. Our LAVA integration takes only what LKP produces.
Yes, they are sub-cases of the dbench benchmark; each one is meaningful as a file-system operation, and they are the only meaningful benchmark results among the mixed profiles. I mean that the profile data is important, but normally no one looks at it unless the benchmark result changes.
[cut]
that isn't possible at this stage. LAVA structures the results this way.
The following format would be better than the current one:

|             | kernel 1 | kernel 2 |
| benchmark 1 | value x  | value x2 |
| benchmark 2 | value y  | value y2 |
Again, this isn't possible for a single run, as we're only running on a single kernel. Such a feature requires raw data postprocessing and most likely will not be part of LAVA. LAVA will be used to run tests and store raw data; the comparison will happen somewhere else. I just had a meeting with the ARM ART (Android RunTime) team and they requested similar comparison features for their benchmarks. They are willing to share the code they're using, which includes a DB for storing the benchmark results and some scripts that do the comparison. So eventually we will get build-to-build or branch-to-branch comparison. For the moment let's focus on collecting the benchmark data and making sure we store everything you need.
Is it possible to get this format from the dashboard?
+1 on that. Let's try to identify the important data, store it, and have it ready for postprocessing. If LKP is missing something, we need to fix it upstream.
Thanks a lot! It would be very useful for performance tracking, especially for monitoring the upstream kernel on ARM boards!
On 11 June 2015 at 13:56, Alex Shi alex.shi@linaro.org wrote:
This script is a part of our LAVA integration and doesn't need to go to upstream LKP. What's missing (if I understand correctly) from LKP are these values:
Operation      Count    AvgLat    MaxLat
----------------------------------------
NTCreateX    2011265     0.848  1478.406
Close        1477355     0.350  1165.450
Rename         85163     1.263    62.960
Unlink        406180     0.517  1287.522
Deltree           48    57.127   186.366
Mkdir             24     0.009     0.027
Qpathinfo    1823148     0.567  1445.759
Qfileinfo     319390     0.272   486.622
Qfsinfo       334240     0.421  1161.980
Sfileinfo     163808     0.558   993.785
Find          704767     0.874  1164.246
WriteX       1002240     0.032     9.801
ReadX        3152551     0.032   662.566
LockX           6550     0.011     0.727
UnlockX         6550     0.005     0.535
Flush         140954     0.613    53.237
If they are important for you, LKP needs to be patched to include them in the LKP results, not LAVA results. Our LAVA integration takes only what LKP produces.
Yes, they are sub-cases of the dbench benchmark; each one is meaningful as a file-system operation, and they are the only meaningful benchmark results among the mixed profiles. I mean that the profile data is important, but normally no one looks at it unless the benchmark result changes.
that is clear now. I'll add a task to patch it in LKP.
[cut]
that isn't possible at this stage. LAVA structures the results this way.
The following format would be better than the current one:

|             | kernel 1 | kernel 2 |
| benchmark 1 | value x  | value x2 |
| benchmark 2 | value y  | value y2 |
Again, this isn't possible for a single run, as we're only running on a single kernel. Such a feature requires raw data postprocessing and most likely will not be part of LAVA. LAVA will be used to run tests and store raw data; the comparison will happen somewhere else. I just had a meeting with the ARM ART (Android RunTime) team and they requested similar comparison features for their benchmarks. They are willing to share the code they're using, which includes a DB for storing the benchmark results and some scripts that do the comparison. So eventually we will get build-to-build or branch-to-branch comparison. For the moment let's focus on collecting the benchmark data and making sure we store everything you need.
Is it possible to get this format from the dashboard?
Not that I'm aware of, sorry.
milosz
+1 on that. Let's try to identify the important data, store it, and have it ready for postprocessing. If LKP is missing something, we need to fix it upstream.
Thanks a lot! It would be very useful for performance tracking, especially for monitoring the upstream kernel on ARM boards!
On 11 June 2015 at 06:55, Chase Qi chase.qi@linaro.org wrote:
The data from 'time' and 'perf' will also be saved by LKP. I think 'avg.json' is the right file to parse; it includes the metrics of the benchmark, time and perf. I added a 'LOOPS' parameter to the test definition to support repeated runs. If we run the test more than once, the data in avg.json will be the average of the runs. Here is a LAVA job example: https://validation.linaro.org/scheduler/job/382401
Does the LOOPS parameter map to the iterations variable LKP uses? At least ebizzy seems to use the iterations setting to run itself 100x by default.
Riku
On 11 June 2015 at 15:45, Riku Voipio riku.voipio@linaro.org wrote:
On 11 June 2015 at 06:55, Chase Qi chase.qi@linaro.org wrote:
The data from 'time' and 'perf' will also be saved by LKP. I think 'avg.json' is the right file to parse; it includes the metrics of the benchmark, time and perf. I added a 'LOOPS' parameter to the test definition to support repeated runs. If we run the test more than once, the data in avg.json will be the average of the runs. Here is a LAVA job example: https://validation.linaro.org/scheduler/job/382401
Does the LOOPS parameter map to the iterations variable LKP uses? At least ebizzy seems to use the iterations setting to run itself 100x by default.
Hi Riku,
No. It just runs the test multiple times. For example, if the 'LOOPS' variable is set to 3, the current test definition will run the command 'run-local *.yaml' 3 times, and lkp will calculate the average of the metrics and save it to 'avg.json' automatically. As you mentioned, lkp might already cover this, and some benchmarks are also designed to run themselves multiple times by default. So in the beginning I guess we should set LOOPS to 1 (the default setting). If we find that the test scores vary noticeably between runs, we can update the test plan and increase LOOPS to see if it helps.
Here is the current test code https://git.linaro.org/people/chase.qi/test-definitions.git/blob/HEAD:/commo...
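Roughly, the LOOPS behaviour described above amounts to the following sketch; it is not the code from the linked test definition, and the job file name and avg.json location are assumptions.

    # Sketch of the LOOPS idea only: invoke LKP's run-local repeatedly, then read
    # the averaged metrics LKP writes. "dbench.yaml" and the avg.json path are
    # placeholders, not the paths the real test definition uses.
    import json
    import subprocess

    LOOPS = 3
    for _ in range(LOOPS):
        subprocess.check_call(["run-local", "dbench.yaml"])

    with open("avg.json") as f:       # LKP averages the repeated runs into this file
        print(json.dumps(json.load(f), indent=2))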
Thanks, Chase
Riku