LAVA and Android Toolchain Benchmarks

List overview All Threads
Download

newer

older

[ACTIVITY] Android Platform Team,...

Lets start sending stuff to Gerrit

Andy Doan

18 Jan 2012 18 Jan '12

4:59 a.m.

Sorry for the wide distribution, but I wasn't sure who all would be interested.

I spent time over the last month updating the Android monthly toolchain benchmark process[1] to pull its benchmark data from LAVA tests that are stored in validation.linaro.org. Here's an example test run[2].

This month's results will be published to the wiki as I normally do. However, I spent some time last weekend looking at how to handle this on the validation server as well. I first toyed with trying to do a simple report plugin. However, it really didn't quite have everything I thought was needed.

I wound up using the "LAVA kernel CI views" project as a skeleton to create something for Android. I've got a local prototype that's starting to do just about everything I want (I'm fighting some issues the the javascript FLOT library for my charts). I'm attaching a screenshot so you can get a rough idea.

Before I really invest time, I wanted to get people's thoughts. Some big questions for me:

1) Is anyone against doing this? 2) The project is currently called "Android Benchmarks". However, I'm wondering if we should make more of a generic view for "Android". "Toolchain Benchmarks" could then be one part of this, but we'd have a spot to add other things if needed/wanted. Due to how projects are shown across the top of validation.l.o, I think we need pretty concise entries. 3) If the general thought is "this looks okay", then are you guys okay with targeting it for next month?

[1] https://wiki.linaro.org/Platform/Android/AndroidToolchainBenchmarking/2011-12 [2] <http://validation.linaro.org/lava-server/dashboard/streams/anonymous/doanac/...

...

Attachments:

lava-android.png (image/png — 88.2 KB)

Show replies by date

Zygmunt Krynicki

18 Jan 18 Jan

11:16 a.m.

Hi, looks nice :)

On Wed, Jan 18, 2012 at 5:59 AM, Andy Doan andy.doan@linaro.org wrote:

...

Sorry for the wide distribution, but I wasn't sure who all would be interested.

I spent time over the last month updating the Android monthly toolchain benchmark process[1] to pull its benchmark data from LAVA tests that are stored in validation.linaro.org. Here's an example test run[2].

This month's results will be published to the wiki as I normally do. However, I spent some time last weekend looking at how to handle this on the validation server as well. I first toyed with trying to do a simple report plugin. However, it really didn't quite have everything I thought was needed.

I wound up using the "LAVA kernel CI views" project as a skeleton to create something for Android. I've got a local prototype that's starting to do just about everything I want (I'm fighting some issues the the javascript FLOT library for my charts). I'm attaching a screenshot so you can get a rough idea.

Before I really invest time, I wanted to get people's thoughts. Some big questions for me:

Is anyone against doing this?

That's a question to TSC (regarding benchmark data)

...

The project is currently called "Android Benchmarks". However, I'm

wondering if we should make more of a generic view for "Android". "Toolchain Benchmarks" could then be one part of this, but we'd have a spot to add other things if needed/wanted. Due to how projects are shown across the top of validation.l.o, I think we need pretty concise entries.

Would you mind sharing the source code with the validation team? I want to see what models/views you have. Last weekend I created lp:~zkrynicki/+junk/lava-android. Perhaps it would make sense to meld both extensions and have one "android insight page"

As for the top-level menu it's going to change entirely (one day, when we have the time to rework that part) into a "Projects, Infrastructure, Me" menu. Most of the low-level pieces will be in the infrastructure menu. Projects will grow an ability to pull in pieces of infrastructure that are related to that project (say daily builds/tests/benchmarks) while the me-menu will have stuff the user is participating in/associated with/etc. So don't worry about the menu, we'll manage that part soon enough.

...

If the general thought is "this looks okay", then are you guys okay

with targeting it for next month?

[1] https://wiki.linaro.org/Platform/Android/AndroidToolchainBenchmarking/2011-12 [2] <http://validation.linaro.org/lava-server/dashboard/streams/anonymous/doanac/...

...

Alexander Sack

11:25 a.m.

On Wed, Jan 18, 2012 at 12:16 PM, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:

...

Hi, looks nice :)

On Wed, Jan 18, 2012 at 5:59 AM, Andy Doan andy.doan@linaro.org wrote:

...
Sorry for the wide distribution, but I wasn't sure who all would be interested.

I spent time over the last month updating the Android monthly toolchain benchmark process[1] to pull its benchmark data from LAVA tests that are stored in validation.linaro.org. Here's an example test run[2].

This month's results will be published to the wiki as I normally do. However, I spent some time last weekend looking at how to handle this on the validation server as well. I first toyed with trying to do a simple report plugin. However, it really didn't quite have everything I thought was needed.

I wound up using the "LAVA kernel CI views" project as a skeleton to create something for Android. I've got a local prototype that's starting to do just about everything I want (I'm fighting some issues the the javascript FLOT library for my charts). I'm attaching a screenshot so you can get a rough idea.

Before I really invest time, I wanted to get people's thoughts. Some big questions for me:

Is anyone against doing this?

That's a question to TSC (regarding benchmark data)

The data policy doesn't block doing it. Worst case, it might block publishing such view unmodified to the public. But let's look at what we want to do first and then discuss the implications of the data policy.

-- Alexander Sack Technical Director, Linaro Platform Teams http://www.linaro.org | Open source software for ARM SoCs http://twitter.com/#%21/linaroorg - http://www.linaro.org/linaro-blog

Andy Doan

4:07 p.m.

On 01/18/2012 05:25 AM, Alexander Sack wrote:

...

On Wed, Jan 18, 2012 at 12:16 PM, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:

...
Hi, looks nice :)

On Wed, Jan 18, 2012 at 5:59 AM, Andy Doan andy.doan@linaro.org wrote:

...
Sorry for the wide distribution, but I wasn't sure who all would be interested.

I spent time over the last month updating the Android monthly toolchain benchmark process[1] to pull its benchmark data from LAVA tests that are stored in validation.linaro.org. Here's an example test run[2].

This month's results will be published to the wiki as I normally do. However, I spent some time last weekend looking at how to handle this on the validation server as well. I first toyed with trying to do a simple report plugin. However, it really didn't quite have everything I thought was needed.

I wound up using the "LAVA kernel CI views" project as a skeleton to create something for Android. I've got a local prototype that's starting to do just about everything I want (I'm fighting some issues the the javascript FLOT library for my charts). I'm attaching a screenshot so you can get a rough idea.

Before I really invest time, I wanted to get people's thoughts. Some big questions for me:

Is anyone against doing this?

That's a question to TSC (regarding benchmark data)

The data policy doesn't block doing it. Worst case, it might block publishing such view unmodified to the public. But let's look at what we want to do first and then discuss the implications of the data policy.

There might be another way to look at the code I'm doing. While it has a very specific title right now "Android Toolchain Benchmark Report". Its actually quite generic and might be useful for other things. In essence, it does two things:

1) Collate measurements from multiple tests. Reduce these down to one set of data with average measurements. I also include standard deviation in there, so you can get an idea of the quality of the data.

2) Take multiple "combined results" and compare them with each other.

Just something to keep in mind if there's ever a desire to do something like this elsewhere.

-andy

Zygmunt Krynicki

4:21 p.m.

On Wed, Jan 18, 2012 at 5:07 PM, Andy Doan andy.doan@linaro.org wrote:

...

On 01/18/2012 05:25 AM, Alexander Sack wrote:

...
On Wed, Jan 18, 2012 at 12:16 PM, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:

...
Hi, looks nice :)

On Wed, Jan 18, 2012 at 5:59 AM, Andy Doan andy.doan@linaro.org wrote:

...
Sorry for the wide distribution, but I wasn't sure who all would be interested.

I spent time over the last month updating the Android monthly toolchain benchmark process[1] to pull its benchmark data from LAVA tests that are stored in validation.linaro.org. Here's an example test run[2].

This month's results will be published to the wiki as I normally do. However, I spent some time last weekend looking at how to handle this on the validation server as well. I first toyed with trying to do a simple report plugin. However, it really didn't quite have everything I thought was needed.

I wound up using the "LAVA kernel CI views" project as a skeleton to create something for Android. I've got a local prototype that's starting to do just about everything I want (I'm fighting some issues the the javascript FLOT library for my charts). I'm attaching a screenshot so you can get a rough idea.

Before I really invest time, I wanted to get people's thoughts. Some big questions for me:

Is anyone against doing this?

That's a question to TSC (regarding benchmark data)

The data policy doesn't block doing it. Worst case, it might block publishing such view unmodified to the public. But let's look at what we want to do first and then discuss the implications of the data policy.

There might be another way to look at the code I'm doing. While it has a very specific title right now "Android Toolchain Benchmark Report". Its actually quite generic and might be useful for other things. In essence, it does two things:

1) Collate measurements from multiple tests. Reduce these down to one set of data with average measurements. I also include standard deviation in there, so you can get an idea of the quality of the data.

2) Take multiple "combined results" and compare them with each other.

Excellent. I think we should keep evolving this (and keep it separate from lava-android I did). Again, could you share the code?

Andy Doan

4:44 p.m.

On 01/18/2012 10:21 AM, Zygmunt Krynicki wrote:

...

Again, could you share the code?

Okay, here's a drop:

https://code.launchpad.net/~doanac/+junk/lava_android_benchmark_views

The commit message details some of the problems the code currently has.

Zygmunt Krynicki

5:24 p.m.

On Wed, Jan 18, 2012 at 5:44 PM, Andy Doan andy.doan@linaro.org wrote:

...

On 01/18/2012 10:21 AM, Zygmunt Krynicki wrote:

...
Again, could you share the code?

Okay, here's a drop:

https://code.launchpad.net/~doanac/+junk/lava_android_benchmark_views

The commit message details some of the problems the code currently has.

Looking at it now

Some quick comments:

1) We should strive to use the database to compute averages/stdev 2) We should materialize such changes over time to keep performance sensible (scanning everything is going to be too slow) 3) Associating BenchmarkRun with Bundle is odd, why not with TestRun? 4) In benchmark_run.get_summary(jso) you iterate over deserialized bundle. You may want to iterate over the database models instead, computing sanitized_bundle() is pricy and it was just a hack for the dashboard (I should have made it private)

I'm still unsure how the computation model work ATM, I'll post again.

Thanks ZK

Andy Doan

11:39 p.m.

On 01/18/2012 11:24 AM, Zygmunt Krynicki wrote:

...

On Wed, Jan 18, 2012 at 5:44 PM, Andy Doan andy.doan@linaro.org wrote:

...
On 01/18/2012 10:21 AM, Zygmunt Krynicki wrote:

...
Again, could you share the code?

Okay, here's a drop:

https://code.launchpad.net/~doanac/+junk/lava_android_benchmark_views

The commit message details some of the problems the code currently has.

Looking at it now

Some quick comments:

We should strive to use the database to compute averages/stdev

We should materialize such changes over time to keep performance

sensible (scanning everything is going to be too slow)

This was just a proof-of-concept, so things are sloppy. Still, I think you are right. It will probably make building up representations like you noted in comment 4 easier.

I haven't fully read into the dashboard_app code. But it seems to have a concept of storing the bundle and then "analyzing it" which I think moves it to a database representation. That type of concept might work well for this.

I think there might even be a chance to simply make the data a "super bundle" ie combine the bundles into one. With the "measurement" fields just being the averages. The only other thing I'd want to add to each measurement is the standard deviation - not sure if that can fit in the existing data model or not (or if I'm over-thinking this)

...

Associating BenchmarkRun with Bundle is odd, why not with TestRun?

A BenchmarkRun contains multiple test runs. For example it may have 4 skia, 4 v8, and 4 0xbench Test Runs. We can then take the averages of each of type to get better average results. So a BenchmarkRun can be viewed as one job that was submitted to the validation to run against a given android build. Then we can compare say a linaro-gcc-4.6 run with an Android-gcc-4.4 run.

...

In benchmark_run.get_summary(jso) you iterate over deserialized

bundle. You may want to iterate over the database models instead, computing sanitized_bundle() is pricy and it was just a hack for the dashboard (I should have made it private)

Sure - the original code I had written before this django app was based on pulling streams from the validation server so it was json based.

...

I'm still unsure how the computation model work ATM, I'll post again.

Sure - the code needs some cleaning up and optimizations. Thanks for your input.llllll

Paul Larson

19 Jan 19 Jan

12:01 a.m.

This is great Andy! When do you think it will be ready for release and deployment on v.l.o.?

-Paul Larson

Andy Doan

3:43 p.m.

On 01/18/2012 06:01 PM, Paul Larson wrote:

...

This is great Andy! When do you think it will be ready for release and deployment on v.l.o.?

-Paul Larson

I'm hoping to have the code ready at Connect.

Andy Doan

18 Jan 18 Jan

3:56 p.m.

On 01/18/2012 05:16 AM, Zygmunt Krynicki wrote:

...

...

The project is currently called "Android Benchmarks". However, I'm

wondering if we should make more of a generic view for "Android". "Toolchain Benchmarks" could then be one part of this, but we'd have a spot to add other things if needed/wanted. Due to how projects are shown across the top of validation.l.o, I think we need pretty concise entries.

Would you mind sharing the source code with the validation team? I want to see what models/views you have. Last weekend I created lp:~zkrynicki/+junk/lava-android. Perhaps it would make sense to meld both extensions and have one "android insight page"

Sure. The code is a bit sloppy still, but I'll clean it a little more and either share it on its own or in a format that can be merged with your project.

...

As for the top-level menu it's going to change entirely (one day, when we have the time to rework that part) into a "Projects, Infrastructure, Me" menu. Most of the low-level pieces will be in the infrastructure menu. Projects will grow an ability to pull in pieces of infrastructure that are related to that project (say daily builds/tests/benchmarks) while the me-menu will have stuff the user is participating in/associated with/etc. So don't worry about the menu, we'll manage that part soon enough.

cool!

Alexander Sack

11:19 a.m.

On Wed, Jan 18, 2012 at 5:59 AM, Andy Doan andy.doan@linaro.org wrote:

...

Sorry for the wide distribution, but I wasn't sure who all would be interested.

I spent time over the last month updating the Android monthly toolchain benchmark process[1] to pull its benchmark data from LAVA tests that are stored in validation.linaro.org. Here's an example test run[2].

Looks like an awesome pitch. So far you are using lava to store your local results and then render them in the dashboard, right?

...

I wound up using the "LAVA kernel CI views" project as a skeleton to create something for Android. I've got a local prototype that's starting to do just about everything I want (I'm fighting some issues the the javascript FLOT library for my charts). I'm attaching a screenshot so you can get a rough idea.

Before I really invest time, I wanted to get people's thoughts. Some big questions for me:

Is anyone against doing this?

No, that's great and having a great report for such things is exactly what we are looking for. Of course, later with automated builds for toolchain tip branches and lava runs to validate and get benchmark results.

...

The project is currently called "Android Benchmarks". However, I'm

wondering if we should make more of a generic view for "Android".

IIRC, organizing the dashboard around teams and boards and/or products/baselines was discussed at connect and makes sense imo.

My take though is that a team dashboard is basically putting multiple views on a single page, so starting with one detailed view (toolchain benchmarking) sounds fine.

...

"Toolchain Benchmarks" could then be one part of this, but we'd have a spot to add other things if needed/wanted. Due to how projects are shown across the top of validation.l.o, I think we need pretty concise entries.

I agree. For navigation hierarchy it makes sense to think about what areas would have such view. I would think this view should be accessible from an Android, Toolchain WG and Benchmarking area. (or only one of those to start).

...

If the general thought is "this looks okay", then are you guys okay

with targeting it for next month?

I have no direct say on that, but I am certainly happy if blueprints for such cool views get scheduled. So +1 from me.

-- Alexander Sack Technical Director, Linaro Platform Teams http://www.linaro.org | Open source software for ARM SoCs http://twitter.com/#%21/linaroorg - http://www.linaro.org/linaro-blog

Andy Doan

3:59 p.m.

On 01/18/2012 05:19 AM, Alexander Sack wrote:

...

On Wed, Jan 18, 2012 at 5:59 AM, Andy Doan andy.doan@linaro.org wrote:

...
Sorry for the wide distribution, but I wasn't sure who all would be interested.

I spent time over the last month updating the Android monthly toolchain benchmark process[1] to pull its benchmark data from LAVA tests that are stored in validation.linaro.org. Here's an example test run[2].

Looks like an awesome pitch. So far you are using lava to store your local results and then render them in the dashboard, right?

Yes. And once we get the lava-android adb sleep issue fixed in the validation lab, you can theoretically run it all there.

...

I have no direct say on that, but I am certainly happy if blueprints for such cool views get scheduled. So +1 from me.

There was already a blueprint in place I should have mentioned. So I think we are in good shape.

Thanks guys!

Zach Pfeffer

11:49 p.m.

Dude this totally rocks. Also adding linaro-dev.

On 17 January 2012 22:59, Andy Doan andy.doan@linaro.org wrote:

...

Sorry for the wide distribution, but I wasn't sure who all would be interested.

I spent time over the last month updating the Android monthly toolchain benchmark process[1] to pull its benchmark data from LAVA tests that are stored in validation.linaro.org. Here's an example test run[2].

This month's results will be published to the wiki as I normally do. However, I spent some time last weekend looking at how to handle this on the validation server as well. I first toyed with trying to do a simple report plugin. However, it really didn't quite have everything I thought was needed.

I wound up using the "LAVA kernel CI views" project as a skeleton to create something for Android. I've got a local prototype that's starting to do just about everything I want (I'm fighting some issues the the javascript FLOT library for my charts). I'm attaching a screenshot so you can get a rough idea.

Before I really invest time, I wanted to get people's thoughts. Some big questions for me:

Is anyone against doing this?

The project is currently called "Android Benchmarks". However, I'm

wondering if we should make more of a generic view for "Android". "Toolchain Benchmarks" could then be one part of this, but we'd have a spot to add other things if needed/wanted. Due to how projects are shown across the top of validation.l.o, I think we need pretty concise entries. 3) If the general thought is "this looks okay", then are you guys okay with targeting it for next month?

[1] https://wiki.linaro.org/Platform/Android/AndroidToolchainBenchmarking/2011-12 [2] <http://validation.linaro.org/lava-server/dashboard/streams/anonymous/doanac/...

...

linaro-android mailing list linaro-android@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-android

-- Zach Pfeffer Android Platform Team Lead, Linaro Platform Teams Linaro.org | Open source software for ARM SoCs Follow Linaro: http://www.facebook.com/pages/Linaro http://twitter.com/#%21/linaroorg - http://www.linaro.org/linaro-blog

Michael Hudson-Doyle

19 Jan 19 Jan

2:56 a.m.

On Tue, 17 Jan 2012 22:59:05 -0600, Andy Doan andy.doan@linaro.org wrote:

...

Sorry for the wide distribution, but I wasn't sure who all would be interested.

Wow, that looks pretty awesome. I don't really have anything to add over what the others said, but I did want to ask if you've seen http://speed.pypy.org/ ? It has some really good ways of displaying and tracking performance changes.

Cheers, mwh

4965

days inactive

4966

days old

linaro-android@lists.linaro.org

14 comments

participants

tags (0)

participants (6)

Alexander Sack
Andy Doan
Michael Hudson-Doyle
Paul Larson
Zach Pfeffer
Zygmunt Krynicki