SQUAD data retention policy

Milosz Wasilewski

11 Oct 2018

9:34 a.m.

Hi,

Antonio proposed a patch that sets default data retention policy to 180 days. It can be changed for each project in qa-reports separately (extended or shortened). We don't have an option to keep the data forever. IMHO this is a good idea, but I wanted to ask whether you have a need to keep your results forever.

The PR in question is here: https://github.com/Linaro/squad/pull/370

milosz


Ryan Harkin

11 Oct 2018

10:24 a.m.

I don't need to keep my results forever. But what would be nice is to be able to keep 180 days OR n sets of data.

This way, release jobs that only get run once per month, for example, could keep a history of the project over a longer period of time.
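
A minimal sketch of the "180 days OR n sets of data" rule suggested above, for illustration only. The function name, the tuple layout, and the `min_count` parameter are hypothetical and are not taken from SQUAD's code.

```python
from datetime import datetime, timedelta


def builds_to_keep(builds, now, max_age_days=180, min_count=5):
    """Return the builds that survive cleanup, newest first.

    builds: list of (build_id, created_at) tuples, in any order.
    A build is kept if it is no older than max_age_days OR if it is
    among the min_count most recent builds of the project.
    """
    cutoff = now - timedelta(days=max_age_days)
    newest_first = sorted(builds, key=lambda b: b[1], reverse=True)
    # The most recent builds are kept regardless of their age.
    always_kept = {build_id for build_id, _ in newest_first[:min_count]}
    return [
        (build_id, created_at)
        for build_id, created_at in newest_first
        if created_at >= cutoff or build_id in always_kept
    ]
```

With a rule like this, a project that only builds once per month would still retain its last `min_count` builds even when all of them are older than the age limit.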

Martin Stadler

1:12 p.m.

On Thu, 11 Oct 2018 at 11:24, Ryan Harkin ryan.harkin@linaro.org wrote:

> I don't need to keep my results forever. But what would be nice is to be able to keep 180 days OR n sets of data.

Similar item here: what I do need is to keep iterative data sets from each release of the ERP. This doesn't mean that we need all test runs, just the final tests, for each server platform, for each release of the ERP.

I understand that this will make things more complex, but that said, this is a complex issue, and just having X days' worth of data is not going to be useful, as I am sure you can understand.

> This way, release jobs that only get run once per month, for example, could keep a history of the project over a longer period of time.

Also, what is the driver here? Is this just a storage issue, or are you collecting so much data that you are impacting perf on squad itself?

Martin

--
Martin Stadler | Sr. Director Datacenter and Cloud martin@linaro.org +44-07492180779 book a meeting linaro.co/martin-stadler-30min

Milosz Wasilewski

1:19 p.m.

On Thu, 11 Oct 2018 at 14:13, Martin Stadler martin.stadler@linaro.org wrote:

>> This way, release jobs that only get run once per month, for example, could keep a history of the project over a longer period of time.

+1

> Also, what is the driver here? Is this just a storage issue, or are you collecting so much data that you are impacting perf on squad itself?

It's only the storage at the moment. The amount of data might cause performance hits, but so far most of the performance problems are caused by our code, not by the amount of data. Doing things 'smarter' solves such performance issues.

milosz

Vincent Guittot

1:39 p.m.

On Thu, 11 Oct 2018 at 15:13, Martin Stadler martin.stadler@linaro.org wrote:

>> I don't need to keep my results forever. But what would be nice is to be able to keep 180 days OR n sets of data.
>
> Similar item here: what I do need is to keep iterative data sets from each release of the ERP. This doesn't mean that we need all test runs, just the final tests, for each server platform, for each release of the ERP.
>
> I understand that this will make things more complex, but that said, this is a complex issue, and just having X days' worth of data is not going to be useful, as I am sure you can understand.

+1 for being able to set a size.

Even if we don't want to save results forever, 180 days seems far too short. In our PMWG case, it seems reasonable to keep data for as long as the tested kernel is maintained, so we can make comparisons across its whole life. We are more in the range of 2 years, and even 6 years as announced for v4.4 at the last Connect. But we probably don't want to keep all results, only a subset of them.

>> This way, release jobs that only get run once per month, for example, could keep a history of the project over a longer period of time.

+1

Antonio Terceiro

2:46 p.m.

Hello,

Thanks for everyone's feedback. So my plan to address your concerns is the following:

- allow the cleanup to be disabled completely by setting the number of days to -1. This can be used as a simple solution for projects that have new builds at a low frequency.

- add a `keep forever` flag to build objects, that can be set using the REST API. This way each project can mark exactly which builds they want to keep, without needing to hardcode any specific logic in squad.

Thoughts?
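
The two rules proposed above could be sketched roughly as follows. This is an illustration under stated assumptions, not SQUAD's implementation: the `Build` class, the `keep_forever` attribute name, and the `expired_builds` function are hypothetical stand-ins for the proposed flag and cleanup step.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Build:
    version: str
    created_at: datetime
    keep_forever: bool = False  # hypothetical name for the proposed flag


def expired_builds(builds, retention_days, now):
    """Return the builds that would be eligible for deletion."""
    if retention_days < 0:
        # The proposal uses -1 to disable cleanup for a project; this
        # sketch treats any negative value the same way.
        return []
    cutoff = now - timedelta(days=retention_days)
    # Builds flagged as keep_forever are never selected for deletion.
    return [b for b in builds if not b.keep_forever and b.created_at < cutoff]
```

A project with infrequent builds would set the number of days to -1, while a busy project would keep a finite retention period and flag only its release builds.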

Milosz Wasilewski

2:53 p.m.

sounds good to me.

milosz

_______________________________________________
Squad-dev mailing list
Squad-dev@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/squad-dev

Dan Rue

4:09 p.m.

For LKFT we're still discussing what our policy should be for data retention. We link into qa-reports and lava in bugs and public mailing lists posts, and so our task is to assess the impact of such links breaking over time.

Logistically, will there be time to set this value after this feature is released, but before the first purge occurs, so that we will not lose data as a part of the deployment? We will surely want more than 180 days, but we're trying to decide if 365 is sufficient.

Dan

Antonio Terceiro

15 Oct 2018

9:04 p.m.

On Thu, Oct 11, 2018 at 11:09:01AM -0500, Dan Rue wrote:

> Logistically, will there be time to set this value after this feature is released, but before the first purge occurs, so that we will not lose data as a part of the deployment? We will surely want more than 180 days, but we're trying to decide if 365 is sufficient.

Good point. I could set all existing projects to -1 to give some time for a decision, or we could wait until we have a decision from most projects before deploying this in production.

Milosz Wasilewski

13 Nov 2018

11:42 a.m.

On Mon, 15 Oct 2018 at 22:04, Antonio Terceiro antonio.terceiro@linaro.org wrote:

> Good point. I could set all existing projects to -1 to give some time for a decision, or we could wait until we have a decision from most projects before deploying this in production.

The patch was merged and is now deployed in staging. I set up a testing project: https://staging-qa-reports.linaro.org/people/data-retention-test/ The data should be deleted after 1 day. This brings an interesting challenge: what should happen to calculated regressions and fixes? Should we re-calculate them when removing baselines? Keep them as they are? Maybe the baseline 'version' should be stored so we know how the calculation was made?

milosz

Dan Rue

7:34 p.m.

On Tue, Nov 13, 2018 at 11:42:47AM +0000, Milosz Wasilewski wrote:

> The patch was merged and is now deployed in staging. I set up a testing project: https://staging-qa-reports.linaro.org/people/data-retention-test/ The data should be deleted after 1 day. This brings an interesting challenge: what should happen to calculated regressions and fixes? Should we re-calculate them when removing baselines? Keep them as they are? Maybe the baseline 'version' should be stored so we know how the calculation was made?

How does it behave today? I prefer the calculate-and-cache approach, but I don't know how to deal with references going away. I've been trying to figure out how to delete known issues, and I'm hitting the same problem. For example, we have some tests that have changed names; I added a new known issue with the new name, but if I delete the old one, then all of the old results will start showing failures where they previously showed xfail.

Instead, in general, it'd be nice if everything were calculated and stored. It might solve some of the performance issues, too.

But as I write this, I talk myself out of it. Having everything real-time is simpler; it uses more CPU but less storage. If we do go with calculate and cache, then we'll invariably need to implement 're-calculate' so that calculations can be updated when dependent references are changed.

Dan

Milosz Wasilewski

7:46 p.m.

On Tue, 13 Nov 2018 at 19:35, Dan Rue dan.rue@linaro.org wrote:

> How does it behave today? I prefer the calculate-and-cache approach, but I don't know how to deal with references going away. I've been trying to figure out how to delete known issues, and I'm hitting the same problem. For example, we have some tests that have changed names; I added a new known issue with the new name, but if I delete the old one, then all of the old results will start showing failures where they previously showed xfail.

I think in this case you can just mark it inactive. The object will still be there, so the xfails stay, but it will not show in the UI.

> Instead, in general, it'd be nice if everything were calculated and stored. It might solve some of the performance issues, too.
>
> But as I write this, I talk myself out of it. Having everything real-time is simpler; it uses more CPU but less storage. If we do go with calculate and cache, then we'll invariably need to implement 're-calculate' so that calculations can be updated when dependent references are changed.

That's the problem I have. The regressions and fixes are cached right now. However, when the builds start disappearing, the caches will become invalid. In the best-case scenario this will only affect the last undeleted build. However, if there are builds marked 'keep forever', we'll have very inconsistent data.

milosz
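
One way to picture the idea of storing the baseline alongside cached regressions and fixes is the sketch below. It is only an illustration: `CachedComparison`, its fields, and `stale_comparisons` are hypothetical names, and nothing here reflects how SQUAD actually stores its cached results.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class CachedComparison:
    build_id: int
    # Build the regressions and fixes were computed against, if any.
    baseline_id: Optional[int]
    regressions: Set[str]
    fixes: Set[str]


def stale_comparisons(cache, existing_build_ids):
    """Return cached comparisons whose baseline build no longer exists."""
    return [
        c for c in cache
        if c.baseline_id is not None and c.baseline_id not in existing_build_ids
    ]
```

Recording the baseline makes it possible to tell, after a purge, which cached results were computed against a deleted build, so they can be kept as historical records or queued for re-calculation.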

Tom Gall

11 Oct 2018

1:33 p.m.

Out of curiosity, any ideas what kernelci might do in the future?

--
Regards, Tom

Director, Linaro Consumer Group, Multimedia Working Group
Linaro.org │ Open source software for ARM SoCs
irc: tgall_foo | skype : tom_gall

"Where's the kaboom!? There was supposed to be an earth-shattering kaboom!" Marvin Martian

Milosz Wasilewski

1:45 p.m.

On Thu, 11 Oct 2018 at 14:33, Tom Gall tom.gall@linaro.org wrote:

> Out of curiosity, any ideas what kernelci might do in the future?

No idea. IIRC they only keep a couple of weeks in storage.kernelci.org; I'm not sure about MongoDB.

milosz


squad-dev@lists.linaro.org

participants (7)

Antonio Terceiro
Dan Rue
Martin Stadler
Milosz Wasilewski
Ryan Harkin
Tom Gall
Vincent Guittot