On Tue, 13 Nov 2018 at 19:35, Dan Rue dan.rue@linaro.org wrote:
On Tue, Nov 13, 2018 at 11:42:47AM +0000, Milosz Wasilewski wrote:
On Mon, 15 Oct 2018 at 22:04, Antonio Terceiro antonio.terceiro@linaro.org wrote:
On Thu, Oct 11, 2018 at 11:09:01AM -0500, Dan Rue wrote:
On Thu, Oct 11, 2018 at 11:46:14AM -0300, Antonio Terceiro wrote:
[...]
Thanks for everyone's feedback. So my plan to address your concerns is the following:
1. Allow the cleanup to be disabled completely by setting the number of days to -1. This can serve as a simple solution for projects that receive new builds at a low frequency.
2. Add a `keep forever` flag to build objects, settable via the REST API. This way each project can mark exactly which builds it wants to keep, without hardcoding any project-specific logic in squad.
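The two knobs above could be exercised roughly like this. This is only a sketch: the field names `data_retention_days` and `keep_data`, and the endpoint shapes mentioned in the comments, are assumptions for illustration, not the final squad API.

```python
import json

# Hypothetical payload for knob 1: disable cleanup for a project entirely,
# e.g. sent as part of the project settings.
project_settings = {"data_retention_days": -1}

# Hypothetical payload for knob 2: mark a single build to be kept forever,
# e.g. PATCH /api/builds/<id>/ with a JSON body like this.
keep_build = {"keep_data": True}

print(json.dumps(project_settings))
print(json.dumps(keep_build))
```

The point is that retention is configured once per project, while the keep-forever flag is set per build, so a project with a short retention window can still pin individual milestone builds.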
For LKFT we're still discussing what our data retention policy should be. We link to qa-reports and lava from bugs and public mailing list posts, so our task is to assess the impact of such links breaking over time.
Logistically, will there be time to set this value after the feature is released but before the first purge occurs, so that we don't lose data as part of the deployment? We will surely want more than 180 days, but we're trying to decide whether 365 is sufficient.
Good point. I could set all existing projects to -1 to give some time for a decision, or we could wait until we have a decision from most projects before deploying this in production.
The patch was merged and is now deployed in staging. I set up a testing project: https://staging-qa-reports.linaro.org/people/data-retention-test/ The data should be deleted after 1 day. This raises an interesting challenge: what should happen to calculated regressions and fixes? Should we re-calculate them when removing baselines? Keep them as they are? Maybe the baseline 'version' should be stored so we know how the calculation was made?
How does it behave today? I prefer the calculate-and-cache approach, but I don't know how to deal with references going away. I've been trying to figure out how to delete known issues, and I'm hitting the same problem. For example, we have some tests that have changed names; I added a new known issue with the new name, but if I delete the old one then all of the old results will start showing failures where previously they showed xfail.
I think in this case just mark it inactive. The object will still be there so xfails stay, but it will not show in the UI.
In general, though, it'd be nice if everything were calculated and stored. It might solve some of the performance issues, too.
But as I write this I talk myself out of it. Having everything real-time is simpler; it uses more CPU but less storage. If we do go with calculate-and-cache, then we'll invariably need to implement 're-calculate' so that calculations can be updated when the references they depend on change.
That's the problem I have. Regressions and fixes are cached right now, but when builds start disappearing the caches will become invalid. In the best case this will only affect the last undeleted build. However, if there are builds marked 'keep forever', we'll end up with very inconsistent data.
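The dangling-baseline problem described above can be shown with a toy model. This is not squad's actual data model; the dict shapes and names (`builds`, `cached_regression`) are made up purely to illustrate how a cached comparison can outlive the build it was computed against.

```python
# Toy model: a regression is cached against a baseline build.
builds = {
    "b1": {"tests_passed": 10},  # baseline
    "b2": {"tests_passed": 8},   # kept forever, regression cached vs b1
}
cached_regression = {"build": "b2", "baseline": "b1"}

# The retention policy deletes the baseline build...
del builds["b1"]

# ...but the cached regression still references it.
baseline_exists = cached_regression["baseline"] in builds
print(baseline_exists)  # False: the cache now dangles
```

This is the inconsistency above: a 'keep forever' build can carry cached regressions whose baselines have been purged, so the cache either has to be recalculated against the oldest surviving build or annotated with the baseline version it was computed from.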
milosz
Dan