On Tue, 13 Nov 2018 at 19:35, Dan Rue dan.rue@linaro.org wrote:
On Tue, Nov 13, 2018 at 11:42:47AM +0000, Milosz Wasilewski wrote:
On Mon, 15 Oct 2018 at 22:04, Antonio Terceiro antonio.terceiro@linaro.org wrote:
On Thu, Oct 11, 2018 at 11:09:01AM -0500, Dan Rue wrote:
On Thu, Oct 11, 2018 at 11:46:14AM -0300, Antonio Terceiro wrote:
[...]
Thanks for everyone's feedback. So my plan to address your concerns is the following:
1. Allow the cleanup to be disabled completely by setting the number of days to -1. This can serve as a simple solution for projects that receive new builds at a low frequency.
2. Add a `keep forever` flag to build objects, settable via the REST API. This way each project can mark exactly which builds it wants to keep, without hardcoding any project-specific logic in squad.
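The two knobs above could be exercised roughly like this. This is only a sketch: the field names `data_retention_days` and `keep_data`, and the endpoint shapes mentioned in the comments, are assumptions for illustration, not the final squad API.

```python
import json

# Hypothetical payload for knob 1: disable cleanup for a project entirely,
# e.g. sent as part of the project settings.
project_settings = {"data_retention_days": -1}

# Hypothetical payload for knob 2: mark a single build to be kept forever,
# e.g. PATCH /api/builds/<id>/ with a JSON body like this.
keep_build = {"keep_data": True}

print(json.dumps(project_settings))
print(json.dumps(keep_build))
```

The point is that retention is configured once per project, while the keep-forever flag is set per build, so a project with a short retention window can still pin individual milestone builds.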
For LKFT we're still discussing what our data retention policy should be. We link to qa-reports and lava from bugs and public mailing list posts, so our task is to assess the impact of such links breaking over time.
Logistically, will there be time to set this value after the feature is released but before the first purge occurs, so that we don't lose data as part of the deployment? We will surely want more than 180 days, but we're trying to decide whether 365 is sufficient.
Good point. I could set all existing projects to -1 to give some time for a decision, or we could wait until we have a decision from most projects before deploying this in production.
The patch was merged and is now deployed in staging. I set up a testing project: https://staging-qa-reports.linaro.org/people/data-retention-test/ The data should be deleted after 1 day. This raises an interesting challenge: what should happen to calculated regressions and fixes? Should we re-calculate them when removing baselines? Keep them as they are? Maybe the baseline 'version' should be stored so we know how the calculation was made?
How does it behave today? I prefer the calculate-and-cache approach, but I don't know how to deal with references going away. I've been trying to figure out how to delete known issues, and I'm hitting the same problem. For example, we have some tests that have changed names; I added a new known issue with the new name, but if I delete the old one then all of the old results will start showing failures where previously they showed xfail.
I think in this case just mark it inactive. The object will still be there so xfails stay, but it will not show in the UI.
In general, though, it'd be nice if everything were calculated and stored. It might solve some of the performance issues, too.
But as I write this I talk myself out of it. Having everything real-time is simpler; it uses more CPU but less storage. If we do go with calculate-and-cache, then we'll invariably need to implement 're-calculate' so that calculations can be updated when the references they depend on change.
That's the problem I have. Regressions and fixes are cached right now, but when builds start disappearing the caches will become invalid. In the best case this will only affect the last undeleted build. However, if there are builds marked 'keep forever', we'll end up with very inconsistent data.
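The dangling-baseline problem described above can be shown with a toy model. This is not squad's actual data model; the dict shapes and names (`builds`, `cached_regression`) are made up purely to illustrate how a cached comparison can outlive the build it was computed against.

```python
# Toy model: a regression is cached against a baseline build.
builds = {
    "b1": {"tests_passed": 10},  # baseline
    "b2": {"tests_passed": 8},   # kept forever, regression cached vs b1
}
cached_regression = {"build": "b2", "baseline": "b1"}

# The retention policy deletes the baseline build...
del builds["b1"]

# ...but the cached regression still references it.
baseline_exists = cached_regression["baseline"] in builds
print(baseline_exists)  # False: the cache now dangles
```

This is the inconsistency above: a 'keep forever' build can carry cached regressions whose baselines have been purged, so the cache either has to be recalculated against the oldest surviving build or annotated with the baseline version it was computed from.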
milosz
Dan