On Fri, Sep 09, 2011, Paul Sokolovsky wrote:
Also, a general choice with a generic mirroring service is whether we try to share the mirrored data effectively. Say we want to mirror Cyanogenmod manifests, or someone wants to mirror Android upstream and Linaro manifests: could we do so in a way that avoids duplicating the contents of repos? This can quickly jump to github/gitorious-level complexity, though, but some of it needs to be considered now (for instance, Linaro has plenty of manifests and they point at 99% of the same data, so a mirror of Linaro with separate storage per manifest would be unacceptably costly).
I'm not sure I follow exactly. Are you talking about sharing git pack data across repositories, knowing they're close forks of each other? I've never heard of that, nor do I think it's worth pursuing, because there's enough complexity already. It can be handled at the level of a specific project somehow; for example, the Linaro codebase is a proper superset of AOSP, so by fetching the Linaro tree one would have (easily separable) access to both AOSP and Linaro code (technically, not politically).
Good example of what I meant: I don't think we want to go as far as sharing git pack data across repositories which are close, that's clearly on the too complex side of things. However we also don't want to keep it too simple, like one mirror data set for each manifest.xml that we mirror. Instead, it's likely going to be something like parsing all manifests that we want to mirror, building a map from them of which git repos and branches and tags and shas we want to mirror, then mirroring that in a flat hierarchy, then finding a way to consume the resulting super-mirror.
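The "parse all manifests, build one map, mirror it flat" idea can be sketched roughly as below. This is a minimal illustration, not a design: it assumes manifests in repo's XML format with absolute `fetch` URLs on `<remote>`, and the function name `collect_mirror_set` and the example host are hypothetical. Keying by fetch URL rather than remote name is what lets two manifests that point at the same repository share a single mirror entry.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical example manifests in repo's XML format; the host is a placeholder.
MANIFEST_A = """<manifest>
  <remote name="aosp" fetch="git://example.org/aosp/"/>
  <default remote="aosp" revision="master"/>
  <project name="platform/build" path="build"/>
  <project name="platform/bionic" path="bionic" revision="gingerbread"/>
</manifest>"""

MANIFEST_B = """<manifest>
  <remote name="linaro" fetch="git://example.org/aosp"/>
  <default remote="linaro" revision="linaro_android_2.3.5"/>
  <project name="platform/build" path="build"/>
</manifest>"""


def collect_mirror_set(manifest_texts):
    """Merge several manifests into one map: repository URL -> set of revisions.

    Assumes absolute fetch URLs on <remote> and that <default> (if present)
    supplies the remote and revision for projects that omit them.
    """
    wanted = defaultdict(set)
    for text in manifest_texts:
        root = ET.fromstring(text)
        remotes = {r.get("name"): r.get("fetch").rstrip("/")
                   for r in root.findall("remote")}
        default = root.find("default")
        default_remote = default.get("remote") if default is not None else None
        default_rev = default.get("revision") if default is not None else None
        for project in root.findall("project"):
            remote = project.get("remote", default_remote)
            revision = project.get("revision", default_rev)
            # Key by fetch URL, not remote name, so manifests that name the
            # same server differently still share one mirror entry.
            url = remotes[remote] + "/" + project.get("name")
            revisions = wanted[url]  # creates the entry even without a revision
            if revision:
                revisions.add(revision)
    return dict(wanted)


mirror_set = collect_mirror_set([MANIFEST_A, MANIFEST_B])
for url in sorted(mirror_set):
    print(url, sorted(mirror_set[url]))
```

Here `platform/build` appears in both manifests but ends up as one entry carrying both `master` and `linaro_android_2.3.5`; each URL in the result would then be mirrored once, however many manifests reference it.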
Another problem I see with our current usage is that the branch names keep changing (for instance, linaro_android_2.3.5), and because this is where we get manifests from, it might be tricky to hardcode them into the mirroring service.
A good mirroring service wouldn't be tied to a specific branch, I guess.
Yup; I just wanted to point out that this is a bit specific to our choice of workflow, and not generic. If we try to create a *reusable* tool to mirror manifests, we want to be careful not to put in too many smarts that are only relevant to Linaro. I haven't really thought about it; maybe it's just a matter of giving Linaro an easy way to provision new branch names to the mirroring service, or maybe it should be done with a manifest_branch_name_regexp="^linaro_android_.*" kind of approach.
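The regexp approach could look something like this sketch. It assumes branch names are discovered from `git ls-remote --heads` output on the manifest repository (lines of the form sha, tab, ref); the helper names and the sample output are hypothetical, and only the pattern itself comes from the discussion.

```python
import re

# Example pattern from the thread; in practice this would be per-project configuration.
manifest_branch_name_regexp = r"^linaro_android_.*"


def heads_from_ls_remote(output):
    """Extract branch names from the output of `git ls-remote --heads <url>`.

    Each line has the form "<sha><TAB><ref>", with refs like refs/heads/master.
    """
    names = []
    for line in output.splitlines():
        _sha, _, ref = line.partition("\t")
        if ref.startswith("refs/heads/"):
            names.append(ref[len("refs/heads/"):])
    return names


def branches_to_mirror(branch_names, pattern=manifest_branch_name_regexp):
    """Return the manifest branches whose names match `pattern`, sorted."""
    rx = re.compile(pattern)
    return sorted(name for name in branch_names if rx.search(name))


# Hypothetical ls-remote output (shas shortened for readability).
SAMPLE = (
    "1111\trefs/heads/master\n"
    "2222\trefs/heads/linaro_android_2.3.5\n"
    "3333\trefs/heads/linaro_android_2.3.4\n"
)
print(branches_to_mirror(heads_from_ls_remote(SAMPLE)))
```

A new release branch then gets picked up automatically as soon as it appears upstream, while other projects would simply supply their own pattern.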
Yes. Please note that I personally consider it a flawed experiment.
Ok, understood, and thanks for the detailed list of issues you passed along; I'll wait and see a bit more on that one.
would have its limited uses, but to claim it's a generally suitable solution, it would need a lot of testing. (With all these experiments, I got the feeling that the magic repo does is very well thought out, and keeps acceptable performance when dealing with the truly enormous Android codebase at a shaky equilibrium, so changing a seemingly small thing can, for example, make checkout twice as slow.)
[...]
No, I'm saying that I have an intuitive suspicion that it may affect performance, so we had better take the need for performance and load tests for this work seriously. And if a performance degradation against the reference design (repo) is proven, we should think about what to do about it.
Noted
Well, I don't know. We would certainly want to (try to) improve build checkout time, and all approaches should be considered. After all, repo is just a tool that checks out a list of git repositories at specified revisions, so if we found a more optimal way to do that (though likely more limited in other aspects), why not?
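Reduced to that description, the core job can be sketched as a list of plain git commands. This is only an illustration of the "list of repositories at specified revisions" view, not a repo replacement (repo does considerably more); the `Project` tuple and function names are hypothetical, and the plan is built separately from execution so it can be inspected or timed against repo.

```python
import subprocess
from typing import List, NamedTuple


class Project(NamedTuple):
    """One manifest entry: where to check out, from where, at which revision."""
    path: str
    url: str
    revision: str


def checkout_plan(projects: List[Project]) -> List[List[str]]:
    """Build the git commands that check out each project at its revision."""
    commands = []
    for p in projects:
        # Clone without populating the work tree, then check out the pinned revision.
        commands.append(["git", "clone", "--no-checkout", p.url, p.path])
        commands.append(["git", "-C", p.path, "checkout", p.revision])
    return commands


def run_plan(commands: List[List[str]]) -> None:
    """Execute the plan sequentially, stopping at the first failing command."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `checkout_plan([Project("build", "git://example.org/aosp/platform/build", "linaro_android_2.3.5")])` yields a clone command followed by a checkout of `linaro_android_2.3.5` inside `build`.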
Yup; replacing repo with a compatible interface for our limited cases is fine, but it seems important to preserve the possibility of using repo in the workflow.
Ah, so you've played with python-git and it sucks?
No, I didn't mean that. I meant that before moving to the more intricate API level that a language binding would allow, it would make sense to try to implement it in terms of git pull/push; if that's not enough, git fetch; and only if that's still not enough, to employ even more complex solutions.
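Staying at the plain-command level for the mirroring side might look like the following sketch, which shells out to git via `subprocess` rather than using python-git. It assumes one bare mirror directory per upstream repository; the function names are hypothetical.

```python
import os
import subprocess


def mirror_sync_command(url, mirror_dir):
    """Choose the plain git command that creates or refreshes a bare mirror."""
    if os.path.isdir(mirror_dir):
        # Mirror already exists: fetch all refs, pruning ones deleted upstream.
        return ["git", f"--git-dir={mirror_dir}", "fetch", "--prune", "origin"]
    # First run: a --mirror clone copies every upstream ref as-is.
    return ["git", "clone", "--mirror", url, mirror_dir]


def mirror_sync(url, mirror_dir):
    """Run the chosen command, raising CalledProcessError if git fails."""
    subprocess.run(mirror_sync_command(url, mirror_dir), check=True)
```

Separating command selection from execution keeps the logic testable without a network, and if plain `clone`/`fetch` later proves insufficient, only `mirror_sync` would need to change.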
Oh, it just seemed a better idea to me to use python-git instead of wrapping git pull, git clone, git fetch, etc.; this wasn't to do anything particularly advanced. I was simply hoping this would draw a nice line in the sand in terms of responsibilities and give us a cleaner interface for doing git-ish stuff.