repo and other mirroring
paul.sokolovsky at linaro.org
Fri Sep 9 13:01:36 UTC 2011
On Thu, 8 Sep 2011 13:26:57 +0200
Loïc Minier <loic.minier at linaro.org> wrote:
> On Thu, Sep 08, 2011, Paul Sokolovsky wrote:
> > Repo is a bad tool for mirroring. We came to that conclusion, as
> > other folks did before us. So the android-build repo mirror waits
> > for its rewrite, left in peace for now since it "mostly works". But
> > for the upstream mirror for Gerrit, I implemented it via previously
> > queued ideas of a "proper" git mirror. It's not deployed in
> > production mode yet - kernel.org downtime struck right in the middle
> > of it.
> > The Gerrit upstream mirror is essentially a loop over the existing
> > working repository tree in the FS, running git pull/push on each
> > with suitable ref params (I tried --mirror first, but that doesn't
> > work well with Gerrit). The devel code is here:
> > https://code.launchpad.net/~linaro-infrastructure/linaro-android-gerrit-support/gerrit-support
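
For illustration only, the loop described above might look roughly like the sketch below. It is not the code from the linked branch. The remote names ("origin", "gerrit") and the branch name are assumptions, and the commands are only built, not executed.

```python
import os


def find_git_repos(root):
    """Yield every directory under root that contains a .git directory."""
    for dirpath, dirnames, _ in os.walk(root):
        if ".git" in dirnames:
            yield dirpath
            dirnames[:] = []  # do not descend into a repository we already found


def mirror_commands(repo_path, upstream="origin", target="gerrit", branch="master"):
    """Build the pull/push pair for one repository (remote names are hypothetical)."""
    refspec = f"refs/heads/{branch}:refs/heads/{branch}"
    return [
        # Only accept fast-forwards from upstream, never create a merge commit.
        ["git", "-C", repo_path, "pull", "--ff-only", upstream, branch],
        # Publish exactly one branch instead of using --mirror.
        ["git", "-C", repo_path, "push", target, refspec],
    ]
```

A driver would pass each command list to subprocess.run(cmd, check=True) for every path yielded by find_git_repos().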
> It seems there is an important new problem with the use case of
> mirroring for Gerrit: detecting new projects and removed projects so
> as to provision/unprovision them in Gerrit (gerrit create-project). I guess
> we're not too strict about removing projects/cleaning up git repos
> removed from manifests right now.
Yes. Actually there are 2 problems: 1) detecting new upstream projects
(there's a script for that, but it's not expected to be run
automatically so far); 2) properly migrating our own components to
Gerrit - so far we just leave old repos in place to keep old releases
fetchable, but I expect this to lead to a mess. Btw, projects in Gerrit
are not deletable in the normal way ;-). (One could hack on the DB
level, yeah.)
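
As a rough sketch of problem 1, detection can be reduced to a set difference between the project names found upstream and those already known to Gerrit. The host name and port below are placeholders, and the exact arguments of "gerrit create-project" vary between Gerrit versions (some older releases expect the name via a --name option), so treat the command as an assumption to check against your installation.

```python
def plan_provisioning(upstream_projects, gerrit_projects):
    """Return (projects to create in Gerrit, projects no longer present upstream)."""
    upstream = set(upstream_projects)
    existing = set(gerrit_projects)
    to_create = sorted(upstream - existing)
    orphaned = sorted(existing - upstream)
    return to_create, orphaned


def create_project_command(name, host="gerrit.example.org", port=29418):
    """Build an ssh command line for provisioning one project (host/port hypothetical)."""
    return ["ssh", "-p", str(port), host, "gerrit", "create-project", name]
```

Orphaned projects are only reported here, since Gerrit offers no normal way to delete them.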
> Do I understand correctly that we have a mirror AND the Gerrit copy
> of the repos? (/srv/git.linaro.org/git/android + /mnt/gerrit-mirror
Yes, we have Gerrit-served master repositories, plus we have a
working-copy checkout to perform merges with upstream. Well, those
merges are not really merges, but fast-forwards of a subset of upstream
branches. It would be cool if git could do a fast-forward from
repository to repository w/o a working copy, but of course that
wouldn't work in the general case of merges and conflicts.
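
For the pure fast-forward case, one option that may be worth testing: a fetch refspec without a leading "+" only updates a branch when the update is a fast-forward, and that works in a bare repository. The sketch below just builds such a command; the paths and branch names are made up, and whether this fits the Gerrit setup here is untested.

```python
def ff_refspec(branch, force=False):
    """Refspec updating refs/heads/<branch>; without '+' only fast-forwards are accepted."""
    spec = f"refs/heads/{branch}:refs/heads/{branch}"
    return "+" + spec if force else spec


def ff_fetch_command(bare_repo, source_url, branches):
    """Build a fetch into a bare repository that refuses non-fast-forward branch updates."""
    return ["git", f"--git-dir={bare_repo}", "fetch", source_url] + [
        ff_refspec(branch) for branch in branches
    ]
```

A rejected (non-fast-forward) branch makes git fetch exit with an error, which a mirroring script could use to flag branches that need a real merge.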
> Also, a general choice with a generic mirroring service is whether we
> try sharing the mirrored data effectively. Say we want to mirror
> Cyanogenmod manifests, or if someone wanted to mirror Android
> + Linaro manifests, could we do so in a way which avoids duplicating
> the contents of repos? This can quickly jump to
> github/gitorious-level complexity though, but some of it needs to be
> considered now (like the fact that Linaro has plenty of manifests and
> they point at 99% of the same data, so a mirror of Linaro with
> separate storage per-manifest would be unacceptably costly).
I'm not sure I follow exactly. Are you talking about sharing git pack
data across repositories, knowing they're close forks of each other? I
never heard about that, nor do I think it's worth pursuing, because
there's enough complexity already. It can be handled at the specific
project level somehow; for example, the Linaro codebase is a proper
superset of AOSP, so by fetching the Linaro tree one would have (easily
separable) access to both AOSP and Linaro code (technically, not
politically).
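
For reference, git itself has a mechanism in this direction: "git clone --reference" makes the new clone borrow objects from an existing local repository instead of storing its own copy. A minimal sketch of building such a command follows; all paths and URLs are placeholders, and the operational caveat (the referenced repository must never lose objects the borrowers rely on) would need to be considered before using it for a mirror.

```python
def clone_with_reference(url, dest, reference_repo, mirror=True):
    """Build a clone command that borrows objects from reference_repo (paths hypothetical)."""
    cmd = ["git", "clone"]
    if mirror:
        cmd.append("--mirror")  # bare clone with all refs mapped one to one
    cmd += ["--reference", reference_repo, url, dest]
    return cmd
```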
> Another problem I see with our current usage is that the branch names
> keep changing, for instance linaro_android_2.3.5, and because this is
> where we get manifests from, it might be tricky hardcoding this into
> the mirroring service.
A good mirroring service wouldn't be tied to a specific branch, I guess.
> > > I've solved c) by forking repo for myself, packaging it as
> > > a .deb and nuking the repo updating pieces, but a) and b) remain
> > > issues with the result. I have other concerns with repo:
> > > * quite complex and hairy for what it does in practice
> > Yeah, just faced that with the latest "repo patched to fetch all
> > tags" campaign - seemingly small changes broke mirroring and, I
> > suspect, adversely affected performance.
> I didn't pull your patch yet, but my intent was to pull exactly that
> feature; it's the top three commits at:
Yes. Please note that I personally consider it a flawed experiment. It
would have its limited uses, but to claim it's a generally suitable
solution, it would need a lot of testing. (With all these experiments, I
got the feeling that the magic repo does is very well thought out, and
keeps performance acceptable, at a shaky equilibrium, when dealing with
the truly enormous Android codebase, so changing a seemingly small
thing can halve checkout speed, for example.)
> > Yes, that's for sure a way to go. Except that after the patched repo
> > experiments I have a suspicion that running repo against a complete
> > git mirror vs repo's partial mirror may increase sync time and
> > I/O complexity (the latter hits in full in case of parallel
> > fetches).
> You're saying we should filter the contents of the git repos exposed
> in the resulting mirrored tree so that developers and buildds
> pointing repo at it get good performance, correct?
No, I'm saying that I have an intuitive suspicion that it may affect
performance, so we should take the need for performance and load
tests for this work seriously. And if performance degradation against
the reference design (repo) is proven, think about what to do about it.
> Do we know in
> which ways? avoiding useless branches I guess? If we know about all
> manifests which point at all branches/tags/shas, we can keep just
> that in the published repos and garbage collect the rest (might need
> more than git gc).
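
To make the "know about all manifests" idea concrete, here is a small sketch that collects, per project, every revision named by a set of repo manifests. It assumes the usual manifest layout (a "default" element with a "revision" attribute, and "project" elements with "name" and optional "revision") and ignores includes and remotes; the pruning step itself is not shown.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def needed_revisions(manifest_xml_texts):
    """Map project name -> set of revisions referenced by any of the manifests."""
    needed = defaultdict(set)
    for text in manifest_xml_texts:
        root = ET.fromstring(text)
        default = root.find("default")
        default_rev = default.get("revision") if default is not None else None
        for project in root.findall("project"):
            # A project without its own revision falls back to the manifest default.
            revision = project.get("revision", default_rev)
            if revision is not None:
                needed[project.get("name")].add(revision)
    return dict(needed)
```

Anything in a published repository that is not reachable from the collected revisions would be a candidate for cleanup.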
> > But generally, I came to the following observation: 1) repo to fetch
> > the codebase for development; 2) repo for mirroring; 3) repo to fetch
> > for a quick build - are all rather different use cases. Usage 1 for
> > sure stays that way forever, but for usages 2 & 3 we may write
> > specialized tools providing better performance.
> Very well put!
> I agree on 1 and 2, I'm less sure about 3; would we really want to
> move to something else than repo for builds? It's good when buildd
> and developer builds look alike when possible.
Well, I don't know. We for sure would want to (try to) improve build
checkout time, and all ways should be considered. After all, repo is
just a tool which checks out a list of git repositories at specified
revisions, so if we found a more optimal way to do that (though likely
more limited in other aspects), why not?
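
As a sketch of what such a limited tool could do for the quick-build case, the commands below fetch just one revision of one repository without history. This is an untested idea, not a measured improvement; it assumes the revision is a branch or tag name the server will serve, and the URL and path are placeholders.

```python
def quick_checkout_commands(url, path, revision):
    """Build commands for a shallow, single-revision checkout of one repository."""
    return [
        ["git", "init", path],
        # Fetch only the tip of the requested revision, no history.
        ["git", "-C", path, "fetch", "--depth=1", url, revision],
        # Check out what was just fetched (detached HEAD).
        ["git", "-C", path, "checkout", "FETCH_HEAD"],
    ]
```

The result is a detached checkout with no tracking branches, which is one of the "more limited in other aspects" trade-offs.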
> > > Another advantage is that there is now a python-git which can be
> > > leveraged rather than wrapping the git commands in python as repo
> > > does.
> > That sounds too complicated, like another "quite complex and hairy
> > for what it does in practice" ;-).
> Ah, so you've played with python-git and it sucks?
No, I didn't mean that. I meant that before moving to the more
intricate API level that a language binding would allow, it would make
sense to try to implement it in terms of git pull/push, if that's not
enough - git fetch, and only if that's still not enough - to employ
even more complex solutions.
> Fair enough, but
> I did find it's a bit of a pity that repo has so much git wrapping
> logic, not isolated at all, and probably fragile in the face of
> changes to the host's git and gitconfig.
> > I guess you know that Google works (worked) slowly on integrating
> > some kind of submodules support into repo (for an exact purpose not
> > known to me; I assume ditching manifests, but maybe just to support
> > components with it). But the work goes very slowly or has halted.
> The whole repo workflow seems so borken to me that I doubt they have
> a chance fixing it in incremental updates.
> > Well, I can't tell offhand whether repositories with submodules may
> > need additional attention with mirroring. My guess is yes. After all,
> > a submodule is just a symlink to another repo; they are fetched in
> > the *working copy*. And a git mirror is a bare repo.
> You're right; I hadn't thought enough about this. It will definitely
> need more than a couple of commands.
> First, we'll run into the same issue that required rewriting manifests:
> the sources for the commits are separate repos. So we could
> rewrite ".gitmodules" files, or we could try using "insteadOf" to map
> original URLs to mirrored URLs. Would be nice if this is easy to
> Second, indeed we need to do the actual fetching, but it doesn't
> seem to be *too* hard; perhaps we could even contribute this to git
> if it's generic enough.
Yes, I guess that makes sense.
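
A minimal sketch of the "insteadOf" option mentioned above: it generates the git config section that maps original URL prefixes to mirror prefixes, and mimics the rewrite so a mapping can be checked before deployment. The URLs are invented for the example; the longest-prefix rule follows git's documented behaviour for multiple matching insteadOf entries.

```python
def insteadof_config(mapping):
    """Render git config text; mapping is {original URL prefix: mirror URL prefix}."""
    lines = []
    for original, mirror in sorted(mapping.items()):
        lines.append(f'[url "{mirror}"]')
        lines.append(f"\tinsteadOf = {original}")
    return "\n".join(lines) + "\n"


def rewrite_url(url, mapping):
    """Apply the mapping to one URL, preferring the longest matching prefix."""
    matches = [prefix for prefix in mapping if url.startswith(prefix)]
    if not matches:
        return url
    best = max(matches, key=len)
    return mapping[best] + url[len(best):]
```

With this approach the ".gitmodules" files stay untouched, and only the configuration on the mirror host or the client changes.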