Images, apt-get clean and friends
dave.martin at linaro.org
Fri Aug 6 17:38:51 BST 2010
On Fri, Aug 6, 2010 at 3:15 PM, Michael Vogt <mvo at ubuntu.com> wrote:
> You have the following options to make the on-disk file size smaller:
> * keep them compressed on disk (needs apt in maverick), you need to set
> Acquire::GzipIndexes "true";
> in /etc/apt/apt.conf.d/10keepcompressed
> * Create the apt caches in memory only, set
> Dir::Cache::srcpkgcache "";
> Dir::Cache::pkgcache "";
> in /etc/apt/apt.conf.d
> Given your hardware targets I think it's best to test how well both
> solutions perform; enabling the first one should be pretty safe.
Hmmm, interesting, we'll have to play with those :)
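For anyone else following along, here's roughly what I expect those
drop-ins to look like, based on Michael's description (a sketch: the
file names are just the usual conventions, and I'm defaulting to a
scratch directory so this is safe to try unprivileged):

```shell
# Sketch of the two configurations described above, written as drop-in
# files under apt's conf.d directory. Use /etc/apt/apt.conf.d on a
# real system; a scratch directory is used here for safety.
APT_CONF_D="${APT_CONF_D:-$(mktemp -d)}"

# Option 1: keep the downloaded index files gzip-compressed on disk
# (needs maverick's apt, per the mail above).
cat > "$APT_CONF_D/10keepcompressed" <<'EOF'
Acquire::GzipIndexes "true";
EOF

# Option 2: build the binary package caches in memory only; an empty
# path disables the on-disk copy.
cat > "$APT_CONF_D/10nocache" <<'EOF'
Dir::Cache::srcpkgcache "";
Dir::Cache::pkgcache "";
EOF
```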
>> > Out of interest, does anyone know why dpkg/apt never migrated from the
>> > "massive sequential text file" approach to something more
>> > database-oriented? I've often thought that the current system's
>> > scalability has been under pressure for a long time, and that there is
>> > potential for substantial improvements in footprint and performance -
>> > though the Debian and Ubuntu communities would need to give their
>> > support for such an approach, unless we wanted to switch to a
>> > different packaging system.
>> CCed mvo who can probably give some background here. From what I understand
>> it's a long standing wishlist bug with potential to break the world :-P
> There is currently no work on this inside apt. The text file is the
> "canonical" source; the *pkgcache.bin is a fast mmap representation of
> this information that is rebuilt if needed.
> There are various non-apt approaches out there (as you are probably
> aware) that use different storage. I think this needs careful
> evaluation and research; the current mmap format is actually not bad,
> and in the latest maverick apt we can grow the size dynamically. The
> upside of the text file approach is simplicity and robustness: on
> failures, a sysadmin can inspect/fix with vi.
I hadn't got the impression that there was anything bad about apt's
package cache, but it might be nice to consolidate it with the package
lists in some way, since the information contained is essentially the
same (I think).
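One nice consequence of the text format, as Michael says, is that
ordinary tools work on it. For example, here's one quick way to pull a
single field out of a Packages/status-style file (a sketch; the sample
data and package names are made up):

```shell
# Build a tiny sample in the stanza format used by Packages and
# /var/lib/dpkg/status, then query it with awk.
SAMPLE="$(mktemp)"
cat > "$SAMPLE" <<'EOF'
Package: foo
Version: 1.2-3
Architecture: armel

Package: bar
Version: 0.9-1
Architecture: armel
EOF

# Print the Version of a given package: stanzas are blank-line
# separated, fields are "Name: value" lines.
pkg_version () {
  awk -v pkg="$1" '
    /^Package: / { in_pkg = ($2 == pkg) }
    in_pkg && /^Version: / { print $2; exit }
  ' "$2"
}

pkg_version bar "$SAMPLE"   # prints 0.9-1
```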
For a while, Debian also distributed package list diffs, which can
save a lot of download bandwidth in principle, though it gives the
receiving host more work to do. I don't know if it still works that
way though; I haven't seen any diffs during list updates for a while.
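If I remember right, the diff behaviour is controlled by an apt option,
so it can at least be toggled locally (this is from memory; check
apt.conf(5) before relying on it):

```shell
# Sketch: explicitly enable or disable fetching of Packages diffs
# ("pdiffs") during apt-get update. Writing to a scratch directory so
# this is safe to run; the real location would be /etc/apt/apt.conf.d.
APT_CONF_D="${APT_CONF_D:-$(mktemp -d)}"

cat > "$APT_CONF_D/20pdiffs" <<'EOF'
// Set to "false" to always fetch full index files instead of diffs.
Acquire::PDiffs "true";
EOF
```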
I guess if someone wanted to work on this, it might make sense to
create a new (initially experimental) database backend for apt while
continuing to support the existing one. The new backend could allow
for an alternative representation of the data and some alternative
mechanisms, but the data itself could remain unchanged. You're
right that the text file approach is just fine (and convenient) for
desktop PC and server environments in particular, as well as being
robust and well-tested, so we certainly wouldn't want to get rid of it
or risk breaking it.
> P.S. Talking about areas where our packaging system needs
> improvements, I think the biggest problem for an embedded use case is
> maintainer scripts (postinst, preinst, postrm, prerm). They make
> rollback from failed upgrades impossible (as they can alter the system
> state in ways outside of dpkg's knowledge/control). Having a more
> declarative approach (triggers are a great step forward) would
> certainly be a win.
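For what it's worth, triggers are already fairly declarative: a package
states its interest in a path in its DEBIAN/triggers control file, and
dpkg batches the activations into one postinst call. A sketch from
memory (the package and cache names are invented; see dpkg's triggers
documentation for the real details):

```shell
# The matching DEBIAN/triggers control file would contain one line:
#   interest /usr/share/myapp/plugins
#
# The postinst then gets called with $1 = "triggered" once, after all
# the packages touching that path have been unpacked.
handle_postinst () {
  case "$1" in
    configure)
      echo "configured"
      ;;
    triggered)
      # $2 lists the activated trigger names; the point is that the
      # expensive work runs once, not once per package.
      echo "rebuilding plugin cache for: $2"
      ;;
  esac
}

handle_postinst triggered "/usr/share/myapp/plugins"
```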
Part of the problem is that the maintainer scripts can currently run
arbitrary helper programs and scripts. This is very flexible, but by
definition, dpkg can't understand what every possible program will do.
The package maintainer could be given the task of describing to dpkg
what will happen, but I'm wondering whether dpkg can realistically
enforce it, or whether there's a risk of the description having to
become as complex as the helper.
So I guess there's a question of whether we can/should reduce that
flexibility, and by how much.
If the flexibility has to be maintained, I wonder whether some sort of
filesystem checkpointing approach is possible, so that the changes
made by the maintainer scripts go into a staging area, to be
"committed" together in a safe way when the packager has finished.
Maybe something similar to fakeroot could be used, or a FUSE-based
versioning filesystem. Not sure how to commit the changes
safely though; you kinda end up having to lock the whole filesystem at
least for that phase of the process.
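To make the staging idea concrete, here's a toy sketch with plain
directories standing in for the fancier fakeroot/FUSE machinery
(everything here is invented for illustration, and the locking caveat
still applies to the commit step):

```shell
# Toy staging-area sketch: the "maintainer script" writes only into a
# staging tree; commit replays the whole tree onto the root in one
# pass. Scratch directories stand in for / so this is safe to run.
STAGE="$(mktemp -d)"
ROOT="${ROOT:-$(mktemp -d)}"    # would be "/" for real

# Pretend maintainer script: all writes land in the staging area.
mkdir -p "$STAGE/etc/myapp"
echo "option=1" > "$STAGE/etc/myapp/myapp.conf"

# Commit phase: copy the staged tree onto the root. A real commit
# would need to be atomic and hold the filesystem lock mentioned
# above; cp here is just the sketch.
cp -a "$STAGE/." "$ROOT/"
```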
In principle, we have this problem even now: if dpkg runs while
something else is modifying files in /var or /etc, it's possible for the
configuration to get corrupted. Fortunately, that's a rare
occurrence, and generally requires the admin to do something silly.
More information about the Linaro-dev mailing list