On Fri, Aug 6, 2010 at 3:15 PM, Michael Vogt mvo@ubuntu.com wrote:
You have the following options to make the on-disk file size smaller:
* Keep them compressed on disk (needs the apt from maverick): set 'Acquire::GzipIndexes "true";' in /etc/apt/apt.conf.d/10keepcompressed
* Create the apt caches in memory only: set 'Dir::Cache::srcpkgcache "";' and 'Dir::Cache::pkgcache "";' in a file under /etc/apt/apt.conf.d (both snippets are shown as complete files below)
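For concreteness (the second file name is my own pick; any file name under /etc/apt/apt.conf.d works):

    // /etc/apt/apt.conf.d/10keepcompressed
    Acquire::GzipIndexes "true";

    // /etc/apt/apt.conf.d/10nodiskcache  (file name is arbitrary)
    Dir::Cache::srcpkgcache "";
    Dir::Cache::pkgcache "";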
Given your hardware targets I think it's best to test how well both solutions perform; enabling the first one should be pretty safe.
Hmmm, interesting, we'll have to play with those :)
Out of interest, does anyone know why dpkg/apt never migrated from the "massive sequential text file" approach to something more database-oriented? I've often thought that the current system's scalability has been under pressure for a long time, and that there is potential for substantial improvements in footprint and performance. The Debian and Ubuntu communities would need to support such an approach, though, unless we wanted to switch to a different packaging system.
CCed mvo, who can probably give some background here. From what I understand it's a long-standing wishlist bug with potential to break the world :-P
There is currently no work on this inside apt. The text file is the "canonical" source; the *pkgcache.bin is a fast mmap representation of this information that is rebuilt if needed.
There are various non-apt approaches out there (as you are probably aware) that use different storage. I think this needs careful evaluation and research; the current mmap format is actually not bad, and in the latest maverick apt we can grow its size dynamically. The upside of the text file approach is simplicity and robustness: on failure, a sysadmin can inspect/fix it with vi.
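To illustrate the general shape of that design, here is a toy sketch in Python (the header layout, paths, and rebuild logic are all made up for illustration; apt's real on-disk format is different):

    import mmap
    import os
    import struct

    # Everything below is illustrative: apt's real cache layout differs.
    LISTS = "Packages"              # the canonical text file
    CACHE = "pkgcache.bin"          # derived binary cache
    HEADER = struct.Struct("<II")   # made-up header: magic, entry count
    MAGIC = 0x50434331

    def rebuild_cache():
        """Parse the canonical text file and write the binary cache."""
        names = []
        with open(LISTS, "rb") as f:
            for line in f:
                if line.startswith(b"Package:"):
                    names.append(line.split(b":", 1)[1].strip())
        with open(CACHE + ".new", "wb") as f:
            f.write(HEADER.pack(MAGIC, len(names)))
            for name in names:
                f.write(struct.pack("<H", len(name)) + name)
        os.rename(CACHE + ".new", CACHE)    # atomic replacement

    def open_cache():
        """mmap the binary cache read-only, rebuilding it if it is stale."""
        if (not os.path.exists(CACHE)
                or os.path.getmtime(CACHE) < os.path.getmtime(LISTS)):
            rebuild_cache()
        f = open(CACHE, "rb")
        buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        magic, count = HEADER.unpack_from(buf, 0)
        assert magic == MAGIC, "unknown cache format, delete and rebuild"
        return buf, count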
I hadn't got the impression that there was anything bad about apt's package cache, but it might be nice to consolidate it with the package lists in some way, since the information contained is essentially the same (I think).
For a while, Debian also distributed package list diffs, which can save a lot of download bandwidth in principle, though they give the receiving host more work to do. I don't know if it still works that way; I haven't seen any diffs during list updates for a while.
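If I remember correctly, those diffs were ed-style scripts applied to the previous Packages file. Applying one is roughly this (a sketch of the general technique, not apt's actual implementation):

    import re

    # Ed-style diffs (`diff --ed old new`) list hunks from the bottom of the
    # file upward, so applying them in order keeps later line numbers valid.
    _CMD = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")

    def apply_ed_script(lines, script):
        out = list(lines)
        it = iter(script)
        for cmd in it:
            m = _CMD.match(cmd)
            if not m:
                raise ValueError("bad ed command: %r" % cmd)
            start, op = int(m.group(1)), m.group(3)
            end = int(m.group(2) or start)
            body = []
            if op in "ac":
                for text in it:      # inserted text runs up to a lone "."
                    if text == ".":
                        break
                    body.append(text)
            if op == "a":            # append after line `start`
                out[start:start] = body
            elif op == "c":          # change lines start..end
                out[start - 1:end] = body
            else:                    # "d": delete lines start..end
                del out[start - 1:end]
        return out

    # e.g. apply_ed_script(["a", "b", "c"], ["2c", "B", "."]) -> ["a", "B", "c"]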
I guess if someone wanted to work on this, it might make sense to create a new (initially experimental) database backend for apt while continuing to support the existing one. The new backend could allow for an alternative representation and alternative access mechanisms, while the data itself remained unchanged. You're right that the text file approach is just fine (and convenient) for desktop and server environments in particular, as well as being robust and well-tested, so we certainly wouldn't want to get rid of it or risk breaking it.
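As a very rough sketch of the kind of split I mean (all names here are hypothetical, nothing like apt's real internals):

    import sqlite3

    class IndexBackend:
        """Hypothetical interface: the package stanzas stay identical,
        only the storage behind them differs."""
        def put(self, name, stanza):
            raise NotImplementedError
        def get(self, name):
            raise NotImplementedError

    class TextBackend(IndexBackend):
        """Today's approach, idealized: everything parsed from the
        sequential text file into memory."""
        def __init__(self):
            self._db = {}
        def put(self, name, stanza):
            self._db[name] = stanza
        def get(self, name):
            return self._db.get(name)

    class SqliteBackend(IndexBackend):
        """An experimental alternative: same records, indexed storage."""
        def __init__(self, path=":memory:"):
            self._con = sqlite3.connect(path)
            self._con.execute("CREATE TABLE IF NOT EXISTS pkg "
                              "(name TEXT PRIMARY KEY, stanza TEXT)")
        def put(self, name, stanza):
            with self._con:
                self._con.execute("INSERT OR REPLACE INTO pkg VALUES (?, ?)",
                                  (name, stanza))
        def get(self, name):
            row = self._con.execute("SELECT stanza FROM pkg WHERE name = ?",
                                    (name,)).fetchone()
            return row[0] if row else None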
[...]
P.S. Talking about areas where our packaging system needs improvement, I think the biggest problem for an embedded use-case is the maintainer scripts (postinst, preinst, postrm, prerm). They make rollback from failed upgrades impossible, as they can alter the system state in ways outside of dpkg's knowledge/control. Having a more declarative approach (triggers are a great step forward) would certainly be a win.
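For example, with file triggers a package that owns some cache can declare its interest in a path once (the path below is made up):

    # DEBIAN/triggers in the package that owns the cache
    interest /usr/share/myapp/plugins

Any package that installs files under that directory activates the trigger, and dpkg runs the interested package's postinst once with the "triggered" argument to rebuild the cache. The logic lives in one declared place that dpkg knows about, instead of in every other package's maintainer scripts.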
Part of the problem is that the maintainer scripts can currently run random helper programs and scripts. This is very flexible, but by definition, dpkg can't understand what every possible program will do. The package maintainer could be given the task of describing to dpkg what will happen, but I'm wondering whether dpkg can realistically enforce it, or whether there's a risk of the description having to become as complex as the helper.
So I guess there's a question of whether we can/should reduce that flexibility, and by how much.
If the flexibility has to be maintained, I wonder whether some sort of filesystem checkpointing approach is possible, so that the changes made by the maintainer scripts go into a staging area, to be "committed" together in a safe way when the packaging work has finished. Maybe something similar to fakeroot could be used, or a FUSE-based versioned filesystem. I'm not sure how to commit the changes safely though; you kinda end up having to lock the whole filesystem, at least for that phase of the process.
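As a toy sketch of just the commit step, assuming the maintainer scripts' writes had somehow been redirected into a staging tree (which is the hard part):

    import os
    import shutil

    def commit_staging(staging_root, target_root="/"):
        """Move every file staged under staging_root into place.

        Each individual replacement is atomic (a rename within one
        filesystem), but the set as a whole is not: a crash mid-loop
        leaves a partial commit, hence the need for a global lock or
        journal in anything real.
        """
        for dirpath, _dirnames, filenames in os.walk(staging_root):
            rel = os.path.relpath(dirpath, staging_root)
            dest_dir = os.path.normpath(os.path.join(target_root, rel))
            os.makedirs(dest_dir, exist_ok=True)
            for fname in filenames:
                staged = os.path.join(dirpath, fname)
                final = os.path.join(dest_dir, fname)
                tmp = final + ".staged"      # hypothetical suffix
                shutil.copy2(staged, tmp)    # land on the target filesystem
                os.rename(tmp, final)        # atomic per-file swap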
In principle, we have this problem even now: if dpkg runs while something else is modifying files in /var or /etc, it's possible for the configuration to get corrupted. Fortunately, that's a rare occurrence, and generally requires the admin to do something silly.
Cheers ---Dave