On Fri, Aug 06, 2010 at 12:05:25PM +0200, Alexander Sack wrote:
On Fri, Aug 6, 2010 at 11:57 AM, Dave Martin <dave.martin@linaro.org> wrote:
On Fri, Aug 6, 2010 at 9:53 AM, Alexander Sack <asac@linaro.org> wrote:
On Fri, Aug 6, 2010 at 3:28 AM, Christian Robottom Reis <kiko@linaro.org> wrote:
Hi there!
I unpacked our minimal release image and ran xdiskusage on it, mostly to see what we're shipping -- and I was surprised to see that a quarter of the image is actually apt package caches and lists. Can we add something to the image generation script to strip them out before the image is generated?
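A minimal sketch of such a stripping step, in Python. It assumes the image script works on an unpacked rootfs directory and that apt uses the standard Debian/Ubuntu locations (var/lib/apt/lists for the index lists, var/cache/apt for pkgcache.bin/srcpkgcache.bin, var/cache/apt/archives for downloaded .debs). The function name `strip_apt_caches` is hypothetical, not part of any existing Linaro script. It leaves apt's lock files and the `partial` directories in place.

```python
import glob
import os


def strip_apt_caches(rootfs):
    """Remove apt index lists, binary caches and cached .debs under rootfs.

    Hypothetical helper; assumes the standard Debian/Ubuntu apt layout.
    Lock files and 'partial' directories are left alone. The index lists
    come back on the next 'apt-get update'.
    """
    patterns = [
        "var/lib/apt/lists/*",           # downloaded Packages/Sources indexes
        "var/cache/apt/*.bin",           # pkgcache.bin, srcpkgcache.bin
        "var/cache/apt/archives/*.deb",  # cached package files
    ]
    removed = []
    for pattern in patterns:
        for path in glob.glob(os.path.join(rootfs, pattern)):
            # Skip directories (e.g. lists/partial) and apt's lock files.
            if os.path.isfile(path) and os.path.basename(path) != "lock":
                os.remove(path)
                removed.append(path)
    return sorted(removed)
```

It would be called on the rootfs directory just before the image tarball is packed, e.g. `strip_apt_caches("/path/to/rootfs")`, and returns the list of deleted files for logging.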
If there really are .debs shipped in the tarball, then this is definitely waste and a bug.
However, if it's just the lists and the package cache, then I am not so convinced, unless we decide to remove apt (and dpkg) from our images (i.e. not allow easy install/upgrade etc.).
Those files would come back as soon as someone runs apt-get update etc., so the only thing we would win is a smaller initial download, whereas I think we are really after general, lasting disk footprint savings.
We could remove these files, but I agree it may be a false optimisation: in this case, the size of the release filesystem would no longer be representative of its steady-state size once it is in use.
You have the following options to make the on-disk file size smaller:
* keep them compressed on disk (requires the apt in maverick) by setting """ Acquire::GzipIndexes "true"; """ in /etc/apt/apt.conf.d/10keepcompressed
* create the apt caches in memory only by setting """ Dir::Cache::srcpkgcache ""; Dir::Cache::pkgcache ""; """ in a file in /etc/apt/apt.conf.d
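If the image generation script should bake either setting into the image, a small Python sketch like the one below could write the fragments into the unpacked rootfs. The helper name `write_apt_conf` is hypothetical, and so is the file name `99nocache` used for the second option (the mail does not name a file for it); the `10keepcompressed` name and both option strings are taken from the list above.

```python
import os

# Option 1: keep downloaded indexes gzip-compressed (needs the maverick apt).
GZIP_INDEXES = 'Acquire::GzipIndexes "true";\n'

# Option 2: do not write pkgcache.bin/srcpkgcache.bin to disk.
NO_DISK_CACHE = 'Dir::Cache::srcpkgcache "";\nDir::Cache::pkgcache "";\n'


def write_apt_conf(rootfs, name, content):
    """Write an apt.conf.d fragment into an unpacked rootfs; return its path.

    Hypothetical helper for an image build script.
    """
    conf_dir = os.path.join(rootfs, "etc", "apt", "apt.conf.d")
    os.makedirs(conf_dir, exist_ok=True)
    path = os.path.join(conf_dir, name)
    with open(path, "w") as f:
        f.write(content)
    return path
```

For example, `write_apt_conf(rootfs, "10keepcompressed", GZIP_INDEXES)` enables the first (safer) option; the second could be added with `write_apt_conf(rootfs, "99nocache", NO_DISK_CACHE)` once it has been tested on the target hardware.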
Given your hardware targets, I think it's best to test how well both solutions perform; enabling the first one should be pretty safe.
Out of interest, does anyone know why dpkg/apt never migrated from the "massive sequential text file" approach to something more database-oriented? I've often thought that the current system's scalability has been under pressure for a long time, and that there is potential for substantial improvements in footprint and performance - though the Debian and Ubuntu communities would need to give their support for such an approach, unless we wanted to switch to a different packaging system.
CCed mvo, who can probably give some background here. From what I understand, it's a long-standing wishlist bug with the potential to break the world :-P
There is currently no work on this inside apt. The text file is the "canonical" source; *pkgcache.bin is a fast mmap representation of this information that is rebuilt when needed.
There are various non-apt approaches out there (as you are probably aware) that use different storage. I think this needs careful evaluation and research; the current mmap format is actually not bad, and in the latest maverick apt its size can grow dynamically. The upside of the text file approach is simplicity and robustness: on failure, a sysadmin can inspect and fix the files with vi.
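To illustrate how simple the canonical text format is, here is a rough Python sketch of a parser for control-style files such as the Packages lists or the dpkg status file. It is a simplification, not apt's own parser: it assumes stanzas separated by blank lines, `Name: value` fields, and continuation lines starting with a space or tab, and it ignores PGP signatures and comment lines. The function name `parse_stanzas` is hypothetical.

```python
def parse_stanzas(text):
    """Parse Debian control-style text into a list of {field: value} dicts.

    Simplified sketch: blank lines separate stanzas, 'Name: value' starts
    a field, and lines beginning with a space or tab continue the previous
    field (e.g. a multi-line Description).
    """
    stanzas = []
    current = {}
    last_field = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if current:
                stanzas.append(current)
                current, last_field = {}, None
        elif line[0] in " \t" and last_field is not None:
            # Continuation line: append it to the previous field.
            current[last_field] += "\n" + line[1:]
        else:
            name, _, value = line.partition(":")
            last_field = name.strip()
            current[last_field] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas
```

The same property that makes this easy to parse is what lets a sysadmin repair the files by hand, while the binary pkgcache.bin can simply be regenerated from them.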
Cheers, Michael
P.S. Talking about areas where our packaging system needs improvement, I think the biggest problem for an embedded use case is maintainer scripts (postinst, preinst, postrm, prerm). They make rollback from failed upgrades impossible, as they can alter the system state in ways outside of dpkg's knowledge and control. A more declarative approach (triggers are a great step forward) would certainly be a win.