Images, apt-get clean and friends
dave.martin at linaro.org
Fri Aug 6 10:57:21 BST 2010
On Fri, Aug 6, 2010 at 9:53 AM, Alexander Sack <asac at linaro.org> wrote:
> On Fri, Aug 6, 2010 at 3:28 AM, Christian Robottom Reis <kiko at linaro.org>
> wrote:
>> Hi there!
>> I unpacked our minimal release image and ran an xdiskusage on it,
>> mostly to see what we're shipping -- and I was surprised to see that a
>> fourth of the image is actually apt package caches and lists. Can we
>> put into the image generation script something to strip them out before
>> generating the image?
> if there are really .debs shipped in the tarball then this is definitely
> waste and a bug.
> However, if it's just the lists and pkg cache then I am not so convinced,
> unless we say we remove apt (and dpkg) from our images (e.g. don't allow
> easy install/upgrade).
> Those files would come back when running apt-get update etc., so the only
> thing we would win is smaller initial download bandwidth, while I think we
> are really after general/lasting disk footprint savings.
We could remove these files, but I agree it may be a false
optimisation: the size of the release filesystem is no longer
representative of the steady-state size of the filesystem when it's in
use in this case.
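For anyone who wants to check the proportion on an unpacked image, here is
a minimal Python sketch. It assumes the apt data lives in the usual
var/cache/apt and var/lib/apt/lists directories under the image root; the
function names are just made up for illustration, and it counts apparent
file sizes only, not filesystem overhead.

```python
import os


def tree_size(root):
    """Sum of apparent file sizes (bytes) under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def apt_share(rootfs, apt_dirs=("var/cache/apt", "var/lib/apt/lists")):
    """Return (apt_bytes, total_bytes, fraction) for an unpacked rootfs.

    apt_dirs is an assumption about where apt keeps its caches and lists;
    directories that do not exist simply contribute zero bytes.
    """
    whole = tree_size(rootfs)
    apt = sum(tree_size(os.path.join(rootfs, d)) for d in apt_dirs)
    return apt, whole, (apt / whole if whole else 0.0)
```

Running apt_share() on the same image before and after an "apt-get update"
would show how much of any saving survives first use.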
Out of interest, does anyone know why dpkg/apt never migrated from the
"massive sequential text file" approach to something more
database-oriented? I've often thought that the current system's
scalability has been under pressure for a long time, and that there is
potential for substantial improvements in footprint and performance -
though the Debian and Ubuntu communities would need to give their
support for such an approach, unless we wanted to switch to a
different packaging system.
> One thing we could do is remove universe from our default apt line. This
> would probably reduce the size of that directory by > 50% ...
> Long term we could have our own archive with fewer packages ... this could
> reduce the size of those indexes etc. even further.
>> The untarring also suggests a number of places where we could further
>> trim the image, some of which are probably pretty hard to do:
>> * stripping /usr/share/doc out (but everybody knew that)
> ack. we plan to do that using pitti's dpkg improvements; last time I checked
> they hadn't landed in the archive yet, but I will check the status again
> soon.
It's interesting to note that because /usr/share/doc contains mostly
nearly-empty directories and tiny files, the filesystem overhead may
be a significant part of the overall consumption here - I estimate
about 20-30% of the overall space, assuming a typical filesystem with
a 4KB block size.
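A rough way to reproduce that kind of estimate is sketched below in
Python. The model is an assumption on my part, not how any particular
filesystem really allocates: every file is rounded up to whole blocks,
every directory is charged one block, and inode tables, inline data and
tail packing are ignored. The function name is hypothetical.

```python
import os


def block_overhead(root, block_size=4096):
    """Return (apparent_bytes, allocated_bytes) for the tree under root.

    Simplified model: files are rounded up to a whole number of blocks and
    each directory (including root) is assumed to cost exactly one block.
    """
    apparent = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        allocated += block_size  # assumed cost of the directory itself
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            size = os.path.getsize(path)
            apparent += size
            # Ceiling division: round the file up to whole blocks.
            allocated += -(-size // block_size) * block_size
    return apparent, allocated
```

The difference between the two numbers, divided by the allocated total, is
the share lost to rounding under this model.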
If we have to keep /usr/share/doc/ (for copyright notices and so on),
maybe it would be feasible to replace each /usr/share/doc/<package>/
with a tarball? This would eliminate most of the overhead as well as
making the actual data smaller. Since /usr/share/doc/ is not accessed
often, and not accessed by many automated tools, this might not cause
>> * stripping out modules for devices that won't ever be on
>> this ARM device
> yeah, this seems to make sense. However, I am not sure how to draw the line.
> Maybe this is something the kernel WG can take a look at and come up with a
> reduced list of modules?
Classifying drivers by bus, and throwing out anything that can't be
physically connected, such as PCI/AGP/ISA, might be an approach here.
Also, peripherals which can only be connected to on-SoC buses, but are
not present in a given platform's silicon could be excluded. We would
still have to keep a lot though... anything which can be connected via
USB, for example.
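As a first cut at the bus classification, the sketch below (Python) groups
modules by the bus prefix of their aliases, taking the text of a
modules.alias file as input ("alias <pattern> <module>" lines, where the
pattern starts with something like "pci:" or "usb:"). The function names
and the default list of absent buses are illustrative assumptions only.
Modules that have no bus-style alias at all are not classified, so this
can only suggest candidates, not give a final list.

```python
from collections import defaultdict


def modules_by_bus(alias_text):
    """Map bus prefix -> set of module names, from modules.alias text."""
    buses = defaultdict(set)
    for line in alias_text.splitlines():
        parts = line.split()
        # Skip comments, blank lines and anything not "alias PATTERN MODULE".
        if len(parts) != 3 or parts[0] != "alias":
            continue
        pattern, module = parts[1], parts[2]
        if ":" not in pattern:
            continue  # e.g. "fs-ext4": not a bus-style alias
        buses[pattern.split(":", 1)[0]].add(module)
    return buses


def removal_candidates(buses, absent=("pci",)):
    """Modules reachable only via buses assumed absent on the platform."""
    on_absent = set().union(*(buses.get(b, set()) for b in absent))
    on_present = set().union(
        *(mods for bus, mods in buses.items() if bus not in absent)
    )
    # A module that also has an alias on a present bus must be kept.
    return on_absent - on_present
```

Anything with a "usb:" alias stays in under this scheme, which matches the
point above that we would still have to keep a lot.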
A more ambitious solution might be to allow for dynamic installation
of missing modules, but that's probably a separate project since it
would impact on the way the kernel is packaged.
Currently we have no choice but to install absolutely everything "just
in case" (much like the way /dev used to contain 1000s of device
nodes that were never used).