+++ Christian Robottom Reis [2010-08-05 22:28 -0300]:
Hi there!
I unpacked our minimal release image and ran an xdiskusage on it,
mostly to see what we're shipping -- and I was surprised to see that a fourth of the image is actually apt package caches and lists.
This is typical for a debian-based minimal system.
Emdebian has spent some time developing tools for minimising the installed size of Debian-compatible images, so I can make a few relevant comments.
Can we put into the image generation script something to strip them out before generating the image?
We could. The tradeoff is having to download them again on first use of apt on the target system vs a smaller installed system until that is done. In cases where that is 'never' then it's a big win.
Making sure that only repos that are actually needed on the target are listed can help. Does it need src repos? Does it need universe/multiverse? Leaving those out makes a huge difference.
I assume there are no .debs in the apt cache? debootstrap-based installers leave all the .debs in because they are needed for second-stage configuration, but I assume we've done the second-staging by some means or other. (multistrap-based image creation does not need the .debs for 'second-stage', so this issue does not arise).
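For illustration, stripping the caches before the image is generated could look roughly like the sketch below. It is not taken from the actual image generation script: the function name is made up, and it assumes the standard apt locations (var/cache/apt and var/lib/apt/lists) under an unpacked root filesystem. Lock files and the 'partial' directories are left alone so apt still finds the layout it expects on first use.

```python
from pathlib import Path

# Assumed standard apt locations, relative to the unpacked image root.
APT_CACHE_DIRS = ("var/cache/apt", "var/lib/apt/lists")


def strip_apt_caches(rootfs):
    """Delete cached .debs, binary caches and package lists under rootfs.

    Returns the number of bytes freed. Directories and files named 'lock'
    are kept; everything removed here is re-downloaded or regenerated by
    the first 'apt-get update' on the target.
    """
    freed = 0
    for rel in APT_CACHE_DIRS:
        base = Path(rootfs) / rel
        if not base.is_dir():
            continue
        # Materialise the listing first so we do not delete while iterating.
        for path in list(base.rglob("*")):
            if path.is_symlink() or not path.is_file():
                continue
            if path.name == "lock":
                continue
            freed += path.stat().st_size
            path.unlink()
    return freed
```

Calling strip_apt_caches("/path/to/unpacked/rootfs") just before the image is packed would give the smaller image, at the cost of the re-download described above.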
The untarring also suggests a number of places where we could further trim the image, some of which are probably pretty hard to do:
- stripping /usr/share/doc out (but everybody knew that)
- dropping charmaps, zones and locale info that will never really be used
- stripping out modules for devices that won't ever be on this ARM device
- stripping out firmware for peripherals that won't ever be on this ARM device
This is pretty close to what emdebian grip does - i.e. the set of easy wins which approximately halves your base image size without making any binary-incompatible changes or rebuilding anything. (although emdebian doesn't do anything about kernels - we've left that as out-of-scope)
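As a rough sketch of the simplest of those wins, the following walks an unpacked root filesystem and removes everything under usr/share/doc except the copyright files, which grip also keeps. This is not how em-grip itself works (it repackages the .debs); the function name and the choice to leave symlinks untouched are assumptions made for the example.

```python
from pathlib import Path


def strip_docs(rootfs):
    """Remove files under usr/share/doc, keeping any file named 'copyright'.

    Returns the number of files removed. Symlinks are left alone, and
    directories are removed only once they are empty.
    """
    doc_root = Path(rootfs) / "usr/share/doc"
    removed = 0
    if not doc_root.is_dir():
        return removed
    # Deepest paths first, so a directory is visited after its contents.
    paths = sorted(doc_root.rglob("*"), key=lambda p: len(p.parts), reverse=True)
    for path in paths:
        if path.is_symlink():
            continue
        if path.is_dir():
            if not any(path.iterdir()):
                path.rmdir()
        elif path.name != "copyright":
            path.unlink()
            removed += 1
    return removed
```

Doing this on the final image rather than in the packages means dpkg still believes the files are installed, so they can reappear on upgrade.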
We could use the em-grip tool (or a variant) to repackage our debs to make smaller images. However the result is not policy-compliant 'ubuntu', but a new repository of packages containing the exact same binaries with less bloat, on top of which you can install any normal ubuntu packages which have not had this treatment. That may or may not be how we want to proceed? It is a sane and effective way to manage this sort of thing (it is currently trivial to crossgrade Debian to emdebian-grip and save a load of space, or to use the installer to install grip instead of normal Debian). We could pull the same trick for Ubuntu with relatively little effort.
Grip does the following things to compatibly save space:
* Reduce all Long descriptions to 4 lines in Packages files (makes them approx. half the size)
* strip other fields that aren't actually needed (including 'recommends')
* strip all docs, examples, manpages, just leaving copyright files
* sets dpkg-vendor so that it can be used to do different stuff in maintainer scripts (or on rebuilds).
* restricts overall archive size to keep apt metadata size down
* remove lintian files, help files
* don't require everything 'essential' so a smaller minimal system can be specified.
* split translations out into '.tdebs' - one per lang per package, in a separate pool, i.e. not like the ubuntu or proposed Debian schemes
* tzdata is one thing we left alone in grip, although it would be really good to slim it down a bit. In crush it was shrunk to ~1/3rd the size (2.4MB) by removing the 'right' and 'posix' copies.
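To make the first item concrete, here is a minimal sketch of trimming long descriptions in the text of a Packages file. It is not the em-grip code: the function name is invented, and it assumes that "4 lines" means at most four continuation lines after the one-line synopsis. It relies only on the usual control-file layout, where continuation lines start with a space or tab and stanzas are separated by blank lines.

```python
def trim_descriptions(packages_text, max_lines=4):
    """Keep at most max_lines continuation lines of each Description field.

    The synopsis (the 'Description:' line itself) is always kept, and
    continuation lines belonging to other fields are left untouched.
    """
    out = []
    in_description = False
    kept = 0
    for line in packages_text.splitlines():
        if not line.startswith((" ", "\t")):
            # A new field, or the blank line between stanzas, ends any
            # description that was in progress.
            in_description = line.startswith("Description:")
            kept = 0
            out.append(line)
        elif in_description:
            if kept < max_lines:
                out.append(line)
                kept += 1
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

The same function could be run over the lists already on an image, but the real saving comes from publishing the trimmed Packages files in the repository so the target never downloads the long versions.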
Of course another way to achieve much the same effect is to use dpkg filtering at install time to do the same sorts of stripping. This means you can leave the package files exactly as they were (so downloads don't get any smaller, only final images). That was implemented as a proof of concept a few years back, but there are some complicated issues about what happens on future upgrades/removals and exactly how dpkg should deal with operations on installed-but-filtered files.
If we want to make smaller images we should certainly look at re-using some of the emdebian technology and/or mechanisms as it already works well.
Wookey