On Sunday 27 March 2011 11:15:38 Michael Monnerie wrote:
On Friday, 25 March 2011 Arnd Bergmann wrote:
# ./flashbench --findfat --fat-nr=6 /dev/sde --blocksize=512
sched_setscheduler: Operation not permitted
4MiB    7.94M/s 7.97M/s 7.95M/s 7.98M/s 7.85M/s 7.95M/s
2MiB    5.57M/s 5.55M/s 5.56M/s 5.53M/s 5.57M/s 5.55M/s
1MiB    24.3M/s 24M/s   7M/s    6.93M/s 6.96M/s 6.88M/s
512KiB  23.6M/s 23.9M/s 6.09M/s 6.06M/s 6.11M/s 6.07M/s
256KiB  4.41M/s 4.43M/s 4.44M/s 4.42M/s 4.45M/s 4.39M/s
128KiB  4.17M/s 4.15M/s 4.16M/s 4.16M/s 3.09M/s 3.09M/s
64KiB   21.5M/s 4.59M/s 4.52M/s 22M/s   4.57M/s 4.67M/s
32KiB   16.8M/s 16M/s   16.4M/s 16.1M/s 16.2M/s 16.4M/s
16KiB   4.62M/s 4.49M/s 4.45M/s 4.41M/s 4.49M/s 4.43M/s
8KiB    1.93M/s 1.94M/s 1.96M/s 2.01M/s 1.95M/s 2.01M/s
4KiB    970K/s  971K/s  1.01M/s 973K/s  990K/s  1.02M/s
2KiB    467K/s  481K/s  468K/s  467K/s  468K/s  472K/s
1KiB    235K/s  234K/s  236K/s  232K/s  230K/s  225K/s
512B    120K/s  118K/s  118K/s  118K/s  117K/s  120K/s
Ok. The important part here is the effective page size, which is almost certainly 32 KB. Writing any smaller blocks is always much worse here. Writing larger blocks has interesting side-effects, so there may be more to it than I thought at first.
I was wondering why the 1M, 512K and 64K tests were faster, so I reran the test twice for the larger sizes:
# ./flashbench --findfat --fat-nr=6 /dev/sde --blocksize=512
sched_setscheduler: Operation not permitted
4MiB    23.1M/s 23M/s   22.8M/s 22.9M/s 7.85M/s 7.91M/s
2MiB    6.67M/s 5.58M/s 5.58M/s 5.56M/s 5.57M/s 5.55M/s
1MiB    23.3M/s 24M/s   6.98M/s 7.02M/s 6.95M/s 6.89M/s
512KiB  23.6M/s 23.5M/s 6.11M/s 6.13M/s 6.07M/s 6.08M/s
256KiB  4.46M/s 4.44M/s 4.41M/s 4.41M/s 4.47M/s 4.42M/s
128KiB  4.15M/s 4.1M/s  4.12M/s 4.14M/s 3.09M/s 3.09M/s
64KiB   21.4M/s 21.9M/s 4.49M/s 4.58M/s 4.57M/s 4.52M/s
32KiB   16.2M/s 17M/s   16.4M/s 17.1M/s 17.1M/s 17M/s
^C
# ./flashbench --findfat --fat-nr=6 /dev/sde --blocksize=512
sched_setscheduler: Operation not permitted
4MiB    7.21M/s 22.8M/s 23.4M/s 22.5M/s 7.88M/s 7.88M/s
2MiB    7.24M/s 6.72M/s 5.57M/s 5.58M/s 6.68M/s 6.68M/s
1MiB    24.2M/s 24.7M/s 6.91M/s 6.88M/s 6.95M/s 7M/s
512KiB  23.6M/s 23.8M/s 6.14M/s 6.04M/s 6.9M/s  6.94M/s
256KiB  4.48M/s 4.45M/s 4.45M/s 4.45M/s 6.75M/s 6.61M/s
128KiB  4.15M/s 4.15M/s 3.09M/s 3.08M/s 4.13M/s 4.14M/s
64KiB   22.7M/s 22.3M/s 4.54M/s 4.53M/s 6.72M/s 6.59M/s
32KiB   15.5M/s 15.7M/s 16.1M/s 16.5M/s 5.93M/s 5.71M/s
^C
Seems there's a big variation in the 4M, 1M, 512K and 64K tests. Is that normal?
No, it's not normal, but it can be explained by cache effects:
The data transfer rate on the USB interface into the cache is probably 23 MB/s, but the stick can only write about 8 MB/s continuously using >32 KB blocks, so as soon as the cache is full, any further block gets much slower.
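As a rough illustration of that reasoning, here is a toy model rather than a measurement. It assumes the first part of a write is absorbed by the cache at about 23 MB/s and the rest goes out at about 8 MB/s. The function name avg_rate and the 16 MB cache size are made up for the example; the real cache size of the stick is not known.

```shell
# Toy model: the first cache_mb megabytes are absorbed at the burst rate,
# everything after that is written at the sustained rate (both in MB/s).
# The cache size passed in below is a hypothetical value, not a measured one.
avg_rate() {
    # usage: avg_rate total_mb cache_mb burst_rate sustained_rate
    awk -v total="$1" -v cache="$2" -v burst="$3" -v sustained="$4" 'BEGIN {
        if (cache > total) cache = total
        seconds = cache / burst + (total - cache) / sustained
        printf "%.2f\n", total / seconds
    }'
}

avg_rate 4 16 23 8     # a single 4 MB write that fits entirely in the cache
avg_rate 256 16 23 8   # a 256 MB write, most of which bypasses the cache
```

In this model a short write reports the interface speed, while a long one converges on the sustained speed, which is the pattern a long sequential write should expose.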
You can verify if this is the case by writing a lot of data to the stick:
# Write 256 MB using 4 MB blocks
dd if=/dev/zero of=/dev/sde bs=4M oflag=direct count=64

# Write 256 MB using 32 KB blocks
dd if=/dev/zero of=/dev/sde bs=32K oflag=direct count=8192
My guess is that in the first case, you get to around 8 MB/s, while the second one should be a bit better.
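To try other block sizes with the same total amount of data, the count has to change along with bs. A small sketch of that arithmetic follows; dd_cmd is a hypothetical helper name, and it only prints the command instead of running it, because writing to the raw device overwrites whatever is on the stick.

```shell
# Print (do not run) a dd command that writes total_mib MiB to a device
# in blocks of bs_kib KiB. dd_cmd is an illustrative helper, not part of dd.
dd_cmd() {
    # usage: dd_cmd device total_mib bs_kib
    dd_dev=$1
    dd_total_mib=$2
    dd_bs_kib=$3
    dd_count=$(( dd_total_mib * 1024 / dd_bs_kib ))
    echo "dd if=/dev/zero of=$dd_dev bs=${dd_bs_kib}K oflag=direct count=$dd_count"
}

dd_cmd /dev/sde 256 4096   # same transfer as the 4 MB case
dd_cmd /dev/sde 256 32     # same transfer as the 32 KB case
```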
This kind of caching obviously makes it harder to get good data out of the stick, but on the other hand greatly improves the performance in real-world scenarios. Most other sticks I've seen don't have it.
Another possible explanation is that the stick actually uses 12 MB erase blocks, not 4 MB (flashbench -a only checks power-of-two values). You can test this by using a multiple of three for both erase and block size in the tests:
./flashbench --findfat --fat-nr=6 /dev/sde --blocksize=3072 --erasesize=$((12 * 1024 * 1024))
If this is the case, the stick will frequently have to do garbage collection, which makes it slower than it could be.
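The arithmetic behind that suggestion can be checked quickly. This is only a sketch of the numbers, with is_pow2 being a helper name invented here: 12 MiB is not a power of two, which is why a power-of-two scan would not find it, but it is a whole multiple of the 3072-byte block size.

```shell
# Succeeds when the argument is a positive power of two.
is_pow2() {
    [ "$1" -gt 0 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

for mib in 4 12; do
    bytes=$(( mib * 1024 * 1024 ))
    if is_pow2 "$bytes"; then
        echo "$mib MiB ($bytes bytes): power of two"
    else
        echo "$mib MiB ($bytes bytes): not a power of two"
    fi
done

# Number of 3072-byte blocks in a 12 MiB erase block (divides evenly).
echo $(( 12 * 1024 * 1024 / 3072 ))
```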
That will add the two missing pieces of information for the survey, the page size and the location of the FAT. I've already entered the other data into the wiki.
And what would I read from those results? Is the 1M,512K and 64K result so much faster because the stick is FAT optimized?
No. However, the 32 KiB effect is definitely due to the FAT optimization, because that is the largest block size supported by most FAT32 implementations.
Arnd