On Monday 11 July 2011 18:16:50 Peter Warasin wrote:
hi guys
tests for a kingston 2GB micro SD card
BTW: must these tests be done on a partition which is 4MiB aligned?
Yes, the partition must be aligned to the erase block size, in this case probably 1MB. You can add something like '--offset=$[123 * 512]' to move the start of the test run 123 sectors into the drive to correct this.
If the partition is misaligned, that will improve the measurement, but the numbers here indicate that the alignment is indeed at least 1MB.
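The arithmetic behind that --offset can be sketched in a few lines of shell. This is my own helper, not part of flashbench; the sysfs path in the comment is an assumption and may differ between devices:

```shell
#!/bin/sh
# Sketch: given a partition's start sector, compute the --offset (in
# bytes) that moves the start of the flashbench run forward to the next
# erase block boundary.
align_offset() {
    start_sector=$1
    erase_block=$2
    start_bytes=$((start_sector * 512))   # SD sectors are 512 bytes
    misalign=$((start_bytes % erase_block))
    if [ "$misalign" -eq 0 ]; then
        echo 0                            # already aligned, no offset needed
    else
        echo $((erase_block - misalign))  # pad up to the next boundary
    fi
}

# The real start sector would come from something like (path is an
# assumption, adjust for your device):
#   start=$(cat /sys/block/mmcblk0/mmcblk0p3/start)
align_offset 123 $((1024 * 1024))   # a partition starting 123 sectors in
```

For a partition starting 123 sectors into the drive and a 1MB erase block, this prints 985600, i.e. you would pass `--offset=985600` to start testing on an erase block boundary.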
./flashbench -a --blocksize=1024 /dev/mmcblk0p3
align 134217728 pre 1.69ms  on 1.86ms  post 1.29ms  diff 375µs
align 67108864  pre 1.87ms  on 2.04ms  post 1.35ms  diff 432µs
align 33554432  pre 1.91ms  on 2.08ms  post 1.35ms  diff 455µs
align 16777216  pre 1.83ms  on 2ms     post 1.32ms  diff 426µs
align 8388608   pre 1.83ms  on 2ms     post 1.32ms  diff 426µs
align 4194304   pre 1.77ms  on 1.97ms  post 1.29ms  diff 441µs
align 2097152   pre 1.8ms   on 1.97ms  post 1.29ms  diff 425µs
align 1048576   pre 1.51ms  on 1.98ms  post 1.29ms  diff 579µs
align 524288    pre 1.27ms  on 1.45ms  post 1.29ms  diff 169µs
align 262144    pre 1.27ms  on 1.45ms  post 1.29ms  diff 169µs
align 131072    pre 1.27ms  on 1.45ms  post 1.29ms  diff 169µs
align 65536     pre 1.26ms  on 1.44ms  post 1.29ms  diff 164µs
align 32768     pre 1.27ms  on 1.45ms  post 1.29ms  diff 169µs
align 16384     pre 1.29ms  on 1.45ms  post 1.27ms  diff 168µs
align 8192      pre 1.29ms  on 1.46ms  post 1.29ms  diff 177µs
align 4096      pre 1.29ms  on 1.33ms  post 1.29ms  diff 40µs
align 2048      pre 1.29ms  on 1.33ms  post 1.29ms  diff 39.4µs
i assume i see the erase-block size where the pre value is smallest (??) erase-block size 64KiB? that can't possibly be, can it?
The erase block size is usually the point where the last column has the largest drop, in this case between 1MB and 512KB, so the erase block is 1MB.
The page size is probably 8KB according to this, i.e. the other drop.
What the numbers show is:

* It takes around 1.29ms to read 1KB within a page
* It takes around 1.46ms to read 1KB across a page boundary
* It takes around 1.97ms to read 1KB across an erase block boundary
Quite typical for a 2GB card or smaller.
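Reading the table mechanically: normalize the diff column to µs, then look for the largest absolute drop between adjacent lines. A sketch of that over a few lines of the output above (the awk script is mine, not part of flashbench; normally you would pipe the full flashbench -a output into it):

```shell
# Sketch: report the erase block size as the alignment just above the
# largest drop in the "diff" column of flashbench -a output.
awk '{
    v = $NF + 0                       # numeric part of the diff column
    if ($NF ~ /ms/) v *= 1000         # normalize ms to µs
    if (NR > 1 && prev - v > best) {  # biggest drop going down the table
        best = prev - v
        size = prevsize               # the alignment above the drop
    }
    prev = v; prevsize = $2
} END { print "erase block is likely", size, "bytes" }' <<'EOF'
align 2097152 pre 1.8ms on 1.97ms post 1.29ms diff 425µs
align 1048576 pre 1.51ms on 1.98ms post 1.29ms diff 579µs
align 524288 pre 1.27ms on 1.45ms post 1.29ms diff 169µs
align 8192 pre 1.29ms on 1.46ms post 1.29ms diff 177µs
align 4096 pre 1.29ms on 1.33ms post 1.29ms diff 40µs
EOF
```

On this data it reports 1048576 bytes: the 579µs to 169µs drop (410µs) is larger than the page-boundary drop from 177µs to 40µs (137µs).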
$ ./flashbench --open-au --open-au-nr=1 /dev/mmcblk0p3
4MiB   5.94M/s
2MiB   4.21M/s
1MiB   4.26M/s
512KiB 3.27M/s
256KiB 4.27M/s
128KiB 5.91M/s
64KiB  10.3M/s
32KiB  8.76M/s
16KiB  7.06M/s

$ ./flashbench --open-au --open-au-nr=2 /dev/mmcblk0p3
4MiB   9.87M/s
2MiB   10.1M/s
1MiB   4.54M/s
512KiB 1.97M/s
256KiB 945K/s
128KiB 464K/s
64KiB  213K/s
32KiB  112K/s
16KiB  55.4K/s

$ ./flashbench --open-au --open-au-nr=3 /dev/mmcblk0p3
4MiB   8.18M/s
2MiB   9.97M/s
1MiB   4.42M/s
512KiB 1.96M/s
256KiB 942K/s
128KiB 463K/s
64KiB  218K/s
32KiB  111K/s
16KiB  55.5K/s
that means the card can handle only 1 open erase block at once, does it? because performance is constant only with 1 block, and a 4MiB erase block should be fine (?)
Yes, this is the main problem with Kingston cards. You can see the effect all over the table in https://wiki.linaro.org/WorkingGroups/Kernel/Projects/FlashCardSurvey for every Kingston SD card.
./flashbench --open-au --open-au-nr=1 /dev/mmcblk0p3 --erasesize=$[1 * 1024 * 1024]
1MiB   2.72M/s
512KiB 1.97M/s
256KiB 1.98M/s
128KiB 1.97M/s
64KiB  4.25M/s
32KiB  4.03M/s
16KiB  3.6M/s

$ ./flashbench --open-au --open-au-nr=1 /dev/mmcblk0p3 --erasesize=$[2 * 1024 * 1024]
2MiB   9.97M/s
1MiB   2.26M/s
512KiB 1.97M/s
256KiB 2.71M/s
128KiB 4.23M/s
64KiB  1.78M/s
32KiB  5.19M/s
16KiB  7.08M/s

./flashbench --open-au --open-au-nr=1 /dev/mmcblk0p3 --erasesize=$[4 * 1024 * 1024]
4MiB   6.03M/s
2MiB   4.27M/s
1MiB   3.26M/s
512KiB 3.32M/s
256KiB 3.33M/s
128KiB 6M/s
64KiB  10.3M/s
32KiB  8.96M/s
16KiB  7.05M/s
so, 2MiB is better, is it?
I think this has more to do with the alignment of the first erase block. If the partition start is not aligned to a full multiple of the erase block size, larger blocks are simply faster because you actually end up writing full erase blocks, while writing 1MB across two partial erase blocks is rather slow.
hmm, does that turbo-boost at 64KiB mean that's the correct page size?
No, you can see this effect for very many media from different vendors. I have two possible explanations for this:
1. The cards optimize for the behaviour of Microsoft Windows, which is known to normally write exactly 64KB at once.
2. It's an artifact of the way that the Linux block layer works.
Possibly it's also a combination of these two.
tests with --random:
$ ./flashbench --open-au --open-au-nr=1 /dev/mmcblk0p3 --erasesize=$[2 * 1024 * 1024] --random
2MiB   4.27M/s
1MiB   1.76M/s
512KiB 998K/s
256KiB 633K/s
128KiB 419K/s
64KiB  325K/s
32KiB  177K/s
16KiB  86.1K/s
$ ./flashbench --open-au --open-au-nr=1 /dev/mmcblk0p3 --erasesize=$[4 * 1024 * 1024] --random
4MiB   5.99M/s
2MiB   2.01M/s
1MiB   2.92M/s
512KiB 1.2M/s
256KiB 605K/s
128KiB 318K/s
64KiB  241K/s
32KiB  129K/s
16KiB  65.2K/s
./flashbench --open-au --open-au-nr=1 /dev/mmcblk0p3 --erasesize=$[1 * 1024 * 1024] --random
1MiB   2.75M/s
512KiB 804K/s
256KiB 984K/s
128KiB 377K/s
64KiB  265K/s
32KiB  186K/s
16KiB  78.8K/s
This mainly shows how much Kingston cards suck at random I/O: you cannot even write a single erase block efficiently when the writes within it are random. Once you see this behaviour, doing more --random tests does not help you any further.
./flashbench --open-au-nr=1 --random /dev/mmcblk0p3 -O --erasesize=$[1536 * 1024] --blocksize=$[96 * 1024]
1.5MiB 3.57M/s
768KiB 1.02M/s
384KiB 1.53M/s
192KiB 546K/s
96KiB  311K/s

./flashbench --open-au-nr=1 --random /dev/mmcblk0p3 -O --erasesize=$[3072 * 1024] --blocksize=$[96 * 1024]
3MiB   5.34M/s
1.5MiB 2.29M/s
768KiB 2.1M/s
384KiB 888K/s
192KiB 470K/s
96KiB  233K/s

./flashbench --open-au-nr=1 --random /dev/mmcblk0p3 -O --erasesize=$[6144 * 1024] --blocksize=$[96 * 1024]
6MiB   6.97M/s
3MiB   2.9M/s
1.5MiB 2.61M/s
768KiB 1.49M/s
384KiB 881K/s
192KiB 442K/s
96KiB  216K/s

./flashbench --open-au-nr=1 --random /dev/mmcblk0p3 -O --erasesize=$[8192 * 1024] --blocksize=$[64 * 1024]
8MiB   7.63M/s
4MiB   3.33M/s
2MiB   3.71M/s
1MiB   2.58M/s
512KiB 1.19M/s
256KiB 604K/s
128KiB 282K/s
64KiB  242K/s
hope these numbers help somehow.. i'm lost in numbers now :)
The only thing you can tell from these numbers is that a 1MB erase block size is likely correct: below 1MB, performance is halved with every halving of the block size, while above 1MB it stays fairly constant.
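That halving is easy to verify from the --open-au-nr=2 run above. The one-liner below (my own check, not part of flashbench) prints the ratio between adjacent throughput figures, converted to KB/s; a ratio near 2 at every step means the time per write stays constant below the erase block size:

```shell
# Sketch: ratio between adjacent throughput figures from the
# --open-au-nr=2 run above (first column is throughput in KB/s,
# second is the write size it belongs to).
awk 'NR > 1 { printf "%s -> %.2fx\n", $2, prev / $1 } { prev = $1 }' <<'EOF'
4540 1MiB
1970 512KiB
945 256KiB
464 128KiB
213 64KiB
112 32KiB
55.4 16KiB
EOF
```

Every step comes out between roughly 1.9x and 2.3x, consistent with a fixed garbage-collection cost per write below the 1MB boundary.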
Arnd