On Thursday 02 June 2011, Michael Monnerie wrote:
On Tuesday, 31 May 2011, Arnd Bergmann wrote:
Try larger values for --open-au-nr=X. You are on the right track, the erasesize is obviously 8 MB, and the number of open AUs is most likely more than 6.
For both linear and --random, you can pass --open-au-nr=32 just for fun, to see what happens (the performance should be horrible, no need to let it run to the end in that case), then start with smaller values (8, 12, 16, 20) to see where the cutoff is. Find the largest fast one and the smallest slow one for linear and random modes.
With the --open-au tests, just use --blocksize=512 as a start. The number only defines when the test stops.
I thought it was already slow with "--open-au-nr=5 --erasesize=$[8 * 1024 * 1024] /dev/sdd --random". What is your definition of slow? For me, the first test that goes below 20MB/s would have been slow, so "--open-au-nr=5 --erasesize=$[8 * 1024 * 1024] /dev/sdd --blocksize=$[256 * 1024]". But you seem to mean something else.
Here are some results.
It takes a bit of experience to spot the difference, but it's rather clear when you compare 16 and 20 AUs:
Results with small blocksize, run by a script that prints the start time of each test, so you can see how long it ran. Unfortunately, I only noticed afterwards that I used "8*512*1024" for the erasesize, which is 4MB instead of the intended 8MB. Does that matter?
Using the wrong erase block size makes the result less clear, but it seems that here you still have the correct output:
@12:03:52 # ./flashbench --open-au --open-au-nr=1 --erasesize=$[8 * 512 * 1024] /dev/sdd --blocksize=512
4MiB    22M/s
2MiB    5.99M/s
1MiB    21.2M/s
512KiB  5.98M/s
256KiB  20.9M/s
128KiB  5.92M/s
64KiB   21.2M/s
32KiB   5.99M/s
16KiB   18.3M/s
In particular, what you can see very clearly here is that every other line takes a long time, because the card has to do garbage-collection. When you use 8 MB, that's probably not the case.
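The 4 MB versus 8 MB mix-up discussed above is easy to check before starting a long run. The following is a minimal sketch, not part of flashbench: `to_mib` is a hypothetical helper name, and the script uses the POSIX `$(( ... ))` arithmetic form, which evaluates the same expressions as the older bash-only `$[ ... ]` form used in the commands in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of flashbench): print a byte count in MiB.
to_mib() {
	echo "$(( $1 / 1024 / 1024 ))MiB"
}

wrong=$(( 8 * 512 * 1024 ))    # the expression used by mistake in the first runs
right=$(( 8 * 1024 * 1024 ))   # the intended 8 MB erase size

echo "8 * 512 * 1024  = $wrong bytes = $(to_mib "$wrong")"
echo "8 * 1024 * 1024 = $right bytes = $(to_mib "$right")"
```

Printing the computed value next to the command line makes a wrong factor visible immediately instead of hours later.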
@15:21:05 # ./flashbench --open-au --open-au-nr=16 --erasesize=$[8 * 512 * 1024] /dev/sdd --blocksize=512
4MiB    6.9M/s
2MiB    5.14M/s
1MiB    4.65M/s
512KiB  1.63M/s
256KiB  4.06M/s
128KiB  1.73M/s
64KiB   1.64M/s
32KiB   1.02M/s
16KiB   471K/s
8KiB    238K/s
4KiB    111K/s
2KiB    57.4K/s
1KiB    28.6K/s
512B    14.4K/s

@17:58:45 # ./flashbench --open-au --open-au-nr=20 --erasesize=$[8 * 512 * 1024] /dev/sdd --blocksize=512
4MiB    4.54M/s
2MiB    4.98M/s
1MiB    1.21M/s
512KiB  947K/s
256KiB  491K/s
128KiB  285K/s
64KiB   109K/s
32KiB   73.6K/s
16KiB   29.9K/s
8KiB    13.5K/s
4KiB    6.77K/s
2KiB    3.4K/s
Terminated
(was running from 1800 overnight until 1042, so I killed it)
Right, this is what I mean by slow. You can normally use --blocksize=32768 here; the only difference is that the test stops earlier. To find the exact number, I would suggest trying 17, 18 and 19 erase blocks to see which is the first slow one.
Slow here means that every line is only about half as fast as the previous one. Once the card has to garbage-collect one erase block for each write, every write takes the same time regardless of its size, so halving the block size halves the throughput. This is the effect that I described in https://lwn.net/Articles/428799/.
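One way to make the "same time per write" pattern visible is to divide each block size by its measured throughput. The sketch below does that for flashbench result lines. It is only an illustration: `per_write_time` is a hypothetical helper, not a flashbench feature, and it assumes that the K and M in the rates are multiples of 1024, which the thread does not confirm. If they are decimal units, the estimates shift by a few percent but the pattern stays the same.

```shell
#!/bin/sh
# per_write_time is a hypothetical helper, not part of flashbench.
# It reads flashbench --open-au result lines ("SIZE RATE") on stdin and
# prints "SIZE MILLISECONDS", the estimated time one write of SIZE took.
per_write_time() {
	awk '
	# "4MiB", "512KiB" or "512B" -> bytes (awk reads the leading number).
	function size_bytes(s,    n) {
		n = s + 0
		if (s ~ /MiB$/) return n * 1024 * 1024
		if (s ~ /KiB$/) return n * 1024
		return n
	}
	# "4.54M/s" or "947K/s" -> bytes per second.
	# Assumption: K and M are multiples of 1024 here.
	function rate_bytes(r,    n) {
		n = r + 0
		if (r ~ /M\/s$/) return n * 1024 * 1024
		if (r ~ /K\/s$/) return n * 1024
		return n
	}
	# Only lines whose second field is a non-zero rate are results;
	# command lines and "Terminated" are skipped.
	$2 ~ /\/s$/ && rate_bytes($2) > 0 {
		printf "%s %.0f\n", $1, 1000 * size_bytes($1) / rate_bytes($2)
	}'
}

# Usage: a few lines from the 20-AU linear run quoted above.
per_write_time <<'EOF'
512KiB 947K/s
256KiB 491K/s
128KiB 285K/s
64KiB 109K/s
EOF
```

Under these assumptions, the 20-AU linear run stays at roughly half a second per write from 512KiB all the way down to 2KiB, while the 16-AU run drops to a few tens of milliseconds per write for the small block sizes. That difference in magnitude is what separates slow from fast here.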
Now the tests with "--random" and correct 8MB erasesize:
@10:58:54 # ./flashbench --open-au --open-au-nr=1 --erasesize=$[8 * 1024 * 1024] /dev/sdd --blocksize=512 --random
8MiB    7.11M/s
4MiB    21.9M/s
2MiB    13.2M/s
1MiB    8.37M/s
512KiB  7.02M/s
256KiB  6.48M/s
128KiB  6.15M/s
64KiB   5.86M/s
32KiB   5.45M/s
16KiB   4.1M/s
8KiB    1.75M/s
4KiB    1.22M/s
2KiB    512K/s
1KiB    266K/s
512B    136K/s
Ok, this is significantly slower, so the stick does something different for random writes.
@17:33:20 # ./flashbench --open-au --open-au-nr=16 --erasesize=$[8 * 1024 * 1024] /dev/sdd --blocksize=512 --random
8MiB    22.2M/s
4MiB    10.4M/s
2MiB    3.28M/s
1MiB    3.02M/s
512KiB  2.94M/s
256KiB  2.34M/s
128KiB  2.48M/s
64KiB   1.07M/s
32KiB   911K/s
16KiB   397K/s
8KiB    240K/s
4KiB    115K/s
2KiB    58K/s
1KiB    29K/s
512B    14.5K/s

@22:46:38 # ./flashbench --open-au --open-au-nr=20 --erasesize=$[8 * 1024 * 1024] /dev/sdd --blocksize=512 --random
8MiB    22.2M/s
4MiB    6.6M/s
2MiB    2.46M/s
1MiB    1.14M/s
512KiB  540K/s
256KiB  362K/s
128KiB  200K/s
64KiB   103K/s
32KiB   52.9K/s
16KiB   27.2K/s
8KiB    13.7K/s
The cutoff with --random still seems to be at the same point, between 16 and 20. That is good.
What next?
Just try these:
for i in 16 17 18 19 20 ; do
	echo == $i ==
	./flashbench --open-au --open-au-nr=$i --erasesize=$[8 * 1024 * 1024] /dev/sdd --blocksize=32768
done
You can stop it as soon as it obviously goes "slow".
Arnd
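Once each run has been judged by eye, the "largest fast one and smallest slow one" can be read off mechanically. The sketch below assumes a hand-written list of "AU_COUNT fast" or "AU_COUNT slow" lines. Both the `find_cutoff` name and that input format are made up for illustration and are not produced by flashbench. The sample input uses only the two verdicts established in this thread.

```shell
#!/bin/sh
# find_cutoff is a hypothetical helper, not part of flashbench.
# Input on stdin: one "AU_COUNT fast" or "AU_COUNT slow" line per run,
# as judged by hand. Output: the largest fast and smallest slow count.
find_cutoff() {
	awk '
	$2 == "fast" && (!have_fast || $1 + 0 > fast) { fast = $1 + 0; have_fast = 1 }
	$2 == "slow" && (!have_slow || $1 + 0 < slow) { slow = $1 + 0; have_slow = 1 }
	END {
		printf "largest fast: %s, smallest slow: %s\n", \
			(have_fast ? fast : "none"), (have_slow ? slow : "none")
	}'
}

# Usage: the two verdicts known so far for this stick.
find_cutoff <<'EOF'
16 fast
20 slow
EOF
```

If the two numbers differ by more than one, the values in between are the ones still worth testing.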