On Monday, 28 March 2011 Arnd Bergmann wrote:
On Monday 28 March 2011, Michael Monnerie wrote:
# ./flashbench --open-au --random --open-au-nr=5 --erasesize=$[3 * 1024 * 1024] /dev/sde --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
sched_setscheduler: Operation not permitted
3MiB    17.2M/s
1.5MiB  6.3M/s
768KiB  2.53M/s
384KiB  1.17M/s
192KiB  568K/s
96KiB   283K/s
48KiB   140K/s
24KiB   69.5K/s
D'oh. I thought your stick can do 5 chunks, but looking at the earlier results, it can only do 4. Thankfully, you've tested that as well.
# nice --20 ./flashbench --open-au --random --open-au-nr=4 --erasesize=$[12 * 1024 * 1024] /dev/sde --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
sched_setscheduler: Operation not permitted
12MiB   22.8M/s
6MiB    15.7M/s
3MiB    7.76M/s
1.5MiB  4.45M/s
768KiB  2.46M/s
384KiB  1.13M/s
192KiB  562K/s
96KiB   284K/s
48KiB   142K/s
24KiB   70.6K/s
The goal is to find the minimum value for --erasesize= and the maximum value for --open-au-nr= that can give the full bandwidth of 20+MB/s, which I'm guessing will be 12 MB and 5, but it would be helpful to know for sure.
OK, seems au=4 and erasesize=12M is the best.
No, unfortunately my prediction was wrong. What I meant was that we should look for the case where it does not get slower in the later rows, i.e. the 96 KiB row should be closer to 25 MB/s, where here it is only 284K/s -- clearly not "fast".
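As a rough illustration of that criterion, one could save the flashbench output and pick out a single row to compare against a target rate. The sketch below is only that: `row_rate` and `is_fast` are hypothetical helper names (not flashbench features), the 20000 K/s threshold is an assumed stand-in for "20+MB/s", and "M/s" is assumed to mean 1024 K/s.

```shell
#!/bin/sh
# Sketch only: row_rate and is_fast are hypothetical helpers, not part of flashbench.

# Read "SIZE RATE" rows on stdin and print the rate of one row in whole K/s.
# Assumption: "M/s" is treated as 1024 K/s; only the comparison matters here.
row_rate() {
    awk -v want="$1" '$1 == want {
        v = $2 + 0                      # numeric prefix, e.g. 22.8 from "22.8M/s"
        if ($2 ~ /M\/s$/) v *= 1024
        printf "%d\n", v
    }'
}

# Succeed when the row for SIZE ($1) reaches at least MIN_K ($2) K/s.
is_fast() {
    rate=$(row_rate "$1")
    [ -n "$rate" ] && [ "$rate" -ge "$2" ]
}

# Two rows copied from the --open-au-nr=4 run above.
run='12MiB 22.8M/s
96KiB 284K/s'

printf '%s\n' "$run" | row_rate 12MiB
printf '%s\n' "$run" | row_rate 96KiB
if printf '%s\n' "$run" | is_fast 96KiB 20000; then
    echo "96KiB row is fast"
else
    echo "96KiB row is slow"
fi
```

For the run quoted above this reports the 96KiB row as slow, since 284 K/s is nowhere near the rate of the 12MiB row.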
I'm pretty sure that you will see the fast case with
# one 12-MB chunk
./flashbench --open-au --open-au-nr=1 --erasesize=$[12 * 1024 * 1024] \
        /dev/sde --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
and probably also for
# one 12-MB chunk in randomized order
./flashbench --open-au --random --open-au-nr=1 --erasesize=$[12 * 1024 * 1024] \
        /dev/sde --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
but apparently not for
# two 12-MB chunks
./flashbench --open-au --random --open-au-nr=2 --erasesize=$[12 * 1024 * 1024] \
        /dev/sde --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
12MiB   9.28M/s
6MiB    11.7M/s
3MiB    10.4M/s
Plus, you've shown earlier that the card can do four 4MB-chunks with
# four 4-MB chunks
./flashbench -O --erasesize=$[4 * 1024 * 1024] /dev/sde --open-au-nr=4
4MiB    7.6M/s
2MiB    6.75M/s
1MiB    6.32M/s
512KiB  8.42M/s
...
(this was slower than 10 MB/s, but did not degrade with smaller blocks)
OK, here are the latest results:

# ./flashbench --open-au --open-au-nr=1 --erasesize=$[12 * 1024 * 1024] /dev/sdd --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
sched_setscheduler: Operation not permitted
12MiB   23.9M/s
6MiB    7.71M/s
3MiB    13.1M/s
1.5MiB  13.3M/s
768KiB  13.4M/s
384KiB  10.2M/s
192KiB  20.3M/s
96KiB   24.6M/s
48KiB   19.4M/s
24KiB   7.53M/s

# ./flashbench --open-au --random --open-au-nr=1 --erasesize=$[12 * 1024 * 1024] /dev/sdd --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
sched_setscheduler: Operation not permitted
12MiB   24M/s
6MiB    10.2M/s
3MiB    16.1M/s
1.5MiB  5.89M/s
768KiB  4.49M/s
384KiB  4.45M/s
192KiB  4.36M/s
96KiB   4.06M/s
48KiB   3.44M/s
24KiB   2.84M/s

# ./flashbench --open-au --random --open-au-nr=2 --erasesize=$[12 * 1024 * 1024] /dev/sdd --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
sched_setscheduler: Operation not permitted
12MiB   9.93M/s
6MiB    13.8M/s
3MiB    10.4M/s
1.5MiB  5.34M/s
768KiB  3M/s
384KiB  2.04M/s
192KiB  1.37M/s
96KiB   666K/s
^C
Are you still motivated?
Yeah, performance tuning is always fun.
I'd suggest starting out with one erase block (--open-au-nr=1) and without --random, trying all possible --erasesize values from $[3 * 512 * 1024] to $[12 * 1024 * 1024], to see if one or more get you the best-case performance. Out of those that are fast, try increasing the number of erase blocks until you hit the limit, and/or add --random.
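A sweep like that could be scripted roughly as follows. This is a minimal sketch under assumptions: `sweep_cmds` is a hypothetical helper, `/dev/sdX` is a placeholder device, and the candidate list (1.5, 3, 6 and 12 MiB) is one possible reading of "all possible values" in that range. It only prints the commands, using the flags already seen in this thread, so nothing is written to a device until you run them yourself.

```shell
#!/bin/sh
# Sketch only: prints (does not run) one flashbench command per erase size.
# sweep_cmds is a hypothetical helper; the candidate list is an assumption.

sweep_cmds() {
    dev=$1
    blocksize=$((24 * 1024))
    offset=$((24 * 1024 * 1024))
    # Candidate erase sizes in KiB: 1.5 MiB, 3 MiB, 6 MiB, 12 MiB.
    for kib in 1536 3072 6144 12288; do
        erasesize=$((kib * 1024))
        echo "./flashbench --open-au --open-au-nr=1 --erasesize=$erasesize $dev --blocksize=$blocksize --offset=$offset"
    done
}

# /dev/sdX is a placeholder; substitute the real device before running the output.
sweep_cmds /dev/sdX
```

Once one erase size stands out, the same loop can be repeated over --open-au-nr values, with or without --random.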
OK, I scripted this, and removed those which aren't possible:
# ./flashbench --open-au --open-au-nr=1 --erasesize=$[3 * 512 * 1024] /dev/sdd --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
sched_setscheduler: Operation not permitted
1.5MiB  6M/s
768KiB  16.8M/s
384KiB  3.17M/s
192KiB  4.97M/s
96KiB   19.4M/s
48KiB   3.43M/s
24KiB   5.99M/s

# ./flashbench --open-au --open-au-nr=1 --erasesize=$[3 * 1024 * 1024] /dev/sdd --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
sched_setscheduler: Operation not permitted
3MiB    5.9M/s
1.5MiB  5.94M/s
768KiB  5.81M/s
384KiB  15.6M/s
192KiB  5.62M/s
96KiB   5.88M/s
48KiB   4.15M/s
24KiB   3.22M/s

# ./flashbench --open-au --open-au-nr=1 --erasesize=$[6 * 512 * 1024] /dev/sdd --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
sched_setscheduler: Operation not permitted
3MiB    23.9M/s
1.5MiB  12M/s
768KiB  5.87M/s
384KiB  4.87M/s
192KiB  10.7M/s
96KiB   5.74M/s
48KiB   9.82M/s
24KiB   3.71M/s

# ./flashbench --open-au --open-au-nr=1 --erasesize=$[6 * 1024 * 1024] /dev/sdd --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
sched_setscheduler: Operation not permitted
6MiB    4.4M/s
3MiB    14.6M/s
1.5MiB  14.6M/s
768KiB  14.7M/s
384KiB  13.8M/s
192KiB  12.9M/s
96KiB   14.9M/s
48KiB   12.1M/s
24KiB   6.41M/s

# ./flashbench --open-au --open-au-nr=1 --erasesize=$[12 * 512 * 1024] /dev/sdd --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
sched_setscheduler: Operation not permitted
6MiB    14.4M/s
3MiB    14.4M/s
1.5MiB  14.7M/s
768KiB  14.6M/s
384KiB  13.6M/s
192KiB  12.9M/s
96KiB   14.9M/s
48KiB   12M/s
24KiB   6.35M/s

# ./flashbench --open-au --open-au-nr=1 --erasesize=$[12 * 1024 * 1024] /dev/sdd --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
sched_setscheduler: Operation not permitted
12MiB   18.1M/s
6MiB    23.6M/s
3MiB    23.8M/s
1.5MiB  24.2M/s
768KiB  24.6M/s
384KiB  22.2M/s
192KiB  20.8M/s
96KiB   24.9M/s
48KiB   19.4M/s
24KiB   7.52M/s
So this last result seems like the clear winner. Testing with it:
# ./flashbench --open-au --open-au-nr=1 --erasesize=$[12 * 1024 * 1024] /dev/sdd --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
sched_setscheduler: Operation not permitted
12MiB   23.9M/s
6MiB    9.87M/s
3MiB    13M/s
1.5MiB  13.4M/s
768KiB  13.5M/s
384KiB  10.4M/s
192KiB  20.5M/s
96KiB   25.1M/s
48KiB   20M/s
24KiB   7.73M/s

# ./flashbench --open-au --open-au-nr=2 --erasesize=$[12 * 1024 * 1024] /dev/sdd --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
sched_setscheduler: Operation not permitted
12MiB   12.9M/s
6MiB    23.9M/s
3MiB    24M/s
1.5MiB  24.4M/s
768KiB  25.1M/s
384KiB  21.8M/s
192KiB  20.3M/s
96KiB   24.7M/s
48KiB   19.2M/s
24KiB   7.71M/s

# ./flashbench --open-au --open-au-nr=3 --erasesize=$[12 * 1024 * 1024] /dev/sdd --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
sched_setscheduler: Operation not permitted
12MiB   23.9M/s
6MiB    23.9M/s
3MiB    24.1M/s
1.5MiB  24.5M/s
768KiB  24.6M/s
384KiB  21.8M/s
192KiB  20.4M/s
96KiB   25.8M/s
48KiB   19.7M/s
24KiB   7.77M/s

# ./flashbench --open-au --open-au-nr=4 --erasesize=$[12 * 1024 * 1024] /dev/sdd --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
sched_setscheduler: Operation not permitted
12MiB   24.1M/s
6MiB    18M/s
3MiB    9.08M/s
1.5MiB  4.55M/s
768KiB  2.28M/s
^C
Seems au-nr=3 is quite good. I also retested with the 6 MiB erase size ($[12 * 512 * 1024]):
# ./flashbench --open-au --open-au-nr=1 --erasesize=$[12 * 512 * 1024] /dev/sdd --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
sched_setscheduler: Operation not permitted
6MiB    15.9M/s
3MiB    11.1M/s
1.5MiB  14.6M/s
768KiB  14.7M/s
384KiB  13.8M/s
192KiB  13.3M/s
96KiB   15.1M/s
48KiB   13.1M/s
24KiB   6.54M/s

# ./flashbench --open-au --open-au-nr=2 --erasesize=$[12 * 512 * 1024] /dev/sdd --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
sched_setscheduler: Operation not permitted
6MiB    15.2M/s
3MiB    14.4M/s
1.5MiB  14.5M/s
768KiB  14.5M/s
384KiB  13.7M/s
192KiB  13.2M/s
96KiB   14.7M/s
48KiB   12.4M/s
24KiB   6.31M/s

# ./flashbench --open-au --open-au-nr=3 --erasesize=$[12 * 512 * 1024] /dev/sdd --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
sched_setscheduler: Operation not permitted
6MiB    16.6M/s
3MiB    14.4M/s
1.5MiB  14.6M/s
768KiB  14.7M/s
384KiB  13.7M/s
192KiB  13.2M/s
96KiB   15.3M/s
48KiB   12.6M/s
24KiB   6.22M/s

# ./flashbench --open-au --open-au-nr=4 --erasesize=$[12 * 512 * 1024] /dev/sdd --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
sched_setscheduler: Operation not permitted
6MiB    14.4M/s
3MiB    8.05M/s
1.5MiB  4.28M/s
768KiB  2.2M/s
384KiB  1.11M/s
^C
This is not so good.
I'm sorry that this is all so complicated, but I don't have a stick with this behaviour myself.
# ./flashbench --open-au --open-au-nr=3 --erasesize=$[12 * 1024 * 1024] /dev/sdd --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
So that is the best result. Funny, we're at au-nr=3 now. This stick really likes a threesome ;-)
And what does all this mean now? Is there any FAT/NTFS/other FS layout I could optimize it with? Does the 12MB partition start still hold? I guess so.
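For the partition start, the arithmetic can be checked with a few lines of shell. This is a sketch under the assumption of 512-byte logical sectors; `start_sector` and `is_aligned` are hypothetical helper names, and the 12 MiB figure is the erase size that came out best in the runs above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; assumes 512-byte logical sectors.

# First sector of a partition that starts SIZE_MIB ($1) into the device.
start_sector() {
    echo $(( $1 * 1024 * 1024 / 512 ))
}

# Succeed when START_SECTOR ($1) lies on a multiple of SIZE_MIB ($2).
is_aligned() {
    [ $(( $1 * 512 % ($2 * 1024 * 1024) )) -eq 0 ]
}

start_sector 12
if is_aligned 24576 12; then
    echo "sector 24576 is 12 MiB aligned"
fi
if ! is_aligned 2048 12; then
    echo "sector 2048 (1 MiB) is not 12 MiB aligned"
fi
```

So a partition starting at sector 24576 sits exactly on the first 12 MiB boundary, while the common 1 MiB default (sector 2048) does not.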
I guess with XFS I would use the mount options
    logbufs=8,logbsize=128k,largeio,delaylog
and make the filesystem with
    mkfs.xfs -s size=4k -b size=4k -d su=96k,sw=1
For a lot of options I can only use power-of-2 values, so I benchmarked with this:
# ./flashbench --open-au --open-au-nr=3 --erasesize=$[16 * 1024 * 1024] /dev/sdd --blocksize=$[32 * 1024] --offset=$[32 * 1024 * 1024]
sched_setscheduler: Operation not permitted
16MiB   24M/s
8MiB    23.9M/s
4MiB    10.2M/s
2MiB    5.62M/s
1MiB    13.4M/s
512KiB  12.2M/s
256KiB  4.48M/s
128KiB  3.26M/s
64KiB   9.93M/s
32KiB   16.9M/s
Not too bad at the 32K size. I wanted to format NTFS with 64k clusters, but from these results I'd say 32K would be better. Or can't I tell that from these results?