On Monday 28 March 2011, Michael Monnerie wrote:
> # ./flashbench --open-au --random --open-au-nr=5 --erasesize=$[3 * 1024 * 1024] \
>         /dev/sde --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
> sched_setscheduler: Operation not permitted
> 3MiB    17.2M/s
> 1.5MiB  6.3M/s
> 768KiB  2.53M/s
> 384KiB  1.17M/s
> 192KiB  568K/s
> 96KiB   283K/s
> 48KiB   140K/s
> 24KiB   69.5K/s
D'oh. I thought your stick can do 5 chunks, but looking at the earlier results, it can only do 4. Thankfully, you've tested that as well.
> # nice --20 ./flashbench --open-au --random --open-au-nr=4 --erasesize=$[12 * 1024 * 1024] \
>         /dev/sde --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
> sched_setscheduler: Operation not permitted
> 12MiB   22.8M/s
> 6MiB    15.7M/s
> 3MiB    7.76M/s
> 1.5MiB  4.45M/s
> 768KiB  2.46M/s
> 384KiB  1.13M/s
> 192KiB  562K/s
> 96KiB   284K/s
> 48KiB   142K/s
> 24KiB   70.6K/s
> > The goal is to find the minimum value for --erasesize= and the maximum value for --open-au-nr= that can give the full bandwidth of 20+MB/s, which I'm guessing will be 12 MB and 5, but it would be helpful to know for sure.
> OK, seems au=4 and erasesize=12M is the best.
No, unfortunately my prediction was wrong. What I meant is that we should look for the case where the speed does not drop off in the later rows, i.e. the 96 KiB row should be closer to 25 MB/s, whereas here it is only 284 K/s -- clearly not "fast".
I'm pretty sure that you will see the fast case with
# one 12-MB chunk
./flashbench --open-au --open-au-nr=1 --erasesize=$[12 * 1024 * 1024] \
        /dev/sde --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
and probably also for
# one 12-MB chunk in randomized order
./flashbench --open-au --random --open-au-nr=1 --erasesize=$[12 * 1024 * 1024] \
        /dev/sde --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
but apparently not for
# two 12-MB chunks
./flashbench --open-au --random --open-au-nr=2 --erasesize=$[12 * 1024 * 1024] \
        /dev/sde --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
12MiB   9.28M/s
6MiB    11.7M/s
3MiB    10.4M/s
Plus, you've shown earlier that the card can do four 4-MB chunks with
# four 4-MB chunks
./flashbench -O --erasesize=$[4 * 1024 * 1024] /dev/sde --open-au-nr=4
4MiB    7.6M/s
2MiB    6.75M/s
1MiB    6.32M/s
512KiB  8.42M/s
...
(this was slower than 10 MB/s, but did not degrade with smaller blocks)
For your information, in the 4x4MB case, the respective blocks are
|0          |16         |32         |48         |64         |72
|  |  |  |  |XX|  |  |  |XX|  |  |  |XX|  |  |  |XX|  |  |  |  |  |  |
While 2x12MB is
|0          |16         |32         |48         |64         |80
|  |  |  |  |  |  |XX|XX|XX|  |  |  |  |  |  |  |  |  |XX|XX|XX|  |  |
The 4x6MB test is
|0          |16         |32         |48         |64         |80
|  |  |  |  |  |  |XX|X |  |  |  |  |XX|X |  |  |  |  |XX|X |  |  |  |  |XX|X
I think it would still be good to have more data points, trying to get the stick into the 20-25MB/s range with multiple erase blocks open, either in random or linear mode. If this stick uses a form of log-structured writes, the linear numbers will be better than the random ones, as long as you use the correct erasesize.
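To make the linear-vs-random comparison concrete, a small sketch (not from the original test runs; /dev/sde and the 12 MiB erase size are assumptions, adjust them for your stick) could run the same geometry twice and let you compare the two result tables side by side:

```shell
#!/bin/sh
# Sketch: run one geometry linearly, then in randomized write order.
# DEV and ERASE are assumptions taken from the discussion above;
# flashbench needs root access and a stick whose data you can lose.
DEV=/dev/sde
ERASE=$((12 * 1024 * 1024))

for mode in linear random; do
    opt=
    [ "$mode" = random ] && opt=--random
    echo "=== $mode, erasesize=$ERASE ==="
    ./flashbench --open-au $opt --open-au-nr=1 --erasesize=$ERASE \
        "$DEV" --blocksize=$((24 * 1024)) --offset=$((24 * 1024 * 1024))
done
```

If the stick does log-structured writes, the linear pass should stay fast down to the small block sizes while the random pass degrades.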
Are you still motivated?
I'd suggest starting out with one erase block (--open-au-nr=1) and without --random, trying all possible --erasesize values from $[3 * 512 * 1024] to $[12 * 1024 * 1024], to see if one or more get you the best-case performance. Out of those that are fast, try increasing the number of erase blocks until you hit the limit, and/or add --random.
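That sweep could be scripted roughly like this (a sketch, not from the original mail: /dev/sde and the candidate list of sizes between 3*512 KiB and 12 MiB are assumptions, so extend the list as needed):

```shell
#!/bin/sh
# Sketch: sweep candidate --erasesize values with a single open erase
# block and linear writes. The device path and the candidate sizes
# (1.5, 2, 3, 4, 6, 8 and 12 MiB) are assumptions for illustration.
DEV=/dev/sde

for kib in 1536 2048 3072 4096 6144 8192 12288; do
    echo "=== erasesize=$((kib * 1024)) bytes ($kib KiB) ==="
    ./flashbench --open-au --open-au-nr=1 --erasesize=$((kib * 1024)) \
        "$DEV" --blocksize=$((24 * 1024)) --offset=$((24 * 1024 * 1024))
done
```

Whichever sizes stay fast down to the small rows are the ones worth retrying with a higher --open-au-nr and/or --random.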
I'm sorry that this is all so complicated, but I don't have a stick with this behaviour myself.
Arnd