On Sunday 27 March 2011 20:54:58 Michael Monnerie wrote:
On Sunday, 27 March 2011 Arnd Bergmann wrote:
# Write 256 MB using 4 MB blocks
dd if=/dev/zero of=/dev/sde bs=4M oflag=direct count=64
# Write 256 MB using 32 KB blocks
dd if=/dev/zero of=/dev/sde1 bs=32K oflag=direct count=8192
My guess is that in the first case, you get to around 8 MB/s, while the second one should be a bit better.
"A bit" is an understatement:

# dd if=/dev/zero of=/dev/sde1 bs=4M oflag=direct count=64
64+0 records in
64+0 records out
268435456 bytes (268 MB) copied, 33.054 s, 8.1 MB/s
# dd if=/dev/zero of=/dev/sde1 bs=32K oflag=direct count=8192
8192+0 records in
8192+0 records out
268435456 bytes (268 MB) copied, 17.7888 s, 15.1 MB/s
Ok, I see. The big difference is an indication that the theory about the cache was really wrong, and it's actually using the non-power-of-two segments.
./flashbench --findfat --fat-nr=6 /dev/sde --blocksize=$[3072] --erasesize=$[12 * 1024 * 1024]
If this is the case, the stick will frequently have to do garbage collection, which makes it slower than it could be.
Why would it be so? You mean because nobody actually writes in 12MB chunks?
When flashbench assumes that all erase blocks are power-of-two aligned while they really are not, it sometimes writes into two erase blocks at once. This forces the stick to garbage-collect both erase blocks the next time the user writes to another one.
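A quick arithmetic sketch of that effect, under the assumption of a 12 MiB true erase size: a write that is only 4 MiB-aligned can straddle an erase-block boundary and thus touch two erase blocks at once.

```shell
# Sketch: which erase blocks does a write touch, assuming the true erase
# size is 12 MiB but the write is only aligned to 4 MiB?
EB=$(( 12 * 1024 * 1024 ))      # assumed true erase-block size
OFF=$(( 8 * 1024 * 1024 ))      # write start: 4 MiB-aligned, not 12 MiB-aligned
LEN=$(( 8 * 1024 * 1024 ))      # write length
first=$(( OFF / EB ))
last=$(( (OFF + LEN - 1) / EB ))
echo "write touches erase blocks $first..$last"   # -> 0..1, i.e. two blocks
```

Both erase blocks then become candidates for garbage collection, which is where the slowdown comes from.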
It seems like writing in at least 768KiB chunks gives best performance, which are 24x 32KB. Maybe there are 3x8bit chips on it?
Actually, the most likely explanation is that there are just one or two chips, but those use 3-bit MLC NAND (also called TLC). In this flash memory, each transistor can store three bits. Since most flash chips have a power-of-two number of usable transistors per erase block, you get erase blocks of 3*2^n bytes.
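As a pure-arithmetic illustration of that 3*2^n pattern (no device needed), the plausible erase-block sizes can simply be enumerated; they match the 768 KiB / 1.5 MiB / 3 MiB / 6 MiB / 12 MiB rows seen in the flashbench output:

```shell
# Enumerate candidate TLC erase-block sizes of the form 3*2^n KiB.
for n in 8 9 10 11 12; do
    echo "$(( 3 * (1 << n) )) KiB"
done
# -> 768 KiB, 1536 KiB, 3072 KiB, 6144 KiB, 12288 KiB
```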
See http://www.centon.com/flash-products/chiptype for an explanation.
# ./flashbench --findfat --fat-nr=6 /dev/sde --blocksize=$[3072] --erasesize=$[12 * 1024 * 1024]
sched_setscheduler: Operation not permitted
12MiB  24M/s   23.6M/s 23.5M/s 23.7M/s 23.5M/s 23.6M/s
6MiB   23.4M/s 23.4M/s 23.5M/s 23.6M/s 23.2M/s 23.2M/s
3MiB   23.4M/s 23.4M/s 23.3M/s 23.4M/s 23.4M/s 23.4M/s
1.5MiB 24M/s   23.8M/s 23.6M/s 23.8M/s 24.1M/s 23.8M/s
768KiB 24.1M/s 24.2M/s 24.6M/s 24.7M/s 24M/s   24M/s
384KiB 21.6M/s 21.7M/s 21.7M/s 21.6M/s 22.1M/s 22.3M/s
192KiB 20.5M/s 20.4M/s 20.2M/s 20.6M/s 20.9M/s 20.9M/s
96KiB  25.8M/s 26.4M/s 26.2M/s 25.9M/s 25.7M/s 26.3M/s
48KiB  20.5M/s 21M/s   19.9M/s 19.8M/s 20M/s   20.6M/s
24KiB  7.85M/s 7.68M/s 8.06M/s 7.57M/s 7.62M/s 7.56M/s
12KiB  2.04M/s 2.16M/s 2.15M/s 2.17M/s 2.16M/s 2.15M/s
6KiB   1.19M/s 1.17M/s 1.14M/s 1.13M/s 1.16M/s 1.17M/s
3KiB   643K/s  642K/s  625K/s  634K/s  653K/s  650K/s
Ok, very nice. I consider this a proof that what I explained is actually what's going on here. So we know that the underlying erase block size is really not 4 MB but rather 12 MB or a smaller value of 3*2^n.
Note that the number for 96 KB is actually higher than the one for 768 KB that you pointed out. We already know that 32 KB is much faster than 16 KB here, and 96 KB is the lowest value of the form 3*2^n KB that is a multiple of 32 KB.
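That last claim is easy to check mechanically (pure arithmetic, no device involved):

```shell
# Find the smallest value of the form 3*2^n (in KB) that is a multiple of 32 KB.
n=0
while [ $(( (3 * (1 << n)) % 32 )) -ne 0 ]; do
    n=$(( n + 1 ))
done
echo "$(( 3 * (1 << n) )) KB"   # -> prints "96 KB" (n=5)
```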
I've seen a similar behaviour on another drive, but did not think it was common enough to handle it in flashbench yet. I am now updating the tool to work with this case.
So I re-tested with dd:

# dd if=/dev/zero of=/dev/sde1 bs=768K oflag=direct count=341
320+0 records in
320+0 records out
251658240 bytes (252 MB) copied, 11.4378 s, 22.0 MB/s
# dd if=/dev/zero of=/dev/sde1 bs=32K oflag=direct count=8192
8192+0 records in
8192+0 records out
268435456 bytes (268 MB) copied, 16.545 s, 16.2 MB/s
# dd if=/dev/zero of=/dev/sde1 bs=768K oflag=direct count=346
346+0 records in
346+0 records out
272105472 bytes (272 MB) copied, 11.1504 s, 24.4 MB/s
Yes, writing 768KiB chunks seems the best idea.
Ok. 96 KB shouldn't be much slower than this, but there is a little additional overhead from the extra write accesses when you try that.
Given the new data, I'd like to ask you to do one more test run of the --open-au test, if you still have the energy.
Please run
flashbench --open-au --random --open-au-nr=5 --erasesize=$[3 * 1024 * 1024] \
    /dev/sde --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
flashbench --open-au --random --open-au-nr=5 --erasesize=$[6 * 1024 * 1024] \
    /dev/sde --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
flashbench --open-au --random --open-au-nr=5 --erasesize=$[12 * 1024 * 1024] \
    /dev/sde --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
flashbench --open-au --random --open-au-nr=6 --erasesize=$[3 * 1024 * 1024] \
    /dev/sde --blocksize=$[24 * 1024] --offset=$[24 * 1024 * 1024]
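The four invocations above are one corner of a sweep over (--erasesize, --open-au-nr). A sketch of that sweep, which only echoes the commands so it is safe to run without the device attached:

```shell
# Illustration only: print the flashbench parameter combinations to probe
# (erase size in MiB, number of open allocation units).
for combo in "3 5" "6 5" "12 5" "3 6"; do
    es_mib=${combo% *}
    nr=${combo#* }
    echo "flashbench --open-au --random --open-au-nr=$nr" \
         "--erasesize=$(( es_mib * 1024 * 1024 )) /dev/sde" \
         "--blocksize=$(( 24 * 1024 )) --offset=$(( 24 * 1024 * 1024 ))"
done
```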
The goal is to find the minimum value for --erasesize= and the maximum value for --open-au-nr= that can give the full bandwidth of 20+MB/s, which I'm guessing will be 12 MB and 5, but it would be helpful to know for sure.
You will need to pull the latest version of flashbench, in which I have added the --offset argument parsing.
In any case, it's a good idea to align the partition to the start of the erase block (6 or 12 MB), not the 4 MB that I told you before.
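A sketch of how one might verify that alignment, assuming 512-byte sectors and a hypothetical start sector; on a real system the value would come from e.g. /sys/block/sde/sde1/start:

```shell
# Sketch with hypothetical numbers: is the partition start sector on a
# 12 MiB erase-block boundary?
START=24576                                    # hypothetical: 12 MiB into the device
SECTORS_PER_EB=$(( 12 * 1024 * 1024 / 512 ))   # 24576 sectors per 12 MiB block
if [ $(( START % SECTORS_PER_EB )) -eq 0 ]; then
    echo "partition start is erase-block aligned"
else
    echo "partition start is NOT aligned"
fi
```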
Arnd