On Thursday 24 March 2011, Michael Monnerie wrote:
We could speak German, I guess?
Yes, but I'd prefer to keep the discussion on the mailing list if interesting results come up.
On Thursday, 24 March 2011, Arnd Bergmann wrote:
Are these results reproducible? It's not at all clear to me that the erase size is really 4 MB; it could also be 1 MB, for instance, especially if your stick is from before 2010. You can rerun the command with --count=100 or more, or with larger block sizes, to get a better feeling.
# ./flashbench -a /dev/sde --blocksize=4096 --count=100
sched_setscheduler: Operation not permitted
align 536870912 pre 703µs on 798µs post 666µs diff 114µs
align 268435456 pre 700µs on 753µs post 681µs diff 62.8µs
align 134217728 pre 786µs on 783µs post 645µs diff 67.4µs
align 67108864 pre 723µs on 772µs post 675µs diff 73.3µs
align 33554432 pre 701µs on 762µs post 674µs diff 74.4µs
align 16777216 pre 701µs on 741µs post 685µs diff 48.3µs
align 8388608 pre 695µs on 749µs post 668µs diff 66.8µs
align 4194304 pre 706µs on 791µs post 671µs diff 102µs
align 2097152 pre 674µs on 707µs post 692µs diff 23.9µs
align 1048576 pre 683µs on 726µs post 701µs diff 34.3µs
align 524288 pre 671µs on 726µs post 717µs diff 32.3µs
align 262144 pre 698µs on 713µs post 687µs diff 20.7µs
align 131072 pre 682µs on 740µs post 704µs diff 46.9µs
align 65536 pre 687µs on 727µs post 697µs diff 35.1µs
align 32768 pre 712µs on 722µs post 692µs diff 19.9µs
align 16384 pre 667µs on 699µs post 674µs diff 27.9µs
align 8192 pre 702µs on 770µs post 686µs diff 75.8µs
Ok, this is much clearer: I'm pretty sure that it's either 4 MB or 8 MB, based on this result. Note how all diff values below 4 MB (apart from the outlier at 8 KiB) are smaller than all diff values above 4 MB. Something strange is going on at 4 MB, so it's not clear whether it belongs to the upper or lower half.
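The comparison can be checked mechanically. The following is only a sketch: the diff values are copied by hand from the --count=100 run above, and the helper name split_at is made up for illustration. It returns the largest diff below a candidate boundary and the smallest diff above it, once with all rows and once without the 8 KiB row.

```python
# diff values in microseconds, copied from the --blocksize=4096 --count=100 run
DIFF_US = {
    536870912: 114.0, 268435456: 62.8, 134217728: 67.4, 67108864: 73.3,
    33554432: 74.4, 16777216: 48.3, 8388608: 66.8, 4194304: 102.0,
    2097152: 23.9, 1048576: 34.3, 524288: 32.3, 262144: 20.7,
    131072: 46.9, 65536: 35.1, 32768: 19.9, 16384: 27.9, 8192: 75.8,
}

MIB = 1024 * 1024


def split_at(diffs, boundary):
    """Return (largest diff below boundary, smallest diff above boundary)."""
    below = [d for align, d in diffs.items() if align < boundary]
    above = [d for align, d in diffs.items() if align > boundary]
    return max(below), min(above)


# All rows: the 8 KiB row (75.8) is larger than several rows above 4 MiB.
low_max, high_min = split_at(DIFF_US, 4 * MIB)

# Without the 8 KiB row the two groups separate, if only narrowly.
trimmed = {align: d for align, d in DIFF_US.items() if align != 8192}
low_max_trimmed, high_min_trimmed = split_at(trimmed, 4 * MIB)

print("all rows:       ", low_max, high_min)
print("without 8 KiB:  ", low_max_trimmed, high_min_trimmed)
```

The margin without the 8 KiB row is small (46.9 against 48.3), which fits the advice to repeat the run with a larger --count before trusting it.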
# ./flashbench -a /dev/sde --blocksize=8192 --count=50
sched_setscheduler: Operation not permitted
align 536870912 pre 879µs on 899µs post 858µs diff 30.9µs
align 268435456 pre 847µs on 906µs post 842µs diff 61.1µs
align 134217728 pre 996µs on 968µs post 836µs diff 52µs
align 67108864 pre 811µs on 839µs post 757µs diff 55µs
align 33554432 pre 871µs on 908µs post 847µs diff 48.7µs
align 16777216 pre 854µs on 914µs post 818µs diff 78µs
align 8388608 pre 851µs on 908µs post 850µs diff 58.2µs
align 4194304 pre 892µs on 880µs post 908µs diff -20511n
align 2097152 pre 828µs on 849µs post 885µs diff -7708ns
align 1048576 pre 874µs on 886µs post 862µs diff 17.9µs
align 524288 pre 852µs on 869µs post 915µs diff -15025n
align 262144 pre 844µs on 895µs post 940µs diff 2.73µs
align 131072 pre 848µs on 884µs post 907µs diff 6.1µs
align 65536 pre 837µs on 857µs post 840µs diff 18µs
align 32768 pre 831µs on 864µs post 861µs diff 17.7µs
align 16384 pre 855µs on 841µs post 826µs diff 201ns
Could some Linux cache be getting in the way?
No, all I/O is done with O_DIRECT, which completely bypasses the page cache. In theory, there could be a cache on the stick, but that's rarely the case. The USB protocol adds some jitter here, and for some reason you cannot use real-time scheduling, which makes the results less accurate.
Another helpful indication would be the output of 'fdisk -lu /dev/sde': it will show the start of the partition and the size of the drive, and both should be a multiple of the erase block size.
# fdisk -lu /dev/sde
Disk /dev/sde: 16.2 GB, 16240345088 bytes
255 heads, 63 sectors/track, 1974 cylinders, total 31719424 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x925df597
Ok, 16240345088 bytes is 121 times 128 MiB, so it's certainly not using 4128 KiB blocks or some other strange unit.
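The arithmetic is easy to reproduce. A minimal sketch, using only the byte and sector counts printed by fdisk above; the list of candidate sizes is chosen here for illustration:

```python
MIB = 1024 * 1024

DISK_BYTES = 16240345088      # from the fdisk output
TOTAL_SECTORS = 31719424      # from the fdisk output
SECTOR_BYTES = 512

# Sanity check: sector count and byte count describe the same size.
size_matches = TOTAL_SECTORS * SECTOR_BYTES == DISK_BYTES

# Number of whole units and leftover bytes for each candidate size.
multiples = {}
for mib in (1, 4, 8, 128, 256):
    count, remainder = divmod(DISK_BYTES, mib * MIB)
    multiples[mib] = (count, remainder)
    print(f"{mib:>3} MiB: {count} units, remainder {remainder} bytes")
```

The drive size divides evenly by 4 MiB, 8 MiB and 128 MiB (121 units), but not by 256 MiB, so the size alone cannot distinguish between the 4 MB and 8 MB candidates.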
   Device Boot      Start         End      Blocks   Id  System
/dev/sde1              63    31712309    15856123+   7  HPFS/NTFS/exFAT
I didn't use sde1 but sde for the tests, so the partition layout shouldn't matter. I'll change the partition to start at 2048 or whatever I find the erase size to be.
Be careful here. The original FAT layout is typically made specifically for this stick, so it's probably a good idea to make a backup of the first few blocks.
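Once the erase block size is settled, the partition start can be checked in the same way. A small sketch, assuming 512-byte sectors as reported by fdisk and taking 4 MiB and 8 MiB as the two candidate sizes; the helper names are made up for illustration:

```python
MIB = 1024 * 1024
SECTOR_BYTES = 512  # logical sector size reported by fdisk


def is_aligned(start_sector, erase_bytes):
    """True if the partition starts exactly on an erase block boundary."""
    return (start_sector * SECTOR_BYTES) % erase_bytes == 0


def next_aligned_sector(start_sector, erase_bytes):
    """Round start_sector up to the next erase block boundary."""
    per_block = erase_bytes // SECTOR_BYTES
    return -(-start_sector // per_block) * per_block  # ceiling division


for erase_mib in (4, 8):
    erase = erase_mib * MIB
    print(erase_mib, "MiB:",
          "63 aligned?", is_aligned(63, erase),
          "2048 aligned?", is_aligned(2048, erase),
          "first aligned start:", next_aligned_sector(63, erase))
```

Sector 2048 is only a 1 MiB boundary, so with a 4 MiB or 8 MiB erase block the start would have to move to sector 8192 or 16384 respectively.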
Finally, a third way is to look at a gnuplot chart on the output of
flashbench -s -o output.plot /dev/sde --scatter-order=10 --scatter-span=2 --blocksize=8192
gnuplot -p -e 'plot "output.plot"'
On many drives, the boundaries between erase blocks show up as spikes in the chart.
Phew, two lines, more or less, and spikes everywhere (see attached).
Right, nothing to see here yet. It only shows the first 8 MB, so if the spike is every 8 MB, it won't show up here. Use --scatter-order=12 to show more. Also, the jitter from the USB protocol may hide some details, which might get better with a larger --count= value, but that would make the test run much longer.
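To see how far a given --scatter-order reaches, here is a rough sketch. It assumes that the scatter plot covers 2^order blocks of --blocksize bytes each; that assumption is not taken from the flashbench sources, it is merely consistent with the 8 MB figure quoted above for order 10 and 8192-byte blocks.

```python
MIB = 1024 * 1024


def scatter_range_bytes(order, blocksize):
    """Bytes covered by the plot, ASSUMING it spans 2**order blocks."""
    return (2 ** order) * blocksize


range_order_10 = scatter_range_bytes(10, 8192)
range_order_12 = scatter_range_bytes(12, 8192)

print("order 10:", range_order_10 // MIB, "MiB")
print("order 12:", range_order_12 // MIB, "MiB")
```

Under that assumption, order 12 covers 32 MiB, which would span several boundaries whether the erase block is 4 MB or 8 MB.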
It's probably good enough to assume that the size is actually 8 MB or 4 MB, and keep going from there.
Also, please post the USB ID and name output from 'lsusb' for reference.
# lsusb -v
Bus 001 Device 005: ID 1b1c:1a90
Ok.
With --open-au-nr=16 it got slower, but there is still no visible boundary:
# ./flashbench -O --erasesize=$[4 * 1024 * 1024] --blocksize=$[64 * 1024] /dev/sde --open-au-nr=4
sched_setscheduler: Operation not permitted
4MiB 7.3M/s
2MiB 11.2M/s
1MiB 15.7M/s
512KiB 19.2M/s
256KiB 14.5M/s
128KiB 12.8M/s
64KiB 18.8M/s

# ./flashbench -O --erasesize=$[4 * 1024 * 1024] --blocksize=$[64 * 1024] /dev/sde --open-au-nr=16
sched_setscheduler: Operation not permitted
4MiB 9.77M/s
2MiB 3.41M/s
1MiB 7.38M/s
512KiB 5.69M/s
^C (took too long)
If it takes too long, that's a good indication that something interesting is happening ;-)
Seems that stick wants to hide its internals?
Not completely unusual, but somewhat harder than most.
I've committed some updates now that might help make the --open-au results a bit clearer, and faster.
What I would recommend now is to update to the latest git version (just uploaded), and then try increasing numbers of --open-au-nr= with 4 MB erasesize, until you hit the cutoff. Try the same with --random, for the last fast one and the first slow one:
./flashbench -O --erasesize=$[4 * 1024 * 1024] /dev/sde --open-au-nr=5
./flashbench -O --erasesize=$[4 * 1024 * 1024] /dev/sde --open-au-nr=6
./flashbench -O --erasesize=$[4 * 1024 * 1024] /dev/sde --open-au-nr=7
...
./flashbench -O --erasesize=$[4 * 1024 * 1024] /dev/sde --open-au-nr=N
=> N is the number of 4 MB blocks that cannot be handled
./flashbench -O --erasesize=$[4 * 1024 * 1024] /dev/sde --open-au-nr=(N-1) --random
./flashbench -O --erasesize=$[4 * 1024 * 1024] /dev/sde --open-au-nr=N --random
=> Verify that the (N-1) --random case is still fast
./flashbench -O --erasesize=$[8 * 1024 * 1024] /dev/sde --open-au-nr=((N-1)/2) --random
./flashbench -O --erasesize=$[8 * 1024 * 1024] /dev/sde --open-au-nr=(N-1) --random
./flashbench -O --erasesize=$[8 * 1024 * 1024] /dev/sde --open-au-nr=(N-1) --random
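Since (N-1) and ((N-1)/2) are placeholders rather than valid shell, the follow-up commands can be generated once N is known. This is only a sketch: it uses the flags shown above and nothing else, the helper names are made up, and rounding (N-1)/2 down is an assumption. It prints the command lines without running them, because these tests write to the device.

```python
MIB = 1024 * 1024


def open_au_cmd(erase_mib, nr, random=False, dev="/dev/sde"):
    """Build one flashbench --open-au command line as a string."""
    parts = ["./flashbench", "-O", f"--erasesize={erase_mib * MIB}",
             dev, f"--open-au-nr={nr}"]
    if random:
        parts.append("--random")
    return " ".join(parts)


def followup_cmds(n):
    """Commands to try once n, the first slow --open-au-nr at 4 MB, is known."""
    return [
        open_au_cmd(4, n - 1, random=True),         # should still be fast
        open_au_cmd(4, n, random=True),             # first slow one
        open_au_cmd(8, (n - 1) // 2, random=True),  # assumption: round down
        open_au_cmd(8, n - 1, random=True),
    ]


# n = 6 is a made-up value for illustration, not a measured result.
example = followup_cmds(6)
for line in example:
    print(line)
```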
Arnd