On Wed, Jun 27, 2012, at 03:56 PM, Arnd Bergmann wrote:
On Tuesday 26 June 2012, Andrew Bradford wrote:
With 'dd' you have to be careful to use the 'oflag=direct' argument. Otherwise part of the data may still be in the page cache waiting for writeback.
Ah, OK. I had (incorrectly it seems) thought that the cache only affected file systems and not more raw forms of disk I/O. With some testing on smaller count= dd commands, I now see how the cache might play with my numbers and skew them.
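To illustrate how the page cache can skew a write benchmark, here is a minimal Python sketch. It is not equivalent to dd with oflag=direct: it uses os.fsync() as a stand-in, because O_DIRECT needs aligned buffers and is not supported on every filesystem. The names timed_write and compare, the temporary file, and the sizes are all made up for illustration.

```python
import os
import tempfile
import time


def timed_write(path, total_bytes, block_size, sync):
    """Write total_bytes to path in block_size chunks; return bytes/second.

    With sync=True the timer also covers os.fsync(), so data still sitting
    in the page cache is flushed before the clock stops.
    """
    buf = b"\0" * block_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    start = time.perf_counter()
    try:
        written = 0
        while written < total_bytes:
            # os.write may write fewer bytes than requested, so loop.
            written += os.write(fd, buf[: total_bytes - written])
        if sync:
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return written / max(elapsed, 1e-9)


def compare(total_bytes=4 * 1024 * 1024, block_size=64 * 1024):
    """Return (rate without fsync, rate with fsync) for a scratch file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probe.bin")
        cached = timed_write(path, total_bytes, block_size, sync=False)
        flushed = timed_write(path, total_bytes, block_size, sync=True)
    return cached, flushed


if __name__ == "__main__":
    cached, flushed = compare()
    print("no fsync: %.1f MB/s" % (cached / 1e6))
    print("fsync:    %.1f MB/s" % (flushed / 1e6))
```

On a small total size the run without fsync tends to report a rate the medium cannot sustain, which is the same effect as a short dd run without oflag=direct.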
[andrew@mythdvr flashbench]$ sudo ./flashbench /dev/sdb --findfat --erasesize=$[3*1024*1024] --blocksize=$[12*1024] --fat-nr=9 --count=100
3MiB    32.1M/s  32.7M/s  31.2M/s  32.6M/s  32.8M/s  31.4M/s  32M/s    32.8M/s  31.4M/s
1.5MiB  32.2M/s  33.3M/s  31.9M/s  33.1M/s  33.2M/s  31.9M/s  33.2M/s  33.3M/s  32.7M/s
768KiB  32.6M/s  32.9M/s  31.5M/s  32.5M/s  32.6M/s  31.3M/s  33M/s    33.2M/s  31.7M/s
384KiB  31.1M/s  32M/s    30.1M/s  30.6M/s  30.7M/s  29.4M/s  30.7M/s  30.9M/s  31.9M/s
192KiB  31.2M/s  32.1M/s  30.7M/s  32.4M/s  32.5M/s  30.6M/s  32M/s    32.1M/s  30.7M/s
96KiB   31.9M/s  32.6M/s  31M/s    33.4M/s  33.6M/s  31.7M/s  32.6M/s  32.9M/s  30.9M/s
48KiB   12.7M/s  28.8M/s  12.6M/s  28.9M/s  28.9M/s  12.4M/s  29.1M/s  28.3M/s  12.4M/s
24KiB   22.1M/s  22.6M/s  21.9M/s  22.2M/s  22.4M/s  20.8M/s  21.7M/s  22.2M/s  21.1M/s
12KiB   17.6M/s  17.3M/s  16.3M/s  18.2M/s  17.9M/s  18M/s    9.6M/s   18.4M/s  3.74M/s
Seems like 8MiB erase block is consistent for the special fat area, assuming I'm reading these right.
Well, look at the 48KB row, which is slow in columns 3 and 6.
|0    |3    |6    |9    |12   |15   |18   |21   |
|0              |8              |16             |
Good point.
The slow ones are those that cross an 8MB boundary. I don't think there is any special area on this device.
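The boundary argument can be checked with a few lines of Python. This sketch assumes that column k of the --findfat output corresponds to the k-th 3MiB region of the device, which is how the diagram above lays it out; the function name crossing_columns is made up for illustration. It only checks the geometry and says nothing about why other entries in the table may be slow.

```python
MIB = 1024 * 1024


def crossing_columns(region_size, boundary, count):
    """Return 1-based indices of regions that straddle a boundary multiple.

    Region k (0-based) covers [k * region_size, (k + 1) * region_size).
    A region "crosses" a boundary when a multiple of `boundary` lies
    strictly inside it, not merely at its start or end.
    """
    columns = []
    for k in range(count):
        start = k * region_size
        end = start + region_size
        # First multiple of `boundary` strictly greater than `start`.
        next_boundary = (start // boundary + 1) * boundary
        if next_boundary < end:
            columns.append(k + 1)
    return columns


if __name__ == "__main__":
    # Nine 3MiB regions (--fat-nr=9) against a hypothesised 8MiB erase block.
    print(crossing_columns(3 * MIB, 8 * MIB, 9))  # prints [3, 6]
```

With nine 3MiB regions and an 8MiB boundary, the regions that straddle a boundary are the third and sixth, the same columns picked out in the 48KB row.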
I didn't get the impression that the first few erase blocks were any better / different than the others but I need to learn more about --findfat and FAT in general before I feel confident making conclusions.
So both 4MiB and 2MiB look like they're still within the same erase block and not crossing a boundary at a 241MiB offset. So the erase block size should be larger than 4MiB, possibly ruling out the 3MiB theory. But the fact that at a 241MiB offset with a 4MiB eraseblock the 4MiB, 2MiB, and 32KiB rows are slow makes me question how much I trust that 8MiB is really the eraseblock size.
I would assume that those are just random artifacts of the device having to clean up the physical erase blocks occasionally when you don't write the entire 8MB block all the time. Doing a 2 MB write on an 8MB physical erase block is semi-random I/O.
OK.
I think I'm going to go with 8MiB erase blocks on this. It's going to be an ext4 disk and I'll partition at 24MiB bounds, just in case 3MiB is really the proper eraseblock.
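The 24MiB figure is the least common multiple of the two candidate erase block sizes, so a partition start on a 24MiB boundary is aligned whichever of the two turns out to be right. A minimal Python sketch of that arithmetic follows; the helper names and the 512-byte sector size are assumptions for illustration, not something taken from the device.

```python
from math import gcd

MIB = 1024 * 1024
SECTOR = 512  # assumed logical sector size, for converting to sector counts


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a // gcd(a, b) * b


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return -(-offset // alignment) * alignment


if __name__ == "__main__":
    safe = lcm(3 * MIB, 8 * MIB)
    print(safe // MIB)       # prints 24
    print(safe // SECTOR)    # prints 49152
    # A partition that would otherwise start at 1MiB moves to 24MiB.
    print(align_up(1 * MIB, safe) // MIB)  # prints 24
```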
Right.
There are a few other things you can do:
- Set the stride= and stripe-width= to 8MB. The RAID configuration has a lot in common with flash media, so it can only improve performance if you do that.
So for example, I'd set stride to 8MiB and stripe-width to 1?
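One detail worth keeping in mind here: mke2fs documents its stride and stripe width options in units of filesystem blocks rather than bytes. The Python sketch below converts an erase block size into block counts, assuming a 4KiB filesystem block size (an assumption, check what your filesystem actually uses). Setting both values to the same number simply follows the suggestion above; the function name and dictionary keys are made up for illustration.

```python
MIB = 1024 * 1024


def raid_hints(erase_block_bytes, fs_block_bytes=4096):
    """Convert an erase block size into filesystem-block counts.

    fs_block_bytes=4096 is an assumed ext4 block size. Both hints are set
    to one erase block, mirroring the "set both to 8MB" suggestion.
    """
    if erase_block_bytes % fs_block_bytes:
        raise ValueError("erase block is not a multiple of the fs block size")
    blocks = erase_block_bytes // fs_block_bytes
    return {"stride": blocks, "stripe_width": blocks}


if __name__ == "__main__":
    print(raid_hints(8 * MIB))
    # prints {'stride': 2048, 'stripe_width': 2048}
```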
- Use a separate partition for an external journal and align that to 8MB as well. Otherwise the journal might not be erase block aligned (it might do that automatically when you set the stripe-width as above, don't know yet).
I have a 2 erase block area (soon to be a partition) at the beginning of the drive set aside for this but haven't yet set it up. I'm considering either putting the journal there, or running without a journal. I'm a little queasy about having no journal, since the machine I use most often is a laptop that overheats after a few hours, causing memory issues.
- Consider using btrfs instead of ext4. According to our research, 3 or 4 erase blocks is not really enough for ext4, but btrfs can cope with this unless you do a lot of sync() operations, which require more and would work better on ext4.
I had 1 bad experience with btrfs and lock-ups with a few SD cards when setting a few non-standard things and that turned me off from it. I'll look at it again, but for now I'm getting decent performance with ext4 (faster boot than the spinning disk in my desktop and only the occasional stutter in the UI). Setting elevator to noop and moving to relatime are next, then if I want more I'll look at btrfs or ext4 with stride and stripe-width.
I'm actually currently booted from the disk in Debian 6 without any optimizations beyond aligning partitions to erase blocks on ext4. apt appears to run much more quickly than I've experienced on my Beaglebones with SanDisk 4GB microSD (that I've sent results in for before, but those cards had really bad random (0 open-au basically)).
Thanks! This has been really helpful! -Andrew