On Friday 18 March 2011 18:45:34 Justin Piszcz wrote:
> On Fri, 18 Mar 2011, Arnd Bergmann wrote:
> > Getting back to the original question, I'd recommend testing the
> > stick by doing raw accesses instead of a file system. A simple
>
> Ok, here are the results:
>
> root@sysresccd /root % time dd if=/dev/zero of=/dev/sda oflag=direct bs=4M
> dd: writing `/dev/sda': No space left on device
> 1961+0 records in
> 1960+0 records out
> 8220835840 bytes (8.2 GB) copied, 283.744 s, 29.0 MB/s
Ok, so no immediate problem there.
> > I'm also interested in results from flashbench
> > (git://git.linaro.org/people/arnd/flashbench.git, e.g. like
> > http://lists.linaro.org/pipermail/flashbench-results/2011-March/000039.html)
> > That might help explain how the stick failed.
>
> Certainly, testing below, following this:
> http://lists.linaro.org/pipermail/flashbench-results/2011-March/000039.html
I'm sorry, I should have been more specific. Unfortunately, flashbench
is not very user-friendly to run yet.
The results indicate that the device does not have a 2 MB erase block size
but rather 4 or 8 MB, which is more common on 8 GB media.
> # ./flashbench --open-au --open-au-nr=1 /dev/sda --blocksize=8192 --erasesize=$[2* 1024 * 1024] --random
> 2MiB 29.5M/s
> 1MiB 29.1M/s
> 512KiB 28.5M/s
> 256KiB 22.8M/s
> 128KiB 23.8M/s
> 64KiB 24.4M/s
> 32KiB 18.9M/s
> 16KiB 13.1M/s
> 8KiB 8.22M/s
>
> # ./flashbench --open-au --open-au-nr=4 /dev/sda --blocksize=8192 --erasesize=$[2* 1024 * 1024] --random
> 2MiB 25.9M/s
> 1MiB 21.8M/s
> 512KiB 15M/s
> 256KiB 11.9M/s
> 128KiB 12.1M/s
> 64KiB 13.6M/s
> 32KiB 9.81M/s
> 16KiB 6.41M/s
> 8KiB 3.88M/s
The numbers are jumping around a bit with the incorrectly guessed erasesize.
These values should be more like the ones in the first test. Can you rerun
with --erasesize=$[4 * 1024 * 1024]?
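Something like this should do it -- a dry-run sketch that only prints the
command lines (the device path and --open-au-nr value are taken from your
tests above; adjust them before actually running):

```shell
# Print the suggested reruns with 4 MiB and 8 MiB erase sizes.
# /dev/sda and --open-au-nr=4 are assumptions from the earlier tests.
dev=/dev/sda
for es in $((4 * 1024 * 1024)) $((8 * 1024 * 1024)); do
  echo ./flashbench --open-au --open-au-nr=4 "$dev" \
    --blocksize=8192 --erasesize="$es" --random
done
```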
Also, what is the output of 'lsusb' for this stick? I'd like to add the
data to https://wiki.linaro.org/WorkingGroups/KernelConsolidation/Projects/FlashCar…
> # ./flashbench --open-au --open-au-nr=5 /dev/sda --blocksize=8192 --erasesize=$[2* 1024 * 1024] --random
> 2MiB 29.2M/s
> 1MiB 27.8M/s
> 512KiB 18.4M/s
> 256KiB 7.82M/s
> 128KiB 4.62M/s
> 64KiB 2.47M/s
> 32KiB 1.26M/s
> 16KiB 642K/s
> 8KiB 327K/s
This is where your drive stops coping with the accesses: writing small
blocks to four different erase blocks (2 MB for the test, probably
larger in reality) works fine, but writing to five of them is devastating
for performance, dropping from 30 MB/s to 300 KB/s, or lower if you were
to write blocks smaller than 8 KB.
The cutoff at --open-au-nr=4 is coincidentally the same as for the
SD card I was testing. This is what happens in the animation in
http://lwn.net/Articles/428799/. The example given there is for
a drive that can only have two open AUs (allocation units aka
erase blocks), while yours does 4.
> (did not run one with 7)
Note that the test results I had with 6 and 7 are without --random,
so the cut-off for that card was higher when writing multiple erase
blocks from start to finish instead of writing random sectors inside
of them.
> # ./flashbench --findfat --fat-nr=10 /dev/sda --blocksize=1024 --erasesize=$[2* 1024 * 1024] --random
> 2MiB 22.7M/s 19.1M/s 15.5M/s 13.1M/s 29.5M/s 29.5M/s 29.6M/s 29.6M/s 29.5M/s 29.5M/s
> 1MiB 20.6M/s 13.3M/s 13.3M/s 20.8M/s 18.1M/s 17.8M/s 18M/s 18.3M/s 18.8M/s 18.6M/s
> 512KiB 18.4M/s 18.6M/s 18.3M/s 18.1M/s 23.5M/s 23.2M/s 23.5M/s 23.5M/s 23.4M/s 23.4M/s
> 256KiB 26.9M/s 21.3M/s 21.2M/s 21M/s 21.1M/s 21.2M/s 21.1M/s 21.1M/s 20.6M/s 21M/s
> 128KiB 22.2M/s 22.3M/s 22.6M/s 21.4M/s 21.5M/s 21.3M/s 21.6M/s 21.3M/s 21.4M/s 21.4M/s
> 64KiB 23.9M/s 22.6M/s 22.9M/s 23M/s 22.5M/s 22.4M/s 22.4M/s 22.4M/s 22.5M/s 22.4M/s
> 32KiB 18.2M/s 18.3M/s 18.3M/s 18.3M/s 18.3M/s 18.4M/s 18.3M/s 18.2M/s 18.3M/s 18.3M/s
> 16KiB 12.9M/s 12.9M/s 13M/s 13M/s 12.9M/s 13M/s 12.9M/s 12.9M/s 12.9M/s 12.9M/s
> 8KiB 8.14M/s 8.15M/s 8.15M/s 8.15M/s 8.15M/s 8.14M/s 8.14M/s 8.15M/s 8.15M/s 8.06M/s
> 4KiB 4.07M/s 4.08M/s 4.07M/s 4.06M/s 4.04M/s 4.04M/s 4.04M/s 4.04M/s 4.04M/s 4.04M/s
> 2KiB 2.02M/s 2.02M/s 2.02M/s 2.02M/s 2.02M/s 2.01M/s 2.01M/s 2.01M/s 2.01M/s 2.02M/s
> 1KiB 956K/s 954K/s 956K/s 953K/s 947K/s 947K/s 947K/s 950K/s 947K/s 948K/s
>
One thing that is very clear from this is that this stick has a page size
of 8KB, and that it requires at least 64 KB transfers for the maximum speed.
If your partition is not aligned to 8 KB or more (better: to the erase
block size, e.g. 4 MB) or if the file system writes smaller than 8 KB
naturally aligned blocks at once, the drive has to do read-modify-write
cycles that severely impact performance and the expected life-time.
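To check the alignment, you can take the partition's start sector from
'fdisk -l' (512-byte sectors assumed), multiply by 512 and test whether
the byte offset is divisible by the erase block size. A minimal sketch,
with a made-up start sector as the example:

```shell
# Check whether a partition start is aligned to an assumed 4 MiB erase block.
# start_sector is a hypothetical example; read the real one from `fdisk -l`.
start_sector=8192                    # 8192 * 512 bytes = 4 MiB
erase_size=$((4 * 1024 * 1024))
offset=$((start_sector * 512))
if [ $((offset % erase_size)) -eq 0 ]; then
  echo "aligned"                     # this example prints "aligned"
else
  echo "misaligned"
fi
```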
I cannot see any block that is optimized for storing the FAT, which is
good: it means that the manufacturer did not exclusively design
the stick for FAT32, as is normally the case with flash memory cards.
For this stick, I would strongly recommend creating the file system
in a way that writes at least 16 KB naturally aligned blocks at all
times, but I don't know if that's supported by XFS.
Also, the limitation of forcing a garbage collection when writing to
more than four 4 MB (or so) segments may be a problem, depending on
how XFS stores its metadata. The good news is that it can do random
write access inside of the erase blocks.
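One possible starting point: XFS has stripe-unit/stripe-width options
that align allocation to a given granularity. They do not force a
minimum write size, so this only approximates what I described above.
A dry-run sketch that just prints the command; the partition name is
a placeholder:

```shell
# Dry run (prints only): mkfs.xfs stripe options sized to the 16 KiB
# multiple discussed above; /dev/sda1 is an assumed partition name.
echo mkfs.xfs -d su=16k,sw=1 /dev/sda1
```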
Arnd
Hello Arnd,
Here are the hdparm and smartctl outputs for the first card I tested
(Transcend ultra 2GB Industrial 'CF100BA8'). There seems to be no dma
mode there, only PIO.
Philippe
tmp179:~ # cat /proc/partitions
major minor #blocks name
8 0 244198584 sda
8 1 2103296 sda1
8 2 20972544 sda2
8 16 1990296 sdb
tmp179:~ # hdparm -i /dev/sdb
/dev/sdb:
Model=PIO, FwRev=20100202, SerialNo=20100716 CF100BA8
Config={ HardSect NotMFM Fixed DTR>10Mbs }
RawCHS=3949/16/63, TrkSize=0, SectSize=576, ECCbytes=4
BuffType=DualPort, BuffSize=1kB, MaxMultSect=1, MultSect=off
CurCHS=3949/16/63, CurSects=3980592, LBA=yes, LBAsects=3980592
IORDY=no, tPIO={min:120,w/IORDY:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
AdvancedPM=yes: disabled (255)
Drive conforms to: Unspecified: ATA/ATAPI-4
* signifies the current active mode
tmp179:~ # hdparm -I /dev/sdb
/dev/sdb:
CompactFlash ATA device
Model Number: PIO
Serial Number: 20100716 CF100BA8
Firmware Revision: 20100202
Standards:
Supported: 4
Likely used: 6
Configuration:
Logical max current
cylinders 3949 3949
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 3980592
LBA user addressable sectors: 3980592
Logical/Physical Sector size: 512 bytes
device size with M = 1024*1024: 1943 MBytes
device size with M = 1000*1000: 2038 MBytes (2 GB)
cache/buffer size = 1 KBytes (type=DualPort)
Capabilities:
LBA, IORDY(may be)(cannot be disabled)
Standby timer values: spec'd by Vendor
R/W multiple sector transfer: Max = 1 Current = 0
Advanced power management level: disabled
DMA: not supported
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
Power Management feature set
WRITE_BUFFER command
READ_BUFFER command
NOP cmd
CFA feature set
Advanced Power Management feature set
* Gen1 signaling speed (1.5Gb/s)
* CFA Power Level 1 (max 500mA)
Security:
Master password revision code = 65534
supported
not enabled
not locked
not frozen
not expired: security count
not supported: enhanced erase
2min for SECURITY ERASE UNIT.
Integrity word not set (found 0x0000, expected 0x52a5)
tmp179:~ # smartctl --all -T permissive /dev/sdb
smartctl 5.40 2010-10-16 r3189 [i686-pc-linux-gnu] (SUSE RPM)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: PIO
Serial Number: 20100716 CF100BA8
Firmware Version: 20100202
User Capacity: 2,038,063,104 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 4
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Fri Mar 18 11:35:28 2011 CET
SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 85-87 don't show if SMART is enabled.
Checking to be sure by trying SMART RETURN STATUS command.
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x26) Offline data collection activity
is in a Reserved state.
Auto Offline Data Collection: Disabled.
Total time to complete Offline
data collection: ( 3) seconds.
Offline data collection
capabilities: (0x00) Offline data collection not supported.
SMART capabilities: (0x0002) Does not save SMART data before
entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x00) Error logging NOT supported.
No General Purpose Logging support.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0000 100 100 000 Old_age Offline - 0
2 Throughput_Performance 0x0000 100 100 000 Old_age Offline - 0
5 Reallocated_Sector_Ct 0x0000 100 100 000 Old_age Offline - 0
7 Seek_Error_Rate 0x0000 100 100 000 Old_age Offline - 0
8 Seek_Time_Performance 0x0000 100 100 000 Old_age Offline - 0
12 Power_Cycle_Count 0x0000 100 100 000 Old_age Offline - 43
195 Hardware_ECC_Recovered 0x0000 100 100 000 Old_age Offline - 0
196 Reallocated_Event_Count 0x0000 100 100 000 Old_age Offline - 0
197 Current_Pending_Sector 0x0000 100 100 000 Old_age Offline - 0
198 Offline_Uncorrectable 0x0000 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0000 100 100 000 Old_age Offline - 0
200 Multi_Zone_Error_Rate 0x0000 100 100 000 Old_age Offline - 0
Warning: device does not support Error Logging
Error SMART Error Log Read failed: Input/output error
Smartctl: SMART Error Log Read Failed
Warning: device does not support Self Test Logging
Error SMART Error Self-Test Log Read failed: Input/output error
Smartctl: SMART Self Test Log Read Failed
Device does not support Selective Self Tests/Logging
tmp179:~ #
--
Philippe De Muyter +32 2 6101532 Macq SA rue de l'Aeronef 2 B-1140 Bruxelles
On Saturday 05 March 2011 00:55:29 Philippe De Muyter wrote:
> I have read with interest your article in lwn, and decided to try your flashbench
> program to discover the characteristics of the CF card we use.
>
> Unfortunately,
>
> ./flashbench -a -b 1K /dev/sdb
>
> fails with endless :
>
> time_read: Invalid argument
>
> Looking with strace, I get e.g.:
>
> pread64(3, 0xb3728000, 1, 1023410175) = -1 EINVAL (Invalid argument)
>
> Is that a behavior you can explain (and fix) ?
Yes, there are two known problems that I need to fix:
1. flashbench uses O_DIRECT for talking to the device, which means that
all accesses must be in multiples of full sectors, based on the
underlying device sector size. It should detect this automatically.
2. command line arguments currently need to be natural numbers. There
is no parser for interpreting arguments like 1K or 4M as kilobytes and
megabytes.
It should work when you do
./flashbench -a -b 1024 /dev/sdb
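Until the suffix parser exists, a tiny shell helper can expand the
suffixes into byte counts before calling flashbench. 'to_bytes' is a
made-up name, not part of flashbench:

```shell
# Sketch: expand size arguments like 1K or 4M into plain byte counts,
# since flashbench itself does not parse suffixes yet.
to_bytes() {
  case "$1" in
    *K) echo $(( ${1%K} * 1024 )) ;;
    *M) echo $(( ${1%M} * 1024 * 1024 )) ;;
    *)  echo "$1" ;;
  esac
}
to_bytes 1K   # prints 1024
to_bytes 4M   # prints 4194304
```

Then e.g. ./flashbench -a -b "$(to_bytes 1K)" /dev/sdb.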
Arnd
On Sunday 13 March 2011, Dirk Behme wrote:
> Hi Arnd,
>
> reading the flashbench README [1] what I really like is the
> explanation of the -a and the -O option. Explaining the options in
> this README, giving the examples and how to interpret the resulting
> figures really does help.
>
> Somehow I miss something similar for the -s and -f options. I.e. how
> to select the proper values for --scatter-order and --scatter-span and
> how to interpret the output of -s and -f.
>
> Once I understood it, I would be able to send a patch for the README ;)
>
> Additionally, it would be nice to give the flashbench options used for
> the graphs [2] [3] in the LWN article. The LWN article explains quite
> nicely how to interpret the given graphs, but it's not mentioned which
> flashbench options were used to get these graphs.
>
> Many thanks for your help and best regards
>
> Dirk
>
> [1]
> http://git.linaro.org/gitweb?p=people/arnd/flashbench.git;a=blob;f=README;h…
>
Sorry for the late reply. I promise I'll get to it and update the
README.
I should actually remove --scatter-order; it's too difficult to
understand. It specifies the log2 of the number of blocks,
in units of --blocksize, to be tested at the start of the medium.
The output file can be interpreted by
gnuplot -p -e 'plot "output.file"'
or by importing it into a spreadsheet program like oocalc and
using the XY chart function on two columns.
For --findfat, the output shows how each of the first N erase blocks
on the drive reacts to certain access patterns within the erase
block. Most drives do something different for a few blocks in the
beginning to optimize storing the FAT on them. Each column is one
erase block here. If they are all the same, the card does not have
an optimized FAT area.
Arnd
On Friday 04 March 2011 20:16:05 Xianghua Xiao wrote:
> From the data below it appears APACER has a 1MB erasing block instead
> of the typical 4MB?
Yes, and this is not surprising for an SLC card, although it is
the first one I have seen.
> Also from the following email(waiting for approval on the list, will
> forward to you soon), that looks like a 4MB erasing block but is
> slower than APACER.
Sorry, I need to find the place on the mailing list settings to allow
non-members to post, and approve the mails you already sent.
> Does open-au-nr with higher number mean the underlying filesystem esp
> ext4 will do better,
Yes
> does that imply multithread parallel writing to some extent?
No, it's not about threads, but about how the data is laid out.
ext4 will have to write data, metadata and journal data for
many accesses, but these three are normally in different locations
on the drive, so you need at least three open segments for the first
process that is writing data. If you have other processes that also
write to the drive, or one process writing to multiple files, you
will need more.
> open-au-nr means how many different segments you can use
> to write to in the same time(so you don't need wait for one segment)?
Yes. The apacer card evidently can write to three segments, but not
to six. Can you also try 4 and 5 segments? The interesting number
is the maximum.
For the unigen card, it looks like it does not handle random access
well, independent of the number of AUs. Best try again without --random.
> I probably will recommend APACER over UNIGEN, does that make sense?
Depends on the other measurements.
Arnd