I have a few inter-related issues:
- Why would one kernel boot a card that another kernel can't?
- Why would a card's disk geometry matter for boot?
- Who is a good manufacturer for getting hardware-identical cards in bulk?
- How can I probe the actual "disk geometry" of an SD card?
I bought 100 Transcend SD cards a little while ago and duplicated them with an OpenEmbedded-based filesystem (linux-2.6.36). There were a few "bad" cards that I threw out, but the success rate was acceptable.
In the next round of 40 SD cards I used a Linaro-based filesystem (linux-2.6.39) and had about a 50% failure rate when testing that the cards would boot, which is absurd. The kernel reports:
[ 1.003204] mmcblk0: unknown partition table
However, the cards would mount and show files just fine.
I reduplicated one of the non-booting cards with an OpenEmbedded filesystem and then it booted. Weird!
After some investigation I found that using `gparted` (instead of `fdisk`) to create a new partition table and then `rsync`ing the contents of the original filesystem resulted in a booting Linaro card. Rinse and repeat and I ended up with 3 images which only vary by the disk geometry as reported by `fdisk -l`:
- 50% -- 255 heads, 63 sectors/track, 974 cylinders
- 40% -- 2 heads, 4 sectors/track, 1957632 cylinders
- 10% -- 247 heads, 62 sectors/track, 1022 cylinders
- 1 card still didn't boot
I'm lost. Please advise.
AJ ONeal
Non-booting kernel message
[ 0.923309] Waiting for root device /dev/mmcblk0p2...
[ 0.957885] mmc0: host does not support reading read-only switch. assuming write-enable.
[ 0.982025] mmc0: new high speed SDHC card at address b368
[ 0.988494] mmcblk0: mmc0:b368 USD 7.46 GiB
[ 0.993957] mmcblk0: detected capacity change from 0 to 8018460672
[ 1.003204] mmcblk0: unknown partition table
[ 1.036926] VFS: Cannot open root device "mmcblk0p2" or unknown-block(179,2)
[ 1.044433] Please append a correct "root=" boot option; here are the available partitions:
[ 1.053344] b300 7830528 mmcblk0 driver: mmcblk
[ 1.058959] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(179,2)
Booting kernel message
[ 1.122070] mmc0: host does not support reading read-only switch. assuming write-enable.
[ 1.146087] mmc0: new high speed SDHC card at address b368
[ 1.152557] mmcblk0: mmc0:b368 USD 7.46 GiB
[ 1.158020] mmcblk0: detected capacity change from 0 to 8018460672
[ 1.166351] mmcblk0: p1 p2 p3
[ 1.259674] EXT3-fs: barriers not enabled
[ 1.265411] kjournald starting. Commit interval 5 seconds
[ 1.271331] EXT3-fs (mmcblk0p2): mounted filesystem with ordered data mode
[ 1.278686] VFS: Mounted root (ext3 filesystem) readonly on device 179:2.
Confirmed: the combination of a linaro-2.6.39 kernel with a Transcend 8 GB card results in flaky boots.
- linaro-2.6.39 kernel is affected
- Transcend cards are affected
- oe-2.6.36 kernel is *not* affected
- SanDisk 8 GB cards are *not* affected (67 megabytes smaller)
I zeroed a card, but instead of writing only zeros, I also wrote the sector number into each 512-byte block.
I changed the kernel to print out the 512 bytes it read as the mbr to the screen.
When the card didn't boot fully, the printk showed all zeros except for a sector number belonging to a block about 100 MB into the card.
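For readers who want to repeat the experiment, here is a minimal sketch of the sector-stamping idea. It assumes the sector number is stored as a little-endian 64-bit integer at the start of each otherwise zeroed 512-byte block; the message does not say which encoding was actually used, and the function names are hypothetical. It runs against an in-memory buffer rather than a real card.

```python
import io
import struct

SECTOR_SIZE = 512

def write_stamped_sectors(dev, num_sectors):
    """Fill `dev` with zeroed sectors, each beginning with its own sector number."""
    for n in range(num_sectors):
        # Assumed encoding: 8-byte little-endian counter, rest of the block zero.
        block = struct.pack("<Q", n).ljust(SECTOR_SIZE, b"\0")
        dev.write(block)

def sector_number_of(block):
    """Recover the sector number stamped into a 512-byte block."""
    return struct.unpack_from("<Q", block)[0]

# Demonstration on an in-memory image instead of /dev/mmcblk0.
image = io.BytesIO()
write_stamped_sectors(image, 1024)
image.seek(200 * SECTOR_SIZE)
stamp = sector_number_of(image.read(SECTOR_SIZE))
print(stamp, stamp * SECTOR_SIZE)  # sector number and its byte offset
```

With this scheme, a block read as the MBR that carries a non-zero stamp did not come from sector 0, and `stamp * SECTOR_SIZE` gives the byte offset it really came from. A block about 100 MB into the card would carry a stamp near 204800.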
How weird! Hardware? Kernel?
Anyway, today my sandisk cards arrived and all 10 of them worked flawlessly. I ordered another 50 and hope to see the same results.
AJ ONeal
On Friday 08 July 2011 00:37:35 AJ ONeal wrote:
Confirmed: the combination of a linaro-2.6.39 kernel with a Transcend 8 GB card results in flaky boots.
- linaro-2.6.39 kernel is affected
- Transcend cards are affected
- oe-2.6.36 kernel is not affected
- SanDisk 8 GB cards are not affected (67 megabytes smaller)
I zeroed a card, but instead of writing only zeros, I also wrote the sector number into each 512-byte block.
I changed the kernel to print out the 512 bytes it read as the mbr to the screen.
When the card didn't boot fully, the printk showed all zeros except for a sector number belonging to a block about 100 MB into the card.
How weird! Hardware? Kernel?
My guess is that it's the card's fault. There are a lot of cards that are simply not going to work with a Linux file system. In my experience, the Sandisk cards tend to have controllers that cope with proper file systems, while Kingston never do.
Transcend doesn't make their own cards; they buy stuff from everybody, so you can be lucky or not.
Anyway, today my sandisk cards arrived and all 10 of them worked flawlessly. I ordered another 50 and hope to see the same results.
Ok. Can you be more specific about which cards worked for you and which had problems? Please list the contents of /sys/block/mmcblk0/device/*, in particular the *id and date fields.
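One way to collect those fields is sketched below. It simply reads every file matching `*id` plus the `date` file from the device directory. The function name is hypothetical, and the exact set of attributes present (typically `cid`, `manfid` and `oemid` on Linux MMC devices) depends on the kernel, so treat this as an illustration rather than a reference.

```python
import glob
import os

def read_card_identity(device_dir="/sys/block/mmcblk0/device"):
    """Return {attribute: value} for the *id and date files under device_dir."""
    paths = glob.glob(os.path.join(device_dir, "*id"))
    paths.append(os.path.join(device_dir, "date"))
    info = {}
    for path in sorted(paths):
        # Skip anything that is missing or not a plain file.
        if os.path.isfile(path):
            with open(path) as f:
                info[os.path.basename(path)] = f.read().strip()
    return info

if __name__ == "__main__":
    for name, value in read_card_identity().items():
        print(f"{name}: {value}")
```

Running it once per card and saving the output next to the boot result makes it easier to see whether the failing cards share a manufacturer ID or production date.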
We are still doing more analysis of what the specific requirements are for a particular file system, but I have a good understanding of what the problems with many of the cards are. If you haven't seen my articles, please have a look at https://lwn.net/Articles/428584/ and https://wiki.linaro.org/WorkingGroups/Kernel/Projects/FlashCardSurvey
If you send me a specimen of each card you're interested in, I'll gladly do my analysis, or I can teach you how to do it yourself. It does take a bit of experience because there are so many different ways in which the drives can be screwed up.
If you want a very simple test to see if a card is any good before you buy a lot of them, I recommend running the following (warning: this overwrites data on the card):
flashbench --open-au --erasesize=$[4*1024*1024] --blocksize=4096 --open-au-nr=5 --random /dev/mmcblk0
This will print performance ratings for random write accesses to the card on five different erase blocks (assuming 4 MB erase block size). A card that can handle this should look like
4MiB    9.03M/s
2MiB    6.17M/s
1MiB    6.24M/s
512KiB  4.06M/s
256KiB  4.54M/s
128KiB  3.8M/s
64KiB   6.47M/s
32KiB   5.79M/s
16KiB   2.72M/s
8KiB    1.33M/s
while a card that cannot handle this looks more like
4MiB    8.81M/s
2MiB    5.43M/s
1MiB    3.81M/s
512KiB  2.27M/s
256KiB  1.34M/s
128KiB  860K/s
64KiB   481K/s
32KiB   229K/s
16KiB   117K/s
8KiB    57K/s
If the last row is above 1MB/s, the card is fine. If it is below 100K/s, don't even think about using it. The explanation for this behavior is in the lwn.net article.
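When screening many cards, this rule of thumb can be applied mechanically. The sketch below parses output in the two-column form shown above and rates the last row. It assumes `K` and `M` are decimal multipliers, and the "borderline" label for results between the two thresholds is my own addition, since the rule above only covers the two extremes. The function names are hypothetical.

```python
# Assumed multipliers: treat K and M as decimal (1000-based) units.
UNITS = {"K": 1_000, "M": 1_000_000}

def parse_rate(text):
    """Convert a rate such as '9.03M/s' or '57K/s' to bytes per second."""
    number, unit = text[:-3], text[-3]  # "9.03M/s" -> "9.03", "M"
    return float(number) * UNITS[unit]

def rate_card(output):
    """Classify a card from the last (smallest block size) row of the output."""
    last_row = output.strip().splitlines()[-1].split()
    rate = parse_rate(last_row[1])
    if rate > 1_000_000:
        return "fine"
    if rate < 100_000:
        return "avoid"
    return "borderline"  # not covered by the rule of thumb; needs a closer look

good = "4MiB 9.03M/s\n16KiB 2.72M/s\n8KiB 1.33M/s\n"
bad = "4MiB 8.81M/s\n16KiB 117K/s\n8KiB 57K/s\n"
print(rate_card(good), rate_card(bad))
```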
Arnd
Some more points:
On Friday 08 July 2011 00:37:35 AJ ONeal wrote:
On Wed, Jun 29, 2011 at 11:48 AM, AJ ONeal coolaj86@gmail.com wrote:
I have a few inter-related issues:
Why would one kernel boot a card that another kernel can't?
Possibly the controller is set up in a slightly different way, resulting in bad timings or other problems.
Why would a card's disk geometry matter for boot?
The card's geometry is calculated from the partition table, it's not actually a property of the card. If you have identical cards that report different geometry, the reasons may be:
* they were partitioned differently
* some cards are read incorrectly; if the kernel interprets random data as a partition table, you get random geometry
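A quick arithmetic check is consistent with this. The sketch below takes the three geometries reported earlier in the thread and the capacity from the kernel log (8018460672 bytes) and compares the sector counts. Each geometry is just a different way of factoring roughly the same number of sectors, which is what you would expect if it comes from the partitioning tool rather than from the card.

```python
SECTOR_SIZE = 512
CARD_BYTES = 8018460672  # "detected capacity change" value from the kernel log
CARD_SECTORS = CARD_BYTES // SECTOR_SIZE

# (heads, sectors per track, cylinders) as reported by fdisk -l in the thread
geometries = [
    (255, 63, 974),
    (2, 4, 1957632),
    (247, 62, 1022),
]

def chs_sectors(heads, sectors_per_track, cylinders):
    """Total sectors addressable with a given CHS geometry."""
    return heads * sectors_per_track * cylinders

for geometry in geometries:
    total = chs_sectors(*geometry)
    unused = CARD_SECTORS - total
    print(geometry, total, "sectors,", unused, "sectors left over")
```

The 2/4/1957632 geometry covers the card exactly, while the other two leave a few thousand sectors at the end unaddressed. None of them describes anything physical about the flash.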
Who is a good manufacturer for getting hardware-identical cards in bulk?
This is hard. I suggest you first start with another equally hard question though: what is a manufacturer that can provide actually working cards?
I would recommend using only those that make both the controller chips and the flash chips, which basically leaves
* Toshiba
* Samsung
* Sandisk (cooperating with Toshiba)
* Lexar (Micron/Intel)
Contact them directly to get a sample, do an extensive test the way I described, document the results and buy a lot from them.
There are also companies like Swissbit that know what they are doing and charge a lot extra for providing you with cards that are actually made for running with Linux.
How can I probe the actual "disk geometry" of an sd card?
It's complicated. For 8GB cards and larger, there is only one possible correct geometry: 63/255/1023. fdisk knows that and should do it correctly. On 4 GB cards, there are multiple options. You can either stick with the geometry that was originally used to partition the card, or you can also use 63/255/xxx, with xxx being smaller than 1023 in that case.
HOWEVER: Do not align partitions to full cylinders, even though that is the default in old versions of fdisk. Doing that will result in low performance and cards breaking faster when writing to them. The only correct way to partition an SD card is to align each partition to 4MB (8192 sectors). Old versions of xloader couldn't deal with that, so you might have to update xloader, or alternatively start the boot partition at sector 63 anyway (and then never write to it).
We fixed this in linaro-media-create for the 11.06 release, but the gumstix wiki and many other places still describe a procedure that *will* *eat* *your* *data*:
http://wiki.gumstix.org/index.php?title=Boot_from_MMC#Partition_the_card
http://www.gumstix.org/create-a-bootable-microsd-card.html
http://fastr.github.com/articles/Partition-MicroSDHC-for-Gumstix-Overo.html
http://www.omappedia.org/wiki/Android_Video_Run_From_SD_Tutorial
The reason for this is simply that flash pages on the card have a size between 2KB and 32KB depending on how dumb the controller is. Writing data that is not aligned to the pages requires expensive read-modify-write operations, and a lot of cards just end up doing many extra erase cycles.
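The effect of a misaligned partition start can be illustrated with a toy model. The sketch below assumes 4 KB file system blocks and 4 KB flash pages (one value inside the 2KB to 32KB range mentioned above) and counts how many pages a single block write touches. Real controllers are more complicated, so this only shows the alignment arithmetic; the function name is hypothetical.

```python
SECTOR_SIZE = 512

def pages_touched(start_sector, block_index, block_size=4096, page_size=4096):
    """Number of flash pages overlapped by one file system block.

    start_sector is the first sector of the partition; block_index is the
    block's position within the partition.
    """
    first_byte = start_sector * SECTOR_SIZE + block_index * block_size
    last_byte = first_byte + block_size - 1
    return last_byte // page_size - first_byte // page_size + 1

# Partition starting at sector 63 (old cylinder alignment):
print(pages_touched(63, 0))    # every block straddles two pages
# Partition starting at sector 8192 (4 MB alignment):
print(pages_touched(8192, 0))  # each block fits in exactly one page
```

In this model, a partition starting at sector 63 makes every 4 KB write touch two pages that are each only partly overwritten, so the controller has to read and rewrite both.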
Someone should really start documenting a correct procedure in all of those places. What you really need to do using fdisk is to disable compat mode ('c'), set sector addressing mode ('u'), and create the partitions each at a multiple of 8192 sectors.
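To work out the start sectors to type into fdisk, something like the following sketch may help. The partition sizes are made-up examples, the helper names are hypothetical, and starting the first partition at sector 8192 is an assumption (any multiple of 8192 satisfies the rule above).

```python
ALIGN_SECTORS = 8192  # 4 MB expressed in 512-byte sectors
SECTOR_SIZE = 512

def align_up(sector, alignment=ALIGN_SECTORS):
    """Round a sector number up to the next multiple of `alignment`."""
    return -(-sector // alignment) * alignment

def plan_partitions(sizes_mb, first_start=ALIGN_SECTORS):
    """Return (start_sector, sector_count) per partition, every start 4 MB aligned."""
    layout = []
    start = first_start
    for size_mb in sizes_mb:
        count = size_mb * 1024 * 1024 // SECTOR_SIZE
        layout.append((start, count))
        # The next partition begins at the first aligned sector after this one.
        start = align_up(start + count)
    return layout

# Hypothetical layout: a 70 MB boot partition followed by a 2048 MB root partition.
layout = plan_partitions([70, 2048])
for start, count in layout:
    print(f"start={start} end={start + count - 1}")
```

Because 70 MB is not a multiple of 4 MB, the second partition is pushed up from sector 151552 to 155648, leaving a small unused gap in exchange for an aligned start.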
Arnd