All,
I rely on your greater knowledge to help me understand these questions. Thanks in advance.
1) GPT and block size
GPT uses LBAs for its data structures. The size of a block has historically been 512B but is moving to larger sizes (4KB). The code needs to handle this on a per-device-mount basis. How does the driver know the block size used for the LBAs?
1A) By querying the device
1B) Some MBR magic?
If 1A then that means to me that dd if=/dev/sdb of=/dev/sdc won't produce a usable image on sdc if its block size is different than sdb's.
(Of course I also assume that the total space on sdc is also == or > than that of sdb. Which brings me to ...)
2) Can GPT be grown?
In the above example, if sdc is much bigger than sdb, I presume this is OK, at least as long as the GPT header in LBA1 passes its CRC. Mounters won't query the drive size and refuse to mount the GPT just because it does not cover the whole disk, right?
Now what happens if LBA1 becomes corrupted? Does the driver then query the drive size and block size and look at drive_size-block_size for the backup GPT header? Again, does it use the block size from the device or does it try something else? (I suppose it could try several block sizes until it found a good CRC. However, it does seem that it must assume that the redundant copy is at the end of the physical disk.)
So even if the GPT is "mounted" OK, the extra space on the drive is not usable, even for new partitions. Are there utilities that will "grow" the GPT? Such growing would find the new end of disk and move the redundant GPT table & header there.
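To make question 2 concrete, here is a minimal sketch of the arithmetic involved. It assumes the backup header is expected in the last LBA of the device and that the primary header's AlternateLBA field records where the writer actually put it. The function name and its return shape are made up for illustration only.

```python
def backup_header_gap(disk_size_bytes, block_size, alternate_lba):
    """Return (last_lba, unused_blocks) for a disk whose primary GPT header
    says the backup header lives at alternate_lba (hypothetical helper)."""
    if disk_size_bytes % block_size:
        raise ValueError("disk size is not a whole number of blocks")
    last_lba = disk_size_bytes // block_size - 1
    if alternate_lba > last_lba:
        raise ValueError("AlternateLBA is past the end of the disk")
    # Blocks between the recorded backup header and the real end of the disk.
    return last_lba, last_lba - alternate_lba


# A 2GB image written with 512B blocks, copied onto a 16GB card.
image_blocks = 2 * 1024**3 // 512
last_lba, unused = backup_header_gap(16 * 1024**3, 512, image_blocks - 1)
print(last_lba, unused)
```

A non-zero gap means the backup header is no longer at the end of the disk. As far as I understand it, a "grow" step would have to rewrite the backup header and table at last_lba and update AlternateLBA, LastUsableLBA and the header CRCs to match.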
3) Is it actually required that the partition array start at LBA2?
If not, then it would be possible to create a GPT assuming 512B blocks but allow it to be "re-block sized" later by leaving 7 512B blocks free before the table. Of course the partitions themselves should be aligned and sized to multiples of the max block size expected. This is probably already done, as you would want them to align to the preferred read/write sizes, and those will almost certainly be larger than 512B.
Why?
The main case I am thinking about is:
wget http://downloads.new-wizbang-os.org/images/latest/aarch64-disk.img
dd if=aarch64-disk.img of=/dev/my-usb-sd-adapter
Then boot the image and the OS will resize the GPT and last filesystem to cover the 16GB of my SD card even though they only require a minimum size of 2GB.
Thanks, Bill
---------------- William A. Mills Chief Technologist, Open Solutions, SDO Texas Instruments, Inc. 20450 Century Blvd Germantown MD 20878 240-643-0836
Looking at this a bit more:
A disk mounter / reader can determine the LBA size of the writer by:
- Verify the MBR signature at byte offset 510 of the disk
- Verify the MBR partition type is protective (or valid hybrid if desired)
- Search for the GPT EFI header signature starting at byte 512 (Search at offsets that are powers of two? or that are multiples of 512?)
- The offset of the signature is the LBA size.
- Verify MyLBA is 1
The standard specifies that any data in LBA 0 past the 512 byte offset mark is filled with 0s. This should ensure no false matches, assuming people actually do it.
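The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the list of candidate LBA sizes is my own guess (powers of two from 512 to 4096), the function name is hypothetical, and the header CRC is not checked, which a real reader would also want to do.

```python
import struct

GPT_SIGNATURE = b"EFI PART"
# Assumption: only these writer LBA sizes are worth probing.
CANDIDATE_LBA_SIZES = (512, 1024, 2048, 4096)


def detect_writer_lba_size(image):
    """Guess the LBA size used by whoever wrote the GPT in `image` (bytes)."""
    # Step 1: MBR boot signature at byte offset 510.
    if len(image) < 512 or image[510:512] != b"\x55\xaa":
        return None
    # Step 2: one of the four MBR entries must be protective (type 0xEE).
    types = [image[446 + 16 * i + 4] for i in range(4)]
    if 0xEE not in types:
        return None
    # Steps 3-5: the offset of the header signature is the LBA size,
    # provided the header also says MyLBA == 1 (byte offset 24 in the header).
    for size in CANDIDATE_LBA_SIZES:
        header = image[size:size + 92]
        if len(header) < 92 or header[:8] != GPT_SIGNATURE:
            continue
        (my_lba,) = struct.unpack_from("<Q", header, 24)
        if my_lba == 1:
            return size
    return None
```

Probing a short fixed list keeps the search cheap; searching every multiple of 512 would work the same way with a longer loop.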
So it should be possible to mount a GPT written by a writer with a different LBA size. Now does anyone do it?
WRT leaving room to adjust an existing GPT for a different block size: space could be left after the last table entry and before the first partition. This can be represented in FirstUsableLBA, if allowed.
Bill
On Sun, Jul 01, 2018 at 10:37:49AM -0400, William Mills wrote:
All,
I rely on your greater knowledge to help me understand these questions. Thanks in advance.
- GPT and block size
1A) By querying the device
1B) Some MBR magic?
There are some comments in the fdisk man page saying that recent Linux kernels "just know" the sector size, and the code that handles GPT partitions in the kernel (block/partitions/efi.c) will error out if MyLBA does not match the LBA the kernel thinks it is. This means that (unless there's some fallback code at a layer above the partition parsing code) if you copy a GPT to a disk with a different sector size it will be broken.
Not sure that matters much though: if you want to fix it up you would arrange for the fixup logic to be part of your initramfs.
- Can GPT be grown?
If the backup table is not found at the end of the disk then Linux will log in the dmesg trace that the partition table is damaged but I think will use it nevertheless.
Tools like fdisk are typically "uneasy" when they cannot find the backup GPT header and will offer to recreate it if you let them. IIRC it basically marks the partition table dirty regardless of whether you have changed it or not (so that it will get updated if you write-and-exit).
- Is it actually required that the partition array start at LBA2?
I don't think so, although you'd probably have to author it (or modify a template) by hand.
Assuming the code to validate the primary and backup partition tables is shared (e.g. properly decomposed into functions) the code will naturally end up honouring PartitionEntryLBA.
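As a rough illustration of what "honouring PartitionEntryLBA" means, here is a minimal sketch of a header parser. The field order follows the 92-byte GPT header layout as I understand it from the UEFI spec; the names GptHeader, parse_gpt_header and partition_array_offset are made up for this example and are not taken from the kernel or any tool.

```python
import struct
import zlib
from collections import namedtuple

GptHeader = namedtuple(
    "GptHeader",
    "signature revision header_size header_crc32 reserved my_lba "
    "alternate_lba first_usable_lba last_usable_lba disk_guid "
    "partition_entry_lba num_entries entry_size entries_crc32",
)
HEADER_FORMAT = "<8sIIIIQQQQ16sQIII"  # 92 bytes, little-endian


def parse_gpt_header(block):
    """Parse and CRC-check a GPT header held in one logical block (bytes)."""
    hdr = GptHeader._make(struct.unpack_from(HEADER_FORMAT, block))
    if hdr.signature != b"EFI PART":
        raise ValueError("missing GPT signature")
    if not 92 <= hdr.header_size <= len(block):
        raise ValueError("implausible header size")
    # The CRC covers header_size bytes with the CRC field itself zeroed.
    raw = bytearray(block[:hdr.header_size])
    raw[16:20] = b"\x00\x00\x00\x00"
    if zlib.crc32(bytes(raw)) != hdr.header_crc32:
        raise ValueError("header CRC mismatch")
    return hdr


def partition_array_offset(hdr, block_size):
    """Byte offset of the partition entry array: taken from the header,
    not assumed to be LBA 2."""
    return hdr.partition_entry_lba * block_size
```

The same two functions would serve for the backup header, which is the sharing I had in mind: nothing in them hard-codes LBA 2.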
BTW this last question made me realize that:
a) one of the boards we've always believed to have a boot ROM that mandated MBR might just have a workaround
b) I might have overlooked something in the EBBR text about protective partitioning (a.k.a. is it OK to place the system firmware outside the FirstUsableLBA).
Daniel.
PS Is this merely of academic (or vendor) interest or are you cooking up some crazy addendum for EBBR?
On 07/02/2018 12:15 PM, Daniel Thompson wrote:
On Sun, Jul 01, 2018 at 10:37:49AM -0400, William Mills wrote:
All,
I rely on your greater knowledge to help me understand these questions. Thanks in advance.
- GPT and block size
1A) By querying the device
1B) Some MBR magic?
There are some comments in the fdisk man page saying that recent Linux kernels "just know" the sector size, and the code that handles GPT partitions in the kernel (block/partitions/efi.c) will error out if MyLBA does not match the LBA the kernel thinks it is. This means that (unless there's some fallback code at a layer above the partition parsing code) if you copy a GPT to a disk with a different sector size it will be broken.
I'm not sure we have seen 4K block devices much in the wild yet have we? I think most vendors are just publishing suggested read & write sizes and leaving the "block size" set at 512.
(I don't really know why the LBA size needs to change in the first place. Is 16,777,216 TB not enough for a few years? Drives already publish enough info for OSes not to do dumb things.)
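For reference, a rough sketch of the arithmetic behind that figure. 16,777,216 TB is 2^64 bytes when counted in binary terabytes. Assuming the GPT's 64-bit LBA fields count blocks rather than bytes, the addressable range with 512B blocks is larger again, so address space alone does not seem to force a bigger LBA. The helper name is made up for illustration.

```python
TIB = 2**40  # one binary terabyte, in bytes


def addressable_tib(lba_bits, block_size):
    """Capacity in TiB reachable with lba_bits-wide block addresses."""
    return (2**lba_bits * block_size) // TIB


print(2**64 // TIB)               # 2^64 bytes expressed in TiB
print(addressable_tib(64, 512))   # 64-bit LBAs, 512B blocks
print(addressable_tib(64, 4096))  # 64-bit LBAs, 4KB blocks
```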
Not sure that matters much though: if you want to fix it up you would arrange for the fixup logic to be part of your initramfs.
Yes, initramfs would be a good place to fix this. But it means firmware must deal with it. We can make U-boot handle this but what does tianocore do?
- Can GPT be grown?
If the backup table is not found at the end of the disk then Linux will log in the dmesg trace that the partition table is damaged but I think will use it nevertheless.
Tools like fdisk are typically "uneasy" when they cannot find the backup GPT header and will offer to recreate it if you let them. IIRC it basically marks the partition table dirty regardless of whether you have changed it or not (so that it will get updated if you write-and-exit).
fdisk is uneasy if it can't find it via AlternateLBA or is uneasy if that is not the end of the disk?
Yesterday I did find language in the UEFI spec (5.3.2 GPT Header) that talks about what happens when a volume grows so it is an expected case. (They were talking about RAID disks but the same principle applies.)
The wording in the spec is a bit strange. It says it's up to platform policy whether it automatically restores the primary GPT without asking the user, but then says it should ask the user. It is not clear if they are talking about the UEFI firmware, the OS during normal boot, or a disk tool like fdisk.
- Is it actually required that the partition array start at LBA2?
I don't think so, although you'd probably have to author it (or modify a template) by hand.
Assuming the code to validate the primary and backup partition tables is shared (e.g. properly decomposed into functions) the code will naturally end up honouring PartitionEntryLBA.
BTW this last question made me realize that:
a) one of the boards we've always believed to have a boot ROM that mandated MBR might just have a workaround
b) I might have overlooked something in the EBBR text about protective partitioning (a.k.a. is it OK to place the system firmware outside the FirstUsableLBA).
Daniel.
PS Is this merely of academic (or vendor) interest or are you cooking up some crazy addendum for EBBR?
I don't think this is academic at all. If the size of LBA is going to start changing on devices we see in the field, we should understand the consequences.
The instructions for boards today are to use dd or Win32DiskWriter. This works if you're writing to a USB stick, an SD card, a hard disk, or an SSD. It works if the image provider is supplying a whole hard-disk-like image or an iso.
The instructions for most OSes, even for x86, are to download the .iso and dd it to a USB stick. (Actually, using a CD/DVD does take extra software.)
If dd works for the legacy boot methods but EBBR compliance requires a special USB writer, then I would assume everyone would just stay with the legacy stuff.
Perhaps it will only be SSDs that change the LBA size or perhaps no one will. However, I think I did see wording in the eMMC spec about the block size changing in the future. Does that mean SD will change also?
Even if the block size changes will the OS layers hide it? The real sector size on CDs is 2048 but linux reports 512 to me.
I am still trying to figure out if a real issue exists or will soon exist. If this issue is real, I think it should be addressed in UEFI but if not there then in EBBR. We move "disks" around a lot more than other people do.
Bill
On 07/02/2018 02:40 PM, William Mills wrote:
On 07/02/2018 12:15 PM, Daniel Thompson wrote:
On Sun, Jul 01, 2018 at 10:37:49AM -0400, William Mills wrote:
All,
I rely on your greater knowledge to help me understand these questions. Thanks in advance.
- GPT and block size
1A) By querying the device
1B) Some MBR magic?
There are some comments in the fdisk man page saying that recent Linux kernels "just know" the sector size, and the code that handles GPT partitions in the kernel (block/partitions/efi.c) will error out if MyLBA does not match the LBA the kernel thinks it is. This means that (unless there's some fallback code at a layer above the partition parsing code) if you copy a GPT to a disk with a different sector size it will be broken.
I'm not sure we have seen 4K block devices much in the wild yet have we?
https://en.wikipedia.org/wiki/Advanced_Format """Since April 2014, enterprise-class 4K native hard disk drives have been available on the market."""
On Mon, Jul 02, 2018 at 02:40:50PM -0400, William Mills wrote:
Not sure that matters much though: if you want to fix it up you would arrange for the fixup logic to be part of your initramfs.
Yes, initramfs would be a good place to fix this. But it means firmware must deal with it. We can make U-boot handle this but what does tianocore do?
Good point.
- Can GPT be grown?
If the backup table is not found at the end of the disk then Linux will log in the dmesg trace that the partition table is damaged but I think will use it nevertheless.
Tools like fdisk are typically "uneasy" when they cannot find the backup GPT header and will offer to recreate it if you let them. IIRC it basically marks the partition table dirty regardless of whether you have changed it or not (so that it will get updated if you write-and-exit).
fdisk is uneasy if it can't find it via AlternateLBA or is uneasy if that is not the end of the disk?
Both.
Yesterday I did find language in the UEFI spec (5.3.2 GPT Header) that talks about what happens when a volume grows so it is an expected case. (They were talking about RAID disks but the same principle applies.)
The wording in the spec is a bit strange. It says it's up to platform policy whether it automatically restores the primary GPT without asking the user, but then says it should ask the user. It is not clear if they are talking about the UEFI firmware, the OS during normal boot, or a disk tool like fdisk.
<snip>
PS Is this merely of academic (or vendor) interest or are you cooking up some crazy addendum for EBBR?
I don't think this is academic at all. If the size of LBA is going to start changing on devices we see in the field, we should understand the consequences.
I suspected it wouldn't merely be academic when you started asking the questions ;-). Just wanted to check.
The instructions for boards today are to use dd or Win32DiskWriter. This works if you're writing to a USB stick, an SD card, a hard disk, or an SSD. It works if the image provider is supplying a whole hard-disk-like image or an iso.
The instructions for most OSes, even for x86, are to download the .iso and dd it to a USB stick. (Actually, using a CD/DVD does take extra software.)
If dd works for the legacy boot methods but EBBR compliance requires a special USB writer, then I would assume everyone would just stay with the legacy stuff.
Perhaps it will only be SSDs that change the LBA size or perhaps no one will. However, I think I did see wording in the eMMC spec about the block size changing in the future. Does that mean SD will change also?
Even if the block size changes will the OS layers hide it? The real sector size on CDs is 2048 but linux reports 512 to me.
I am still trying to figure out if a real issue exists or will soon exist. If this issue is real, I think it should be addressed in UEFI but if not there then in EBBR. We move "disks" around a lot more than other people do.
Still thinking on this one.
It feels like we might be able to simplify this down to removable media only. We don't really throw full disk images around much for fixed media any more... tools like fastboot (which reads partition tables and also has special case logic to author them) encourage partition-at-a-time images.
If we can simplify down to removable media only then the question becomes will removable media use 4K addressing... and if it does will there already be a solution for us to copy from the non-embedded world?
Daniel.
On 02.07.18 20:40, William Mills wrote:
On 07/02/2018 12:15 PM, Daniel Thompson wrote:
On Sun, Jul 01, 2018 at 10:37:49AM -0400, William Mills wrote:
All,
I rely on your greater knowledge to help me understand these questions. Thanks in advance.
- GPT and block size
1A) By querying the device
1B) Some MBR magic?
There are some comments in the fdisk man page saying that recent Linux kernels "just know" the sector size, and the code that handles GPT partitions in the kernel (block/partitions/efi.c) will error out if MyLBA does not match the LBA the kernel thinks it is. This means that (unless there's some fallback code at a layer above the partition parsing code) if you copy a GPT to a disk with a different sector size it will be broken.
I'm not sure we have seen 4K block devices much in the wild yet have we? I think most vendors are just publishing suggested read & write sizes and leaving the "block size" set at 512.
FWIW most (all?) hard drive vendors went back to 512 logical, 4k physical. But let me CC Hannes to confirm this.
(I don't really know why the LBA size needs to change in the first place. Is 16,777,216 TB not enough for a few years? Drives already publish enough info for OSes not to do dumb things.)
Not sure that matters much though: if you want to fix it up you would arrange for the fixup logic to be part of your initramfs.
Yes, initramfs would be a good place to fix this. But it means firmware must deal with it. We can make U-boot handle this but what does tianocore do?
This is pretty much how we do all our images today. It works in both Tianocore and U-Boot. We also do resize the image on first boot to the actual target disk size inside initramfs. Works like a charm.
I don't think this is academic at all. If the size of LBA is going to start changing on devices we see in the field, we should understand the consequences.
From what I can tell it's not going to change any time soon. Even my shiny NVMe shows up as 512 byte sector size.
The only case where I'm not sure which direction we'll see things moving is NV-DIMMs. There, running with PAGE_SIZE == LBA size seems intuitive.
The instructions for boards today are to use dd or Win32DiskWriter. This works if you're writing to a USB stick, an SD card, a hard disk, or an SSD. It works if the image provider is supplying a whole hard-disk-like image or an iso.
The instructions for most OSes, even for x86, are to download the .iso and dd it to a USB stick. (Actually, using a CD/DVD does take extra software.)
If dd works for the legacy boot methods but EBBR compliance requires a special USB writer, then I would assume everyone would just stay with the legacy stuff.
Perhaps it will only be SSDs that change the LBA size or perhaps no one will. However, I think I did see wording in the eMMC spec about the block size changing in the future. Does that mean SD will change also?
Even if the block size changes will the OS layers hide it? The real sector size on CDs is 2048 but linux reports 512 to me.
That's because the iso9660 driver in Linux (and U-Boot) simply ignores the actual sector size ;).
I am still trying to figure out if a real issue exists or will soon exist. If this issue is real, I think it should be addressed in UEFI but if not there then in EBBR. We move "disks" around a lot more than other people do.
Yes, let's double check with Hannes :).
Alex
On 07/02/2018 11:58 PM, Alexander Graf wrote:
On 02.07.18 20:40, William Mills wrote:
On 07/02/2018 12:15 PM, Daniel Thompson wrote:
On Sun, Jul 01, 2018 at 10:37:49AM -0400, William Mills wrote:
All,
I rely on your greater knowledge to help me understand these questions. Thanks in advance.
- GPT and block size
1A) By querying the device
1B) Some MBR magic?
There are some comments in the fdisk man page saying that recent Linux kernels "just know" the sector size, and the code that handles GPT partitions in the kernel (block/partitions/efi.c) will error out if MyLBA does not match the LBA the kernel thinks it is. This means that (unless there's some fallback code at a layer above the partition parsing code) if you copy a GPT to a disk with a different sector size it will be broken.
I'm not sure we have seen 4K block devices much in the wild yet have we? I think most vendors are just publishing suggested read & write sizes and leaving the "block size" set at 512.
FWIW most (all?) hard drive vendors went back to 512 logical, 4k physical. But let me CC Hannes to confirm this.
This 'just knows' is in fact the driver querying the device prior to reading the GPT table. Reading the GPT table out of necessity means that you _already_ know the block size; you read from LBA 0 _and_ you need to know how large the buffer for LBA 0 needs to be, otherwise you might end up with a buffer too small for holding an entire block, and the read will fail...
So really it's quite pointless to have it specified in the GPT, as by that time you already need to know the blocksize.
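To illustrate the ordering (query first, then read), here is a minimal Linux-only sketch. It assumes the logical block size is exposed through the sysfs attribute queue/logical_block_size under /sys/block/<disk>; the helper names and the sysfs_root parameter (there only so the function can be pointed at a fake tree) are made up for the example.

```python
import os


def logical_block_size(disk, sysfs_root="/sys/block"):
    """Logical block size in bytes that the kernel reports for e.g. 'sda'."""
    path = os.path.join(sysfs_root, disk, "queue", "logical_block_size")
    with open(path) as f:
        return int(f.read().strip())


def read_lba(dev_path, lba, block_size):
    """Read exactly one logical block; the buffer size comes from the
    device query, not from anything stored in the GPT."""
    with open(dev_path, "rb") as dev:
        dev.seek(lba * block_size)
        return dev.read(block_size)
```

For example, read_lba("/dev/sda", 1, logical_block_size("sda")) would fetch the primary GPT header, assuming the caller has permission to read the device.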
And most drives went back to 512e as the vendors figured that most BIOSes can't handle a 4k sector size, and having disks which can't be used for booting is kinda pointless for the mass market, where most systems only have one disk ...
(I don't really know why the LBA size needs to change in the first place. Is 16,777,216 TB not enough for a few years? Drives already publish enough info for OSes not to do dumb things.)
Simple physics. You need some guard area between each (physical) block, so that the drive head can tell one block from the other. So if you increase the blocksize you lower the ratio of guard area to data area, which means you can stuff more data onto the same surface area. I.e., you increase the capacity of the drive without having to change anything else.
Not sure that matters much though: if you want to fix it up you would arrange for the fixup logic to be part of your initramfs.
Yes, initramfs would be a good place to fix this. But it means firmware must deal with it. We can make U-boot handle this but what does tianocore do?
This is pretty much how we do all our images today. It works in both Tianocore and U-Boot. We also do resize the image on first boot to the actual target disk size inside initramfs. Works like a charm.
Tianocore should be handling things correctly already.
- Is it actually required that the partition array start at LBA2?
I don't think so, although you'd probably have to author it (or modify a template) by hand.
To quote the UEFI 2.7 spec:
The start of the GPT Partition Entry Array is located at the LBA indicated by the Partition Entry LBA field.
So, no.
I don't think this is academic at all. If the size of LBA is going to start changing on devices we see in the field, we should understand the consequences.
From what I can tell it's not going to change any time soon. Even my shiny NVMe shows up as 512 byte sector size.
The only case where I'm not sure which direction we'll see things moving are NV-DIMMs. There running with PAGE_SIZE == LBA size seems intuitive.
The instructions for boards today are to use dd or Win32DiskWriter. This works if you're writing to a USB stick, an SD card, a hard disk, or an SSD. It works if the image provider is supplying a whole hard-disk-like image or an iso.
The instructions for most OSes, even for x86, are to download the .iso and dd it to a USB stick. (Actually, using a CD/DVD does take extra software.)
If dd works for the legacy boot methods but EBBR compliance requires a special USB writer, then I would assume everyone would just stay with the legacy stuff.
Perhaps it will only be SSDs that change the LBA size or perhaps no one will. However, I think I did see wording in the eMMC spec about the block size changing in the future. Does that mean SD will change also?
Even if the block size changes will the OS layers hide it? The real sector size on CDs is 2048 but linux reports 512 to me.
That's because the iso9660 driver in Linux (and U-Boot) simply ignores the actual sector size ;).
No. Linux is using 512 byte blocks internally within the block layer. Only the lower-level drivers know about the physical sector size stuff. (Check my patch to support 4k physical sector size in loop.c. It just changes the setting for the physical sector size, leaving the driver otherwise completely unchanged.)
I am still trying to figure out if a real issue exists or will soon exist. If this issue is real, I think it should be addressed in UEFI but if not there then in EBBR. We move "disks" around a lot more than other people do.
Yes, let's double check with Hannes :).
We have been using 4k sector size on S/390 since ages immemorial, so using them in linux is not a problem per se.
Cheers,
Hannes
On 07/03/2018 02:52 AM, Hannes Reinecke wrote:
On 07/02/2018 11:58 PM, Alexander Graf wrote:
On 02.07.18 20:40, William Mills wrote:
On 07/02/2018 12:15 PM, Daniel Thompson wrote:
On Sun, Jul 01, 2018 at 10:37:49AM -0400, William Mills wrote:
All,
I rely on your greater knowledge to help me understand these questions. Thanks in advance.
- GPT and block size 1A) By querying the device 1B) Some MBR magic?
There are some comments in the fdisk man page that recent Linux kernels "just know" the sector size, and the code to work with GPT partitions in the kernel (block/partitions/efi.c) will error out if MyLBA does not match the LBA the kernel thinks it is. This means that (unless there's some fallback code at a layer above the partition parsing code) if you copy a GPT to a disk with a different sector size it will be broken.
I'm not sure we have seen 4K block devices much in the wild yet have we? I think most vendors are just publishing suggested read & write sizes and leaving the "block size" set at 512.
FWIW most (all?) hard drive vendors went back to 512 logical, 4k physical. But let me CC Hannes to confirm this.
This is good to know. Thanks.
This 'just knows' is in fact the driver querying the device prior to reading the GPT table. Reading the GPT table out of necessity means that you _already_ know the block size; you read from LBA 0 _and_ you need to know how large the buffer for LBA 0 needs to be, otherwise you might end up with a buffer too small for holding an entire block, and the read will fail...
Yes, of course you need to query the device's block size to read data.
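As a rough illustration of "querying the device" from user space on Linux, the logical block size is exposed in sysfs. The helper name and the `sysfs_root` parameter below are made up for this sketch; only the `queue/logical_block_size` attribute is the kernel's.

```python
from pathlib import Path


def logical_block_size(dev_name: str, sysfs_root: str = "/sys/block") -> int:
    """Return the logical block size (bytes) Linux reports for a whole disk.

    dev_name is the kernel name of the disk, e.g. "sda" (hypothetical example).
    sysfs_root is a parameter only so the function can be exercised without
    real hardware.
    """
    attr = Path(sysfs_root) / dev_name / "queue" / "logical_block_size"
    return int(attr.read_text().strip())
```

On a 512e drive this would be expected to return 512, and on a 4Kn device 4096.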
However, *IF* you wanted to handle GPT images from a different LBA size you could easily do it. For example, if you wanted to cover LBA sizes of 512, 1K, 2K, and 4K you would:
- query the device's native block size
- read enough blocks to cover 8K
If the native block size was 4K you would expect:
- MBR signature at byte offset 510
- EFI GPT signature at byte offset 4096
- All bytes from 512 through 4095 inclusive to be 0 (UEFI spec)
However if you did not find the EFI GPT signature at 4096, you could also check 512, 1024, and 2048. If found, you would know you were dealing with a non-native GPT image and you would have the data needed to interpret the rest of the GPT.
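The probing idea described above could be sketched roughly as follows. This is not what any existing firmware or kernel does (the current spec disallows it); the function name and the candidate list are assumptions for illustration, and a real implementation would also verify the header CRC and MyLBA before trusting the result.

```python
GPT_SIGNATURE = b"EFI PART"
CANDIDATE_BLOCK_SIZES = (512, 1024, 2048, 4096)


def guess_gpt_block_size(first_8k: bytes, native_block_size: int):
    """Return the block size the GPT appears to be written for, or None.

    first_8k is the first 8 KiB of the device, read using its native
    block size.
    """
    # A protective (or legacy) MBR must be present in the first 512 bytes.
    if first_8k[510:512] != b"\x55\xaa":
        return None
    # Prefer the native size, then fall back to the other candidates.
    ordered = [native_block_size] + [
        bs for bs in CANDIDATE_BLOCK_SIZES if bs != native_block_size
    ]
    for bs in ordered:
        if first_8k[bs:bs + 8] != GPT_SIGNATURE:
            continue
        # Everything between the end of the MBR and the header should be zero.
        if any(first_8k[512:bs]):
            continue
        return bs
    return None
```

If the returned size differs from the native size, the image is a non-native GPT and every LBA field in it would have to be rescaled before use.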
Of course this does not help you if someone was actually dumb enough to create partitions that are not 4K aligned. Any partition not aligned to the device's actual block size should not be used. I don't care about that. I care about GPT images that were carefully constructed to be portable.
I also agree that according to current specs of GPT, the above non-native handling is not required but instead is explicitly disallowed. This whole thread is me *wondering* if that should be loosened.
So really it's quite pointless to have it specified in the GPT, as by that time you already need to know the blocksize.
I don't think I suggested that it be added to the GPT header. If I implied that, I take it back. It is, in effect, already there by nature of the MBR 0 fill and the EFI GPT signature.
And most drives went back to 512e as they figured that most BIOS can't handle 4k sectorsize, and having disks which can't be used for booting are kinda pointless for the mass-market, where most systems only have one disk ...
Thanks. Again this is good to know.
(I don't really know why the LBA size needs to change in the first place. Is 16,777,216 TB not enough for a few years? Drives already publish enough info for OSes not to do dumb things.)
Simple physics. You need some guard area between each (physical) block, so that the drive head can tell one block from the other. So if you increase the blocksize you lower the ratio between data area and guard area, which means you can stuff more data onto the same surface area. IE increasing the capacity of the drive without having to change anything.
Yes, I understand the reason to change the block size on disk. I was asking why change the LBA size? In effect I was asking why 4Kn as opposed to just 512e. I think you answered this above when you said most manufacturers went back to 512e.
Not sure that matters much though: if you want to fix it up you would arrange for the fixup logic to be part of your initramfs.
Yes, initramfs would be a good place to fix this. But it means firmware must deal with it. We can make U-boot handle this but what does tianocore do?
This is pretty much how we do all our images today. It works in both Tianocore and U-Boot. We also do resize the image on first boot to the actual target disk size inside initramfs. Works like a charm.
Tianocore should be handling things correctly already.
I think Alex and you are saying that a GPT that does not cover the whole disk is handled in SUSE initramfs and tianocore. This is good information. It is particularly interesting to know you grow the GPT in initramfs.
But I don't think either is handling a GPT written for 512 LBA found on a 4Kn drive. I did not really expect this, and your other answer makes it clear this case is not handled today.
- Can GPT be grown?
If the backup table is not found at the end of the disk then Linux will log in the dmesg trace that the partition table is damaged but I think will use it nevertheless.
Tools like fdisk are typically "uneasy" when they cannot find the backup GPT header and will offer to recreate it if you let them. IIRC it basically marks the partition table dirty regardless of whether you have changed it or not (so that it will get updated if you write-and-exit).
fdisk is uneasy if it can't find it via AlternateLBA or is uneasy if that is not the end of the disk?
Yesterday I did find language in the UEFI spec (5.3.2 GPT Header) that talks about what happens when a volume grows so it is an expected case. (They were talking about RAID disks but the same principle applies.)
The wording is a bit strange in the spec. It says it's up to platform policy whether it automatically restores the primary GPT without asking the user, but then says it should ask the user. It is not clear if they are talking about the UEFI firmware, the OS during normal boot, or a disk tool like fdisk.
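To make the "grown volume" case concrete, here is a minimal sketch of the check a tool might perform: compare AlternateLBA from the primary header with the last LBA of the device. The function name and the example sizes are made up for illustration; how a given tool reacts to the result is a separate policy question.

```python
def blocks_beyond_backup(alternate_lba: int, total_blocks: int) -> int:
    """Number of blocks between the backup GPT header and the end of the disk.

    0        -> backup header sits in the last LBA, as expected
    positive -> the disk is larger than the GPT describes (image was dd'ed
                to a bigger device, or the volume grew)
    negative -> AlternateLBA points past the end of the device
    """
    last_lba = total_blocks - 1
    return last_lba - alternate_lba


# Hypothetical example: a 4 GiB image built for 512-byte blocks,
# written to a 16 GiB device with the same block size.
image_blocks = 4 * 1024**3 // 512
disk_blocks = 16 * 1024**3 // 512
unused = blocks_beyond_backup(image_blocks - 1, disk_blocks)
```

"Growing" the GPT then amounts to rewriting the backup header and entry array at the new end of the disk and updating AlternateLBA and LastUsableLBA (plus the CRCs) in both headers.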
- Is it actually required that the partition array start at LBA2?
I don't think so, although you'd probably have to author it (or modify a template) by hand.
To quote the UEFI 2.7 spec:
The start of the GPT Partition Entry Array is located at the LBA indicated by the Partition Entry LBA field.
So, no.
Thanks. Yes I knew of the field. I had seen some source and hearsay suggesting that this had to be 2 for the primary copy but looking again I don't see this in the UEFI spec anywhere.
However, I wonder if fdisk or other partition editors will "fix" the GPT entries to be at LBA 2 if we add or delete a partition, or even if we write the table with no changes, as Daniel reported that it does for the volume-grown situation.
That won't be an issue for the use case I am talking about: offline creation of images that work on media of unbounded max size and unknown LBA size.
However it may be if Daniel is planning to stuff firmware between the GPT header and the GPT entry table.
Assuming the code to validate the primary and backup partition tables is shared (e.g. properly decomposed into functions) the code will naturally end up honouring PartitionEntryLBA.
BTW this last question made me realize that:
a) one of the boards we've always believed to have a boot ROM that mandated MBR might just have a workaround
b) I might have overlooked something in the EBBR text about protective partitioning (a.k.a. is it OK to place the system firmware outside the FirstUsableLBA).
Daniel.
Thanks for the info! Bill
On Mon, Jul 2, 2018 at 11:59 PM Alexander Graf agraf@suse.de wrote:
On 02.07.18 20:40, William Mills wrote:
On 07/02/2018 12:15 PM, Daniel Thompson wrote:
On Sun, Jul 01, 2018 at 10:37:49AM -0400, William Mills wrote:
All,
I rely on your greater knowledge to help me understand these questions. Thanks in advance.
- GPT and block size 1A) By querying the device 1B) Some MBR magic?
There are some comments in the fdisk man page that recent Linux kernels "just know" the sector size, and the code to work with GPT partitions in the kernel (block/partitions/efi.c) will error out if MyLBA does not match the LBA the kernel thinks it is. This means that (unless there's some fallback code at a layer above the partition parsing code) if you copy a GPT to a disk with a different sector size it will be broken.
I'm not sure we have seen 4K block devices much in the wild yet have we? I think most vendors are just publishing suggested read & write sizes and leaving the "block size" set at 512.
FWIW most (all?) hard drive vendors went back to 512 logical, 4k physical. But let me CC Hannes to confirm this.
(I don't really know why the LBA size needs to change in the first place. Is 16,777,216 TB not enough for a few years? Drives already publish enough info for OSes not to do dumb things.)
Not sure that matters much though: if you want to fix it up you would arrange for the fixup logic to be part of your initramfs.
Yes, initramfs would be a good place to fix this. But it means firmware must deal with it. We can make U-boot handle this but what does tianocore do?
This is pretty much how we do all our images today. It works in both Tianocore and U-Boot. We also do resize the image on first boot to the actual target disk size inside initramfs. Works like a charm.
- Can GPT be grown?
If the backup table is not found at the end of the disk then Linux will log in the dmesg trace that the partition table is damaged but I think will use it nevertheless.
Tools like fdisk are typically "uneasy" when they cannot find the backup GPT header and will offer to recreate it if you let them. IIRC it basically marks the partition table dirty regardless of whether you have changed it or not (so that it will get updated if you write-and-exit).
fdisk is uneasy if it can't find it via AlternateLBA or is uneasy if that is not the end of the disk?
Yesterday I did find language in the UEFI spec (5.3.2 GPT Header) that talks about what happens when a volume grows so it is an expected case. (They were talking about RAID disks but the same principle applies.)
The wording is a bit strange in the spec. It says it's up to platform policy whether it automatically restores the primary GPT without asking the user, but then says it should ask the user. It is not clear if they are talking about the UEFI firmware, the OS during normal boot, or a disk tool like fdisk.
- Is it actually required that the partition array start at LBA2?
I don't think so, although you'd probably have to author it (or modify a template) by hand.
Assuming the code to validate the primary and backup partition tables is shared (e.g. properly decomposed into functions) the code will naturally end up honouring PartitionEntryLBA.
BTW this last question made me realize that:
a) one of the boards we've always believed to have a boot ROM that mandated MBR might just have a workaround
b) I might have overlooked something in the EBBR text about protective partitioning (a.k.a. is it OK to place the system firmware outside the FirstUsableLBA).
Daniel.
PS Is this merely of academic (or vendor) interest or are you cooking up some crazy addendum for EBBR?
I don't think this is academic at all. If the size of LBA is going to start changing on devices we see in the field, we should understand the consequences.
From what I can tell it's not going to change any time soon. Even my shiny NVMe shows up as 512 byte sector size.
The only case where I'm not sure which direction we'll see things moving are NV-DIMMs. There running with PAGE_SIZE == LBA size seems intuitive.
The instructions for boards today are to use dd or Win32DiskWriter. This works if you're writing to a USB stick, an SD card, a hard disk, or an SSD. It works if the image provider is supplying a whole hard-disk-like image or an ISO.
The instructions for most OSes, even for x86, are to download the .iso and dd it to a USB stick. (Actually, using a CD/DVD does take extra software.)
If dd works for the legacy boot methods but EBBR compliance requires a special USB writer, then I would assume everyone would just stay with the legacy stuff.
Perhaps it will only be SSDs that change the LBA size or perhaps no one will. However, I think I did see wording in the eMMC spec about the block size changing in the future. Does that mean SD will change also?
Even if the block size changes will the OS layers hide it? The real sector size on CDs is 2048 but linux reports 512 to me.
That's because the iso9660 driver in Linux (and U-Boot) simply ignores the actual sector size ;).
I am still trying to figure out if a real issue exists or will soon exist. If this issue is real, I think it should be addressed in UEFI but if not there then in EBBR. We move "disks" around a lot more than other people do.
Yes, let's double check with Hannes :).
On Dragonboard 820c, that has on board UFS disk:
Found valid GPT with protective MBR; using GPT.
Disk /dev/sda: 6335488 sectors, 24.2 GiB
Model: THGBF7G8K4LBATRB
Sector size (logical/physical): 4096/4096 bytes
Alex _______________________________________________ boot-architecture mailing list boot-architecture@lists.linaro.org https://lists.linaro.org/mailman/listinfo/boot-architecture
On 07/03/2018 05:08 AM, Nicolas Dechesne wrote:
On Mon, Jul 2, 2018 at 11:59 PM Alexander Graf agraf@suse.de wrote:
On 02.07.18 20:40, William Mills wrote:
On 07/02/2018 12:15 PM, Daniel Thompson wrote:
On Sun, Jul 01, 2018 at 10:37:49AM -0400, William Mills wrote:
I am still trying to figure out if a real issue exists or will soon exist. If this issue is real, I think it should be addressed in UEFI but if not there then in EBBR. We move "disks" around a lot more than other people do.
Yes, let's double check with Hannes :).
On Dragonboard 820c, that has on board UFS disk:
Found valid GPT with protective MBR; using GPT. Disk /dev/sda: 6335488 sectors, 24.2 GiB Model: THGBF7G8K4LBATRB Sector size (logical/physical): 4096/4096 bytes
Interesting. Can you also "hd" the first 8K of that device? (should be <1K of data with >7k of zeros mixed in. hd suppresses repeated lines so the dump should not be too long.)
Thanks, Bill
On Tue, Jul 3, 2018 at 1:13 PM William Mills wmills@ti.com wrote:
On 07/03/2018 05:08 AM, Nicolas Dechesne wrote:
On Mon, Jul 2, 2018 at 11:59 PM Alexander Graf agraf@suse.de wrote:
On 02.07.18 20:40, William Mills wrote:
On 07/02/2018 12:15 PM, Daniel Thompson wrote:
On Sun, Jul 01, 2018 at 10:37:49AM -0400, William Mills wrote:
I am still trying to figure out if a real issue exists or will soon exist. If this issue is real, I think it should be addressed in UEFI but if not there then in EBBR. We move "disks" around a lot more than other people do.
Yes, let's double check with Hannes :).
On Dragonboard 820c, that has on board UFS disk:
Found valid GPT with protective MBR; using GPT. Disk /dev/sda: 6335488 sectors, 24.2 GiB Model: THGBF7G8K4LBATRB Sector size (logical/physical): 4096/4096 bytes
Interesting. Can you also "hd" the first 8K of that device? (should be <1K of data with >7k of zeros mixed in. hd suppresses repeated lines so the dump should not be too long.)
root@db820c:~# hd -n 8192 /dev/sda
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000001c0 02 00 ee ff ff ff 01 00 00 00 ff ab 60 00 00 00 |............`...|
000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa |..............U.|
00000200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001000 45 46 49 20 50 41 52 54 00 00 01 00 5c 00 00 00 |EFI PART....\...|
00001010 5e 01 f2 67 00 00 00 00 01 00 00 00 00 00 00 00 |^..g............|
00001020 ff ab 60 00 00 00 00 00 06 00 00 00 00 00 00 00 |..`.............|
00001030 fa ab 60 00 00 00 00 00 32 1b 10 98 e2 bb f2 4b |..`.....2......K|
00001040 a0 6e 2b b3 3d 00 0c 20 02 00 00 00 00 00 00 00 |.n+.=.. ........|
00001050 08 00 00 00 80 00 00 00 42 c6 95 4a 00 00 00 00 |........B..J....|
00001060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00002000
Thanks, Bill
I did this days ago but I guess I never sent it.
On 07/03/2018 08:04 AM, Nicolas Dechesne wrote:
On Tue, Jul 3, 2018 at 1:13 PM William Mills wmills@ti.com wrote:
On 07/03/2018 05:08 AM, Nicolas Dechesne wrote:
On Mon, Jul 2, 2018 at 11:59 PM Alexander Graf agraf@suse.de wrote:
On 02.07.18 20:40, William Mills wrote:
On 07/02/2018 12:15 PM, Daniel Thompson wrote:
On Sun, Jul 01, 2018 at 10:37:49AM -0400, William Mills wrote:
I am still trying to figure out if a real issue exists or will soon exist. If this issue is real, I think it should be addressed in UEFI but if not there then in EBBR. We move "disks" around a lot more than other people do.
Yes, let's double check with Hannes :).
On Dragonboard 820c, that has on board UFS disk:
Found valid GPT with protective MBR; using GPT. Disk /dev/sda: 6335488 sectors, 24.2 GiB Model: THGBF7G8K4LBATRB Sector size (logical/physical): 4096/4096 bytes
Interesting. Can you also "hd" the first 8K of that device? (should be <1K of data with >7k of zeros mixed in. hd suppresses repeated lines so the dump should not be too long.)
Decoding mostly for myself but you all can listen in...
root@db820c:~# hd -n 8192 /dev/sda
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
No X86 boot code
[ 000001b8 00 00 00 00 ] MBR disk sig == 0 as required
[ 000001bc 00 00 ] reserved
[ 000001be 00 00 ]
000001c0 02 00 ee ff ff ff 01 00 00 00 ff ab 60 00 00 00 |............`...|
000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
Only 1 partition
Not bootable
Starting CHS = 0x00,0x02,0x00
  C = 0
  H = 0
  S = 2 (1 based)
OSType = 0xee == GPT Protective
Ending CHS = 0xff, 0xff, 0xff
  C = 1023 (0 based, 1024 cylinders)
  H = 255 (0 based, 256 heads per cyl)
  S = 63 (1 based, 63 sectors per track)
StartingLBA = 0x0000_0001
SizeInLBA = 0x0060_abff (so EndingLBA = 0x0060_abff, since the partition starts at LBA 1)
000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa |..............U.|
MBR signature
00000200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
Yes, the rest of the MBR LBA is in fact 0 filled
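For anyone who wants to repeat the MBR part of this decode mechanically, here is a small sketch. The 16 bytes are the first partition record (offset 0x1be) copied from the dump above; the function and key names are made up for illustration.

```python
import struct

# First MBR partition record (offset 0x1be..0x1cd), copied from the dump.
RECORD = bytes.fromhex("00 00 02 00 ee ff ff ff 01 00 00 00 ff ab 60 00")


def decode_chs(raw: bytes):
    """Decode the packed 3-byte CHS field into (cylinder, head, sector)."""
    head = raw[0]
    sector = raw[1] & 0x3F                       # low 6 bits, 1 based
    cylinder = ((raw[1] & 0xC0) << 2) | raw[2]   # 10 bits, 0 based
    return cylinder, head, sector


def decode_mbr_record(record: bytes) -> dict:
    """Unpack one 16-byte legacy MBR partition record (little endian)."""
    boot, start_chs, os_type, end_chs, start_lba, size_lba = struct.unpack(
        "<B3sB3sII", record
    )
    return {
        "bootable": boot == 0x80,
        "start_chs": decode_chs(start_chs),
        "os_type": os_type,
        "end_chs": decode_chs(end_chs),
        "start_lba": start_lba,
        "size_in_lba": size_lba,
    }


part = decode_mbr_record(RECORD)
```

The last 32-bit field of the record is a size in blocks; here it equals the ending LBA only because the protective partition starts at LBA 1.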
00001000 45 46 49 20 50 41 52 54 00 00 01 00 5c 00 00 00 |EFI PART
GPT Header sig
00 00 01 00 5c 00 00 00
GPT Version 1 and header length of 92 bytes
00001010 5e 01 f2 67
Header CRC
00 00 00 00
4 bytes resv
01 00 00 00 00 00 00 00
MyLBA=1
00001020 ff ab 60 00 00 00 00 00
AlternateLBA=0x60abff = 6,335,488 "sectors" - 1 (matches above exactly)
06 00 00 00 00 00 00 00
FirstUsableLBA = 6
00001030 fa ab 60 00 00 00 00 00
LastUsableLBA=0x60abfa
So any of the LBA's 0x60ab{fb,fc,fd,fe} can contain the 1 LB of Alt partition table. (See below) (I would presume fb.) The others are presumably empty.
32 1b 10 98 e2 bb f2 4b
00001040 a0 6e 2b b3 3d 00 0c 20
DiskGUID
02 00 00 00 00 00 00 00
PartitionEntryLBA=2
00001050 08 00 00 00 80 00 00 00
NumberOfPartitionEntries=8 SizeOfPartitionEntry=0x80
8*0x80 = 1K, which fits in 1 LB (4K), so the partition entry array occupies only LBA 2. FirstUsable above is 6, so LBAs 3, 4, and 5 are free in this GPT structure. I presume this platform has firmware stuffed in there.
42 c6 95 4a
CRC of partition table
00 00 00 00> 00001060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
reserved
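The same GPT header decode can be sketched with `struct`. The 92 bytes are copied from the dump at offset 0x1000; the field order follows the header layout walked through above, while the variable names and the 4096-byte block size (taken from the reported sector size) are choices made for this sketch. The CRCs are unpacked but not verified here.

```python
import struct

# 92-byte GPT header copied from the dump (offset 0x1000..0x105b).
HEADER = bytes.fromhex(
    "45 46 49 20 50 41 52 54 00 00 01 00 5c 00 00 00"
    "5e 01 f2 67 00 00 00 00 01 00 00 00 00 00 00 00"
    "ff ab 60 00 00 00 00 00 06 00 00 00 00 00 00 00"
    "fa ab 60 00 00 00 00 00 32 1b 10 98 e2 bb f2 4b"
    "a0 6e 2b b3 3d 00 0c 20 02 00 00 00 00 00 00 00"
    "08 00 00 00 80 00 00 00 42 c6 95 4a"
)

# Little endian, no padding: 8+4+4+4+4+8+8+8+8+16+8+4+4+4 = 92 bytes.
GPT_HEADER = struct.Struct("<8sIIIIQQQQ16sQIII")

(signature, revision, header_size, header_crc, reserved,
 my_lba, alternate_lba, first_usable_lba, last_usable_lba,
 disk_guid, entry_lba, num_entries, entry_size,
 entries_crc) = GPT_HEADER.unpack(HEADER)

BLOCK_SIZE = 4096  # logical sector size reported for this device

# How many logical blocks the partition entry array occupies (rounded up).
array_bytes = num_entries * entry_size
array_blocks = -(-array_bytes // BLOCK_SIZE)

# LBAs between the end of the entry array and the first usable LBA.
free_lbas = list(range(entry_lba + array_blocks, first_usable_lba))
```

With these bytes the entry array is 1024 bytes long, so it fits in the single block at LBA 2 and leaves LBAs 3 through 5 unaccounted for.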
----- Even Deeper Dive
WRT EndingCHS in the MBR, the spec says:
""" Set to the CHS address of the last logical block on the disk. Set to 0xFFFFFF if it is not possible to represent the value in this field """
The max disk size for 512-byte sectors in CHS is 7.8 GB (1024 cylinders x 256 heads x 63 sectors). The CHS limit for 4K sectors should be 8 times that, about 63 GB. However, assuming 63 sectors and 256 heads you cannot represent the exact size of this disk: 392 cylinders is too small and 393 is too big. I believe that the ending head and ending sector are also used as the max head and max sector for the whole disk. This means you cannot represent a fractional cylinder.
(This assumption had escaped me until now. I never understood how fdisk would let me change the number of "heads" and "sectors" on a USB stick.)
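The cylinder arithmetic can be checked in a few lines. The geometry (256 heads, 63 sectors per track) is the assumption stated above, and the sector count is the one reported for /dev/sda.

```python
HEADS = 256              # assumed geometry, as in the text
SECTORS_PER_TRACK = 63   # assumed geometry, as in the text
TOTAL_SECTORS = 6335488  # sector count reported for /dev/sda

sectors_per_cylinder = HEADS * SECTORS_PER_TRACK

# Whole cylinders that fit, and the sectors left over in a partial cylinder.
full_cylinders, leftover = divmod(TOTAL_SECTORS, sectors_per_cylinder)

too_small = full_cylinders * sectors_per_cylinder
too_big = (full_cylinders + 1) * sectors_per_cylinder
```

Since the leftover is non-zero, no whole number of cylinders matches the disk size, which fits with the EndingCHS field being set to 0xFFFFFF.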
Bill
On 03/07/2018 10:08, Nicolas Dechesne wrote:
On Mon, Jul 2, 2018 at 11:59 PM Alexander Graf agraf@suse.de wrote:
On 02.07.18 20:40, William Mills wrote:
[...]
I am still trying to figure out if a real issue exists or will soon exist. If this issue is real, I think it should be addressed in UEFI but if not there then in EBBR. We move "disks" around a lot more than other people do.
Yes, let's double check with Hannes :).
On Dragonboard 820c, that has on board UFS disk:
Found valid GPT with protective MBR; using GPT. Disk /dev/sda: 6335488 sectors, 24.2 GiB Model: THGBF7G8K4LBATRB Sector size (logical/physical): 4096/4096 bytes
Hmmm. That's interesting. I had assumed that on UFS devices, device partitions would be used and the GPT would be omitted. Evidently that is not done on the 820c.
Is that just because sticking with GPT is the 'known-working-solution'? Will there eventually be a time when OSes use UFS-native partitioning? Or should the recommendation be to keep doing GPT partitioning on the whole device?
I'm assuming that UFS-native partitioning allows for better management of the underlying flash media, but I'm no expert.
How do UFS partitions show up in Linux right now? Do we have good partitioning tools for UFS? Will UFS partitioning cause issues for the distros?
g. IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
On Tue, Jul 03, 2018 at 04:46:38PM +0100, Grant Likely wrote:
On 03/07/2018 10:08, Nicolas Dechesne wrote:
On Mon, Jul 2, 2018 at 11:59 PM Alexander Graf agraf@suse.de wrote:
On 02.07.18 20:40, William Mills wrote:
[...]
I am still trying to figure out if a real issue exists or will soon exist. If this issue is real, I think it should be addressed in UEFI but if not there then in EBBR. We move "disks" around a lot more than other people do.
Yes, let's double check with Hannes :).
On Dragonboard 820c, that has on board UFS disk:
Found valid GPT with protective MBR; using GPT. Disk /dev/sda: 6335488 sectors, 24.2 GiB Model: THGBF7G8K4LBATRB Sector size (logical/physical): 4096/4096 bytes
Hmmm. That's interesting. I had assumed that on UFS devices, device partitions would be used and the GPT would be omitted. Evidently that is not done on the 820c.
I might be missing something but can EFI run on top of raw hardware device partitions? I didn't think EFI provided any means to locate an ESP except when they are described in one of the three supported ways (GPT, MBR or El Torito).
Daniel.
Is that just because sticking with GPT is the 'known-working-solution'? Will there eventually be a time when OSes use UFS-native partitioning? Or should the recommendation be to keep doing GPT partitioning on the whole device?
I'm assuming that UFS-native partitioning allows for better management of the underlying flash media, but I'm no expert.
How do UFS partitions show up in Linux right now? Do we have good partitioning tools for UFS? Will UFS partitioning cause issues for the distros?
g. _______________________________________________ Arm.ebbr-discuss mailing list Arm.ebbr-discuss@arm.com
Correct
- DW - -----Original Message----- From: arm.ebbr-discuss-bounces@arm.com arm.ebbr-discuss-bounces@arm.com On Behalf Of Daniel Thompson Sent: Tuesday, July 3, 2018 8:59 AM To: Grant Likely Grant.Likely@arm.com Cc: boot-architecture@lists.linaro.org; hare@suse.de; arm.ebbr-discuss arm.ebbr-discuss@arm.com; Nicolas Dechesne nicolas.dechesne@linaro.org; Architecture@cam-list1.cambridge.arm.com Subject: Re: [Arm.ebbr-discuss] Questions about GPT
On Tue, Jul 03, 2018 at 04:46:38PM +0100, Grant Likely wrote:
On 03/07/2018 10:08, Nicolas Dechesne wrote:
On Mon, Jul 2, 2018 at 11:59 PM Alexander Graf agraf@suse.de wrote:
On 02.07.18 20:40, William Mills wrote:
[...]
I am still trying to figure out if a real issue exists or will soon exist. If this issue is real, I think it should be addressed in UEFI but if not there then in EBBR. We move "disks" around a lot more than other people do.
Yes, let's double check with Hannes :).
On Dragonboard 820c, that has on board UFS disk:
Found valid GPT with protective MBR; using GPT. Disk /dev/sda: 6335488 sectors, 24.2 GiB Model: THGBF7G8K4LBATRB Sector size (logical/physical): 4096/4096 bytes
Hmmm. That's interesting. I had assumed that on UFS devices, device partitions would be used and the GPT would be omitted. Evidently that is not done on the 820c.
I might be missing something but can EFI run on top of raw hardware device partitions? I didn't think EFI provided any means to locate an ESP except when they are described in one of the three supported ways (GPT, MBR or El Torito).
Daniel.
Is that just because sticking with GPT is the 'known-working-solution'? Will there eventually be a time when OSes use UFS-native partitioning? Or should the recommendation be to keep doing GPT partitioning on the whole device?
I'm assuming that UFS-native partitioning allows for better management of the underlying flash media, but I'm no expert.
How do UFS partitions show up in Linux right now? Do we have good partitioning tools for UFS? Will UFS partitioning cause issues for the distros?
g. _______________________________________________ Arm.ebbr-discuss mailing list Arm.ebbr-discuss@arm.com
On 03/07/2018 16:58, Daniel Thompson wrote:
On Tue, Jul 03, 2018 at 04:46:38PM +0100, Grant Likely wrote:
On 03/07/2018 10:08, Nicolas Dechesne wrote:
On Mon, Jul 2, 2018 at 11:59 PM Alexander Graf agraf@suse.de wrote:
On 02.07.18 20:40, William Mills wrote:
[...]
I am still trying to figure out if a real issue exists or will soon exist. If this issue is real, I think it should be addressed in UEFI but if not there then in EBBR. We move "disks" around a lot more than other people do.
Yes, let's double check with Hannes :).
On Dragonboard 820c, that has on board UFS disk:
Found valid GPT with protective MBR; using GPT. Disk /dev/sda: 6335488 sectors, 24.2 GiB Model: THGBF7G8K4LBATRB Sector size (logical/physical): 4096/4096 bytes
Hmmm. That's interesting. I had assumed that on UFS devices, device partitions would be used and the GPT would be omitted. Evidently that is not done on the 820c.
I might be missing something but can EFI run on top of raw hardware device partitions? I didn't think EFI provided any means to locate an ESP except when they are described in one of the three supported ways (GPT, MBR or El Torito).
It looks like UFS models partitions after iSCSI LUNs. I suppose it is reasonable to assume that each LUN used to store an ESP or OS partitions will also need to have a GPT.
At some point in the future it may make sense to explicitly define how to find the LUN that contains the ESP, but that would be in the scope of the UEFI spec, not EBBR.
It would be reasonable for one or more LUNs to be dedicated to firmware which gets us out of the shared ESP scenario and the OS can do what it wants with the GPT in the 'general-purpose' LUN.
Okay, I'll rework some of my UFS discussion in EBBR.
g.
On 07/03/2018 12:24 PM, Grant Likely wrote:
On 03/07/2018 16:58, Daniel Thompson wrote:
On Tue, Jul 03, 2018 at 04:46:38PM +0100, Grant Likely wrote:
On 03/07/2018 10:08, Nicolas Dechesne wrote:
On Mon, Jul 2, 2018 at 11:59 PM Alexander Graf agraf@suse.de wrote:
On 02.07.18 20:40, William Mills wrote:
[...]
I am still trying to figure out if a real issue exists or will soon exist. If this issue is real, I think it should be addressed in UEFI but if not there then in EBBR. We move "disks" around a lot more than other people do.
Yes, let's double check with Hannes :).
On Dragonboard 820c, that has on board UFS disk:
Found valid GPT with protective MBR; using GPT. Disk /dev/sda: 6335488 sectors, 24.2 GiB Model: THGBF7G8K4LBATRB Sector size (logical/physical): 4096/4096 bytes
Hmmm. That's interesting. I had assumed that on UFS devices, device partitions would be used and the GPT would be omitted. Evidently that is not done on the 820c.
I might be missing something but can EFI run on top of raw hardware device partitions? I didn't think EFI provided any means to locate an ESP except when they are described in one of the three supported ways (GPT, MBR or El Torito).
It looks like UFS models partitions after iSCSI LUNs. I suppose it is reasonable to assume that each LUN used to store an ESP or OS partitions will also need to have a GPT.
At some point in the future it may make sense to explicitly define how to find the LUN that contains the ESP, but that would be in the scope of the UEFI spec, not EBBR.
It would be reasonable for one or more LUNs to be dedicated to firmware which gets us out of the shared ESP scenario and the OS can do what it wants with the GPT in the 'general-purpose' LUN.
This is what I thought we were doing all along. UFS looks like eMMC with a bit more flexibility in the "boot" partitioning.
-- Bill
On 03/07/2018 17:24, Grant Likely wrote:
On 03/07/2018 16:58, Daniel Thompson wrote:
On Tue, Jul 03, 2018 at 04:46:38PM +0100, Grant Likely wrote:
On 03/07/2018 10:08, Nicolas Dechesne wrote:
On Mon, Jul 2, 2018 at 11:59 PM Alexander Graf agraf@suse.de wrote:
On 02.07.18 20:40, William Mills wrote:
[...]
I am still trying to figure out if a real issue exists or will soon exist. If this issue is real, I think it should be addressed in UEFI but if not there then in EBBR. We move "disks" around a lot more than other people do.
Yes, let's double check with Hannes :).
On Dragonboard 820c, that has on board UFS disk:
Found valid GPT with protective MBR; using GPT. Disk /dev/sda: 6335488 sectors, 24.2 GiB Model: THGBF7G8K4LBATRB Sector size (logical/physical): 4096/4096 bytes
Hmmm. That's interesting. I had assumed that on UFS devices, device partitions would be used and the GPT would be omitted. Evidently that is not done on the 820c.
I might be missing something but can EFI run on top of raw hardware device partitions? I didn't think EFI provided any means to locate an ESP except when they are described in one of the three supported ways (GPT, MBR or El Torito).
It looks like UFS models partitions after iSCSI LUNs. I suppose it is reasonable to assume that each LUN used to store an ESP or OS partitions will also need to have a GPT.
At some point in the future it may make sense to explicitly define how to find the LUN that contains the ESP, but that would be in the scope of the UEFI spec, not EBBR.
It would be reasonable for one or more LUNs to be dedicated to firmware which gets us out of the shared ESP scenario and the OS can do what it wants with the GPT in the 'general-purpose' LUN.
Actually reading the UFS spec helps a lot! :-)
https://www.jedec.org/system/files/docs/JESD220D.pdf
Right, so UFS seems to support up to 128 partitions, or LUNs. It appears that each LUN can be treated as a separate block device. Up to 2 can be configured as boot LUNs (boot A and B), and one can be an RPMB. The size of the boot and RPMB regions is not fixed, so as much space as needed for firmware could be allocated.
I'm going to rework the text to talk about shared storage in terms of a single device or LUN. If firmware is contained in a separate LUN (one of the boot partitions), then it is outside the scope of EBBR.
It would be possible for separate LUNs to be allocated for each OS partition, but I don't think EBBR needs to tackle that. In that scenario each LUN would probably still need to have a GPT partition table, (or at the very least the LUN containing the ESP would). Each LUN would show up as a separate block device in Linux (I think). Yet I don't think that affects EBBR if EBBR treats each LUN as a separate block device. I imagine UEFI behaviour in that case would possibly be to search each LUN for an ESP if the boot variables are not set.
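To make the "search each LUN for an ESP" idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how any particular UEFI implementation does it: `find_esp`, `search_luns` and the `read_at(offset, length)` callback are hypothetical names, the logical block size of each LUN is assumed to be known already, and only the primary GPT header at LBA 1 is checked (no backup header, no partition array CRC).

```python
import struct
import uuid
import zlib

# EFI System Partition type GUID from the UEFI specification.
ESP_TYPE_GUID = uuid.UUID("C12A7328-F81F-11D2-BA4B-00A0C93EC93B")


def find_esp(read_at, block_size):
    """Return (first_lba, last_lba) of the first ESP on one LUN, or None.

    read_at(offset, length) is a hypothetical callback returning bytes;
    block_size is the logical block size of that LUN.
    """
    block = read_at(block_size, block_size)  # primary header is at LBA 1
    if len(block) < 92 or block[:8] != b"EFI PART":
        return None
    header_size, stored_crc = struct.unpack_from("<II", block, 12)
    if not 92 <= header_size <= len(block):
        return None
    header = block[:header_size]
    # The header CRC32 is computed with its own CRC field set to zero.
    zeroed = header[:16] + b"\x00" * 4 + header[20:]
    if zlib.crc32(zeroed) != stored_crc:
        return None
    # Partition array start LBA, number of entries, size of each entry.
    entries_lba, count, entry_size = struct.unpack_from("<QII", header, 72)
    if entry_size < 128:
        return None
    for index in range(count):
        offset = entries_lba * block_size + index * entry_size
        entry = read_at(offset, entry_size)
        if len(entry) < 48:
            break
        # GPT stores GUIDs in mixed-endian form, which bytes_le decodes.
        if uuid.UUID(bytes_le=bytes(entry[:16])) == ESP_TYPE_GUID:
            return struct.unpack_from("<QQ", entry, 32)
    return None


def search_luns(luns):
    """Return the name of the first LUN that carries an ESP, or None.

    luns is an iterable of (name, read_at, block_size) tuples.
    """
    for name, read_at, block_size in luns:
        if find_esp(read_at, block_size) is not None:
            return name
    return None
```

Note that the header is looked up at byte offset `block_size`, so the same GPT is not found if the LUN is read with a different logical block size than the one it was written with.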
I've also learned that removable UFS cards exist. If the platform strictly requires a UFS boot partition on the removable media, then that could be an issue for the use case of firmware for multiple platforms on a single card. We could mitigate this by recommending a filesystem be used on the boot partition. I'm concerned about overreaching though.
g.
On Wed, Jul 4, 2018 at 11:57 AM Grant Likely grant.likely@arm.com wrote:
On 03/07/2018 17:24, Grant Likely wrote:
On 03/07/2018 16:58, Daniel Thompson wrote:
On Tue, Jul 03, 2018 at 04:46:38PM +0100, Grant Likely wrote:
On 03/07/2018 10:08, Nicolas Dechesne wrote:
On Mon, Jul 2, 2018 at 11:59 PM Alexander Graf agraf@suse.de wrote:
On 02.07.18 20:40, William Mills wrote:
[...]
I am still trying to figure out if a real issue exists or will soon exist. If this issue is real, I think it should be addressed in UEFI but if not there then in EBBR. We move "disks" around a lot more than other people do.
Yes, let's double check with Hannes :).
On Dragonboard 820c, that has on board UFS disk:
Found valid GPT with protective MBR; using GPT.
Disk /dev/sda: 6335488 sectors, 24.2 GiB
Model: THGBF7G8K4LBATRB
Sector size (logical/physical): 4096/4096 bytes
Hmmm. That's interesting. I had assumed that on UFS devices, device partitions would be used and the GPT would be omitted. Evidently that is not done on the 820c.
I might be missing something but can EFI run on top of raw hardware device partitions? I didn't think EFI provided any means to locate an ESP except when they are described in one of the three supported ways (GPT, MBR or El Torito).
It looks like UFS models partitions after iSCSI LUNs. I suppose it is reasonable to assume that each LUN used to store an ESP or OS partitions will also need to have a GPT.
At some point in the future it may make sense to explicitly define how to find the LUN that contains the ESP, but that would be in the scope of the UEFI spec, not EBBR.
It would be reasonable for one or more LUNs to be dedicated to firmware which gets us out of the shared ESP scenario and the OS can do what it wants with the GPT in the 'general-purpose' LUN.
Actually reading the UFS spec helps a lot! :-)
https://www.jedec.org/system/files/docs/JESD220D.pdf
Right, so UFS seems to support up to 128 partitions, or LUNs. It appears that each LUN can be treated as a separate block device. Up to 2 can be configured as boot LUNs (boot A and B), and one can be an RPMB. The size of the boot and RPMB regions is not fixed, so as much space as needed for firmware could be allocated.
I'm going to rework the text to talk about shared storage in terms of a single device or LUN. If firmware is contained in a separate LUN (one of the boot partitions), then it is outside the scope of EBBR.
It would be possible for separate LUNs to be allocated for each OS partition, but I don't think EBBR needs to tackle that. In that scenario each LUN would probably still need to have a GPT partition table, (or at the very least the LUN containing the ESP would). Each LUN would show up as a separate block device in Linux (I think). Yet I don't think that affects EBBR if EBBR treats each LUN as a separate block device. I imagine UEFI behaviour in that case would possibly be to search each LUN for an ESP if the boot variables are not set.
Yes, correct. Each LUN under Linux shows up as a separate block device. On a DB820c, where 6 LUNs have been provisioned, I can see:
root@db820c:~# ls -1 /dev/sd?
/dev/sda
/dev/sdb
/dev/sdc
/dev/sdd
/dev/sde
/dev/sdf
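Since the GPT on each of these devices is laid out in units of the logical block size, it can be handy to list that size per device. A minimal Python sketch, assuming the standard Linux sysfs attribute `/sys/block/<dev>/queue/logical_block_size`; the function name and its parameters are hypothetical and only there for illustration:

```python
from pathlib import Path


def logical_block_sizes(sysfs_block="/sys/block", prefix="sd"):
    """Map block device names (e.g. 'sda') to their logical block size.

    sysfs_block and prefix are parameters of this sketch only; they
    default to the usual sysfs location and to SCSI-style disk names.
    """
    sizes = {}
    for dev in sorted(Path(sysfs_block).glob(prefix + "*")):
        attr = dev / "queue" / "logical_block_size"
        if attr.is_file():
            sizes[dev.name] = int(attr.read_text().strip())
    return sizes


if __name__ == "__main__":
    for name, size in logical_block_sizes().items():
        print(f"/dev/{name}: {size} bytes per logical block")
```

On a board like the one above this would be expected to report one entry per LUN, but I have not run it on a DB820c.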
Each LUN/block device has its own GPT. The DB820c still uses the 'old school' QC bootloaders: the ROM code looks for the first-stage bootloader (xbl) on the boot LUN.
The (re)provisioning of the onboard UFS device is possible using some QCOM tools. To give an idea, the current 'provisioning' config file is [1], where you can see all LUNs and how each LUN is 'partitioned'. There are various tools involved to create the proper GPT on each LUN.
An interesting recent kernel patch series can be found at [2] that adds runtime UFS provisioning capabilities.
[1] https://git.linaro.org/landing-teams/working/qualcomm/db-boot-tools.git/tree...
[2] https://lkml.org/lkml/2018/6/15/568
I've also learned that removable UFS cards exist. If the platform strictly requires a UFS boot partition on the removable media, then that could be an issue for the use case of firmware for multiple platforms on a single card. We could mitigate this by recommending a filesystem be used on the boot partition. I'm concerned about overreaching though.
On Wed, Jul 04, 2018 at 10:57:16AM +0100, Grant Likely wrote:
It would be reasonable for one or more LUNs to be dedicated to firmware which gets us out of the shared ESP scenario and the OS can do what it wants with the GPT in the 'general-purpose' LUN.
Actually reading the UFS spec helps a lot! :-)
https://www.jedec.org/system/files/docs/JESD220D.pdf
Right, so UFS seems to support up to 128 partitions, or LUNs. It appears that each LUN can be treated as a separate block device. Up to 2 can be configured as boot LUNs (boot A and B), and one can be an RPMB. The size of the boot and RPMB regions is not fixed, so as much space as needed for firmware could be allocated.
I'm going to rework the text to talk about shared storage in terms of a single device or LUN. If firmware is contained in a separate LUN (one of the boot partitions), then it is outside the scope of EBBR.
Thanks for the summary!
It would be possible for separate LUNs to be allocated for each OS partition, but I don't think EBBR needs to tackle that. In that scenario each LUN would probably still need to have a GPT partition table, (or at the very least the LUN containing the ESP would). Each LUN would show up as a separate block device in Linux (I think).
Given the "I think", here is a quick grep over the Dragonboard 820C boot logs to confirm that. You will see most of the LUNs (those that aren't special) being allocated their own block device:
https://gist.github.com/daniel-thompson/45275d0667bf93581703ad0dbc867a29
I've also learned that removable UFS cards exist. If the platform strictly requires a UFS boot partition on the removable media, then that could be an issue for the use case of firmware for multiple platforms on a single card. We could mitigate this by recommending a filesystem be used on the boot partition. I'm concerned about overreaching though.
We are certainly approaching aspirational with things like that ;-) .
No objections on my side but I'd like it to be very clearly separated from level 0 requirements.
Daniel.
boot-architecture@lists.linaro.org