On 07/03/2018 02:52 AM, Hannes Reinecke wrote:
On 07/02/2018 11:58 PM, Alexander Graf wrote:
On 02.07.18 20:40, William Mills wrote:
On 07/02/2018 12:15 PM, Daniel Thompson wrote:
On Sun, Jul 01, 2018 at 10:37:49AM -0400, William Mills wrote:
All,
I rely on your greater knowledge to help me understand these questions. Thanks in advance.
- GPT and block size: how is the block size a GPT was written for determined?
  1A) By querying the device?
  1B) Some MBR magic?
There are some comments in the fdisk man page that recent Linux kernels "just know" the sector size, and the code to work with GPT partitions in the kernel (block/partitions/efi.c) will error out if MyLBA does not match the LBA the kernel thinks it is. This means that (unless there's some fallback code at a layer above the partition parsing code) if you copy a GPT to a disk with a different sector size it will be broken.
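(For concreteness, a minimal sketch of that MyLBA sanity check, assuming little-endian on-disk fields as the spec defines them; this is illustrative, not the kernel's actual code:)

    #include <stdint.h>
    #include <string.h>

    /* Read one block at LBA 1 and pass it here together with the LBA it was
     * read from.  A GPT copied to a disk with a different sector size fails
     * the second test, because the header's recorded MyLBA no longer matches. */
    static uint64_t get_le64(const uint8_t *p)
    {
            uint64_t v = 0;
            for (int i = 7; i >= 0; i--)
                    v = (v << 8) | p[i];
            return v;
    }

    int gpt_header_matches_lba(const uint8_t *block, uint64_t lba_read_from)
    {
            if (memcmp(block, "EFI PART", 8) != 0)
                    return 0;                             /* no GPT signature */
            return get_le64(block + 24) == lba_read_from; /* MyLBA is at byte offset 24 */
    }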
I'm not sure we have seen 4K block devices much in the wild yet, have we? I think most vendors are just publishing suggested read & write sizes and leaving the "block size" set at 512.
FWIW most (all?) hard drive vendors went back to 512 logical, 4k physical. But let me CC Hannes to confirm this.
This is good to know. Thanks.
This 'just knows' is in fact the driver querying the device prior to reading the GPT table. Reading the GPT table out of necessity means that you _already_ know the block size; you read from LBA 0 _and_ you need to know how large the buffer for LBA 0 needs to be, otherwise you might end up with a buffer too small for holding an entire block, and the read will fail...
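(From userspace the equivalent query is a pair of ioctls; a minimal sketch, which on a 512e drive would report 512 logical / 4096 physical:)

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>           /* BLKSSZGET, BLKPBSZGET */

    int main(int argc, char **argv)
    {
            if (argc < 2)
                    return 1;

            int fd = open(argv[1], O_RDONLY);
            if (fd < 0)
                    return 1;

            int logical = 0;
            unsigned int physical = 0;
            ioctl(fd, BLKSSZGET, &logical);    /* logical (addressable) block size */
            ioctl(fd, BLKPBSZGET, &physical);  /* physical block size              */
            printf("logical=%d physical=%u\n", logical, physical);

            close(fd);
            return 0;
    }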
Yes, of course you need to query the device's block size to read data.
However, *IF* you wanted to handle GPT images written for a different LBA size you could easily do it. For example, if you wanted to cover LBA sizes of 512, 1K, 2K, and 4K you would:
- query the device's native block size
- read enough blocks to cover 8K
If the native block size was 4K you would expect:
- MBR signature at byte offset 510
- EFI GPT signature at byte offset 4096
- all bytes from 512 through 4095 inclusive to be 0 (UEFI spec)
However if you did not find the EFI GPT signature at 4096, you could also check 512, 1024, and 2048. If found, you would know you were dealing with a non-native GPT image and you would have the data needed to interpret the rest of the GPT.
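(A rough sketch of that probe, just to make the idea concrete; the caller is assumed to have already read the first 8K of the device:)

    #include <stdint.h>
    #include <string.h>

    /* Look for "EFI PART" at each candidate LBA-1 offset in the first 8K of
     * the device.  Returns the block size the GPT appears to have been written
     * for, or 0 if nothing plausible is found. */
    unsigned int probe_gpt_block_size(const uint8_t buf[8192])
    {
            static const unsigned int candidates[] = { 4096, 512, 1024, 2048 };

            /* The protective MBR signature should be at bytes 510-511 either way. */
            if (buf[510] != 0x55 || buf[511] != 0xAA)
                    return 0;

            for (unsigned int i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++)
                    if (memcmp(buf + candidates[i], "EFI PART", 8) == 0)
                            return candidates[i];     /* GPT header sits at LBA 1 */

            return 0;
    }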
Of course this does not help you if someone was actually dumb enough to create partitions that are not 4K aligned. Any partition not aligned to the device's actual block size should not be used. I don't care about that. I care about GPT images that were carefully constructed to be portable.
I also agree that according to current specs of GPT, the above non-native handling is not required but instead is explicitly disallowed. This whole thread is me *wondering* if that should be loosened.
So really it's quite pointless to have it specified in the GPT, as by that time you already need to know the blocksize.
I don't think I suggested that it be added to the GPT header. If I implied that, I take it back. It is, in effect, already there by nature of the MBR 0 fill and the EFI GPT signature.
And most drives went back to 512e as they figured that most BIOSes can't handle a 4k sector size, and having disks which can't be used for booting is kinda pointless for the mass market, where most systems only have one disk ...
Thanks. Again this is good to know.
(I don't really know why the LBA size needs to change in the first place. Is 16,777,216 TB not enough for a few years? Drives already publish enough info for OSes not to do dumb things.)
Simple physics. You need some guard area between each (physical) block, so that the drive head can tell one block from the other. So if you increase the block size you lower the ratio of guard area to data area, which means you can stuff more data onto the same surface area, i.e. increasing the capacity of the drive without having to change anything.
Yes, I understand the reason to change the block size on disk. I was asking why change the LBA size? In effect I was asking why 4Kn as opposed to just 512e. I think you answered this above when you said most manufacturers went back to 512e.
Not sure that matters much though: if you want to fix it up you would arrange for the fixup logic to be part of your initramfs.
Yes, initramfs would be a good place to fix this. But it means firmware must deal with it. We can make U-Boot handle this, but what does Tianocore do?
This is pretty much how we do all our images today. It works in both Tianocore and U-Boot. We also do resize the image on first boot to the actual target disk size inside initramfs. Works like a charm.
Tianocore should be handling things correctly already.
I think Alex and you are saying that a GPT that does not cover the whole disk is handled in the SUSE initramfs and Tianocore. This is good information. It is particularly interesting to know you grow the GPT in the initramfs.
But I don't think either is handling a GPT written for a 512-byte LBA size found on a 4Kn drive. I did not really expect this, and your other answers make it clear this case is not handled today.
- Can GPT be grown?
If the backup table is not found at the end of the disk then Linux will log in the dmesg trace that the partition table is damaged, but I think it will use it nevertheless.
Tools like fdisk are typically "uneasy" when they cannot find the backup GPT header and will offer to recreate it if you let them. IIRC it basically marks the partition table dirty regardless of whether you have changed it or not (so that it will get updated if you write-and-exit).
Is fdisk uneasy if it can't find it via AlternateLBA, or is it uneasy if that is not the end of the disk?
Yesterday I did find language in the UEFI spec (5.3.2 GPT Header) that talks about what happens when a volume grows, so it is an expected case. (They were talking about RAID disks but the same principle applies.)
The wording is a bit strange in the spec. It says it is up to platform policy whether it automatically restores the primary GPT without asking the user, but then says it should ask the user. It is not clear if they are talking about the UEFI firmware, the OS during normal boot, or a disk tool like fdisk.
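(To make the grow/restore case concrete, a sketch of the bookkeeping involved when the last LBA moves, assuming the common 128 entries of 128 bytes each; the field names follow the spec but the struct is illustrative, and real code would also recompute both CRC32s and rewrite the backup entry array at its new location:)

    #include <stdint.h>

    struct gpt_hdr_fields {
            uint64_t my_lba;
            uint64_t alternate_lba;
            uint64_t first_usable_lba;
            uint64_t last_usable_lba;
            uint64_t partition_entry_lba;
            /* signature, CRCs, etc. omitted */
    };

    void relocate_backup_gpt(struct gpt_hdr_fields *primary,
                             struct gpt_hdr_fields *backup,
                             uint64_t new_total_lbas, uint64_t block_size)
    {
            /* 128 entries * 128 bytes, rounded up to whole blocks. */
            uint64_t entry_array_lbas = (128 * 128 + block_size - 1) / block_size;
            uint64_t last_lba = new_total_lbas - 1;

            primary->alternate_lba   = last_lba;       /* backup header moves to the new end */
            primary->last_usable_lba = last_lba - entry_array_lbas - 1;

            backup->my_lba              = last_lba;
            backup->alternate_lba       = 1;           /* points back at the primary header */
            backup->first_usable_lba    = primary->first_usable_lba;
            backup->last_usable_lba     = primary->last_usable_lba;
            backup->partition_entry_lba = last_lba - entry_array_lbas; /* array sits just below it */
    }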
- Is it actually required that the partition array start at LBA2?
I don't think so, although you'd probably have to author it (or modify a template) by hand.
To quote the UEFI 2.7 spec:
The start of the GPT Partition Entry Array is located at the LBA indicated by the Partition Entry LBA field.
So, no.
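(For reference, the relevant header fields and their byte offsets as the spec lays them out; the struct below is a sketch of that layout, not lifted from any particular implementation:)

    #include <stdint.h>

    /* PartitionEntryLBA is an explicit field, so nothing in the header format
     * itself forces the entry array to start at LBA 2. */
    struct gpt_header {
            uint8_t  signature[8];                /*  0: "EFI PART"               */
            uint32_t revision;                    /*  8                           */
            uint32_t header_size;                 /* 12                           */
            uint32_t header_crc32;                /* 16                           */
            uint32_t reserved;                    /* 20                           */
            uint64_t my_lba;                      /* 24: LBA of this header       */
            uint64_t alternate_lba;               /* 32: LBA of the other header  */
            uint64_t first_usable_lba;            /* 40                           */
            uint64_t last_usable_lba;             /* 48                           */
            uint8_t  disk_guid[16];               /* 56                           */
            uint64_t partition_entry_lba;         /* 72: start of the entry array */
            uint32_t num_partition_entries;       /* 80                           */
            uint32_t sizeof_partition_entry;      /* 84                           */
            uint32_t partition_entry_array_crc32; /* 88                           */
            /* rest of the block is reserved and must be zero */
    } __attribute__((packed));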
Thanks. Yes I knew of the field. I had seen some source and hearsay suggesting that this had to be 2 for the primary copy but looking again I don't see this in the UEFI spec anywhere.
However I wonder if fdisk or other partition editors will "fix" the GPT entries to be at LBA 2 if we add or delete a partition, or even if we write the table with no changes, as Daniel reported that it does for the volume-grown situation.
That won't be an issue for the use case I am talking about: offline creation of images that work on media of unbounded max size and unknown LBA size.
However it may be if Daniel is planning to stuff firmware between the GPT header and the GPT entry table.
Assuming the code to validate the primary and backup partition tables is shared (e.g. properly decomposed into functions) the code will naturally end up honouring PartitionEntryLBA.
BTW this last question made me realize that:
a) one of the boards we've always believed to have a boot ROM that mandated MBR might just have a workaround
b) I might have overlooked something in the EBBR text about protective partitioning (a.k.a. is it OK to place the system firmware outside the FirstUsableLBA).
Daniel.
Thanks for the info!
Bill