Hello Jonathan, Hello Rob,
Thanks for bringing this topic to discussion. I think the status quo is an obvious shortcoming that needs to be addressed.
On 02.05.24 16:00, Rob Herring wrote:
On Wed, May 1, 2024 at 4:18 PM Humphreys, Jonathan j-humphreys@ti.com wrote:
Problem statement:
Device trees are in theory a pure description of the hardware, and since the hardware doesn't change, the device tree describing the hardware likewise never changes. With this, a device tree could then be burned into the hardware's ROM to be queried by software for hardware discovery. In practice, though, device trees evolve over time. They evolve for many reasons, including
- support for previously unsupported hardware
- device driver improvements that require additional hardware information
- bug fixes
I really would like specific cases of these where compatibility is broken highlighted.
Screening for backwards-compatibility of new kernels (or their bindings) with old DTs is not enough. When an A/B system fails to boot and falls back, you can run into the inverse situation: an old kernel is presented with a new device tree, because bootloader updates are often not rolled back.
This seems unavoidable and the solution we have for that is to ship device trees along with kernel updates and load both together.
The tooling and reviewing to identify these cases has gotten much better.
barebox has been pulling in kernel device trees for many years and it's a frequent cause of regressions. Here are some recent fixes found with $(git log --grep="^Fixes:.*dts: update"):
* "aiodev: imx_thermal: fix breakage after device tree sync" https://github.com/barebox/barebox/commit/451c25b60e
* "pinctrl: stm32: Remove check for pins-are-numbered" https://github.com/barebox/barebox/commit/38ff8dad11
* "ARM: dts: i.MX8MP: snps,dis-u2-freeclk-exists-quirk" https://github.com/barebox/barebox/commit/db01bf84cf
* "clk: imx8mp: add USB suspend clock" https://github.com/barebox/barebox/commit/d86bbaed71
* "ARM: i.MX8MN: assume USBOTG power domains to be powered" https://github.com/barebox/barebox/commit/7b62fbc632
All of these bugs would have broken a newer Linux kernel being booted with an old device tree. In practice, they didn't because normally barebox-built device trees are used for barebox and Linux-built device trees are shipped along with Linux, even if they might have been identical at some point.
I've been prototyping a tool which will compare 2 versions of binding schemas and spit out incompatible changes for example. Those aren't the only types of changes as you point out, but if we can eliminate a whole class of issues I think the situation would be much better.
I look forward to this. Would your tooling have detected any of the above regressions?
Fortunately, most of these issues are caught before a barebox release (features, unlike bug fixes, sit in master a month before making it into a monthly release), but some slip through and it introduces a lot of churn.
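On the schema comparison idea, here is a minimal sketch of the kind of check I would hope for. This is not Rob's tool, and it assumes a heavily simplified schema representation: a dict with a "properties" mapping and a "required" list. The function name and the message wording are made up for illustration. It flags both directions: properties that became required (new software with an old DT) and properties that were removed (old software with a new DT).

```python
def incompatible_changes(old_schema, new_schema):
    """Report changes between two simplified binding schemas.

    Assumption: each schema is a dict with an optional 'properties'
    mapping and an optional 'required' list. Real binding schemas are
    considerably richer than this.
    """
    old_props = set(old_schema.get("properties", {}))
    new_props = set(new_schema.get("properties", {}))
    old_required = set(old_schema.get("required", []))
    new_required = set(new_schema.get("required", []))

    issues = []
    # New software may fail on an old DT that lacks the property.
    for name in sorted(new_required - old_required):
        issues.append(f"newly required: {name}")
    # Old software may still look for a property that new DTs no longer carry.
    for name in sorted(old_props - new_props):
        issues.append(f"removed: {name}")
    return issues


# Illustrative input only, loosely modelled on the pins-are-numbered case.
old = {"properties": {"reg": {}, "pins-are-numbered": {}},
       "required": ["reg", "pins-are-numbered"]}
new = {"properties": {"reg": {}, "clocks": {}},
       "required": ["reg", "clocks"]}

for issue in incompatible_changes(old, new):
    print(issue)
```

A check like this would only catch the schema-visible cases, such as a property being dropped. Regressions caused by changed semantics of an existing property would still slip through.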
Linux's device tree source is maintained with the kernel source, and kernel builds include building the device trees too. This ensures that the device tree matching the kernel's usage is always kept in sync. Often, embedded distros will include the matching device tree blobs.
The EBBR mandates that the device tree blob is provided by the firmware.
Thus it is likely that the device tree provided by the firmware and given to the operating system is not the matching device tree blob for that kernel. This can cause hardware to be missing, buggy, or non-functional.
Yes. My first experience with EBBR was AFAIR a system that didn't boot, because an up-to-date Debian kernel failed to handle the old device tree provided by the firmware. At least updating the EFI firmware with a USB stick worked well.
This proposal then has the firmware choose the device tree by name, or some other identifier that can be used to match the device tree for the board [1]. It has the OS-provided OS loader select the location of the matching versions of DTBs for it.
The firmware would pass the device tree filename/id to the OS loader, instead of the DTB itself. If the firmware can't know which version of DTB, how can it know whether to pass a DTB vs. an identifier? The OS might be perfectly fine with firmware's DTB.
I think it's a fair assumption that if the kernel ships with a matching DTB, it would be fine booting with it instead of the firmware provided DTB.
If we had a way to express this "shipped-with" relationship, we could thus have the EFI firmware just select the matching device tree and pass it along the exact way it's done now.
Some ways to describe this "shipped-with" relationship:
- a section in the image as UKIs do, see Jan's mail
- a fixed naming scheme in the EFI partition, e.g. \EFI\Debian\BOOTAA64.EFI -> \EFI\Debian\DTS-BOOTAA64.EFI/
- an EFI variable or protocol?
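For the fixed naming scheme, a minimal sketch of how the device tree directory could be derived from the OS loader path. The "DTS-" prefix is just the one from my example, not an existing convention, and the function name is hypothetical.

```python
def dtb_dir_for_loader(loader_path, prefix="DTS-"):
    """Return the directory assumed to hold the DTBs shipped with a loader.

    Assumption: the directory sits next to the loader and is named
    <prefix><loader file name>. Paths use backslashes, as on the EFI
    system partition.
    """
    directory, separator, name = loader_path.rpartition("\\")
    return directory + separator + prefix + name


print(dtb_dir_for_loader("\\EFI\\Debian\\BOOTAA64.EFI"))
```

The firmware would then look in that directory for a device tree matching the board and keep using its own DTB if the directory does not exist.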
This proposal should be in addition to supporting the standard way of passing in a firmware-provided DT, in cases where the OS doesn't provide or have a need to provide a matching DT.
Agreed, but that contradicts what you said above unless you mean we define 2 ways to operate with some platforms working one standard way and other platforms working the other standard way.
I agree that an OS-provided DT should be an alternative, not a replacement for the firmware-provided DT.
We discussed this a while back on this list (or u-boot?). To summarize, both using the filename or root node compatible were proposed. Several folks (myself included) don't like making the filename an ABI. However, there are some cases where the filename is more unique than the root node compatible. We should fix those root node compatibles in that case IMO.
Agreed.
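Putting the two points together as a rough sketch: look up the OS-shipped DTBs by root node compatible, most specific entry first, and fall back to the firmware-provided DTB when nothing matches. The function name and the dict-based lookup are made up for illustration. A real implementation would read the compatible strings out of the DTBs themselves.

```python
def select_dtb(root_compatibles, shipped_dtbs, firmware_dtb):
    """Pick the DTB to hand to the OS.

    root_compatibles: the board's root node compatible strings,
                      ordered from most to least specific.
    shipped_dtbs:     hypothetical mapping of root compatible to the
                      DTB blob shipped with the OS.
    firmware_dtb:     the DTB the firmware would pass by default.
    """
    for compatible in root_compatibles:
        if compatible in shipped_dtbs:
            return shipped_dtbs[compatible]
    # No OS-provided match: keep today's behaviour.
    return firmware_dtb


# Illustrative values only.
shipped = {"fsl,imx8mp-evk": b"dtb shipped with the OS"}
firmware = b"dtb from firmware"

chosen = select_dtb(["fsl,imx8mp-evk", "fsl,imx8mp"], shipped, firmware)
fallback = select_dtb(["vendor,unknown-board"], shipped, firmware)
```

This only works well if the root node compatible is unique per board, which is another reason to fix the compatibles where they are not.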
Cheers, Ahmad