Re: Ideas for DT improvements
by Zhangpeng (Parker, Kirin)
Hi Bill,
Improve search algorithm performance:
We will need data to show the problem. I suppose this would best be done when unflatening the data at runtime? What is the expected gain in boot time? Are there any measurements of how much time is spent in the search
routines today?
Actually, It is nessary to search and modify device tree in boot loader, and the device tree has not be unflatened.
We need to search device tree node int dtb by libfdt, however libfdt algorithm is pretty horrible, eg:fdt_node_offset_by_compatible.
It takes about 10ms on average to search for a node.
I suppose the dtb structure and libfdt algorithm can be optimized to reduce boot time?
The time for searching for a node should be less than 1 ms.
Reduce DTB space:
What is the goal of the use case?
1) Fit in limited storage ( ex: 256MB )
2) Conserve more space of modest storage for container data ( 1GB eMMC)
3) Improve boot time
For 3, the load time will be reduced but the decompression time will be added. These need to be balanced based on the CPU.
One pet peeve I have in most of our boot loaders today is that they do loading and decompression serially. During loading the IO is 100% loaded and the CPU is very lightly loaded. During decomoression the CPU is 100% loaded and the IO is 0%.
It makes sense to pipleline / overlap these things which means that it needs to go into the loader. To optimize boot time the decompression algorithm needs to be chosen correctly. On smaller CPUs the time taken to decompress
newer algorithms can greatly outweigh the time taken to load the decompressed data. Ideally the time to decompress 1 block == time to load one block. The dynamics shift with CPU and IO performance.
Today, a lot of people focused on boot speed just use decompressed data but I think we could do better if we pipeline
Since one dtbo image contains of hundreds of dtbs, the iamge is too large.
The goal of reducing dtb space is to fit in limited storage without increasing boot time.
Pipleline / overlap loading and decompression is a good idea.
We can try to overlap them in boot loader, and discuss it further if there's a problem.
Define specific rule for properties:
This is harder.In 2019 I had proposed an ATOM based DTB enhancement [2]. I was told Frank Rowand had other proposals for format changes.
OK.I am very interested in the ATOM based DTB enhancement.
I will learn about it and other format changes, then make some detailed discussions with you.
Thanks very much, Bill.
Happy Chinese New Year
Regards,
Zhangpeng
发件人: Jammy Zhou [mailto:jammy.zhou@linaro.org]
发送时间: 2021年2月9日 16:45
收件人: Xiamingliang (XML, Hisilicon) <xiamingliang(a)huawei.com>
抄送: Bill Mills <bill.mills(a)linaro.org>; boot-architecture(a)lists.linaro.org; Frank Rowand <frowand.list(a)gmail.com>; Zhangpeng (Parker, Kirin) <zhangpeng55(a)huawei.com>; Wangjun (U) <wangjundrv.wang(a)huawei.com>
主题: Re: Ideas for DT improvements
Hi Bill,
Thanks very much for your comments. Since we're close to the Chinese New Year holiday, I would assume there will be some delay for the response by Zhangpeng.
Regards,
Jammy
On Sun, 7 Feb 2021 at 09:35, Xiamingliang (XML, Hisilicon) <xiamingliang(a)huawei.com<mailto:xiamingliang@huawei.com>> wrote:
+ zhangpeng, owner of DT in Hisilicon
-----Original Message-----
From: Bill Mills [mailto:bill.mills@linaro.org<mailto:bill.mills@linaro.org>]
Sent: 2021年2月7日 1:39
To: Jammy Zhou <jammy.zhou(a)linaro.org<mailto:jammy.zhou@linaro.org>>; boot-architecture(a)lists.linaro.org<mailto:boot-architecture@lists.linaro.org>; Frank Rowand <frowand.list(a)gmail.com<mailto:frowand.list@gmail.com>>
Cc: Xiamingliang (XML, Hisilicon) <xiamingliang(a)huawei.com<mailto:xiamingliang@huawei.com>>
Subject: Re: Ideas for DT improvements
Hi Jammy & Mingliang,
On 2/5/21 2:59 AM, Jammy Zhou wrote:
> Hi,
>
> There are several ideas for DT improvements. Please check if they are
> reasonable, and any comments are welcome. I would let Mingliang (CCed)
> share more details if needed.
>
> 1) Improve search algorithm performance: Is the binary search tree or
> other algorithm better than current algorithm?
>
We will need data to show the problem. I suppose this would best be done when unflatening the data at runtime? What is the expected gain in boot time? Are there any measurements of how much time is spent in the search routines today?
> 2) Reduce DTB space: when use one DTB to support multiple boards, the
> image is quite big (e.g, ~39MB space for 100 configurations), and some
> compression algorithm can reduce the space a lot (e.g, from 39MB to 7MB).
> Shall we have such compression support for DTB? And it can be helpful
> if we can have more efficient compression algorithm.
>
This could be done as an enhancement to the DTB loader instead of the DTB format itself.
Compressing each DTB (boardx.dtb.xz) will get you gains but compressing a set of boards (vmlinux-5.4.0-65-generic-dbt-set-20.tar.xz) might give you more.
To be significant, the number of boards would need to be large and the size of the rootfs would need to be modest. A 200 to 300 MB minimal image would make an interesting comparison point. (A rootfs of 10s of MB would probably only target a few boards.)
What is the goal of the use case?
1) Fit in limited storage ( ex: 256MB )
2) Conserve more space of modest storage for container data ( 1GB eMMC)
3) Improve boot time
For 3, the load time will be reduced but the decompression time will be added. These need to be balanced based on the CPU.
One pet peeve I have in most of our boot loaders today is that they do loading and decompression serially. During loading the IO is 100% loaded and the CPU is very lightly loaded. During decomoression the CPU is 100% loaded and the IO is 0%. It makes sense to pipleline / overlap these things which means that it needs to go into the loader. To optimize boot time the decompression algorithm needs to be chosen correctly. On smaller CPUs the time taken to decompress newer algorithms can greatly outweigh the time taken to load the decompressed data. Ideally the time to decompress 1 block == time to load one block.
The dynamics shift with CPU and IO performance.
Today, a lot of people focused on boot speed just use decompressed data but I think we could do better if we pipeline
> 3) Define specific rule for properties: The property value
> (FDT_PROP_DATA) itself occupies only ~50% of the total DTB space. And
> the property of each node is different and the private property name
> length is too long, for
> example: “freq-autodown-baseaddress-num” in dt_strings. It seems more
> reasonable that the property value should occupies more than 70% of
> the total DTB space. It can probably be achieved to define some rules
> to restrict the length of property name, etc.
>
This is harder. In 2019 I had proposed an ATOM based DTB enhancement [2]. I was told Frank Rowand had other proposals for format changes.
Thanks,
Bill
[2]
https://docs.google.com/document/d/19XbxN-zX-GYwOXdF78lGnp0j7UNx1MT3wzyCjai…
> Thanks,
> Jammy
> _______________________________________________
> boot-architecture mailing list
> boot-architecture(a)lists.linaro.org<mailto:boot-architecture@lists.linaro.org>
> https://lists.linaro.org/mailman/listinfo/boot-architecture
>
--
Bill Mills
Principal Technical Consultant, Linaro
+1-240-643-0836
TZ: US Eastern
Work Schedule: Tues/Wed/Thur