Dear Wolfgang Denk,
On 27 July 2013 03:06, Wolfgang Denk wd@denx.de wrote:
Dear Andy Green,
In message CAAfg0W7p6rvvaMqsKsC09yQnfPod08YDTbeh_MQrz6n+B4iyYA@mail.gmail.com you wrote:
Instead of making assumptions on the performance of memcpy() and
As I wrote, I measured the performance and got a very big gain, it's 3x faster on my setup to use memcpy() then default memmove().
Yes, in your single test case of copying a Linux kernel image, i. e. a multi-megabyte file.
That "single test case" of copying a "multi-megabyte file" is the reason d'etre of a bootloader...if it doesn't perform well doing that then it's a problem.
However I agree the alternative arch-specific implementation for ARM is pretty good, so this is kind of moot.
Since I traced my problem to here I "fixed" it there, but that actual problem was were using the default implementations at all (the config was inherited)..
By calling that an "assumption" you're saying that there exist platforms where 32-bit linear memmove() is slower than doing it with 8-bit actions?
No. I said you should not assume that memcpy() is always faster than memmove(); a system may use optiomized versions of either.
I did not assume that, I looked at your code for both and saw, and proved, that using 32-bit operations for mass move actions is going to perform better than using 8-bit operations.
That's not something you can write off as an "assumption", it's a fact.
adding the overhead of an additional function call (which can be expensive especially for short copy operations) it would make more
I am not sure U-Boot is really in the business of doing small memmoves, but okay...
It's easy to avoid this overhead, and also get rid of the restrictions you built into it (otimizong only the non-overlapping case), so if we touch that code, we should do it right.
Given that code should perferably never be used, maybe it should print a warning like "Using default memory ops" and leave it like it is. The problem is not correctness just inefficiency.
sense to pull the "copy a word at a time" code from memcpy() into memmove(), too.
On the other hand - if you really care about performance, then why do
I spent several hours figuring out why our NOR boot performance was terrible.... it's because this default memmove code is gloriously inefficient for all cases.
If you like it like that, no worries.
Don't twist my words. I asked for a different, better implementation, that's all.
Unfortunately I'm only looking at it because it made trouble, and we now no longer use that code.
For the use-case I'm studying (very fast overall boot) it still may make sense to implement the NEON stuff in which case I'll offer a patch for that.
-Andy
Best regards,
Wolfgang Denk
-- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de The only time the world beats a path to your door is when you are in the bathroom.