On Thursday 08 November 2012, Viresh Kumar wrote:
It copies 4 MB of data from source to dest repeatedly and takes 12 seconds. But when I remove the part of the code marked in red, it takes only 4 seconds. Why does writing to the source first have such a large influence on the memcpy() time?
I tested it on Vexpress TC2 with two A15s and three A7s.
Interesting. I added an additional #if/#endif around the second loop too and checked the assembly in three configurations:
- Both loops enabled ("full"): 3.7 s
- Only the first loop enabled ("first"): 0.189 s
- Only the second loop enabled ("second"): 1.6 s
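The test program itself was not posted in this thread, so what follows is only a hypothetical reconstruction in C. The name `run_copy_test()` and its parameters are invented for illustration, and the two flags assume that the "first loop" is the code that writes to the source buffer and the "second loop" is the repeated memcpy(). Runtime flags stand in for the #if/#endif blocks so that all three configurations can come from one binary.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Written after the copies so the compiler cannot discard memcpy()
 * calls into a buffer that would otherwise be freed without being read. */
static volatile unsigned char copy_sink;

/* Hypothetical reconstruction of the test; the original code is not shown.
 * init_src: the "first loop", writes every byte of the source buffer.
 * do_copy:  the "second loop", copies src to dst `iterations` times.
 * Returns elapsed wall-clock seconds, or -1.0 if allocation fails. */
double run_copy_test(size_t size, int iterations, bool init_src, bool do_copy)
{
    assert(size > 0 && iterations >= 0);

    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    if (init_src) {
        /* Writing to src forces the kernel to give it real pages. */
        for (size_t i = 0; i < size; i++)
            src[i] = (unsigned char)i;
    }

    if (do_copy) {
        for (int n = 0; n < iterations; n++)
            memcpy(dst, src, size);
        copy_sink = dst[size / 2];
    }

    clock_gettime(CLOCK_MONOTONIC, &end);

    free(src);
    free(dst);
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

For example, a small main() could print `run_copy_test((size_t)4 << 20, 1000, true, true)` and `run_copy_test((size_t)4 << 20, 1000, false, true)`. It is probably best to run each configuration in its own process: after a free(), the allocator may hand back memory that was already written, which would change what is being measured.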
The assembly for each is attached.
I don't see anything in the assembly that could explain this, so that is out of the way.
What is left is the cache. But I must admit I am still not able to solve this puzzle.
Let's see if Arnd can help here :)
When you allocate memory in user space but never write to it, all of its pages are backed by the kernel's single shared "empty zero page" and don't consume any physical memory beyond that. Moreover, since every page of the source buffer maps to that same physical page, after reading it the first time all the data will be in clean cache lines and readily available for the CPU to read from.
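The zero-page behaviour can be observed directly on Linux. The sketch below assumes Linux with /proc mounted; `zero_page_demo()` and `rss_kib()` are hypothetical helper names. It maps private anonymous memory, reads one byte per page, then writes one byte per page, and compares how the process's VmRSS grows after each pass. On the kernels we are aware of, the read pass should barely change VmRSS, because the read faults map the shared zero page, which is not charged to the process, while the write pass should grow it by roughly the buffer size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns the resident set size in KiB from the VmRSS line of
 * /proc/self/status, or -1 if it cannot be read (Linux-specific). */
static long rss_kib(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;
    char line[256];
    long kib = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (sscanf(line, "VmRSS: %ld kB", &kib) == 1)
            break;
    }
    fclose(f);
    return kib;
}

/* Maps `size` bytes, then reports RSS growth (KiB) after a read-only pass
 * and after a write pass. Returns 0 on success, -1 on failure. */
int zero_page_demo(size_t size, long *read_growth_kib, long *write_growth_kib)
{
    assert(read_growth_kib != NULL && write_growth_kib != NULL);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    unsigned char *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    /* Keep transparent huge pages out of the picture; failure (e.g. no
     * THP support in the kernel) is harmless, so the result is ignored. */
    madvise(buf, size, MADV_NOHUGEPAGE);

    long before = rss_kib();
    volatile unsigned char sum = 0;
    for (size_t i = 0; i < size; i += (size_t)page)
        sum += buf[i];              /* read fault: maps the zero page */
    long after_read = rss_kib();

    for (size_t i = 0; i < size; i += (size_t)page)
        buf[i] = 1;                 /* write fault: allocates a private page */
    long after_write = rss_kib();

    munmap(buf, size);
    if (before < 0 || after_read < 0 || after_write < 0)
        return -1;
    *read_growth_kib = after_read - before;
    *write_growth_kib = after_write - after_read;
    return 0;
}
```

With a 4 MB buffer one would expect a read growth near 0 and a write growth near 4096 KiB, although VmRSS is updated lazily on some kernels, so the numbers are approximate.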
If you initialize the data first, all 4 MB end up in actual memory. Since this is larger than your available cache memory, copying out of that area generates lots of cache misses.
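A rough footprint calculation makes the difference concrete. It assumes 4 KB pages (as on 32-bit ARM Linux), 64-byte cache lines (as on Cortex-A15 and Cortex-A7), and 1 MB = 2^20 bytes:

```latex
\begin{aligned}
\text{pages in the source buffer} &= \frac{4\,\text{MB}}{4\,\text{KB}} = 1024 \\
\text{source footprint, never written} &= 1 \text{ physical page} = \frac{4096}{64} = 64 \text{ cache lines} \\
\text{source footprint, initialized} &= \frac{4\,194\,304}{64} = 65\,536 \text{ cache lines}
\end{aligned}
```

The destination is written in both cases and needs real pages either way, so the difference comes from the source side: an uninitialized source can be served from 64 hot cache lines, while an initialized one has to be streamed through caches that cannot hold it.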
Arnd