Hi Tushar,
Here is my assessment of the current situation.
Thanks for digging into this and the detailed diagnosis.
*Bug in the u-boot* The current u-boot for the Arndale-Octa board defines NR_BANKS as 12, and the core uses a global structure (gd->bd) to maintain the start and size of the individual banks. Depending on the revision of the SoC used on the board, the board file [1] updates the start/size for either 8 or 12 banks. On the current revision of Arndale-Octa boards, the board file always updates the start/size for 8 banks, leaving the start/size data for the remaining 4 banks uninitialized.
But the u-boot core [2] updates the values of all 12 banks, thus potentially picking up invalid data for the last 4 banks.
The issue can be fixed by resetting the start/size of the unused memory banks to 0/0 [3].
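A minimal sketch of that reset, for illustration only. The struct and the helper name clear_unused_banks() are hypothetical stand-ins, not the actual u-boot code in [1] or the patch in [3]; only NR_BANKS = 12 and the 8 initialized banks come from the description above.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_BANKS 12 /* value quoted above for the Arndale-Octa u-boot */

/* Hypothetical stand-in for the per-bank start/size kept in gd->bd. */
struct dram_bank {
    uint64_t start;
    uint64_t size;
};

/*
 * Zero the start/size of every bank from index `used` to NR_BANKS - 1,
 * so that code walking all NR_BANKS entries sees empty banks instead
 * of uninitialized data.
 */
static void clear_unused_banks(struct dram_bank banks[NR_BANKS], size_t used)
{
    for (size_t i = used; i < NR_BANKS; i++) {
        banks[i].start = 0;
        banks[i].size = 0;
    }
}
```

With 8 banks filled in by the board file, calling clear_unused_banks(banks, 8) would leave banks 8..11 as 0/0.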
*Before migration to memblock* DRAM banks were added through [4]. For Exynos systems, NR_BANKS was defined as 8. The initial check that rejects any banks beyond NR_BANKS was good enough to avoid this issue. The bootlog [5] (with some debug messages) shows the invalid data, both in u-boot and in the kernel. Please grep for "NR_BANKS too low, ignoring memory" in the bootlog.
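The effect of that check can be modeled with a small standalone sketch. This is not the kernel code behind [4]; the struct layout and the add_bank() name are assumptions made for illustration, and only NR_BANKS = 8 and the message text come from the description above.

```c
#include <stdint.h>
#include <stdio.h>

#define NR_BANKS 8 /* Exynos value quoted above */

/* Hypothetical bank table, loosely modeled on a fixed-size array. */
struct membank {
    uint64_t start;
    uint64_t size;
};

struct meminfo {
    int nr_banks;
    struct membank bank[NR_BANKS];
};

/*
 * Append a bank, refusing anything beyond NR_BANKS. Returns 0 on
 * success and -1 when the bank is ignored.
 */
static int add_bank(struct meminfo *mi, uint64_t start, uint64_t size)
{
    if (mi->nr_banks >= NR_BANKS) {
        fprintf(stderr, "NR_BANKS too low, ignoring memory at 0x%08llx\n",
                (unsigned long long)start);
        return -1;
    }
    mi->bank[mi->nr_banks].start = start;
    mi->bank[mi->nr_banks].size = size;
    mi->nr_banks++;
    return 0;
}
```

Feeding 12 banks through add_bank() keeps the first 8 and drops the last 4, which is why the uninitialized entries never reached the memory setup.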
*After migration to memblock* Now that the memory banks are added through [6], all of the memory banks are added unconditionally, resulting in the panic.
IMO, the bug is in u-boot and we should fix that.
I agree that the u-boot bug needs to be fixed, and FWIW, I updated my u-boot and haven't seen the boot failure yet after several boots with next-20140625.
That being said, since it's not always feasible/practical to update u-boot, and when it comes down to it, this is still a kernel regression, we should also fix the kernel to sanity check the values coming from u-boot, like it was doing before.
Could you (or Laura) come up with a way to recreate the sanity check that was detecting this problem before and ignoring those banks?
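To make the idea concrete, here is a rough sketch of the kind of per-bank validation I have in mind. It is only an assumption about what a check could look like: bank_is_sane() is a made-up name, and unlike the old code, which simply counted banks against NR_BANKS, this looks at the values themselves.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether a start/size pair handed over
 * by the bootloader looks like a usable memory bank.
 */
static bool bank_is_sane(uint64_t start, uint64_t size)
{
    /* An empty bank (e.g. one reset to 0/0) carries no memory. */
    if (size == 0)
        return false;

    /*
     * Reject banks whose end wraps around the 64-bit address space.
     * This also rejects a bank ending exactly at 2^64.
     */
    if (start + size < start)
        return false;

    return true;
}
```

Banks failing the check would be skipped with a warning rather than added.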
Thanks,
Kevin