On 16.01.2013 03:40, Jason Cooper wrote:
Soeren,
On Wed, Jan 16, 2013 at 01:17:59AM +0100, Soeren Moch wrote:
On 15.01.2013 22:56, Jason Cooper wrote:
On Tue, Jan 15, 2013 at 03:16:17PM -0500, Jason Cooper wrote:
If my understanding is correct, one of the drivers (most likely just one) either asks for too small a dma buffer or is not properly deallocating blocks from the per-device pool. Either case leads to exhaustion and a fallback to the atomic pool, which subsequently gets exhausted as well.
If my hunch is right, could you please try each of the three dvb drivers in turn and see which one (or more than one) causes the error?
In fact I use only 2 types of DVB sticks: em28xx usb bridge plus drxk demodulator, and dib0700 usb bridge plus dib7000p demod.
I would bet on em28xx causing the error, but this is not thoroughly tested. Unfortunately, testing with sticks removed is not easy: this is a production system, and disabling some services for the long time we need to trigger this error will certainly result in unhappy users.
Just out of curiosity, what board is it?
The kirkwood board? A modified Guruplug Server Plus.
I will see what I can do here. Is there an easy way to track the buffer usage without having to wait for complete exhaustion?
DMA_API_DEBUG
OK, maybe I can try this.
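For watching buffer usage without waiting for complete exhaustion, here is a minimal sketch that reads the DMA-API debug counters. It assumes the kernel was built with CONFIG_DMA_API_DEBUG, that debugfs is mounted at /sys/kernel/debug, and that the dma-api directory exposes the counter files named below; the helper name read_counters is made up for illustration. These counters describe the debug facility's own tracking entries, so a steadily falling num_free_entries only hints at mappings or allocations that are never released.

```python
from pathlib import Path

# Assumed location; needs CONFIG_DMA_API_DEBUG and a mounted debugfs (root only).
DMA_API_DIR = Path("/sys/kernel/debug/dma-api")

# Counter files assumed to be present in that directory.
COUNTERS = ("error_count", "num_free_entries", "min_free_entries")


def read_counters(base=DMA_API_DIR, names=COUNTERS):
    """Return {name: int} for every counter file that exists and parses."""
    values = {}
    for name in names:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            # Missing file, no permission, or non-numeric content: skip it.
            continue
    return values


if __name__ == "__main__":
    # Take one snapshot; run it from cron or a loop to watch the trend.
    print(read_counters())
```

Logging one snapshot every few minutes while the DVB services run should show whether the free-entry count keeps dropping long before anything fails.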
In linux-3.5.x there is no such problem. Could 3.5.x use all available memory for dma buffers on armv5 architectures, in contrast to newer kernels?
Were the loads exactly the same when you tested 3.5.x?
Exactly the same, yes.
I looked at the changes from v3.5 to v3.7.1 for all four drivers you mentioned as well as sata_mv.
The biggest thing I see is that all of the media drivers got shuffled around into their own subdirectories after v3.5. 'git show -M 0c0d06c' shows it was a clean copy of all the files.
What would be most helpful is if you could do a git bisect between v3.5.x (working) and the oldest version where you know it started failing (v3.7.1 or earlier if you know it).
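A rough sketch of how such a bisect could be started, written as a small Python wrapper so the commands can be printed before anything is run. The tags v3.5 and v3.7.1 are taken from this thread and assume a tree that contains both mainline and stable tags; the helper names bisect_plan and run_plan are hypothetical.

```python
import shlex
import subprocess


def bisect_plan(good, bad):
    """Return the git commands that start a bisect between two revisions."""
    return [
        ["git", "bisect", "start"],
        ["git", "bisect", "bad", bad],
        ["git", "bisect", "good", good],
    ]


def run_plan(plan, repo=".", dry_run=True):
    """Print the commands, or execute them inside `repo` when dry_run is False."""
    for cmd in plan:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, cwd=repo, check=True)


if __name__ == "__main__":
    # Dry run only: shows what would be executed in the kernel tree.
    run_plan(bisect_plan(good="v3.5", bad="v3.7.1"))
```

After each build and test of the revision git checks out, the result is reported with `git bisect good` or `git bisect bad`, and `git bisect reset` returns to the original branch once the first bad commit is found.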
I did not bisect it, but Marek mentioned earlier that commit e9da6e9905e639b0f842a244bc770b48ad0523e9 in Linux v3.6-rc1 introduced new code for dma allocations. This is probably the root cause of the new (mis-)behavior (according to my tests, 3.6.0 already fails). I'm not very familiar with the arm mm code, and from the patch itself I cannot understand what's different. Maybe CONFIG_CMA is now the default for armv5 as well (not only v6)? But I might be totally wrong here; maybe one of the mm experts can explain the difference?
Regards, Soeren
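Whether CONFIG_CMA is actually enabled can be checked on the affected kernel directly. A minimal sketch, assuming the running kernel exposes its configuration at /proc/config.gz (which needs CONFIG_IKCONFIG_PROC); otherwise the same parser can be fed the .config used for the build. The function names are made up for illustration.

```python
import gzip


def parse_kconfig(text):
    """Map CONFIG_* names to values; '# CONFIG_X is not set' maps to 'n'."""
    suffix = " is not set"
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            opts[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            opts[name] = value
    return opts


def load_running_config(path="/proc/config.gz"):
    """Read and parse the configuration of the running kernel."""
    with gzip.open(path, "rt") as f:
        return parse_kconfig(f.read())


if __name__ == "__main__":
    try:
        cfg = load_running_config()
    except OSError:
        cfg = {}  # /proc/config.gz not available on this kernel
    for name in ("CONFIG_CMA", "CONFIG_DMA_API_DEBUG"):
        print(name, cfg.get(name, "unknown"))
```

Comparing the output on the working 3.5.x kernel and on a failing 3.6 or 3.7 kernel would show whether the option changed between the two builds.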