[PATCH] usb: gadget: storage_common: make FSG_NUM_BUFFERS variable size
per.forlin at linaro.org
Mon Aug 8 20:45:24 UTC 2011
On 8 August 2011 22:23, Alan Stern <stern at rowland.harvard.edu> wrote:
> On Mon, 8 Aug 2011, Per Forlin wrote:
>> On 8 August 2011 20:45, Alan Stern <stern at rowland.harvard.edu> wrote:
>> > On Mon, 8 Aug 2011, Per Forlin wrote:
>> >> > Okay, 6% is a worthwhile improvement, though not huge. Did you try 6
>> >> > or 8 buffers? I bet going beyond 4 makes very little difference.
>> >> >
>> >> On my board 4 buffers are enough. More buffers will make no difference.
>> >> Background study
>> >> I started by running dd to measure performance on my target side.
>> >> Simply to measure what would be the maximum bandwidth, 20MiB/s on my
>> >> board. Then I started the gadget mass storage on the device and run
>> >> the sae test from the PC host side, 18.7 MiB/s. I guessed this might
>> >> be due to serialized cache handling. I tested to remove the dma_map
>> >> call (replaced it with virt_to_dma). This is just a dummy test to see
>> >> if this is causing the performance drop. Without dma_map I get
>> >> 20MiB/s. It appears that the loss is due to dma_map call. The dma_map
>> >> only adds latency for the first request.
>> > What exactly do you mean by "first request"?
>> When both buffers are empty. The first request are filled up by vfs
>> and then prepared. This first time there are no ongoing transfer over
>> USB therefore this cost more, since it can't run in parallel with
>> ongoing transfer. Every time the to buffers run empty there is an
>> extra cost for the first request in the next series of requests. The
>> reason for not getting data from vfs in time I don't know.
> Okay, that makes sense. But what connection does this have with
> dma_map? Don't you also have situations where both buffers are empty
> when you replace dma_map with virt_to_dma?
All I do here is to remove the cost of dma_map in order to verify if
dma_map is the one responsible for the delay. If the performance is
good without dma_map I know the extra cost is due to dma_map running
in serial instead of parallel. If the performance would be bad even
without dma_map something else is causing it. It doesn't affect how
the VFS feeds data to the buffers.
>> Instead of getting refill from vfs smoothly the refills comes in
>> burst. VFS fills up the two buffers, then USB manage to transmit all
>> two buffers before VFS refills new data. Roughly 20% of the times VFS
>> fills up data in time before USB has consumed it all. 80% of the times
>> USB manage to consume all data before VFS refills data in the buffers.
>> The reason for the burst like affects are probably due to power save,
>> which add latency in the system.
>> This is true. In my system the buffer doesn't get filled up before the
>> previous has finished. With 4 buffers it works fine.
> Adding buffers to smooth out the bursty VFS behavior is a reasonable
> thing to do. But you should improve the patch description and the
> comments; the real reason for reduced throughput is VFS's behavior.
> Something like what you just wrote here would be fine.
I agree. The main reason is to compensate for bursty VFS behavior.
> And reduce the maximum number of buffers to 4. When you've made those
> changes, I'll Ack the patch.
Thanks for your time and patience,
More information about the linaro-dev