[PATCH] usb: gadget: storage_common: make FSG_NUM_BUFFERS variable size
per.forlin at linaro.org
Mon Aug 8 19:34:00 UTC 2011
On 8 August 2011 20:45, Alan Stern <stern at rowland.harvard.edu> wrote:
> On Mon, 8 Aug 2011, Per Forlin wrote:
>> > Okay, 6% is a worthwhile improvement, though not huge. Did you try 6
>> > or 8 buffers? I bet going beyond 4 makes very little difference.
>> On my board 4 buffers are enough. More buffers will make no difference.
>> Background study
>> I started by running dd to measure performance on the target side,
>> simply to find the maximum bandwidth: 20 MiB/s on my board. Then I
>> started the gadget mass storage on the device and ran the same test
>> from the PC host side: 18.7 MiB/s. I guessed this might be due to
>> serialized cache handling. As a dummy test, I removed the dma_map
>> call (replacing it with virt_to_dma) to see if it was causing the
>> performance drop. Without dma_map I get 20 MiB/s, so the loss
>> appears to be due to the dma_map call. The dma_map only adds
>> latency for the first request.
> What exactly do you mean by "first request"?
When both buffers are empty. The first request is filled up by VFS
and then prepared. At this point there is no ongoing transfer over
USB, so this request costs more, since it can't run in parallel with
an ongoing transfer. Every time the two buffers run empty there is an
extra cost for the first request in the next series of requests. Why
VFS doesn't deliver the data in time I don't know.
> Your test showed an extra latency of about 0.4 seconds when using
> dma_map. All that latency comes from a single request (the first one)?
> That doesn't sound right.
This "first request scenario" I refer to happens very often.
>> Studying the flow of buffers I can see, both buffers are filled with
>> data from vfs, then both are transmitted over USB, burst like
> Actually the first buffer is filled from VFS, then it is transmitted
> over USB. While the transmission is in progress, the second buffer is
> filled from VFS. Each time a transmission ends, the next buffer is
> queued for transmission -- unless it hasn't been filled yet, in which
> case it is queued when it gets filled.
> What do you mean by "burst like behavior"?
Instead of getting refills from VFS smoothly, the refills come in
bursts. VFS fills up the two buffers, then USB manages to transmit
both buffers before VFS refills them with new data. Roughly 20% of
the time VFS fills in data before USB has consumed it all; 80% of
the time USB manages to consume all the data before VFS refills the
buffers. The burst-like effects are probably due to power save,
which adds latency to the system.
>> More buffers will add "smoothness" and prevent the bursts
>> from affecting performance negatively. With 4 buffers, there were
>> very few cases when all 4 buffers were empty during a transfer.
> In what sense do the bursts affect performance? It makes sense that
> having all the buffers empty from time to time would reduce throughput.
> But as long as each buffer gets filled and queued before the previous
> buffer has finished transmitting, it shouldn't matter how "bursty" the
> behavior is.
This is true. On my system the buffer doesn't get filled before the
previous one has finished transmitting. With 4 buffers it works fine.