2016-12-27 13:16 GMT+01:00 Baolin Wang baolin.wang@linaro.org:
Hi,
On 27 December 2016 at 19:11, Felipe Balbi balbi@kernel.org wrote:
Hi,
Baolin Wang baolin.wang@linaro.org writes:
Hi,
On 27 December 2016 at 18:52, Janusz Dziedzic janusz.dziedzic@gmail.com wrote:
2016-12-26 9:01 GMT+01:00 Baolin Wang baolin.wang@linaro.org:
On some platfroms(like x86 platform), when one core is running the USB gadget irq thread handler by dwc3_thread_interrupt(), meanwhile another core also can respond other interrupts from dwc3 controller and modify the event buffer by dwc3_interrupt() function, that will cause getting the wrong event count in irq thread handler to make the USB function abnormal.
We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid this race.
Interesting, I always think we mask interrupt in dwc3_interrupt() by setting DWC3_GEVNTSIZ_INTMASK And unmask interrupt when we end dwc3_thread_interrupt().
So, we shouldn't get any IRQ from HW during dwc3_thread_interrupt(), or I miss something? Do you have some traces that indicate this masking will not work correctly?
Yes, but we just masked the interrupts described in DEVTEN register, and we did not mask all the interrupts, like the endpoint command complete event, transfer complete event and so on, so we can still get interrupts.
not true, we masked interrupts for the entire event buffer:
Yes, you are right and I missed that. I should reproduce this problem and analyse the real reason.
static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt) { struct dwc3 *dwc = evt->dwc; u32 count; u32 reg;
if (pm_runtime_suspended(dwc->dev)) { pm_runtime_get(dwc->dev); disable_irq_nosync(dwc->irq_gadget); dwc->pending_events = true; return IRQ_HANDLED; } count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0)); count &= DWC3_GEVNTCOUNT_MASK; if (!count) return IRQ_NONE; evt->count = count; evt->flags |= DWC3_EVENT_PENDING; /* Mask interrupt */ reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0)); reg |= DWC3_GEVNTSIZ_INTMASK;
See here ?!?
dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), reg); return IRQ_WAKE_THREAD;
}
BTW, what value you get when problem occured, 0xFFFC?
Yes, something like this, the event count become huge.
Probably you have little bit different code than current community version (depends how your PM works).
This is possible when we write: dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 0); And after that dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 4);
After that we will get 0xFFFC (-4).
Possible races: 1) dwc3_event_buffers_setup/dwc3_event_buffers_cleanup - write 0 2) dwc3_thread - write 4
While [1] could be called in PM work or UM context (init in Android case) spin_lock_irqsave() will only disable local irqs and still we could get IRQ on different core, next update evt->count and run thread...
So, seems your patch will solve this.
I am not sure this problem could be also visible in community current version.
Anyway, thanks for handling this.
BR Janusz
please send us tracepoint data. You probably need to compress it. Something like 256k of trace data is probably enough, so:
# mkdir -p /t # mount -t tracefs none /t # cd /t # echo 256 > buffer_size_kb # echo 1 > events/dwc3/enable # echo 0 > events/dwc3/dwc3_readl/enable # echo 0 > events/dwc3/dwc3_writel/enable
(reproduce)
# cp /t/trace /path/to/non-volatile/media/trace.txt
Okay, I try to do that. Thanks.
-- Baolin.wang Best Regards