On 8/3/20 7:16 AM, Christian Eggers wrote:
On Saturday, 1 August 2020, 18:02:34 CEST, Jonathan Cameron wrote:
On Mon, 27 Jul 2020 16:57:13 +0200
Christian Eggers ceggers@arri.de wrote:
iio_trigger_poll() calls generic_handle_irq(). This function expects to be run with local IRQs disabled.
Was there an error or warning that lead to this patch?
[ 17.448466] 000: ------------[ cut here ]------------ [ 17.448481] 000: WARNING: CPU: 0 PID: 9 at kernel/irq/handle.c:152 __handle_irq_event_percpu+0x55/0xae [ 17.448511] 000: irq 236 handler irq_default_primary_handler+0x1/0x4 enabled interrupts [ 17.448526] 000: Modules linked in: bridge stp llc usb_f_ncm u_ether libcomposite sd_mod configfs cdc_acm usb_storage scsi_mod ci_hdrc_imx ci_hdrc st_magn_spi ulpi st_sensors_spi ehci_hcd regmap_spi tcpm roles st_magn_i2c typec st_sensors_i2c udc_core st_magn as73211 st_sensors imx_thermal usb49xx usbcore industrialio_triggered_buffer rtc_rv3028 kfifo_buf at24 usb_common nls_base i2c_dev usbmisc_imx phy_mxs_usb anatop_regulator imx2_wdt imx_fan spidev leds_pwm leds_gpio led_class iio_trig_sysfs imx6sx_adc industrialio fixed at25 spi_imx spi_bitbang imx_napi dev imx_sdma virt_dma nfsv3 nfs lockd grace sunrpc ksz9477_i2c ksz9477 tag_ksz ksz_common dsa_core phylink regmap_i2c i2c_imx i2c_core fec ptp pps_core micrel [ 17.448712] 000: CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 5.4.47-rt28+ #446 [ 17.448723] 000: Hardware name: Freescale i.MX6 Ultralite (Device Tree) [ 17.448738] 000: [<c0108265>] (unwind_backtrace) from [<c01070a7>] (show_stack+0xb/0xc) [ 17.448754] 000: [<c01070a7>] (show_stack) from [<c0110673>] (__warn+0x7b/0x8c) [ 17.448772] 000: [<c0110673>] (__warn) from [<c01106b5>] (warn_slowpath_fmt+0x31/0x50) [ 17.448787] 000: [<c01106b5>] (warn_slowpath_fmt) from [<c012be53>] (__handle_irq_event_percpu+0x55/0xae) [ 17.448807] 000: [<c012be53>] (__handle_irq_event_percpu) from [<c012bec5>] (handle_irq_event_percpu+0x19/0x40) [ 17.448823] 000: [<c012bec5>] (handle_irq_event_percpu) from [<c012bf2b>] (handle_irq_event+0x3f/0x5c) [ 17.448839] 000: [<c012bf2b>] (handle_irq_event) from [<c012dd73>] (handle_simple_irq+0x67/0x6a) [ 17.448854] 000: [<c012dd73>] (handle_simple_irq) from [<c012b915>] (generic_handle_irq+0xd/0x16) [ 17.448870] 000: [<c012b915>] (generic_handle_irq) from [<bf8fcf05>] (iio_trigger_poll+0x33/0x44 [industrialio]) [ 17.448962] 000: [<bf8fcf05>] (iio_trigger_poll [industrialio]) from [<c0147b93>] (irq_work_run_list+0x43/0x66) [ 17.449010] 000: [<c0147b93>] (irq_work_run_list) from [<c013804f>] (run_timer_softirq+0x7/0x3c) [ 17.449029] 000: [<c013804f>] (run_timer_softirq) from [<c01022cf>] (__do_softirq+0x10f/0x160) [ 17.449045] 000: [<c01022cf>] (__do_softirq) from [<c0112255>] (run_ksoftirqd+0x19/0x2c) [ 17.449061] 000: [<c0112255>] (run_ksoftirqd) from [<c012153b>] (smpboot_thread_fn+0x13b/0x140) [ 17.449078] 000: [<c012153b>] (smpboot_thread_fn) from [<c011f823>] (kthread+0xa3/0xac) [ 17.449095] 000: [<c011f823>] (kthread) from [<c01010f1>] (ret_from_fork+0x11/0x20) [ 17.449110] 000: Exception stack(0xc2063fb0 to 0xc2063ff8) [ 17.449119] 000: 3fa0: 00000000 00000000 00000000 00000000 [ 17.449130] 000: 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 17.449139] 000: 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 [ 17.449146] 000: ---[ end trace 0000000000000002 ]---
Or can you point to what call in generic_handle_irq is making the assumption that we are breaking?
Given this is using the irq_work framework I'm wondering if this is a more general problem?
If I understand correctly, the kernel temporarily disables hardware interrupts while hardware irq handlers are run. In case of the iio-trig-hrtim and iio-trig-sysfs interrupts, __handle_irq_event_percpu() is not called from a hardware irq (where interrupts would be disabled), but from software.
The sysfs IIO trigger uses irq_work to schedule the iio_trigger_poll() and the promise of irq_work is that the callback will run in hard IRQ context. That's the whole point of it.
irq_work_run_list(), which shows up in your callgraph, has as BUG_ON(!irqs_disabled())[1], so we should never even get to calling iio_trigger_poll() if IRQs where not disabled at this point. That's the same condition that triggers the WARN_ON() in __handle_irq_event_percpu.
Are you using a non-upstream kernel? Maybe a RT kernel?
- Lars
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kern...