On Tue, Aug 05, 2025 at 01:46:51PM +0300, Dmitry Baryshkov wrote:
On Mon, Aug 04, 2025 at 11:13:59PM +0300, Nicusor Huhulea wrote:
A regression in output polling was introduced by commit 4ad8d57d902fbc7c82507cfc1b031f3a07c3de6e ("drm: Check output polling initialized before disabling") in the 6.1.y stable tree. As a result, when the i915 driver detects an HPD IRQ storm and attempts to switch from IRQ-based hotplug detection to polling, output polling fails to resume.
The root cause is the use of dev->mode_config.poll_running. Once poll_running is set (during the first connector detection), calls to drm_kms_helper_poll_enable(), such as the one in intel_hpd_irq_storm_switch_to_polling(), fail to schedule output_poll_work as expected. Therefore, after an IRQ storm disables HPD IRQs, polling does not start, breaking hotplug detection.
Why doesn't the disable path use drm_kms_helper_poll_disable()?
In general i915 doesn't disable polling as a whole after an IRQ storm and a 2 minute delay (or runtime resuming), since polling may still be required on some other connectors.
Also, in the 6.1.y stable tree drm_kms_helper_poll_disable() is:
	if (drm_WARN_ON(dev, !dev->mode_config.poll_enabled))
		return;

	cancel_delayed_work_sync(&dev->mode_config.output_poll_work);
so calling that wouldn't really fix the problem. The problem is clearly just an incorrect backport of the upstream commit 5abffb66d12bcac8 ("drm: Check output polling initialized before disabling") to the v6.1.y stable tree, done there as commit 4ad8d57d902f ("drm: Check output polling initialized before disabling").
The upstream commit did not add the check for dev->mode_config.poll_running in drm_kms_helper_poll_enable(); the condition was only part of the diff context in that commit. Hence adding the condition in the 4ad8d57d902f backport commit was incorrect.
The fix is to remove the dev->mode_config.poll_running check from the condition, ensuring polling is always scheduled as requested.
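To illustrate the difference, here is a minimal userspace model of the guard, not the kernel code itself. The struct and the model_* function names are made up for illustration, and work_scheduled is a hypothetical counter standing in for output_poll_work being queued; only the poll_enabled and poll_running flags correspond to fields mentioned above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant bits of dev->mode_config. */
struct model_mode_config {
	bool poll_enabled;   /* output polling has been initialized */
	bool poll_running;   /* already set, e.g. by the first connector detection */
	int  work_scheduled; /* how many times output_poll_work would be queued */
};

/*
 * Models the behaviour described for the incorrect backport:
 * the enable path returns early whenever poll_running is set.
 */
static void model_poll_enable_backport(struct model_mode_config *cfg)
{
	if (!cfg->poll_enabled || cfg->poll_running)
		return;

	cfg->work_scheduled++;
}

/*
 * Models the proposed fix: poll_running is no longer part of the
 * condition, so the work is queued whenever polling is initialized.
 */
static void model_poll_enable_fixed(struct model_mode_config *cfg)
{
	if (!cfg->poll_enabled)
		return;

	cfg->work_scheduled++;
}
```

With poll_enabled and poll_running both true, the first variant leaves work_scheduled untouched, which mirrors the storm-to-polling switch silently doing nothing, while the second variant increments it.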
Notes: The initial analysis, assumptions, device testing details, the correct fix and the detailed rationale were discussed here: https://lore.kernel.org/stable/aI32HUzrT95nS_H9@ideak-desk/
-- With best wishes Dmitry