On Wed, Aug 06, 2025 at 12:02:02PM +0000, nicusor.huhulea@siemens.com wrote:
-----Original Message----- From: Imre Deak imre.deak@intel.com Sent: Tuesday, August 5, 2025 6:20 PM To: Dmitry Baryshkov dmitry.baryshkov@oss.qualcomm.com Cc: Huhulea, Nicusor Liviu (FT FDS CES LX PBU 1) nicusor.huhulea@siemens.com; stable@vger.kernel.org; dri- devel@lists.freedesktop.org; intel-gfx@lists.freedesktop.org; cip-dev@lists.cip- project.org; shradhagupta@linux.microsoft.com; jouni.hogander@intel.com; neil.armstrong@linaro.org; jani.nikula@linux.intel.com; maarten.lankhorst@linux.intel.com; mripard@kernel.org; tzimmermann@suse.de; airlied@gmail.com; daniel@ffwll.ch; joonas.lahtinen@linux.intel.com; rodrigo.vivi@intel.com; laurentiu.palcu@oss.nxp.com; Hombourger, Cedric (FT FDS CES LX) cedric.hombourger@siemens.com; Bobade, Shrikant Krishnarao (FT FDS CES LX PBU 1) shrikant.bobade@siemens.com Subject: Re: [PATCH] drm/probe-helper: fix output polling not resuming after HPD IRQ storm
On Tue, Aug 05, 2025 at 01:46:51PM +0300, Dmitry Baryshkov wrote:
On Mon, Aug 04, 2025 at 11:13:59PM +0300, Nicusor Huhulea wrote:
A regression in output polling was introduced by commit 4ad8d57d902fbc7c82507cfc1b031f3a07c3de6e ("drm: Check output polling initialized before disabling") in the 6.1.y stable
tree.
As a result, when the i915 driver detects an HPD IRQ storm and attempts to switch from IRQ-based hotplug detection to polling, output polling
fails to resume.
The root cause is the use of dev->mode_config.poll_running. Once poll_running is set (during the first connector detection) the calls to drm_kms_helper_poll_enable(), such as intel_hpd_irq_storm_switch_to_polling() fails to schedule output_poll_work as
expected.
Therefore, after an IRQ storm disables HPD IRQs, polling does not start,
breaking hotplug detection.
Why doesn't disable path use drm_kms_helper_poll_disable() ?
In general i915 doesn't disable polling as a whole after an IRQ storm and a 2 minute delay (or runtime resuming), since on some other connectors the polling may be still required.
Also, in the 6.1.y stable tree drm_kms_helper_poll_disable() is:
if (drm_WARN_ON(dev, !dev->mode_config.poll_enabled)) return; cancel_delayed_work_sync(&dev->mode_config.output_poll_work);
so calling that wouldn't really fix the problem, which is clearly just an incorrect backport of the upstream commit 5abffb66d12bcac8 ("drm: Check output polling initialized before disabling") to the v6.1.y stable tree by commit 4ad8d57d902f ("drm: Check output polling initialized before disabling") in v6.1.y.
The upstream commit did not add the check for dev->mode_config.poll_running in drm_kms_helper_poll_enable(), the condition was only part of the diff context in the commit. Hence adding the condition in the 4ad8d57d902f backport commit was incorrect.
The fix is to remove the dev->mode_config.poll_running in the check condition, ensuring polling is always scheduled as requested.
I'm not a full-time kernel developer, but I spent some time trying to really understand both the rationale and the effects of commit "Check output polling initialized before disabling." Here's how I see the issue: That commit was introduced mainly as a defensive measure, to protect drivers such as hyperv-drm that do not initialize connector polling. In those drivers, calling drm_kms_helper_poll_disable() without proper polling setup could trigger warnings or even stack traces, since there are no outputs to poll and the polling helpers don't apply. From what I understand, the poll_running variable was added to prevent cases where polling was accidentally disabled for drivers that were never set up for it. However, this fix ended up creating a new class of breakage, which I have observed and describe here.
It was added to implement the very simple logic: If something isn't initialized, enabling or disabling it is an error. If something isn't enabled, disabling it is an error. If something isn't disabled, enabling it is an error.
To me, the core logic should be simple: if polling is needed, then we should allow it to proceed (regardless of whether it's been scheduled previously).
Looking at the drm_kms_helper_poll_disable() if (drm_WARN_ON(dev, !dev->mode_config.poll_enabled)) return;
If the driver never enabled polling (that is, drm_kms_helper_poll_enable() was never called), then the disable operation is effectively a no-op-totally safe for drivers like hyperv-drm.
And in the enable function drm_kms_helper_poll_enable(..): if (drm_WARN_ON_ONCE(dev, !dev->mode_config.poll_enabled) ||
!drm_kms_helper_poll || dev->mode_config.poll_running)
!drm_kms_helper_poll) return;
Why?
The main thing being guarded here is whether polling has actually been initialized: 1.For polling drivers like i915, removing poll_running from the enable path is both safe and necessary: it fixes the regression with HPD polling after IRQ storms
I believe in paired calls. If you want to use drm_kms_helper_poll_enable(), it should be previously drm_kms_helper_poll_disable()d. If you have manually disabled the IRQ, it should also be manually enabled.
Pairing the calls makes life much much easier.
2.For non-polling drivers like hyperv-drm, the existing checks on poll_enabled in both enable and disable functions are sufficient. Removing poll_running doesn't affect these drivers-they don't go through the polling code paths, so no polling gets scheduled or canceled by mistake
Therefore based on my understanding and testing removing poll_running guard not only fixes a real bug but also does not introduce new regressions for drivers that don't use output polling.