On Mon, Jun 4, 2018 at 8:33 AM, Thomas Gleixner <tglx(a)linutronix.de> wrote:
> apic_ack_edge() is explicitly for handling interrupt affinity cleanup when
> interrupt remapping is not available or disabled.
>
> Remapped interrupts and also some of the platform specific special
> interrupts, e.g. UV, invoke ack_APIC_irq() directly.
>
> To address the issue of failing an affinity update with -EBUSY, the delayed
> affinity mechanism can be reused, but ack_APIC_irq() does not handle
> that. Adding this to ack_APIC_irq() is not possible, because that function
> is also used for exceptions and directly handled interrupts like IPIs.
>
> Create a new function, which just contains the conditional invocation of
> irq_move_irq() and the final ack_APIC_irq(). Making the invocation of
> irq_move_irq() conditional avoids the out of line call if the pending bit
> is not set.
>
> Reuse the new function in apic_ack_edge().
>
> Preparatory change for the real fix.
>
> Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup")
> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
> Cc: stable(a)vger.kernel.org
Tested-by: Song Liu <songliubraving(a)fb.com>
> ---
> arch/x86/include/asm/apic.h | 2 ++
> arch/x86/kernel/apic/vector.c | 10 ++++++++--
> 2 files changed, 10 insertions(+), 2 deletions(-)
>
> --- a/arch/x86/include/asm/apic.h
> +++ b/arch/x86/include/asm/apic.h
> @@ -436,6 +436,8 @@ static inline void apic_set_eoi_write(vo
>
> #endif /* CONFIG_X86_LOCAL_APIC */
>
> +extern void apic_ack_irq(struct irq_data *data);
> +
> static inline void ack_APIC_irq(void)
> {
> /*
> --- a/arch/x86/kernel/apic/vector.c
> +++ b/arch/x86/kernel/apic/vector.c
> @@ -809,11 +809,17 @@ static int apic_retrigger_irq(struct irq
> return 1;
> }
>
> +void apic_ack_irq(struct irq_data *irqd)
> +{
> + if (unlikely(irqd_is_setaffinity_pending(irqd)))
> + irq_move_irq(irqd);
> + ack_APIC_irq();
> +}
> +
> void apic_ack_edge(struct irq_data *irqd)
> {
> irq_complete_move(irqd_cfg(irqd));
> - irq_move_irq(irqd);
> - ack_APIC_irq();
> + apic_ack_irq(irqd);
> }
>
> static struct irq_chip lapic_controller = {
>
>
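For illustration, a minimal sketch of how a chip that currently calls
ack_APIC_irq() directly from its ->irq_ack() callback (the remapped and
platform specific cases mentioned above) could later be switched to the new
helper. The chip and callback below are made up for this sketch and are not
part of the patch:

#include <linux/irq.h>
#include <asm/apic.h>

/* Hypothetical edge chip that used to call ack_APIC_irq() directly. */
static void example_edge_ack(struct irq_data *irqd)
{
	/* Before: ack_APIC_irq();  -- a rearmed affinity move was ignored */
	apic_ack_irq(irqd);	/* moves the irq first if a move is pending */
}

static struct irq_chip example_edge_chip = {
	.name			= "EXAMPLE-EDGE",
	.irq_ack		= example_edge_ack,
	.irq_set_affinity	= irq_chip_set_affinity_parent,
};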
On Mon, Jun 4, 2018 at 8:33 AM, Thomas Gleixner <tglx(a)linutronix.de> wrote:
> The generic pending interrupt mechanism moves interrupts from the interrupt
> handler on the original target CPU to the new destination CPU. This is
> required for x86 and ia64 due to the way interrupt delivery and
> acknowledgment work if the interrupts are not remapped.
>
> However that update can fail for various reasons. Some of them are valid
> reasons to discard the pending update, but the case where the previous move
> has not been fully cleaned up is not a legitimate reason to fail.
>
> Check the return value of irq_do_set_affinity() for -EBUSY, which indicates
> a pending cleanup, and rearm the pending move in the irq descriptor so it's
> tried again when the next interrupt arrives.
>
> Fixes: 996c591227d9 ("x86/irq: Plug vector cleanup race")
> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
> Cc: stable(a)vger.kernel.org
Tested-by: Song Liu <songliubraving(a)fb.com>
> ---
> kernel/irq/migration.c | 24 ++++++++++++++++++------
> 1 file changed, 18 insertions(+), 6 deletions(-)
>
> --- a/kernel/irq/migration.c
> +++ b/kernel/irq/migration.c
> @@ -38,17 +38,18 @@ bool irq_fixup_move_pending(struct irq_d
> void irq_move_masked_irq(struct irq_data *idata)
> {
> struct irq_desc *desc = irq_data_to_desc(idata);
> - struct irq_chip *chip = desc->irq_data.chip;
> + struct irq_data *data = &desc->irq_data;
> + struct irq_chip *chip = data->chip;
>
> - if (likely(!irqd_is_setaffinity_pending(&desc->irq_data)))
> + if (likely(!irqd_is_setaffinity_pending(data)))
> return;
>
> - irqd_clr_move_pending(&desc->irq_data);
> + irqd_clr_move_pending(data);
>
> /*
> * Paranoia: cpu-local interrupts shouldn't be calling in here anyway.
> */
> - if (irqd_is_per_cpu(&desc->irq_data)) {
> + if (irqd_is_per_cpu(data)) {
> WARN_ON(1);
> return;
> }
> @@ -73,9 +74,20 @@ void irq_move_masked_irq(struct irq_data
> * For correct operation this depends on the caller
> * masking the irqs.
> */
> - if (cpumask_any_and(desc->pending_mask, cpu_online_mask) < nr_cpu_ids)
> - irq_do_set_affinity(&desc->irq_data, desc->pending_mask, false);
> + if (cpumask_any_and(desc->pending_mask, cpu_online_mask) < nr_cpu_ids) {
> + int ret;
>
> + ret = irq_do_set_affinity(data, desc->pending_mask, false);
> + /*
> + * If there is a cleanup pending in the underlying
> + * vector management, reschedule the move for the next
> + * interrupt. Leave desc->pending_mask intact.
> + */
> + if (ret == -EBUSY) {
> + irqd_set_move_pending(data);
> + return;
> + }
> + }
> cpumask_clear(desc->pending_mask);
> }
>
>
>
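As a standalone illustration of the rearm logic above, here is a small
user-space C model (illustration only, not kernel code): an affinity update
that fails with -EBUSY is not discarded, it is rearmed with the pending
target preserved, so the next interrupt retries it.

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct fake_desc {
	bool move_pending;	/* models irqd_is_setaffinity_pending() */
	bool cleanup_pending;	/* models the not yet finished vector cleanup */
	int  pending_target;	/* models desc->pending_mask */
};

/* Stands in for irq_do_set_affinity(): busy until the old vector is freed. */
static int do_set_affinity(struct fake_desc *d)
{
	if (d->cleanup_pending)
		return -EBUSY;
	printf("irq moved to CPU %d\n", d->pending_target);
	return 0;
}

/* Mirrors irq_move_masked_irq() with the -EBUSY rearm from this patch. */
static void move_masked_irq(struct fake_desc *d)
{
	if (!d->move_pending)
		return;
	d->move_pending = false;

	if (do_set_affinity(d) == -EBUSY) {
		d->move_pending = true;	/* rearm, pending target stays intact */
		return;
	}
	d->pending_target = -1;		/* models cpumask_clear() */
}

int main(void)
{
	struct fake_desc d = {
		.move_pending = true, .cleanup_pending = true, .pending_target = 3,
	};

	move_masked_irq(&d);		/* 1st interrupt: cleanup still pending */
	d.cleanup_pending = false;	/* cleanup IPI finished in the meantime */
	move_masked_irq(&d);		/* next interrupt: the move succeeds */
	return 0;
}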
On Mon, Jun 4, 2018 at 8:33 AM, Thomas Gleixner <tglx(a)linutronix.de> wrote:
> Several people observed the WARN_ON() in irq_matrix_free() which triggers
> when the caller tries to free a vector which is not in the allocation
> range. Song provided the trace information which made it possible to decode
> the root cause.
>
> The rework of the vector allocation mechanism failed to preserve a sanity
> check, which prevents setting a new target vector/CPU when the previous
> affinity change has not fully completed.
>
> As a result a half finished affinity change can be overwritten, which can
> cause the leak of an irq descriptor pointer on the previous target CPU and
> double enqueue of the hlist head into the cleanup lists of two or more
> CPUs. After one CPU cleaned up its vector the next CPU will invoke the
> cleanup handler with vector 0, which triggers the out of range warning in
> the matrix allocator.
>
> Prevent this by checking the apic_data of the interrupt to verify that the
> move_in_progress flag is false and the hlist node is not hashed. Return
> -EBUSY otherwise.
>
> This prevents the damage and restores the behaviour before the vector
> allocation rework, but due to other changes in that area it also widens the
> chance that user space can observe -EBUSY. In theory this should be fine,
> but in practice not all user space tools handle -EBUSY correctly. Addressing
> that is not part of this fix, but will be addressed in follow up patches.
>
> Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment")
> Reported-by: Dmitry Safonov <0x7f454c46(a)gmail.com>
> Reported-by: Tariq Toukan <tariqt(a)mellanox.com>
> Reported-by: Song Liu <liu.song.a23(a)gmail.com>
> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
> Cc: stable(a)vger.kernel.org
Thanks Thomas!
This patch alone fixes my test: ethtool -L in a loop.
I also run the same test for the full set, and it works well.
Tested-by: Song Liu <songliubraving(a)fb.com>
> ---
> arch/x86/kernel/apic/vector.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> --- a/arch/x86/kernel/apic/vector.c
> +++ b/arch/x86/kernel/apic/vector.c
> @@ -235,6 +235,15 @@ static int allocate_vector(struct irq_da
> if (vector && cpu_online(cpu) && cpumask_test_cpu(cpu, dest))
> return 0;
>
> + /*
> + * Careful here. @apicd might either have move_in_progress set or
> + * be enqueued for cleanup. Assigning a new vector would either
> + * leave a stale vector on some CPU around or in case of a pending
> + * cleanup corrupt the hlist.
> + */
> + if (apicd->move_in_progress || !hlist_unhashed(&apicd->clist))
> + return -EBUSY;
> +
> vector = irq_matrix_alloc(vector_matrix, dest, resvd, &cpu);
> if (vector > 0)
> apic_update_vector(irqd, vector, cpu);
>
>
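In the same spirit, a small user-space model (illustration only, not kernel
code) of the effect of the new check: while the previous move is still in
progress or its cleanup entry is still enqueued, a new vector assignment is
refused with -EBUSY instead of overwriting the half finished move.

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct fake_apicd {
	bool move_in_progress;	/* previous target CPU not yet cleaned up */
	bool on_cleanup_list;	/* models !hlist_unhashed(&apicd->clist) */
	int  vector, cpu;
};

static int fake_allocate_vector(struct fake_apicd *a, int dest_cpu)
{
	/* The guard added by this patch: never overwrite a half finished move. */
	if (a->move_in_progress || a->on_cleanup_list)
		return -EBUSY;

	a->vector = 42;		/* stands in for irq_matrix_alloc() */
	a->cpu = dest_cpu;
	return 0;
}

int main(void)
{
	struct fake_apicd a = { .move_in_progress = true };

	if (fake_allocate_vector(&a, 1) == -EBUSY)
		printf("deferred: previous move not cleaned up yet\n");

	a.move_in_progress = false;	/* cleanup ran on the old CPU */
	if (!fake_allocate_vector(&a, 1))
		printf("vector %d assigned on CPU %d\n", a.vector, a.cpu);
	return 0;
}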
Decided to add Enric's commit, since it is also a bug fix, instead of
modifying Chris's commit.
Chris Chiu (1):
  tpm: self test failure should not cause suspend to fail

Enric Balletbo i Serra (1):
  tpm: do not suspend/resume if power stays on

 drivers/char/tpm/tpm-chip.c      | 12 ++++++++++++
 drivers/char/tpm/tpm-interface.c |  7 +++++++
 drivers/char/tpm/tpm.h           |  1 +
 3 files changed, 20 insertions(+)
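For context, a rough sketch of what the power-stays-on handling could look
like; the device tree property name, the flag name and the exact hook points
are assumptions and may differ from the actual patches:

/* Sketch only -- names below are assumptions, not the quoted patches. */
#include <linux/of.h>

/* tpm-chip.c: note at chip setup time that TPM power stays on in suspend. */
static void tpm_chip_check_power(struct tpm_chip *chip)
{
	if (of_property_read_bool(chip->dev.parent->of_node,
				  "powered-while-suspended"))
		chip->flags |= TPM_CHIP_FLAG_ALWAYS_POWERED;
}

/* tpm-interface.c: skip the save-state dance when power is never lost. */
int tpm_pm_suspend(struct device *dev)
{
	struct tpm_chip *chip = dev_get_drvdata(dev);

	if (chip->flags & TPM_CHIP_FLAG_ALWAYS_POWERED)
		return 0;

	/* ... existing suspend handling ... */
	return 0;
}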
--
v2: moved the check from tpm_of.c to tpm-chip.c, as in v4.4 the chip is
unreachable otherwise. I did a compilation test now with Buildroot for
the power arch.
2.17.0