Commit 52d6b92

Ashok Raj authored and KAGA-KOKO committed
x86/hotplug: Silence APIC only after all interrupts are migrated

There is a race when taking a CPU offline. Current code looks like this:

    native_cpu_disable()
    {
        ...
        apic_soft_disable();
        /*
         * Any existing set bits for pending interrupt to
         * this CPU are preserved and will be sent via IPI
         * to another CPU by fixup_irqs().
         */
        cpu_disable_common();
        {
            ....
            /*
             * Race window happens here. Once local APIC has been
             * disabled any new interrupts from the device to
             * the old CPU are lost
             */
            fixup_irqs(); // Too late to capture anything in IRR.
            ...
        }
    }

The fix is to disable the APIC *after* cpu_disable_common().

Testing was done with a USB NIC that provided a source of frequent
interrupts. A script migrated interrupts to a specific CPU and
then took that CPU offline.

Fixes: 60dcaad ("x86/hotplug: Silence APIC and NMI when CPU is dead")
Reported-by: Evan Green <evgreen@chromium.org>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Tested-by: Evan Green <evgreen@chromium.org>
Reviewed-by: Evan Green <evgreen@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/875zdarr4h.fsf@nanos.tec.linutronix.de/
Link: https://lore.kernel.org/r/1598501530-45821-1-git-send-email-ashok.raj@intel.com

1 parent d4f0726 commit 52d6b92

1 file changed

Lines changed: 20 additions & 6 deletions

arch/x86/kernel/smpboot.c

@@ -1594,14 +1594,28 @@ int native_cpu_disable(void)
 	if (ret)
 		return ret;
 
-	/*
-	 * Disable the local APIC. Otherwise IPI broadcasts will reach
-	 * it. It still responds normally to INIT, NMI, SMI, and SIPI
-	 * messages.
-	 */
-	apic_soft_disable();
 	cpu_disable_common();
 
+	/*
+	 * Disable the local APIC. Otherwise IPI broadcasts will reach
+	 * it. It still responds normally to INIT, NMI, SMI, and SIPI
+	 * messages.
+	 *
+	 * Disabling the APIC must happen after cpu_disable_common()
+	 * which invokes fixup_irqs().
+	 *
+	 * Disabling the APIC preserves already set bits in IRR, but
+	 * an interrupt arriving after disabling the local APIC does not
+	 * set the corresponding IRR bit.
+	 *
+	 * fixup_irqs() scans IRR for set bits so it can raise a not
+	 * yet handled interrupt on the new destination CPU via an IPI
+	 * but obviously it can't do so for IRR bits which are not set.
+	 * IOW, interrupts arriving after disabling the local APIC will
+	 * be lost.
+	 */
+	apic_soft_disable();
+
 	return 0;
 }
 
