c2856ae2f315d ("blk-mq: quiesce queue before freeing queue") has
already fixed this race, however the implied synchronize_rcu()
in blk_mq_quiesce_queue() can slow down LUN probing a lot, so it
caused a performance regression.
Then 1311326cf4755c7 ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()")
tried to avoid the unnecessary synchronize_rcu() by only quiescing
the queue when queue initialization is done, because it is common to
see lots of nonexistent LUNs which need to be probed.
However, it turns out that it isn't safe to quiesce the queue only when
queue initialization is done: when one SCSI command is completed, the
submitter of that command can be woken up immediately, and the SCSI
device may then be removed while the run queue in scsi_end_request()
is still in progress, so a kernel panic can be triggered.
In the Red Hat QE lab, there are several reports of this kind of kernel
panic being triggered during boot.
This patch addresses the issue by holding one queue usage counter
reference across freeing the request and the following queue run.
Fixes: 1311326cf4755c7 ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()")
Cc: Andrew Jones <drjones(a)redhat.com>
Cc: Bart Van Assche <bart.vanassche(a)wdc.com>
Cc: linux-scsi(a)vger.kernel.org
Cc: Martin K. Petersen <martin.petersen(a)oracle.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: James E.J. Bottomley <jejb(a)linux.vnet.ibm.com>
Cc: stable <stable(a)vger.kernel.org>
Cc: jianchao.wang <jianchao.w.wang(a)oracle.com>
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
---
block/blk-core.c | 5 ++---
drivers/scsi/scsi_lib.c | 8 ++++++++
2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index ce12515f9b9b..deb56932f8c4 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -798,9 +798,8 @@ void blk_cleanup_queue(struct request_queue *q)
* dispatch may still be in-progress since we dispatch requests
* from more than one contexts.
*
- * No need to quiesce queue if it isn't initialized yet since
- * blk_freeze_queue() should be enough for cases of passthrough
- * request.
+ * We rely on driver to deal with the race in case that queue
+ * initialization isn't done.
*/
if (q->mq_ops && blk_queue_init_done(q))
blk_mq_quiesce_queue(q);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index c7fccbb8f554..fa6e0c3b3aa6 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -697,6 +697,12 @@ static bool scsi_end_request(struct request *req, blk_status_t error,
*/
scsi_mq_uninit_cmd(cmd);
+ /*
+ * queue is still alive, so grab the ref for preventing it
+ * from being cleaned up during running queue.
+ */
+ percpu_ref_get(&q->q_usage_counter);
+
__blk_mq_end_request(req, error);
if (scsi_target(sdev)->single_lun ||
@@ -704,6 +710,8 @@ static bool scsi_end_request(struct request *req, blk_status_t error,
kblockd_schedule_work(&sdev->requeue_work);
else
blk_mq_run_hw_queues(q, true);
+
+ percpu_ref_put(&q->q_usage_counter);
} else {
unsigned long flags;
--
2.9.5
This is a follow-up to the discussion in [1], [2].
IOMMUs using ARMv7 short-descriptor format require page tables
(level 1 and 2) to be allocated within the first 4GB of RAM, even
on 64-bit systems.
For L1 tables that are bigger than a page, we can just use __get_free_pages
with GFP_DMA32 (on arm64 systems only, arm would still use GFP_DMA).
For L2 tables that only take 1KB, it would be a waste to allocate a full
page, so we considered 3 approaches:
1. This series, adding support for GFP_DMA32 slab caches.
2. genalloc, which requires pre-allocating the maximum number of L2 page
tables (4096, so 4MB of memory).
3. page_frag, which is not very memory-efficient as it is unable to reuse
freed fragments until the whole page is freed. [3]
This series is the most memory-efficient approach.
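To make the intended usage concrete, here is a rough sketch of how a driver
could use such a DMA32 slab cache once the mm patch is in place (the cache
name, sizes and helper functions below are illustrative only, not taken from
the actual patches):

  #include <linux/slab.h>

  static struct kmem_cache *l2_tables;

  static int l2_table_cache_init(void)
  {
          /*
           * L2 tables are 1KB and must live below 4GB; SLAB_CACHE_DMA32
           * is the new cache flag added by the mm patch in this series.
           */
          l2_tables = kmem_cache_create("io-pgtable-l2", 1024, 1024,
                                        SLAB_CACHE_DMA32, NULL);
          return l2_tables ? 0 : -ENOMEM;
  }

  static void *l2_table_alloc(gfp_t gfp)
  {
          /* The cache itself guarantees DMA32 placement of the object. */
          return kmem_cache_zalloc(l2_tables, gfp);
  }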
stable@ note:
We confirmed that this is a regression, and IOMMU errors happen on 4.19
and linux-next/master on MT8173 (elm, Acer Chromebook R13). The issue
most likely starts from commit ad67f5a6545f ("arm64: replace ZONE_DMA
with ZONE_DMA32"), i.e. 4.15, and presumably breaks a number of Mediatek
platforms (and maybe others?).
[1] https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030876.html
[2] https://lists.linuxfoundation.org/pipermail/iommu/2018-December/031696.html
[3] https://patchwork.codeaurora.org/patch/671639/
Changes since v1:
- Add support for SLAB_CACHE_DMA32 in slab and slub (patches 1/2)
- iommu/io-pgtable-arm-v7s (patch 3):
- Changed approach to use SLAB_CACHE_DMA32 added by the previous
commit.
- Use DMA or DMA32 depending on the architecture (DMA for arm,
DMA32 for arm64).
Changes since v2:
- Reworded and expanded commit messages
- Added cache_dma32 documentation in PATCH 2/3.
v3 used the page_frag approach, see [3].
Changes since v4:
- Dropped change that removed GFP_DMA32 from GFP_SLAB_BUG_MASK:
instead we can just call kmem_cache_*alloc without GFP_DMA32
parameter. This also means that we can drop PATCH v4 1/3, as we
do not make any changes in GFP flag verification.
- Dropped hunks that added cache_dma32 sysfs file, and moved
the hunks to PATCH v5 3/3, so that maintainer can decide whether
to pick the change independently.
Changes since v5:
- Rename ARM_V7S_TABLE_SLAB_CACHE to ARM_V7S_TABLE_SLAB_FLAGS.
- Add stable@ to cc.
Nicolas Boichat (3):
mm: Add support for kmem caches in DMA32 zone
iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging
mm: Add /sys/kernel/slab/cache/cache_dma32
Documentation/ABI/testing/sysfs-kernel-slab | 9 +++++++++
drivers/iommu/io-pgtable-arm-v7s.c | 19 +++++++++++++++----
include/linux/slab.h | 2 ++
mm/slab.c | 2 ++
mm/slab.h | 3 ++-
mm/slab_common.c | 2 +-
mm/slub.c | 16 ++++++++++++++++
tools/vm/slabinfo.c | 7 ++++++-
8 files changed, 53 insertions(+), 7 deletions(-)
--
2.20.0.rc2.403.gdbc3b29805-goog
This is the start of the stable review cycle for the 4.19.1 release.
There are 24 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun Nov 4 18:28:10 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.1-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.1-rc1
Nikolay Aleksandrov <nikolay(a)cumulusnetworks.com>
net: bridge: remove ipv6 zero address check in mcast queries
David S. Miller <davem(a)davemloft.net>
sparc64: Wire up compat getpeername and getsockname.
David Miller <davem(a)redhat.com>
sparc64: Make corrupted user stacks more debuggable.
David S. Miller <davem(a)davemloft.net>
sparc64: Export __node_distance.
Xin Long <lucien.xin(a)gmail.com>
sctp: check policy more carefully when getting pr status
Ivan Vecera <ivecera(a)redhat.com>
Revert "be2net: remove desc field from be_eq_obj"
Heiner Kallweit <hkallweit1(a)gmail.com>
r8169: fix broken Wake-on-LAN from S5 (poweroff)
David S. Miller <davem(a)davemloft.net>
net: Properly unlink GRO packets on overflow.
Cong Wang <xiyou.wangcong(a)gmail.com>
net: drop skb on failure in ip_check_defrag()
Shalom Toledo <shalomt(a)mellanox.com>
mlxsw: core: Fix devlink unregister flow
Petr Machata <petrm(a)mellanox.com>
mlxsw: spectrum_switchdev: Don't ignore deletions of learned MACs
Karsten Graul <kgraul(a)linux.ibm.com>
net/smc: fix smc_buf_unuse to use the lgr pointer
David Ahern <dsahern(a)gmail.com>
net/ipv6: Allow onlink routes to have a device mismatch if it is the default route
Jaime Caamaño Ruiz <jcaamano(a)suse.com>
openvswitch: Fix push/pop ethernet validation
Tobias Jungel <tobias.jungel(a)gmail.com>
bonding: fix length of actor system
Jason Wang <jasowang(a)redhat.com>
vhost: Fix Spectre V1 vulnerability
Ido Schimmel <idosch(a)mellanox.com>
rtnetlink: Disallow FDB configuration for non-Ethernet device
Karsten Graul <kgraul(a)linux.ibm.com>
Revert "net: simplify sock_poll_wait"
Sean Tranchetti <stranche(a)codeaurora.org>
net: udp: fix handling of CHECKSUM_COMPLETE packets
Niklas Cassel <niklas.cassel(a)linaro.org>
net: stmmac: Fix stmmac_mdio_reset() when building stmmac as modules
Jakub Kicinski <jakub.kicinski(a)netronome.com>
net: sched: gred: pass the right attribute to gred_change_table_def()
Eric Dumazet <edumazet(a)google.com>
net/mlx5e: fix csum adjustments caused by RXFCS
Stefano Brivio <sbrivio(a)redhat.com>
ipv6/ndisc: Preserve IPv6 control buffer if protocol error handlers are called
Hangbin Liu <liuhangbin(a)gmail.com>
bridge: do not add port to router list when receives query with source 0.0.0.0
-------------
Diffstat:
Makefile | 4 +-
arch/sparc/include/asm/switch_to_64.h | 3 +-
arch/sparc/kernel/process_64.c | 25 +++++++++---
arch/sparc/kernel/rtrap_64.S | 1 +
arch/sparc/kernel/signal32.c | 12 +++++-
arch/sparc/kernel/signal_64.c | 6 ++-
arch/sparc/kernel/systbls_64.S | 4 +-
arch/sparc/mm/init_64.c | 1 +
crypto/af_alg.c | 2 +-
drivers/net/bonding/bond_netlink.c | 3 +-
drivers/net/ethernet/emulex/benet/be.h | 1 +
drivers/net/ethernet/emulex/benet/be_main.c | 6 +--
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 45 +++++-----------------
drivers/net/ethernet/mellanox/mlxsw/core.c | 24 ++++++++----
.../ethernet/mellanox/mlxsw/spectrum_switchdev.c | 2 -
drivers/net/ethernet/realtek/r8169.c | 9 ++++-
drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 2 +-
drivers/vhost/vhost.c | 2 +
include/net/sock.h | 12 ++++--
net/atm/common.c | 2 +-
net/bridge/br_multicast.c | 9 ++++-
net/caif/caif_socket.c | 2 +-
net/core/datagram.c | 7 ++--
net/core/dev.c | 1 +
net/core/rtnetlink.c | 10 +++++
net/dccp/proto.c | 2 +-
net/ipv4/ip_fragment.c | 12 ++++--
net/ipv4/tcp.c | 2 +-
net/ipv4/udp.c | 20 +++++++++-
net/ipv6/ip6_checksum.c | 20 +++++++++-
net/ipv6/ndisc.c | 3 +-
net/ipv6/route.c | 2 +
net/iucv/af_iucv.c | 2 +-
net/nfc/llcp_sock.c | 2 +-
net/openvswitch/flow_netlink.c | 4 +-
net/rxrpc/af_rxrpc.c | 2 +-
net/sched/sch_gred.c | 2 +-
net/sctp/socket.c | 8 ++--
net/smc/af_smc.c | 2 +-
net/smc/smc_core.c | 25 ++++++------
net/tipc/socket.c | 2 +-
net/unix/af_unix.c | 4 +-
tools/testing/selftests/net/fib-onlink-tests.sh | 14 +++----
43 files changed, 200 insertions(+), 123 deletions(-)
The mce handler for 'nfit' devices is called for memory errors on a
Non-Volatile DIMM, and adds the error location to a 'badblocks' list.
This list is used by the various NVDIMM drivers to avoid consuming known
poison locations during IO.
The mce handler gets called for both corrected and uncorrectable errors.
Until now, both kinds of errors have been added to the badblocks list.
However, corrected memory errors indicate that the problem has already
been fixed by hardware, and the resulting interrupt is merely a
notification to Linux. As far as future accesses to that location are
concerned, the location is perfectly fine to use, and thus doesn't need
to be included in the above badblocks list.
Add a check in the nfit mce handler to filter out corrected mce events,
and only process uncorrectable errors.
Reported-by: Omar Avelar <omar.avelar(a)intel.com>
Fixes: 6839a6d96f4e ("nfit: do an ARS scrub on hitting a latent media error")
Cc: stable(a)vger.kernel.org
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Tony Luck <tony.luck(a)intel.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Signed-off-by: Vishal Verma <vishal.l.verma(a)intel.com>
---
arch/x86/include/asm/mce.h | 1 +
arch/x86/kernel/cpu/mcheck/mce.c | 3 ++-
drivers/acpi/nfit/mce.c | 4 ++--
3 files changed, 5 insertions(+), 3 deletions(-)
v3: Unchanged from v2
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 3a17107594c8..3111b3cee2ee 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -216,6 +216,7 @@ static inline int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *s
int mce_available(struct cpuinfo_x86 *c);
bool mce_is_memory_error(struct mce *m);
+bool mce_is_correctable(struct mce *m);
DECLARE_PER_CPU(unsigned, mce_exception_count);
DECLARE_PER_CPU(unsigned, mce_poll_count);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 953b3ce92dcc..27015948bc41 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -534,7 +534,7 @@ bool mce_is_memory_error(struct mce *m)
}
EXPORT_SYMBOL_GPL(mce_is_memory_error);
-static bool mce_is_correctable(struct mce *m)
+bool mce_is_correctable(struct mce *m)
{
if (m->cpuvendor == X86_VENDOR_AMD && m->status & MCI_STATUS_DEFERRED)
return false;
@@ -544,6 +544,7 @@ static bool mce_is_correctable(struct mce *m)
return true;
}
+EXPORT_SYMBOL_GPL(mce_is_correctable);
static bool cec_add_mce(struct mce *m)
{
diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
index e9626bf6ca29..7a51707f87e9 100644
--- a/drivers/acpi/nfit/mce.c
+++ b/drivers/acpi/nfit/mce.c
@@ -25,8 +25,8 @@ static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
struct acpi_nfit_desc *acpi_desc;
struct nfit_spa *nfit_spa;
- /* We only care about memory errors */
- if (!mce_is_memory_error(mce))
+ /* We only care about uncorrectable memory errors */
+ if (!mce_is_memory_error(mce) || mce_is_correctable(mce))
return NOTIFY_DONE;
/*
--
2.17.1
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From da791a667536bf8322042e38ca85d55a78d3c273 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Mon, 10 Dec 2018 14:35:14 +0100
Subject: [PATCH] futex: Cure exit race
Stefan reported that the glibc tst-robustpi4 test case fails
occasionally. That case creates the following race between
sys_exit() and sys_futex_lock_pi():
 CPU0                              CPU1

 sys_exit()                        sys_futex()
  do_exit()                         futex_lock_pi()
   exit_signals(tsk)                 No waiters:
    tsk->flags |= PF_EXITING;        *uaddr == 0x00000PID
   mm_release(tsk)                   Set waiter bit
    exit_robust_list(tsk) {          *uaddr = 0x80000PID;
       Set owner died                attach_to_pi_owner() {
       *uaddr = 0xC0000000;           tsk = get_task(PID);
    }                                 if (!tsk->flags & PF_EXITING) {
   ...                                  attach();
   tsk->flags |= PF_EXITPIDONE;       } else {
                                        if (!(tsk->flags & PF_EXITPIDONE))
                                          return -EAGAIN;
                                        return -ESRCH; <--- FAIL
                                      }
ESRCH is returned all the way to user space, which triggers the glibc test
case assert. Returning ESRCH unconditionally is wrong here because the user
space value has been changed by the exiting task to 0xC0000000, i.e. the
FUTEX_OWNER_DIED bit is set and the futex PID value has been cleared. This
is a valid state and the kernel has to handle it, i.e. taking the futex.
Cure it by rereading the user space value when PF_EXITING and PF_EXITPIDONE
are set in the task which 'owns' the futex. If the value has changed, let
the kernel retry the operation, which includes all regular sanity checks
and correctly handles the FUTEX_OWNER_DIED case.
If it hasn't changed, then return ESRCH as there is no way to distinguish
this case from malfunctioning user space. This happens when the exiting
task did not have a robust list, the robust list was corrupted or the user
space value in the futex was simply bogus.
Reported-by: Stefan Liebler <stli(a)linux.ibm.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Acked-by: Peter Zijlstra <peterz(a)infradead.org>
Cc: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Cc: Darren Hart <dvhart(a)infradead.org>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Sasha Levin <sashal(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200467
Link: https://lkml.kernel.org/r/20181210152311.986181245@linutronix.de
diff --git a/kernel/futex.c b/kernel/futex.c
index f423f9b6577e..5cc8083a4c89 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1148,11 +1148,65 @@ static int attach_to_pi_state(u32 __user *uaddr, u32 uval,
return ret;
}
+static int handle_exit_race(u32 __user *uaddr, u32 uval,
+ struct task_struct *tsk)
+{
+ u32 uval2;
+
+ /*
+ * If PF_EXITPIDONE is not yet set, then try again.
+ */
+ if (tsk && !(tsk->flags & PF_EXITPIDONE))
+ return -EAGAIN;
+
+ /*
+ * Reread the user space value to handle the following situation:
+ *
+ * CPU0                              CPU1
+ *
+ * sys_exit()                        sys_futex()
+ *  do_exit()                         futex_lock_pi()
+ *                                     futex_lock_pi_atomic()
+ *   exit_signals(tsk)                  No waiters:
+ *    tsk->flags |= PF_EXITING;         *uaddr == 0x00000PID
+ *   mm_release(tsk)                    Set waiter bit
+ *    exit_robust_list(tsk) {           *uaddr = 0x80000PID;
+ *       Set owner died                 attach_to_pi_owner() {
+ *       *uaddr = 0xC0000000;            tsk = get_task(PID);
+ *    }                                  if (!tsk->flags & PF_EXITING) {
+ *   ...                                   attach();
+ *   tsk->flags |= PF_EXITPIDONE;        } else {
+ *                                         if (!(tsk->flags & PF_EXITPIDONE))
+ *                                           return -EAGAIN;
+ *                                         return -ESRCH; <--- FAIL
+ *                                       }
+ *
+ * Returning ESRCH unconditionally is wrong here because the
+ * user space value has been changed by the exiting task.
+ *
+ * The same logic applies to the case where the exiting task is
+ * already gone.
+ */
+ if (get_futex_value_locked(&uval2, uaddr))
+ return -EFAULT;
+
+ /* If the user space value has changed, try again. */
+ if (uval2 != uval)
+ return -EAGAIN;
+
+ /*
+ * The exiting task did not have a robust list, the robust list was
+ * corrupted or the user space value in *uaddr is simply bogus.
+ * Give up and tell user space.
+ */
+ return -ESRCH;
+}
+
/*
* Lookup the task for the TID provided from user space and attach to
* it after doing proper sanity checks.
*/
-static int attach_to_pi_owner(u32 uval, union futex_key *key,
+static int attach_to_pi_owner(u32 __user *uaddr, u32 uval, union futex_key *key,
struct futex_pi_state **ps)
{
pid_t pid = uval & FUTEX_TID_MASK;
@@ -1162,12 +1216,15 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key,
/*
* We are the first waiter - try to look up the real owner and attach
* the new pi_state to it, but bail out when TID = 0 [1]
+ *
+ * The !pid check is paranoid. None of the call sites should end up
+ * with pid == 0, but better safe than sorry. Let the caller retry
*/
if (!pid)
- return -ESRCH;
+ return -EAGAIN;
p = find_get_task_by_vpid(pid);
if (!p)
- return -ESRCH;
+ return handle_exit_race(uaddr, uval, NULL);
if (unlikely(p->flags & PF_KTHREAD)) {
put_task_struct(p);
@@ -1187,7 +1244,7 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key,
* set, we know that the task has finished the
* cleanup:
*/
- int ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
+ int ret = handle_exit_race(uaddr, uval, p);
raw_spin_unlock_irq(&p->pi_lock);
put_task_struct(p);
@@ -1244,7 +1301,7 @@ static int lookup_pi_state(u32 __user *uaddr, u32 uval,
* We are the first waiter - try to look up the owner based on
* @uval and attach to it.
*/
- return attach_to_pi_owner(uval, key, ps);
+ return attach_to_pi_owner(uaddr, uval, key, ps);
}
static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval)
@@ -1352,7 +1409,7 @@ static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
* attach to the owner. If that fails, no harm done, we only
* set the FUTEX_WAITERS bit in the user space variable.
*/
- return attach_to_pi_owner(uval, key, ps);
+ return attach_to_pi_owner(uaddr, newval, key, ps);
}
/**
Dear Greg,
Commit ef86f3a7 (genirq/affinity: assign vectors to all possible CPUs), added
in Linux 4.14.56, causes the aacraid module to no longer detect the attached
devices on a Dell PowerEdge R720 with two six-core E5-2630 CPUs @ 2.30GHz
(24 logical CPUs).
```
$ dmesg | grep raid
[ 0.269768] raid6: sse2x1 gen() 7179 MB/s
[ 0.290069] raid6: sse2x1 xor() 5636 MB/s
[ 0.311068] raid6: sse2x2 gen() 9160 MB/s
[ 0.332076] raid6: sse2x2 xor() 6375 MB/s
[ 0.353075] raid6: sse2x4 gen() 11164 MB/s
[ 0.374064] raid6: sse2x4 xor() 7429 MB/s
[ 0.379001] raid6: using algorithm sse2x4 gen() 11164 MB/s
[ 0.386001] raid6: .... xor() 7429 MB/s, rmw enabled
[ 0.391008] raid6: using ssse3x2 recovery algorithm
[ 3.559682] megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
[ 3.570061] megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
[ 10.725767] Adaptec aacraid driver 1.2.1[50834]-custom
[ 10.731724] aacraid 0000:04:00.0: can't disable ASPM; OS doesn't have ASPM control
[ 10.743295] aacraid: Comm Interface type3 enabled
$ lspci -nn | grep Adaptec
04:00.0 Serial Attached SCSI controller [0107]: Adaptec Series 8 12G SAS/PCIe 3 [9005:028d] (rev 01)
42:00.0 Serial Attached SCSI controller [0107]: Adaptec Smart Storage PQI 12G SAS/PCIe 3 [9005:028f] (rev 01)
```
But it still works on a Dell PowerEdge R715 with two eight-core AMD
Opteron 6136 CPUs and the card below.
```
$ lspci -nn | grep Adaptec
22:00.0 Serial Attached SCSI controller [0107]: Adaptec Series 8 12G SAS/PCIe 3 [9005:028d] (rev 01)
```
Reverting the commit fixes the issue.
commit ef86f3a72adb8a7931f67335560740a7ad696d1d
Author: Christoph Hellwig <hch(a)lst.de>
Date: Fri Jan 12 10:53:05 2018 +0800
genirq/affinity: assign vectors to all possible CPUs
commit 84676c1f21e8ff54befe985f4f14dc1edc10046b upstream.
Currently we assign managed interrupt vectors to all present CPUs. This
works fine for systems where we only online/offline CPUs. But in case of
systems that support physical CPU hotplug (or the virtualized version of
it) this means the additional CPUs covered for in the ACPI tables or on
the command line are not catered for. To fix this we'd either need to
introduce new hotplug CPU states just for this case, or we can start
assigning vectors to possible but not present CPUs.
Reported-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Tested-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Tested-by: Stefan Haberland <sth(a)linux.vnet.ibm.com>
Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present CPU")
Cc: linux-kernel(a)vger.kernel.org
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
The problem doesn’t happen with Linux 4.17.11, so there are commits in
Linux master fixing this. Unfortunately, my attempts to find them failed.
I was able to cherry-pick the three commits below on top of 4.14.62,
but the problem persists.
6aba81b5a2f5 genirq/affinity: Don't return with empty affinity masks on error
355d7ecdea35 scsi: hpsa: fix selection of reply queue
e944e9615741 scsi: virtio_scsi: fix IO hang caused by automatic irq vector affinity
Trying to cherry-pick the commits below, which reference the commit
in question, resulted in conflicts.
1. adbe552349f2 scsi: megaraid_sas: fix selection of reply queue
2. d3056812e7df genirq/affinity: Spread irq vectors among present CPUs as far as possible
To avoid further trial and error on the server, which has slow firmware,
do you know which commits should fix the issue?
Kind regards,
Paul
PS: I couldn’t find out who suggested this for stable, that is, how it
came to be picked up for addition to stable. Is there an easy way to find
that out?
Thomas,
Andi and I have made an update to our draft of the Spectre admin guide.
We may be out on Christmas vacation for a while. But we want to
send it out for everyone to take a look.
Thanks.
Tim
From: Andi Kleen <ak(a)linux.intel.com>
There is no document in the admin guide describing
the Spectre v1 and v2 side channels and their mitigations
in Linux.
Create a document to describe Spectre and the mitigation
methods used in the kernel.
Signed-off-by: Andi Kleen <ak(a)linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen(a)linux.intel.com>
---
Documentation/admin-guide/spectre.rst | 502 ++++++++++++++++++++++++++++++++++
1 file changed, 502 insertions(+)
create mode 100644 Documentation/admin-guide/spectre.rst
diff --git a/Documentation/admin-guide/spectre.rst b/Documentation/admin-guide/spectre.rst
new file mode 100644
index 0000000..0ba708e
--- /dev/null
+++ b/Documentation/admin-guide/spectre.rst
@@ -0,0 +1,502 @@
+Spectre side channels
+=====================
+
+Spectre is a class of side channel attacks against modern CPUs that
+exploit branch prediction and speculative execution to read memory,
+possibly bypassing access controls. These exploits do not modify memory.
+
+This document covers Spectre variant 1 and 2.
+
+Affected processors
+-------------------
+
+The vulnerability affects a wide range of modern high performance
+processors, since most modern high speed processors use branch prediction
+and speculative execution.
+
+The following CPUs are vulnerable:
+
+ - Intel Core, Atom, Pentium, Xeon CPUs
+ - AMD CPUs like Phenom, EPYC, Zen.
+ - IBM processors like POWER and zSeries
+ - Higher end ARM processors
+ - Apple CPUs
+ - Higher end MIPS CPUs
+ - Likely most other high performance CPUs. Contact your CPU vendor for details.
+
+This document describes the mitigations on Intel CPUs. Mitigations
+on other architectures may be different.
+
+Related CVEs
+------------
+
+The following CVE entries describe Spectre variants:
+
+ ============= ======================= ==========
+ CVE-2017-5753 Bounds check bypass     Spectre-V1
+ CVE-2017-5715 Branch target injection Spectre-V2
+ ============= ======================= ==========
+
+Problem
+-------
+
+CPUs have shared caches, such as buffers for branch prediction, which are
+later used to guide speculative execution. These buffers are not flushed
+over context switches or changes in privilege level. Malicious software
+might influence these buffers and trigger specific speculative execution
+in the kernel or different user processes. This speculative execution can
+then be used to read data in memory and cause side effects, such as displacing
+data in a data cache. The side effect can then later be measured by the
+malicious software, and used to determine the memory values read speculatively.
+
+Spectre attacks allow tricking other software into disclosing
+values in its memory.
+
+In a typical Spectre variant 1 attack, the attacker passes a parameter
+to a victim. The victim bounds checks the parameter and rejects illegal
+values. However, due to speculation over branch prediction, the code path
+for correct values might be speculatively executed; it may then reference
+memory controlled by the input parameter and leave measurable side effects
+in the caches. The attacker can then measure these side effects
+and determine the leaked value.
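+
+A classic illustration of such a vulnerable pattern (purely illustrative
+pseudo-code, not taken from any real code base) is an array access guarded
+by a bounds check::
+
+    if (x < array1_size)
+        y = array2[array1[x] * 256];
+
+Even when x is out of bounds, the CPU may speculatively execute the body of
+the branch and leave a cache footprint that depends on array1[x], which the
+attacker can later measure.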
+
+There are some extensions of Spectre variant 1 attacks for reading
+data over the network, see [2]. However the attacks are very
+difficult, low bandwidth and fragile and considered low risk.
+
+For Spectre variant 2 the attacker poisons the indirect branch
+predictors of the CPU. Then control is passed to the victim, which
+executes indirect branches. Due to the poisoned branch predictor data
+the CPU can speculatively execute arbitrary code in the victim's
+address space, such as a code sequence ("disclosure gadget") that
+reads arbitrary data on some input parameter and causes a measurable
+cache side effect based on the value. The attacker can then measure
+this side effect after gaining control again and determine the value.
+
+The most useful gadgets take an attacker-controlled input parameter so
+that the memory read can be controlled. Gadgets without input parameters
+might be possible, but the attacker would have very little control over what
+memory can be read, reducing the risk of the attack revealing useful data.
+
+Attack scenarios
+----------------
+
+Here is a list of attack scenarios that have been anticipated, but which
+may not cover all possible attack patterns. Reducing the occurrences of
+the attack prerequisites listed can reduce the risk that a Spectre attack
+leaks useful data.
+
+1. Local User process attacking kernel
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Code in system calls often enforces access controls with conditional
+branches based on user data. These branches are potential targets for
+Spectre v2 exploits. Interrupt handlers, on the other hand, rarely
+handle user data or enforce access controls, which makes them unlikely
+exploit targets.
+
+For a typical variant 2 attack, the attacker may first poison the CPU branch
+buffers, and then enter the kernel and trick it into jumping to a
+disclosure gadget through an indirect branch. If the attacker wants to
+control the memory addresses leaked, it would also need to pass a parameter
+to the gadget, either through a register or through a known address in
+memory. Finally, when the attacker's code executes again, it can measure
+the side effect.
+
+Necessary Prerequisites:
+1. Malicious local process passing parameters to the kernel
+2. Kernel has secrets.
+
+2. User process attacking another user process
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In this scenario a malicious user process wants to attack another
+user process through a context switch.
+
+For variant 1 this generally requires passing some parameter between
+the processes, which needs a data passing relationship, such as remote
+procedure calls (RPC).
+
+For variant 2 the poisoning can happen through a context switch, or
+on CPUs with simultaneous multi-threading (SMT) potentially on the
+thread sibling executing in parallel on the same core. In either case,
+controlling the memory leaked by the disclosure gadget also requires a data
+passing relationship to the victim process, otherwise while it may
+observe values through side effects, it won't know which memory
+addresses they relate to.
+
+Necessary Prerequisites:
+1. Malicious code running as local process
+2. Victim processes containing secrets running on same core.
+
+3. User sandbox attacking runtime in process
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A process, such as a web browser, might be running interpreted or JITed
+untrusted code, such as javascript code downloaded from a website.
+It uses restrictions in the JIT code generator and checks in the runtime
+to prevent the untrusted code from attacking the hosting process.
+
+The untrusted code might use either variant 1 or 2 to trick
+a disclosure gadget in the runtime into reading memory inside the process.
+
+Necessary Prerequisites:
+1. Sandbox in process running untrusted code.
+2. Runtime in same process containing secrets.
+
+4. Kernel sandbox attacking kernel
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The kernel has support for running user-supplied programs within the
+kernel. Specific rules (such as bounds checking) are enforced on these
+programs by the kernel to ensure that they do not violate access controls.
+
+eBPF is a kernel sub-system that executes user-supplied programs
+as JITed untrusted byte code inside the kernel. eBPF is used
+for manipulating and examining network packets, examining system call
+parameters for sandboxes, and other uses.
+
+A malicious local process could upload a malicious eBPF script to
+the kernel and trigger it, with the script attacking the kernel
+using variant 1 or 2 and reading memory.
+
+Necessary Prerequisites:
+1. Malicious local process
+2. eBPF JIT enabled for unprivileged users, attacking kernel with secrets
+on the same machine.
+
+5. Virtualization guest attacking host
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+An untrusted guest might attack the host through a hypercall
+or other virtualization exit.
+
+Necessary Prerequisites:
+1. Untrusted guest attacking host
+2. Host has secrets on local machine.
+
+For variant 1 VM exits use appropriate mitigations
+("bounds clipping") to prevent speculation leaking data
+in kernel code. For variant 2 the kernel flushes the branch buffer.
+
+6. Virtualization guest attacking other guest
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+An untrusted guest attacking another guest containing
+secrets. Mitigations are similar to when a guest attacks
+the host.
+
+Runtime vulnerability information
+---------------------------------
+
+The kernel reports the vulnerability and mitigation status in
+/sys/devices/system/cpu/vulnerabilities/*
+
+The spectre_v1 file describes the always enabled variant 1
+mitigation:
+
+/sys/devices/system/cpu/vulnerabilities/spectre_v1
+
+The value in this file:
+
+ ========================================= =================================
+ 'Mitigation: __user pointer sanitization' Protection in the kernel on a
+                                           case by case basis with explicit
+                                           pointer sanitization.
+ ========================================= =================================
+
+The spectre_v2 kernel file reports if the kernel has been compiled with a
+retpoline aware compiler, if the CPU has hardware mitigation, and if the
+CPU has microcode support for additional process specific mitigations.
+
+It also reports CPU features enabled by microcode to mitigate attacks
+between user processes:
+
+1. Indirect Branch Prediction Barrier (IBPB) to add additional
+ isolation between processes of different users
+2. Single Thread Indirect Branch Prediction (STIBP) to add additional
+ isolation between CPU threads running on the same core.
+
+These CPU features may impact performance when used and can
+be enabled per process on a case-by-case basis.
+
+/sys/devices/system/cpu/vulnerabilities/spectre_v2
+
+The values in this file:
+
+ - Kernel status:
+
+ ==================================== =================================
+ 'Not affected'                       The processor is not vulnerable
+ 'Vulnerable'                         Vulnerable, no mitigation
+ 'Mitigation: Full generic retpoline' Software-focused mitigation
+ 'Mitigation: Full AMD retpoline'     AMD-specific software mitigation
+ 'Mitigation: Enhanced IBRS'          Hardware-focused mitigation
+ ==================================== =================================
+
+ - Firmware status:
+
+ ========== =============================================================
+ 'IBRS_FW' Protection against user program attacks when calling firmware
+ ========== =============================================================
+
+ - Indirect branch prediction barrier (IBPB) status for protection between
+ processes of different users. This feature can be controlled through
+ prctl per process, or through kernel command line options. For more details
+ see below.
+
+ =================== ========================================================
+ 'IBPB: disabled'    IBPB unused
+ 'IBPB: always-on'   Use IBPB on all tasks
+ 'IBPB: conditional' Use IBPB on SECCOMP or indirect branch restricted tasks
+ =================== ========================================================
+
+ - Single threaded indirect branch prediction (STIBP) status for protection
+ between different hyper threads. This feature can be controlled through
+ prctl per process, or through kernel command line options. For more details
+ see below.
+
+ ==================== ========================================================
+ 'STIBP: disabled'    STIBP unused
+ 'STIBP: forced'      Use STIBP on all tasks
+ 'STIBP: conditional' Use STIBP on SECCOMP or indirect branch restricted tasks
+ ==================== ========================================================
+
+ - Return stack buffer (RSB) protection status:
+
+ ============= ===========================================
+ 'RSB filling' Protection of RSB on context switch enabled
+ ============= ===========================================
+
+Full mitigation might require a microcode update from the CPU
+vendor. When the necessary microcode is not available, the kernel
+will report the system as vulnerable.
+
+Kernel mitigation
+-----------------
+
+The kernel has default-on mitigations for Variant 1 and Variant 2
+against attacks from user programs or guests. For variant 1 it
+annotates vulnerable kernel code (as determined by the sparse code
+scanning tool and code audits) to use "bounds clipping" to avoid any
+usable disclosure gadgets.
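+
+In kernel code this "bounds clipping" is typically done with the
+array_index_nospec() helper from <linux/nospec.h>. A simplified sketch,
+not taken from any particular kernel source file::
+
+    if (index >= size)
+        return -EINVAL;
+    /* clip index so it cannot exceed size even under speculation */
+    index = array_index_nospec(index, size);
+    value = array[index];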
+
+For variant 2 the kernel employs "retpoline" with compiler help to secure
+the indirect branches inside the kernel, when CONFIG_RETPOLINE is enabled
+and the compiler supports retpoline. On Intel Skylake-era systems the
+mitigation covers most, but not all, cases, see [1] for more details.
+
+On CPUs with hardware mitigations for variant 2, retpoline is
+automatically disabled at runtime.
+
+Using kernel address space randomization (CONFIG_RANDOMIZE_BASE=y
+and CONFIG_SLAB_FREELIST_RANDOM=y in the kernel configuration)
+makes attacks on the kernel generally more difficult.
+
+Host mitigation
+---------------
+
+The Linux kernel uses retpoline to eliminate attacks on indirect
+branches. It also flushes the Return Stack Buffer (RSB) on every VM exit to
+prevent guests from attacking the host kernel when retpoline is
+enabled.
+
+Variant 1 attacks are mitigated unconditionally.
+
+The kernel also allows guests to use any microcode based mitigations
+they choose to use (such as IBPB or STIBP), assuming the
+host has an updated microcode and reports the feature in
+/sys/devices/system/cpu/vulnerabilities/spectre_v2.
+
+Mitigation control at kernel build time
+---------------------------------------
+
+When the CONFIG_RETPOLINE option is enabled, the kernel uses special
+code sequences to protect indirect branches against
+Variant 2 attacks.
+
+The compiler also needs to support retpoline and support the
+-mindirect-branch=thunk-extern -mindirect-branch-register options
+for gcc, or -mretpoline-external-thunk option for clang.
+
+When the compiler doesn't support these options the kernel
+will report that it is vulnerable.
+
+Variant 1 mitigations and other side channel related user APIs are
+enabled unconditionally.
+
+Hardware mitigation
+-------------------
+
+Some CPUs have hardware mitigations (e.g. enhanced IBRS) for Spectre
+variant 2. The 4.19 kernel has support for detecting this capability
+and automatically disabling any unnecessary workarounds at runtime.
+
+User program mitigation
+-----------------------
+
+For variant 1 user programs can use LFENCE or bounds clipping. For more
+details see [3].
+
+For variant 2 user programs can be compiled with retpoline or can
+restrict their indirect branch speculation via prctl. (See
+Documentation/speculation.txt for the detailed API.)
+
+User programs should use address space randomization
+(/proc/sys/kernel/randomize_va_space = 1 or 2) to make any attacks
+more difficult.
+
+Mitigation control on the kernel command line
+---------------------------------------------
+
+Spectre v2 mitigations can be disabled or force enabled on the kernel
+command line.
+
+ nospectre_v2 [X86] Disable all mitigations for the Spectre variant 2
+ (indirect branch prediction) vulnerability. System may
+ allow data leaks with this option, which is equivalent
+ to spectre_v2=off.
+
+
+ spectre_v2= [X86] Control mitigation of Spectre variant 2
+ (indirect branch speculation) vulnerability.
+ The default operation protects the kernel from
+ user space attacks.
+
+ on - unconditionally enable, implies
+ spectre_v2_user=on
+ off - unconditionally disable, implies
+ spectre_v2_user=off
+ auto - kernel detects whether your CPU model is
+ vulnerable
+
+ Selecting 'on' will, and 'auto' may, choose a
+ mitigation method at run time according to the
+ CPU, the available microcode, the setting of the
+ CONFIG_RETPOLINE configuration option, and the
+ compiler with which the kernel was built.
+
+ Selecting 'on' will also enable the mitigation
+ against user space to user space task attacks.
+
+ Selecting 'off' will disable both the kernel and
+ the user space protections.
+
+ Specific mitigations can also be selected manually:
+
+ retpoline - replace indirect branches
+ retpoline,generic - google's original retpoline
+ retpoline,amd - AMD-specific minimal thunk
+
+ Not specifying this option is equivalent to
+ spectre_v2=auto.
+
+For user space mitigation:
+
+ spectre_v2_user=
+ [X86] Control mitigation of Spectre variant 2
+ (indirect branch speculation) vulnerability between
+ user space tasks
+
+ on - Unconditionally enable mitigations. Is
+ enforced by spectre_v2=on
+
+ off - Unconditionally disable mitigations. Is
+ enforced by spectre_v2=off
+
+ prctl - Indirect branch speculation is enabled,
+ but mitigation can be enabled via prctl
+ per thread. The mitigation control state
+ is inherited on fork.
+
+ prctl,ibpb
+ - Like "prctl" above, but only STIBP is
+ controlled per thread. IBPB is issued
+ always when switching between different user
+ space processes.
+
+ seccomp
+ - Same as "prctl" above, but all seccomp
+ threads will enable the mitigation unless
+ they explicitly opt out.
+
+ seccomp,ibpb
+ - Like "seccomp" above, but only STIBP is
+ controlled per thread. IBPB is issued
+ always when switching between different
+ user space processes.
+
+ auto - Kernel selects the mitigation depending on
+ the available CPU features and vulnerability.
+
+ Default mitigation:
+ If CONFIG_SECCOMP=y then "seccomp", otherwise "prctl"
+
+ Not specifying this option is equivalent to
+ spectre_v2_user=auto.
+
+ In general the kernel by default selects
+ reasonable mitigations for the current CPU. To
+ disable Spectre v2 mitigations boot with
+ spectre_v2=off. Spectre v1 mitigations cannot
+ be disabled.
+
+APIs for mitigation control of user process
+-------------------------------------------
+
+When enabling the "prctl" option for spectre_v2_user boot parameter,
+prctl can be used to restrict indirect branch speculation on a process.
+See Documenation/speculation.txt for detailed API.
+
+Processes containing secrets, such as cryptographic keys, may invoke
+this prctl for extra protection against Spectre v2.
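+
+For example, a process can restrict its own indirect branch speculation
+with a call like the following (an illustrative sketch; the constants are
+defined in <linux/prctl.h>)::
+
+    #include <sys/prctl.h>
+    #include <linux/prctl.h>
+
+    /* restrict indirect branch speculation for this task */
+    prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH,
+          PR_SPEC_DISABLE, 0, 0);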
+
+Before running untrusted processes, restricting their indirect branch
+speculation will prevent such processes from launching Spectre v2 attacks.
+
+Restricting indirect branch speculation on a process should only be used
+as needed, as restricting speculation reduces the performance of both the
+process itself and of processes running on the sibling CPU thread.
+
+Under the "seccomp" option, the processes sandboxed with SECCOMP will
+have indirect branch speculation restricted automatically.
+
+References
+----------
+
+Intel white papers and documents on Spectre:
+
+https://newsroom.intel.com/wp-content/uploads/sites/11/2018/01/Intel-Analysis-of-Speculative-Execution-Side-Channels.pdf
+
+[1]
+https://software.intel.com/security-software-guidance/api-app/sites/default/files/Retpoline-A-Branch-Target-Injection-Mitigation.pdf
+
+https://www.intel.com/content/www/us/en/architecture-and-technology/facts-about-side-channel-analysis-and-intel-products.html
+
+[3] https://software.intel.com/security-software-guidance/
+
+https://software.intel.com/security-software-guidance/insights/deep-dive-single-thread-indirect-branch-predictors
+
+AMD white papers:
+
+https://developer.amd.com/wp-content/resources/90343-B_SoftwareTechniquesforManagingSpeculation_WP_7-18Update_FNL.pdf
+
+https://www.amd.com/en/corporate/security-updates
+
+ARM white papers:
+
+https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability/download-the-whitepaper
+
+https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability/latest-updates/cache-speculation-issues-update
+
+MIPS:
+
+https://www.mips.com/blog/mips-response-on-speculative-execution-and-side-channel-vulnerabilities/
+
+Academic papers:
+
+https://spectreattack.com/spectre.pdf [original spectre paper]
+
+[2] https://arxiv.org/abs/1807.10535 [NetSpectre]
+
+https://arxiv.org/abs/1811.05441 [generalization of Spectre]
+
+https://arxiv.org/abs/1807.07940 [Spectre RSB, a variant of Spectre v2]
--
2.9.4
This is a note to let you know that I've just added the patch titled
devres: Align data[] to ARCH_KMALLOC_MINALIGN
to my driver-core git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git
in the driver-core-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From a66d972465d15b1d89281258805eb8b47d66bd36 Mon Sep 17 00:00:00 2001
From: Alexey Brodkin <alexey.brodkin(a)synopsys.com>
Date: Wed, 31 Oct 2018 18:25:47 +0300
Subject: devres: Align data[] to ARCH_KMALLOC_MINALIGN
Initially we bumped into a problem with 32-bit aligned atomic64_t
on ARC, see [1]. And then during a quite lengthy discussion Peter Z.
mentioned ARCH_KMALLOC_MINALIGN, which IMHO makes perfect sense.
If the allocation is done by plain kmalloc(), the obtained buffer will be
ARCH_KMALLOC_MINALIGN aligned, so why should a buffer obtained via
devm_kmalloc() have any other alignment?
This way we at least get the same behavior for both types of
allocation.
[1] http://lists.infradead.org/pipermail/linux-snps-arc/2018-July/004009.html
[2] http://lists.infradead.org/pipermail/linux-snps-arc/2018-July/004036.html
Signed-off-by: Alexey Brodkin <abrodkin(a)synopsys.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Geert Uytterhoeven <geert(a)linux-m68k.org>
Cc: David Laight <David.Laight(a)ACULAB.COM>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Vineet Gupta <vgupta(a)synopsys.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Greg KH <greg(a)kroah.com>
Cc: <stable(a)vger.kernel.org> # 4.8+
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/base/devres.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/base/devres.c b/drivers/base/devres.c
index 4aaf00d2098b..e038e2b3b7ea 100644
--- a/drivers/base/devres.c
+++ b/drivers/base/devres.c
@@ -26,8 +26,14 @@ struct devres_node {
struct devres {
struct devres_node node;
- /* -- 3 pointers */
- unsigned long long data[]; /* guarantee ull alignment */
+ /*
+ * Some archs want to perform DMA into kmalloc caches
+ * and need a guaranteed alignment larger than
+ * the alignment of a 64-bit integer.
+ * Thus we use ARCH_KMALLOC_MINALIGN here and get exactly the same
+ * buffer alignment as if it was allocated by plain kmalloc().
+ */
+ u8 __aligned(ARCH_KMALLOC_MINALIGN) data[];
};
struct devres_group {
--
2.19.1