The TLB flush optimisation (a46cc7a90f: powerpc/mm/radix: Improve TLB/PWC
flushes) may result in random memory corruption. Any concurrent page-table walk
could end up with a Use-after-Free. Even on UP this might give issues, since
mmu_gather is preemptible these days. An interrupt or preempted task accessing
user pages might stumble into the free page if the hardware caches page
directories.
The series is a backport of the fix sent by Peter [1].
The first three patches are dependencies for the last patch (avoid potential
double flush). If the performance impact due to double flush is considered
trivial then the first three patches and last patch may be dropped.
This is only for v4.19 stable.
[1] https://patchwork.kernel.org/cover/11284843/
--
Changelog:
v2: Send the patches with the correct format (commit sha1 upstream) for stable
v3: Fix compilation issue on ppc40x_defconfig and ppc44x_defconfig
--
Aneesh Kumar K.V (1):
powerpc/mmu_gather: enable RCU_TABLE_FREE even for !SMP case
Peter Zijlstra (4):
asm-generic/tlb: Track freeing of page-table directories in struct
mmu_gather
asm-generic/tlb, arch: Invert CONFIG_HAVE_RCU_TABLE_INVALIDATE
mm/mmu_gather: invalidate TLB correctly on batch allocation failure
and flush
asm-generic/tlb: avoid potential double flush
Will Deacon (1):
asm-generic/tlb: Track which levels of the page tables have been
cleared
arch/Kconfig | 3 -
arch/powerpc/Kconfig | 2 +-
arch/powerpc/include/asm/book3s/32/pgalloc.h | 8 --
arch/powerpc/include/asm/book3s/64/pgalloc.h | 2 -
arch/powerpc/include/asm/nohash/32/pgalloc.h | 8 --
arch/powerpc/include/asm/nohash/64/pgalloc.h | 9 +-
arch/powerpc/include/asm/tlb.h | 11 ++
arch/powerpc/mm/pgtable-book3s64.c | 7 --
arch/sparc/include/asm/tlb_64.h | 9 ++
arch/x86/Kconfig | 1 -
include/asm-generic/tlb.h | 103 ++++++++++++++++---
mm/memory.c | 20 ++--
12 files changed, 123 insertions(+), 60 deletions(-)
--
2.24.1
When inode has no listxattr op of its own (e.g. squashfs) vfs_listxattr
calls the LSM inode_listsecurity hooks to list the xattrs that LSMs will
intercept in inode_getxattr hooks.
When selinux LSM is installed but not initialized, it will list the
security.selinux xattr in inode_listsecurity, but will not intercept it
in inode_getxattr. This results in -ENODATA for a getxattr call for an
xattr returned by listxattr.
This situation was manifested as overlayfs failure to copy up lower
files from squashfs when selinux is built-in but not initialized,
because ovl_copy_xattr() iterates the lower inode xattrs by
vfs_listxattr() and vfs_getxattr().
Match the logic of inode_listsecurity to that of inode_getxattr and
do not list the security.selinux xattr if selinux is not initialized.
Reported-by: Michael Labriola <michael.d.labriola(a)gmail.com>
Tested-by: Michael Labriola <michael.d.labriola(a)gmail.com>
Link: https://lore.kernel.org/linux-unionfs/2nv9d47zt7.fsf@aldarion.sourceruckus.…
Fixes: c8e222616c7e ("selinux: allow reading labels before policy is loaded")
Cc: stable(a)vger.kernel.org#v5.9+
Signed-off-by: Amir Goldstein <amir73il(a)gmail.com>
---
security/selinux/hooks.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 6b1826fc3658..e132e082a5af 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3406,6 +3406,10 @@ static int selinux_inode_setsecurity(struct inode *inode, const char *name,
static int selinux_inode_listsecurity(struct inode *inode, char *buffer, size_t buffer_size)
{
const int len = sizeof(XATTR_NAME_SELINUX);
+
+ if (!selinux_initialized(&selinux_state))
+ return 0;
+
if (buffer && len <= buffer_size)
memcpy(buffer, XATTR_NAME_SELINUX, len);
return len;
--
2.25.1
It sounds unwise to let user space pass an unchecked 32-bit
offset into a kernel structure in an ioctl. This is an unsigned
variable, so checking the upper bound for the size of the structure
it points into is sufficient to avoid data corruption, but as
the pointer might also be unaligned, it has to be written carefully
as well.
While I stumbled over this problem by reading the code, I did not
continue checking the function for further problems like it.
Cc: stable(a)vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
---
drivers/scsi/megaraid/megaraid_sas_base.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 861f7140f52e..c3de69f3bee8 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -8095,7 +8095,7 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
int error = 0, i;
void *sense = NULL;
dma_addr_t sense_handle;
- unsigned long *sense_ptr;
+ void *sense_ptr;
u32 opcode = 0;
int ret = DCMD_SUCCESS;
@@ -8218,6 +8218,12 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
}
if (ioc->sense_len) {
+ /* make sure the pointer is part of the frame */
+ if (ioc->sense_off > (sizeof(union megasas_frame) - sizeof(__le64))) {
+ error = -EINVAL;
+ goto out;
+ }
+
sense = dma_alloc_coherent(&instance->pdev->dev, ioc->sense_len,
&sense_handle, GFP_KERNEL);
if (!sense) {
@@ -8225,12 +8231,11 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
goto out;
}
- sense_ptr =
- (unsigned long *) ((unsigned long)cmd->frame + ioc->sense_off);
+ sense_ptr = (void *)cmd->frame + ioc->sense_off;
if (instance->consistent_mask_64bit)
- *sense_ptr = cpu_to_le64(sense_handle);
+ put_unaligned_le64(sense_handle, sense_ptr);
else
- *sense_ptr = cpu_to_le32(sense_handle);
+ put_unaligned_le32(sense_handle, sense_ptr);
}
/*
--
2.27.0
From: Linus Walleij <linus.walleij(a)linaro.org>
[ Upstream commit d6d51a96c7d63b7450860a3037f2d62388286a52 ]
Functions like memset()/memmove()/memcpy() do a lot of memory
accesses.
If a bad pointer is passed to one of these functions it is important
to catch this. Compiler instrumentation cannot do this since these
functions are written in assembly.
KASan replaces these memory functions with instrumented variants.
The original functions are declared as weak symbols so that
the strong definitions in mm/kasan/kasan.c can replace them.
The original functions have aliases with a '__' prefix in their
name, so we can call the non-instrumented variant if needed.
We must use __memcpy()/__memset() in place of memcpy()/memset()
when we copy .data to RAM and when we clear .bss, because
kasan_early_init cannot be called before the initialization of
.data and .bss.
For the kernel compression and EFI libstub's custom string
libraries we need a special quirk: even if these are built
without KASan enabled, they rely on the global headers for their
custom string libraries, which means that e.g. memcpy()
will be defined to __memcpy() and we get link failures.
Since these implementations are written i C rather than
assembly we use e.g. __alias(memcpy) to redirected any
users back to the local implementation.
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: kasan-dev(a)googlegroups.com
Reviewed-by: Ard Biesheuvel <ardb(a)kernel.org>
Tested-by: Ard Biesheuvel <ardb(a)kernel.org> # QEMU/KVM/mach-virt/LPAE/8G
Tested-by: Florian Fainelli <f.fainelli(a)gmail.com> # Brahma SoCs
Tested-by: Ahmad Fatoum <a.fatoum(a)pengutronix.de> # i.MX6Q
Reported-by: Russell King - ARM Linux <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Ahmad Fatoum <a.fatoum(a)pengutronix.de>
Signed-off-by: Abbott Liu <liuwenliang(a)huawei.com>
Signed-off-by: Florian Fainelli <f.fainelli(a)gmail.com>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arm/boot/compressed/string.c | 19 +++++++++++++++++++
arch/arm/include/asm/string.h | 26 ++++++++++++++++++++++++++
arch/arm/kernel/head-common.S | 4 ++--
arch/arm/lib/memcpy.S | 3 +++
arch/arm/lib/memmove.S | 5 ++++-
arch/arm/lib/memset.S | 3 +++
6 files changed, 57 insertions(+), 3 deletions(-)
diff --git a/arch/arm/boot/compressed/string.c b/arch/arm/boot/compressed/string.c
index ade5079bebbf9..8c0fa276d9946 100644
--- a/arch/arm/boot/compressed/string.c
+++ b/arch/arm/boot/compressed/string.c
@@ -7,6 +7,25 @@
#include <linux/string.h>
+/*
+ * The decompressor is built without KASan but uses the same redirects as the
+ * rest of the kernel when CONFIG_KASAN is enabled, defining e.g. memcpy()
+ * to __memcpy() but since we are not linking with the main kernel string
+ * library in the decompressor, that will lead to link failures.
+ *
+ * Undefine KASan's versions, define the wrapped functions and alias them to
+ * the right names so that when e.g. __memcpy() appear in the code, it will
+ * still be linked to this local version of memcpy().
+ */
+#ifdef CONFIG_KASAN
+#undef memcpy
+#undef memmove
+#undef memset
+void *__memcpy(void *__dest, __const void *__src, size_t __n) __alias(memcpy);
+void *__memmove(void *__dest, __const void *__src, size_t count) __alias(memmove);
+void *__memset(void *s, int c, size_t count) __alias(memset);
+#endif
+
void *memcpy(void *__dest, __const void *__src, size_t __n)
{
int i = 0;
diff --git a/arch/arm/include/asm/string.h b/arch/arm/include/asm/string.h
index 111a1d8a41ddf..6c607c68f3ad7 100644
--- a/arch/arm/include/asm/string.h
+++ b/arch/arm/include/asm/string.h
@@ -5,6 +5,9 @@
/*
* We don't do inline string functions, since the
* optimised inline asm versions are not small.
+ *
+ * The __underscore versions of some functions are for KASan to be able
+ * to replace them with instrumented versions.
*/
#define __HAVE_ARCH_STRRCHR
@@ -15,15 +18,18 @@ extern char * strchr(const char * s, int c);
#define __HAVE_ARCH_MEMCPY
extern void * memcpy(void *, const void *, __kernel_size_t);
+extern void *__memcpy(void *dest, const void *src, __kernel_size_t n);
#define __HAVE_ARCH_MEMMOVE
extern void * memmove(void *, const void *, __kernel_size_t);
+extern void *__memmove(void *dest, const void *src, __kernel_size_t n);
#define __HAVE_ARCH_MEMCHR
extern void * memchr(const void *, int, __kernel_size_t);
#define __HAVE_ARCH_MEMSET
extern void * memset(void *, int, __kernel_size_t);
+extern void *__memset(void *s, int c, __kernel_size_t n);
#define __HAVE_ARCH_MEMSET32
extern void *__memset32(uint32_t *, uint32_t v, __kernel_size_t);
@@ -39,4 +45,24 @@ static inline void *memset64(uint64_t *p, uint64_t v, __kernel_size_t n)
return __memset64(p, v, n * 8, v >> 32);
}
+/*
+ * For files that are not instrumented (e.g. mm/slub.c) we
+ * must use non-instrumented versions of the mem*
+ * functions named __memcpy() etc. All such kernel code has
+ * been tagged with KASAN_SANITIZE_file.o = n, which means
+ * that the address sanitization argument isn't passed to the
+ * compiler, and __SANITIZE_ADDRESS__ is not set. As a result
+ * these defines kick in.
+ */
+#if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
+#define memcpy(dst, src, len) __memcpy(dst, src, len)
+#define memmove(dst, src, len) __memmove(dst, src, len)
+#define memset(s, c, n) __memset(s, c, n)
+
+#ifndef __NO_FORTIFY
+#define __NO_FORTIFY /* FORTIFY_SOURCE uses __builtin_memcpy, etc. */
+#endif
+
+#endif
+
#endif
diff --git a/arch/arm/kernel/head-common.S b/arch/arm/kernel/head-common.S
index 4a3982812a401..6840c7c60a858 100644
--- a/arch/arm/kernel/head-common.S
+++ b/arch/arm/kernel/head-common.S
@@ -95,7 +95,7 @@ __mmap_switched:
THUMB( ldmia r4!, {r0, r1, r2, r3} )
THUMB( mov sp, r3 )
sub r2, r2, r1
- bl memcpy @ copy .data to RAM
+ bl __memcpy @ copy .data to RAM
#endif
ARM( ldmia r4!, {r0, r1, sp} )
@@ -103,7 +103,7 @@ __mmap_switched:
THUMB( mov sp, r3 )
sub r2, r1, r0
mov r1, #0
- bl memset @ clear .bss
+ bl __memset @ clear .bss
ldmia r4, {r0, r1, r2, r3}
str r9, [r0] @ Save processor ID
diff --git a/arch/arm/lib/memcpy.S b/arch/arm/lib/memcpy.S
index 09a333153dc66..ad4625d16e117 100644
--- a/arch/arm/lib/memcpy.S
+++ b/arch/arm/lib/memcpy.S
@@ -58,6 +58,8 @@
/* Prototype: void *memcpy(void *dest, const void *src, size_t n); */
+.weak memcpy
+ENTRY(__memcpy)
ENTRY(mmiocpy)
ENTRY(memcpy)
@@ -65,3 +67,4 @@ ENTRY(memcpy)
ENDPROC(memcpy)
ENDPROC(mmiocpy)
+ENDPROC(__memcpy)
diff --git a/arch/arm/lib/memmove.S b/arch/arm/lib/memmove.S
index b50e5770fb44d..fd123ea5a5a4a 100644
--- a/arch/arm/lib/memmove.S
+++ b/arch/arm/lib/memmove.S
@@ -24,12 +24,14 @@
* occurring in the opposite direction.
*/
+.weak memmove
+ENTRY(__memmove)
ENTRY(memmove)
UNWIND( .fnstart )
subs ip, r0, r1
cmphi r2, ip
- bls memcpy
+ bls __memcpy
stmfd sp!, {r0, r4, lr}
UNWIND( .fnend )
@@ -222,3 +224,4 @@ ENTRY(memmove)
18: backward_copy_shift push=24 pull=8
ENDPROC(memmove)
+ENDPROC(__memmove)
diff --git a/arch/arm/lib/memset.S b/arch/arm/lib/memset.S
index 6ca4535c47fb6..0e7ff0423f50b 100644
--- a/arch/arm/lib/memset.S
+++ b/arch/arm/lib/memset.S
@@ -13,6 +13,8 @@
.text
.align 5
+.weak memset
+ENTRY(__memset)
ENTRY(mmioset)
ENTRY(memset)
UNWIND( .fnstart )
@@ -132,6 +134,7 @@ UNWIND( .fnstart )
UNWIND( .fnend )
ENDPROC(memset)
ENDPROC(mmioset)
+ENDPROC(__memset)
ENTRY(__memset32)
UNWIND( .fnstart )
--
2.27.0
There are several reports about the tps6598x causing
interrupt flood on boards with the INT3515 ACPI node, which
then causes instability. There appears to be several
problems with the interrupt. One problem is that the
I2CSerialBus resources do not always map to the Interrupt
resource with the same index, but that is not the only
problem. We have not been able to come up with a solution
for all the issues, and because of that disabling the device
for now.
The PD controller on these platforms is autonomous, and the
purpose for the driver is primarily to supply status to the
userspace, so this will not affect any functionality.
Reported-by: Moody Salem <moody(a)uniswap.org>
Fixes: a3dd034a1707 ("ACPI / scan: Create platform device for INT3515 ACPI nodes")
Cc: stable(a)vger.kernel.org
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1883511
Signed-off-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
---
drivers/platform/x86/i2c-multi-instantiate.c | 31 +++++++++++++++-----
1 file changed, 23 insertions(+), 8 deletions(-)
diff --git a/drivers/platform/x86/i2c-multi-instantiate.c b/drivers/platform/x86/i2c-multi-instantiate.c
index 6acc8457866e1..e1df665d3ad31 100644
--- a/drivers/platform/x86/i2c-multi-instantiate.c
+++ b/drivers/platform/x86/i2c-multi-instantiate.c
@@ -166,13 +166,29 @@ static const struct i2c_inst_data bsg2150_data[] = {
{}
};
-static const struct i2c_inst_data int3515_data[] = {
- { "tps6598x", IRQ_RESOURCE_APIC, 0 },
- { "tps6598x", IRQ_RESOURCE_APIC, 1 },
- { "tps6598x", IRQ_RESOURCE_APIC, 2 },
- { "tps6598x", IRQ_RESOURCE_APIC, 3 },
- {}
-};
+/*
+ * Device with _HID INT3515 (TI PD controllers) has some unresolved interrupt
+ * issues. The most common problem seen is interrupt flood.
+ *
+ * There are at least two known causes. Firstly, on some boards, the
+ * I2CSerialBus resource index does not match the Interrupt resource, i.e. they
+ * are not one-to-one mapped like in the array below. Secondly, on some boards
+ * the irq line from the PD controller is not actually connected at all. But the
+ * interrupt flood is also seen on some boards where those are not a problem, so
+ * there are some other problems as well.
+ *
+ * Because of the issues with the interrupt, the device is disabled for now. If
+ * you wish to debug the issues, uncomment the below, and add an entry for the
+ * INT3515 device to the i2c_multi_instance__ids table.
+ *
+ * static const struct i2c_inst_data int3515_data[] = {
+ * { "tps6598x", IRQ_RESOURCE_APIC, 0 },
+ * { "tps6598x", IRQ_RESOURCE_APIC, 1 },
+ * { "tps6598x", IRQ_RESOURCE_APIC, 2 },
+ * { "tps6598x", IRQ_RESOURCE_APIC, 3 },
+ * { }
+ * };
+ */
/*
* Note new device-ids must also be added to i2c_multi_instantiate_ids in
@@ -181,7 +197,6 @@ static const struct i2c_inst_data int3515_data[] = {
static const struct acpi_device_id i2c_multi_inst_acpi_ids[] = {
{ "BSG1160", (unsigned long)bsg1160_data },
{ "BSG2150", (unsigned long)bsg2150_data },
- { "INT3515", (unsigned long)int3515_data },
{ }
};
MODULE_DEVICE_TABLE(acpi, i2c_multi_inst_acpi_ids);
--
2.29.2
The dentries such as /proc/<pid>/ns/ have the DCACHE_OP_DELETE flag, they
should be deleted when the process exits.
Suppose the following race appears:
release_task dput
-> proc_flush_task
-> dentry->d_op->d_delete(dentry)
-> __exit_signal
-> dentry->d_lockref.count-- and return.
In the proc_flush_task(), if another process is using this dentry, it will
not be deleted. At the same time, in dput(), d_op->d_delete() can be executed
before __exit_signal(pid has not been hashed), d_delete returns false, so
this dentry still cannot be deleted.
This dentry will always be cached (although its count is 0 and the
DCACHE_OP_DELETE flag is set), its parent denry will also be cached too, and
these dentries can only be deleted when drop_caches is manually triggered.
This will result in wasted memory. What's more troublesome is that these
dentries reference pid, according to the commit f333c700c610 ("pidns: Add a
limit on the number of pid namespaces"), if the pid cannot be released, it
may result in the inability to create a new pid_ns.
This problem occurred in our cluster environment (Linux 4.9 LTS).
We could reproduce it by manually constructing a test program + adding some
debugging switches in the kernel:
* A test program to open the directory (/proc/<pid>/ns) [1]
* Adding some debugging switches to the kernel, adding a delay between
proc_flush_task and __exit_signal in release_task() [2]
The test process is as follows:
A, terminal #1
Turn on the debug switch:
echo 1> /proc/sys/vm/dentry_debug_trace
Execute the following unshare command:
sudo unshare --pid --fork --mount-proc bash
B, terminal #2
Find the pid of the unshare process:
# pstree -p | grep unshare
| `-sshd(716)---bash(718)--sudo(816)---unshare(817)---bash(818)
Find the corresponding dentry:
# dmesg | grep pid=818
[70.424722] XXX proc_pid_instantiate:3119 pid=818 tid=818 entry=818/ffff8802c7b670e8
C, terminal #3
Execute the opendir program, it will always open the /proc/818/ns/ directory:
# ./a.out /proc/818/ns/
pid: 876
.
..
net
uts
ipc
pid
user
mnt
cgroup
D, go back to terminal #2
Turn on the debugging switches to construct the race:
# echo 818> /proc/sys/vm/dentry_debug_pid
# echo 1> /proc/sys/vm/dentry_debug_delay
Kill the unshare process (pid 818). Since the debugging switches have been
turned on, it will get stuck in release_task():
# kill -9 818
Then kill the process that opened the /proc/818/ns/ directory:
# kill -9 876
Then turn off these debugging switches to allow the 818 process to exit:
# echo 0> /proc/sys/vm/dentry_debug_delay
# echo 0> /proc/sys/vm/dentry_debug_pid
Checking the dmesg, we will find that the dentry(/proc/818/ns) ’s count is 0,
and the flag is 2800cc (#define DCACHE_OP_DELETE 0x00000008), but it is still
cached:
# dmesg | grep ffff8802a3999548
…
[565.559156] XXX dput:853 dentry=ns/ffff8802bea7b528, flag=2800cc, cnt=0, inode=ffff8802b38c2010, pdentry=818/ffff8802c7b670e8, pflag=20008c, pcnt=1, pinode=ffff8802c7812010, keywords: be cached
It could also be verified via the crash tool:
crash> dentry.d_flags,d_iname,d_inode,d_lockref -x ffff8802bea7b528
d_flags = 0x2800cc
d_iname = "ns\000kkkkkkkkkkkkkkkkkkkkkkkkkkkk"
d_inode = 0xffff8802b38c2010
d_lockref = {
{
lock_count = 0x0,
{
lock = {
{
rlock = {
raw_lock = {
{
val = {
counter = 0x0
},
{
locked = 0x0,
pending = 0x0
},
{
locked_pending = 0x0,
tail = 0x0
}
}
}
}
}
},
count = 0x0
}
}
}
crash> kmem ffff8802bea7b528
CACHE OBJSIZE ALLOCATED TOTAL SLABS SSIZE NAME
ffff8802dd5f5900 192 23663 26130 871 16k dentry
SLAB MEMORY NODE TOTAL ALLOCATED FREE
ffffea000afa9e00 ffff8802bea78000 0 30 25 5
FREE / [ALLOCATED]
[ffff8802bea7b520]
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea000afa9ec0 2bea7b000 dead000000000400 0 0 2fffff80000000
crash>
This series of patches is to fix this issue.
Regards,
Wen
Alexey Dobriyan (1):
proc: use %u for pid printing and slightly less stack
Andreas Gruenbacher (1):
proc: Pass file mode to proc_pid_make_inode
Christian Brauner (1):
clone: add CLONE_PIDFD
Eric W. Biederman (6):
proc: Better ownership of files for non-dumpable tasks in user
namespaces
proc: Rename in proc_inode rename sysctl_inodes sibling_inodes
proc: Generalize proc_sys_prune_dcache into proc_prune_siblings_dcache
proc: Clear the pieces of proc_inode that proc_evict_inode cares about
proc: Use d_invalidate in proc_prune_siblings_dcache
proc: Use a list of inodes to flush from proc
Joel Fernandes (Google) (1):
pidfd: add polling support
fs/proc/base.c | 242 ++++++++++++++++++++-------------------------
fs/proc/fd.c | 20 +---
fs/proc/inode.c | 67 ++++++++++++-
fs/proc/internal.h | 22 ++---
fs/proc/namespaces.c | 3 +-
fs/proc/proc_sysctl.c | 45 ++-------
fs/proc/self.c | 6 +-
fs/proc/thread_self.c | 5 +-
include/linux/pid.h | 5 +
include/linux/proc_fs.h | 4 +-
include/uapi/linux/sched.h | 1 +
kernel/exit.c | 5 +-
kernel/fork.c | 145 ++++++++++++++++++++++++++-
kernel/pid.c | 3 +
kernel/signal.c | 11 +++
security/selinux/hooks.c | 1 +
16 files changed, 357 insertions(+), 228 deletions(-)
[1] A test program to open the directory (/proc/<pid>/ns)
#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>
#include <errno.h>
int main(int argc, char *argv[])
{
DIR *dip;
struct dirent *dit;
if (argc < 2) {
printf("Usage :%s <directory>\n", argv[0]);
return -1;
}
if ((dip = opendir(argv[1])) == NULL) {
perror("opendir");
return -1;
}
printf("pid: %d\n", getpid());
while((dit = readdir (dip)) != NULL) {
printf("%s\n", dit->d_name);
}
while (1)
sleep (1);
return 0;
}
[2] Adding some debugging switches to the kernel, also adding a delay between
proc_flush_task and __exit_signal in release_task():
diff --git a/fs/dcache.c b/fs/dcache.c
index 05bad55..fafad37 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -84,6 +84,9 @@
int sysctl_vfs_cache_pressure __read_mostly = 100;
EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
+int sysctl_dentry_debug_trace __read_mostly = 0;
+EXPORT_SYMBOL_GPL(sysctl_dentry_debug_trace);
+
__cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock);
EXPORT_SYMBOL(rename_lock);
@@ -758,6 +761,26 @@ static inline bool fast_dput(struct dentry *dentry)
return 0;
}
+#define DENTRY_DEBUG_TRACE(dentry, keywords) \
+do { \
+ if (sysctl_dentry_debug_trace) \
+ printk("XXX %s:%d " \
+ "dentry=%s/%p, flag=%x, cnt=%d, inode=%p, " \
+ "pdentry=%s/%p, pflag=%x, pcnt=%d, pinode=%p, " \
+ "keywords: %s\n", \
+ __func__, __LINE__, \
+ dentry->d_name.name, \
+ dentry, \
+ dentry->d_flags, \
+ dentry->d_lockref.count, \
+ dentry->d_inode, \
+ dentry->d_parent->d_name.name, \
+ dentry->d_parent, \
+ dentry->d_parent->d_flags, \
+ dentry->d_parent->d_lockref.count, \
+ dentry->d_parent->d_inode, \
+ keywords); \
+} while (0)
/*
* This is dput
@@ -804,6 +827,8 @@ void dput(struct dentry *dentry)
WARN_ON(d_in_lookup(dentry));
+ DENTRY_DEBUG_TRACE(dentry, "be checked");
+
/* Unreachable? Get rid of it */
if (unlikely(d_unhashed(dentry)))
goto kill_it;
@@ -812,8 +837,10 @@ void dput(struct dentry *dentry)
goto kill_it;
if (unlikely(dentry->d_flags & DCACHE_OP_DELETE)) {
- if (dentry->d_op->d_delete(dentry))
+ if (dentry->d_op->d_delete(dentry)) {
+ DENTRY_DEBUG_TRACE(dentry, "be killed");
goto kill_it;
+ }
}
if (!(dentry->d_flags & DCACHE_REFERENCED))
@@ -822,6 +849,9 @@ void dput(struct dentry *dentry)
dentry->d_lockref.count--;
spin_unlock(&dentry->d_lock);
+
+ DENTRY_DEBUG_TRACE(dentry, "be cached");
+
return;
kill_it:
diff --git a/fs/proc/base.c b/fs/proc/base.c
index b9e4183..419a409 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3090,6 +3090,8 @@ void proc_flush_task(struct task_struct *task)
}
}
+extern int sysctl_dentry_debug_trace;
+
static int proc_pid_instantiate(struct inode *dir,
struct dentry * dentry,
struct task_struct *task, const void *ptr)
@@ -3111,6 +3113,12 @@ static int proc_pid_instantiate(struct inode *dir,
d_set_d_op(dentry, &pid_dentry_operations);
d_add(dentry, inode);
+
+ if (sysctl_dentry_debug_trace)
+ printk("XXX %s:%d pid=%d tid=%d entry=%s/%p\n",
+ __func__, __LINE__, task->pid, task->tgid,
+ dentry->d_name.name, dentry);
+
/* Close the race of the process dying before we return the dentry */
if (pid_revalidate(dentry, 0))
return 0;
diff --git a/kernel/exit.c b/kernel/exit.c
index 27f4168..2b3e1b6 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -55,6 +55,8 @@
#include <linux/shm.h>
#include <linux/kcov.h>
+#include <linux/delay.h>
+
#include <asm/uaccess.h>
#include <asm/unistd.h>
#include <asm/pgtable.h>
@@ -164,6 +166,8 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
put_task_struct(tsk);
}
+int sysctl_dentry_debug_delay __read_mostly = 0;
+int sysctl_dentry_debug_pid __read_mostly = 0;
void release_task(struct task_struct *p)
{
@@ -178,6 +182,11 @@ void release_task(struct task_struct *p)
proc_flush_task(p);
+ if (sysctl_dentry_debug_delay && p->pid == sysctl_dentry_debug_pid) {
+ while (sysctl_dentry_debug_delay)
+ mdelay(1);
+ }
+
write_lock_irq(&tasklist_lock);
ptrace_release_task(p);
__exit_signal(p);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 513e6da..27f1395 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -282,6 +282,10 @@ static int sysrq_sysctl_handler(struct ctl_table *table, int write,
static int max_extfrag_threshold = 1000;
#endif
+extern int sysctl_dentry_debug_trace;
+extern int sysctl_dentry_debug_delay;
+extern int sysctl_dentry_debug_pid;
+
static struct ctl_table kern_table[] = {
{
.procname = "sched_child_runs_first",
@@ -1498,6 +1502,30 @@ static int sysrq_sysctl_handler(struct ctl_table *table, int write,
.proc_handler = proc_dointvec,
.extra1 = &zero,
},
+ {
+ .procname = "dentry_debug_trace",
+ .data = &sysctl_dentry_debug_trace,
+ .maxlen = sizeof(sysctl_dentry_debug_trace),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ .extra1 = &zero,
+ },
+ {
+ .procname = "dentry_debug_delay",
+ .data = &sysctl_dentry_debug_delay,
+ .maxlen = sizeof(sysctl_dentry_debug_delay),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ .extra1 = &zero,
+ },
+ {
+ .procname = "dentry_debug_pid",
+ .data = &sysctl_dentry_debug_pid,
+ .maxlen = sizeof(sysctl_dentry_debug_pid),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ .extra1 = &zero,
+ },
#ifdef HAVE_ARCH_PICK_MMAP_LAYOUT
{
.procname = "legacy_va_layout",
Signed-off-by: Wen Yang <wenyang(a)linux.alibaba.com>
Cc: Pavel Emelyanov <xemul(a)openvz.org>
Cc: Oleg Nesterov <oleg(a)tv-sign.ru>
Cc: Sukadev Bhattiprolu <sukadev(a)us.ibm.com>
Cc: Paul Menage <menage(a)google.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: <stable(a)vger.kernel.org>
--
1.8.3.1
From: Alexander Duyck <alexander.h.duyck(a)linux.intel.com>
From: Alexander Duyck <alexander.h.duyck(a)linux.intel.com>
commit 56ec43d8b02719402c9fcf984feb52ec2300f8a5 upstream.
As best as I can tell the meminit_pfn_in_nid call is completely redundant.
The deferred memory initialization is already making use of
for_each_free_mem_range which in turn will call into __next_mem_range
which will only return a memory range if it matches the node ID provided
assuming it is not NUMA_NO_NODE.
I am operating on the assumption that there are no zones or pgdata_t
structures that have a NUMA node of NUMA_NO_NODE associated with them. If
that is the case then __next_mem_range will never return a memory range
that doesn't match the zone's node ID and as such the check is redundant.
So one piece I would like to verify on this is if this works for ia64.
Technically it was using a different approach to get the node ID, but it
seems to have the node ID also encoded into the memblock. So I am
assuming this is okay, but would like to get confirmation on that.
On my x86_64 test system with 384GB of memory per node I saw a reduction
in initialization time from 2.80s to 1.85s as a result of this patch.
Link: http://lkml.kernel.org/r/20190405221219.12227.93957.stgit@localhost.localdo…
Signed-off-by: Alexander Duyck <alexander.h.duyck(a)linux.intel.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin(a)microsoft.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Khalid Aziz <khalid.aziz(a)oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Laurent Dufour <ldufour(a)linux.vnet.ibm.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin(a)soleen.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <yi.z.zhang(a)linux.intel.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Pavel Tatashin <pasha.tatashin(a)soleen.com>
---
mm/page_alloc.c | 51 ++++++++++++++-----------------------------------
1 file changed, 14 insertions(+), 37 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d8c3051387d1..c86a117acb5b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1321,36 +1321,22 @@ int __meminit early_pfn_to_nid(unsigned long pfn)
#endif
#ifdef CONFIG_NODES_SPAN_OTHER_NODES
-static inline bool __meminit __maybe_unused
-meminit_pfn_in_nid(unsigned long pfn, int node,
- struct mminit_pfnnid_cache *state)
+/* Only safe to use early in boot when initialisation is single-threaded */
+static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
{
int nid;
- nid = __early_pfn_to_nid(pfn, state);
+ nid = __early_pfn_to_nid(pfn, &early_pfnnid_cache);
if (nid >= 0 && nid != node)
return false;
return true;
}
-/* Only safe to use early in boot when initialisation is single-threaded */
-static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
-{
- return meminit_pfn_in_nid(pfn, node, &early_pfnnid_cache);
-}
-
#else
-
static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
{
return true;
}
-static inline bool __meminit __maybe_unused
-meminit_pfn_in_nid(unsigned long pfn, int node,
- struct mminit_pfnnid_cache *state)
-{
- return true;
-}
#endif
@@ -1480,21 +1466,13 @@ static inline void __init pgdat_init_report_one_done(void)
*
* Then, we check if a current large page is valid by only checking the validity
* of the head pfn.
- *
- * Finally, meminit_pfn_in_nid is checked on systems where pfns can interleave
- * within a node: a pfn is between start and end of a node, but does not belong
- * to this memory node.
*/
-static inline bool __init
-deferred_pfn_valid(int nid, unsigned long pfn,
- struct mminit_pfnnid_cache *nid_init_state)
+static inline bool __init deferred_pfn_valid(unsigned long pfn)
{
if (!pfn_valid_within(pfn))
return false;
if (!(pfn & (pageblock_nr_pages - 1)) && !pfn_valid(pfn))
return false;
- if (!meminit_pfn_in_nid(pfn, nid, nid_init_state))
- return false;
return true;
}
@@ -1502,15 +1480,14 @@ deferred_pfn_valid(int nid, unsigned long pfn,
* Free pages to buddy allocator. Try to free aligned pages in
* pageblock_nr_pages sizes.
*/
-static void __init deferred_free_pages(int nid, int zid, unsigned long pfn,
+static void __init deferred_free_pages(unsigned long pfn,
unsigned long end_pfn)
{
- struct mminit_pfnnid_cache nid_init_state = { };
unsigned long nr_pgmask = pageblock_nr_pages - 1;
unsigned long nr_free = 0;
for (; pfn < end_pfn; pfn++) {
- if (!deferred_pfn_valid(nid, pfn, &nid_init_state)) {
+ if (!deferred_pfn_valid(pfn)) {
deferred_free_range(pfn - nr_free, nr_free);
nr_free = 0;
} else if (!(pfn & nr_pgmask)) {
@@ -1530,17 +1507,18 @@ static void __init deferred_free_pages(int nid, int zid, unsigned long pfn,
* by performing it only once every pageblock_nr_pages.
* Return number of pages initialized.
*/
-static unsigned long __init deferred_init_pages(int nid, int zid,
+static unsigned long __init deferred_init_pages(struct zone *zone,
unsigned long pfn,
unsigned long end_pfn)
{
- struct mminit_pfnnid_cache nid_init_state = { };
unsigned long nr_pgmask = pageblock_nr_pages - 1;
+ int nid = zone_to_nid(zone);
unsigned long nr_pages = 0;
+ int zid = zone_idx(zone);
struct page *page = NULL;
for (; pfn < end_pfn; pfn++) {
- if (!deferred_pfn_valid(nid, pfn, &nid_init_state)) {
+ if (!deferred_pfn_valid(pfn)) {
page = NULL;
continue;
} else if (!page || !(pfn & nr_pgmask)) {
@@ -1603,12 +1581,12 @@ static int __init deferred_init_memmap(void *data)
for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
epfn = min_t(unsigned long, zone_end_pfn(zone), PFN_DOWN(epa));
- nr_pages += deferred_init_pages(nid, zid, spfn, epfn);
+ nr_pages += deferred_init_pages(zone, spfn, epfn);
}
for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
epfn = min_t(unsigned long, zone_end_pfn(zone), PFN_DOWN(epa));
- deferred_free_pages(nid, zid, spfn, epfn);
+ deferred_free_pages(spfn, epfn);
}
pgdat_resize_unlock(pgdat, &flags);
@@ -1640,7 +1618,6 @@ static int __init deferred_init_memmap(void *data)
static noinline bool __init
deferred_grow_zone(struct zone *zone, unsigned int order)
{
- int zid = zone_idx(zone);
int nid = zone_to_nid(zone);
pg_data_t *pgdat = NODE_DATA(nid);
unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION);
@@ -1690,7 +1667,7 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
while (spfn < epfn && nr_pages < nr_pages_needed) {
t = ALIGN(spfn + PAGES_PER_SECTION, PAGES_PER_SECTION);
first_deferred_pfn = min(t, epfn);
- nr_pages += deferred_init_pages(nid, zid, spfn,
+ nr_pages += deferred_init_pages(zone, spfn,
first_deferred_pfn);
spfn = first_deferred_pfn;
}
@@ -1702,7 +1679,7 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
epfn = min_t(unsigned long, first_deferred_pfn, PFN_DOWN(epa));
- deferred_free_pages(nid, zid, spfn, epfn);
+ deferred_free_pages(spfn, epfn);
if (first_deferred_pfn == epfn)
break;
--
2.25.1
This is the start of the stable review cycle for the 5.2.6 release.
There are 20 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun 04 Aug 2019 09:19:34 AM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.2.6-rc1.…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.2.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.2.6-rc1
Yan, Zheng <zyan(a)redhat.com>
ceph: hold i_ceph_lock when removing caps for freeing inode
Yoshinori Sato <ysato(a)users.sourceforge.jp>
Fix allyesconfig output.
Miroslav Lichvar <mlichvar(a)redhat.com>
drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
Linus Torvalds <torvalds(a)linux-foundation.org>
/proc/<pid>/cmdline: add back the setproctitle() special case
Linus Torvalds <torvalds(a)linux-foundation.org>
/proc/<pid>/cmdline: remove all the special cases
Jann Horn <jannh(a)google.com>
sched/fair: Use RCU accessors consistently for ->numa_group
Jann Horn <jannh(a)google.com>
sched/fair: Don't free p->numa_faults with concurrent readers
Vladis Dronov <vdronov(a)redhat.com>
Bluetooth: hci_uart: check for missing tty operations
Marta Rybczynska <mrybczyn(a)kalray.eu>
nvme: fix multipath crash when ANA is deactivated
Florian Westphal <fw(a)strlen.de>
xfrm: policy: fix bydst hlist corruption on hash rebuild
Luke Nowakowski-Krijger <lnowakow(a)eng.ucsd.edu>
media: radio-raremono: change devm_k*alloc to k*alloc
Benjamin Coddington <bcodding(a)redhat.com>
NFS: Cleanup if nfs_match_client is interrupted
Andrey Konovalov <andreyknvl(a)google.com>
media: pvrusb2: use a different format for warnings
Oliver Neukum <oneukum(a)suse.com>
media: cpia2_usb: first wake up, then free in disconnect
Fabio Estevam <festevam(a)gmail.com>
ath10k: Change the warning message string
Sean Young <sean(a)mess.org>
media: au0828: fix null dereference in error path
Stanislav Fomichev <sdf(a)google.com>
bpf: fix NULL deref in btf_type_is_resolve_source_only
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb-audio: Sanity checks for each pipe and EP types
Phong Tran <tranmanphong(a)gmail.com>
ISDN: hfcsusb: checking idx of ep configuration
Sunil Muthuswamy <sunilmut(a)microsoft.com>
vsock: correct removal of socket from the list
-------------
Diffstat:
Makefile | 4 +-
arch/sh/boards/Kconfig | 14 +--
drivers/bluetooth/hci_ath.c | 3 +
drivers/bluetooth/hci_bcm.c | 3 +
drivers/bluetooth/hci_intel.c | 3 +
drivers/bluetooth/hci_ldisc.c | 13 +++
drivers/bluetooth/hci_mrvl.c | 3 +
drivers/bluetooth/hci_qca.c | 3 +
drivers/bluetooth/hci_uart.h | 1 +
drivers/isdn/hardware/mISDN/hfcsusb.c | 3 +
drivers/media/radio/radio-raremono.c | 30 ++++--
drivers/media/usb/au0828/au0828-core.c | 12 +--
drivers/media/usb/cpia2/cpia2_usb.c | 3 +-
drivers/media/usb/pvrusb2/pvrusb2-hdw.c | 4 +-
drivers/media/usb/pvrusb2/pvrusb2-i2c-core.c | 6 +-
drivers/media/usb/pvrusb2/pvrusb2-std.c | 2 +-
drivers/net/wireless/ath/ath10k/usb.c | 2 +-
drivers/nvme/host/multipath.c | 8 +-
drivers/nvme/host/nvme.h | 6 +-
drivers/pps/pps.c | 8 ++
fs/ceph/caps.c | 10 +-
fs/ceph/inode.c | 2 +-
fs/ceph/super.h | 2 +-
fs/exec.c | 2 +-
fs/nfs/client.c | 4 +-
fs/proc/base.c | 132 +++++++++++++-----------
include/linux/sched.h | 10 +-
include/linux/sched/numa_balancing.h | 4 +-
kernel/bpf/btf.c | 12 +--
kernel/fork.c | 2 +-
kernel/sched/fair.c | 144 +++++++++++++++++++--------
net/vmw_vsock/af_vsock.c | 38 ++-----
net/xfrm/xfrm_policy.c | 12 ++-
sound/usb/helper.c | 17 ++++
sound/usb/helper.h | 1 +
sound/usb/quirks.c | 18 +++-
tools/testing/selftests/net/xfrm_policy.sh | 27 ++++-
37 files changed, 368 insertions(+), 200 deletions(-)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5812b32e01c6d86ba7a84110702b46d8a8531fe9 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Mon, 23 Nov 2020 11:23:12 +0100
Subject: [PATCH] of: fix linker-section match-table corruption
Specify type alignment when declaring linker-section match-table entries
to prevent gcc from increasing alignment and corrupting the various
tables with padding (e.g. timers, irqchips, clocks, reserved memory).
This is specifically needed on x86 where gcc (typically) aligns larger
objects like struct of_device_id with static extent on 32-byte
boundaries which at best prevents matching on anything but the first
entry. Specifying alignment when declaring variables suppresses this
optimisation.
Here's a 64-bit example where all entries are corrupt as 16 bytes of
padding has been inserted before the first entry:
ffffffff8266b4b0 D __clk_of_table
ffffffff8266b4c0 d __of_table_fixed_factor_clk
ffffffff8266b5a0 d __of_table_fixed_clk
ffffffff8266b680 d __clk_of_table_sentinel
And here's a 32-bit example where the 8-byte-aligned table happens to be
placed on a 32-byte boundary so that all but the first entry are corrupt
due to the 28 bytes of padding inserted between entries:
812b3ec0 D __irqchip_of_table
812b3ec0 d __of_table_irqchip1
812b3fa0 d __of_table_irqchip2
812b4080 d __of_table_irqchip3
812b4160 d irqchip_of_match_end
Verified on x86 using gcc-9.3 and gcc-4.9 (which uses 64-byte
alignment), and on arm using gcc-7.2.
Note that there are no in-tree users of these tables on x86 currently
(even if they are included in the image).
Fixes: 54196ccbe0ba ("of: consolidate linker section OF match table declarations")
Fixes: f6e916b82022 ("irqchip: add basic infrastructure")
Cc: stable <stable(a)vger.kernel.org> # 3.9
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Link: https://lore.kernel.org/r/20201123102319.8090-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/include/linux/of.h b/include/linux/of.h
index 5d51891cbf1a..af655d264f10 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -1300,6 +1300,7 @@ static inline int of_get_available_child_count(const struct device_node *np)
#define _OF_DECLARE(table, name, compat, fn, fn_type) \
static const struct of_device_id __of_table_##name \
__used __section("__" #table "_of_table") \
+ __aligned(__alignof__(struct of_device_id)) \
= { .compatible = compat, \
.data = (fn == (fn_type)NULL) ? fn : fn }
#else