From: Ronnie Sahlberg <lsahlber@redhat.com>
[ Upstream commit e6c47dd0da1e3a484e778046fc10da0b20606a86 ]
Some SMB2/3 servers, Win2016 but possibly others too, add padding not only between PDUs in a compound but also to the final PDU. This padding extends the PDU to a multiple of 8 bytes.
Check whether the unexpected length looks like this might be the case and, if so, avoid triggering the log message:
"SMB2 server sent bad RFC1001 len %d not %d\n"
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 fs/cifs/smb2misc.c | 7 +++++++
 1 file changed, 7 insertions(+)
diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c
index 3ff7cec2da81..239215dcc00b 100644
--- a/fs/cifs/smb2misc.c
+++ b/fs/cifs/smb2misc.c
@@ -240,6 +240,13 @@ smb2_check_message(char *buf, unsigned int len, struct TCP_Server_Info *srvr)
 		if (clc_len == len + 1)
 			return 0;

+		/*
+		 * Some windows servers (win2016) will pad also the final
+		 * PDU in a compound to 8 bytes.
+		 */
+		if (((clc_len + 7) & ~7) == len)
+			return 0;
+
 		/*
 		 * MacOS server pads after SMB2.1 write response with 3 bytes
 		 * of junk. Other servers match RFC1001 len to actual
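The key test is ((clc_len + 7) & ~7) == len: it checks whether rounding the computed length up to the next multiple of 8 yields exactly the length the server sent. A standalone illustration of that round-up identity (plain userspace C, not the cifs code itself):

  #include <stdio.h>

  /* Round v up to the next multiple of 8; the same bit trick the patch uses. */
  static unsigned int round_up8(unsigned int v)
  {
          return (v + 7) & ~7u;
  }

  int main(void)
  {
          unsigned int clc_len = 197;   /* length computed from the PDU fields */
          unsigned int len = 200;       /* length the server actually sent */

          /* 197 rounded up to a multiple of 8 is 200: benign final-PDU padding */
          if (round_up8(clc_len) == len)
                  printf("len %u is just clc_len %u plus padding\n", len, clc_len);
          return 0;
  }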
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
[ Upstream commit 7464726cb5998846306ed0a7d6714afb2e37b25d ]
syzbot is reporting a NULL pointer dereference at mount_fs() [1]. This is because hfsplus_fill_super() erroneously returns 0 when it detects an invalid filesystem image, mount_bdev() then returns NULL because dget(s->s_root) == NULL when s->s_root == NULL, and mount_fs() accesses root->d_sb because IS_ERR(root) == false when root == NULL. Fix this by returning -EINVAL when hfsplus_fill_super() detects an invalid filesystem image.
[1] https://syzkaller.appspot.com/bug?id=21acb6850cecbc960c927229e597158cf35f33d...
Link: http://lkml.kernel.org/r/d83ce31a-874c-dd5b-f790-41405983a5be@I-love.SAKURA....
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+01ffaf5d9568dd1609f7@syzkaller.appspotmail.com>
Reviewed-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 fs/hfsplus/super.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/hfsplus/super.c b/fs/hfsplus/super.c
index a6c0f54c48c3..80abba550bfa 100644
--- a/fs/hfsplus/super.c
+++ b/fs/hfsplus/super.c
@@ -524,8 +524,10 @@ static int hfsplus_fill_super(struct super_block *sb, void *data, int silent)
 		goto out_put_root;
 	if (!hfs_brec_read(&fd, &entry, sizeof(entry))) {
 		hfs_find_exit(&fd);
-		if (entry.type != cpu_to_be16(HFSPLUS_FOLDER))
+		if (entry.type != cpu_to_be16(HFSPLUS_FOLDER)) {
+			err = -EINVAL;
 			goto out_put_root;
+		}
 		inode = hfsplus_iget(sb, be32_to_cpu(entry.folder.id));
 		if (IS_ERR(inode)) {
 			err = PTR_ERR(inode);
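The crash chain relies on a subtlety of the kernel's error-pointer convention: NULL is not an error-encoded pointer, so IS_ERR(NULL) is false. A simplified userspace stand-in for the macro (illustration only, not the kernel header) shows why a zero return plus a NULL ->s_root sails past the IS_ERR() check in mount_fs():

  #include <stdio.h>

  #define MAX_ERRNO 4095
  /* Simplified stand-in for the kernel's IS_ERR(): error pointers live in
   * the last 4095 bytes of the address space; NULL does not. */
  #define IS_ERR(ptr) ((unsigned long)(ptr) >= (unsigned long)-MAX_ERRNO)

  int main(void)
  {
          void *root = NULL; /* what dget(s->s_root) yields when s_root == NULL */

          if (!IS_ERR(root)) /* true: NULL is not in the error range */
                  printf("caller proceeds and would dereference root->d_sb\n");
          return 0;
  }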
From: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
[ Upstream commit dc2572791d3a41bab94400af2b6bca9d71ccd303 ]
hfs_find_exit() expects fd->bnode to be NULL after a search has failed. hfs_brec_insert() may instead set it to an error-valued pointer. Fix this to prevent a crash.
Link: http://lkml.kernel.org/r/53d9749a029c41b4016c495fc5838c9dba3afc52.1530294815...
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Cc: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 fs/hfs/brec.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/hfs/brec.c b/fs/hfs/brec.c
index ad04a5741016..9a8772465a90 100644
--- a/fs/hfs/brec.c
+++ b/fs/hfs/brec.c
@@ -75,9 +75,10 @@ int hfs_brec_insert(struct hfs_find_data *fd, void *entry, int entry_len)
 	if (!fd->bnode) {
 		if (!tree->root)
 			hfs_btree_inc_height(tree);
-		fd->bnode = hfs_bnode_find(tree, tree->leaf_head);
-		if (IS_ERR(fd->bnode))
-			return PTR_ERR(fd->bnode);
+		node = hfs_bnode_find(tree, tree->leaf_head);
+		if (IS_ERR(node))
+			return PTR_ERR(node);
+		fd->bnode = node;
 		fd->record = -1;
 	}
 	new_node = NULL;
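This is the inverse pitfall of the previous patch: an ERR_PTR value is non-NULL, so cleanup code that only tests for NULL, as hfs_find_exit() effectively does for fd->bnode, treats it as a live node. A simplified userspace illustration (again a stand-in, not the kernel helpers):

  #include <stdio.h>

  #define MAX_ERRNO 4095
  #define ERR_PTR(err) ((void *)(long)(err))
  #define IS_ERR(ptr) ((unsigned long)(ptr) >= (unsigned long)-MAX_ERRNO)

  int main(void)
  {
          void *bnode = ERR_PTR(-5); /* the kind of value hfs_bnode_find() can return */

          if (bnode) /* passes: ERR_PTR(-5) is non-NULL... */
                  printf("NULL check passed although IS_ERR is %d\n", IS_ERR(bnode));
          return 0;
  }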
From: Laura Abbott <labbott@redhat.com>
[ Upstream commit 44090cc876926277329e1608bafc01b9f6da627f ]
Fedora got a bug report from NFS:
kernel BUG at include/linux/scatterlist.h:143!
...
RIP: 0010:sg_init_one+0x7d/0x90
..
 make_checksum+0x4e7/0x760 [rpcsec_gss_krb5]
 gss_get_mic_kerberos+0x26e/0x310 [rpcsec_gss_krb5]
 gss_marshal+0x126/0x1a0 [auth_rpcgss]
 ? __local_bh_enable_ip+0x80/0xe0
 ? call_transmit_status+0x1d0/0x1d0 [sunrpc]
 call_transmit+0x137/0x230 [sunrpc]
 __rpc_execute+0x9b/0x490 [sunrpc]
 rpc_run_task+0x119/0x150 [sunrpc]
 nfs4_run_exchange_id+0x1bd/0x250 [nfsv4]
 _nfs4_proc_exchange_id+0x2d/0x490 [nfsv4]
 nfs41_discover_server_trunking+0x1c/0xa0 [nfsv4]
 nfs4_discover_server_trunking+0x80/0x270 [nfsv4]
 nfs4_init_client+0x16e/0x240 [nfsv4]
 ? nfs_get_client+0x4c9/0x5d0 [nfs]
 ? _raw_spin_unlock+0x24/0x30
 ? nfs_get_client+0x4c9/0x5d0 [nfs]
 nfs4_set_client+0xb2/0x100 [nfsv4]
 nfs4_create_server+0xff/0x290 [nfsv4]
 nfs4_remote_mount+0x28/0x50 [nfsv4]
 mount_fs+0x3b/0x16a
 vfs_kern_mount.part.35+0x54/0x160
 nfs_do_root_mount+0x7f/0xc0 [nfsv4]
 nfs4_try_mount+0x43/0x70 [nfsv4]
 ? get_nfs_version+0x21/0x80 [nfs]
 nfs_fs_mount+0x789/0xbf0 [nfs]
 ? pcpu_alloc+0x6ca/0x7e0
 ? nfs_clone_super+0x70/0x70 [nfs]
 ? nfs_parse_mount_options+0xb40/0xb40 [nfs]
 mount_fs+0x3b/0x16a
 vfs_kern_mount.part.35+0x54/0x160
 do_mount+0x1fd/0xd50
 ksys_mount+0xba/0xd0
 __x64_sys_mount+0x21/0x30
 do_syscall_64+0x60/0x1f0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
This is the BUG_ON(!virt_addr_valid(buf)) check, triggered by using a stack-allocated buffer with a scatterlist. Convert the rc4salt buffer to be dynamically allocated instead.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1615258
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 net/sunrpc/auth_gss/gss_krb5_crypto.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/net/sunrpc/auth_gss/gss_krb5_crypto.c b/net/sunrpc/auth_gss/gss_krb5_crypto.c
index 8654494b4d0a..834eb2b9e41b 100644
--- a/net/sunrpc/auth_gss/gss_krb5_crypto.c
+++ b/net/sunrpc/auth_gss/gss_krb5_crypto.c
@@ -169,7 +169,7 @@ make_checksum_hmac_md5(struct krb5_ctx *kctx, char *header, int hdrlen,
 	struct scatterlist sg[1];
 	int err = -1;
 	u8 *checksumdata;
-	u8 rc4salt[4];
+	u8 *rc4salt;
 	struct crypto_ahash *md5;
 	struct crypto_ahash *hmac_md5;
 	struct ahash_request *req;
@@ -183,14 +183,18 @@ make_checksum_hmac_md5(struct krb5_ctx *kctx, char *header, int hdrlen,
 		return GSS_S_FAILURE;
 	}

+	rc4salt = kmalloc_array(4, sizeof(*rc4salt), GFP_NOFS);
+	if (!rc4salt)
+		return GSS_S_FAILURE;
+
 	if (arcfour_hmac_md5_usage_to_salt(usage, rc4salt)) {
 		dprintk("%s: invalid usage value %u\n", __func__, usage);
-		return GSS_S_FAILURE;
+		goto out_free_rc4salt;
 	}

 	checksumdata = kmalloc(GSS_KRB5_MAX_CKSUM_LEN, GFP_NOFS);
 	if (!checksumdata)
-		return GSS_S_FAILURE;
+		goto out_free_rc4salt;

 	md5 = crypto_alloc_ahash("md5", 0, CRYPTO_ALG_ASYNC);
 	if (IS_ERR(md5))
@@ -258,6 +262,8 @@ make_checksum_hmac_md5(struct krb5_ctx *kctx, char *header, int hdrlen,
 	crypto_free_ahash(md5);
 out_free_cksum:
 	kfree(checksumdata);
+out_free_rc4salt:
+	kfree(rc4salt);
 	return err ? GSS_S_FAILURE : 0;
 }
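For context, sg_init_one() requires an address that is valid in the linear map; the BUG_ON that fired is virt_addr_valid(), which a stack buffer may not satisfy (e.g. with CONFIG_VMAP_STACK). A minimal hedged sketch of the safe pattern, with a hypothetical function name, not the actual sunrpc code:

  #include <linux/scatterlist.h>
  #include <linux/slab.h>

  /* Hedged sketch: hash a small salt through a scatterlist. The backing
   * buffer must come from kmalloc(), never from the stack. */
  static int sg_salt_example(void)
  {
          struct scatterlist sg[1];
          u8 *salt = kmalloc(4, GFP_NOFS);

          if (!salt)
                  return -ENOMEM;
          sg_init_one(sg, salt, 4); /* sg_set_buf() checks virt_addr_valid(salt) */
          /* ... feed sg to the ahash request here ... */
          kfree(salt);
          return 0;
  }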
From: Jann Horn <jannh@google.com>
[ Upstream commit 06e62a46bbba20aa5286102016a04214bb446141 ]
Before this change, if a multithreaded process forks while one of its threads is changing a signal handler using sigaction(), the memcpy() in copy_sighand() can race with the struct assignment in do_sigaction(). It isn't clear whether this can cause corruption of the userspace signal handler pointer, but it definitely can cause inconsistency between different fields of struct sigaction.
Take the appropriate spinlock to avoid this.
I have tested that this patch prevents inconsistency between sa_sigaction and sa_flags, which is possible before this patch.
Link: http://lkml.kernel.org/r/20180702145108.73189-1-jannh@google.com
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 kernel/fork.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/kernel/fork.c b/kernel/fork.c
index 1b27babc4c78..9cf8add7038d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1417,7 +1417,9 @@ static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk)
 		return -ENOMEM;

 	atomic_set(&sig->count, 1);
+	spin_lock_irq(&current->sighand->siglock);
 	memcpy(sig->action, current->sighand->action, sizeof(sig->action));
+	spin_unlock_irq(&current->sighand->siglock);
 	return 0;
 }
From: Arnd Bergmann <arnd@arndb.de>
[ Upstream commit a2036a1ef2ee91acab01a0ae4a534070691a42ec ]
Without CONFIG_MMU, we get a build warning:
fs/proc/vmcore.c:228:12: error: 'vmcoredd_mmap_dumps' defined but not used [-Werror=unused-function]
 static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst,
The function is only referenced from an #ifdef'ed caller, so wrap it in the same #ifdef.
Link: http://lkml.kernel.org/r/20180525213526.2117790-1-arnd@arndb.de
Fixes: 7efe48df8a3d ("vmcore: append device dumps to vmcore as elf notes")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Ganesh Goudar <ganeshgr@chelsio.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 fs/proc/vmcore.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index cfb6674331fd..0651646dd04d 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -225,6 +225,7 @@ static int vmcoredd_copy_dumps(void *dst, u64 start, size_t size, int userbuf)
 	return ret;
 }

+#ifdef CONFIG_MMU
 static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst,
 			       u64 start, size_t size)
 {
@@ -259,6 +260,7 @@ static int vmcoredd_mmap_dumps(struct vm_area_struct *vma, unsigned long dst,
 	mutex_unlock(&vmcoredd_mutex);
 	return ret;
 }
+#endif /* CONFIG_MMU */
 #endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */

 /* Read from the ELF header and then the crash dump. On error, negative value is
From: Arnd Bergmann <arnd@arndb.de>
[ Upstream commit 8b73ce6a4bae4fe12bcb2c361c0da4183c2e1b6f ]
This uses the deprecated time_t type but is write-only, and could be removed; but as Jeff explains, having a timestamp can be useful for post-mortem analysis in crash dumps.
In order to remove one of the last instances of time_t, this changes the type to time64_t, same as j_trans_start_time.
Link: http://lkml.kernel.org/r/20180622133315.221210-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 fs/reiserfs/reiserfs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/reiserfs/reiserfs.h b/fs/reiserfs/reiserfs.h
index ae4811fecc1f..6d670bd9ab6b 100644
--- a/fs/reiserfs/reiserfs.h
+++ b/fs/reiserfs/reiserfs.h
@@ -271,7 +271,7 @@ struct reiserfs_journal_list {

 	struct mutex j_commit_mutex;
 	unsigned int j_trans_id;
-	time_t j_timestamp;
+	time64_t j_timestamp; /* write-only but useful for crash dump analysis */
 	struct reiserfs_list_bitmap *j_list_bitmap;
 	struct buffer_head *j_commit_bh; /* commit buffer head */
 	struct reiserfs_journal_cnode *j_realblock;
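As a side note, a hedged sketch of how such a field would typically be written; this is illustrative, not the actual reiserfs call site. ktime_get_seconds() returns time64_t (the same type j_trans_start_time uses), so the stored value does not wrap in 2038 on 32-bit systems:

  #include <linux/timekeeping.h>

  /* Illustrative only: stamp a journal list with a 64-bit seconds value. */
  static void stamp_journal_list(struct reiserfs_journal_list *jl)
  {
          jl->j_timestamp = ktime_get_seconds(); /* time64_t, y2038-safe */
  }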
From: Marc Zyngier <marc.zyngier@arm.com>
[ Upstream commit 3fc7c5c0cff3150e471f5fd12f59971c6d2c6513 ]
pm_runtime_get_if_in_use can fail: either PM has been disabled altogether (-EINVAL), or the device hasn't been enabled yet (0). Sadly, the Rockchip IOMMU driver tends to conflate the two things by considering a non-zero return value as successful.
This has the consequence of hiding other bugs, so let's handle this case throughout the driver, with a WARN_ON_ONCE so that we can try and work out what happened.
Fixes: 0f181d3cf7d98 ("iommu/rockchip: Add runtime PM support")
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 drivers/iommu/rockchip-iommu.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 054cd2c8e9c8..4e0f9b61cd7f 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -521,10 +521,11 @@ static irqreturn_t rk_iommu_irq(int irq, void *dev_id)
 	u32 int_status;
 	dma_addr_t iova;
 	irqreturn_t ret = IRQ_NONE;
-	int i;
+	int i, err;

-	if (WARN_ON(!pm_runtime_get_if_in_use(iommu->dev)))
-		return 0;
+	err = pm_runtime_get_if_in_use(iommu->dev);
+	if (WARN_ON_ONCE(err <= 0))
+		return ret;

 	if (WARN_ON(clk_bulk_enable(iommu->num_clocks, iommu->clocks)))
 		goto out;
@@ -620,11 +621,15 @@ static void rk_iommu_zap_iova(struct rk_iommu_domain *rk_domain,
 	spin_lock_irqsave(&rk_domain->iommus_lock, flags);
 	list_for_each(pos, &rk_domain->iommus) {
 		struct rk_iommu *iommu;
+		int ret;

 		iommu = list_entry(pos, struct rk_iommu, node);

 		/* Only zap TLBs of IOMMUs that are powered on. */
-		if (pm_runtime_get_if_in_use(iommu->dev)) {
+		ret = pm_runtime_get_if_in_use(iommu->dev);
+		if (WARN_ON_ONCE(ret < 0))
+			continue;
+		if (ret) {
 			WARN_ON(clk_bulk_enable(iommu->num_clocks,
 						iommu->clocks));
 			rk_iommu_zap_lines(iommu, iova, size);
@@ -891,6 +896,7 @@ static void rk_iommu_detach_device(struct iommu_domain *domain,
 	struct rk_iommu *iommu;
 	struct rk_iommu_domain *rk_domain = to_rk_domain(domain);
 	unsigned long flags;
+	int ret;

 	/* Allow 'virtual devices' (eg drm) to detach from domain */
 	iommu = rk_iommu_from_dev(dev);
@@ -909,7 +915,9 @@ static void rk_iommu_detach_device(struct iommu_domain *domain,
 	list_del_init(&iommu->node);
 	spin_unlock_irqrestore(&rk_domain->iommus_lock, flags);

-	if (pm_runtime_get_if_in_use(iommu->dev)) {
+	ret = pm_runtime_get_if_in_use(iommu->dev);
+	WARN_ON_ONCE(ret < 0);
+	if (ret > 0) {
 		rk_iommu_disable(iommu);
 		pm_runtime_put(iommu->dev);
 	}
@@ -946,7 +954,8 @@ static int rk_iommu_attach_device(struct iommu_domain *domain,
 	list_add_tail(&iommu->node, &rk_domain->iommus);
 	spin_unlock_irqrestore(&rk_domain->iommus_lock, flags);

-	if (!pm_runtime_get_if_in_use(iommu->dev))
+	ret = pm_runtime_get_if_in_use(iommu->dev);
+	if (!ret || WARN_ON_ONCE(ret < 0))
 		return 0;

 	ret = rk_iommu_enable(iommu);
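To summarize the tri-state contract the fix relies on: pm_runtime_get_if_in_use() returns a negative errno when runtime PM is disabled, 0 when the device is not in use (no reference taken), and a positive value when a usage reference was taken. A hedged sketch of the full handling pattern at a hypothetical call site:

  #include <linux/pm_runtime.h>

  /* Hedged sketch of the pattern the patch applies at each call site. */
  static int do_work_if_powered(struct device *dev)
  {
          int ret = pm_runtime_get_if_in_use(dev);

          if (ret < 0)       /* runtime PM disabled: a real bug, worth a WARN */
                  return ret;
          if (ret == 0)      /* device not in use: nothing to do, no ref held */
                  return 0;

          /* ret > 0: we hold a usage reference and must drop it when done */
          pm_runtime_put(dev);
          return 0;
  }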
On Thu, 30 Aug 2018 19:01:16 +0100, Sasha Levin <Alexander.Levin@microsoft.com> wrote:
> From: Marc Zyngier <marc.zyngier@arm.com>
>
> [ Upstream commit 3fc7c5c0cff3150e471f5fd12f59971c6d2c6513 ]
>
> pm_runtime_get_if_in_use can fail: either PM has been disabled
> altogether (-EINVAL), or the device hasn't been enabled yet (0). Sadly,
> the Rockchip IOMMU driver tends to conflate the two things by
> considering a non-zero return value as successful.
>
> This has the consequence of hiding other bugs, so let's handle this
> case throughout the driver, with a WARN_ON_ONCE so that we can try and
> work out what happened.
>
> Fixes: 0f181d3cf7d98 ("iommu/rockchip: Add runtime PM support")
> Reviewed-by: Heiko Stuebner <heiko@sntech.de>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Olof Johansson <olof@lixom.net>
> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Picking this patch on its own feels like a bad idea if CONFIG_PM is not selected. Consider picking up 7db7a8f5638a and d1558dfd9f22 which guarantee that things won't break.
Thanks,
M.
On Sun, Sep 02, 2018 at 08:47:29AM +0100, Marc Zyngier wrote:
> On Thu, 30 Aug 2018 19:01:16 +0100, Sasha Levin <Alexander.Levin@microsoft.com> wrote:
>> From: Marc Zyngier <marc.zyngier@arm.com>
>>
>> [ Upstream commit 3fc7c5c0cff3150e471f5fd12f59971c6d2c6513 ]
>>
>> pm_runtime_get_if_in_use can fail: either PM has been disabled
>> altogether (-EINVAL), or the device hasn't been enabled yet (0). Sadly,
>> the Rockchip IOMMU driver tends to conflate the two things by
>> considering a non-zero return value as successful.
>>
>> This has the consequence of hiding other bugs, so let's handle this
>> case throughout the driver, with a WARN_ON_ONCE so that we can try and
>> work out what happened.
>>
>> Fixes: 0f181d3cf7d98 ("iommu/rockchip: Add runtime PM support")
>> Reviewed-by: Heiko Stuebner <heiko@sntech.de>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> Signed-off-by: Olof Johansson <olof@lixom.net>
>> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
>
> Picking this patch on its own feels like a bad idea if CONFIG_PM is not
> selected. Consider picking up 7db7a8f5638a and d1558dfd9f22 which
> guarantee that things won't break.
Grabbed those 2 as well, thanks!
From: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
[ Upstream commit a7ec7a4193a2eb3b5341243fc0b621c1ac9e4ec4 ]
An HFS+ filesystem can be mounted read-only without having a metadata directory, which is needed to support hardlinks. But if the catalog data is corrupted, a directory lookup may still find dentries claiming to be hardlinks.
hfsplus_lookup() does check that ->hidden_dir is not NULL in such a situation, but mistakenly does so after dereferencing it for the first time. Reorder this check to prevent a crash.
This happens when looking up corrupted catalog data (dentry) on a filesystem with no metadata directory (this could only ever happen on a read-only mount). Wen Xu sent the replication steps in detail to the fsdevel list: https://bugzilla.kernel.org/show_bug.cgi?id=200297
Link: http://lkml.kernel.org/r/20180712215344.q44dyrhymm4ajkao@eaf
Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Reported-by: Wen Xu <wen.xu@gatech.edu>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 fs/hfsplus/dir.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/hfsplus/dir.c b/fs/hfsplus/dir.c
index b5254378f011..cd017d7dbdfa 100644
--- a/fs/hfsplus/dir.c
+++ b/fs/hfsplus/dir.c
@@ -78,13 +78,13 @@ static struct dentry *hfsplus_lookup(struct inode *dir, struct dentry *dentry,
 				cpu_to_be32(HFSP_HARDLINK_TYPE) &&
 		    entry.file.user_info.fdCreator ==
 				cpu_to_be32(HFSP_HFSPLUS_CREATOR) &&
+		    HFSPLUS_SB(sb)->hidden_dir &&
 		    (entry.file.create_date ==
 				HFSPLUS_I(HFSPLUS_SB(sb)->hidden_dir)->
 					create_date ||
 		     entry.file.create_date ==
 				HFSPLUS_I(d_inode(sb->s_root))->
-					create_date) &&
-		    HFSPLUS_SB(sb)->hidden_dir) {
+					create_date)) {
 			struct qstr str;
 			char name[32];
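The fix works because && evaluates left to right and short-circuits, so moving the hidden_dir test ahead of the dereferences guarantees it runs first. A tiny standalone illustration:

  #include <stdio.h>

  struct dir { long create_date; };

  /* Safe: the NULL test runs before the dereference in the && chain. */
  static int matches(struct dir *hidden_dir, long date)
  {
          return hidden_dir && hidden_dir->create_date == date;
  }

  int main(void)
  {
          /* A read-only mount with no metadata directory: hidden_dir is NULL. */
          printf("%d\n", matches(NULL, 42)); /* prints 0, no crash */
          return 0;
  }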
From: Marc Zyngier <marc.zyngier@arm.com>
[ Upstream commit 1aa55ca9b14af6cfd987ce4fdaf548f7067a5d07 ]
Enabling the interrupt early, before power has been applied to the device, can result in an interrupt being delivered too early if:
 - the IOMMU shares an interrupt with a VOP
 - the VOP has a pending interrupt (after a kexec, for example)
In these conditions, we end up taking the interrupt without the IOMMU being ready to handle it (not powered on).
Moving the interrupt request past the pm_runtime_enable() call makes sure we can at least access the IOMMU registers. Note that this is only a partial fix, and that the VOP interrupt will still be screaming until the VOP driver kicks in, which advocates for a more synchronized interrupt enabling/disabling approach.
Fixes: 0f181d3cf7d98 ("iommu/rockchip: Add runtime PM support")
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 drivers/iommu/rockchip-iommu.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 4e0f9b61cd7f..2b1724e8d307 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -1161,17 +1161,6 @@ static int rk_iommu_probe(struct platform_device *pdev)
 	if (iommu->num_mmu == 0)
 		return PTR_ERR(iommu->bases[0]);

-	i = 0;
-	while ((irq = platform_get_irq(pdev, i++)) != -ENXIO) {
-		if (irq < 0)
-			return irq;
-
-		err = devm_request_irq(iommu->dev, irq, rk_iommu_irq,
-				       IRQF_SHARED, dev_name(dev), iommu);
-		if (err)
-			return err;
-	}
-
 	iommu->reset_disabled = device_property_read_bool(dev,
 					"rockchip,disable-mmu-reset");
@@ -1228,6 +1217,19 @@ static int rk_iommu_probe(struct platform_device *pdev)

 	pm_runtime_enable(dev);

+	i = 0;
+	while ((irq = platform_get_irq(pdev, i++)) != -ENXIO) {
+		if (irq < 0)
+			return irq;
+
+		err = devm_request_irq(iommu->dev, irq, rk_iommu_irq,
+				       IRQF_SHARED, dev_name(dev), iommu);
+		if (err) {
+			pm_runtime_disable(dev);
+			goto err_remove_sysfs;
+		}
+	}
+
 	return 0;
 err_remove_sysfs:
 	iommu_device_sysfs_remove(&iommu->iommu);
From: James Morse <james.morse@arm.com>
[ Upstream commit df865e8337c397471b95f51017fea559bc8abb4a ]
elf_kcore_store_hdr() uses __pa() to find the physical address of KCORE_RAM or KCORE_TEXT entries exported as program headers.
This trips CONFIG_DEBUG_VIRTUAL's checks, as the KCORE_TEXT entries are not in the linear map.
Handle these two cases separately, using __pa_symbol() for the KCORE_TEXT entries.
Link: http://lkml.kernel.org/r/20180711131944.15252-1-james.morse@arm.com
Signed-off-by: James Morse <james.morse@arm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 fs/proc/kcore.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index e64ecb9f2720..66c373230e60 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -384,8 +384,10 @@ static void elf_kcore_store_hdr(char *bufp, int nphdr, int dataoff)
 		phdr->p_flags	= PF_R|PF_W|PF_X;
 		phdr->p_offset	= kc_vaddr_to_offset(m->addr) + dataoff;
 		phdr->p_vaddr	= (size_t)m->addr;
-		if (m->type == KCORE_RAM || m->type == KCORE_TEXT)
+		if (m->type == KCORE_RAM)
 			phdr->p_paddr	= __pa(m->addr);
+		else if (m->type == KCORE_TEXT)
+			phdr->p_paddr	= __pa_symbol(m->addr);
 		else
 			phdr->p_paddr	= (elf_addr_t)-1;
 		phdr->p_filesz	= phdr->p_memsz	= m->size;
From: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
[ Upstream commit 0afa9626667c3659ef8bd82d42a11e39fedf235c ]
A corrupted FAT filesystem may have an invalid ->i_start. To handle it, check ->i_start before using it, and return a proper error code.
Link: http://lkml.kernel.org/r/87o9f8y1t5.fsf_-_@mail.parknet.co.jp
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Tested-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 fs/fat/cache.c  | 19 ++++++++++++-------
 fs/fat/fat.h    |  5 +++++
 fs/fat/fatent.c |  6 +++---
 3 files changed, 20 insertions(+), 10 deletions(-)
diff --git a/fs/fat/cache.c b/fs/fat/cache.c
index e9bed49df6b7..78d501c1fb65 100644
--- a/fs/fat/cache.c
+++ b/fs/fat/cache.c
@@ -225,7 +225,8 @@ static inline void cache_init(struct fat_cache_id *cid, int fclus, int dclus)
 int fat_get_cluster(struct inode *inode, int cluster, int *fclus, int *dclus)
 {
 	struct super_block *sb = inode->i_sb;
-	const int limit = sb->s_maxbytes >> MSDOS_SB(sb)->cluster_bits;
+	struct msdos_sb_info *sbi = MSDOS_SB(sb);
+	const int limit = sb->s_maxbytes >> sbi->cluster_bits;
 	struct fat_entry fatent;
 	struct fat_cache_id cid;
 	int nr;
@@ -234,6 +235,12 @@ int fat_get_cluster(struct inode *inode, int cluster, int *fclus, int *dclus)

 	*fclus = 0;
 	*dclus = MSDOS_I(inode)->i_start;
+	if (!fat_valid_entry(sbi, *dclus)) {
+		fat_fs_error_ratelimit(sb,
+			"%s: invalid start cluster (i_pos %lld, start %08x)",
+			__func__, MSDOS_I(inode)->i_pos, *dclus);
+		return -EIO;
+	}
 	if (cluster == 0)
 		return 0;

@@ -250,9 +257,8 @@ int fat_get_cluster(struct inode *inode, int cluster, int *fclus, int *dclus)
 		/* prevent the infinite loop of cluster chain */
 		if (*fclus > limit) {
 			fat_fs_error_ratelimit(sb,
-					"%s: detected the cluster chain loop"
-					" (i_pos %lld)", __func__,
-					MSDOS_I(inode)->i_pos);
+				"%s: detected the cluster chain loop (i_pos %lld)",
+				__func__, MSDOS_I(inode)->i_pos);
 			nr = -EIO;
 			goto out;
 		}
@@ -262,9 +268,8 @@ int fat_get_cluster(struct inode *inode, int cluster, int *fclus, int *dclus)
 			goto out;
 		else if (nr == FAT_ENT_FREE) {
 			fat_fs_error_ratelimit(sb,
-				       "%s: invalid cluster chain (i_pos %lld)",
-				       __func__,
-				       MSDOS_I(inode)->i_pos);
+				"%s: invalid cluster chain (i_pos %lld)",
+				__func__, MSDOS_I(inode)->i_pos);
 			nr = -EIO;
 			goto out;
 		} else if (nr == FAT_ENT_EOF) {
diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index 8fc1093da47d..a0a00f3734bc 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -348,6 +348,11 @@ static inline void fatent_brelse(struct fat_entry *fatent)
 	fatent->fat_inode = NULL;
 }

+static inline bool fat_valid_entry(struct msdos_sb_info *sbi, int entry)
+{
+	return FAT_START_ENT <= entry && entry < sbi->max_cluster;
+}
+
 extern void fat_ent_access_init(struct super_block *sb);
 extern int fat_ent_read(struct inode *inode, struct fat_entry *fatent,
 			int entry);
diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c
index bac10de678cc..3aef8630a4b9 100644
--- a/fs/fat/fatent.c
+++ b/fs/fat/fatent.c
@@ -23,7 +23,7 @@ static void fat12_ent_blocknr(struct super_block *sb, int entry,
 {
 	struct msdos_sb_info *sbi = MSDOS_SB(sb);
 	int bytes = entry + (entry >> 1);
-	WARN_ON(entry < FAT_START_ENT || sbi->max_cluster <= entry);
+	WARN_ON(!fat_valid_entry(sbi, entry));
 	*offset = bytes & (sb->s_blocksize - 1);
 	*blocknr = sbi->fat_start + (bytes >> sb->s_blocksize_bits);
 }
@@ -33,7 +33,7 @@ static void fat_ent_blocknr(struct super_block *sb, int entry,
 {
 	struct msdos_sb_info *sbi = MSDOS_SB(sb);
 	int bytes = (entry << sbi->fatent_shift);
-	WARN_ON(entry < FAT_START_ENT || sbi->max_cluster <= entry);
+	WARN_ON(!fat_valid_entry(sbi, entry));
 	*offset = bytes & (sb->s_blocksize - 1);
 	*blocknr = sbi->fat_start + (bytes >> sb->s_blocksize_bits);
 }
@@ -353,7 +353,7 @@ int fat_ent_read(struct inode *inode, struct fat_entry *fatent, int entry)
 	int err, offset;
 	sector_t blocknr;

-	if (entry < FAT_START_ENT || sbi->max_cluster <= entry) {
+	if (!fat_valid_entry(sbi, entry)) {
 		fatent_brelse(fatent);
 		fat_fs_error(sb, "invalid access to FAT (entry 0x%08x)", entry);
 		return -EIO;
From: Peter Zijlstra <peterz@infradead.org>
[ Upstream commit d86564a2f085b79ec046a5cba90188e612352806 ]
Jann reported that x86 was missing required TLB invalidates when he hit the !*batch slow path in tlb_remove_table().
This is indeed the case; RCU_TABLE_FREE does not provide TLB (cache) invalidates, the PowerPC-hash where this code originated and the Sparc-hash where this was subsequently used did not need that. ARM which later used this put an explicit TLB invalidate in their __p*_free_tlb() functions, and PowerPC-radix followed that example.
But when we hooked up x86 we failed to consider this. Fix this by (optionally) hooking tlb_remove_table() into the TLB invalidate code.
NOTE: s390 also needed something like this and might now be able to use the generic code again.
[ Modified to be on top of Nick's cleanups, which simplified this patch now that tlb_flush_mmu_tlbonly() really only flushes the TLB - Linus ]
Fixes: 9e52fc2b50de ("x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 arch/Kconfig     |  3 +++
 arch/x86/Kconfig |  1 +
 mm/memory.c      | 18 ++++++++++++++++++
 3 files changed, 22 insertions(+)
diff --git a/arch/Kconfig b/arch/Kconfig
index d1f2ed462ac8..f03b72644902 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -354,6 +354,9 @@ config HAVE_ARCH_JUMP_LABEL
 config HAVE_RCU_TABLE_FREE
 	bool

+config HAVE_RCU_TABLE_INVALIDATE
+	bool
+
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
 	bool

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 6b8065d718bd..1aa4dd3b5687 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -179,6 +179,7 @@ config X86
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_RCU_TABLE_FREE
+	select HAVE_RCU_TABLE_INVALIDATE	if HAVE_RCU_TABLE_FREE
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if X86_64 && UNWINDER_FRAME_POINTER && STACK_VALIDATION
 	select HAVE_STACKPROTECTOR		if CC_HAS_SANE_STACKPROTECTOR
diff --git a/mm/memory.c b/mm/memory.c
index 0e356dd923c2..42ce01fd9793 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -330,6 +330,21 @@ bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, int page_
  * See the comment near struct mmu_table_batch.
  */

+/*
+ * If we want tlb_remove_table() to imply TLB invalidates.
+ */
+static inline void tlb_table_invalidate(struct mmu_gather *tlb)
+{
+#ifdef CONFIG_HAVE_RCU_TABLE_INVALIDATE
+	/*
+	 * Invalidate page-table caches used by hardware walkers. Then we still
+	 * need to RCU-sched wait while freeing the pages because software
+	 * walkers can still be in-flight.
+	 */
+	tlb_flush_mmu_tlbonly(tlb);
+#endif
+}
+
 static void tlb_remove_table_smp_sync(void *arg)
 {
 	/* Simply deliver the interrupt */
@@ -366,6 +381,7 @@ void tlb_table_flush(struct mmu_gather *tlb)
 	struct mmu_table_batch **batch = &tlb->batch;

 	if (*batch) {
+		tlb_table_invalidate(tlb);
 		call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
 		*batch = NULL;
 	}
@@ -387,11 +403,13 @@ void tlb_remove_table(struct mmu_gather *tlb, void *table)
 	if (*batch == NULL) {
 		*batch = (struct mmu_table_batch *)__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
 		if (*batch == NULL) {
+			tlb_table_invalidate(tlb);
 			tlb_remove_table_one(table);
 			return;
 		}
 		(*batch)->nr = 0;
 	}
+
 	(*batch)->tables[(*batch)->nr++] = table;
 	if ((*batch)->nr == MAX_TABLE_BATCH)
 		tlb_table_flush(tlb);
From: Johannes Berg <johannes.berg@intel.com>
[ Upstream commit d6e89786bed977f37f55ffca11e563f6d2b1e3b5 ]
In cancel_work_sync(), we can only have one of two cases, even with an ordered workqueue:
* the work isn't running, just cancelled before it started
* the work is running, but then nothing else can be on the workqueue before it
Thus, we need to skip the lockdep workqueue dependency handling, otherwise we get false positive reports from lockdep saying that we have a potential deadlock when the workqueue also has other work items with locking, e.g.
  work1_function() { mutex_lock(&mutex); ... }
  work2_function() { /* nothing */ }

  other_function() {
    queue_work(ordered_wq, &work1);
    queue_work(ordered_wq, &work2);
    mutex_lock(&mutex);
    cancel_work_sync(&work2);
  }
As described above, this isn't a problem, but lockdep will currently flag it as if cancel_work_sync() was flush_work(), which *is* a problem.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 kernel/workqueue.c | 37 ++++++++++++++++++++++---------------
 1 file changed, 22 insertions(+), 15 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 78b192071ef7..a6c2b823f348 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2843,7 +2843,8 @@ void drain_workqueue(struct workqueue_struct *wq)
 }
 EXPORT_SYMBOL_GPL(drain_workqueue);

-static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr)
+static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
+			     bool from_cancel)
 {
 	struct worker *worker = NULL;
 	struct worker_pool *pool;
@@ -2885,7 +2886,8 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr)
 	 * workqueues the deadlock happens when the rescuer stalls, blocking
 	 * forward progress.
	 */
-	if (pwq->wq->saved_max_active == 1 || pwq->wq->rescuer) {
+	if (!from_cancel &&
+	    (pwq->wq->saved_max_active == 1 || pwq->wq->rescuer)) {
 		lock_map_acquire(&pwq->wq->lockdep_map);
 		lock_map_release(&pwq->wq->lockdep_map);
 	}
@@ -2896,6 +2898,22 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr)
 	return false;
 }

+static bool __flush_work(struct work_struct *work, bool from_cancel)
+{
+	struct wq_barrier barr;
+
+	if (WARN_ON(!wq_online))
+		return false;
+
+	if (start_flush_work(work, &barr, from_cancel)) {
+		wait_for_completion(&barr.done);
+		destroy_work_on_stack(&barr.work);
+		return true;
+	} else {
+		return false;
+	}
+}
+
 /**
  * flush_work - wait for a work to finish executing the last queueing instance
  * @work: the work to flush
@@ -2909,18 +2927,7 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr)
  */
 bool flush_work(struct work_struct *work)
 {
-	struct wq_barrier barr;
-
-	if (WARN_ON(!wq_online))
-		return false;
-
-	if (start_flush_work(work, &barr)) {
-		wait_for_completion(&barr.done);
-		destroy_work_on_stack(&barr.work);
-		return true;
-	} else {
-		return false;
-	}
+	return __flush_work(work, false);
 }
 EXPORT_SYMBOL_GPL(flush_work);

@@ -2986,7 +2993,7 @@ static bool __cancel_work_timer(struct work_struct *work, bool is_dwork)
 	 * isn't executing.
	 */
 	if (wq_online)
-		flush_work(work);
+		__flush_work(work, true);

 	clear_work_data(work);
From: Johannes Berg <johannes.berg@intel.com>
[ Upstream commit 87915adc3f0acdf03c776df42e308e5a155c19af ]
In flush_work(), we need to create a lockdep dependency so that the following scenario is appropriately tagged as a problem:
  work_function() { mutex_lock(&mutex); ... }

  other_function() {
    mutex_lock(&mutex);
    flush_work(&work); // or cancel_work_sync(&work);
  }
This is a problem since the work might be running and be blocked on trying to acquire the mutex.
Similarly, in flush_workqueue().
These were removed after cross-release partially caught these problems, but now cross-release was reverted anyway. IMHO the removal was erroneous anyway though, since lockdep should be able to catch potential problems, not just actual ones, and cross-release would only have caught the problem when actually invoking wait_for_completion().
Fixes: fd1a5b04dfb8 ("workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 kernel/workqueue.c | 8 ++++++++
 1 file changed, 8 insertions(+)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a6c2b823f348..60e80198c3df 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2652,6 +2652,9 @@ void flush_workqueue(struct workqueue_struct *wq)
 	if (WARN_ON(!wq_online))
 		return;

+	lock_map_acquire(&wq->lockdep_map);
+	lock_map_release(&wq->lockdep_map);
+
 	mutex_lock(&wq->mutex);

 	/*
@@ -2905,6 +2908,11 @@ static bool __flush_work(struct work_struct *work, bool from_cancel)
 	if (WARN_ON(!wq_online))
 		return false;

+	if (!from_cancel) {
+		lock_map_acquire(&work->lockdep_map);
+		lock_map_release(&work->lockdep_map);
+	}
+
 	if (start_flush_work(work, &barr, from_cancel)) {
 		wait_for_completion(&barr.done);
 		destroy_work_on_stack(&barr.work);
From: Randy Dunlap <rdunlap@infradead.org>
[ Upstream commit 1f3aa9002dc6a0d59a4b599b4fc8f01cf43ef014 ]
Fix missing error check for memory allocation functions in scripts/mod/modpost.c.
Fixes kernel bugzilla #200319: https://bugzilla.kernel.org/show_bug.cgi?id=200319
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Yuexing Wang <wangyxlandq@gmail.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 scripts/mod/modpost.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 1663fb19343a..b95cf57782a3 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -672,7 +672,7 @@ static void handle_modversions(struct module *mod, struct elf_info *info,
 		if (ELF_ST_TYPE(sym->st_info) == STT_SPARC_REGISTER)
 			break;
 		if (symname[0] == '.') {
-			char *munged = strdup(symname);
+			char *munged = NOFAIL(strdup(symname));
 			munged[0] = '_';
 			munged[1] = toupper(munged[1]);
 			symname = munged;
@@ -1318,7 +1318,7 @@ static Elf_Sym *find_elf_symbol2(struct elf_info *elf, Elf_Addr addr,
 static char *sec2annotation(const char *s)
 {
 	if (match(s, init_exit_sections)) {
-		char *p = malloc(20);
+		char *p = NOFAIL(malloc(20));
 		char *r = p;

 		*p++ = '_';
@@ -1338,7 +1338,7 @@ static char *sec2annotation(const char *s)
 		strcat(p, " ");
 		return r;
 	} else {
-		return strdup("");
+		return NOFAIL(strdup(""));
 	}
 }

@@ -2036,7 +2036,7 @@ void buf_write(struct buffer *buf, const char *s, int len)
 {
 	if (buf->size - buf->pos < len) {
 		buf->size += len + SZ;
-		buf->p = realloc(buf->p, buf->size);
+		buf->p = NOFAIL(realloc(buf->p, buf->size));
 	}
 	strncpy(buf->p + buf->pos, s, len);
 	buf->pos += len;
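modpost's NOFAIL() wraps an allocation result and aborts with a message instead of letting a NULL propagate into a later dereference. A hedged sketch of the idea (the real macro and its helper live in the modpost sources; this is a standalone approximation):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  /* Fail loudly on allocation failure rather than returning NULL. */
  static void *do_nofail(void *ptr, const char *expr)
  {
          if (!ptr) {
                  fprintf(stderr, "modpost: memory allocation failure: %s\n", expr);
                  exit(1);
          }
          return ptr;
  }
  #define NOFAIL(ptr) do_nofail((ptr), #ptr)

  int main(void)
  {
          char *s = NOFAIL(strdup("hello")); /* never NULL past this point */

          puts(s);
          free(s);
          return 0;
  }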
From: Dan Carpenter <dan.carpenter@oracle.com>
[ Upstream commit 0a6b29230ec336189bab32498df3f06c8a6944d8 ]
We should return error pointers in this function. Returning NULL results in a NULL dereference in the caller.
Fixes: 73688d1ed0b8 ("apparmor: refactor prepare_ns() and make usable from different views")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 security/apparmor/policy_ns.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/security/apparmor/policy_ns.c b/security/apparmor/policy_ns.c
index b0f9dc3f765a..1a7cec5d9cac 100644
--- a/security/apparmor/policy_ns.c
+++ b/security/apparmor/policy_ns.c
@@ -255,7 +255,7 @@ static struct aa_ns *__aa_create_ns(struct aa_ns *parent, const char *name,

 	ns = alloc_ns(parent->base.hname, name);
 	if (!ns)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 	ns->level = parent->level + 1;
 	mutex_lock_nested(&ns->lock, ns->level);
 	error = __aafs_ns_mkdir(ns, ns_subns_dir(parent), name, dir);
From: Suzuki K Poulose <suzuki.poulose@arm.com>
[ Upstream commit 69599206ea9a3f8f2e94d46580579cbf9d08ad6c ]
Legacy PCI over virtio uses a 32bit PFN for the queue. If the queue pfn is too large to fit in 32bits, which we could hit on arm64 systems with 52bit physical addresses (even with 64K page size), we simply miss out a proper link to the other side of the queue.
Add a check to validate the PFN, rather than silently breaking the devices.
Cc: "Michael S. Tsirkin" mst@redhat.com Cc: Jason Wang jasowang@redhat.com Cc: Marc Zyngier marc.zyngier@arm.com Cc: Christoffer Dall cdall@kernel.org Cc: Peter Maydel peter.maydell@linaro.org Cc: Jean-Philippe Brucker jean-philippe.brucker@arm.com Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- drivers/virtio/virtio_pci_legacy.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c
index 2780886e8ba3..de062fb201bc 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -122,6 +122,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
 	struct virtqueue *vq;
 	u16 num;
 	int err;
+	u64 q_pfn;

 	/* Select the queue we're interested in */
 	iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
@@ -141,9 +142,17 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
 	if (!vq)
 		return ERR_PTR(-ENOMEM);

+	q_pfn = virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT;
+	if (q_pfn >> 32) {
+		dev_err(&vp_dev->pci_dev->dev,
+			"platform bug: legacy virtio-mmio must not be used with RAM above 0x%llxGB\n",
+			0x1ULL << (32 + PAGE_SHIFT - 30));
+		err = -E2BIG;
+		goto out_del_vq;
+	}
+
 	/* activate the queue */
-	iowrite32(virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
-		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
+	iowrite32(q_pfn, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);

 	vq->priv = (void __force *)vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY;

@@ -160,6 +169,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,

 out_deactivate:
 	iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
+out_del_vq:
 	vring_del_virtqueue(vq);
 	return ERR_PTR(err);
 }
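A standalone illustration of the overflow check: the legacy VIRTIO_PCI_QUEUE_PFN register is 32 bits wide, so a PFN with any bit set above bit 31 would be silently truncated by iowrite32(). VIRTIO_PCI_QUEUE_ADDR_SHIFT is 12 in the legacy layout:

  #include <stdint.h>
  #include <stdio.h>

  #define VIRTIO_PCI_QUEUE_ADDR_SHIFT 12 /* legacy virtio PCI layout */

  /* Returns 1 when the descriptor address yields a PFN that fits the
   * 32-bit register, 0 when iowrite32() would truncate it. */
  static int queue_pfn_ok(uint64_t desc_addr)
  {
          uint64_t q_pfn = desc_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT;

          return (q_pfn >> 32) == 0;
  }

  int main(void)
  {
          printf("%d\n", queue_pfn_ok(1ULL << 50)); /* 0: would be truncated */
          printf("%d\n", queue_pfn_ok(1ULL << 30)); /* 1: fits */
          return 0;
  }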
From: Arnd Bergmann <arnd@arndb.de>
[ Upstream commit 704ae091b061082b37a9968621af4c290c641d50 ]
Without linux/irq.h, there is no declaration of notifier_block, leading to a build warning:
In file included from arch/x86/kernel/cpu/mcheck/threshold.c:10:
arch/x86/include/asm/mce.h:151:46: error: 'struct notifier_block' declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
It's sufficient to declare the struct tag here, which avoids pulling in more header files.
Fixes: 447ae3166702 ("x86: Don't include linux/irq.h from asm/hardirq.h")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Nicolai Stange <nstange@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20180817100156.3009043-1-arnd@arndb.de
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 arch/x86/include/asm/mce.h | 1 +
 1 file changed, 1 insertion(+)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 8c7b3e5a2d01..3a17107594c8 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -148,6 +148,7 @@ enum mce_notifier_prios {
 	MCE_PRIO_LOWEST		= 0,
 };

+struct notifier_block;
 extern void mce_register_decode_chain(struct notifier_block *nb);
 extern void mce_unregister_decode_chain(struct notifier_block *nb);
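The underlying C rule: a struct tag first seen inside a prototype's parameter list has scope limited to that prototype, which is exactly what the compiler warns about. Declaring the tag once at file scope makes every later pointer use refer to the same (still incomplete) type; no full definition is required for pointers:

  /* Standalone illustration of the idea, not the kernel header itself. */
  struct notifier_block;                            /* incomplete type, file scope */

  void register_chain(struct notifier_block *nb);   /* refers to the tag above */
  void unregister_chain(struct notifier_block *nb); /* same type, no warning */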
From: Hans de Goede <hdegoede@redhat.com>
[ Upstream commit 0a30446c0dca3483c384b54a431cc951e15f7e79 ]
Currently acpi_gsb_i2c_read_bytes() directly returns i2c_transfer's return value. i2c_transfer returns a value < 0 on error and 2 (for 2 successfully executed transfers) on success. But the ACPI code expects 0 on success, so currently acpi_gsb_i2c_read_bytes()'s caller does:
  if (status > 0)
          status = 0;
This commit makes acpi_gsb_i2c_read_bytes() return a value which can be directly consumed by the ACPI code, mirroring acpi_gsb_i2c_write_bytes(). This commit also makes acpi_gsb_i2c_read_bytes() explicitly check that i2c_transfer() returns 2, rather than accepting any value > 0.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 drivers/i2c/i2c-core-acpi.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/i2c/i2c-core-acpi.c b/drivers/i2c/i2c-core-acpi.c
index b8f303dea305..32affd3fa8bd 100644
--- a/drivers/i2c/i2c-core-acpi.c
+++ b/drivers/i2c/i2c-core-acpi.c
@@ -453,8 +453,12 @@ static int acpi_gsb_i2c_read_bytes(struct i2c_client *client,
 		else
 			dev_err(&client->adapter->dev, "i2c read %d bytes from client@%#x starting at reg %#x failed, error: %d\n",
 				data_len, client->addr, cmd, ret);
-	} else {
+	/* 2 transfers must have completed successfully */
+	} else if (ret == 2) {
 		memcpy(data, buffer, data_len);
+		ret = 0;
+	} else {
+		ret = -EIO;
 	}

 	kfree(buffer);
@@ -595,8 +599,6 @@ i2c_acpi_space_handler(u32 function, acpi_physical_address command,
 	if (action == ACPI_READ) {
 		status = acpi_gsb_i2c_read_bytes(client, command, gsb->data,
 						 info->access_length);
-		if (status > 0)
-			status = 0;
 	} else {
 		status = acpi_gsb_i2c_write_bytes(client, command, gsb->data,
 						  info->access_length);
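The relevant API contract: i2c_transfer() returns the number of messages transferred on success, or a negative errno on failure, so a read built from two messages (register write followed by data read) has succeeded only when it returns exactly 2. A hedged sketch of the normalization with hypothetical surrounding names:

  #include <linux/i2c.h>

  /* Hedged sketch: map i2c_transfer()'s "number of messages" result onto
   * the 0-or-negative-errno convention the ACPI code expects. */
  static int two_msg_xfer_example(struct i2c_adapter *adap, struct i2c_msg *msgs)
  {
          int ret = i2c_transfer(adap, msgs, 2);

          if (ret < 0)
                  return ret;             /* bus error: pass errno through */
          return ret == 2 ? 0 : -EIO;     /* partial transfer is an error too */
  }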
From: "Michael J. Ruhl" michael.j.ruhl@intel.com
[ Upstream commit c513de490f808d8480346f9a58e6a4a5f3de12e7 ]
If the system BIOS does not supply NUMA node information to the PCI devices, the NUMA node is selected by choosing the current node.
This can lead to the following crash:
divide error: 0000 SMP
CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G IOE ------------ 3.10.0-693.21.1.el7.x86_64 #1
Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.01.01.0005.101720141054 10/17/2014
Workqueue: events work_for_cpu_fn
task: ffff880174480fd0 ti: ffff880174488000 task.ti: ffff880174488000
RIP: 0010: [<ffffffffc020ac69>] hfi1_dev_affinity_init+0x129/0x6a0 [hfi1]
RSP: 0018:ffff88017448bbf8 EFLAGS: 00010246
RAX: 0000000000000011 RBX: ffff88107ffba6c0 RCX: ffff88085c22e130
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880824ad0000
RBP: ffff88017448bc48 R08: 0000000000000011 R09: 0000000000000002
R10: ffff8808582b6ca0 R11: 0000000000003151 R12: ffff8808582b6ca0
R13: ffff8808582b6518 R14: ffff8808582b6010 R15: 0000000000000012
FS: 0000000000000000(0000) GS:ffff88085ec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007efc707404f0 CR3: 0000000001a02000 CR4: 00000000001607f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
 hfi1_init_dd+0x14b3/0x27a0 [hfi1]
 ? pcie_capability_write_word+0x46/0x70
 ? hfi1_pcie_init+0xc0/0x200 [hfi1]
 do_init_one+0x153/0x4c0 [hfi1]
 ? sched_clock_cpu+0x85/0xc0
 init_one+0x1b5/0x260 [hfi1]
 local_pci_probe+0x4a/0xb0
 work_for_cpu_fn+0x1a/0x30
 process_one_work+0x17f/0x440
 worker_thread+0x278/0x3c0
 ? manage_workers.isra.24+0x2a0/0x2a0
 kthread+0xd1/0xe0
 ? insert_kthread_work+0x40/0x40
 ret_from_fork+0x77/0xb0
 ? insert_kthread_work+0x40/0x40
If the BIOS is not supplying NUMA information:
- set the default table count to 1 for all possible nodes
- select node 0 (instead of the current NUMA node) to get consistent performance
- generate an error indicating that the BIOS should be upgraded
Reviewed-by: Gary Leshner <gary.s.leshner@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 drivers/infiniband/hw/hfi1/affinity.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/affinity.c b/drivers/infiniband/hw/hfi1/affinity.c
index fbe7198a715a..bedd5fba33b0 100644
--- a/drivers/infiniband/hw/hfi1/affinity.c
+++ b/drivers/infiniband/hw/hfi1/affinity.c
@@ -198,7 +198,7 @@ int node_affinity_init(void)
 		while ((dev = pci_get_device(ids->vendor, ids->device, dev))) {
 			node = pcibus_to_node(dev->bus);
 			if (node < 0)
-				node = numa_node_id();
+				goto out;

 			hfi1_per_node_cntr[node]++;
 		}
@@ -206,6 +206,18 @@ int node_affinity_init(void)
 	}

 	return 0;
+
+out:
+	/*
+	 * Invalid PCI NUMA node information found, note it, and populate
+	 * our database 1:1.
+	 */
+	pr_err("HFI: Invalid PCI NUMA node. Performance may be affected\n");
+	pr_err("HFI: System BIOS may need to be upgraded\n");
+	for (node = 0; node < node_affinity.num_possible_nodes; node++)
+		hfi1_per_node_cntr[node] = 1;
+
+	return 0;
 }

 static void node_affinity_destroy(struct hfi1_affinity_node *entry)
@@ -622,8 +634,14 @@ int hfi1_dev_affinity_init(struct hfi1_devdata *dd)
 	int curr_cpu, possible, i, ret;
 	bool new_entry = false;

-	if (node < 0)
-		node = numa_node_id();
+	/*
+	 * If the BIOS does not have the NUMA node information set, select
+	 * NUMA 0 so we get consistent performance.
+	 */
+	if (node < 0) {
+		dd_dev_err(dd, "Invalid PCI NUMA node. Performance may be affected\n");
+		node = 0;
+	}
 	dd->node = node;

 	local_mask = cpumask_of_node(dd->node);
From: Jerome Brunet <jbrunet@baylibre.com>
[ Upstream commit b96e9eb62841c519ba1db32d036628be3cdef91f ]
Current clock name looks like this: /soc/bus@ffd00000/pwm@1b000#mux0
This is bad because CCF uses the clock name to create a directory in clk debugfs. With such a name, the directory creation (silently) fails and the debugfs entry ends up being created at the debugfs root.
With this change, the clock name will now be: ffd1b000.pwm#mux0
This matches the clock naming scheme used in the ethernet and mmc driver. It also fixes the problem with debugfs.
Fixes: 36af66a79056 ("pwm: Convert to using %pOF instead of full_name")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Acked-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 drivers/pwm/pwm-meson.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/pwm/pwm-meson.c b/drivers/pwm/pwm-meson.c
index 822860b4801a..c1ed641b3e26 100644
--- a/drivers/pwm/pwm-meson.c
+++ b/drivers/pwm/pwm-meson.c
@@ -458,7 +458,6 @@ static int meson_pwm_init_channels(struct meson_pwm *meson,
 				   struct meson_pwm_channel *channels)
 {
 	struct device *dev = meson->chip.dev;
-	struct device_node *np = dev->of_node;
 	struct clk_init_data init;
 	unsigned int i;
 	char name[255];
@@ -467,7 +466,7 @@ static int meson_pwm_init_channels(struct meson_pwm *meson,
 	for (i = 0; i < meson->chip.npwm; i++) {
 		struct meson_pwm_channel *channel = &channels[i];

-		snprintf(name, sizeof(name), "%pOF#mux%u", np, i);
+		snprintf(name, sizeof(name), "%s#mux%u", dev_name(dev), i);

 		init.name = name;
 		init.ops = &clk_mux_ops;
From: Jiri Olsa <jolsa@kernel.org>
[ Upstream commit 721f0dfc3ce821c6a32820ab63edfb48ed4af075 ]
Jaroslav reported errors from valgrind over perf python script:
  # echo 0 > /sys/devices/system/cpu/cpu4/online
  # valgrind ./test.py
  ==7524== Memcheck, a memory error detector
  ...
  ==7524== Command: ./test.py
  ==7524==
  pid 7526 exited
  ==7524== Invalid read of size 8
  ==7524==    at 0xCC2C2B3: perf_mmap__read_forward (evlist.c:780)
  ==7524==    by 0xCC2A681: pyrf_evlist__read_on_cpu (python.c:959)
  ...
  ==7524== Address 0x65c4868 is 16 bytes after a block of size 459,36..
  ==7524==    at 0x4C2B955: calloc (vg_replace_malloc.c:711)
  ==7524==    by 0xCC2F484: zalloc (util.h:35)
  ==7524==    by 0xCC2F484: perf_evlist__alloc_mmap (evlist.c:978)
  ...
The reason for this is in the python interface, which allows a script to pass an arbitrary CPU number that is then used to index the struct perf_evlist::mmap array. That's obviously wrong and works only if all CPUs are available; it fails if some CPU is missing, like in the example above.
This patch makes pyrf_evlist__read_on_cpu() search the evlist's maps array for the proper map to access.
It's a linear search at the moment. Based on the way read_on_cpu() is used, I don't think we need to be fast here. But we could add some hashing later to make it faster.
We don't allow python interface to set write_backward event attribute, so it's safe to check only evlist's mmaps.
Reported-by: Jaroslav Škarvada <jskarvad@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180817114556.28000-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 tools/perf/util/python.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
index 863b61478edd..eefc56b4b0df 100644
--- a/tools/perf/util/python.c
+++ b/tools/perf/util/python.c
@@ -11,6 +11,7 @@
 #include "cpumap.h"
 #include "print_binary.h"
 #include "thread_map.h"
+#include "mmap.h"

 #if PY_MAJOR_VERSION < 3
 #define _PyUnicode_FromString(arg) \
@@ -976,6 +977,20 @@ static PyObject *pyrf_evlist__add(struct pyrf_evlist *pevlist,
 	return Py_BuildValue("i", evlist->nr_entries);
 }

+static struct perf_mmap *get_md(struct perf_evlist *evlist, int cpu)
+{
+	int i;
+
+	for (i = 0; i < evlist->nr_mmaps; i++) {
+		struct perf_mmap *md = &evlist->mmap[i];
+
+		if (md->cpu == cpu)
+			return md;
+	}
+
+	return NULL;
+}
+
 static PyObject *pyrf_evlist__read_on_cpu(struct pyrf_evlist *pevlist,
 					  PyObject *args, PyObject *kwargs)
 {
@@ -990,7 +1005,10 @@ static PyObject *pyrf_evlist__read_on_cpu(struct pyrf_evlist *pevlist,
 			  &cpu, &sample_id_all))
 		return NULL;

-	md = &evlist->mmap[cpu];
+	md = get_md(evlist, cpu);
+	if (!md)
+		return NULL;
+
 	if (perf_mmap__read_init(md) < 0)
 		goto end;
From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
[ Upstream commit 2ea62630681027c455117aa471ea3ab8bb099ead ]
On a shared LPAR, Phyp will not update the CPU associativity at boot time. Just after boot, the system recognizes itself as a shared LPAR and triggers a request for the correct CPU associativity. But by then the scheduler has already created/destroyed its sched domains.
This causes
- Broken load balance across Nodes causing islands of cores.
- Performance degradation esp if the system is lightly loaded.
- dmesg to wrongly report all CPUs to be in Node 0.
- Messages in dmesg saying borken topology.
- With commit 051f3ca02e46 ("sched/topology: Introduce NUMA identity node sched domain"), can cause rcu stalls at boot up.
The sched_domains_numa_masks table which is used to generate cpumasks is only created at boot time just before creating sched domains and never updated. Hence, it's better to get the topology correct before the sched domains are created.
For example on 64 core Power 8 shared LPAR, dmesg reports
Brought up 512 CPUs
Node 0 CPUs: 0-511
Node 1 CPUs:
Node 2 CPUs:
Node 3 CPUs:
Node 4 CPUs:
Node 5 CPUs:
Node 6 CPUs:
Node 7 CPUs:
Node 8 CPUs:
Node 9 CPUs:
Node 10 CPUs:
Node 11 CPUs:
...
BUG: arch topology borken
     the DIE domain not a subset of the NUMA domain
BUG: arch topology borken
     the DIE domain not a subset of the NUMA domain
numactl/lscpu output will still be correct with cores spreading across all nodes:
Socket(s):             64
NUMA node(s):          12
Model:                 2.0 (pvr 004d 0200)
Model name:            POWER8 (architected), altivec supported
Hypervisor vendor:     pHyp
Virtualization type:   para
L1d cache:             64K
L1i cache:             32K
NUMA node0 CPU(s):     0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
NUMA node1 CPU(s):     8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
NUMA node2 CPU(s):     16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487
NUMA node3 CPU(s):     24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495
NUMA node4 CPU(s):     208-215,304-311,400-407,496-503
NUMA node5 CPU(s):     168-175,264-271,360-367,456-463
NUMA node6 CPU(s):     128-135,224-231,320-327,416-423
NUMA node7 CPU(s):     136-143,232-239,328-335,424-431
NUMA node8 CPU(s):     216-223,312-319,408-415,504-511
NUMA node9 CPU(s):     144-151,240-247,336-343,432-439
NUMA node10 CPU(s):    152-159,248-255,344-351,440-447
NUMA node11 CPU(s):    160-167,256-263,352-359,448-455
Currently on this LPAR, the scheduler detects two levels of NUMA and creates NUMA sched domains for all CPUs, but it finds a single DIE domain consisting of all CPUs. Hence it deletes all NUMA sched domains.
To address this, detect the shared processor and update topology soon after CPUs are setup so that correct topology is updated just before scheduler creates sched domain.
With the fix, dmesg reports:
numa: Node 0 CPUs: 0-7 32-39 64-71 96-103 176-183 272-279 368-375 464-471
numa: Node 1 CPUs: 8-15 40-47 72-79 104-111 184-191 280-287 376-383 472-479
numa: Node 2 CPUs: 16-23 48-55 80-87 112-119 192-199 288-295 384-391 480-487
numa: Node 3 CPUs: 24-31 56-63 88-95 120-127 200-207 296-303 392-399 488-495
numa: Node 4 CPUs: 208-215 304-311 400-407 496-503
numa: Node 5 CPUs: 168-175 264-271 360-367 456-463
numa: Node 6 CPUs: 128-135 224-231 320-327 416-423
numa: Node 7 CPUs: 136-143 232-239 328-335 424-431
numa: Node 8 CPUs: 216-223 312-319 408-415 504-511
numa: Node 9 CPUs: 144-151 240-247 336-343 432-439
numa: Node 10 CPUs: 152-159 248-255 344-351 440-447
numa: Node 11 CPUs: 160-167 256-263 352-359 448-455
and lscpu also reports:
Socket(s):             64
NUMA node(s):          12
Model:                 2.0 (pvr 004d 0200)
Model name:            POWER8 (architected), altivec supported
Hypervisor vendor:     pHyp
Virtualization type:   para
L1d cache:             64K
L1i cache:             32K
NUMA node0 CPU(s):     0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
NUMA node1 CPU(s):     8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
NUMA node2 CPU(s):     16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487
NUMA node3 CPU(s):     24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495
NUMA node4 CPU(s):     208-215,304-311,400-407,496-503
NUMA node5 CPU(s):     168-175,264-271,360-367,456-463
NUMA node6 CPU(s):     128-135,224-231,320-327,416-423
NUMA node7 CPU(s):     136-143,232-239,328-335,424-431
NUMA node8 CPU(s):     216-223,312-319,408-415,504-511
NUMA node9 CPU(s):     144-151,240-247,336-343,432-439
NUMA node10 CPU(s):    152-159,248-255,344-351,440-447
NUMA node11 CPU(s):    160-167,256-263,352-359,448-455
Reported-by: Manjunatha H R manjuhr1@in.ibm.com Signed-off-by: Srikar Dronamraju srikar@linux.vnet.ibm.com [mpe: Trim / format change log] Tested-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- arch/powerpc/include/asm/topology.h | 5 +++++ arch/powerpc/kernel/smp.c | 5 +++++ arch/powerpc/mm/numa.c | 20 ++++++++++---------- 3 files changed, 20 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h index 16b077801a5f..a4a718dbfec6 100644 --- a/arch/powerpc/include/asm/topology.h +++ b/arch/powerpc/include/asm/topology.h @@ -92,6 +92,7 @@ extern int stop_topology_update(void); extern int prrn_is_enabled(void); extern int find_and_online_cpu_nid(int cpu); extern int timed_topology_update(int nsecs); +extern void __init shared_proc_topology_init(void); #else static inline int start_topology_update(void) { @@ -113,6 +114,10 @@ static inline int timed_topology_update(int nsecs) { return 0; } + +#ifdef CONFIG_SMP +static inline void shared_proc_topology_init(void) {} +#endif #endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR */
#include <asm-generic/topology.h> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c index 4794d6b4f4d2..b3142c7b9c31 100644 --- a/arch/powerpc/kernel/smp.c +++ b/arch/powerpc/kernel/smp.c @@ -1156,6 +1156,11 @@ void __init smp_cpus_done(unsigned int max_cpus) if (smp_ops && smp_ops->bringup_done) smp_ops->bringup_done();
+ /* + * On a shared LPAR, associativity needs to be requested. + * Hence, get numa topology before dumping cpu topology + */ + shared_proc_topology_init(); dump_numa_cpu_topology();
/* diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 0c7e05d89244..35ac5422903a 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -1078,7 +1078,6 @@ static int prrn_enabled; static void reset_topology_timer(void); static int topology_timer_secs = 1; static int topology_inited; -static int topology_update_needed;
/* * Change polling interval for associativity changes. @@ -1306,11 +1305,8 @@ int numa_update_cpu_topology(bool cpus_locked) struct device *dev; int weight, new_nid, i = 0;
- if (!prrn_enabled && !vphn_enabled) { - if (!topology_inited) - topology_update_needed = 1; + if (!prrn_enabled && !vphn_enabled && topology_inited) return 0; - }
weight = cpumask_weight(&cpu_associativity_changes_mask); if (!weight) @@ -1423,7 +1419,6 @@ int numa_update_cpu_topology(bool cpus_locked)
out: kfree(updates); - topology_update_needed = 0; return changed; }
@@ -1551,6 +1546,15 @@ int prrn_is_enabled(void) return prrn_enabled; }
+void __init shared_proc_topology_init(void) +{ + if (lppaca_shared_proc(get_lppaca())) { + bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask), + nr_cpumask_bits); + numa_update_cpu_topology(false); + } +} + static int topology_read(struct seq_file *file, void *v) { if (vphn_enabled || prrn_enabled) @@ -1608,10 +1612,6 @@ static int topology_update_init(void) return -ENOMEM;
topology_inited = 1; - if (topology_update_needed) - bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask), - nr_cpumask_bits); - return 0; } device_initcall(topology_update_init);
From: Kirill Tkhai ktkhai@virtuozzo.com
[ Upstream commit 44bd4a4759d5a714767aa6be7e806ab54b7fa3a8 ]
This is just refactoring to allow the next patches to have memcg pointer in list_lru_from_kmem().
Link: http://lkml.kernel.org/r/153063060664.1818.9541345386733498582.stgit@localho... Signed-off-by: Kirill Tkhai ktkhai@virtuozzo.com Acked-by: Vladimir Davydov vdavydov.dev@gmail.com Tested-by: Shakeel Butt shakeelb@google.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Guenter Roeck linux@roeck-us.net Cc: "Huang, Ying" ying.huang@intel.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Josef Bacik jbacik@fb.com Cc: Li RongQing lirongqing@baidu.com Cc: Matthew Wilcox willy@infradead.org Cc: Matthias Kaehlcke mka@chromium.org Cc: Mel Gorman mgorman@techsingularity.net Cc: Michal Hocko mhocko@kernel.org Cc: Minchan Kim minchan@kernel.org Cc: Philippe Ombredanne pombredanne@nexb.com Cc: Roman Gushchin guro@fb.com Cc: Sahitya Tummala stummala@codeaurora.org Cc: Stephen Rothwell sfr@canb.auug.org.au Cc: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Cc: Thomas Gleixner tglx@linutronix.de Cc: Waiman Long longman@redhat.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- mm/list_lru.c | 25 +++++++++++++++++-------- 1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/mm/list_lru.c b/mm/list_lru.c index fcfb6c89ed47..426ec49f9325 100644 --- a/mm/list_lru.c +++ b/mm/list_lru.c @@ -75,18 +75,24 @@ static __always_inline struct mem_cgroup *mem_cgroup_from_kmem(void *ptr) }
static inline struct list_lru_one * -list_lru_from_kmem(struct list_lru_node *nlru, void *ptr) +list_lru_from_kmem(struct list_lru_node *nlru, void *ptr, + struct mem_cgroup **memcg_ptr) { - struct mem_cgroup *memcg; + struct list_lru_one *l = &nlru->lru; + struct mem_cgroup *memcg = NULL;
if (!nlru->memcg_lrus) - return &nlru->lru; + goto out;
memcg = mem_cgroup_from_kmem(ptr); if (!memcg) - return &nlru->lru; + goto out;
- return list_lru_from_memcg_idx(nlru, memcg_cache_id(memcg)); + l = list_lru_from_memcg_idx(nlru, memcg_cache_id(memcg)); +out: + if (memcg_ptr) + *memcg_ptr = memcg; + return l; } #else static inline bool list_lru_memcg_aware(struct list_lru *lru) @@ -101,8 +107,11 @@ list_lru_from_memcg_idx(struct list_lru_node *nlru, int idx) }
static inline struct list_lru_one * -list_lru_from_kmem(struct list_lru_node *nlru, void *ptr) +list_lru_from_kmem(struct list_lru_node *nlru, void *ptr, + struct mem_cgroup **memcg_ptr) { + if (memcg_ptr) + *memcg_ptr = NULL; return &nlru->lru; } #endif /* CONFIG_MEMCG && !CONFIG_SLOB */ @@ -115,7 +124,7 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item)
spin_lock(&nlru->lock); if (list_empty(item)) { - l = list_lru_from_kmem(nlru, item); + l = list_lru_from_kmem(nlru, item, NULL); list_add_tail(item, &l->list); l->nr_items++; nlru->nr_items++; @@ -135,7 +144,7 @@ bool list_lru_del(struct list_lru *lru, struct list_head *item)
spin_lock(&nlru->lock); if (!list_empty(item)) { - l = list_lru_from_kmem(nlru, item); + l = list_lru_from_kmem(nlru, item, NULL); list_del_init(item); l->nr_items--; nlru->nr_items--;
On 30.08.2018 21:02, Sasha Levin wrote:
From: Kirill Tkhai ktkhai@virtuozzo.com
[ Upstream commit 44bd4a4759d5a714767aa6be7e806ab54b7fa3a8 ]
This commit is not needed for stable. And I think there are no more commits in this series that depend on it.
On Fri, Aug 31, 2018 at 12:38:35AM +0300, Kirill Tkhai wrote:
On 30.08.2018 21:02, Sasha Levin wrote:
From: Kirill Tkhai ktkhai@virtuozzo.com
[ Upstream commit 44bd4a4759d5a714767aa6be7e806ab54b7fa3a8 ]
This commit is not needed for stable. And I think there are no more commits in this series that depend on it.
What was this patch for in the first place, wasn't it needed for some other fix?
thanks,
greg k-h
On Fri, Aug 31, 2018 at 08:59:43AM -0700, Greg Kroah-Hartman wrote:
On Fri, Aug 31, 2018 at 12:38:35AM +0300, Kirill Tkhai wrote:
On 30.08.2018 21:02, Sasha Levin wrote:
From: Kirill Tkhai ktkhai@virtuozzo.com
[ Upstream commit 44bd4a4759d5a714767aa6be7e806ab54b7fa3a8 ]
This commit is not needed for stable. And I think there are no more commits in this series that depend on it.
What was this patch for in the first place, wasn't it needed for some other fix?
I don't think so, it looks like I just missed it when reviewing the commits.
On Fri, Aug 31, 2018 at 12:38:35AM +0300, Kirill Tkhai wrote:
On 30.08.2018 21:02, Sasha Levin wrote:
From: Kirill Tkhai ktkhai@virtuozzo.com
[ Upstream commit 44bd4a4759d5a714767aa6be7e806ab54b7fa3a8 ]
This commit is not needed for stable. And I think there are no more commits in this series that depend on it.
Thanks Kirill, dropped it.
From: Andrey Ryabinin aryabinin@virtuozzo.com
[ Upstream commit a718e28f538441a3b6612da9ff226973376cdf0f ]
Signed integer overflow is undefined according to the C standard. The overflow in ksys_fadvise64_64() is deliberate, but since it is signed overflow, UBSAN complains:
UBSAN: Undefined behaviour in mm/fadvise.c:76:10
signed integer overflow:
4 + 9223372036854775805 cannot be represented in type 'long long int'
Use unsigned types to do math. Unsigned overflow is defined so UBSAN will not complain about it. This patch doesn't change generated code.
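As a quick illustration, here is a standalone userspace sketch (not the kernel code itself) exercising the same pattern with the values from the UBSAN report; the casts to unsigned make the addition well-defined, and the endbyte < len test catches a full 64-bit wraparound:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* values from the UBSAN report; both play the role of loff_t */
	int64_t offset = 4;
	int64_t len = 9223372036854775805LL;
	uint64_t endbyte;

	/* a signed "offset + len" would overflow; unsigned math is defined */
	endbyte = (uint64_t)offset + (uint64_t)len;
	if (!len || endbyte < (uint64_t)len)
		endbyte = UINT64_MAX;	/* "as much as possible" */

	printf("endbyte = %llu\n", (unsigned long long)endbyte);
	return 0;
}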
[akpm@linux-foundation.org: add comment explaining the casts] Link: http://lkml.kernel.org/r/20180629184453.7614-1-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin aryabinin@virtuozzo.com Reported-by: icytxw@gmail.com Reviewed-by: Andrew Morton akpm@linux-foundation.org Cc: Alexander Potapenko glider@google.com Cc: Dmitry Vyukov dvyukov@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- mm/fadvise.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/mm/fadvise.c b/mm/fadvise.c index afa41491d324..2d8376e3c640 100644 --- a/mm/fadvise.c +++ b/mm/fadvise.c @@ -72,8 +72,12 @@ int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice) goto out; }
- /* Careful about overflows. Len == 0 means "as much as possible" */ - endbyte = offset + len; + /* + * Careful about overflows. Len == 0 means "as much as possible". Use + * unsigned math because signed overflows are undefined and UBSan + * complains. + */ + endbyte = (u64)offset + (u64)len; if (!len || endbyte < len) endbyte = -1; else
From: Mike Rapoport rppt@linux.vnet.ibm.com
[ Upstream commit d39f8fb4b7776dcb09ec3bf7a321547083078ee3 ]
The deferred memory initialization relies on section definitions, e.g. PAGES_PER_SECTION, that are only available when CONFIG_SPARSEMEM=y on most architectures.

Initially DEFERRED_STRUCT_PAGE_INIT depended on the explicit ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT configuration option, but since commit 2e3ca40f03bb13709df4 ("mm: relax deferred struct page requirements") this requirement was relaxed and it is now possible to enable DEFERRED_STRUCT_PAGE_INIT on architectures that support DISCONTIGMEM and NO_BOOTMEM, which causes build failures.
For instance, setting SMP=y and DEFERRED_STRUCT_PAGE_INIT=y on arc causes the following build failure:
  CC      mm/page_alloc.o
mm/page_alloc.c: In function 'update_defer_init':
mm/page_alloc.c:321:14: error: 'PAGES_PER_SECTION' undeclared (first use in this function); did you mean 'USEC_PER_SEC'?
     (pfn & (PAGES_PER_SECTION - 1)) == 0) {
              ^~~~~~~~~~~~~~~~~
              USEC_PER_SEC
mm/page_alloc.c:321:14: note: each undeclared identifier is reported only once for each function it appears in
In file included from include/linux/cache.h:5:0,
                 from include/linux/printk.h:9,
                 from include/linux/kernel.h:14,
                 from include/asm-generic/bug.h:18,
                 from arch/arc/include/asm/bug.h:32,
                 from include/linux/bug.h:5,
                 from include/linux/mmdebug.h:5,
                 from include/linux/mm.h:9,
                 from mm/page_alloc.c:18:
mm/page_alloc.c: In function 'deferred_grow_zone':
mm/page_alloc.c:1624:52: error: 'PAGES_PER_SECTION' undeclared (first use in this function); did you mean 'USEC_PER_SEC'?
   unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION);
                                                     ^
include/uapi/linux/kernel.h:11:47: note: in definition of macro '__ALIGN_KERNEL_MASK'
 #define __ALIGN_KERNEL_MASK(x, mask) (((x) + (mask)) & ~(mask))
                                               ^~~~
include/linux/kernel.h:58:22: note: in expansion of macro '__ALIGN_KERNEL'
 #define ALIGN(x, a)  __ALIGN_KERNEL((x), (a))
                      ^~~~~~~~~~~~~~
mm/page_alloc.c:1624:34: note: in expansion of macro 'ALIGN'
   unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION);
                                  ^~~~~
In file included from include/asm-generic/bug.h:18:0,
                 from arch/arc/include/asm/bug.h:32,
                 from include/linux/bug.h:5,
                 from include/linux/mmdebug.h:5,
                 from include/linux/mm.h:9,
                 from mm/page_alloc.c:18:
mm/page_alloc.c: In function 'free_area_init_node':
mm/page_alloc.c:6379:50: error: 'PAGES_PER_SECTION' undeclared (first use in this function); did you mean 'USEC_PER_SEC'?
   pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
                                                   ^
include/linux/kernel.h:812:22: note: in definition of macro '__typecheck'
   (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
                      ^
include/linux/kernel.h:836:24: note: in expansion of macro '__safe_cmp'
  __builtin_choose_expr(__safe_cmp(x, y), \
                        ^~~~~~~~~~
include/linux/kernel.h:904:27: note: in expansion of macro '__careful_cmp'
 #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
                           ^~~~~~~~~~~~~
mm/page_alloc.c:6379:29: note: in expansion of macro 'min_t'
   pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
                             ^~~~~
include/linux/kernel.h:836:2: error: first argument to '__builtin_choose_expr' not a constant
  __builtin_choose_expr(__safe_cmp(x, y), \
  ^
include/linux/kernel.h:904:27: note: in expansion of macro '__careful_cmp'
 #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
                           ^~~~~~~~~~~~~
mm/page_alloc.c:6379:29: note: in expansion of macro 'min_t'
   pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
                             ^~~~~
scripts/Makefile.build:317: recipe for target 'mm/page_alloc.o' failed
Let's make DEFERRED_STRUCT_PAGE_INIT explicitly depend on SPARSEMEM, as the systems that support DISCONTIGMEM do not seem to have such huge amounts of memory that would make DEFERRED_STRUCT_PAGE_INIT relevant.
Link: http://lkml.kernel.org/r/1530279308-24988-1-git-send-email-rppt@linux.vnet.i... Signed-off-by: Mike Rapoport rppt@linux.vnet.ibm.com Acked-by: Michal Hocko mhocko@suse.com Reviewed-by: Pavel Tatashin pasha.tatashin@oracle.com Tested-by: Randy Dunlap rdunlap@infradead.org Cc: Pasha Tatashin Pavel.Tatashin@microsoft.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- mm/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/Kconfig b/mm/Kconfig index ce95491abd6a..94af022b7f3d 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -635,7 +635,7 @@ config DEFERRED_STRUCT_PAGE_INIT bool "Defer initialisation of struct pages to kthreads" default n depends on NO_BOOTMEM - depends on !FLATMEM + depends on SPARSEMEM depends on !NEED_PER_CPU_KM help Ordinarily all struct pages are initialised during early boot in a
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
[ Upstream commit 6cd00a01f0c1ae6a852b09c59b8dd55cc6c35d1d ]
Since only dentry->d_name.len + 1 bytes out of DNAME_INLINE_LEN bytes are initialized at __d_alloc(), we can't copy the whole buffer unconditionally.
WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff8fa27465ac50)
636f6e66696766732e746d70000000000010000000000000020000000188ffff
 i i i i i i i i i i i i i u u u u u u u u u u i i i i i u u u u
                                                 ^
RIP: 0010:take_dentry_name_snapshot+0x28/0x50
RSP: 0018:ffffa83000f5bdf8 EFLAGS: 00010246
RAX: 0000000000000020 RBX: ffff8fa274b20550 RCX: 0000000000000002
RDX: ffffa83000f5be40 RSI: ffff8fa27465ac50 RDI: ffffa83000f5be60
RBP: ffffa83000f5bdf8 R08: ffffa83000f5be48 R09: 0000000000000001
R10: ffff8fa27465ac00 R11: ffff8fa27465acc0 R12: ffff8fa27465ac00
R13: ffff8fa27465acc0 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f79737ac8c0(0000) GS:ffffffff8fc30000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8fa274c0b000 CR3: 0000000134aa7002 CR4: 00000000000606f0
 take_dentry_name_snapshot+0x28/0x50
 vfs_rename+0x128/0x870
 SyS_rename+0x3b2/0x3d0
 entry_SYSCALL_64_fastpath+0x1a/0xa4
 0xffffffffffffffff
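For illustration, a minimal userspace sketch of the bounded copy (the buffer size and names are illustrative, not the kernel's per-arch values):

#include <stdio.h>
#include <string.h>

#define DNAME_INLINE_LEN 32	/* illustrative; the kernel's value is per-arch */

int main(void)
{
	char d_iname[DNAME_INLINE_LEN];		/* bytes past len + 1 stay uninitialized */
	char inline_name[DNAME_INLINE_LEN];
	const char *name = "configfs.tmp";	/* the name visible in the hex dump above */
	size_t len = strlen(name);

	memcpy(d_iname, name, len + 1);		/* only name + NUL are initialized */

	/* copying DNAME_INLINE_LEN here would read uninitialized bytes,
	 * which is what kmemcheck flagged; len + 1 copies exactly the
	 * name and its NUL terminator */
	memcpy(inline_name, d_iname, len + 1);
	printf("%s\n", inline_name);
	return 0;
}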
Link: http://lkml.kernel.org/r/201709131912.GBG39012.QMJLOVFSFFOOtH@I-love.SAKURA.... Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Cc: Vegard Nossum vegard.nossum@gmail.com Cc: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- fs/dcache.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/dcache.c b/fs/dcache.c index ceb7b491d1b9..d19a0dc46c04 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -292,7 +292,8 @@ void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry spin_unlock(&dentry->d_lock); name->name = p->name; } else { - memcpy(name->inline_name, dentry->d_iname, DNAME_INLINE_LEN); + memcpy(name->inline_name, dentry->d_iname, + dentry->d_name.len + 1); spin_unlock(&dentry->d_lock); name->name = name->inline_name; }
From: Randy Dunlap rdunlap@infradead.org
[ Upstream commit 340fd4cff43f18bace9358d4decdc9b6ed0715be ]
Fix build errors by #including <linux/io.h>.
../drivers/platform/x86/intel_punit_ipc.c: In function 'ipc_read_status':
../drivers/platform/x86/intel_punit_ipc.c:55:2: error: implicit declaration of function 'readl' [-Werror=implicit-function-declaration]
  return readl(ipcdev->base[type][BASE_IFACE]);
../drivers/platform/x86/intel_punit_ipc.c: In function 'ipc_write_cmd':
../drivers/platform/x86/intel_punit_ipc.c:60:2: error: implicit declaration of function 'writel' [-Werror=implicit-function-declaration]
  writel(cmd, ipcdev->base[type][BASE_IFACE]);
Fixes: 447ae3166702 ("x86: Don't include linux/irq.h from asm/hardirq.h") Signed-off-by: Randy Dunlap rdunlap@infradead.org Cc: Zha Qipeng qipeng.zha@intel.com Cc: platform-driver-x86@vger.kernel.org Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- drivers/platform/x86/intel_punit_ipc.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/platform/x86/intel_punit_ipc.c b/drivers/platform/x86/intel_punit_ipc.c index b5b890127479..b7dfe06261f1 100644 --- a/drivers/platform/x86/intel_punit_ipc.c +++ b/drivers/platform/x86/intel_punit_ipc.c @@ -17,6 +17,7 @@ #include <linux/bitops.h> #include <linux/device.h> #include <linux/interrupt.h> +#include <linux/io.h> #include <linux/platform_device.h> #include <asm/intel_punit_ipc.h>
From: Daniel Borkmann daniel@iogearbox.net
[ Upstream commit 166ab6f0a0702fdd4d865ad5090bf3094ed83428 ]
The smap_start_sock() and smap_stop_sock() are each protected under the sock->sk_callback_lock from their call-sites, except in the case of sock_map_delete_elem() where we drop the old socket from the map slot. This is racy because the same sock could be part of multiple sock maps, so we could run smap_stop_sock() in parallel; given that at that point psock->strp_enabled might be true on both CPUs, we might, for example, wrongly restore sk->sk_data_ready / sk->sk_write_space. Therefore, hold the sock->sk_callback_lock on delete as well. Looks like 2f857d04601a ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support") had this right, but later e9db4ef6bf4c ("bpf: sockhash fix omitted bucket lock in sock_close") removed it again from delete, leaving this smap_stop_sock() instance unprotected.
Fixes: e9db4ef6bf4c ("bpf: sockhash fix omitted bucket lock in sock_close") Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: John Fastabend john.fastabend@gmail.com Acked-by: Song Liu songliubraving@fb.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- kernel/bpf/sockmap.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c index c4d75c52b4fc..694da74d7df0 100644 --- a/kernel/bpf/sockmap.c +++ b/kernel/bpf/sockmap.c @@ -1784,8 +1784,11 @@ static int sock_map_delete_elem(struct bpf_map *map, void *key) if (!psock) goto out;
- if (psock->bpf_parse) + if (psock->bpf_parse) { + write_lock_bh(&sock->sk_callback_lock); smap_stop_sock(psock, sock); + write_unlock_bh(&sock->sk_callback_lock); + } smap_list_map_remove(psock, &stab->sock_map[k]); smap_release_sock(psock, sock); out:
From: Daniel Borkmann daniel@iogearbox.net
[ Upstream commit 90545cdc3f2b2ea700e24335610cd181e73756da ]
I found that in BPF sockmap programs, once we either delete a socket from the map, or update a map slot so that the old socket is purged from the map, these sockets can never get reattached into a map, even though their related psock has been dropped entirely at that point.
Reason is that tcp_cleanup_ulp() leaves the old icsk->icsk_ulp_ops intact, so that on the next tcp_set_ulp_id() the kernel returns an -EEXIST thinking there is still some active ULP attached.
BPF sockmap is the only one that has this issue as the other user, kTLS, only calls tcp_cleanup_ulp() from tcp_v4_destroy_sock() whereas sockmap semantics allow dropping the socket from the map with all related psock state being cleaned up.
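For illustration, here is a minimal userspace sketch of the attach/cleanup contract described above (the struct and function names are illustrative, not the kernel's): once cleanup resets the ops pointer, a later attach succeeds again instead of failing with -EEXIST:

#include <errno.h>
#include <stddef.h>
#include <stdio.h>

struct ulp_ops { const char *name; };
struct sock_ctx { const struct ulp_ops *ulp_ops; };

static const struct ulp_ops sockmap_ops = { "smap" };

static int ulp_attach(struct sock_ctx *sk, const struct ulp_ops *ops)
{
	if (sk->ulp_ops)	/* a stale pointer here means -EEXIST forever */
		return -EEXIST;
	sk->ulp_ops = ops;
	return 0;
}

static void ulp_cleanup(struct sock_ctx *sk)
{
	sk->ulp_ops = NULL;	/* the fix: drop the stale pointer on cleanup */
}

int main(void)
{
	struct sock_ctx sk = { NULL };

	printf("attach: %d\n", ulp_attach(&sk, &sockmap_ops));		/* 0 */
	printf("re-attach: %d\n", ulp_attach(&sk, &sockmap_ops));	/* -EEXIST */
	ulp_cleanup(&sk);
	printf("after cleanup: %d\n", ulp_attach(&sk, &sockmap_ops));	/* 0 */
	return 0;
}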
Fixes: 1aa12bdf1bfb ("bpf: sockmap, add sock close() hook to remove socks") Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: John Fastabend john.fastabend@gmail.com Acked-by: Song Liu songliubraving@fb.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- net/ipv4/tcp_ulp.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/ipv4/tcp_ulp.c b/net/ipv4/tcp_ulp.c index 622caa4039e0..d0bdfa02dea1 100644 --- a/net/ipv4/tcp_ulp.c +++ b/net/ipv4/tcp_ulp.c @@ -129,6 +129,8 @@ void tcp_cleanup_ulp(struct sock *sk) if (icsk->icsk_ulp_ops->release) icsk->icsk_ulp_ops->release(sk); module_put(icsk->icsk_ulp_ops->owner); + + icsk->icsk_ulp_ops = NULL; }
/* Change upper layer protocol for socket */
From: Daniel Borkmann daniel@iogearbox.net
[ Upstream commit 585f5a6252ee43ec8feeee07387e3fcc7e8bb292 ]
The current code in sock_map_ctx_update_elem() allows for the BPF_EXIST and BPF_NOEXIST map update flags. While on array-like maps this approach is rather uncommon, e.g. bpf_fd_array_map_update_elem() and others enforce map update flags to be BPF_ANY such that xchg() can be used directly, the current implementation in sock map does not guarantee that such an operation with BPF_EXIST / BPF_NOEXIST is atomic.

The initial test does a READ_ONCE(stab->sock_map[i]) to fetch the socket from the slot, which is then tested for NULL / non-NULL. However, later, after __sock_map_ctx_update_elem(), the actual update is done through osock = xchg(&stab->sock_map[i], sock). The problem is that in the meantime a different CPU could have updated / deleted a socket on that specific slot, and thus the flag constraints won't hold anymore.

I've been thinking whether it would be best to just break UAPI and enforce BPF_ANY to check if someone actually complains; however, the trouble is that the BPF kselftests already use BPF_NOEXIST for the map update, and therefore it might have been copied into applications already. The fix that keeps the current behavior intact is to add a map lock, similar to the sock hash bucket lock, only covering the whole map.
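For illustration, here is a minimal userspace sketch of the resulting pattern (the names are illustrative, not the kernel's): the flag test and the slot update happen in one critical section, so nothing can change the slot in between (compile with -lpthread):

#include <errno.h>
#include <pthread.h>
#include <stdio.h>

#define MAP_ANY     0	/* stand-ins for BPF_ANY / BPF_NOEXIST / BPF_EXIST */
#define MAP_NOEXIST 1
#define MAP_EXIST   2

static pthread_spinlock_t slot_lock;
static void *slot;	/* plays the role of stab->sock_map[i] */

static int slot_update(void *new_sock, int flags)
{
	void *old;
	int err = 0;

	pthread_spin_lock(&slot_lock);
	old = slot;
	if (old && flags == MAP_NOEXIST)
		err = -EEXIST;		/* slot already occupied */
	else if (!old && flags == MAP_EXIST)
		err = -ENOENT;		/* nothing there to replace */
	else
		slot = new_sock;	/* check and update are atomic w.r.t. the lock */
	pthread_spin_unlock(&slot_lock);
	return err;
}

int main(void)
{
	int dummy;

	pthread_spin_init(&slot_lock, PTHREAD_PROCESS_PRIVATE);
	printf("first insert: %d\n", slot_update(&dummy, MAP_NOEXIST));		/* 0 */
	printf("second insert: %d\n", slot_update(&dummy, MAP_NOEXIST));	/* -EEXIST */
	return 0;
}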
Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support") Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: John Fastabend john.fastabend@gmail.com Acked-by: Song Liu songliubraving@fb.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- kernel/bpf/sockmap.c | 106 +++++++++++++++++++++++-------------------- 1 file changed, 57 insertions(+), 49 deletions(-)
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c index 694da74d7df0..871ce45443ed 100644 --- a/kernel/bpf/sockmap.c +++ b/kernel/bpf/sockmap.c @@ -58,6 +58,7 @@ struct bpf_stab { struct bpf_map map; struct sock **sock_map; struct bpf_sock_progs progs; + raw_spinlock_t lock; };
struct bucket { @@ -89,9 +90,9 @@ enum smap_psock_state {
struct smap_psock_map_entry { struct list_head list; + struct bpf_map *map; struct sock **entry; struct htab_elem __rcu *hash_link; - struct bpf_htab __rcu *htab; };
struct smap_psock { @@ -343,13 +344,18 @@ static void bpf_tcp_close(struct sock *sk, long timeout) e = psock_map_pop(sk, psock); while (e) { if (e->entry) { - osk = cmpxchg(e->entry, sk, NULL); + struct bpf_stab *stab = container_of(e->map, struct bpf_stab, map); + + raw_spin_lock_bh(&stab->lock); + osk = *e->entry; if (osk == sk) { + *e->entry = NULL; smap_release_sock(psock, sk); } + raw_spin_unlock_bh(&stab->lock); } else { struct htab_elem *link = rcu_dereference(e->hash_link); - struct bpf_htab *htab = rcu_dereference(e->htab); + struct bpf_htab *htab = container_of(e->map, struct bpf_htab, map); struct hlist_head *head; struct htab_elem *l; struct bucket *b; @@ -1644,6 +1650,7 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr) return ERR_PTR(-ENOMEM);
bpf_map_init_from_attr(&stab->map, attr); + raw_spin_lock_init(&stab->lock);
/* make sure page count doesn't overflow */ cost = (u64) stab->map.max_entries * sizeof(struct sock *); @@ -1714,14 +1721,15 @@ static void sock_map_free(struct bpf_map *map) * and a grace period expire to ensure psock is really safe to remove. */ rcu_read_lock(); + raw_spin_lock_bh(&stab->lock); for (i = 0; i < stab->map.max_entries; i++) { struct smap_psock *psock; struct sock *sock;
- sock = xchg(&stab->sock_map[i], NULL); + sock = stab->sock_map[i]; if (!sock) continue; - + stab->sock_map[i] = NULL; psock = smap_psock_sk(sock); /* This check handles a racing sock event that can get the * sk_callback_lock before this case but after xchg happens @@ -1733,6 +1741,7 @@ static void sock_map_free(struct bpf_map *map) smap_release_sock(psock, sock); } } + raw_spin_unlock_bh(&stab->lock); rcu_read_unlock();
sock_map_remove_complete(stab); @@ -1776,14 +1785,16 @@ static int sock_map_delete_elem(struct bpf_map *map, void *key) if (k >= map->max_entries) return -EINVAL;
- sock = xchg(&stab->sock_map[k], NULL); + raw_spin_lock_bh(&stab->lock); + sock = stab->sock_map[k]; + stab->sock_map[k] = NULL; + raw_spin_unlock_bh(&stab->lock); if (!sock) return -EINVAL;
psock = smap_psock_sk(sock); if (!psock) - goto out; - + return 0; if (psock->bpf_parse) { write_lock_bh(&sock->sk_callback_lock); smap_stop_sock(psock, sock); @@ -1791,7 +1802,6 @@ static int sock_map_delete_elem(struct bpf_map *map, void *key) } smap_list_map_remove(psock, &stab->sock_map[k]); smap_release_sock(psock, sock); -out: return 0; }
@@ -1827,11 +1837,9 @@ static int sock_map_delete_elem(struct bpf_map *map, void *key) static int __sock_map_ctx_update_elem(struct bpf_map *map, struct bpf_sock_progs *progs, struct sock *sock, - struct sock **map_link, void *key) { struct bpf_prog *verdict, *parse, *tx_msg; - struct smap_psock_map_entry *e = NULL; struct smap_psock *psock; bool new = false; int err = 0; @@ -1904,14 +1912,6 @@ static int __sock_map_ctx_update_elem(struct bpf_map *map, new = true; }
- if (map_link) { - e = kzalloc(sizeof(*e), GFP_ATOMIC | __GFP_NOWARN); - if (!e) { - err = -ENOMEM; - goto out_free; - } - } - /* 3. At this point we have a reference to a valid psock that is * running. Attach any BPF programs needed. */ @@ -1933,17 +1933,6 @@ static int __sock_map_ctx_update_elem(struct bpf_map *map, write_unlock_bh(&sock->sk_callback_lock); }
- /* 4. Place psock in sockmap for use and stop any programs on - * the old sock assuming its not the same sock we are replacing - * it with. Because we can only have a single set of programs if - * old_sock has a strp we can stop it. - */ - if (map_link) { - e->entry = map_link; - spin_lock_bh(&psock->maps_lock); - list_add_tail(&e->list, &psock->maps); - spin_unlock_bh(&psock->maps_lock); - } return err; out_free: smap_release_sock(psock, sock); @@ -1954,7 +1943,6 @@ static int __sock_map_ctx_update_elem(struct bpf_map *map, } if (tx_msg) bpf_prog_put(tx_msg); - kfree(e); return err; }
@@ -1964,36 +1952,57 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops, { struct bpf_stab *stab = container_of(map, struct bpf_stab, map); struct bpf_sock_progs *progs = &stab->progs; - struct sock *osock, *sock; + struct sock *osock, *sock = skops->sk; + struct smap_psock_map_entry *e; + struct smap_psock *psock; u32 i = *(u32 *)key; int err;
if (unlikely(flags > BPF_EXIST)) return -EINVAL; - if (unlikely(i >= stab->map.max_entries)) return -E2BIG;
- sock = READ_ONCE(stab->sock_map[i]); - if (flags == BPF_EXIST && !sock) - return -ENOENT; - else if (flags == BPF_NOEXIST && sock) - return -EEXIST; + e = kzalloc(sizeof(*e), GFP_ATOMIC | __GFP_NOWARN); + if (!e) + return -ENOMEM;
- sock = skops->sk; - err = __sock_map_ctx_update_elem(map, progs, sock, &stab->sock_map[i], - key); + err = __sock_map_ctx_update_elem(map, progs, sock, key); if (err) goto out;
- osock = xchg(&stab->sock_map[i], sock); - if (osock) { - struct smap_psock *opsock = smap_psock_sk(osock); + /* psock guaranteed to be present. */ + psock = smap_psock_sk(sock); + raw_spin_lock_bh(&stab->lock); + osock = stab->sock_map[i]; + if (osock && flags == BPF_NOEXIST) { + err = -EEXIST; + goto out_unlock; + } + if (!osock && flags == BPF_EXIST) { + err = -ENOENT; + goto out_unlock; + } + + e->entry = &stab->sock_map[i]; + e->map = map; + spin_lock_bh(&psock->maps_lock); + list_add_tail(&e->list, &psock->maps); + spin_unlock_bh(&psock->maps_lock);
- smap_list_map_remove(opsock, &stab->sock_map[i]); - smap_release_sock(opsock, osock); + stab->sock_map[i] = sock; + if (osock) { + psock = smap_psock_sk(osock); + smap_list_map_remove(psock, &stab->sock_map[i]); + smap_release_sock(psock, osock); } + raw_spin_unlock_bh(&stab->lock); + return 0; +out_unlock: + smap_release_sock(psock, sock); + raw_spin_unlock_bh(&stab->lock); out: + kfree(e); return err; }
@@ -2356,7 +2365,7 @@ static int sock_hash_ctx_update_elem(struct bpf_sock_ops_kern *skops, b = __select_bucket(htab, hash); head = &b->head;
- err = __sock_map_ctx_update_elem(map, progs, sock, NULL, key); + err = __sock_map_ctx_update_elem(map, progs, sock, key); if (err) goto err;
@@ -2382,8 +2391,7 @@ static int sock_hash_ctx_update_elem(struct bpf_sock_ops_kern *skops, }
rcu_assign_pointer(e->hash_link, l_new); - rcu_assign_pointer(e->htab, - container_of(map, struct bpf_htab, map)); + e->map = map; spin_lock_bh(&psock->maps_lock); list_add_tail(&e->list, &psock->maps); spin_unlock_bh(&psock->maps_lock);
From: Tariq Toukan tariqt@mellanox.com
[ Upstream commit 21b172ee11b6ec260bd7e6a27b11a8a8d392fce5 ]
Fix the warning below by calling rhashtable_lookup_fast(). Also, move some code around for better quality and human readability.
[  342.450870] WARNING: suspicious RCU usage
[  342.455856] 4.18.0-rc2+ #17 Tainted: G           O
[  342.462210] -----------------------------
[  342.467202] ./include/linux/rhashtable.h:481 suspicious rcu_dereference_check() usage!
[  342.476568]
[  342.476568] other info that might help us debug this:
[  342.476568]
[  342.486978]
[  342.486978] rcu_scheduler_active = 2, debug_locks = 1
[  342.495211] 4 locks held by modprobe/3934:
[  342.500265]  #0: 00000000e23116b2 (mlx5_intf_mutex){+.+.}, at: mlx5_unregister_interface+0x18/0x90 [mlx5_core]
[  342.511953]  #1: 00000000ca16db96 (rtnl_mutex){+.+.}, at: unregister_netdev+0xe/0x20
[  342.521109]  #2: 00000000a46e2c4b (&priv->state_lock){+.+.}, at: mlx5e_close+0x29/0x60 [mlx5_core]
[  342.531642]  #3: 0000000060c5bde3 (mem_id_lock){+.+.}, at: xdp_rxq_info_unreg+0x93/0x6b0
[  342.541206]
[  342.541206] stack backtrace:
[  342.547075] CPU: 12 PID: 3934 Comm: modprobe Tainted: G           O      4.18.0-rc2+ #17
[  342.556621] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 1.5.4 10/002/2015
[  342.565606] Call Trace:
[  342.568861]  dump_stack+0x78/0xb3
[  342.573086]  xdp_rxq_info_unreg+0x3f5/0x6b0
[  342.578285]  ? __call_rcu+0x220/0x300
[  342.582911]  mlx5e_free_rq+0x38/0xc0 [mlx5_core]
[  342.588602]  mlx5e_close_channel+0x20/0x120 [mlx5_core]
[  342.594976]  mlx5e_close_channels+0x26/0x40 [mlx5_core]
[  342.601345]  mlx5e_close_locked+0x44/0x50 [mlx5_core]
[  342.607519]  mlx5e_close+0x42/0x60 [mlx5_core]
[  342.613005]  __dev_close_many+0xb1/0x120
[  342.617911]  dev_close_many+0xa2/0x170
[  342.622622]  rollback_registered_many+0x148/0x460
[  342.628401]  ? __lock_acquire+0x48d/0x11b0
[  342.633498]  ? unregister_netdev+0xe/0x20
[  342.638495]  rollback_registered+0x56/0x90
[  342.643588]  unregister_netdevice_queue+0x7e/0x100
[  342.649461]  unregister_netdev+0x18/0x20
[  342.654362]  mlx5e_remove+0x2a/0x50 [mlx5_core]
[  342.659944]  mlx5_remove_device+0xe5/0x110 [mlx5_core]
[  342.666208]  mlx5_unregister_interface+0x39/0x90 [mlx5_core]
[  342.673038]  cleanup+0x5/0xbfc [mlx5_core]
[  342.678094]  __x64_sys_delete_module+0x16b/0x240
[  342.683725]  ? do_syscall_64+0x1c/0x210
[  342.688476]  do_syscall_64+0x5a/0x210
[  342.693025]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: 8d5d88527587 ("xdp: rhashtable with allocator ID to pointer mapping") Signed-off-by: Tariq Toukan tariqt@mellanox.com Suggested-by: Daniel Borkmann daniel@iogearbox.net Cc: Jesper Dangaard Brouer brouer@redhat.com Acked-by: Jesper Dangaard Brouer brouer@redhat.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- net/core/xdp.c | 14 +++----------- 1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/net/core/xdp.c b/net/core/xdp.c index 6771f1855b96..2657056130a4 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -95,23 +95,15 @@ static void __xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq) { struct xdp_mem_allocator *xa; int id = xdp_rxq->mem.id; - int err;
if (id == 0) return;
mutex_lock(&mem_id_lock);
- xa = rhashtable_lookup(mem_id_ht, &id, mem_id_rht_params); - if (!xa) { - mutex_unlock(&mem_id_lock); - return; - } - - err = rhashtable_remove_fast(mem_id_ht, &xa->node, mem_id_rht_params); - WARN_ON(err); - - call_rcu(&xa->rcu, __xdp_mem_allocator_rcu_free); + xa = rhashtable_lookup_fast(mem_id_ht, &id, mem_id_rht_params); + if (xa && !rhashtable_remove_fast(mem_id_ht, &xa->node, mem_id_rht_params)) + call_rcu(&xa->rcu, __xdp_mem_allocator_rcu_free);
mutex_unlock(&mem_id_lock); }
From: Daniel Borkmann daniel@iogearbox.net
[ Upstream commit d40b0116c94bd8fc2b63aae35ce8e66bb53bba42 ]
While working on sockmap I noticed that we do not always kfree the struct smap_psock_map_entry list elements which track psocks attached to maps. In the case of sock_hash_ctx_update_elem(), these map entries are allocated outside of __sock_map_ctx_update_elem() with their linkage to the socket hash table filled. In the case of sock array, the map entries are allocated inside of __sock_map_ctx_update_elem() and added with their linkage to the psock->maps. Both additions are under psock->maps_lock each.
Now, we drop these elements from their psock->maps list in a few occasions: i) in sock array via smap_list_map_remove() when an entry is either deleted from the map from user space, or updated via user space or BPF program where we drop the old socket at that map slot, or the sock array is freed via sock_map_free() and drops all its elements; ii) for sock hash via smap_list_hash_remove() in exactly the same occasions as just described for sock array; iii) in the bpf_tcp_close() where we remove the elements from the list via psock_map_pop() and iterate over them dropping themselves from either sock array or sock hash; and last but not least iv) once again in smap_gc_work() which is a callback for deferring the work once the psock refcount hit zero and thus the socket is being destroyed.
The problem is that the only case where we kfree() the list entry is in case iv), which at that point should have an empty list in normal cases. So in cases i) to iii) we unlink the elements without freeing them, and they go out of reach from us. Hence the fix is to properly kfree() them as well to stop the leakage. Given these are all handled under psock->maps_lock, there is no need for deferred RCU freeing.
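For illustration, here is a minimal userspace sketch of the unlink-without-free leak and its fix (the list and names are illustrative, not the kernel's list API):

#include <stdlib.h>

struct map_entry {
	struct map_entry *next;
	void *slot;
};

/* unlink every entry matching slot; without the free() the allocation
 * becomes unreachable, which is exactly the kind of leak kmemleak reports */
static void list_remove(struct map_entry **head, void *slot)
{
	struct map_entry **pp = head, *e;

	while ((e = *pp) != NULL) {
		if (e->slot == slot) {
			*pp = e->next;	/* unlink ... */
			free(e);	/* ... and free, or the entry leaks */
		} else {
			pp = &e->next;
		}
	}
}

int main(void)
{
	struct map_entry *head = NULL, *e;
	int slot_a;

	e = calloc(1, sizeof(*e));
	if (!e)
		return 1;
	e->slot = &slot_a;
	e->next = head;
	head = e;

	list_remove(&head, &slot_a);
	return 0;
}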
I later also ran with kmemleak detector and it confirmed the finding as well where in the state before the fix the object goes unreferenced while after the patch no kmemleak report related to BPF showed up.
[...]
unreferenced object 0xffff880378eadae0 (size 64):
  comm "test_sockmap", pid 2225, jiffies 4294720701 (age 43.504s)
  hex dump (first 32 bytes):
    00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
    50 4d 75 5d 03 88 ff ff 00 00 00 00 00 00 00 00  PMu]............
  backtrace:
    [<000000005225ac3c>] sock_map_ctx_update_elem.isra.21+0xd8/0x210
    [<0000000045dd6d3c>] bpf_sock_map_update+0x29/0x60
    [<00000000877723aa>] ___bpf_prog_run+0x1e1f/0x4960
    [<000000002ef89e83>] 0xffffffffffffffff
unreferenced object 0xffff880378ead240 (size 64):
  comm "test_sockmap", pid 2225, jiffies 4294720701 (age 43.504s)
  hex dump (first 32 bytes):
    00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
    00 44 75 5d 03 88 ff ff 00 00 00 00 00 00 00 00  .Du]............
  backtrace:
    [<000000005225ac3c>] sock_map_ctx_update_elem.isra.21+0xd8/0x210
    [<0000000030e37a3a>] sock_map_update_elem+0x125/0x240
    [<000000002e5ce36e>] map_update_elem+0x4eb/0x7b0
    [<00000000db453cc9>] __x64_sys_bpf+0x1f9/0x360
    [<0000000000763660>] do_syscall_64+0x9a/0x300
    [<00000000422a2bb2>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [<000000002ef89e83>] 0xffffffffffffffff
[...]
Fixes: e9db4ef6bf4c ("bpf: sockhash fix omitted bucket lock in sock_close") Fixes: 54fedb42c653 ("bpf: sockmap, fix smap_list_map_remove when psock is in many maps") Fixes: 2f857d04601a ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support") Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: John Fastabend john.fastabend@gmail.com Acked-by: Song Liu songliubraving@fb.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- kernel/bpf/sockmap.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c index 871ce45443ed..58899601fccf 100644 --- a/kernel/bpf/sockmap.c +++ b/kernel/bpf/sockmap.c @@ -376,6 +376,7 @@ static void bpf_tcp_close(struct sock *sk, long timeout) } raw_spin_unlock_bh(&b->lock); } + kfree(e); e = psock_map_pop(sk, psock); } rcu_read_unlock(); @@ -1685,8 +1686,10 @@ static void smap_list_map_remove(struct smap_psock *psock,
spin_lock_bh(&psock->maps_lock); list_for_each_entry_safe(e, tmp, &psock->maps, list) { - if (e->entry == entry) + if (e->entry == entry) { list_del(&e->list); + kfree(e); + } } spin_unlock_bh(&psock->maps_lock); } @@ -1700,8 +1703,10 @@ static void smap_list_hash_remove(struct smap_psock *psock, list_for_each_entry_safe(e, tmp, &psock->maps, list) { struct htab_elem *c = rcu_dereference(e->hash_link);
- if (c == hash_link) + if (c == hash_link) { list_del(&e->list); + kfree(e); + } } spin_unlock_bh(&psock->maps_lock); }
From: Jesper Dangaard Brouer brouer@redhat.com
[ Upstream commit 817b89beb9d8876450fcde9155e17425c329569d ]
It is common XDP practice to unload/detach the XDP bpf program when the XDP sample program is Ctrl-C interrupted (SIGINT) or killed (SIGTERM).

The samples/bpf programs xdp_redirect_cpu and xdp_rxq_info forgot to trap the signal SIGTERM (which is the default signal used by the kill command).

This was discovered by Red Hat QA, whose automated scripts depend on killing the XDP sample program after a timeout period.
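For illustration, a standalone sketch of the pattern the patch applies; int_exit here is a stand-in for the samples' cleanup handler:

#include <signal.h>
#include <unistd.h>

static void int_exit(int sig)
{
	(void)sig;
	/* in the real samples this detaches the XDP program, e.g. via
	 * bpf_set_link_xdp_fd(ifindex, -1, xdp_flags), before exiting */
	_exit(0);
}

int main(void)
{
	signal(SIGINT, int_exit);	/* Ctrl-C */
	signal(SIGTERM, int_exit);	/* default signal of the kill command */

	for (;;)
		pause();		/* stand-in for the sample's work loop */
	return 0;
}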
Fixes: fad3917e361b ("samples/bpf: add cpumap sample program xdp_redirect_cpu") Fixes: 0fca931a6f21 ("samples/bpf: program demonstrating access to xdp_rxq_info") Reported-by: Jean-Tsung Hsiao jhsiao@redhat.com Signed-off-by: Jesper Dangaard Brouer brouer@redhat.com Acked-by: Yonghong Song yhs@fb.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- samples/bpf/xdp_redirect_cpu_user.c | 3 ++- samples/bpf/xdp_rxq_info_user.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/samples/bpf/xdp_redirect_cpu_user.c b/samples/bpf/xdp_redirect_cpu_user.c index 4b4d78fffe30..da9070889223 100644 --- a/samples/bpf/xdp_redirect_cpu_user.c +++ b/samples/bpf/xdp_redirect_cpu_user.c @@ -679,8 +679,9 @@ int main(int argc, char **argv) return EXIT_FAIL_OPTION; }
- /* Remove XDP program when program is interrupted */ + /* Remove XDP program when program is interrupted or killed */ signal(SIGINT, int_exit); + signal(SIGTERM, int_exit);
if (bpf_set_link_xdp_fd(ifindex, prog_fd[prog_num], xdp_flags) < 0) { fprintf(stderr, "link set xdp fd failed\n"); diff --git a/samples/bpf/xdp_rxq_info_user.c b/samples/bpf/xdp_rxq_info_user.c index e4e9ba52bff0..bb278447299c 100644 --- a/samples/bpf/xdp_rxq_info_user.c +++ b/samples/bpf/xdp_rxq_info_user.c @@ -534,8 +534,9 @@ int main(int argc, char **argv) exit(EXIT_FAIL_BPF); }
- /* Remove XDP program when program is interrupted */ + /* Remove XDP program when program is interrupted or killed */ signal(SIGINT, int_exit); + signal(SIGTERM, int_exit);
if (bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags) < 0) { fprintf(stderr, "link set xdp fd failed\n");
From: Florian Westphal fw@strlen.de
[ Upstream commit da786717e0894886301ed2536843c13f9e8fd53e ]
Roman reports that the DHCPv6 client no longer sees replies from the server due to the
ip6tables -t raw -A PREROUTING -m rpfilter --invert -j DROP
rule. We need to set the F_IFACE flag for link-local addresses, as they are scoped per-device.
Fixes: 47b7e7f82802 ("netfilter: don't set F_IFACE on ipv6 fib lookups") Reported-by: Roman Mamedov rm@romanrm.net Tested-by: Roman Mamedov rm@romanrm.net Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- net/ipv6/netfilter/ip6t_rpfilter.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/net/ipv6/netfilter/ip6t_rpfilter.c b/net/ipv6/netfilter/ip6t_rpfilter.c index 0fe61ede77c6..c3c6b09acdc4 100644 --- a/net/ipv6/netfilter/ip6t_rpfilter.c +++ b/net/ipv6/netfilter/ip6t_rpfilter.c @@ -26,6 +26,12 @@ static bool rpfilter_addr_unicast(const struct in6_addr *addr) return addr_type & IPV6_ADDR_UNICAST; }
+static bool rpfilter_addr_linklocal(const struct in6_addr *addr) +{ + int addr_type = ipv6_addr_type(addr); + return addr_type & IPV6_ADDR_LINKLOCAL; +} + static bool rpfilter_lookup_reverse6(struct net *net, const struct sk_buff *skb, const struct net_device *dev, u8 flags) { @@ -48,7 +54,11 @@ static bool rpfilter_lookup_reverse6(struct net *net, const struct sk_buff *skb, }
fl6.flowi6_mark = flags & XT_RPFILTER_VALID_MARK ? skb->mark : 0; - if ((flags & XT_RPFILTER_LOOSE) == 0) + + if (rpfilter_addr_linklocal(&iph->saddr)) { + lookup_flags |= RT6_LOOKUP_F_IFACE; + fl6.flowi6_oif = dev->ifindex; + } else if ((flags & XT_RPFILTER_LOOSE) == 0) fl6.flowi6_oif = dev->ifindex;
rt = (void *)ip6_route_lookup(net, &fl6, skb, lookup_flags);
From: Philipp Rudo prudo@linux.ibm.com
[ Upstream commit 2d2e7075b87181ed0c675e4936e20bdadba02e1f ]
The vmcoreinfo of a crashed system is potentially fragmented. Thus the crash kernel has an intermediate step where the vmcoreinfo is copied into a temporary, contiguous buffer in the crash kernel memory. This temporary buffer is never freed. Free it now to prevent the memleak.

While at it, replace all occurrences of "VMCOREINFO" by the corresponding macro to prevent potential renaming issues.
Signed-off-by: Philipp Rudo prudo@linux.ibm.com Acked-by: Heiko Carstens heiko.carstens@de.ibm.com Signed-off-by: Heiko Carstens heiko.carstens@de.ibm.com Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- arch/s390/kernel/crash_dump.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c index 9f5ea9d87069..9b0216d571ad 100644 --- a/arch/s390/kernel/crash_dump.c +++ b/arch/s390/kernel/crash_dump.c @@ -404,11 +404,13 @@ static void *get_vmcoreinfo_old(unsigned long *size) if (copy_oldmem_kernel(nt_name, addr + sizeof(note), sizeof(nt_name) - 1)) return NULL; - if (strcmp(nt_name, "VMCOREINFO") != 0) + if (strcmp(nt_name, VMCOREINFO_NOTE_NAME) != 0) return NULL; vmcoreinfo = kzalloc_panic(note.n_descsz); - if (copy_oldmem_kernel(vmcoreinfo, addr + 24, note.n_descsz)) + if (copy_oldmem_kernel(vmcoreinfo, addr + 24, note.n_descsz)) { + kfree(vmcoreinfo); return NULL; + } *size = note.n_descsz; return vmcoreinfo; } @@ -418,15 +420,20 @@ static void *get_vmcoreinfo_old(unsigned long *size) */ static void *nt_vmcoreinfo(void *ptr) { + const char *name = VMCOREINFO_NOTE_NAME; unsigned long size; void *vmcoreinfo;
vmcoreinfo = os_info_old_entry(OS_INFO_VMCOREINFO, &size); - if (!vmcoreinfo) - vmcoreinfo = get_vmcoreinfo_old(&size); + if (vmcoreinfo) + return nt_init_name(ptr, 0, vmcoreinfo, size, name); + + vmcoreinfo = get_vmcoreinfo_old(&size); if (!vmcoreinfo) return ptr; - return nt_init_name(ptr, 0, vmcoreinfo, size, "VMCOREINFO"); + ptr = nt_init_name(ptr, 0, vmcoreinfo, size, name); + kfree(vmcoreinfo); + return ptr; }
/*
From: Tan Hu tan.hu@zte.com.cn
[ Upstream commit a53b42c11815d2357e31a9403ae3950517525894 ]
We came across an infinite loop in ipvs when using ipvs in a docker environment.

When ipvs receives new packets and cannot find an ipvs connection, it will create a new connection; then, if the dest is unavailable (i.e. the IP_VS_DEST_F_AVAILABLE flag is not set), the packet will be dropped silently.

But if the dropped packet is the first packet of this connection, the connection control timer never gets a chance to start and the ipvs connection cannot be released. This will lead to a memory leak, or to an infinite loop in cleanup_net() when the net namespace is released, like this:
ip_vs_conn_net_cleanup at ffffffffa0a9f31a [ip_vs]
__ip_vs_cleanup at ffffffffa0a9f60a [ip_vs]
ops_exit_list at ffffffff81567a49
cleanup_net at ffffffff81568b40
process_one_work at ffffffff810a851b
worker_thread at ffffffff810a9356
kthread at ffffffff810b0b6f
ret_from_fork at ffffffff81697a18

race condition:
    CPU1                                    CPU2
    ip_vs_in()
      ip_vs_conn_new()
                                            ip_vs_del_dest()
                                              __ip_vs_unlink_dest()
                                                ~IP_VS_DEST_F_AVAILABLE
      cp->dest && !IP_VS_DEST_F_AVAILABLE
      __ip_vs_conn_put
    ...
    cleanup_net ---> infinite looping
Fix this by checking whether the timer has already been started.
Signed-off-by: Tan Hu tan.hu@zte.com.cn Reviewed-by: Jiang Biao jiang.biao2@zte.com.cn Acked-by: Julian Anastasov ja@ssi.bg Acked-by: Simon Horman horms@verge.net.au Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- net/netfilter/ipvs/ip_vs_core.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c index 0679dd101e72..7ca926a03b81 100644 --- a/net/netfilter/ipvs/ip_vs_core.c +++ b/net/netfilter/ipvs/ip_vs_core.c @@ -1972,13 +1972,20 @@ ip_vs_in(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, int if (cp->dest && !(cp->dest->flags & IP_VS_DEST_F_AVAILABLE)) { /* the destination server is not available */
- if (sysctl_expire_nodest_conn(ipvs)) { + __u32 flags = cp->flags; + + /* when timer already started, silently drop the packet.*/ + if (timer_pending(&cp->timer)) + __ip_vs_conn_put(cp); + else + ip_vs_conn_put(cp); + + if (sysctl_expire_nodest_conn(ipvs) && + !(flags & IP_VS_CONN_F_ONE_PACKET)) { /* try to expire the connection immediately */ ip_vs_conn_expire_now(cp); } - /* don't restart its timer, and silently - drop the packet. */ - __ip_vs_conn_put(cp); + return NF_DROP; }
From: Guenter Roeck linux@roeck-us.net
[ Upstream commit 2f606da78230f09cf1a71fde6ee91d0c710fa2b2 ]
Instantiating the sm501 OHCI subdevice results in a kernel warning.
sm501-usb sm501-usb: SM501 OHCI
sm501-usb sm501-usb: new USB bus registered, assigned bus number 1
WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516 ohci_init+0x194/0x2d8
Modules linked in:

CPU: 0 PID: 1 Comm: swapper Tainted: G        W   4.18.0-rc7-00178-g0b5b1f9a78b5 #1
PC is at ohci_init+0x194/0x2d8
PR is at ohci_init+0x168/0x2d8
PC  : 8c27844c SP  : 8f81dd94 SR  : 40008001 TEA : 29613060
R0  : 00000000 R1  : 00000000 R2  : 00000000 R3  : 00000202
R4  : 8fa98b88 R5  : 8c277e68 R6  : 00000000 R7  : 00000000
R8  : 8f965814 R9  : 8c388100 R10 : 8fa98800 R11 : 8fa98928
R12 : 8c48302c R13 : 8fa98920 R14 : 8c48302c
MACH: 00000096 MACL: 0000017c GBR : 00000000 PR  : 8c278420

Call trace:
 [<(ptrval)>] usb_add_hcd+0x1e8/0x6ec
 [<(ptrval)>] _dev_info+0x0/0x54
 [<(ptrval)>] arch_local_save_flags+0x0/0x8
 [<(ptrval)>] arch_local_irq_restore+0x0/0x24
 [<(ptrval)>] ohci_hcd_sm501_drv_probe+0x114/0x2d8
 ...
Initialize coherent_dma_mask when creating SM501 subdevices to fix the problem.
Fixes: b6d6454fdb66f ("mfd: SM501 core driver") Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Lee Jones lee.jones@linaro.org Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- drivers/mfd/sm501.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/mfd/sm501.c b/drivers/mfd/sm501.c index 2a87b0d2f21f..a530972c5a7e 100644 --- a/drivers/mfd/sm501.c +++ b/drivers/mfd/sm501.c @@ -715,6 +715,7 @@ sm501_create_subdev(struct sm501_devdata *sm, char *name, smdev->pdev.name = name; smdev->pdev.id = sm->pdev_id; smdev->pdev.dev.parent = sm->dev; + smdev->pdev.dev.coherent_dma_mask = 0xffffffff;
if (res_count) { smdev->pdev.resource = (struct resource *)(smdev+1);
From: Michal Hocko mhocko@suse.com
[ Upstream commit a148ce15375fc664ad64762c751c0c2aecb2cafe ]
eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in xt_alloc_table_info()") has unintentionally fortified xt_alloc_table_info allocation when __GFP_RETRY has been dropped from the vmalloc fallback. Later on there was a syzbot report that this can lead to OOM killer invocations when tables are too large and 0537250fdc6c ("netfilter: x_tables: make allocation less aggressive") has been merged to restore the original behavior. Georgi Nikolov however noticed that he is not able to install his iptables anymore so this can be seen as a regression.
The primary argument for 0537250fdc6c was that this allocation path shouldn't really trigger the OOM killer and kill innocent tasks. On the other hand the interface requires root and as such should allow what the admin asks for. Root inside a namespaces makes this more complicated because those might be not trusted in general. If they are not then such namespaces should be restricted anyway. Therefore drop the __GFP_NORETRY and replace it by __GFP_ACCOUNT to enfore memcg constrains on it.
Fixes: 0537250fdc6c ("netfilter: x_tables: make allocation less aggressive") Reported-by: Georgi Nikolov gnikolov@icdsoft.com Suggested-by: Vlastimil Babka vbabka@suse.cz Acked-by: Florian Westphal fw@strlen.de Signed-off-by: Michal Hocko mhocko@suse.com Acked-by: Vlastimil Babka vbabka@suse.cz Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- net/netfilter/x_tables.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c index d0d8397c9588..aecadd471e1d 100644 --- a/net/netfilter/x_tables.c +++ b/net/netfilter/x_tables.c @@ -1178,12 +1178,7 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size) if (sz < sizeof(*info) || sz >= XT_MAX_TABLE_SIZE) return NULL;
- /* __GFP_NORETRY is not fully supported by kvmalloc but it should - * work reasonably well if sz is too large and bail out rather - * than shoot all processes down before realizing there is nothing - * more to reclaim. - */ - info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY); + info = kvmalloc(sz, GFP_KERNEL_ACCOUNT); if (!info) return NULL;
From: Vasily Gorbik gor@linux.ibm.com
[ Upstream commit f28bc3c32c059ab4d13f52155fabd3e20f477f65 ]
CC_FLAGS_FTRACE is exported and later used to remove ftrace-relevant build flags from files which should be built without ftrace support. For that reason add -mfentry to CC_FLAGS_FTRACE as well. That fixes a problem with the vdso32 build on s390, where -mfentry could not be used together with the -m31 flag.
At the same time flags like -pg and -mfentry are not relevant for asm files, so avoid adding them to KBUILD_AFLAGS.
Introduce CC_FLAGS_USING instead of CC_USING_FENTRY to collect -DCC_USING_FENTRY (and future alike) which are relevant for both KBUILD_CFLAGS and KBUILD_AFLAGS.
Link: http://lkml.kernel.org/r/patch-1.thread-aa7b8d.git-42971afe87de.your-ad-here...
Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- Makefile | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/Makefile b/Makefile index a41692c5827a..baee71ea3bec 100644 --- a/Makefile +++ b/Makefile @@ -743,12 +743,15 @@ ifdef CONFIG_FUNCTION_TRACER ifndef CC_FLAGS_FTRACE CC_FLAGS_FTRACE := -pg endif -export CC_FLAGS_FTRACE ifdef CONFIG_HAVE_FENTRY -CC_USING_FENTRY := $(call cc-option, -mfentry -DCC_USING_FENTRY) + ifeq ($(call cc-option-yn, -mfentry),y) + CC_FLAGS_FTRACE += -mfentry + CC_FLAGS_USING += -DCC_USING_FENTRY + endif endif -KBUILD_CFLAGS += $(CC_FLAGS_FTRACE) $(CC_USING_FENTRY) -KBUILD_AFLAGS += $(CC_USING_FENTRY) +export CC_FLAGS_FTRACE +KBUILD_CFLAGS += $(CC_FLAGS_FTRACE) $(CC_FLAGS_USING) +KBUILD_AFLAGS += $(CC_FLAGS_USING) ifdef CONFIG_DYNAMIC_FTRACE ifdef CONFIG_HAVE_C_RECORDMCOUNT BUILD_C_RECORDMCOUNT := y
From: Aleh Filipovich aleh@vaolix.com
[ Upstream commit 880b29ac107d15644bf4da228376ba3cd6af6d71 ]
Add an entry to the WMI keymap for the lid flip event on the Asus UX360.

On the Asus Zenbook UX360, flipping the lid from/to tablet mode triggers keyscan code 0xfa, which cannot be handled and results in the kernel log message "Unknown key fa pressed".
Signed-off-by: Aleh Filipovich aleh@appnexus.com Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- drivers/platform/x86/asus-nb-wmi.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/platform/x86/asus-nb-wmi.c b/drivers/platform/x86/asus-nb-wmi.c index 136ff2b4cce5..db2af09067db 100644 --- a/drivers/platform/x86/asus-nb-wmi.c +++ b/drivers/platform/x86/asus-nb-wmi.c @@ -496,6 +496,7 @@ static const struct key_entry asus_nb_wmi_keymap[] = { { KE_KEY, 0xC4, { KEY_KBDILLUMUP } }, { KE_KEY, 0xC5, { KEY_KBDILLUMDOWN } }, { KE_IGNORE, 0xC6, }, /* Ambient Light Sensor notification */ + { KE_KEY, 0xFA, { KEY_PROG2 } }, /* Lid flip action */ { KE_END, 0}, };
From: Florian Westphal fw@strlen.de
[ Upstream commit 3e673b23b541b8e7f773b2d378d6eb99831741cd ]
Shaochun Chen points out that we leak dumper filter state allocations stored in dump_control->data in case there is an error before netlink sets cb_running (after which ->done will be called at some point).
In order to fix this, add .start functions and move allocations there.
Same pattern as used in commit 90fd131afc565159c9e0ea742f082b337e10f8c6 ("netfilter: nf_tables: move dumper state allocation into ->start").
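The shape of the fix, as a hedged sketch (struct netlink_callback and netlink_dump_control are the real netlink types; the filter type and callback here are hypothetical):

#include <linux/netlink.h>
#include <linux/slab.h>
#include <linux/types.h>

struct example_filter {
	u32 mark;
	u32 mask;
};

/*
 * .start runs from netlink_dump_start() just before cb_running is
 * set; if it fails, netlink unwinds by itself, and once it succeeds
 * ->done is guaranteed to run, so the allocation can no longer leak.
 */
static int example_start(struct netlink_callback *cb)
{
	struct example_filter *filter;

	filter = kzalloc(sizeof(*filter), GFP_KERNEL);
	if (!filter)
		return -ENOMEM;

	cb->data = filter;	/* freed later by the ->done callback */
	return 0;
}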
Reported-by: shaochun chen cscnull@gmail.com Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- net/netfilter/nf_conntrack_netlink.c | 26 ++++++++++++++++--------- net/netfilter/nfnetlink_acct.c | 29 +++++++++++++--------------- 2 files changed, 30 insertions(+), 25 deletions(-)
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index 20a2e37c76d1..e952eedf44b4 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -821,6 +821,21 @@ ctnetlink_alloc_filter(const struct nlattr * const cda[]) #endif }
+static int ctnetlink_start(struct netlink_callback *cb) +{ + const struct nlattr * const *cda = cb->data; + struct ctnetlink_filter *filter = NULL; + + if (cda[CTA_MARK] && cda[CTA_MARK_MASK]) { + filter = ctnetlink_alloc_filter(cda); + if (IS_ERR(filter)) + return PTR_ERR(filter); + } + + cb->data = filter; + return 0; +} + static int ctnetlink_filter_match(struct nf_conn *ct, void *data) { struct ctnetlink_filter *filter = data; @@ -1240,19 +1255,12 @@ static int ctnetlink_get_conntrack(struct net *net, struct sock *ctnl,
if (nlh->nlmsg_flags & NLM_F_DUMP) { struct netlink_dump_control c = { + .start = ctnetlink_start, .dump = ctnetlink_dump_table, .done = ctnetlink_done, + .data = (void *)cda, };
- if (cda[CTA_MARK] && cda[CTA_MARK_MASK]) { - struct ctnetlink_filter *filter; - - filter = ctnetlink_alloc_filter(cda); - if (IS_ERR(filter)) - return PTR_ERR(filter); - - c.data = filter; - } return netlink_dump_start(ctnl, skb, nlh, &c); }
diff --git a/net/netfilter/nfnetlink_acct.c b/net/netfilter/nfnetlink_acct.c index a0e5adf0b3b6..8fa8bf7c48e6 100644 --- a/net/netfilter/nfnetlink_acct.c +++ b/net/netfilter/nfnetlink_acct.c @@ -238,29 +238,33 @@ static const struct nla_policy filter_policy[NFACCT_FILTER_MAX + 1] = { [NFACCT_FILTER_VALUE] = { .type = NLA_U32 }, };
-static struct nfacct_filter * -nfacct_filter_alloc(const struct nlattr * const attr) +static int nfnl_acct_start(struct netlink_callback *cb) { - struct nfacct_filter *filter; + const struct nlattr *const attr = cb->data; struct nlattr *tb[NFACCT_FILTER_MAX + 1]; + struct nfacct_filter *filter; int err;
+ if (!attr) + return 0; + err = nla_parse_nested(tb, NFACCT_FILTER_MAX, attr, filter_policy, NULL); if (err < 0) - return ERR_PTR(err); + return err;
if (!tb[NFACCT_FILTER_MASK] || !tb[NFACCT_FILTER_VALUE]) - return ERR_PTR(-EINVAL); + return -EINVAL;
filter = kzalloc(sizeof(struct nfacct_filter), GFP_KERNEL); if (!filter) - return ERR_PTR(-ENOMEM); + return -ENOMEM;
filter->mask = ntohl(nla_get_be32(tb[NFACCT_FILTER_MASK])); filter->value = ntohl(nla_get_be32(tb[NFACCT_FILTER_VALUE])); + cb->data = filter;
- return filter; + return 0; }
static int nfnl_acct_get(struct net *net, struct sock *nfnl, @@ -275,18 +279,11 @@ static int nfnl_acct_get(struct net *net, struct sock *nfnl, if (nlh->nlmsg_flags & NLM_F_DUMP) { struct netlink_dump_control c = { .dump = nfnl_acct_dump, + .start = nfnl_acct_start, .done = nfnl_acct_done, + .data = (void *)tb[NFACCT_FILTER], };
- if (tb[NFACCT_FILTER]) { - struct nfacct_filter *filter; - - filter = nfacct_filter_alloc(tb[NFACCT_FILTER]); - if (IS_ERR(filter)) - return PTR_ERR(filter); - - c.data = filter; - } return netlink_dump_start(nfnl, skb, nlh, &c); }
From: Daniel Borkmann daniel@iogearbox.net
[ Upstream commit 037b0b86ecf5646f8eae777d8b52ff8b401692ec ]
Let's not turn the TCP ULP lookup into an arbitrary module loader as we only intend to load ULP modules through this mechanism, not other unrelated kernel modules:
[root@bar]# cat foo.c
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/tcp.h>
#include <linux/in.h>

int main(void)
{
	int sock = socket(PF_INET, SOCK_STREAM, 0);
	setsockopt(sock, IPPROTO_TCP, TCP_ULP, "sctp", sizeof("sctp"));
	return 0;
}
[root@bar]# gcc foo.c -O2 -Wall
[root@bar]# lsmod | grep sctp
[root@bar]# ./a.out
[root@bar]# lsmod | grep sctp
sctp                 1077248  4
libcrc32c              16384  3 nf_conntrack,nf_nat,sctp
[root@bar]#
Fix it by adding a module alias to TCP ULP modules, so probing modules via request_module() will be limited to tcp-ulp-[name]. Existing modules like kTLS will load fine given the tcp-ulp-tls alias, but others will fail to load:
[root@bar]# lsmod | grep sctp
[root@bar]# ./a.out
[root@bar]# lsmod | grep sctp
[root@bar]#
Sockmap is not affected by this since it's either built-in or not.
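For illustration, a sketch of what a ULP module looks like under the new scheme (the "foo" ULP is hypothetical; tcp_register_ulp() and MODULE_ALIAS_TCP_ULP() are the real interfaces):

#include <linux/module.h>
#include <net/tcp.h>

static int foo_init(struct sock *sk)
{
	return 0;	/* set up per-socket ULP state here */
}

static struct tcp_ulp_ops foo_ulp_ops __read_mostly = {
	.name	= "foo",
	.owner	= THIS_MODULE,
	.init	= foo_init,
};

static int __init foo_register(void)
{
	return tcp_register_ulp(&foo_ulp_ops);
}

static void __exit foo_unregister(void)
{
	tcp_unregister_ulp(&foo_ulp_ops);
}

module_init(foo_register);
module_exit(foo_unregister);
MODULE_LICENSE("GPL");
/* setsockopt(TCP_ULP, "foo") now probes "tcp-ulp-foo", nothing else */
MODULE_ALIAS_TCP_ULP("foo");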
Fixes: 734942cc4ea6 ("tcp: ULP infrastructure") Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: John Fastabend john.fastabend@gmail.com Acked-by: Song Liu songliubraving@fb.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- include/net/tcp.h | 4 ++++ net/ipv4/tcp_ulp.c | 2 +- net/tls/tls_main.c | 1 + 3 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h index cd3ecda9386a..106e01c721e6 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -2023,6 +2023,10 @@ int tcp_set_ulp_id(struct sock *sk, const int ulp); void tcp_get_available_ulp(char *buf, size_t len); void tcp_cleanup_ulp(struct sock *sk);
+#define MODULE_ALIAS_TCP_ULP(name) \ + __MODULE_INFO(alias, alias_userspace, name); \ + __MODULE_INFO(alias, alias_tcp_ulp, "tcp-ulp-" name) + /* Call BPF_SOCK_OPS program that returns an int. If the return value * is < 0, then the BPF op failed (for example if the loaded BPF * program does not support the chosen operation or there is no BPF diff --git a/net/ipv4/tcp_ulp.c b/net/ipv4/tcp_ulp.c index d0bdfa02dea1..a5995bb2eaca 100644 --- a/net/ipv4/tcp_ulp.c +++ b/net/ipv4/tcp_ulp.c @@ -51,7 +51,7 @@ static const struct tcp_ulp_ops *__tcp_ulp_find_autoload(const char *name) #ifdef CONFIG_MODULES if (!ulp && capable(CAP_NET_ADMIN)) { rcu_read_unlock(); - request_module("%s", name); + request_module("tcp-ulp-%s", name); rcu_read_lock(); ulp = tcp_ulp_find(name); } diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c index 301f22430469..45188d920013 100644 --- a/net/tls/tls_main.c +++ b/net/tls/tls_main.c @@ -45,6 +45,7 @@ MODULE_AUTHOR("Mellanox Technologies"); MODULE_DESCRIPTION("Transport Layer Security Support"); MODULE_LICENSE("Dual BSD/GPL"); +MODULE_ALIAS_TCP_ULP("tls");
enum { TLSV4,
From: Richard Weinberger richard@nod.at
[ Upstream commit 25677478474a91fa1b46f19a4a591a9848bca6fb ]
We cannot do it last, otherwise it will be skipped for dynamic volumes.
Reported-by: Lachmann, Juergen juergen.lachmann@harman.com Fixes: 34653fd8c46e ("ubi: fastmap: Check each mapping only once") Signed-off-by: Richard Weinberger richard@nod.at Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- drivers/mtd/ubi/vtbl.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/drivers/mtd/ubi/vtbl.c b/drivers/mtd/ubi/vtbl.c index 94d7a865b135..7504f430c011 100644 --- a/drivers/mtd/ubi/vtbl.c +++ b/drivers/mtd/ubi/vtbl.c @@ -578,6 +578,16 @@ static int init_volumes(struct ubi_device *ubi, vol->ubi = ubi; reserved_pebs += vol->reserved_pebs;
+ /* + * We use ubi->peb_count and not vol->reserved_pebs because + * we want to keep the code simple. Otherwise we'd have to + * resize/check the bitmap upon volume resize too. + * Allocating a few bytes more does not hurt. + */ + err = ubi_fastmap_init_checkmap(vol, ubi->peb_count); + if (err) + return err; + /* * In case of dynamic volume UBI knows nothing about how many * data is stored there. So assume the whole volume is used. @@ -620,16 +630,6 @@ static int init_volumes(struct ubi_device *ubi, (long long)(vol->used_ebs - 1) * vol->usable_leb_size; vol->used_bytes += av->last_data_size; vol->last_eb_bytes = av->last_data_size; - - /* - * We use ubi->peb_count and not vol->reserved_pebs because - * we want to keep the code simple. Otherwise we'd have to - * resize/check the bitmap upon volume resize too. - * Allocating a few bytes more does not hurt. - */ - err = ubi_fastmap_init_checkmap(vol, ubi->peb_count); - if (err) - return err; }
/* And add the layout volume */
From: Gal Pressman pressmangal@gmail.com
[ Upstream commit a1ceeca679dccc492235f0f629d9e9f7b3d51ca8 ]
hns bitmap allocation functions return 0 on success and -1 on failure. Callers of these functions wrongly used their return value as an errno; fix that by making a proper conversion.
Fixes: a598c6f4c5a8 ("IB/hns: Simplify function of pd alloc and qp alloc") Signed-off-by: Gal Pressman pressmangal@gmail.com Acked-by: Lijun Ou oulijun@huawei.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- drivers/infiniband/hw/hns/hns_roce_pd.c | 2 +- drivers/infiniband/hw/hns/hns_roce_qp.c | 5 ++++- 2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_pd.c b/drivers/infiniband/hw/hns/hns_roce_pd.c index b9f2c871ff9a..e11c149da04d 100644 --- a/drivers/infiniband/hw/hns/hns_roce_pd.c +++ b/drivers/infiniband/hw/hns/hns_roce_pd.c @@ -37,7 +37,7 @@
static int hns_roce_pd_alloc(struct hns_roce_dev *hr_dev, unsigned long *pdn) { - return hns_roce_bitmap_alloc(&hr_dev->pd_bitmap, pdn); + return hns_roce_bitmap_alloc(&hr_dev->pd_bitmap, pdn) ? -ENOMEM : 0; }
static void hns_roce_pd_free(struct hns_roce_dev *hr_dev, unsigned long pdn) diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c index baaf906f7c2e..97664570c5ac 100644 --- a/drivers/infiniband/hw/hns/hns_roce_qp.c +++ b/drivers/infiniband/hw/hns/hns_roce_qp.c @@ -115,7 +115,10 @@ static int hns_roce_reserve_range_qp(struct hns_roce_dev *hr_dev, int cnt, { struct hns_roce_qp_table *qp_table = &hr_dev->qp_table;
- return hns_roce_bitmap_alloc_range(&qp_table->bitmap, cnt, align, base); + return hns_roce_bitmap_alloc_range(&qp_table->bitmap, cnt, align, + base) ? + -ENOMEM : + 0; }
enum hns_roce_qp_state to_hns_roce_state(enum ib_qp_state state)
From: Erik Schmauss erik.schmauss@intel.com
[ Upstream commit f016b19a9275089a2ab06c2144567c2ad8d5d6ad ]
The value coming from acpi_hw_read() should not be used if it returns an error code, so check the status returned by it before using that value in two places in acpi_hw_register_read().
Reported-by: Mark Gross mark.gross@intel.com Signed-off-by: Erik Schmauss erik.schmauss@intel.com [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- drivers/acpi/acpica/hwregs.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/acpica/hwregs.c b/drivers/acpi/acpica/hwregs.c index 3de794bcf8fa..69603ba52a3a 100644 --- a/drivers/acpi/acpica/hwregs.c +++ b/drivers/acpi/acpica/hwregs.c @@ -528,13 +528,18 @@ acpi_status acpi_hw_register_read(u32 register_id, u32 *return_value)
status = acpi_hw_read(&value64, &acpi_gbl_FADT.xpm2_control_block); - value = (u32)value64; + if (ACPI_SUCCESS(status)) { + value = (u32)value64; + } break;
case ACPI_REGISTER_PM_TIMER: /* 32-bit access */
status = acpi_hw_read(&value64, &acpi_gbl_FADT.xpm_timer_block); - value = (u32)value64; + if (ACPI_SUCCESS(status)) { + value = (u32)value64; + } + break;
case ACPI_REGISTER_SMI_COMMAND_BLOCK: /* 8-bit access */
From: Kim Phillips kim.phillips@arm.com
[ Upstream commit 344353366591acf659a0d0dea498611da78d67e2 ]
The auxtrace init variable 'err' was not being initialized, leading perf to abort early in an SPE record command when there was no explicit error, based only on whatever memory contents happened to be on the stack. Initialize it explicitly when the SPE recording structure is successfully obtained, the same way cs-etm does.
Signed-off-by: Kim Phillips kim.phillips@arm.com Cc: Adrian Hunter adrian.hunter@intel.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Dongjiu Geng gengdongjiu@huawei.com Cc: Jiri Olsa jolsa@redhat.com Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Fixes: ffd3d18c20b8 ("perf tools: Add ARM Statistical Profiling Extensions (SPE) support") Link: http://lkml.kernel.org/r/20180810174512.52900813e57cbccf18ce99a2@arm.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- tools/perf/arch/arm64/util/arm-spe.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c index 1120e39c1b00..5ccfce87e693 100644 --- a/tools/perf/arch/arm64/util/arm-spe.c +++ b/tools/perf/arch/arm64/util/arm-spe.c @@ -194,6 +194,7 @@ struct auxtrace_record *arm_spe_recording_init(int *err, sper->itr.read_finish = arm_spe_read_finish; sper->itr.alignment = 0;
+ *err = 0; return &sper->itr; }
From: Xi Wang wangxi11@huawei.com
[ Upstream commit 6c39d5278e62956238a681e4cfc69fae5507fc57 ]
According to the functional specification of the hardware, the first descriptor of the response to the command 'lookup vlan table' is not valid. Currently, the first descriptor is parsed as a normal value, which will cause an unexpected error.
This patch fixes this problem by skipping the first descriptor.
Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Xi Wang wangxi11@huawei.com Signed-off-by: Peng Li lipeng321@huawei.com Signed-off-by: Salil Mehta salil.mehta@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index d318d35e598f..6fd7ea8074b0 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -3911,7 +3911,7 @@ static bool hclge_is_all_function_id_zero(struct hclge_desc *desc) #define HCLGE_FUNC_NUMBER_PER_DESC 6 int i, j;
- for (i = 0; i < HCLGE_DESC_NUMBER; i++) + for (i = 1; i < HCLGE_DESC_NUMBER; i++) for (j = 0; j < HCLGE_FUNC_NUMBER_PER_DESC; j++) if (desc[i].data[j]) return false;
From: Jens Axboe axboe@kernel.dk
[ Upstream commit b089cfd95d32638335c551651a8e00fd2c4edb0b ]
Don't warn for a flush issued to a read-only device. It's not strictly a writable command, as it doesn't change any on-media data by itself.
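A hedged sketch of the relaxed check (bio_op(), op_is_write() and op_is_flush() are the real block-layer helpers; the function itself is illustrative):

#include <linux/blk_types.h>

/*
 * Writes to a read-only partition are still rejected, but a flush
 * carries no payload of its own, so it is allowed through.
 */
static bool bio_violates_ro(struct bio *bio, bool read_only)
{
	const int op = bio_op(bio);

	return read_only && op_is_write(op) && !op_is_flush(op);
}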
Reported-by: Stefan Agner stefan@agner.ch Fixes: 721c7fc701c7 ("block: fail op_is_write() requests to read-only partitions") Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- block/blk-core.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/block/blk-core.c b/block/blk-core.c index ee33590f54eb..3bb7237a2384 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -2152,7 +2152,9 @@ static inline bool should_fail_request(struct hd_struct *part,
static inline bool bio_check_ro(struct bio *bio, struct hd_struct *part) { - if (part->policy && op_is_write(bio_op(bio))) { + const int op = bio_op(bio); + + if (part->policy && (op_is_write(op) && !op_is_flush(op))) { char b[BDEVNAME_SIZE];
WARN_ONCE(1,
From: Jian Shen shenjian15@huawei.com
[ Upstream commit 60081dcc4fce385ade26d3145b2479789df0b7e5 ]
For marvell phy m88e1510, bit SUPPORTED_FIBRE of phydev->supported is default on. Both phy_resume() and phy_suspend() will check the SUPPORTED_FIBRE bit and write register of fibre page.
Currently in the hns3 driver, the SUPPORTED_FIBRE bit will be cleared after phy_connect_direct() finishes. Because phy_resume() is called in phy_connect_direct(), and phy_suspend() is called when disconnecting the phy device, the operations on the fibre page register are not symmetrical. This causes a phy link issue when reloading the hns3 driver.
This patch fixes it by disabling SUPPORTED_FIBRE before connecting the phy.
Fixes: 256727da7395 ("net: hns3: Add MDIO support to HNS3 Ethernet driver for hip08 SoC") Signed-off-by: Jian Shen shenjian15@huawei.com Signed-off-by: Peng Li lipeng321@huawei.com Signed-off-by: Salil Mehta salil.mehta@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c index 9f7932e423b5..6315e8ad8467 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c @@ -208,6 +208,8 @@ int hclge_mac_start_phy(struct hclge_dev *hdev) if (!phydev) return 0;
+ phydev->supported &= ~SUPPORTED_FIBRE; + ret = phy_connect_direct(netdev, phydev, hclge_mac_adjust_link, PHY_INTERFACE_MODE_SGMII);
From: Myron Stowe myron.stowe@redhat.com
[ Upstream commit 9f0e89359775ee21fe1ea732e34edb52aef5addf ]
In commit 27d868b5e6cf ("PCI: Set MPS to match upstream bridge"), we made sure every device's MPS setting matches its upstream bridge, making it more likely that a hot-added device will work in a system with an optimized MPS configuration.
Recently I've started encountering systems where the endpoint device's MPSS capability is less than its Root Port's current MPS value, thus the endpoint is not capable of matching its upstream bridge's MPS setting (see: bugzilla via "Link:" below). This leaves the system vulnerable - the upstream Root Port could respond with larger TLPs than the device can handle, and the device will consider them to be 'Malformed'.
One could use the "pci=pcie_bus_safe" kernel parameter to work around the issue, but that forces a user to supply a kernel parameter to get the system to function reliably and may end up limiting MPS settings of other unrelated sub-topologies which could benefit from maintaining their larger values.
Augment Keith's approach to include tuning down a Root Port's MPS setting when its hot-added endpoint device is not capable of matching it.
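To make the arithmetic concrete: the MPSS field is a power-of-two exponent over a 128-byte base, so an endpoint with pcie_mpss == 0 supports at most 128-byte TLPs, and a Root Port currently running at MPS 256 gets tuned down to 128. A sketch of the decoding (the helper name is illustrative; 128 << pcie_mpss is the real kernel computation):

#include <linux/types.h>

/* MPSS is a power-of-two exponent over a 128-byte base. */
static unsigned int pcie_mpss_bytes(u8 pcie_mpss)
{
	return 128U << pcie_mpss;
}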
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200527 Signed-off-by: Myron Stowe myron.stowe@redhat.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Acked-by: Jon Mason jdmason@kudzu.us Cc: Keith Busch keith.busch@intel.com Cc: Sinan Kaya okaya@kernel.org Cc: Dongdong Liu liudongdong3@huawei.com Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- drivers/pci/probe.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index b2857865c0aa..a1a243ee36bb 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1725,7 +1725,7 @@ int pci_setup_device(struct pci_dev *dev) static void pci_configure_mps(struct pci_dev *dev) { struct pci_dev *bridge = pci_upstream_bridge(dev); - int mps, p_mps, rc; + int mps, mpss, p_mps, rc;
if (!pci_is_pcie(dev) || !bridge || !pci_is_pcie(bridge)) return; @@ -1753,6 +1753,14 @@ static void pci_configure_mps(struct pci_dev *dev) if (pcie_bus_config != PCIE_BUS_DEFAULT) return;
+ mpss = 128 << dev->pcie_mpss; + if (mpss < p_mps && pci_pcie_type(bridge) == PCI_EXP_TYPE_ROOT_PORT) { + pcie_set_mps(bridge, mpss); + pci_info(dev, "Upstream bridge's Max Payload Size set to %d (was %d, max %d)\n", + mpss, p_mps, 128 << bridge->pcie_mpss); + p_mps = pcie_get_mps(bridge); + } + rc = pcie_set_mps(dev, p_mps); if (rc) { pci_warn(dev, "can't set Max Payload Size to %d; if necessary, use \"pci=pcie_bus_safe\" and report a bug\n", @@ -1761,7 +1769,7 @@ static void pci_configure_mps(struct pci_dev *dev) }
pci_info(dev, "Max Payload Size set to %d (was %d, max %d)\n", - p_mps, mps, 128 << dev->pcie_mpss); + p_mps, mps, mpss); }
static struct hpp_type0 pci_default_type0 = {
From: Nicholas Kazlauskas nicholas.kazlauskas@amd.com
[ Upstream commit dddc0557e3a02499ce336b1e2e67f5afaecccc80 ]
[Why]
A null pointer dereference can occur if crtc is null in amdgpu_dm_crtc_handle_crc_irq. This can happen if get_crtc_by_otg_inst returns NULL during dm_crtc_high_irq, leading to a hang in some IGT test cases.
[How]
Check that CRTC is non-null before accessing its fields.
Signed-off-by: Nicholas Kazlauskas nicholas.kazlauskas@amd.com Reviewed-by: Sun peng Li Sunpeng.Li@amd.com Acked-by: Leo Li sunpeng.li@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crc.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crc.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crc.c index 52f2c01349e3..9bfb040352e9 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crc.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crc.c @@ -98,10 +98,16 @@ int amdgpu_dm_crtc_set_crc_source(struct drm_crtc *crtc, const char *src_name, */ void amdgpu_dm_crtc_handle_crc_irq(struct drm_crtc *crtc) { - struct dm_crtc_state *crtc_state = to_dm_crtc_state(crtc->state); - struct dc_stream_state *stream_state = crtc_state->stream; + struct dm_crtc_state *crtc_state; + struct dc_stream_state *stream_state; uint32_t crcs[3];
+ if (crtc == NULL) + return; + + crtc_state = to_dm_crtc_state(crtc->state); + stream_state = crtc_state->stream; + /* Early return if CRC capture is not enabled. */ if (!crtc_state->crc_enabled) return;
From: Denis Efremov efremov@linux.com
[ Upstream commit 512ddf7d7db056edfed3159ea7cb4e4a5eefddd4 ]
If coccicheck fails, it should return an error code distinct from zero to signal an internal problem. Instead of exiting with the tool's error code, the current code returns the error code of 'echo "coccicheck failed"', which almost always equals zero, thus defeating the original intention of alerting about a problem. This patch fixes the code.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Denis Efremov efremov@linux.com Acked-by: Julia Lawall julia.lawall@lip6.fr Signed-off-by: Masahiro Yamada yamada.masahiro@socionext.com Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- scripts/coccicheck | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/scripts/coccicheck b/scripts/coccicheck index 9fedca611b7f..e04d328210ac 100755 --- a/scripts/coccicheck +++ b/scripts/coccicheck @@ -128,9 +128,10 @@ run_cmd_parmap() { fi echo $@ >>$DEBUG_FILE $@ 2>>$DEBUG_FILE - if [[ $? -ne 0 ]]; then + err=$? + if [[ $err -ne 0 ]]; then echo "coccicheck failed" - exit $? + exit $err fi }
From: Benno Evers bevers@mesosphere.com
[ Upstream commit 3f4417d693b43fa240ac8bde4487f67745ca23d8 ]
The argument to nsinfo__copy() was assumed to be valid, but some code paths exist that will lead to NULL being passed.
In particular, running 'perf script -D' on a perf.data file containing a PERF_RECORD_MMAP event associating the '[vdso]' dso with pid 0 earlier in the event stream will lead to a segfault.
Since all calling code is already checking for a non-null return value, just return NULL for this case as well.
Signed-off-by: Benno Evers bevers@mesosphere.com Acked-by: Namhyung Kim namhyung@kernel.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Jiri Olsa jolsa@redhat.com Cc: Krister Johansen kjlx@templeofstupid.com Cc: Peter Zijlstra peterz@infradead.org Link: http://lkml.kernel.org/r/20180810133614.9925-1-bevers@mesosphere.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- tools/perf/util/namespaces.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/tools/perf/util/namespaces.c b/tools/perf/util/namespaces.c index 5be021701f34..cf8bd123cf73 100644 --- a/tools/perf/util/namespaces.c +++ b/tools/perf/util/namespaces.c @@ -139,6 +139,9 @@ struct nsinfo *nsinfo__copy(struct nsinfo *nsi) { struct nsinfo *nnsi;
+ if (nsi == NULL) + return NULL; + nnsi = calloc(1, sizeof(*nnsi)); if (nnsi != NULL) { nnsi->pid = nsi->pid;
From: Chao Yu yuchao0@huawei.com
[ Upstream commit c7079853c859c910b9d047a37891b4aafb8f8dd7 ]
Thread A                               Background GC
- f2fs_zero_range
 - truncate_pagecache_range
                                       - gc_data_segment
                                        - get_read_data_page
                                        - move_data_page
                                         - set_page_dirty
                                         - set_cold_data
 - f2fs_do_zero_range
  - dn->data_blkaddr = NEW_ADDR;
  - f2fs_set_data_blkaddr
Actually, we don't need to set dirty & checked flag on the page, since all valid data in the page should be zeroed by zero_range().
Use i_gc_rwsem[WRITE] to avoid such race condition.
Signed-off-by: Chao Yu yuchao0@huawei.com Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- fs/f2fs/file.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 6880c6f78d58..abb0feffff11 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1295,8 +1295,6 @@ static int f2fs_zero_range(struct inode *inode, loff_t offset, loff_t len, if (ret) goto out_sem;
- truncate_pagecache_range(inode, offset, offset + len - 1); - pg_start = ((unsigned long long) offset) >> PAGE_SHIFT; pg_end = ((unsigned long long) offset + len) >> PAGE_SHIFT;
@@ -1326,12 +1324,19 @@ static int f2fs_zero_range(struct inode *inode, loff_t offset, loff_t len, unsigned int end_offset; pgoff_t end;
+ down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); + + truncate_pagecache_range(inode, + (loff_t)index << PAGE_SHIFT, + ((loff_t)pg_end << PAGE_SHIFT) - 1); + f2fs_lock_op(sbi);
set_new_dnode(&dn, inode, NULL, NULL, 0); ret = f2fs_get_dnode_of_data(&dn, index, ALLOC_NODE); if (ret) { f2fs_unlock_op(sbi); + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); goto out; }
@@ -1340,7 +1345,9 @@ static int f2fs_zero_range(struct inode *inode, loff_t offset, loff_t len,
ret = f2fs_do_zero_range(&dn, index, end); f2fs_put_dnode(&dn); + f2fs_unlock_op(sbi); + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
f2fs_balance_fs(sbi, dn.node_changed);
From: Chao Yu yuchao0@huawei.com
[ Upstream commit a33c150237a20d97a174243bc658c86502f9d370 ]
Thread A                               Background GC
- f2fs_setattr isize to 0
 - truncate_setsize
                                       - gc_data_segment
                                        - f2fs_get_read_data_page page #0
                                         - set_page_dirty
                                         - set_cold_data
- f2fs_truncate

- f2fs_setattr isize to 4k
- read 4k  <--- hit data in cached page #0
Above race condition can cause read out invalid data in a truncated page, fix it by i_gc_rwsem[WRITE] lock.
Signed-off-by: Chao Yu yuchao0@huawei.com Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- fs/f2fs/data.c | 4 ++++ fs/f2fs/file.c | 37 +++++++++++++++++++++++-------------- 2 files changed, 27 insertions(+), 14 deletions(-)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 8f931d699287..8206389e84c0 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -2149,8 +2149,12 @@ static void f2fs_write_failed(struct address_space *mapping, loff_t to)
if (to > i_size) { down_write(&F2FS_I(inode)->i_mmap_sem); + down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); + truncate_pagecache(inode, i_size); f2fs_truncate_blocks(inode, i_size, true); + + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); up_write(&F2FS_I(inode)->i_mmap_sem); } } diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index abb0feffff11..3ffa341cf586 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -782,22 +782,26 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr) }
if (attr->ia_valid & ATTR_SIZE) { - if (attr->ia_size <= i_size_read(inode)) { - down_write(&F2FS_I(inode)->i_mmap_sem); - truncate_setsize(inode, attr->ia_size); + bool to_smaller = (attr->ia_size <= i_size_read(inode)); + + down_write(&F2FS_I(inode)->i_mmap_sem); + down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); + + truncate_setsize(inode, attr->ia_size); + + if (to_smaller) err = f2fs_truncate(inode); - up_write(&F2FS_I(inode)->i_mmap_sem); - if (err) - return err; - } else { - /* - * do not trim all blocks after i_size if target size is - * larger than i_size. - */ - down_write(&F2FS_I(inode)->i_mmap_sem); - truncate_setsize(inode, attr->ia_size); - up_write(&F2FS_I(inode)->i_mmap_sem); + /* + * do not trim all blocks after i_size if target size is + * larger than i_size. + */ + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); + up_write(&F2FS_I(inode)->i_mmap_sem);
+ if (err) + return err; + + if (!to_smaller) { /* should convert inline inode here */ if (!f2fs_may_inline_data(inode)) { err = f2fs_convert_inline_inode(inode); @@ -944,13 +948,18 @@ static int punch_hole(struct inode *inode, loff_t offset, loff_t len)
blk_start = (loff_t)pg_start << PAGE_SHIFT; blk_end = (loff_t)pg_end << PAGE_SHIFT; + down_write(&F2FS_I(inode)->i_mmap_sem); + down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); + truncate_inode_pages_range(mapping, blk_start, blk_end - 1);
f2fs_lock_op(sbi); ret = f2fs_truncate_hole(inode, pg_start, pg_end); f2fs_unlock_op(sbi); + + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); up_write(&F2FS_I(inode)->i_mmap_sem); } }
From: Palmer Dabbelt palmer@sifive.com
[ Upstream commit 4938c79bd0f5f3650c8c2cd4cdc972f0a6962ce4 ]
If you use a 64-bit compiler to build a 32-bit kernel then you'll get an error when building the vDSO due to a library mismatch. This happens because the relevant "-march" argument isn't supplied to the GCC run that generates one of the vDSO intermediate files.
I'm not actually sure what the right thing to do here is as I'm not particularly familiar with the kernel build system. I poked the documentation and it appears that KCFLAGS is the correct thing to use (it's suggested that it should be used when building modules), but we set KBUILD_CFLAGS in arch/riscv/Makefile.
This does at least fix the build error.
Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Palmer Dabbelt palmer@sifive.com Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- arch/riscv/kernel/vdso/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile index f6561b783b61..eed1c137f618 100644 --- a/arch/riscv/kernel/vdso/Makefile +++ b/arch/riscv/kernel/vdso/Makefile @@ -52,8 +52,8 @@ $(obj)/%.so: $(obj)/%.so.dbg FORCE # Add -lgcc so rv32 gets static muldi3 and lshrdi3 definitions. # Make sure only to export the intended __vdso_xxx symbol offsets. quiet_cmd_vdsold = VDSOLD $@ - cmd_vdsold = $(CC) $(KCFLAGS) $(call cc-option, -no-pie) -nostdlib $(SYSCFLAGS_$(@F)) \ - -Wl,-T,$(filter-out FORCE,$^) -o $@.tmp -lgcc && \ + cmd_vdsold = $(CC) $(KBUILD_CFLAGS) $(call cc-option, -no-pie) -nostdlib -nostartfiles $(SYSCFLAGS_$(@F)) \ + -Wl,-T,$(filter-out FORCE,$^) -o $@.tmp && \ $(CROSS_COMPILE)objcopy \ $(patsubst %, -G __vdso_%, $(vdso-syms)) $@.tmp $@
From: Dan Carpenter dan.carpenter@oracle.com
[ Upstream commit 4096165d55218a6f58b6c2ebc5d2428aa0aa70e4 ]
If there are any errors in stm32_exti_host_init() then it leads to a NULL dereference in the callers. The function should clean up after itself.
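The general unwind pattern the fix adopts, as a self-contained sketch (all names here are hypothetical stand-ins for the stm32-exti ones):

#include <linux/of_address.h>
#include <linux/slab.h>

struct example_host {
	void __iomem *base;
	void *chips_data;
};

/*
 * Each failure point releases everything allocated before it, so a
 * caller that sees NULL has nothing left to clean up.
 */
static struct example_host *example_host_init(struct device_node *node)
{
	struct example_host *hd;

	hd = kzalloc(sizeof(*hd), GFP_KERNEL);
	if (!hd)
		return NULL;

	hd->chips_data = kcalloc(16, sizeof(u32), GFP_KERNEL);
	if (!hd->chips_data)
		goto free_hd;

	hd->base = of_iomap(node, 0);
	if (!hd->base)
		goto free_chips;

	return hd;

free_chips:
	kfree(hd->chips_data);
free_hd:
	kfree(hd);
	return NULL;
}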
Fixes: f9fc1745501e ("irqchip/stm32: Add host and driver data structures") Reviewed-by: Ludovic Barre ludovic.barre@st.com Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- drivers/irqchip/irq-stm32-exti.c | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-)
diff --git a/drivers/irqchip/irq-stm32-exti.c b/drivers/irqchip/irq-stm32-exti.c index 3a7e8905a97e..880e48947576 100644 --- a/drivers/irqchip/irq-stm32-exti.c +++ b/drivers/irqchip/irq-stm32-exti.c @@ -602,17 +602,24 @@ stm32_exti_host_data *stm32_exti_host_init(const struct stm32_exti_drv_data *dd, sizeof(struct stm32_exti_chip_data), GFP_KERNEL); if (!host_data->chips_data) - return NULL; + goto free_host_data;
host_data->base = of_iomap(node, 0); if (!host_data->base) { pr_err("%pOF: Unable to map registers\n", node); - return NULL; + goto free_chips_data; }
stm32_host_data = host_data;
return host_data; + +free_chips_data: + kfree(host_data->chips_data); +free_host_data: + kfree(host_data); + + return NULL; }
static struct @@ -664,10 +671,8 @@ static int __init stm32_exti_init(const struct stm32_exti_drv_data *drv_data, struct irq_domain *domain;
host_data = stm32_exti_host_init(drv_data, node); - if (!host_data) { - ret = -ENOMEM; - goto out_free_mem; - } + if (!host_data) + return -ENOMEM;
domain = irq_domain_add_linear(node, drv_data->bank_nr * IRQS_PER_BANK, &irq_exti_domain_ops, NULL); @@ -724,7 +729,6 @@ static int __init stm32_exti_init(const struct stm32_exti_drv_data *drv_data, irq_domain_remove(domain); out_unmap: iounmap(host_data->base); -out_free_mem: kfree(host_data->chips_data); kfree(host_data); return ret; @@ -751,10 +755,8 @@ __init stm32_exti_hierarchy_init(const struct stm32_exti_drv_data *drv_data, }
host_data = stm32_exti_host_init(drv_data, node); - if (!host_data) { - ret = -ENOMEM; - goto out_free_mem; - } + if (!host_data) + return -ENOMEM;
for (i = 0; i < drv_data->bank_nr; i++) stm32_exti_chip_init(host_data, i, node); @@ -776,7 +778,6 @@ __init stm32_exti_hierarchy_init(const struct stm32_exti_drv_data *drv_data,
out_unmap: iounmap(host_data->base); -out_free_mem: kfree(host_data->chips_data); kfree(host_data); return ret;
From: Jonas Gorski jonas.gorski@gmail.com
[ Upstream commit 0702bc4d2fe793018ad9aa0eb14bff7f526c4095 ]
When compiling bmips with SMP disabled, the build fails with:
drivers/irqchip/irq-bcm7038-l1.o: In function `bcm7038_l1_cpu_offline':
drivers/irqchip/irq-bcm7038-l1.c:242: undefined reference to `irq_set_affinity_locked'
make[5]: *** [vmlinux] Error 1
Fix this by defining and hooking up bcm7038_l1_cpu_offline only when actually compiling for SMP. It wouldn't have been used anyway, as it requires CPU_HOTPLUG, which in turn requires SMP.
Fixes: 34c535793bcb ("irqchip/bcm7038-l1: Implement irq_cpu_offline() callback") Signed-off-by: Jonas Gorski jonas.gorski@gmail.com Signed-off-by: Marc Zyngier marc.zyngier@arm.com Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- drivers/irqchip/irq-bcm7038-l1.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/irqchip/irq-bcm7038-l1.c b/drivers/irqchip/irq-bcm7038-l1.c index faf734ff4cf3..0f6e30e9009d 100644 --- a/drivers/irqchip/irq-bcm7038-l1.c +++ b/drivers/irqchip/irq-bcm7038-l1.c @@ -217,6 +217,7 @@ static int bcm7038_l1_set_affinity(struct irq_data *d, return 0; }
+#ifdef CONFIG_SMP static void bcm7038_l1_cpu_offline(struct irq_data *d) { struct cpumask *mask = irq_data_get_affinity_mask(d); @@ -241,6 +242,7 @@ static void bcm7038_l1_cpu_offline(struct irq_data *d) } irq_set_affinity_locked(d, &new_affinity, false); } +#endif
static int __init bcm7038_l1_init_one(struct device_node *dn, unsigned int idx, @@ -293,7 +295,9 @@ static struct irq_chip bcm7038_l1_irq_chip = { .irq_mask = bcm7038_l1_mask, .irq_unmask = bcm7038_l1_unmask, .irq_set_affinity = bcm7038_l1_set_affinity, +#ifdef CONFIG_SMP .irq_cpu_offline = bcm7038_l1_cpu_offline, +#endif };
static int bcm7038_l1_map(struct irq_domain *d, unsigned int virq,
From: Tomas Bortoli tomasbortoli@gmail.com
[ Upstream commit 9f476d7c540cb57556d3cc7e78704e6cd5100f5f ]
It may be possible to run p9_fd_cancel() with a deleted req->req_list and incur a double del. To fix this, hold client->lock while changing the status, so that the other threads will be synchronized.
Link: http://lkml.kernel.org/r/20180723184253.6682-1-tomasbortoli@gmail.com Signed-off-by: Tomas Bortoli tomasbortoli@gmail.com Reported-by: syzbot+735d926e9d1317c3310c@syzkaller.appspotmail.com To: Eric Van Hensbergen ericvh@gmail.com To: Ron Minnich rminnich@sandia.gov To: Latchesar Ionkov lucho@ionkov.net Cc: Yiwen Jiang jiangyiwen@huwei.com Cc: David S. Miller davem@davemloft.net Signed-off-by: Dominique Martinet dominique.martinet@cea.fr Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- net/9p/trans_fd.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c index 588bf88c3305..f9f96d50d96d 100644 --- a/net/9p/trans_fd.c +++ b/net/9p/trans_fd.c @@ -197,15 +197,14 @@ static void p9_mux_poll_stop(struct p9_conn *m) static void p9_conn_cancel(struct p9_conn *m, int err) { struct p9_req_t *req, *rtmp; - unsigned long flags; LIST_HEAD(cancel_list);
p9_debug(P9_DEBUG_ERROR, "mux %p err %d\n", m, err);
- spin_lock_irqsave(&m->client->lock, flags); + spin_lock(&m->client->lock);
if (m->err) { - spin_unlock_irqrestore(&m->client->lock, flags); + spin_unlock(&m->client->lock); return; }
@@ -217,7 +216,6 @@ static void p9_conn_cancel(struct p9_conn *m, int err) list_for_each_entry_safe(req, rtmp, &m->unsent_req_list, req_list) { list_move(&req->req_list, &cancel_list); } - spin_unlock_irqrestore(&m->client->lock, flags);
list_for_each_entry_safe(req, rtmp, &cancel_list, req_list) { p9_debug(P9_DEBUG_ERROR, "call back req %p\n", req); @@ -226,6 +224,7 @@ static void p9_conn_cancel(struct p9_conn *m, int err) req->t_err = err; p9_client_cb(m->client, req, REQ_STATUS_ERROR); } + spin_unlock(&m->client->lock); }
static __poll_t @@ -373,8 +372,9 @@ static void p9_read_work(struct work_struct *work) if (m->req->status != REQ_STATUS_ERROR) status = REQ_STATUS_RCVD; list_del(&m->req->req_list); - spin_unlock(&m->client->lock); + /* update req->status while holding client->lock */ p9_client_cb(m->client, m->req, status); + spin_unlock(&m->client->lock); m->rc.sdata = NULL; m->rc.offset = 0; m->rc.capacity = 0;
From: Jean-Philippe Brucker jean-philippe.brucker@arm.com
[ Upstream commit 92aef4675d5b1b55404e1532379e343bed0e5cf2 ]
Currently when virtio_find_single_vq fails, we go through del_vqs which throws a warning (Trying to free already-free IRQ). Skip del_vqs if vq allocation failed.
Link: http://lkml.kernel.org/r/20180524101021.49880-1-jean-philippe.brucker@arm.co... Signed-off-by: Jean-Philippe Brucker jean-philippe.brucker@arm.com Reviewed-by: Greg Kurz groug@kaod.org Cc: Eric Van Hensbergen ericvh@gmail.com Cc: Ron Minnich rminnich@sandia.gov Cc: Latchesar Ionkov lucho@ionkov.net Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Dominique Martinet dominique.martinet@cea.fr Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- net/9p/trans_virtio.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c index 05006cbb3361..eaacce086427 100644 --- a/net/9p/trans_virtio.c +++ b/net/9p/trans_virtio.c @@ -563,7 +563,7 @@ static int p9_virtio_probe(struct virtio_device *vdev) chan->vq = virtio_find_single_vq(vdev, req_done, "requests"); if (IS_ERR(chan->vq)) { err = PTR_ERR(chan->vq); - goto out_free_vq; + goto out_free_chan; } chan->vq->vdev->priv = chan; spin_lock_init(&chan->lock); @@ -616,6 +616,7 @@ static int p9_virtio_probe(struct virtio_device *vdev) kfree(tag); out_free_vq: vdev->config->del_vqs(vdev); +out_free_chan: kfree(chan); fail: return err;
From: Chao Yu yuchao0@huawei.com
[ Upstream commit 66110abc4c931f879d70e83e1281f891699364bf ]
The PG_checked flag will be set on a data page during GC; later, we can recognize such a page by the flag and migrate it to a cold segment.
But previously, we did not clear this flag when invalidating the data page; after the page is redirtied, we will write it into the wrong log.
Let's clear PG_checked flag in set_page_dirty() to avoid this.
Signed-off-by: Weichao Guo guoweichao@huawei.com Signed-off-by: Chao Yu yuchao0@huawei.com Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- fs/f2fs/data.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 8206389e84c0..b61954d40c25 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -2494,6 +2494,10 @@ static int f2fs_set_data_page_dirty(struct page *page) if (!PageUptodate(page)) SetPageUptodate(page);
+ /* don't remain PG_checked flag which was set during GC */ + if (is_cold_data(page)) + clear_cold_data(page); + if (f2fs_is_atomic_file(inode) && !f2fs_is_commit_atomic_write(inode)) { if (!IS_ATOMIC_WRITTEN_PAGE(page)) { f2fs_register_inmem_page(inode, page);
From: Anton Vasilyev vasilyev@ispras.ru
[ Upstream commit 504c76979bccec66e4c2e41f6a006e49e284466f ]
There is no check that the allocation in axp20x_funcs_groups_from_mask() is successful. The patch adds the corresponding check and return values.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Vasilyev vasilyev@ispras.ru Acked-by: Chen-Yu Tsai wens@csie.org Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- drivers/pinctrl/pinctrl-axp209.c | 26 ++++++++++++++++++++------ 1 file changed, 20 insertions(+), 6 deletions(-)
diff --git a/drivers/pinctrl/pinctrl-axp209.c b/drivers/pinctrl/pinctrl-axp209.c index a52779f33ad4..afd0b533c40a 100644 --- a/drivers/pinctrl/pinctrl-axp209.c +++ b/drivers/pinctrl/pinctrl-axp209.c @@ -316,7 +316,7 @@ static const struct pinctrl_ops axp20x_pctrl_ops = { .get_group_pins = axp20x_group_pins, };
-static void axp20x_funcs_groups_from_mask(struct device *dev, unsigned int mask, +static int axp20x_funcs_groups_from_mask(struct device *dev, unsigned int mask, unsigned int mask_len, struct axp20x_pinctrl_function *func, const struct pinctrl_pin_desc *pins) @@ -331,18 +331,22 @@ static void axp20x_funcs_groups_from_mask(struct device *dev, unsigned int mask, func->groups = devm_kcalloc(dev, ngroups, sizeof(const char *), GFP_KERNEL); + if (!func->groups) + return -ENOMEM; group = func->groups; for_each_set_bit(bit, &mask_cpy, mask_len) { *group = pins[bit].name; group++; } } + + return 0; }
-static void axp20x_build_funcs_groups(struct platform_device *pdev) +static int axp20x_build_funcs_groups(struct platform_device *pdev) { struct axp20x_pctl *pctl = platform_get_drvdata(pdev); - int i, pin, npins = pctl->desc->npins; + int i, ret, pin, npins = pctl->desc->npins;
pctl->funcs[AXP20X_FUNC_GPIO_OUT].name = "gpio_out"; pctl->funcs[AXP20X_FUNC_GPIO_OUT].muxval = AXP20X_MUX_GPIO_OUT; @@ -366,13 +370,19 @@ static void axp20x_build_funcs_groups(struct platform_device *pdev) pctl->funcs[i].groups[pin] = pctl->desc->pins[pin].name; }
- axp20x_funcs_groups_from_mask(&pdev->dev, pctl->desc->ldo_mask, + ret = axp20x_funcs_groups_from_mask(&pdev->dev, pctl->desc->ldo_mask, npins, &pctl->funcs[AXP20X_FUNC_LDO], pctl->desc->pins); + if (ret) + return ret;
- axp20x_funcs_groups_from_mask(&pdev->dev, pctl->desc->adc_mask, + ret = axp20x_funcs_groups_from_mask(&pdev->dev, pctl->desc->adc_mask, npins, &pctl->funcs[AXP20X_FUNC_ADC], pctl->desc->pins); + if (ret) + return ret; + + return 0; }
static const struct of_device_id axp20x_pctl_match[] = { @@ -424,7 +434,11 @@ static int axp20x_pctl_probe(struct platform_device *pdev)
platform_set_drvdata(pdev, pctl);
- axp20x_build_funcs_groups(pdev); + ret = axp20x_build_funcs_groups(pdev); + if (ret) { + dev_err(&pdev->dev, "failed to build groups\n"); + return ret; + }
pctrl_desc = devm_kzalloc(&pdev->dev, sizeof(*pctrl_desc), GFP_KERNEL); if (!pctrl_desc)
From: Yonghong Song yhs@fb.com
[ Upstream commit dc1508a579e682a1e5f1ed0753390e0aa7c23a97 ]
In function map_seq_next() of kernel/bpf/inode.c, the first key will be "0" regardless of the map type. This works for arrays. But for hash maps, if key "0" happens to be in the map, the bpffs map show will miss some items if key "0" is not the first element of the first bucket.
This patch fixes the issue by guaranteeing to get the first element when seq_show has just started, by passing a NULL key pointer to the map_get_next_key() callback. This way, no elements will be missed in the bpffs hash table show even if key "0" is in the map.
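A minimal sketch of the convention the fix relies on (map_get_next_key() is the real map op; the wrapper is hypothetical):

#include <linux/bpf.h>

/*
 * Passing a NULL previous key asks the map for its first element,
 * which is well defined for arrays and hash maps alike; a non-zero
 * return means the map is empty.
 */
static int map_first_key(struct bpf_map *map, void *key_buf)
{
	return map->ops->map_get_next_key(map, NULL, key_buf);
}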
Fixes: a26ca7c982cb5 ("bpf: btf: Add pretty print support to the basic arraymap") Acked-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Yonghong Song yhs@fb.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- kernel/bpf/inode.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c index 76efe9a183f5..fc5b103512e7 100644 --- a/kernel/bpf/inode.c +++ b/kernel/bpf/inode.c @@ -196,19 +196,21 @@ static void *map_seq_next(struct seq_file *m, void *v, loff_t *pos) { struct bpf_map *map = seq_file_to_map(m); void *key = map_iter(m)->key; + void *prev_key;
if (map_iter(m)->done) return NULL;
if (unlikely(v == SEQ_START_TOKEN)) - goto done; + prev_key = NULL; + else + prev_key = key;
- if (map->ops->map_get_next_key(map, key, key)) { + if (map->ops->map_get_next_key(map, prev_key, key)) { map_iter(m)->done = true; return NULL; }
-done: ++(*pos); return key; }
From: Michael Ellerman mpe@ellerman.id.au
[ Upstream commit f7a6947cd49b7ff4e03f1b4f7e7b223003d752ca ]
Currently if you build a 32-bit powerpc kernel and use get_user() to load a u64 value it will fail to build with, e.g.:
kernel/rseq.o: In function `rseq_get_rseq_cs':
kernel/rseq.c:123: undefined reference to `__get_user_bad'
This is hitting the check in __get_user_size() that makes sure the size we're copying doesn't exceed the size of the destination:
#define __get_user_size(x, ptr, size, retval)
do {
	retval = 0;
	__chk_user_ptr(ptr);
	if (size > sizeof(x))
		(x) = __get_user_bad();
Which doesn't immediately make sense, because the size of the destination is u64. But the destination is not really u64: __get_user_check() etc. internally create an unsigned long and copy into that:
#define __get_user_check(x, ptr, size)
({
	long __gu_err = -EFAULT;
	unsigned long __gu_val = 0;
The problem being that on 32-bit, unsigned long is not big enough to hold a u64. We can fix this with a trick from hpa in the x86 code: we statically check the type of x and set the type of __gu_val to either unsigned long or unsigned long long.
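The same trick works in plain user-space C, as a runnable sketch (GCC/Clang builtins assumed):

#include <stdio.h>

/* __long_type(x) is unsigned long when x fits in one, and unsigned
 * long long otherwise, decided entirely at compile time. */
#define __long_type(x) \
	__typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL))

int main(void)
{
	unsigned int small = 0;
	unsigned long long big = 0;

	/* on a 32-bit target this prints "4 8" */
	printf("%zu %zu\n",
	       sizeof(__long_type(small)), sizeof(__long_type(big)));
	return 0;
}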
Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- arch/powerpc/include/asm/uaccess.h | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h index 468653ce844c..327f6112fe8e 100644 --- a/arch/powerpc/include/asm/uaccess.h +++ b/arch/powerpc/include/asm/uaccess.h @@ -250,10 +250,17 @@ do { \ } \ } while (0)
+/* + * This is a type: either unsigned long, if the argument fits into + * that type, or otherwise unsigned long long. + */ +#define __long_type(x) \ + __typeof__(__builtin_choose_expr(sizeof(x) > sizeof(0UL), 0ULL, 0UL)) + #define __get_user_nocheck(x, ptr, size) \ ({ \ long __gu_err; \ - unsigned long __gu_val; \ + __long_type(*(ptr)) __gu_val; \ const __typeof__(*(ptr)) __user *__gu_addr = (ptr); \ __chk_user_ptr(ptr); \ if (!is_kernel_addr((unsigned long)__gu_addr)) \ @@ -267,7 +274,7 @@ do { \ #define __get_user_check(x, ptr, size) \ ({ \ long __gu_err = -EFAULT; \ - unsigned long __gu_val = 0; \ + __long_type(*(ptr)) __gu_val = 0; \ const __typeof__(*(ptr)) __user *__gu_addr = (ptr); \ might_fault(); \ if (access_ok(VERIFY_READ, __gu_addr, (size))) { \ @@ -281,7 +288,7 @@ do { \ #define __get_user_nosleep(x, ptr, size) \ ({ \ long __gu_err; \ - unsigned long __gu_val; \ + __long_type(*(ptr)) __gu_val; \ const __typeof__(*(ptr)) __user *__gu_addr = (ptr); \ __chk_user_ptr(ptr); \ barrier_nospec(); \
linux-stable-mirror@lists.linaro.org