- Linux-stable-mirror - lists.linaro.org

[Linux-stable-mirror] Patch "ALSA: seq: Make ioctls race-free" has been added to the 3.18-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled ALSA: seq: Make ioctls race-free to the 3.18-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: alsa-seq-make-ioctls-race-free.patch and it can be found in the queue-3.18 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From b3defb791b26ea0683a93a4f49c77ec45ec96f10 Mon Sep 17 00:00:00 2001 From: Takashi Iwai <tiwai(a)suse.de> Date: Tue, 9 Jan 2018 23:11:03 +0100 Subject: ALSA: seq: Make ioctls race-free From: Takashi Iwai <tiwai(a)suse.de> commit b3defb791b26ea0683a93a4f49c77ec45ec96f10 upstream. The ALSA sequencer ioctls have no protection against racy calls while the concurrent operations may lead to interfere with each other. As reported recently, for example, the concurrent calls of setting client pool with a combination of write calls may lead to either the unkillable dead-lock or UAF. As a slightly big hammer solution, this patch introduces the mutex to make each ioctl exclusive. Although this may reduce performance via parallel ioctl calls, usually it's not demanded for sequencer usages, hence it should be negligible. Reported-by: Luo Quan <a4651386(a)163.com> Reviewed-by: Kees Cook <keescook(a)chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Signed-off-by: Takashi Iwai <tiwai(a)suse.de> [bwh: Backported to 4.4: ioctl dispatch is done from snd_seq_do_ioctl(); take the mutex and add ret variable there.] Signed-off-by: Ben Hutchings <ben.hutchings(a)codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- sound/core/seq/seq_clientmgr.c | 10 ++++++++-- sound/core/seq/seq_clientmgr.h | 1 + 2 files changed, 9 insertions(+), 2 deletions(-) --- a/sound/core/seq/seq_clientmgr.c +++ b/sound/core/seq/seq_clientmgr.c @@ -236,6 +236,7 @@ static struct snd_seq_client *seq_create rwlock_init(&client->ports_lock); mutex_init(&client->ports_mutex); INIT_LIST_HEAD(&client->ports_list_head); + mutex_init(&client->ioctl_mutex); /* find free slot in the client table */ spin_lock_irqsave(&clients_lock, flags); @@ -2200,6 +2201,7 @@ static int snd_seq_do_ioctl(struct snd_s void __user *arg) { struct seq_ioctl_table *p; + int ret; switch (cmd) { case SNDRV_SEQ_IOCTL_PVERSION: @@ -2213,8 +2215,12 @@ static int snd_seq_do_ioctl(struct snd_s if (! arg) return -EFAULT; for (p = ioctl_tables; p->cmd; p++) { - if (p->cmd == cmd) - return p->func(client, arg); + if (p->cmd == cmd) { + mutex_lock(&client->ioctl_mutex); + ret = p->func(client, arg); + mutex_unlock(&client->ioctl_mutex); + return ret; + } } pr_debug("ALSA: seq unknown ioctl() 0x%x (type='%c', number=0x%02x)\n", cmd, _IOC_TYPE(cmd), _IOC_NR(cmd)); --- a/sound/core/seq/seq_clientmgr.h +++ b/sound/core/seq/seq_clientmgr.h @@ -59,6 +59,7 @@ struct snd_seq_client { struct list_head ports_list_head; rwlock_t ports_lock; struct mutex ports_mutex; + struct mutex ioctl_mutex; int convert32; /* convert 32->64bit */ /* output pool */ Patches currently in stable-queue which might be from tiwai(a)suse.de are queue-3.18/alsa-seq-make-ioctls-race-free.patch

7 years, 9 months

1
0
0 0

[Linux-stable-mirror] [PATCH] general protection fault in sock_has_perm

by Mark Salyzyn

general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 14233 Comm: syz-executor2 Not tainted 4.4.112-g5f6325b #28 task: ffff8801d1095f00 task.stack: ffff8800b5950000 RIP: 0010:[<ffffffff81b69b7e>] [<ffffffff81b69b7e>] sock_has_perm+0x1fe/0x3e0 security/selinux/hooks.c:4069 RSP: 0018:ffff8800b5957ce0 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 1ffff10016b2af9f RCX: ffffffff81b69b51 RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000010 RBP: ffff8800b5957de0 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000000 R11: 1ffff10016b2af68 R12: ffff8800b5957db8 R13: 0000000000000000 R14: ffff8800b7259f40 R15: 00000000000000d7 FS: 00007f72f5ae2700(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000a2fa38 CR3: 00000001d7980000 CR4: 0000000000160670 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Stack: ffffffff81b69a1f ffff8800b5957d58 00008000b5957d30 0000000041b58ab3 ffffffff83fc82f2 ffffffff81b69980 0000000000000246 ffff8801d1096770 ffff8801d3165668 ffffffff8157844b ffff8801d1095f00 ffff880000000001 Call Trace: [<ffffffff81b6a19d>] selinux_socket_setsockopt+0x4d/0x80 security/selinux/hooks.c:4338 [<ffffffff81b4873d>] security_socket_setsockopt+0x7d/0xb0 security/security.c:1257 [<ffffffff82df1ac8>] SYSC_setsockopt net/socket.c:1757 [inline] [<ffffffff82df1ac8>] SyS_setsockopt+0xe8/0x250 net/socket.c:1746 [<ffffffff83776499>] entry_SYSCALL_64_fastpath+0x16/0x92 Code: c2 42 9b b6 81 be 01 00 00 00 48 c7 c7 a0 cb 2b 84 e8 f7 2f 6d ff 49 8d 7d 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 83 01 00 00 41 8b 75 10 31 RIP [<ffffffff81b69b7e>] sock_has_perm+0x1fe/0x3e0 security/selinux/hooks.c:4069 RSP <ffff8800b5957ce0> ---[ end trace 7b5aaf788fef6174 ]--- In the absence of commit a4298e4522d6 ("net: add SOCK_RCU_FREE socket flag") and all the associated infrastructure changes to take advantage of a RCU grace period before freeing, there is a heightened possibility that a security check is performed while an ill-timed setsockopt call races in from user space. It then is prudent to null check sk_security, and if the case, reject the permissions. This adjustment is orthogonal to infrastructure improvements that may nullify the needed check, but should be added as good code hygiene. Signed-off-by: Mark Salyzyn <salyzyn(a)android.com> Cc: Paul Moore <paul(a)paul-moore.com> Cc: Stephen Smalley <sds(a)tycho.nsa.gov> Cc: Eric Paris <eparis(a)parisplace.org> Cc: James Morris <james.l.morris(a)oracle.com> Cc: "Serge E. Hallyn" <serge(a)hallyn.com> Cc: selinux(a)tycho.nsa.gov Cc: linux-security-module(a)vger.kernel.org Cc: linux-kernel(a)vger.kernel.org Cc: stable(a)vger.kernel.org --- This patch should be applied to all stable trees (author wants minimum of 3.18, 4.4, 4.9 and 4.14) security/selinux/hooks.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 8644d864e3c1..95d7c8143373 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -4342,7 +4342,7 @@ static int sock_has_perm(struct sock *sk, u32 perms) struct common_audit_data ad; struct lsm_network_audit net = {0,}; - if (sksec->sid == SECINITSID_KERNEL) + if (!sksec || sksec->sid == SECINITSID_KERNEL) return 0; ad.type = LSM_AUDIT_DATA_NET; -- 2.16.0.rc1.238.g530d649a79-goog

7 years, 9 months

5
13
0 0

[Linux-stable-mirror] [PATCH 4.4] ALSA: seq: Make ioctls race-free

by Ben Hutchings

commit b3defb791b26ea0683a93a4f49c77ec45ec96f10 upstream. The ALSA sequencer ioctls have no protection against racy calls while the concurrent operations may lead to interfere with each other. As reported recently, for example, the concurrent calls of setting client pool with a combination of write calls may lead to either the unkillable dead-lock or UAF. As a slightly big hammer solution, this patch introduces the mutex to make each ioctl exclusive. Although this may reduce performance via parallel ioctl calls, usually it's not demanded for sequencer usages, hence it should be negligible. Reported-by: Luo Quan <a4651386(a)163.com> Reviewed-by: Kees Cook <keescook(a)chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Signed-off-by: Takashi Iwai <tiwai(a)suse.de> [bwh: Backported to 4.4: ioctl dispatch is done from snd_seq_do_ioctl(); take the mutex and add ret variable there.] Signed-off-by: Ben Hutchings <ben.hutchings(a)codethink.co.uk> --- sound/core/seq/seq_clientmgr.c | 10 ++++++++-- sound/core/seq/seq_clientmgr.h | 1 + 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/sound/core/seq/seq_clientmgr.c b/sound/core/seq/seq_clientmgr.c index b36de76f24e2..7bb9fe7a2c8e 100644 --- a/sound/core/seq/seq_clientmgr.c +++ b/sound/core/seq/seq_clientmgr.c @@ -236,6 +236,7 @@ static struct snd_seq_client *seq_create_client1(int client_index, int poolsize) rwlock_init(&client->ports_lock); mutex_init(&client->ports_mutex); INIT_LIST_HEAD(&client->ports_list_head); + mutex_init(&client->ioctl_mutex); /* find free slot in the client table */ spin_lock_irqsave(&clients_lock, flags); @@ -2195,6 +2196,7 @@ static int snd_seq_do_ioctl(struct snd_seq_client *client, unsigned int cmd, void __user *arg) { struct seq_ioctl_table *p; + int ret; switch (cmd) { case SNDRV_SEQ_IOCTL_PVERSION: @@ -2208,8 +2210,12 @@ static int snd_seq_do_ioctl(struct snd_seq_client *client, unsigned int cmd, if (! arg) return -EFAULT; for (p = ioctl_tables; p->cmd; p++) { - if (p->cmd == cmd) - return p->func(client, arg); + if (p->cmd == cmd) { + mutex_lock(&client->ioctl_mutex); + ret = p->func(client, arg); + mutex_unlock(&client->ioctl_mutex); + return ret; + } } pr_debug("ALSA: seq unknown ioctl() 0x%x (type='%c', number=0x%02x)\n", cmd, _IOC_TYPE(cmd), _IOC_NR(cmd)); diff --git a/sound/core/seq/seq_clientmgr.h b/sound/core/seq/seq_clientmgr.h index 20f0a725ec7d..91f8f165bfdc 100644 --- a/sound/core/seq/seq_clientmgr.h +++ b/sound/core/seq/seq_clientmgr.h @@ -59,6 +59,7 @@ struct snd_seq_client { struct list_head ports_list_head; rwlock_t ports_lock; struct mutex ports_mutex; + struct mutex ioctl_mutex; int convert32; /* convert 32->64bit */ /* output pool */ -- 2.15.0.rc0

7 years, 9 months

2
2
0 0

[Linux-stable-mirror] [PATCH v2 4.4 1/3] bpf: fix branch pruning logic

by Ben Hutchings

From: Alexei Starovoitov <ast(a)fb.com> commit c131187db2d3fa2f8bf32fdf4e9a4ef805168467 upstream. when the verifier detects that register contains a runtime constant and it's compared with another constant it will prune exploration of the branch that is guaranteed not to be taken at runtime. This is all correct, but malicious program may be constructed in such a way that it always has a constant comparison and the other branch is never taken under any conditions. In this case such path through the program will not be explored by the verifier. It won't be taken at run-time either, but since all instructions are JITed the malicious program may cause JITs to complain about using reserved fields, etc. To fix the issue we have to track the instructions explored by the verifier and sanitize instructions that are dead at run time with NOPs. We cannot reject such dead code, since llvm generates it for valid C code, since it doesn't do as much data flow analysis as the verifier does. Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)") Signed-off-by: Alexei Starovoitov <ast(a)kernel.org> Acked-by: Daniel Borkmann <daniel(a)iogearbox.net> Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net> [bwh: Backported to 4.4: - s/bpf_verifier_env/verifier_env/ - Adjust context] Signed-off-by: Ben Hutchings <ben.hutchings(a)codethink.co.uk> --- v2: Restore Alexei as author kernel/bpf/verifier.c | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 014c2d759916..a62679711de0 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -191,6 +191,7 @@ struct bpf_insn_aux_data { enum bpf_reg_type ptr_type; /* pointer type for load/store insns */ struct bpf_map *map_ptr; /* pointer for call insn into lookup_elem */ }; + bool seen; /* this insn was processed by the verifier */ }; #define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF program */ @@ -1793,6 +1794,7 @@ static int do_check(struct verifier_env *env) print_bpf_insn(env, insn); } + env->insn_aux_data[insn_idx].seen = true; if (class == BPF_ALU || class == BPF_ALU64) { err = check_alu_op(env, insn); if (err) @@ -1988,6 +1990,7 @@ process_bpf_exit: return err; insn_idx++; + env->insn_aux_data[insn_idx].seen = true; } else { verbose("invalid BPF_LD mode\n"); return -EINVAL; @@ -2125,6 +2128,7 @@ static int adjust_insn_aux_data(struct verifier_env *env, u32 prog_len, u32 off, u32 cnt) { struct bpf_insn_aux_data *new_data, *old_data = env->insn_aux_data; + int i; if (cnt == 1) return 0; @@ -2134,6 +2138,8 @@ static int adjust_insn_aux_data(struct verifier_env *env, u32 prog_len, memcpy(new_data, old_data, sizeof(struct bpf_insn_aux_data) * off); memcpy(new_data + off + cnt - 1, old_data + off, sizeof(struct bpf_insn_aux_data) * (prog_len - off - cnt + 1)); + for (i = off; i < off + cnt - 1; i++) + new_data[i].seen = true; env->insn_aux_data = new_data; vfree(old_data); return 0; @@ -2152,6 +2158,25 @@ static struct bpf_prog *bpf_patch_insn_data(struct verifier_env *env, u32 off, return new_prog; } +/* The verifier does more data flow analysis than llvm and will not explore + * branches that are dead at run time. Malicious programs can have dead code + * too. Therefore replace all dead at-run-time code with nops. + */ +static void sanitize_dead_code(struct verifier_env *env) +{ + struct bpf_insn_aux_data *aux_data = env->insn_aux_data; + struct bpf_insn nop = BPF_MOV64_REG(BPF_REG_0, BPF_REG_0); + struct bpf_insn *insn = env->prog->insnsi; + const int insn_cnt = env->prog->len; + int i; + + for (i = 0; i < insn_cnt; i++) { + if (aux_data[i].seen) + continue; + memcpy(insn + i, &nop, sizeof(nop)); + } +} + /* convert load instructions that access fields of 'struct __sk_buff' * into sequence of instructions that access fields of 'struct sk_buff' */ @@ -2370,6 +2395,9 @@ skip_full_check: while (pop_stack(env, NULL) >= 0); free_states(env); + if (ret == 0) + sanitize_dead_code(env); + if (ret == 0) /* program is valid, convert *(u32*)(ctx + off) accesses */ ret = convert_ctx_accesses(env); -- 2.15.0.rc0

7 years, 9 months

3
4
0 0

[Linux-stable-mirror] [PATCH 4.4 1/3] bpf: fix branch pruning logic

by Ben Hutchings

commit c131187db2d3fa2f8bf32fdf4e9a4ef805168467 upstream. when the verifier detects that register contains a runtime constant and it's compared with another constant it will prune exploration of the branch that is guaranteed not to be taken at runtime. This is all correct, but malicious program may be constructed in such a way that it always has a constant comparison and the other branch is never taken under any conditions. In this case such path through the program will not be explored by the verifier. It won't be taken at run-time either, but since all instructions are JITed the malicious program may cause JITs to complain about using reserved fields, etc. To fix the issue we have to track the instructions explored by the verifier and sanitize instructions that are dead at run time with NOPs. We cannot reject such dead code, since llvm generates it for valid C code, since it doesn't do as much data flow analysis as the verifier does. Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)") Signed-off-by: Alexei Starovoitov <ast(a)kernel.org> Acked-by: Daniel Borkmann <daniel(a)iogearbox.net> Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net> [bwh: Backported to 4.4: - s/bpf_verifier_env/verifier_env/ - Adjust context] Signed-off-by: Ben Hutchings <ben.hutchings(a)codethink.co.uk> --- kernel/bpf/verifier.c | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 014c2d759916..a62679711de0 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -191,6 +191,7 @@ struct bpf_insn_aux_data { enum bpf_reg_type ptr_type; /* pointer type for load/store insns */ struct bpf_map *map_ptr; /* pointer for call insn into lookup_elem */ }; + bool seen; /* this insn was processed by the verifier */ }; #define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF program */ @@ -1793,6 +1794,7 @@ static int do_check(struct verifier_env *env) print_bpf_insn(env, insn); } + env->insn_aux_data[insn_idx].seen = true; if (class == BPF_ALU || class == BPF_ALU64) { err = check_alu_op(env, insn); if (err) @@ -1988,6 +1990,7 @@ process_bpf_exit: return err; insn_idx++; + env->insn_aux_data[insn_idx].seen = true; } else { verbose("invalid BPF_LD mode\n"); return -EINVAL; @@ -2125,6 +2128,7 @@ static int adjust_insn_aux_data(struct verifier_env *env, u32 prog_len, u32 off, u32 cnt) { struct bpf_insn_aux_data *new_data, *old_data = env->insn_aux_data; + int i; if (cnt == 1) return 0; @@ -2134,6 +2138,8 @@ static int adjust_insn_aux_data(struct verifier_env *env, u32 prog_len, memcpy(new_data, old_data, sizeof(struct bpf_insn_aux_data) * off); memcpy(new_data + off + cnt - 1, old_data + off, sizeof(struct bpf_insn_aux_data) * (prog_len - off - cnt + 1)); + for (i = off; i < off + cnt - 1; i++) + new_data[i].seen = true; env->insn_aux_data = new_data; vfree(old_data); return 0; @@ -2152,6 +2158,25 @@ static struct bpf_prog *bpf_patch_insn_data(struct verifier_env *env, u32 off, return new_prog; } +/* The verifier does more data flow analysis than llvm and will not explore + * branches that are dead at run time. Malicious programs can have dead code + * too. Therefore replace all dead at-run-time code with nops. + */ +static void sanitize_dead_code(struct verifier_env *env) +{ + struct bpf_insn_aux_data *aux_data = env->insn_aux_data; + struct bpf_insn nop = BPF_MOV64_REG(BPF_REG_0, BPF_REG_0); + struct bpf_insn *insn = env->prog->insnsi; + const int insn_cnt = env->prog->len; + int i; + + for (i = 0; i < insn_cnt; i++) { + if (aux_data[i].seen) + continue; + memcpy(insn + i, &nop, sizeof(nop)); + } +} + /* convert load instructions that access fields of 'struct __sk_buff' * into sequence of instructions that access fields of 'struct sk_buff' */ @@ -2370,6 +2395,9 @@ skip_full_check: while (pop_stack(env, NULL) >= 0); free_states(env); + if (ret == 0) + sanitize_dead_code(env); + if (ret == 0) /* program is valid, convert *(u32*)(ctx + off) accesses */ ret = convert_ctx_accesses(env); -- 2.15.0.rc0

7 years, 9 months

2
1
0 0

[Linux-stable-mirror] [PATCH 4.4] x86/pti: Make unpoison of pgd for trusted boot work for real

by Hugh Dickins

From: Dave Hansen <dave.hansen(a)linux.intel.com> commit 445b69e3b75e42362a5bdc13c8b8f61599e2228a upstream The inital fix for trusted boot and PTI potentially misses the pgd clearing if pud_alloc() sets a PGD. It probably works in *practice* because for two adjacent calls to map_tboot_page() that share a PGD entry, the first will clear NX, *then* allocate and set the PGD (without NX clear). The second call will *not* allocate but will clear the NX bit. Defer the NX clearing to a point after it is known that all top-level allocations have occurred. Add a comment to clarify why. [ tglx: Massaged changelog ] Fixes: 262b6b30087 ("x86/tboot: Unbreak tboot with PTI enabled") Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Reviewed-by: Andrea Arcangeli <aarcange(a)redhat.com> Cc: Jon Masters <jcm(a)redhat.com> Cc: Tim Chen <tim.c.chen(a)linux.intel.com> Cc: gnomes(a)lxorguk.ukuu.org.uk Cc: peterz(a)infradead.org Cc: ning.sun(a)intel.com Cc: tboot-devel(a)lists.sourceforge.net Cc: andi(a)firstfloor.org Cc: luto(a)kernel.org Cc: law(a)redhat.com Cc: pbonzini(a)redhat.com Cc: torvalds(a)linux-foundation.org Cc: gregkh(a)linux-foundation.org Cc: dwmw(a)amazon.co.uk Cc: nickc(a)redhat.com Cc: stable(a)vger.kernel.org Link: https://lkml.kernel.org/r/20180110224939.2695CD47@viggo.jf.intel.com Cc: Jiri Kosina <jkosina(a)suse.cz> Signed-off-by: Hugh Dickins <hughd(a)google.com> hughd notes: I have not tested tboot, but this looks to me as necessary and as safe in old-Kaiser backports as it is upstream; I'm not submitting the commit-to-be-fixed 262b6b30087, since it was undone by 445b69e3b75e, and makes conflict trouble because of 5-level's p4d versus 4-level's pgd. --- arch/x86/kernel/tboot.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c index 91a4496db434..c77ab1f51fbe 100644 --- a/arch/x86/kernel/tboot.c +++ b/arch/x86/kernel/tboot.c @@ -140,6 +140,16 @@ static int map_tboot_page(unsigned long vaddr, unsigned long pfn, return -1; set_pte_at(&tboot_mm, vaddr, pte, pfn_pte(pfn, prot)); pte_unmap(pte); + + /* + * PTI poisons low addresses in the kernel page tables in the + * name of making them unusable for userspace. To execute + * code at such a low address, the poison must be cleared. + * + * Note: 'pgd' actually gets set in pud_alloc(). + */ + pgd->pgd &= ~_PAGE_NX; + return 0; } -- 2.16.0.rc1.238.g530d649a79-goog

7 years, 9 months

2
1
0 0

[Linux-stable-mirror] [PATCH stable 4.4 0/9] BPF stable patches

by Daniel Borkmann

All for 4.4 backported and (limited) testing. Thanks! Alexei Starovoitov (3): bpf: fix bpf_tail_call() x64 JIT bpf: introduce BPF_JIT_ALWAYS_ON config bpf: fix 32-bit divide by zero Daniel Borkmann (4): bpf: fix branch pruning logic bpf: arsh is not supported in 32 bit alu thus reject it bpf: avoid false sharing of map refcount with max_entries bpf: reject stores into ctx via st and xadd Eric Dumazet (2): x86: bpf_jit: small optimization in emit_bpf_tail_call() bpf: fix divides by zero arch/arm64/Kconfig | 1 + arch/s390/Kconfig | 1 + arch/x86/Kconfig | 1 + arch/x86/net/bpf_jit_comp.c | 13 ++++----- include/linux/bpf.h | 16 ++++++++--- init/Kconfig | 7 +++++ kernel/bpf/core.c | 30 ++++++++++++++++--- kernel/bpf/verifier.c | 70 +++++++++++++++++++++++++++++++++++++++++++++ lib/test_bpf.c | 13 +++++---- net/Kconfig | 3 ++ net/core/filter.c | 8 +++++- net/core/sysctl_net_core.c | 6 ++++ net/socket.c | 9 ++++++ 13 files changed, 157 insertions(+), 21 deletions(-) -- 2.9.5

7 years, 9 months

3
21
0 0

[Linux-stable-mirror] Patch "Bluetooth: hci_serdev: Init hci_uart proto_lock to avoid oops" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled Bluetooth: hci_serdev: Init hci_uart proto_lock to avoid oops to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: bluetooth-hci_serdev-init-hci_uart-proto_lock-to-avoid-oops.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From d73e172816652772114827abaa2dbc053eecbbd7 Mon Sep 17 00:00:00 2001 From: Lukas Wunner <lukas(a)wunner.de> Date: Fri, 17 Nov 2017 00:54:53 +0100 Subject: Bluetooth: hci_serdev: Init hci_uart proto_lock to avoid oops MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit From: Lukas Wunner <lukas(a)wunner.de> commit d73e172816652772114827abaa2dbc053eecbbd7 upstream. John Stultz reports a boot time crash with the HiKey board (which uses hci_serdev) occurring in hci_uart_tx_wakeup(). That function is contained in hci_ldisc.c, but also called from the newer hci_serdev.c. It acquires the proto_lock in struct hci_uart and it turns out that we forgot to init the lock in the serdev code path, thus causing the crash. John bisected the crash to commit 67d2f8781b9f ("Bluetooth: hci_ldisc: Allow sleeping while proto locks are held"), but the issue was present before and the commit merely exposed it. (Perhaps by luck, the crash did not occur with rwlocks.) Init the proto_lock in the serdev code path to avoid the oops. Stack trace for posterity: Unable to handle kernel read from unreadable memory at 406f127000 [000000406f127000] user address but active_mm is swapper Internal error: Oops: 96000005 [#1] PREEMPT SMP Hardware name: HiKey Development Board (DT) Call trace: hci_uart_tx_wakeup+0x38/0x148 hci_uart_send_frame+0x28/0x38 hci_send_frame+0x64/0xc0 hci_cmd_work+0x98/0x110 process_one_work+0x134/0x330 worker_thread+0x130/0x468 kthread+0xf8/0x128 ret_from_fork+0x10/0x18 Link: https://lkml.org/lkml/2017/11/15/908 Reported-and-tested-by: John Stultz <john.stultz(a)linaro.org> Cc: Ronald Tschalär <ronald(a)innovation.ch> Cc: Rob Herring <rob.herring(a)linaro.org> Cc: Sumit Semwal <sumit.semwal(a)linaro.org> Signed-off-by: Lukas Wunner <lukas(a)wunner.de> Signed-off-by: Marcel Holtmann <marcel(a)holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/bluetooth/hci_serdev.c | 1 + 1 file changed, 1 insertion(+) --- a/drivers/bluetooth/hci_serdev.c +++ b/drivers/bluetooth/hci_serdev.c @@ -304,6 +304,7 @@ int hci_uart_register_device(struct hci_ hci_set_drvdata(hdev, hu); INIT_WORK(&hu->write_work, hci_uart_write_work); + percpu_init_rwsem(&hu->proto_lock); /* Only when vendor specific setup callback is provided, consider * the manufacturer information valid. This avoids filling in the Patches currently in stable-queue which might be from lukas(a)wunner.de are queue-4.14/bluetooth-hci_serdev-init-hci_uart-proto_lock-to-avoid-oops.patch

7 years, 9 months

2
1
0 0

[Linux-stable-mirror] Patch "Bluetooth: hci_serdev: Init hci_uart proto_lock to avoid oops" has been added to the 4.15-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled Bluetooth: hci_serdev: Init hci_uart proto_lock to avoid oops to the 4.15-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: bluetooth-hci_serdev-init-hci_uart-proto_lock-to-avoid-oops.patch and it can be found in the queue-4.15 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From d73e172816652772114827abaa2dbc053eecbbd7 Mon Sep 17 00:00:00 2001 From: Lukas Wunner <lukas(a)wunner.de> Date: Fri, 17 Nov 2017 00:54:53 +0100 Subject: Bluetooth: hci_serdev: Init hci_uart proto_lock to avoid oops MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit From: Lukas Wunner <lukas(a)wunner.de> commit d73e172816652772114827abaa2dbc053eecbbd7 upstream. John Stultz reports a boot time crash with the HiKey board (which uses hci_serdev) occurring in hci_uart_tx_wakeup(). That function is contained in hci_ldisc.c, but also called from the newer hci_serdev.c. It acquires the proto_lock in struct hci_uart and it turns out that we forgot to init the lock in the serdev code path, thus causing the crash. John bisected the crash to commit 67d2f8781b9f ("Bluetooth: hci_ldisc: Allow sleeping while proto locks are held"), but the issue was present before and the commit merely exposed it. (Perhaps by luck, the crash did not occur with rwlocks.) Init the proto_lock in the serdev code path to avoid the oops. Stack trace for posterity: Unable to handle kernel read from unreadable memory at 406f127000 [000000406f127000] user address but active_mm is swapper Internal error: Oops: 96000005 [#1] PREEMPT SMP Hardware name: HiKey Development Board (DT) Call trace: hci_uart_tx_wakeup+0x38/0x148 hci_uart_send_frame+0x28/0x38 hci_send_frame+0x64/0xc0 hci_cmd_work+0x98/0x110 process_one_work+0x134/0x330 worker_thread+0x130/0x468 kthread+0xf8/0x128 ret_from_fork+0x10/0x18 Link: https://lkml.org/lkml/2017/11/15/908 Reported-and-tested-by: John Stultz <john.stultz(a)linaro.org> Cc: Ronald Tschalär <ronald(a)innovation.ch> Cc: Rob Herring <rob.herring(a)linaro.org> Cc: Sumit Semwal <sumit.semwal(a)linaro.org> Signed-off-by: Lukas Wunner <lukas(a)wunner.de> Signed-off-by: Marcel Holtmann <marcel(a)holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/bluetooth/hci_serdev.c | 1 + 1 file changed, 1 insertion(+) --- a/drivers/bluetooth/hci_serdev.c +++ b/drivers/bluetooth/hci_serdev.c @@ -303,6 +303,7 @@ int hci_uart_register_device(struct hci_ hci_set_drvdata(hdev, hu); INIT_WORK(&hu->write_work, hci_uart_write_work); + percpu_init_rwsem(&hu->proto_lock); /* Only when vendor specific setup callback is provided, consider * the manufacturer information valid. This avoids filling in the Patches currently in stable-queue which might be from lukas(a)wunner.de are queue-4.15/bluetooth-hci_serdev-init-hci_uart-proto_lock-to-avoid-oops.patch

7 years, 9 months

1
0
0 0

Re: [Linux-stable-mirror] [PATCH v2] xfs: preserve i_rdev when recycling a reclaimable inode

by Amir Goldstein

On Thu, Feb 1, 2018 at 2:27 AM, Amir Goldstein <amir73il(a)gmail.com> wrote: > On Mon, Jan 29, 2018 at 5:50 PM, Darrick J. Wong > <darrick.wong(a)oracle.com> wrote: >> On Mon, Jan 29, 2018 at 01:07:36PM +0200, Amir Goldstein wrote: >>> On Fri, Jan 26, 2018 at 11:44 PM, Darrick J. Wong >>> <darrick.wong(a)oracle.com> wrote: >>> > On Fri, Jan 26, 2018 at 09:44:29AM +0200, Amir Goldstein wrote: >>> >> Commit 66f364649d870 ("xfs: remove if_rdev") moved storing of rdev >>> >> value for special inodes to VFS inodes, but forgot to preserve the >>> >> value of i_rdev when recycling a reclaimable xfs_inode. >>> >> >>> >> This was detected by xfstest overlay/017 with inodex=on mount option >>> >> and xfs base fs. The test does a lookup of overlay chardev and blockdev >>> >> right after drop caches. >>> >> >>> >> Overlayfs inodes hold a reference on underlying xfs inodes when mount >>> >> option index=on is configured. If drop caches reclaim xfs inodes, before >>> >> it relclaims overlayfs inodes, that can sometimes leave a reclaimable xfs >>> >> inode and that test hits that case quite often. >>> >> >>> >> When that happens, the xfs inode cache remains broken (zere i_rdev) >>> >> until the next cycle mount or drop caches. >>> >> >>> >> Fixes: 66f364649d870 ("xfs: remove if_rdev") >>> >> Signed-off-by: Amir Goldstein <amir73il(a)gmail.com> >>> > >>> > Looks ok, >>> > Reviewed-by: Darrick J. Wong <darrick.wong(a)oracle.com> >>> > >>> >>> I recon that now we should now also strap: >>> Cc: <stable(a)vger.kernel.org> #v4.15 >>> >>> Can I assume, you'll add it on apply? >> >> I'll do a proper backport of this and a couple other critical cow >> fixes after I get the 4.16 stuff merged. >> > > I am not sure what "proper backport" means in the context of > this patch. > This is a v4.15-rc1 regression fix that is based on v4.15-rc8. > It applied cleanly on v4.15. > > CC'ing stable for attention. > > This patch is now in master, but due to its timing it did not > get the CC: stable tag. > Now really CC stable. Amir.

7 years, 9 months

1
0
0 0

[Linux-stable-mirror] [patch 116/119] mm, memory_hotplug: fix memmap initialization

by akpm＠linux-foundation.org

From: Michal Hocko <mhocko(a)suse.com> Subject: mm, memory_hotplug: fix memmap initialization Bharata has noticed that onlining a newly added memory doesn't increase the total memory, pointing to f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap") as a culprit. This commit has changed the way how the memory for memmaps is initialized and moves it from the allocation time to the initialization time. This works properly for the early memmap init path. It doesn't work for the memory hotplug though because we need to mark page as reserved when the sparsemem section is created and later initialize it completely during onlining. memmap_init_zone is called in the early stage of onlining. With the current code it calls __init_single_page and as such it clears up the whole stage and therefore online_pages_range skips those pages. Fix this by skipping mm_zero_struct_page in __init_single_page for memory hotplug path. This is quite uggly but unifying both early init and memory hotplug init paths is a large project. Make sure we plug the regression at least. Link: http://lkml.kernel.org/r/20180130101141.GW21609@dhcp22.suse.cz Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap") Signed-off-by: Michal Hocko <mhocko(a)suse.com> Reported-by: Bharata B Rao <bharata(a)linux.vnet.ibm.com> Tested-by: Bharata B Rao <bharata(a)linux.vnet.ibm.com> Reviewed-by: Pavel Tatashin <pasha.tatashin(a)oracle.com> Cc: Steven Sistare <steven.sistare(a)oracle.com> Cc: Daniel Jordan <daniel.m.jordan(a)oracle.com> Cc: Bob Picco <bob.picco(a)oracle.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/page_alloc.c | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) diff -puN mm/page_alloc.c~mm-memory_hotplug-fix-memmap-initialization mm/page_alloc.c --- a/mm/page_alloc.c~mm-memory_hotplug-fix-memmap-initialization +++ a/mm/page_alloc.c @@ -1177,9 +1177,10 @@ static void free_one_page(struct zone *z } static void __meminit __init_single_page(struct page *page, unsigned long pfn, - unsigned long zone, int nid) + unsigned long zone, int nid, bool zero) { - mm_zero_struct_page(page); + if (zero) + mm_zero_struct_page(page); set_page_links(page, zone, nid, pfn); init_page_count(page); page_mapcount_reset(page); @@ -1194,9 +1195,9 @@ static void __meminit __init_single_page } static void __meminit __init_single_pfn(unsigned long pfn, unsigned long zone, - int nid) + int nid, bool zero) { - return __init_single_page(pfn_to_page(pfn), pfn, zone, nid); + return __init_single_page(pfn_to_page(pfn), pfn, zone, nid, zero); } #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT @@ -1217,7 +1218,7 @@ static void __meminit init_reserved_page if (pfn >= zone->zone_start_pfn && pfn < zone_end_pfn(zone)) break; } - __init_single_pfn(pfn, zid, nid); + __init_single_pfn(pfn, zid, nid, true); } #else static inline void init_reserved_page(unsigned long pfn) @@ -1534,7 +1535,7 @@ static unsigned long __init deferred_in } else { page++; } - __init_single_page(page, pfn, zid, nid); + __init_single_page(page, pfn, zid, nid, true); nr_pages++; } return (nr_pages); @@ -5399,15 +5400,20 @@ not_early: * can be created for invalid pages (for alignment) * check here not to call set_pageblock_migratetype() against * pfn out of zone. + * + * Please note that MEMMAP_HOTPLUG path doesn't clear memmap + * because this is done early in sparse_add_one_section */ if (!(pfn & (pageblock_nr_pages - 1))) { struct page *page = pfn_to_page(pfn); - __init_single_page(page, pfn, zone, nid); + __init_single_page(page, pfn, zone, nid, + context != MEMMAP_HOTPLUG); set_pageblock_migratetype(page, MIGRATE_MOVABLE); cond_resched(); } else { - __init_single_pfn(pfn, zone, nid); + __init_single_pfn(pfn, zone, nid, + context != MEMMAP_HOTPLUG); } } } _

7 years, 9 months

1
0
0 0

[Linux-stable-mirror] [patch 013/119] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE

by akpm＠linux-foundation.org

From: Gang He <ghe(a)suse.com> Subject: ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE If we can't get inode lock immediately in the function ocfs2_inode_lock_with_page() when reading a page, we should not return directly here, since this will lead to a softlockup problem when the kernel is configured with CONFIG_PREEMPT is not set. The method is to get a blocking lock and immediately unlock before returning, this can avoid CPU resource waste due to lots of retries, and benefits fairness in getting lock among multiple nodes, increase efficiency in case modifying the same file frequently from multiple nodes. The softlockup crash (when set /proc/sys/kernel/softlockup_panic to 1) looks like: Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: <IRQ> dump_stack+0x5c/0x82 panic+0xd5/0x21e watchdog_timer_fn+0x208/0x210 ? watchdog_park_threads+0x70/0x70 __hrtimer_run_queues+0xcc/0x200 hrtimer_interrupt+0xa6/0x1f0 smp_apic_timer_interrupt+0x34/0x50 apic_timer_interrupt+0x96/0xa0 </IRQ> RIP: 0010:unlock_page+0x17/0x30 RSP: 0000:ffffaf154080bc88 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10 RAX: dead000000000100 RBX: fffff21e009f5300 RCX: 0000000000000004 RDX: dead0000000000ff RSI: 0000000000000202 RDI: fffff21e009f5300 RBP: 0000000000000000 R08: 0000000000000000 R09: ffffaf154080bb00 R10: ffffaf154080bc30 R11: 0000000000000040 R12: ffff993749a39518 R13: 0000000000000000 R14: fffff21e009f5300 R15: fffff21e009f5300 ocfs2_inode_lock_with_page+0x25/0x30 [ocfs2] ocfs2_readpage+0x41/0x2d0 [ocfs2] ? pagecache_get_page+0x30/0x200 filemap_fault+0x12b/0x5c0 ? recalc_sigpending+0x17/0x50 ? __set_task_blocked+0x28/0x70 ? __set_current_blocked+0x3d/0x60 ocfs2_fault+0x29/0xb0 [ocfs2] __do_fault+0x1a/0xa0 __handle_mm_fault+0xbe8/0x1090 handle_mm_fault+0xaa/0x1f0 __do_page_fault+0x235/0x4b0 trace_do_page_fault+0x3c/0x110 async_page_fault+0x28/0x30 RIP: 0033:0x7fa75ded638e RSP: 002b:00007ffd6657db18 EFLAGS: 00010287 RAX: 000055c7662fb700 RBX: 0000000000000001 RCX: 000055c7662fb700 RDX: 0000000000001770 RSI: 00007fa75e909000 RDI: 000055c7662fb700 RBP: 0000000000000003 R08: 000000000000000e R09: 0000000000000000 R10: 0000000000000483 R11: 00007fa75ded61b0 R12: 00007fa75e90a770 R13: 000000000000000e R14: 0000000000001770 R15: 0000000000000000 About performance improvement, we can see the testing time is reduced, and CPU utilization decreases, the detailed data is as follows. I ran multi_mmap test case in ocfs2-test package in a three nodes cluster. Before applying this patch: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2754 ocfs2te+ 20 0 170248 6980 4856 D 80.73 0.341 0:18.71 multi_mmap 1505 root rt 0 222236 123060 97224 S 2.658 6.015 0:01.44 corosync 5 root 20 0 0 0 0 S 1.329 0.000 0:00.19 kworker/u8:0 95 root 20 0 0 0 0 S 1.329 0.000 0:00.25 kworker/u8:1 2728 root 20 0 0 0 0 S 0.997 0.000 0:00.24 jbd2/sda1-33 2721 root 20 0 0 0 0 S 0.664 0.000 0:00.07 ocfs2dc-3C8CFD4 2750 ocfs2te+ 20 0 142976 4652 3532 S 0.664 0.227 0:00.28 mpirun ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o ~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d /dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared Tests with "-b 4096 -C 32768" Thu Dec 28 14:44:52 CST 2017 multi_mmap..................................................Passed. Runtime 783 seconds. After apply this patch: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2508 ocfs2te+ 20 0 170248 6804 4680 R 54.00 0.333 0:55.37 multi_mmap 155 root 20 0 0 0 0 S 2.667 0.000 0:01.20 kworker/u8:3 95 root 20 0 0 0 0 S 2.000 0.000 0:01.58 kworker/u8:1 2504 ocfs2te+ 20 0 142976 4604 3480 R 1.667 0.225 0:01.65 mpirun 5 root 20 0 0 0 0 S 1.000 0.000 0:01.36 kworker/u8:0 2482 root 20 0 0 0 0 S 1.000 0.000 0:00.86 jbd2/sda1-33 299 root 0 -20 0 0 0 S 0.333 0.000 0:00.13 kworker/2:1H 335 root 0 -20 0 0 0 S 0.333 0.000 0:00.17 kworker/1:1H 535 root 20 0 12140 7268 1456 S 0.333 0.355 0:00.34 haveged 1282 root rt 0 222284 123108 97224 S 0.333 6.017 0:01.33 corosync ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o ~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d /dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared Tests with "-b 4096 -C 32768" Thu Dec 28 15:04:12 CST 2017 multi_mmap..................................................Passed. Runtime 487 seconds. Link: http://lkml.kernel.org/r/1514447305-30814-1-git-send-email-ghe@suse.com Fixes: 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock") Signed-off-by: Gang He <ghe(a)suse.com> Reviewed-by: Eric Ren <zren(a)suse.com> Acked-by: alex chen <alex.chen(a)huawei.com> Acked-by: piaojun <piaojun(a)huawei.com> Cc: Mark Fasheh <mfasheh(a)versity.com> Cc: Joel Becker <jlbec(a)evilplan.org> Cc: Junxiao Bi <junxiao.bi(a)oracle.com> Cc: Joseph Qi <jiangqi903(a)gmail.com> Cc: Changwei Ge <ge.changwei(a)h3c.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/ocfs2/dlmglue.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff -puN fs/ocfs2/dlmglue.c~ocfs2-try-a-blocking-lock-before-return-aop_truncated_page fs/ocfs2/dlmglue.c --- a/fs/ocfs2/dlmglue.c~ocfs2-try-a-blocking-lock-before-return-aop_truncated_page +++ a/fs/ocfs2/dlmglue.c @@ -2486,6 +2486,15 @@ int ocfs2_inode_lock_with_page(struct in ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK); if (ret == -EAGAIN) { unlock_page(page); + /* + * If we can't get inode lock immediately, we should not return + * directly here, since this will lead to a softlockup problem. + * The method is to get a blocking lock and immediately unlock + * before returning, this can avoid CPU resource waste due to + * lots of retries, and benefits fairness in getting lock. + */ + if (ocfs2_inode_lock(inode, ret_bh, ex) == 0) + ocfs2_inode_unlock(inode, ex); ret = AOP_TRUNCATED_PAGE; } _

7 years, 9 months

1
0
0 0

[Linux-stable-mirror] v4.14.16 build: 0 failures 0 warnings (v4.14.16)

by Build bot for Mark Brown

Tree/Branch: v4.14.16 Git describe: v4.14.16 Commit: 6c70076667 Linux 4.14.16 Build Time: 109 min 4 sec Passed: 10 / 10 (100.00 %) Failed: 0 / 10 ( 0.00 %) Errors: 0 Warnings: 0 Section Mismatches: 0 ------------------------------------------------------------------------------- defconfigs with issues (other than build errors): ------------------------------------------------------------------------------- =============================================================================== Detailed per-defconfig build reports below: ------------------------------------------------------------------------------- Passed with no errors, warnings or mismatches: arm64-allnoconfig arm64-allmodconfig arm-multi_v5_defconfig arm-multi_v7_defconfig x86_64-defconfig arm-allmodconfig arm-allnoconfig x86_64-allnoconfig arm-multi_v4t_defconfig arm64-defconfig close failed in file object destructor: sys.excepthook is missing lost sys.stderr

7 years, 9 months

1
0
0 0

[Linux-stable-mirror] [PATCH v2 1/2] block: Add a third argument to blk_alloc_queue_node()

by Bart Van Assche

This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche(a)wdc.com> Cc: Christoph Hellwig <hch(a)lst.de> Cc: Joseph Qi <joseph.qi(a)linux.alibaba.com> Cc: Philipp Reisner <philipp.reisner(a)linbit.com> Cc: Ulf Hansson <ulf.hansson(a)linaro.org> Cc: Kees Cook <keescook(a)chromium.org> Cc: <stable(a)vger.kernel.org> --- block/blk-core.c | 7 ++++--- block/blk-mq.c | 2 +- drivers/block/null_blk.c | 3 ++- drivers/ide/ide-probe.c | 2 +- drivers/lightnvm/core.c | 2 +- drivers/md/dm.c | 2 +- drivers/nvdimm/pmem.c | 2 +- drivers/nvme/host/multipath.c | 2 +- drivers/scsi/scsi_lib.c | 2 +- include/linux/blkdev.h | 3 ++- 10 files changed, 15 insertions(+), 12 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index bd43bc50740a..860a039fd1a8 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -868,7 +868,7 @@ void blk_exit_rl(struct request_queue *q, struct request_list *rl) struct request_queue *blk_alloc_queue(gfp_t gfp_mask) { - return blk_alloc_queue_node(gfp_mask, NUMA_NO_NODE); + return blk_alloc_queue_node(gfp_mask, NUMA_NO_NODE, NULL); } EXPORT_SYMBOL(blk_alloc_queue); @@ -946,7 +946,8 @@ static void blk_rq_timed_out_timer(struct timer_list *t) kblockd_schedule_work(&q->timeout_work); } -struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) +struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id, + spinlock_t *lock) { struct request_queue *q; @@ -1088,7 +1089,7 @@ blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id) { struct request_queue *q; - q = blk_alloc_queue_node(GFP_KERNEL, node_id); + q = blk_alloc_queue_node(GFP_KERNEL, node_id, NULL); if (!q) return NULL; diff --git a/block/blk-mq.c b/block/blk-mq.c index aacc5280e25f..8191391d1a1d 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2554,7 +2554,7 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set) { struct request_queue *uninit_q, *q; - uninit_q = blk_alloc_queue_node(GFP_KERNEL, set->numa_node); + uninit_q = blk_alloc_queue_node(GFP_KERNEL, set->numa_node, NULL); if (!uninit_q) return ERR_PTR(-ENOMEM); diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c index aa1c7d4bcac5..9fe8f2a3ec45 100644 --- a/drivers/block/null_blk.c +++ b/drivers/block/null_blk.c @@ -1717,7 +1717,8 @@ static int null_add_dev(struct nullb_device *dev) } null_init_queues(nullb); } else if (dev->queue_mode == NULL_Q_BIO) { - nullb->q = blk_alloc_queue_node(GFP_KERNEL, dev->home_node); + nullb->q = blk_alloc_queue_node(GFP_KERNEL, dev->home_node, + NULL); if (!nullb->q) { rv = -ENOMEM; goto out_cleanup_queues; diff --git a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c index 70d6d8ff0fd9..1303d0e31e80 100644 --- a/drivers/ide/ide-probe.c +++ b/drivers/ide/ide-probe.c @@ -766,7 +766,7 @@ static int ide_init_queue(ide_drive_t *drive) * limits and LBA48 we could raise it but as yet * do not. */ - q = blk_alloc_queue_node(GFP_KERNEL, hwif_to_node(hwif)); + q = blk_alloc_queue_node(GFP_KERNEL, hwif_to_node(hwif), NULL); if (!q) return 1; diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c index dcc9e621e651..5f1988df1593 100644 --- a/drivers/lightnvm/core.c +++ b/drivers/lightnvm/core.c @@ -384,7 +384,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create) goto err_dev; } - tqueue = blk_alloc_queue_node(GFP_KERNEL, dev->q->node); + tqueue = blk_alloc_queue_node(GFP_KERNEL, dev->q->node, NULL); if (!tqueue) { ret = -ENOMEM; goto err_disk; diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 5c478c185041..93f3ef15b4b2 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1731,7 +1731,7 @@ static struct mapped_device *alloc_dev(int minor) INIT_LIST_HEAD(&md->table_devices); spin_lock_init(&md->uevent_lock); - md->queue = blk_alloc_queue_node(GFP_KERNEL, numa_node_id); + md->queue = blk_alloc_queue_node(GFP_KERNEL, numa_node_id, NULL); if (!md->queue) goto bad; diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index 49285701fe48..118b4b13592d 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -342,7 +342,7 @@ static int pmem_attach_disk(struct device *dev, return -EBUSY; } - q = blk_alloc_queue_node(GFP_KERNEL, dev_to_node(dev)); + q = blk_alloc_queue_node(GFP_KERNEL, dev_to_node(dev), NULL); if (!q) return -ENOMEM; diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 605f53376e94..4e362949721a 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -162,7 +162,7 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head) if (!(ctrl->subsys->cmic & (1 << 1)) || !multipath) return 0; - q = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE); + q = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE, NULL); if (!q) goto out; q->queuedata = head; diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 3fcf5c7c7917..799d02615e71 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -2206,7 +2206,7 @@ struct request_queue *scsi_old_alloc_queue(struct scsi_device *sdev) struct Scsi_Host *shost = sdev->host; struct request_queue *q; - q = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE); + q = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE, NULL); if (!q) return NULL; q->cmd_size = sizeof(struct scsi_cmnd) + shost->hostt->cmd_size; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 781992c4124e..f1f3ad6419f1 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1330,7 +1330,8 @@ extern long nr_blockdev_pages(void); bool __must_check blk_get_queue(struct request_queue *); struct request_queue *blk_alloc_queue(gfp_t); -struct request_queue *blk_alloc_queue_node(gfp_t, int); +struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id, + spinlock_t *lock); extern void blk_put_queue(struct request_queue *); extern void blk_set_queue_dying(struct request_queue *); -- 2.16.0

7 years, 9 months

1
0
0 0

[Linux-stable-mirror] [PATCH 3/3] KVM: VMX: make MSR bitmaps per-VCPU

by Paolo Bonzini

Place the MSR bitmap in struct loaded_vmcs, and update it in place every time the x2apic or APICv state can change. This is rare and the loop can handle 64 MSRs per iteration, in a similar fashion as nested_vmx_prepare_msr_bitmap. This prepares for choosing, on a per-VM basis, whether to intercept the SPEC_CTRL and PRED_CMD MSRs. Cc: stable(a)vger.kernel.org # prereq for Spectre mitigation Suggested-by: Jim Mattson <jmattson(a)google.com> Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com> --- arch/x86/kvm/vmx.c | 270 +++++++++++++++++++++++++++++------------------------ 1 file changed, 147 insertions(+), 123 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index ab4b9bc99a52..896af99a8606 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -111,6 +111,14 @@ static u64 __read_mostly host_xss; static bool __read_mostly enable_pml = 1; module_param_named(pml, enable_pml, bool, S_IRUGO); +#define MSR_TYPE_R 1 +#define MSR_TYPE_W 2 +#define MSR_TYPE_RW 3 + +#define MSR_BITMAP_MODE_X2APIC 1 +#define MSR_BITMAP_MODE_X2APIC_APICV 2 +#define MSR_BITMAP_MODE_LM 4 + #define KVM_VMX_TSC_MULTIPLIER_MAX 0xffffffffffffffffULL /* Guest_tsc -> host_tsc conversion requires 64-bit division. */ @@ -209,6 +217,7 @@ struct loaded_vmcs { int soft_vnmi_blocked; ktime_t entry_time; s64 vnmi_blocked_time; + unsigned long *msr_bitmap; struct list_head loaded_vmcss_on_cpu_link; }; @@ -449,8 +458,6 @@ struct nested_vmx { bool pi_pending; u16 posted_intr_nv; - unsigned long *msr_bitmap; - struct hrtimer preemption_timer; bool preemption_timer_expired; @@ -573,6 +580,7 @@ struct vcpu_vmx { struct kvm_vcpu vcpu; unsigned long host_rsp; u8 fail; + u8 msr_bitmap_mode; u32 exit_intr_info; u32 idt_vectoring_info; ulong rflags; @@ -927,6 +935,7 @@ static bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu); static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked); static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12, u16 error_code); +static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu); static DEFINE_PER_CPU(struct vmcs *, vmxarea); static DEFINE_PER_CPU(struct vmcs *, current_vmcs); @@ -946,12 +955,6 @@ static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock); enum { VMX_IO_BITMAP_A, VMX_IO_BITMAP_B, - VMX_MSR_BITMAP_LEGACY, - VMX_MSR_BITMAP_LONGMODE, - VMX_MSR_BITMAP_LEGACY_X2APIC_APICV, - VMX_MSR_BITMAP_LONGMODE_X2APIC_APICV, - VMX_MSR_BITMAP_LEGACY_X2APIC, - VMX_MSR_BITMAP_LONGMODE_X2APIC, VMX_VMREAD_BITMAP, VMX_VMWRITE_BITMAP, VMX_BITMAP_NR @@ -961,12 +964,6 @@ static unsigned long *vmx_bitmap[VMX_BITMAP_NR]; #define vmx_io_bitmap_a (vmx_bitmap[VMX_IO_BITMAP_A]) #define vmx_io_bitmap_b (vmx_bitmap[VMX_IO_BITMAP_B]) -#define vmx_msr_bitmap_legacy (vmx_bitmap[VMX_MSR_BITMAP_LEGACY]) -#define vmx_msr_bitmap_longmode (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE]) -#define vmx_msr_bitmap_legacy_x2apic_apicv (vmx_bitmap[VMX_MSR_BITMAP_LEGACY_X2APIC_APICV]) -#define vmx_msr_bitmap_longmode_x2apic_apicv (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE_X2APIC_APICV]) -#define vmx_msr_bitmap_legacy_x2apic (vmx_bitmap[VMX_MSR_BITMAP_LEGACY_X2APIC]) -#define vmx_msr_bitmap_longmode_x2apic (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE_X2APIC]) #define vmx_vmread_bitmap (vmx_bitmap[VMX_VMREAD_BITMAP]) #define vmx_vmwrite_bitmap (vmx_bitmap[VMX_VMWRITE_BITMAP]) @@ -2564,36 +2561,6 @@ static void move_msr_up(struct vcpu_vmx *vmx, int from, int to) vmx->guest_msrs[from] = tmp; } -static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu) -{ - unsigned long *msr_bitmap; - - if (is_guest_mode(vcpu)) - msr_bitmap = to_vmx(vcpu)->nested.msr_bitmap; - else if (cpu_has_secondary_exec_ctrls() && - (vmcs_read32(SECONDARY_VM_EXEC_CONTROL) & - SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) { - if (enable_apicv && kvm_vcpu_apicv_active(vcpu)) { - if (is_long_mode(vcpu)) - msr_bitmap = vmx_msr_bitmap_longmode_x2apic_apicv; - else - msr_bitmap = vmx_msr_bitmap_legacy_x2apic_apicv; - } else { - if (is_long_mode(vcpu)) - msr_bitmap = vmx_msr_bitmap_longmode_x2apic; - else - msr_bitmap = vmx_msr_bitmap_legacy_x2apic; - } - } else { - if (is_long_mode(vcpu)) - msr_bitmap = vmx_msr_bitmap_longmode; - else - msr_bitmap = vmx_msr_bitmap_legacy; - } - - vmcs_write64(MSR_BITMAP, __pa(msr_bitmap)); -} - /* * Set up the vmcs to automatically save and restore system * msrs. Don't touch the 64-bit msrs if the guest is in legacy @@ -2634,7 +2601,7 @@ static void setup_msrs(struct vcpu_vmx *vmx) vmx->save_nmsrs = save_nmsrs; if (cpu_has_vmx_msr_bitmap()) - vmx_set_msr_bitmap(&vmx->vcpu); + vmx_update_msr_bitmap(&vmx->vcpu); } /* @@ -3844,6 +3811,8 @@ static void free_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) loaded_vmcs_clear(loaded_vmcs); free_vmcs(loaded_vmcs->vmcs); loaded_vmcs->vmcs = NULL; + if (loaded_vmcs->msr_bitmap) + free_page((unsigned long)loaded_vmcs->msr_bitmap); WARN_ON(loaded_vmcs->shadow_vmcs != NULL); } @@ -3860,7 +3829,18 @@ static int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) loaded_vmcs->shadow_vmcs = NULL; loaded_vmcs_init(loaded_vmcs); + + if (cpu_has_vmx_msr_bitmap()) { + loaded_vmcs->msr_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL); + if (!loaded_vmcs->msr_bitmap) + goto out_vmcs; + memset(loaded_vmcs->msr_bitmap, 0xff, PAGE_SIZE); + } return 0; + +out_vmcs: + free_loaded_vmcs(loaded_vmcs); + return -ENOMEM; } static void free_kvm_area(void) @@ -4921,10 +4901,8 @@ static void free_vpid(int vpid) spin_unlock(&vmx_vpid_lock); } -#define MSR_TYPE_R 1 -#define MSR_TYPE_W 2 -static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, - u32 msr, int type) +static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type) { int f = sizeof(unsigned long); @@ -4958,6 +4936,50 @@ static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, } } +static void __always_inline vmx_enable_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type) +{ + int f = sizeof(unsigned long); + + if (!cpu_has_vmx_msr_bitmap()) + return; + + /* + * See Intel PRM Vol. 3, 20.6.9 (MSR-Bitmap Address). Early manuals + * have the write-low and read-high bitmap offsets the wrong way round. + * We can control MSRs 0x00000000-0x00001fff and 0xc0000000-0xc0001fff. + */ + if (msr <= 0x1fff) { + if (type & MSR_TYPE_R) + /* read-low */ + __set_bit(msr, msr_bitmap + 0x000 / f); + + if (type & MSR_TYPE_W) + /* write-low */ + __set_bit(msr, msr_bitmap + 0x800 / f); + + } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) { + msr &= 0x1fff; + if (type & MSR_TYPE_R) + /* read-high */ + __set_bit(msr, msr_bitmap + 0x400 / f); + + if (type & MSR_TYPE_W) + /* write-high */ + __set_bit(msr, msr_bitmap + 0xc00 / f); + + } +} + +static void __always_inline vmx_set_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type, bool value) +{ + if (value) + vmx_enable_intercept_for_msr(msr_bitmap, msr, type); + else + vmx_disable_intercept_for_msr(msr_bitmap, msr, type); +} + /* * If a msr is allowed by L0, we should check whether it is allowed by L1. * The corresponding bit will be cleared unless both of L0 and L1 allow it. @@ -5004,30 +5026,70 @@ static void nested_vmx_disable_intercept_for_msr(unsigned long *msr_bitmap_l1, } } -static void vmx_disable_intercept_for_msr(u32 msr, bool longmode_only) +static u8 vmx_msr_bitmap_mode(struct kvm_vcpu *vcpu) { - if (!longmode_only) - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, - msr, MSR_TYPE_R | MSR_TYPE_W); - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, - msr, MSR_TYPE_R | MSR_TYPE_W); + u8 mode = 0; + + if (cpu_has_secondary_exec_ctrls() && + (vmcs_read32(SECONDARY_VM_EXEC_CONTROL) & + SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) { + mode |= MSR_BITMAP_MODE_X2APIC; + if (enable_apicv && kvm_vcpu_apicv_active(vcpu)) + mode |= MSR_BITMAP_MODE_X2APIC_APICV; + } + + if (is_long_mode(vcpu)) + mode |= MSR_BITMAP_MODE_LM; + + return mode; } -static void vmx_disable_intercept_msr_x2apic(u32 msr, int type, bool apicv_active) +#define X2APIC_MSR(r) (APIC_BASE_MSR + ((r) >> 4)) + +static void vmx_update_msr_bitmap_x2apic(unsigned long *msr_bitmap, + u8 mode) { - if (apicv_active) { - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic_apicv, - msr, type); - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic_apicv, - msr, type); - } else { - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic, - msr, type); - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic, - msr, type); + int msr; + + for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) { + unsigned word = msr / BITS_PER_LONG; + msr_bitmap[word] = (mode & MSR_BITMAP_MODE_X2APIC_APICV) ? 0 : ~0; + msr_bitmap[word + (0x800 / sizeof(long))] = ~0; + } + + if (mode & MSR_BITMAP_MODE_X2APIC) { + /* + * TPR reads and writes can be virtualized even if virtual interrupt + * delivery is not in use. + */ + vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_TASKPRI), MSR_TYPE_RW); + if (mode & MSR_BITMAP_MODE_X2APIC_APICV) { + vmx_enable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_R); + vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_EOI), MSR_TYPE_W); + vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W); + } } } +static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu) +{ + struct vcpu_vmx *vmx = to_vmx(vcpu); + unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap; + u8 mode = vmx_msr_bitmap_mode(vcpu); + u8 changed = mode ^ vmx->msr_bitmap_mode; + + if (!changed) + return; + + vmx_set_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW, + !(mode & MSR_BITMAP_MODE_LM)); + + if (changed & (MSR_BITMAP_MODE_X2APIC | MSR_BITMAP_MODE_X2APIC_APICV)) + vmx_update_msr_bitmap_x2apic(msr_bitmap, mode); + + vmx->msr_bitmap_mode = mode; +} + static bool vmx_get_enable_apicv(struct kvm_vcpu *vcpu) { return enable_apicv; @@ -5277,7 +5339,7 @@ static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu) } if (cpu_has_vmx_msr_bitmap()) - vmx_set_msr_bitmap(vcpu); + vmx_update_msr_bitmap(vcpu); } static u32 vmx_exec_control(struct vcpu_vmx *vmx) @@ -5464,7 +5526,7 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx) vmcs_write64(VMWRITE_BITMAP, __pa(vmx_vmwrite_bitmap)); } if (cpu_has_vmx_msr_bitmap()) - vmcs_write64(MSR_BITMAP, __pa(vmx_msr_bitmap_legacy)); + vmcs_write64(MSR_BITMAP, __pa(vmx->vmcs01.msr_bitmap)); vmcs_write64(VMCS_LINK_POINTER, -1ull); /* 22.3.1.5 */ @@ -6747,7 +6809,7 @@ void vmx_enable_tdp(void) static __init int hardware_setup(void) { - int r = -ENOMEM, i, msr; + int r = -ENOMEM, i; rdmsrl_safe(MSR_EFER, &host_efer); @@ -6767,9 +6829,6 @@ static __init int hardware_setup(void) memset(vmx_io_bitmap_b, 0xff, PAGE_SIZE); - memset(vmx_msr_bitmap_legacy, 0xff, PAGE_SIZE); - memset(vmx_msr_bitmap_longmode, 0xff, PAGE_SIZE); - if (setup_vmcs_config(&vmcs_config) < 0) { r = -EIO; goto out; @@ -6838,42 +6897,8 @@ static __init int hardware_setup(void) kvm_tsc_scaling_ratio_frac_bits = 48; } - vmx_disable_intercept_for_msr(MSR_FS_BASE, false); - vmx_disable_intercept_for_msr(MSR_GS_BASE, false); - vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true); - vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_CS, false); - vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_ESP, false); - vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false); - - memcpy(vmx_msr_bitmap_legacy_x2apic_apicv, - vmx_msr_bitmap_legacy, PAGE_SIZE); - memcpy(vmx_msr_bitmap_longmode_x2apic_apicv, - vmx_msr_bitmap_longmode, PAGE_SIZE); - memcpy(vmx_msr_bitmap_legacy_x2apic, - vmx_msr_bitmap_legacy, PAGE_SIZE); - memcpy(vmx_msr_bitmap_longmode_x2apic, - vmx_msr_bitmap_longmode, PAGE_SIZE); - set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */ - for (msr = 0x800; msr <= 0x8ff; msr++) { - if (msr == 0x839 /* TMCCT */) - continue; - vmx_disable_intercept_msr_x2apic(msr, MSR_TYPE_R, true); - } - - /* - * TPR reads and writes can be virtualized even if virtual interrupt - * delivery is not in use. - */ - vmx_disable_intercept_msr_x2apic(0x808, MSR_TYPE_W, true); - vmx_disable_intercept_msr_x2apic(0x808, MSR_TYPE_R | MSR_TYPE_W, false); - - /* EOI */ - vmx_disable_intercept_msr_x2apic(0x80b, MSR_TYPE_W, true); - /* SELF-IPI */ - vmx_disable_intercept_msr_x2apic(0x83f, MSR_TYPE_W, true); - if (enable_ept) vmx_enable_tdp(); else @@ -7162,13 +7187,6 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu) if (r < 0) goto out_vmcs02; - if (cpu_has_vmx_msr_bitmap()) { - vmx->nested.msr_bitmap = - (unsigned long *)__get_free_page(GFP_KERNEL); - if (!vmx->nested.msr_bitmap) - goto out_msr_bitmap; - } - vmx->nested.cached_vmcs12 = kmalloc(VMCS12_SIZE, GFP_KERNEL); if (!vmx->nested.cached_vmcs12) goto out_cached_vmcs12; @@ -7195,9 +7213,6 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu) kfree(vmx->nested.cached_vmcs12); out_cached_vmcs12: - free_page((unsigned long)vmx->nested.msr_bitmap); - -out_msr_bitmap: free_loaded_vmcs(&vmx->nested.vmcs02); out_vmcs02: @@ -7343,10 +7358,6 @@ static void free_nested(struct vcpu_vmx *vmx) free_vpid(vmx->nested.vpid02); vmx->nested.posted_intr_nv = -1; vmx->nested.current_vmptr = -1ull; - if (vmx->nested.msr_bitmap) { - free_page((unsigned long)vmx->nested.msr_bitmap); - vmx->nested.msr_bitmap = NULL; - } if (enable_shadow_vmcs) { vmx_disable_shadow_vmcs(vmx); vmcs_clear(vmx->vmcs01.shadow_vmcs); @@ -8862,7 +8873,7 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set) } vmcs_write32(SECONDARY_VM_EXEC_CONTROL, sec_exec_control); - vmx_set_msr_bitmap(vcpu); + vmx_update_msr_bitmap(vcpu); } static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu, hpa_t hpa) @@ -9523,6 +9534,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) { int err; struct vcpu_vmx *vmx = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL); + unsigned long *msr_bitmap; int cpu; if (!vmx) @@ -9559,6 +9571,15 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) if (err < 0) goto free_msrs; + msr_bitmap = vmx->vmcs01.msr_bitmap; + vmx_disable_intercept_for_msr(msr_bitmap, MSR_FS_BASE, MSR_TYPE_RW); + vmx_disable_intercept_for_msr(msr_bitmap, MSR_GS_BASE, MSR_TYPE_RW); + vmx_disable_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW); + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW); + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW); + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW); + vmx->msr_bitmap_mode = 0; + vmx->loaded_vmcs = &vmx->vmcs01; cpu = get_cpu(); vmx_vcpu_load(&vmx->vcpu, cpu); @@ -10022,7 +10043,7 @@ static inline bool nested_vmx_merge_msr_bitmap(struct kvm_vcpu *vcpu, int msr; struct page *page; unsigned long *msr_bitmap_l1; - unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.msr_bitmap; + unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap; /* This shortcut is ok because we support only x2APIC MSRs so far. */ if (!nested_cpu_has_virt_x2apic_mode(vmcs12)) @@ -10599,6 +10620,9 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12, if (kvm_has_tsc_control) decache_tsc_multiplier(vmx); + if (cpu_has_vmx_msr_bitmap()) + vmcs_write64(MSR_BITMAP, __pa(vmx->nested.vmcs02.msr_bitmap)); + if (enable_vpid) { /* * There is no direct mapping between vpid02 and vpid12, the @@ -11397,7 +11421,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu, vmcs_write64(GUEST_IA32_DEBUGCTL, 0); if (cpu_has_vmx_msr_bitmap()) - vmx_set_msr_bitmap(vcpu); + vmx_update_msr_bitmap(vcpu); if (nested_vmx_load_msr(vcpu, vmcs12->vm_exit_msr_load_addr, vmcs12->vm_exit_msr_load_count)) -- 2.14.3

7 years, 9 months

1
0
0 0

[Linux-stable-mirror] [PATCH 2/3] KVM: VMX: introduce alloc_loaded_vmcs

by Paolo Bonzini

Group together the calls to alloc_vmcs and loaded_vmcs_init. Soon we'll also allocate an MSR bitmap there. Cc: stable(a)vger.kernel.org # prereq for Spectre mitigation Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com> --- arch/x86/kvm/vmx.c | 36 ++++++++++++++++++++++-------------- 1 file changed, 22 insertions(+), 14 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index ad6a883b7a32..ab4b9bc99a52 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3829,11 +3829,6 @@ static struct vmcs *alloc_vmcs_cpu(int cpu) return vmcs; } -static struct vmcs *alloc_vmcs(void) -{ - return alloc_vmcs_cpu(raw_smp_processor_id()); -} - static void free_vmcs(struct vmcs *vmcs) { free_pages((unsigned long)vmcs, vmcs_config.order); @@ -3852,6 +3847,22 @@ static void free_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) WARN_ON(loaded_vmcs->shadow_vmcs != NULL); } +static struct vmcs *alloc_vmcs(void) +{ + return alloc_vmcs_cpu(raw_smp_processor_id()); +} + +static int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) +{ + loaded_vmcs->vmcs = alloc_vmcs(); + if (!loaded_vmcs->vmcs) + return -ENOMEM; + + loaded_vmcs->shadow_vmcs = NULL; + loaded_vmcs_init(loaded_vmcs); + return 0; +} + static void free_kvm_area(void) { int cpu; @@ -7145,12 +7156,11 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); struct vmcs *shadow_vmcs; + int r; - vmx->nested.vmcs02.vmcs = alloc_vmcs(); - vmx->nested.vmcs02.shadow_vmcs = NULL; - if (!vmx->nested.vmcs02.vmcs) + r = alloc_loaded_vmcs(&vmx->nested.vmcs02); + if (r < 0) goto out_vmcs02; - loaded_vmcs_init(&vmx->nested.vmcs02); if (cpu_has_vmx_msr_bitmap()) { vmx->nested.msr_bitmap = @@ -9545,13 +9555,11 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) if (!vmx->guest_msrs) goto free_pml; - vmx->loaded_vmcs = &vmx->vmcs01; - vmx->loaded_vmcs->vmcs = alloc_vmcs(); - vmx->loaded_vmcs->shadow_vmcs = NULL; - if (!vmx->loaded_vmcs->vmcs) + err = alloc_loaded_vmcs(&vmx->vmcs01); + if (err < 0) goto free_msrs; - loaded_vmcs_init(vmx->loaded_vmcs); + vmx->loaded_vmcs = &vmx->vmcs01; cpu = get_cpu(); vmx_vcpu_load(&vmx->vcpu, cpu); vmx->vcpu.cpu = cpu; -- 2.14.3

7 years, 9 months

1
0
0 0

[Linux-stable-mirror] [PATCH 1/3] KVM: nVMX: Eliminate vmcs02 pool

by Paolo Bonzini

From: Jim Mattson <jmattson(a)google.com> The potential performance advantages of a vmcs02 pool have never been realized. To simplify the code, eliminate the pool. Instead, a single vmcs02 is allocated per VCPU when the VCPU enters VMX operation. Cc: stable(a)vger.kernel.org # prereq for Spectre mitigation Signed-off-by: Jim Mattson <jmattson(a)google.com> Signed-off-by: Mark Kanda <mark.kanda(a)oracle.com> Reviewed-by: Ameya More <ameya.more(a)oracle.com> Reviewed-by: David Hildenbrand <david(a)redhat.com> Reviewed-by: Paolo Bonzini <pbonzini(a)redhat.com> Signed-off-by: Radim Krčmář <rkrcmar(a)redhat.com> --- arch/x86/kvm/vmx.c | 146 +++++++++-------------------------------------------- 1 file changed, 23 insertions(+), 123 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index c829d89e2e63..ad6a883b7a32 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -185,7 +185,6 @@ module_param(ple_window_max, int, S_IRUGO); extern const ulong vmx_return; #define NR_AUTOLOAD_MSRS 8 -#define VMCS02_POOL_SIZE 1 struct vmcs { u32 revision_id; @@ -226,7 +225,7 @@ struct shared_msr_entry { * stored in guest memory specified by VMPTRLD, but is opaque to the guest, * which must access it using VMREAD/VMWRITE/VMCLEAR instructions. * More than one of these structures may exist, if L1 runs multiple L2 guests. - * nested_vmx_run() will use the data here to build a vmcs02: a VMCS for the + * nested_vmx_run() will use the data here to build the vmcs02: a VMCS for the * underlying hardware which will be used to run L2. * This structure is packed to ensure that its layout is identical across * machines (necessary for live migration). @@ -409,13 +408,6 @@ struct __packed vmcs12 { */ #define VMCS12_SIZE 0x1000 -/* Used to remember the last vmcs02 used for some recently used vmcs12s */ -struct vmcs02_list { - struct list_head list; - gpa_t vmptr; - struct loaded_vmcs vmcs02; -}; - /* * The nested_vmx structure is part of vcpu_vmx, and holds information we need * for correct emulation of VMX (i.e., nested VMX) on this vcpu. @@ -440,15 +432,15 @@ struct nested_vmx { */ bool sync_shadow_vmcs; - /* vmcs02_list cache of VMCSs recently used to run L2 guests */ - struct list_head vmcs02_pool; - int vmcs02_num; bool change_vmcs01_virtual_x2apic_mode; /* L2 must run next, and mustn't decide to exit to L1. */ bool nested_run_pending; + + struct loaded_vmcs vmcs02; + /* - * Guest pages referred to in vmcs02 with host-physical pointers, so - * we must keep them pinned while L2 runs. + * Guest pages referred to in the vmcs02 with host-physical + * pointers, so we must keep them pinned while L2 runs. */ struct page *apic_access_page; struct page *virtual_apic_page; @@ -6973,94 +6965,6 @@ static int handle_monitor(struct kvm_vcpu *vcpu) return handle_nop(vcpu); } -/* - * To run an L2 guest, we need a vmcs02 based on the L1-specified vmcs12. - * We could reuse a single VMCS for all the L2 guests, but we also want the - * option to allocate a separate vmcs02 for each separate loaded vmcs12 - this - * allows keeping them loaded on the processor, and in the future will allow - * optimizations where prepare_vmcs02 doesn't need to set all the fields on - * every entry if they never change. - * So we keep, in vmx->nested.vmcs02_pool, a cache of size VMCS02_POOL_SIZE - * (>=0) with a vmcs02 for each recently loaded vmcs12s, most recent first. - * - * The following functions allocate and free a vmcs02 in this pool. - */ - -/* Get a VMCS from the pool to use as vmcs02 for the current vmcs12. */ -static struct loaded_vmcs *nested_get_current_vmcs02(struct vcpu_vmx *vmx) -{ - struct vmcs02_list *item; - list_for_each_entry(item, &vmx->nested.vmcs02_pool, list) - if (item->vmptr == vmx->nested.current_vmptr) { - list_move(&item->list, &vmx->nested.vmcs02_pool); - return &item->vmcs02; - } - - if (vmx->nested.vmcs02_num >= max(VMCS02_POOL_SIZE, 1)) { - /* Recycle the least recently used VMCS. */ - item = list_last_entry(&vmx->nested.vmcs02_pool, - struct vmcs02_list, list); - item->vmptr = vmx->nested.current_vmptr; - list_move(&item->list, &vmx->nested.vmcs02_pool); - return &item->vmcs02; - } - - /* Create a new VMCS */ - item = kzalloc(sizeof(struct vmcs02_list), GFP_KERNEL); - if (!item) - return NULL; - item->vmcs02.vmcs = alloc_vmcs(); - item->vmcs02.shadow_vmcs = NULL; - if (!item->vmcs02.vmcs) { - kfree(item); - return NULL; - } - loaded_vmcs_init(&item->vmcs02); - item->vmptr = vmx->nested.current_vmptr; - list_add(&(item->list), &(vmx->nested.vmcs02_pool)); - vmx->nested.vmcs02_num++; - return &item->vmcs02; -} - -/* Free and remove from pool a vmcs02 saved for a vmcs12 (if there is one) */ -static void nested_free_vmcs02(struct vcpu_vmx *vmx, gpa_t vmptr) -{ - struct vmcs02_list *item; - list_for_each_entry(item, &vmx->nested.vmcs02_pool, list) - if (item->vmptr == vmptr) { - free_loaded_vmcs(&item->vmcs02); - list_del(&item->list); - kfree(item); - vmx->nested.vmcs02_num--; - return; - } -} - -/* - * Free all VMCSs saved for this vcpu, except the one pointed by - * vmx->loaded_vmcs. We must be running L1, so vmx->loaded_vmcs - * must be &vmx->vmcs01. - */ -static void nested_free_all_saved_vmcss(struct vcpu_vmx *vmx) -{ - struct vmcs02_list *item, *n; - - WARN_ON(vmx->loaded_vmcs != &vmx->vmcs01); - list_for_each_entry_safe(item, n, &vmx->nested.vmcs02_pool, list) { - /* - * Something will leak if the above WARN triggers. Better than - * a use-after-free. - */ - if (vmx->loaded_vmcs == &item->vmcs02) - continue; - - free_loaded_vmcs(&item->vmcs02); - list_del(&item->list); - kfree(item); - vmx->nested.vmcs02_num--; - } -} - /* * The following 3 functions, nested_vmx_succeed()/failValid()/failInvalid(), * set the success or error code of an emulated VMX instruction, as specified @@ -7242,6 +7146,12 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu) struct vcpu_vmx *vmx = to_vmx(vcpu); struct vmcs *shadow_vmcs; + vmx->nested.vmcs02.vmcs = alloc_vmcs(); + vmx->nested.vmcs02.shadow_vmcs = NULL; + if (!vmx->nested.vmcs02.vmcs) + goto out_vmcs02; + loaded_vmcs_init(&vmx->nested.vmcs02); + if (cpu_has_vmx_msr_bitmap()) { vmx->nested.msr_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL); @@ -7264,9 +7174,6 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu) vmx->vmcs01.shadow_vmcs = shadow_vmcs; } - INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool)); - vmx->nested.vmcs02_num = 0; - hrtimer_init(&vmx->nested.preemption_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED); vmx->nested.preemption_timer.function = vmx_preemption_timer_fn; @@ -7281,6 +7188,9 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu) free_page((unsigned long)vmx->nested.msr_bitmap); out_msr_bitmap: + free_loaded_vmcs(&vmx->nested.vmcs02); + +out_vmcs02: return -ENOMEM; } @@ -7434,7 +7344,7 @@ static void free_nested(struct vcpu_vmx *vmx) vmx->vmcs01.shadow_vmcs = NULL; } kfree(vmx->nested.cached_vmcs12); - /* Unpin physical memory we referred to in current vmcs02 */ + /* Unpin physical memory we referred to in the vmcs02 */ if (vmx->nested.apic_access_page) { kvm_release_page_dirty(vmx->nested.apic_access_page); vmx->nested.apic_access_page = NULL; @@ -7450,7 +7360,7 @@ static void free_nested(struct vcpu_vmx *vmx) vmx->nested.pi_desc = NULL; } - nested_free_all_saved_vmcss(vmx); + free_loaded_vmcs(&vmx->nested.vmcs02); } /* Emulate the VMXOFF instruction */ @@ -7493,8 +7403,6 @@ static int handle_vmclear(struct kvm_vcpu *vcpu) vmptr + offsetof(struct vmcs12, launch_state), &zero, sizeof(zero)); - nested_free_vmcs02(vmx, vmptr); - nested_vmx_succeed(vcpu); return kvm_skip_emulated_instruction(vcpu); } @@ -8406,10 +8314,11 @@ static bool nested_vmx_exit_reflected(struct kvm_vcpu *vcpu, u32 exit_reason) /* * The host physical addresses of some pages of guest memory - * are loaded into VMCS02 (e.g. L1's Virtual APIC Page). The CPU - * may write to these pages via their host physical address while - * L2 is running, bypassing any address-translation-based dirty - * tracking (e.g. EPT write protection). + * are loaded into the vmcs02 (e.g. vmcs12's Virtual APIC + * Page). The CPU may write to these pages via their host + * physical address while L2 is running, bypassing any + * address-translation-based dirty tracking (e.g. EPT write + * protection). * * Mark them dirty on every exit from L2 to prevent them from * getting out of sync with dirty tracking. @@ -10903,20 +10812,15 @@ static int enter_vmx_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry) { struct vcpu_vmx *vmx = to_vmx(vcpu); struct vmcs12 *vmcs12 = get_vmcs12(vcpu); - struct loaded_vmcs *vmcs02; u32 msr_entry_idx; u32 exit_qual; - vmcs02 = nested_get_current_vmcs02(vmx); - if (!vmcs02) - return -ENOMEM; - enter_guest_mode(vcpu); if (!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) vmx->nested.vmcs01_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL); - vmx_switch_vmcs(vcpu, vmcs02); + vmx_switch_vmcs(vcpu, &vmx->nested.vmcs02); vmx_segment_cache_clear(vmx); if (prepare_vmcs02(vcpu, vmcs12, from_vmentry, &exit_qual)) { @@ -11534,10 +11438,6 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason, vm_exit_controls_reset_shadow(vmx); vmx_segment_cache_clear(vmx); - /* if no vmcs02 cache requested, remove the one we used */ - if (VMCS02_POOL_SIZE == 0) - nested_free_vmcs02(vmx, vmx->nested.current_vmptr); - /* Update any VMCS fields that might have changed while L2 ran */ vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.nr); vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.nr); -- 2.14.3

7 years, 9 months

1
0
0 0

[Linux-stable-mirror] v3.18.93 build: 0 failures 1 warnings (v3.18.93)

by Build bot for Mark Brown

Tree/Branch: v3.18.93 Git describe: v3.18.93 Commit: 90aaf2f256 Linux 3.18.93 Build Time: 50 min 21 sec Passed: 9 / 9 (100.00 %) Failed: 0 / 9 ( 0.00 %) Errors: 0 Warnings: 1 Section Mismatches: 0 ------------------------------------------------------------------------------- defconfigs with issues (other than build errors): 2 warnings 0 mismatches : x86_64-defconfig ------------------------------------------------------------------------------- Warnings Summary: 1 2 ../include/linux/ftrace.h:632:36: warning: calling '__builtin_return_address' with a nonzero argument is unsafe [-Wframe-address] =============================================================================== Detailed per-defconfig build reports below: ------------------------------------------------------------------------------- x86_64-defconfig : PASS, 0 errors, 2 warnings, 0 section mismatches Warnings: ../include/linux/ftrace.h:632:36: warning: calling '__builtin_return_address' with a nonzero argument is unsafe [-Wframe-address] ../include/linux/ftrace.h:632:36: warning: calling '__builtin_return_address' with a nonzero argument is unsafe [-Wframe-address] ------------------------------------------------------------------------------- Passed with no errors, warnings or mismatches: arm64-allnoconfig arm64-allmodconfig arm-multi_v5_defconfig arm-multi_v7_defconfig arm-allmodconfig arm-allnoconfig x86_64-allnoconfig arm64-defconfig close failed in file object destructor: sys.excepthook is missing lost sys.stderr

7 years, 9 months

1
0
0 0

[Linux-stable-mirror] [PATCH] block: Fix a race between the throttling code and request queue initialization

by Bart Van Assche

Initialize the request queue lock earlier such that the following race can no longer occur: blk_init_queue_node blkcg_print_blkgs blk_alloc_queue_node (1) q->queue_lock = &q->__queue_lock (2) blkcg_init_queue(q) (3) spin_lock_irq(blkg->q->queue_lock) (4) q->queue_lock = lock (5) spin_unlock_irq(blkg->q->queue_lock) (6) (1) allocate an uninitialized queue; (2) initialize queue_lock to its default internal lock; (3) initialize blkcg part of request queue, which will create blkg and then insert it to blkg_list; (4) traverse blkg_list and find the created blkg, and then take its queue lock, here it is the default *internal lock*; (5) *race window*, now queue_lock is overridden with *driver specified lock*; (6) now unlock *driver specified lock*, not the locked *internal lock*, unlock balance breaks. The changes in this patch are as follows: - Rename blk_alloc_queue_node() into blk_alloc_queue_node2() and add a new queue lock argument. - Introduce a wrapper function with the same name and behavior as the old blk_alloc_queue_node() function. - Move the .queue_lock initialization from blk_init_queue_node() into blk_alloc_queue_node2(). - For all all block drivers that initialize .queue_lock explicitly, change the blk_alloc_queue() call in the driver into a blk_alloc_queue_node2() call and remove the explicit .queue_lock initialization. Additionally, initialize the spin lock that will be used as queue lock earlier if necessary. Reported-by: Joseph Qi <joseph.qi(a)linux.alibaba.com> Signed-off-by: Bart Van Assche <bart.vanassche(a)wdc.com> Cc: Christoph Hellwig <hch(a)lst.de> Cc: Joseph Qi <joseph.qi(a)linux.alibaba.com> Cc: Philipp Reisner <philipp.reisner(a)linbit.com> Cc: Ulf Hansson <ulf.hansson(a)linaro.org> Cc: Kees Cook <keescook(a)chromium.org> Cc: <stable(a)vger.kernel.org> --- block/blk-core.c | 33 ++++++++++++++++++++++++--------- drivers/block/drbd/drbd_main.c | 4 ++-- drivers/block/umem.c | 7 +++---- drivers/mmc/core/queue.c | 3 +-- include/linux/blkdev.h | 2 ++ 5 files changed, 32 insertions(+), 17 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index bd43bc50740a..cb18f57e5b13 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -946,7 +946,22 @@ static void blk_rq_timed_out_timer(struct timer_list *t) kblockd_schedule_work(&q->timeout_work); } -struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) +/** + * blk_alloc_queue_node2 - allocate a request queue + * @gfp_mask: memory allocation flags + * @node_id: NUMA node to allocate memory from + * @lock: Pointer to a spinlock that will be used to e.g. serialize calls to + * the legacy .request_fn(). Only set this pointer for queues that use + * legacy mode and not for queues that use blk-mq. + * + * Note: use this function instead of calling blk_alloc_queue_node() and + * setting the queue lock pointer explicitly to avoid triggering a crash in + * the blkcg throttling code. That code namely makes sysfs attributes visible + * in user space before this function returns and the show method of these + * attributes uses the queue lock. + */ +struct request_queue *blk_alloc_queue_node2(gfp_t gfp_mask, int node_id, + spinlock_t *lock) { struct request_queue *q; @@ -997,11 +1012,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) mutex_init(&q->sysfs_lock); spin_lock_init(&q->__queue_lock); - /* - * By default initialize queue_lock to internal lock and driver can - * override it later if need be. - */ - q->queue_lock = &q->__queue_lock; + q->queue_lock = lock ? : &q->__queue_lock; /* * A queue starts its life with bypass turned on to avoid @@ -1042,6 +1053,12 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) kmem_cache_free(blk_requestq_cachep, q); return NULL; } +EXPORT_SYMBOL(blk_alloc_queue_node2); + +struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) +{ + return blk_alloc_queue_node2(gfp_mask, node_id, NULL); +} EXPORT_SYMBOL(blk_alloc_queue_node); /** @@ -1088,13 +1105,11 @@ blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id) { struct request_queue *q; - q = blk_alloc_queue_node(GFP_KERNEL, node_id); + q = blk_alloc_queue_node2(GFP_KERNEL, node_id, lock); if (!q) return NULL; q->request_fn = rfn; - if (lock) - q->queue_lock = lock; if (blk_init_allocated_queue(q) < 0) { blk_cleanup_queue(q); return NULL; diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c index 4b4697a1f963..965e80b13443 100644 --- a/drivers/block/drbd/drbd_main.c +++ b/drivers/block/drbd/drbd_main.c @@ -2822,7 +2822,8 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig drbd_init_set_defaults(device); - q = blk_alloc_queue(GFP_KERNEL); + q = blk_alloc_queue_node2(GFP_KERNEL, NUMA_NO_NODE, + &resource->req_lock); if (!q) goto out_no_q; device->rq_queue = q; @@ -2854,7 +2855,6 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig /* Setting the max_hw_sectors to an odd value of 8kibyte here This triggers a max_bio_size message upon first attach or connect */ blk_queue_max_hw_sectors(q, DRBD_MAX_BIO_SIZE_SAFE >> 8); - q->queue_lock = &resource->req_lock; device->md_io.page = alloc_page(GFP_KERNEL); if (!device->md_io.page) diff --git a/drivers/block/umem.c b/drivers/block/umem.c index 8077123678ad..f6bb78782afa 100644 --- a/drivers/block/umem.c +++ b/drivers/block/umem.c @@ -888,13 +888,14 @@ static int mm_pci_probe(struct pci_dev *dev, const struct pci_device_id *id) card->Active = -1; /* no page is active */ card->bio = NULL; card->biotail = &card->bio; + spin_lock_init(&card->lock); - card->queue = blk_alloc_queue(GFP_KERNEL); + card->queue = blk_alloc_queue_node2(GFP_KERNEL, NUMA_NO_NODE, + &card->lock); if (!card->queue) goto failed_alloc; blk_queue_make_request(card->queue, mm_make_request); - card->queue->queue_lock = &card->lock; card->queue->queuedata = card; tasklet_init(&card->tasklet, process_page, (unsigned long)card); @@ -968,8 +969,6 @@ static int mm_pci_probe(struct pci_dev *dev, const struct pci_device_id *id) dev_printk(KERN_INFO, &card->dev->dev, "Window size %d bytes, IRQ %d\n", data, dev->irq); - spin_lock_init(&card->lock); - pci_set_drvdata(dev, card); if (pci_write_cmd != 0x0F) /* If not Memory Write & Invalidate */ diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c index 5ecd54088988..274a8c6c8d64 100644 --- a/drivers/mmc/core/queue.c +++ b/drivers/mmc/core/queue.c @@ -216,10 +216,9 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card, int ret = -ENOMEM; mq->card = card; - mq->queue = blk_alloc_queue(GFP_KERNEL); + mq->queue = blk_alloc_queue_node2(GFP_KERNEL, NUMA_NO_NODE, lock); if (!mq->queue) return -ENOMEM; - mq->queue->queue_lock = lock; mq->queue->request_fn = mmc_request_fn; mq->queue->init_rq_fn = mmc_init_request; mq->queue->exit_rq_fn = mmc_exit_request; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 781992c4124e..37723a8b799c 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1331,6 +1331,8 @@ extern long nr_blockdev_pages(void); bool __must_check blk_get_queue(struct request_queue *); struct request_queue *blk_alloc_queue(gfp_t); struct request_queue *blk_alloc_queue_node(gfp_t, int); +struct request_queue *blk_alloc_queue_node2(gfp_t gfp_mask, int node_id, + spinlock_t *lock); extern void blk_put_queue(struct request_queue *); extern void blk_set_queue_dying(struct request_queue *); -- 2.16.0

7 years, 9 months

2
1
0 0

[Linux-stable-mirror] v4.9.79 build: 0 failures 0 warnings (v4.9.79)

by Build bot for Mark Brown

Tree/Branch: v4.9.79 Git describe: v4.9.79 Commit: 6c6f924f9c Linux 4.9.79 Build Time: 92 min 38 sec Passed: 10 / 10 (100.00 %) Failed: 0 / 10 ( 0.00 %) Errors: 0 Warnings: 0 Section Mismatches: 0 ------------------------------------------------------------------------------- defconfigs with issues (other than build errors): ------------------------------------------------------------------------------- =============================================================================== Detailed per-defconfig build reports below: ------------------------------------------------------------------------------- Passed with no errors, warnings or mismatches: arm64-allnoconfig arm64-allmodconfig arm-multi_v5_defconfig arm-multi_v7_defconfig x86_64-defconfig arm-allmodconfig arm-allnoconfig x86_64-allnoconfig arm-multi_v4t_defconfig arm64-defconfig close failed in file object destructor: sys.excepthook is missing lost sys.stderr

7 years, 9 months

1
0
0 0

[Linux-stable-mirror] Patch "loop: fix concurrent lo_open/lo_release" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled loop: fix concurrent lo_open/lo_release to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: loop-fix-concurrent-lo_open-lo_release.patch and it can be found in the queue-4.9 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From ae6650163c66a7eff1acd6eb8b0f752dcfa8eba5 Mon Sep 17 00:00:00 2001 From: Linus Torvalds <torvalds(a)linux-foundation.org> Date: Fri, 5 Jan 2018 16:26:00 -0800 Subject: loop: fix concurrent lo_open/lo_release MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit From: Linus Torvalds <torvalds(a)linux-foundation.org> commit ae6650163c66a7eff1acd6eb8b0f752dcfa8eba5 upstream. 范龙飞 reports that KASAN can report a use-after-free in __lock_acquire. The reason is due to insufficient serialization in lo_release(), which will continue to use the loop device even after it has decremented the lo_refcnt to zero. In the meantime, another process can come in, open the loop device again as it is being shut down. Confusion ensues. Reported-by: 范龙飞 <long7573(a)126.com> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> Signed-off-by: Jens Axboe <axboe(a)kernel.dk> Cc: Ben Hutchings <ben.hutchings(a)codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/block/loop.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -1558,9 +1558,8 @@ out: return err; } -static void lo_release(struct gendisk *disk, fmode_t mode) +static void __lo_release(struct loop_device *lo) { - struct loop_device *lo = disk->private_data; int err; if (atomic_dec_return(&lo->lo_refcnt)) @@ -1586,6 +1585,13 @@ static void lo_release(struct gendisk *d mutex_unlock(&lo->lo_ctl_mutex); } +static void lo_release(struct gendisk *disk, fmode_t mode) +{ + mutex_lock(&loop_index_mutex); + __lo_release(disk->private_data); + mutex_unlock(&loop_index_mutex); +} + static const struct block_device_operations lo_fops = { .owner = THIS_MODULE, .open = lo_open, Patches currently in stable-queue which might be from torvalds(a)linux-foundation.org are queue-4.9/loop-fix-concurrent-lo_open-lo_release.patch

7 years, 9 months

1
0
0 0

[Linux-stable-mirror] Patch "loop: fix concurrent lo_open/lo_release" has been added to the 4.4-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled loop: fix concurrent lo_open/lo_release to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: loop-fix-concurrent-lo_open-lo_release.patch and it can be found in the queue-4.4 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From ae6650163c66a7eff1acd6eb8b0f752dcfa8eba5 Mon Sep 17 00:00:00 2001 From: Linus Torvalds <torvalds(a)linux-foundation.org> Date: Fri, 5 Jan 2018 16:26:00 -0800 Subject: loop: fix concurrent lo_open/lo_release MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit From: Linus Torvalds <torvalds(a)linux-foundation.org> commit ae6650163c66a7eff1acd6eb8b0f752dcfa8eba5 upstream. 范龙飞 reports that KASAN can report a use-after-free in __lock_acquire. The reason is due to insufficient serialization in lo_release(), which will continue to use the loop device even after it has decremented the lo_refcnt to zero. In the meantime, another process can come in, open the loop device again as it is being shut down. Confusion ensues. Reported-by: 范龙飞 <long7573(a)126.com> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> Signed-off-by: Jens Axboe <axboe(a)kernel.dk> Cc: Ben Hutchings <ben.hutchings(a)codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/block/loop.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -1569,9 +1569,8 @@ out: return err; } -static void lo_release(struct gendisk *disk, fmode_t mode) +static void __lo_release(struct loop_device *lo) { - struct loop_device *lo = disk->private_data; int err; if (atomic_dec_return(&lo->lo_refcnt)) @@ -1597,6 +1596,13 @@ static void lo_release(struct gendisk *d mutex_unlock(&lo->lo_ctl_mutex); } +static void lo_release(struct gendisk *disk, fmode_t mode) +{ + mutex_lock(&loop_index_mutex); + __lo_release(disk->private_data); + mutex_unlock(&loop_index_mutex); +} + static const struct block_device_operations lo_fops = { .owner = THIS_MODULE, .open = lo_open, Patches currently in stable-queue which might be from torvalds(a)linux-foundation.org are queue-4.4/loop-fix-concurrent-lo_open-lo_release.patch

7 years, 9 months

1
0
0 0

[Linux-stable-mirror] Patch "loop: fix concurrent lo_open/lo_release" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled loop: fix concurrent lo_open/lo_release to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: loop-fix-concurrent-lo_open-lo_release.patch and it can be found in the queue-4.14 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From ae6650163c66a7eff1acd6eb8b0f752dcfa8eba5 Mon Sep 17 00:00:00 2001 From: Linus Torvalds <torvalds(a)linux-foundation.org> Date: Fri, 5 Jan 2018 16:26:00 -0800 Subject: loop: fix concurrent lo_open/lo_release MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit From: Linus Torvalds <torvalds(a)linux-foundation.org> commit ae6650163c66a7eff1acd6eb8b0f752dcfa8eba5 upstream. 范龙飞 reports that KASAN can report a use-after-free in __lock_acquire. The reason is due to insufficient serialization in lo_release(), which will continue to use the loop device even after it has decremented the lo_refcnt to zero. In the meantime, another process can come in, open the loop device again as it is being shut down. Confusion ensues. Reported-by: 范龙飞 <long7573(a)126.com> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> Signed-off-by: Jens Axboe <axboe(a)kernel.dk> Cc: Ben Hutchings <ben.hutchings(a)codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/block/loop.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -1576,9 +1576,8 @@ out: return err; } -static void lo_release(struct gendisk *disk, fmode_t mode) +static void __lo_release(struct loop_device *lo) { - struct loop_device *lo = disk->private_data; int err; if (atomic_dec_return(&lo->lo_refcnt)) @@ -1605,6 +1604,13 @@ static void lo_release(struct gendisk *d mutex_unlock(&lo->lo_ctl_mutex); } +static void lo_release(struct gendisk *disk, fmode_t mode) +{ + mutex_lock(&loop_index_mutex); + __lo_release(disk->private_data); + mutex_unlock(&loop_index_mutex); +} + static const struct block_device_operations lo_fops = { .owner = THIS_MODULE, .open = lo_open, Patches currently in stable-queue which might be from torvalds(a)linux-foundation.org are queue-4.14/futex-fix-owner_dead-fixup.patch queue-4.14/loop-fix-concurrent-lo_open-lo_release.patch

7 years, 9 months

1
0
0 0

[Linux-stable-mirror] Patch "loop: fix concurrent lo_open/lo_release" has been added to the 3.18-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled loop: fix concurrent lo_open/lo_release to the 3.18-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: loop-fix-concurrent-lo_open-lo_release.patch and it can be found in the queue-3.18 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. >From ae6650163c66a7eff1acd6eb8b0f752dcfa8eba5 Mon Sep 17 00:00:00 2001 From: Linus Torvalds <torvalds(a)linux-foundation.org> Date: Fri, 5 Jan 2018 16:26:00 -0800 Subject: loop: fix concurrent lo_open/lo_release MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit From: Linus Torvalds <torvalds(a)linux-foundation.org> commit ae6650163c66a7eff1acd6eb8b0f752dcfa8eba5 upstream. 范龙飞 reports that KASAN can report a use-after-free in __lock_acquire. The reason is due to insufficient serialization in lo_release(), which will continue to use the loop device even after it has decremented the lo_refcnt to zero. In the meantime, another process can come in, open the loop device again as it is being shut down. Confusion ensues. Reported-by: 范龙飞 <long7573(a)126.com> Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org> Signed-off-by: Jens Axboe <axboe(a)kernel.dk> Cc: Ben Hutchings <ben.hutchings(a)codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/block/loop.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -1512,9 +1512,8 @@ out: return err; } -static void lo_release(struct gendisk *disk, fmode_t mode) +static void __lo_release(struct loop_device *lo) { - struct loop_device *lo = disk->private_data; int err; mutex_lock(&lo->lo_ctl_mutex); @@ -1542,6 +1541,13 @@ out: mutex_unlock(&lo->lo_ctl_mutex); } +static void lo_release(struct gendisk *disk, fmode_t mode) +{ + mutex_lock(&loop_index_mutex); + __lo_release(disk->private_data); + mutex_unlock(&loop_index_mutex); +} + static const struct block_device_operations lo_fops = { .owner = THIS_MODULE, .open = lo_open, Patches currently in stable-queue which might be from torvalds(a)linux-foundation.org are queue-3.18/loop-fix-concurrent-lo_open-lo_release.patch

7 years, 9 months

1
0
0 0

[Linux-stable-mirror] [stable] dccp: CVE-2017-8824: use-after-free in DCCP code

by Ben Hutchings

Please queue up this commit for stable: 69c64866ce07 dccp: CVE-2017-8824: use-after-free in DCCP code Ben. -- Ben Hutchings Software Developer, Codethink Ltd.

7 years, 9 months

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror