The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 064855a69003c24bd6b473b367d364e418c57625
Gitweb: https://git.kernel.org/tip/064855a69003c24bd6b473b367d364e418c57625
Author: Babu Moger <Babu.Moger(a)amd.com>
AuthorDate: Mon, 02 Aug 2021 14:38:58 -05:00
Committer: Borislav Petkov <bp(a)suse.de>
CommitterDate: Thu, 12 Aug 2021 20:12:20 +02:00
x86/resctrl: Fix default monitoring groups reporting
Creating a new sub monitoring group in the root /sys/fs/resctrl leads to
getting the "Unavailable" value for mbm_total_bytes and mbm_local_bytes
on the entire filesystem.
Steps to reproduce:
1. mount -t resctrl resctrl /sys/fs/resctrl/
2. cd /sys/fs/resctrl/
3. cat mon_data/mon_L3_00/mbm_total_bytes
23189832
4. Create sub monitor group:
mkdir mon_groups/test1
5. cat mon_data/mon_L3_00/mbm_total_bytes
Unavailable
When a new monitoring group is created, a new RMID is assigned to the
new group. But the RMID is not active yet. When the events are read on
the new RMID, it is expected to report the status as "Unavailable".
When the user reads the events on the default monitoring group with
multiple subgroups, the events on all subgroups are consolidated
together. Currently, if any of the RMID reads report as "Unavailable",
then everything will be reported as "Unavailable".
Fix the issue by discarding the "Unavailable" reads and reporting all
the successful RMID reads. This is not a problem on Intel systems as
Intel reports 0 on Inactive RMIDs.
Fixes: d89b7379015f ("x86/intel_rdt/cqm: Add mon_data")
Reported-by: Paweł Szulik <pawel.szulik(a)intel.com>
Signed-off-by: Babu Moger <Babu.Moger(a)amd.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Acked-by: Reinette Chatre <reinette.chatre(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=213311
Link: https://lkml.kernel.org/r/162793309296.9224.15871659871696482080.stgit@bmog…
---
arch/x86/kernel/cpu/resctrl/monitor.c | 27 ++++++++++++--------------
1 file changed, 13 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index f07c10b..57e4bb6 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -285,15 +285,14 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
return chunks >>= shift;
}
-static int __mon_event_count(u32 rmid, struct rmid_read *rr)
+static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
{
struct mbm_state *m;
u64 chunks, tval;
tval = __rmid_read(rmid, rr->evtid);
if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) {
- rr->val = tval;
- return -EINVAL;
+ return tval;
}
switch (rr->evtid) {
case QOS_L3_OCCUP_EVENT_ID:
@@ -305,12 +304,6 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
case QOS_L3_MBM_LOCAL_EVENT_ID:
m = &rr->d->mbm_local[rmid];
break;
- default:
- /*
- * Code would never reach here because
- * an invalid event id would fail the __rmid_read.
- */
- return -EINVAL;
}
if (rr->first) {
@@ -361,23 +354,29 @@ void mon_event_count(void *info)
struct rdtgroup *rdtgrp, *entry;
struct rmid_read *rr = info;
struct list_head *head;
+ u64 ret_val;
rdtgrp = rr->rgrp;
- if (__mon_event_count(rdtgrp->mon.rmid, rr))
- return;
+ ret_val = __mon_event_count(rdtgrp->mon.rmid, rr);
/*
- * For Ctrl groups read data from child monitor groups.
+ * For Ctrl groups read data from child monitor groups and
+ * add them together. Count events which are read successfully.
+ * Discard the rmid_read's reporting errors.
*/
head = &rdtgrp->mon.crdtgrp_list;
if (rdtgrp->type == RDTCTRL_GROUP) {
list_for_each_entry(entry, head, mon.crdtgrp_list) {
- if (__mon_event_count(entry->mon.rmid, rr))
- return;
+ if (__mon_event_count(entry->mon.rmid, rr) == 0)
+ ret_val = 0;
}
}
+
+ /* Report error if none of rmid_reads are successful */
+ if (ret_val)
+ rr->val = ret_val;
}
/*
From: Lai Jiangshan <laijs(a)linux.alibaba.com>
commit b1bd5cba3306691c771d558e94baa73e8b0b96b7 upstream.
When computing the access permissions of a shadow page, use the effective
permissions of the walk up to that point, i.e. the logic AND of its parents'
permissions. Two guest PxE entries that point at the same table gfn need to
be shadowed with different shadow pages if their parents' permissions are
different. KVM currently uses the effective permissions of the last
non-leaf entry for all non-leaf entries. Because all non-leaf SPTEs have
full ("uwx") permissions, and the effective permissions are recorded only
in role.access and merged into the leaves, this can lead to incorrect
reuse of a shadow page and eventually to a missing guest protection page
fault.
For example, here is a shared pagetable:
pgd[] pud[] pmd[] virtual address pointers
/->pmd1(u--)->pte1(uw-)->page1 <- ptr1 (u--)
/->pud1(uw-)--->pmd2(uw-)->pte2(uw-)->page2 <- ptr2 (uw-)
pgd-| (shared pmd[] as above)
\->pud2(u--)--->pmd1(u--)->pte1(uw-)->page1 <- ptr3 (u--)
\->pmd2(uw-)->pte2(uw-)->page2 <- ptr4 (u--)
pud1 and pud2 point to the same pmd table, so:
- ptr1 and ptr3 points to the same page.
- ptr2 and ptr4 points to the same page.
(pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd entries)
- First, the guest reads from ptr1 first and KVM prepares a shadow
page table with role.access=u--, from ptr1's pud1 and ptr1's pmd1.
"u--" comes from the effective permissions of pgd, pud1 and
pmd1, which are stored in pt->access. "u--" is used also to get
the pagetable for pud1, instead of "uw-".
- Then the guest writes to ptr2 and KVM reuses pud1 which is present.
The hypervisor set up a shadow page for ptr2 with pt->access is "uw-"
even though the pud1 pmd (because of the incorrect argument to
kvm_mmu_get_page in the previous step) has role.access="u--".
- Then the guest reads from ptr3. The hypervisor reuses pud1's
shadow pmd for pud2, because both use "u--" for their permissions.
Thus, the shadow pmd already includes entries for both pmd1 and pmd2.
- At last, the guest writes to ptr4. This causes no vmexit or pagefault,
because pud1's shadow page structures included an "uw-" page even though
its role.access was "u--".
Any kind of shared pagetable might have the similar problem when in
virtual machine without TDP enabled if the permissions are different
from different ancestors.
In order to fix the problem, we change pt->access to be an array, and
any access in it will not include permissions ANDed from child ptes.
The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/
Remember to test it with TDP disabled.
The problem had existed long before the commit 41074d07c78b ("KVM: MMU:
Fix inherited permissions for emulated guest pte updates"), and it
is hard to find which is the culprit. So there is no fixes tag here.
Signed-off-by: Lai Jiangshan <laijs(a)linux.alibaba.com>
Message-Id: <20210603052455.21023-1-jiangshanlai(a)gmail.com>
Cc: stable(a)vger.kernel.org
Fixes: cea0f0e7ea54 ("[PATCH] KVM: MMU: Shadow page table caching")
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
[OP: - apply arch/x86/kvm/mmu/* changes to arch/x86/kvm
- apply documentation changes to Documentation/virtual/kvm/mmu.txt
- adjusted context in arch/x86/kvm/paging_tmpl.h]
Signed-off-by: Ovidiu Panait <ovidiu.panait(a)windriver.com>
---
Note: The backport was validated by running the kvm-unit-tests testcase [1]
mentioned in the commit message (the testcase fails without the patch and
passes with the patch applied).
[1] https://gitlab.com/kvm-unit-tests/kvm-unit-tests/-/commit/47fd6bc54674fb1d8…
Documentation/virtual/kvm/mmu.txt | 4 ++--
arch/x86/kvm/paging_tmpl.h | 14 +++++++++-----
2 files changed, 11 insertions(+), 7 deletions(-)
diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index e507a9e0421e..851a8abcadce 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -152,8 +152,8 @@ Shadow pages contain the following information:
shadow pages) so role.quadrant takes values in the range 0..3. Each
quadrant maps 1GB virtual address space.
role.access:
- Inherited guest access permissions in the form uwx. Note execute
- permission is positive, not negative.
+ Inherited guest access permissions from the parent ptes in the form uwx.
+ Note execute permission is positive, not negative.
role.invalid:
The page is invalid and should not be used. It is a root page that is
currently pinned (by a cpu hardware register pointing to it); once it is
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 8220190b0605..9e15818de973 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -93,8 +93,8 @@ struct guest_walker {
gpa_t pte_gpa[PT_MAX_FULL_LEVELS];
pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS];
bool pte_writable[PT_MAX_FULL_LEVELS];
- unsigned pt_access;
- unsigned pte_access;
+ unsigned int pt_access[PT_MAX_FULL_LEVELS];
+ unsigned int pte_access;
gfn_t gfn;
struct x86_exception fault;
};
@@ -388,13 +388,15 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
}
walker->ptes[walker->level - 1] = pte;
+
+ /* Convert to ACC_*_MASK flags for struct guest_walker. */
+ walker->pt_access[walker->level - 1] = FNAME(gpte_access)(pt_access ^ walk_nx_mask);
} while (!is_last_gpte(mmu, walker->level, pte));
pte_pkey = FNAME(gpte_pkeys)(vcpu, pte);
accessed_dirty = have_ad ? pte_access & PT_GUEST_ACCESSED_MASK : 0;
/* Convert to ACC_*_MASK flags for struct guest_walker. */
- walker->pt_access = FNAME(gpte_access)(pt_access ^ walk_nx_mask);
walker->pte_access = FNAME(gpte_access)(pte_access ^ walk_nx_mask);
errcode = permission_fault(vcpu, mmu, walker->pte_access, pte_pkey, access);
if (unlikely(errcode))
@@ -433,7 +435,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
}
pgprintk("%s: pte %llx pte_access %x pt_access %x\n",
- __func__, (u64)pte, walker->pte_access, walker->pt_access);
+ __func__, (u64)pte, walker->pte_access,
+ walker->pt_access[walker->level - 1]);
return 1;
error:
@@ -602,7 +605,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
{
struct kvm_mmu_page *sp = NULL;
struct kvm_shadow_walk_iterator it;
- unsigned direct_access, access = gw->pt_access;
+ unsigned int direct_access, access;
int top_level, ret;
gfn_t gfn, base_gfn;
@@ -634,6 +637,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
sp = NULL;
if (!is_shadow_present_pte(*it.sptep)) {
table_gfn = gw->table_gfn[it.level - 2];
+ access = gw->pt_access[it.level - 2];
sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
false, access);
}
--
2.25.1
Hi Greg,
Suggest including next patch (available in linux-mainline) to
5.4-stable branch: commit ae7e86108b12 ("usb: dwc3: Stop active
transfers before halting the controller"). It's also already present
in 5.10 stable. Some fixes exist in 5.10-stable for that patch too.
This patch fixes panic in case of using USB2.0 Dual Role Device
controller, as described below.
1. Boot in peripheral role
2. Configure RNDIS gadget, perform ping, stop ping
3. Switch to host role
4. Kernel panic occurs
Kernel panic happens because gadget->udc->driver->disconnect() (which
is configfs_composite_disconnect()) is not called from
usb_gadget_disconnect() function, due to timeout condition in
dwc3_gadget_run_stop(), which leads to not called rndis_disable(). And
although previously created endpoints are not valid anymore,
eth_start_xmit() gets called and tries to use those, which leads to
invalid memory access. This patch fixes timeout condition, so next
call chain doesn't fail anymore, and RNDIS uninitialized properly on
gadget to host role switch:
<<<<<<<<<<<<<<<<<<<< cut here >>>>>>>>>>>>>>>>>>>
usb_role_switch_set_role()
v
dwc3_usb_role_switch_set()
v
dwc3_set_mode()
v
__dwc3_set_mode()
v
dwc3_gadget_exit()
v
usb_del_gadget_udc()
v
usb_gadget_remove_driver()
v
usb_gadget_disconnect()
v
// THIS IS NOT CALLED because gadget->ops->pullup() =
// dwc3_gadget_pullup() returns -ETIMEDOUT (-110)
gadget->udc->driver->disconnect()
// = configfs_composite_disconnect()
v
composite_disconnect()
v
reset_config()
v
foreach (f : function) : f->disable
v
rndis_disable()
v
gether_disconnect()
v
usb_ep_disable(),
dev->port_usb = NULL
<<<<<<<<<<<<<<<<<<<< cut here >>>>>>>>>>>>>>>>>>>
Thanks!
Hi Greg
Reason:
We found without this, Poll() infinitely wait because OUTPUT queue
will never signaled after last CAPTURE queue is de-queued.
And some buffer can't be popped as expected.
commit id: 566463afdbc43c7744c5a1b89250fc808df03833
subject: "media: v4l2-mem2mem: always consider OUTPUT queue during poll"
This should be applied to 4.4+ without conflict after
(f1a81afc98e315f4bf600d28f8a19a5655f7cfe0 "[media] m2m: fix bad unlock balance")
Thanks,
Lecopzer