Since we recently started warning about uses of this function after the
atomic check phase completes, we've started getting warnings about this in
nouveau. It appears a misplaced drm_atomic_get_crtc_state() call has been
hiding in our .prepare_fb callback for a while.
So, fix this by adding a new nv50_head_atom_get_new() function and use that
in our .prepare_fb callback instead.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Fixes: 1590700d94ac ("drm/nouveau/kms/nv50-: split each resource type into their own source files")
Cc: <stable(a)vger.kernel.org> # v4.18+
---
V2:
* Don't call IS_ERR against the return value of
drm_atomic_get_new_crtc_state(), it doesn't return pointer errors.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
---
drivers/gpu/drm/nouveau/dispnv50/atom.h | 13 +++++++++++++
drivers/gpu/drm/nouveau/dispnv50/wndw.c | 2 +-
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/dispnv50/atom.h b/drivers/gpu/drm/nouveau/dispnv50/atom.h
index 93f8f4f645784..b43c4f9bbcdf5 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/atom.h
+++ b/drivers/gpu/drm/nouveau/dispnv50/atom.h
@@ -152,8 +152,21 @@ static inline struct nv50_head_atom *
nv50_head_atom_get(struct drm_atomic_state *state, struct drm_crtc *crtc)
{
struct drm_crtc_state *statec = drm_atomic_get_crtc_state(state, crtc);
+
if (IS_ERR(statec))
return (void *)statec;
+
+ return nv50_head_atom(statec);
+}
+
+static inline struct nv50_head_atom *
+nv50_head_atom_get_new(struct drm_atomic_state *state, struct drm_crtc *crtc)
+{
+ struct drm_crtc_state *statec = drm_atomic_get_new_crtc_state(state, crtc);
+
+ if (!statec)
+ return NULL;
+
return nv50_head_atom(statec);
}
diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
index ef9e410babbfb..9a2c20fce0f3e 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c
@@ -583,7 +583,7 @@ nv50_wndw_prepare_fb(struct drm_plane *plane, struct drm_plane_state *state)
asyw->image.offset[0] = nvbo->offset;
if (wndw->func->prepare) {
- asyh = nv50_head_atom_get(asyw->state.state, asyw->state.crtc);
+ asyh = nv50_head_atom_get_new(asyw->state.state, asyw->state.crtc);
if (IS_ERR(asyh))
return PTR_ERR(asyh);
base-commit: c7685d11108acb387e44e3d81194d0d8959eaa44
--
2.52.0
From: Jan H. Schönherr <jschoenh(a)amazon.de>
It is possible to degrade host performance by manipulating performance
counters from a VM and tricking the host hypervisor to enable branch
tracing. When the guest programs a CPU to track branch instructions and
deliver an interrupt after exactly one branch instruction, the value one
is handled by the host KVM/perf subsystems and treated incorrectly as a
special value to enable the branch trace store (BTS) subsystem. It
should not be possible to enable BTS from a guest. When BTS is enabled,
it leads to general host performance degradation to both VMs and host.
Perf considers the combination of PERF_COUNT_HW_BRANCH_INSTRUCTIONS with
a sample_period of 1 a special case and handles this as a BTS event (see
intel_pmu_has_bts_period()) -- a deviation from the usual semantic,
where the sample_period represents the amount of branch instructions to
encounter before the overflow handler is invoked.
Nothing prevents a guest from programming its vPMU with the above
settings (count branch, interrupt after one branch), which causes KVM to
erroneously instruct perf to create a BTS event within
pmc_reprogram_counter(), which does not have the desired semantics.
The guest could also do more benign actions and request an interrupt
after a more reasonable number of branch instructions via its vPMU. In
that case counting works initially. However, KVM occasionally pauses and
resumes the created performance counters. If the remaining amount of
branch instructions until interrupt has reached 1 exactly,
pmc_resume_counter() fails to resume the counter and a BTS event is
created instead with its incorrect semantics.
Fix this behavior by not passing the special value "1" as sample_period
to perf. Instead, perform the same quirk that happens later in
x86_perf_event_set_period() anyway, when the performance counter is
transferred to the actual PMU: bump the sample_period to 2.
Testing:
From guest:
`./wrmsr -p 12 0x186 0x1100c4`
`./wrmsr -p 12 0xc1 0xffffffffffff`
`./wrmsr -p 12 0x186 0x5100c4`
This sequence sets up branch instruction counting, initializes the counter
to overflow after one event (0xffffffffffff), and then enables edge
detection (bit 18) for branch events.
./wrmsr -p 12 0x186 0x1100c4
Writes to IA32_PERFEVTSEL0 (0x186)
Value 0x1100c4 breaks down as:
Event = 0xC4 (Branch instructions)
Bits 16-17: 0x1 (User mode only)
Bit 22: 1 (Enable counter)
./wrmsr -p 12 0xc1 0xffffffffffff
Writes to IA32_PMC0 (0xC1)
Sets counter to maximum value (0xffffffffffff)
This effectively sets up the counter to overflow on the next branch
./wrmsr -p 12 0x186 0x5100c4
Updates IA32_PERFEVTSEL0 again
Similar to first command but adds bit 18 (0x4 to 0x5)
Enables edge detection (bit 18)
These MSR writes are trapped by the hypervisor in KVM and forwarded to
the perf subsystem to create corresponding monitoring events.
It is possible to repro this problem in a more realistic guest scenario:
`perf record -e branches:u -c 2 -a &`
`perf record -e branches:u -c 2 -a &`
This presumably triggers the issue by KVM pausing and resuming the
performance counter at the wrong moment, when its value is about to
overflow.
Signed-off-by: Jan H. Schönherr <jschoenh(a)amazon.de>
Signed-off-by: Fernand Sieber <sieberf(a)amazon.com>
Reviewed-by: David Woodhouse <dwmw(a)amazon.co.uk>
Reviewed-by: Hendrik Borghorst <hborghor(a)amazon.de>
Link: https://lore.kernel.org/r/20251124100220.238177-1-sieberf@amazon.com
---
arch/x86/kvm/pmu.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 487ad19a236e..547512028e24 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -225,6 +225,19 @@ static u64 get_sample_period(struct kvm_pmc *pmc, u64 counter_value)
{
u64 sample_period = (-counter_value) & pmc_bitmask(pmc);
+ /*
+ * A sample_period of 1 might get mistaken by perf for a BTS event, see
+ * intel_pmu_has_bts_period(). This would prevent re-arming the counter
+ * via pmc_resume_counter(), followed by the accidental creation of an
+ * actual BTS event, which we do not want.
+ *
+ * Avoid this by bumping the sampling period. Note, that we do not lose
+ * any precision, because the same quirk happens later anyway (for
+ * different reasons) in x86_perf_event_set_period().
+ */
+ if (sample_period == 1)
+ sample_period = 2;
+
if (!sample_period)
sample_period = pmc_bitmask(pmc) + 1;
return sample_period;
--
2.43.0
Amazon Development Centre (South Africa) (Proprietary) Limited
29 Gogosoa Street, Observatory, Cape Town, Western Cape, 7925, South Africa
Registration Number: 2004 / 034463 / 07
of_get_child_by_name() returns a node pointer with refcount incremented.
Use the __free() attribute to manage the pgc_node reference, ensuring
automatic of_node_put() cleanup when pgc_node goes out of scope.
This eliminates the need for explicit error handling paths and avoids
reference count leaks.
Fixes: 721cabf6c660 ("soc: imx: move PGC handling to a new GPC driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
---
Change in V4:
- Fix typo error in code
Change in V3:
- Ensure variable is assigned when using cleanup attribute
Change in V2:
- Use __free() attribute instead of explicit of_node_put() calls
---
drivers/pmdomain/imx/gpc.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/pmdomain/imx/gpc.c b/drivers/pmdomain/imx/gpc.c
index f18c7e6e75dd..56a78cc86584 100644
--- a/drivers/pmdomain/imx/gpc.c
+++ b/drivers/pmdomain/imx/gpc.c
@@ -403,13 +403,12 @@ static int imx_gpc_old_dt_init(struct device *dev, struct regmap *regmap,
static int imx_gpc_probe(struct platform_device *pdev)
{
const struct imx_gpc_dt_data *of_id_data = device_get_match_data(&pdev->dev);
- struct device_node *pgc_node;
+ struct device_node *pgc_node __free(device_node)
+ = of_get_child_by_name(pdev->dev.of_node, "pgc");
struct regmap *regmap;
void __iomem *base;
int ret;
- pgc_node = of_get_child_by_name(pdev->dev.of_node, "pgc");
-
/* bail out if DT too old and doesn't provide the necessary info */
if (!of_property_present(pdev->dev.of_node, "#power-domain-cells") &&
!pgc_node)
--
2.34.1
According to PCIe r7.0, sec. 6.2.11, "Implementation Note: Determination
of DPC Control", it is recommended that the Operating System link the
enablement of Downstream Port Containment (DPC) to the enablement of
Advanced Error Reporting (AER).
However, AER is advertised only on Root Port (RP) or Root Complex Event
Collector (RCEC) devices. On the other hand, DPC may be advertised on
any PCIe device in the hierarchy. In fact, since the main usecase of DPC
is for the switch upstream of an endpoint device to trigger a signal to
the host-bridge, it is imperative that it be supported on non-RP,
non-RCEC devices.
Previously portdrv has interpreted the spec to mean that the AER service
must be available on the same device in order for DPC to be available.
This is not what the implementation note meant to imply. If the firmware
hands Linux control of AER via _OSC on the host bridge upstream of the
device, then Linux should be allowed to assume control of DPC on the device.
The comment above this check alludes to this, by saying:
> With dpc-native, allow Linux to use DPC even if it doesn't have
> permission to use AER.
However, permission to use AER is negotiated at the host bridge, not
per-device. So we should not link DPC to enabling AER at the device.
Instead, DPC should be enabled if the OS has control of AER for the
host bridge that is upstream of the device in question, or if dpc-native
was set on the command line.
Cc: stable(a)vger.kernel.org
Signed-off-by: Darshit Shah <darnshah(a)amazon.de>
Reviewed-by: Lukas Wunner <lukas(a)wunner.de>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy(a)linux.intel.com>
---
Changes from v2:
* + stable(a)vger.kernel.org since moving to v6.2+ breaks DPC on our systems
* Minor stylistic changes to commit message
* NO functional changes
drivers/pci/pcie/portdrv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index 38a41ccf79b9..8db2fa140ae2 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -264,7 +264,7 @@ static int get_port_device_capability(struct pci_dev *dev)
*/
if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) &&
pci_aer_available() &&
- (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER)))
+ (host->native_aer || pcie_ports_dpc_native))
services |= PCIE_PORT_SERVICE_DPC;
/* Enable bandwidth control if more than one speed is supported. */
--
2.47.3
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christof Hellmis, Andreas Stieger
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
scx_enable() calls scx_bypass(true) to initialize in bypass mode and then
scx_bypass(false) on success to exit. If scx_enable() fails during task
initialization - e.g. scx_cgroup_init() or scx_init_task() returns an error -
it jumps to err_disable while bypass is still active. scx_disable_workfn()
then calls scx_bypass(true/false) for its own bypass, leaving the bypass depth
at 1 instead of 0. This causes the system to remain permanently in bypass mode
after a failed scx_enable().
Failures after task initialization is complete - e.g. scx_tryset_enable_state()
at the end - already call scx_bypass(false) before reaching the error path and
are not affected. This only affects a subset of failure modes.
Fix it by tracking whether scx_enable() called scx_bypass(true) in a bool and
having scx_disable_workfn() call an extra scx_bypass(false) to clear it. This
is a temporary measure as the bypass depth will be moved into the sched
instance, which will make this tracking unnecessary.
Fixes: 8c2090c504e9 ("sched_ext: Initialize in bypass mode")
Cc: stable(a)vger.kernel.org # v6.12+
Reported-by: Chris Mason <clm(a)meta.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
---
kernel/sched/ext.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -41,6 +41,13 @@ static bool scx_init_task_enabled;
static bool scx_switching_all;
DEFINE_STATIC_KEY_FALSE(__scx_switched_all);
+/*
+ * Tracks whether scx_enable() called scx_bypass(true). Used to balance bypass
+ * depth on enable failure. Will be removed when bypass depth is moved into the
+ * sched instance.
+ */
+static bool scx_bypassed_for_enable;
+
static atomic_long_t scx_nr_rejected = ATOMIC_LONG_INIT(0);
static atomic_long_t scx_hotplug_seq = ATOMIC_LONG_INIT(0);
@@ -4318,6 +4325,11 @@ static void scx_disable_workfn(struct kt
scx_dsp_max_batch = 0;
free_kick_syncs();
+ if (scx_bypassed_for_enable) {
+ scx_bypassed_for_enable = false;
+ scx_bypass(false);
+ }
+
mutex_unlock(&scx_enable_mutex);
WARN_ON_ONCE(scx_set_enable_state(SCX_DISABLED) != SCX_DISABLING);
@@ -4970,6 +4982,7 @@ static int scx_enable(struct sched_ext_o
* Init in bypass mode to guarantee forward progress.
*/
scx_bypass(true);
+ scx_bypassed_for_enable = true;
for (i = SCX_OPI_NORMAL_BEGIN; i < SCX_OPI_NORMAL_END; i++)
if (((void (**)(void))ops)[i])
@@ -5067,6 +5080,7 @@ static int scx_enable(struct sched_ext_o
scx_task_iter_stop(&sti);
percpu_up_write(&scx_fork_rwsem);
+ scx_bypassed_for_enable = false;
scx_bypass(false);
if (!scx_tryset_enable_state(SCX_ENABLED, SCX_ENABLING)) {
--
tejun