On an x86 system under test with 1780 CPUs, topology_span_sane() takes
around 8 seconds cumulatively across all iterations. It is an expensive
operation that sanity-checks the non-NUMA topology masks.
CPU topology does not change very frequently, so make this check
optional for systems where the topology is trusted and a faster bootup
is needed.
Restrict it to SCHED_DEBUG builds so that systems which want to avoid
this penalty can do so.
Fixes: ccf74128d66c ("sched/topology: Assert non-NUMA topology masks don't (partially) overlap")
Signed-off-by: Saurabh Sengar <ssengar(a)linux.microsoft.com>
---
kernel/sched/topology.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 9748a4c8d668..dacc8c6f978b 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2354,6 +2354,7 @@ static struct sched_domain *build_sched_domain(struct sched_domain_topology_leve
return sd;
}
+#ifdef CONFIG_SCHED_DEBUG
/*
* Ensure topology masks are sane, i.e. there are no conflicts (overlaps) for
* any two given CPUs at this (non-NUMA) topology level.
@@ -2387,6 +2388,7 @@ static bool topology_span_sane(struct sched_domain_topology_level *tl,
return true;
}
+#endif
/*
* Build sched domains for a given set of CPUs and attach the sched domains
@@ -2417,8 +2419,10 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
sd = NULL;
for_each_sd_topology(tl) {
+#ifdef CONFIG_SCHED_DEBUG
if (WARN_ON(!topology_span_sane(tl, cpu_map, i)))
goto error;
+#endif
sd = build_sched_domain(tl, cpu_map, attr, sd, i);
--
2.43.0
Hello,
I'd like to report a regression in menuconfig.
In "Device Drivers" ---> "Input device support"
There used to be submenus for keyboards, mice etc.
Now, only the entry "Hardware I/O ports" remains.
They can still be accessed (and configured) by searching for them,
then pressing the corresponding number:
/Keyboard
/KEYBOARD_ATKBD
I also determined the commit:
f79dc03fe68c79d388908182e68d702f7f1786bc
kconfig: refactor choice value calculation
#regzbot introduced: f79dc03fe68c
I've only encountered this here, but it is possible that other entries
elsewhere in menuconfig are missing as well.
The issue is present in stable 6.11.5 and mainline.
Kind regards,
Edmund Raile.
Hello,
I'd like to report a regression in firewire-ohci that results
in the kernel hardlocking when re-discovering a FireWire device.
TI XIO2213B
RME FireFace 800
It will occur under three conditions:
* power-cycling the FireWire device
* un- and re-plugging the FireWire device
* suspending and then waking the PC
Often it would also occur directly on boot in QEMU but I have not
yet observed this specific behavior on bare metal.
Here is an excerpt from the stack trace (don't know whether it is
acceptable to send in full):
kernel: ------------[ cut here ]------------
kernel: refcount_t: addition on 0; use-after-free.
kernel: WARNING: CPU: 3 PID: 116 at lib/refcount.c:25
refcount_warn_saturate (/build/linux/lib/refcount.c:25 (discriminator
1))
kernel: Workqueue: firewire_ohci bus_reset_work
kernel: RIP: 0010:refcount_warn_saturate
(/build/linux/lib/refcount.c:25 (discriminator 1))
kernel: Call Trace:
kernel: <TASK>
kernel: ? refcount_warn_saturate (/build/linux/lib/refcount.c:25
(discriminator 1))
kernel: ? __warn.cold (/build/linux/kernel/panic.c:693)
kernel: ? refcount_warn_saturate (/build/linux/lib/refcount.c:25
(discriminator 1))
kernel: ? report_bug (/build/linux/lib/bug.c:180
/build/linux/lib/bug.c:219)
kernel: ? handle_bug (/build/linux/arch/x86/kernel/traps.c:218)
kernel: ? exc_invalid_op (/build/linux/arch/x86/kernel/traps.c:260
(discriminator 1))
kernel: ? asm_exc_invalid_op
(/build/linux/./arch/x86/include/asm/idtentry.h:621)
kernel: ? refcount_warn_saturate (/build/linux/lib/refcount.c:25
(discriminator 1))
kernel: for_each_fw_node (/build/linux/./include/linux/refcount.h:190
/build/linux/./include/linux/refcount.h:241
/build/linux/./include/linux/refcount.h:258
/build/linux/drivers/firewire/core.h:199
/build/linux/drivers/firewire/core-topology.c:275)
kernel: ? __pfx_report_found_node (/build/linux/drivers/firewire/core-
topology.c:312)
kernel: fw_core_handle_bus_reset (/build/linux/drivers/firewire/core-
topology.c:399 (discriminator 1) /build/linux/drivers/firewire/core-
topology.c:504 (discriminator 1))
kernel: bus_reset_work (/build/linux/drivers/firewire/ohci.c:2121)
kernel: process_one_work
(/build/linux/./arch/x86/include/asm/jump_label.h:27
/build/linux/./include/linux/jump_label.h:207
/build/linux/./include/trace/events/workqueue.h:110
/build/linux/kernel/workqueue.c:3236)
kernel: worker_thread (/build/linux/kernel/workqueue.c:3306
(discriminator 2) /build/linux/kernel/workqueue.c:3393 (discriminator
2))
kernel: ? __pfx_worker_thread (/build/linux/kernel/workqueue.c:3339)
kernel: kthread (/build/linux/kernel/kthread.c:389)
kernel: ? __pfx_kthread (/build/linux/kernel/kthread.c:342)
kernel: ret_from_fork (/build/linux/arch/x86/kernel/process.c:153)
kernel: ? __pfx_kthread (/build/linux/kernel/kthread.c:342)
kernel: ret_from_fork_asm (/build/linux/arch/x86/entry/entry_64.S:254)
kernel: </TASK>
I have identified the commit via bisection:
24b7f8e5cd656196a13077e160aec45ad89b58d9
firewire: core: use helper functions for self ID sequence
It was part of the following patch series:
firewire: add tracepoints events for self ID sequence
https://lore.kernel.org/all/20240605235155.116468-6-o-takashi@sakamocchi.jp/
#regzbot introduced: 24b7f8e5cd65
Since this was before v6.10-rc5 and stable 6.10.14 is EOL,
stable v6.11.5 and mainline are affected.
Reverting appears to be non-trivial: the commit is part of a patch
series, other files have been altered as well, and other commits
build on top of it.
Call chain:
core-topology.c fw_core_handle_bus_reset()
-> core-topology.c for_each_fw_node(card, local_node,
report_found_node)
-> core.h fw_node_get(root)
-> refcount.h __refcount_inc(&node)
-> refcount.h __refcount_add(1, r, oldp);
-> refcount.h refcount_warn_saturate(r, REFCOUNT_ADD_UAF);
-> refcount.h REFCOUNT_WARN("addition on 0; use-after-free")
Since local_node of fw_core_handle_bus_reset() is retrieved by
local_node = build_tree(card, self_ids, self_id_count);
build_tree() needs to be looked at; it was indeed altered by
24b7f8e5cd65.
After a hard three-hour look through all the functions it uses, comparing
against the original function (as of e404cacfc5ed), this caught my eye:
for (port_index = 0; port_index < total_port_count;
++port_index) {
switch (port_status) {
case PHY_PACKET_SELF_ID_PORT_STATUS_PARENT:
node->color = i;
In both for loops, "i" was replaced by "port_index" as the iterator, but
the assignment still uses "i", which also remains in use above:
for (i = 0, h = &stack; i < child_port_count; i++)
h = h->prev;
While the original also used the less descriptive i in the loop
for (i = 0; i < port_count; i++) {
switch (get_port_type(sid, i)) {
case SELFID_PORT_PARENT:
node->color = i;
but reset it to 0 at the beginning of the loop.
So the stray "i" in the for loop should be replaced with the loop
iterator "port_index", as it is meant to stay in sync with the
loop iterator (i.e. the port_index), no?
diff --git a/drivers/firewire/core-topology.c b/drivers/firewire/core-
topology.c
index 8c10f47cc8fc..7fd91ba9c9c4 100644
--- a/drivers/firewire/core-topology.c
+++ b/drivers/firewire/core-topology.c
@@ -207,7 +207,7 @@ static struct fw_node *build_tree(struct fw_card
*card, const u32 *sid, int self
// the node->ports array where the
parent node should be. Later,
// when we handle the parent node, we
fix up the reference.
++parent_count;
- node->color = i;
+ node->color = port_index;
break;
What threw me off was discarding node->color, as it would be replaced
later anyway (can't be important!), or so I thought.
Please tell me, is this line of reasoning correct or am I missing
something?
Compiling 24b7f8e5cd65 and later mainline with the patch above
resulted in a kernel that didn't crash!
In case my solution should turn out to be correct, I will gladly
submit the patch.
Kind regards,
Edmund Raile.
With the transition of pd-mapper into the kernel, the timing was altered
such that on some targets the initial rpmsg_send() requests from
pmic_glink clients would be attempted before the firmware had announced
intents, and the firmware rejects intent requests.
Fix this.
Signed-off-by: Bjorn Andersson <bjorn.andersson(a)oss.qualcomm.com>
---
Changes in v2:
- Introduced "intents" and fixed a few spelling mistakes in the commit
message of patch 1
- Cleaned up log snippet in commit message of patch 2, added battery
manager log
- Changed the arbitrary 10 second timeout to 5... Ought to be enough for
anybody.
- Added a small sleep in the send-loop in patch 2, and by that
refactored the loop completely.
- Link to v1: https://lore.kernel.org/r/20241022-pmic-glink-ecancelled-v1-0-9e26fc74e0a3@…
---
Bjorn Andersson (2):
rpmsg: glink: Handle rejected intent request better
soc: qcom: pmic_glink: Handle GLINK intent allocation rejections
drivers/rpmsg/qcom_glink_native.c | 10 +++++++---
drivers/soc/qcom/pmic_glink.c | 25 ++++++++++++++++++++++---
2 files changed, 29 insertions(+), 6 deletions(-)
---
base-commit: 42f7652d3eb527d03665b09edac47f85fb600924
change-id: 20241022-pmic-glink-ecancelled-d899a9ca0358
Best regards,
--
Bjorn Andersson <bjorn.andersson(a)oss.qualcomm.com>
Adds support for detecting and reporting the speed of unaligned vector
accesses on RISC-V CPUs. Adds the vec_misaligned_speed key to hwprobe,
adds Zicclsm to cpufeature, and fixes the check for emulated scalar
unaligned accesses so that it covers all CPUs. The vec_misaligned_speed
key keeps the same format as the scalar unaligned access speed key.
This set does not emulate unaligned vector accesses on CPUs that do not
support them. It only reports whether userspace can run them and, if
supported, the speed of unaligned vector accesses.
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
Signed-off-by: Jesse Taube <jesse(a)rivosinc.com>
---
Changes in V6:
Added ("RISC-V: Scalar unaligned access emulated on hotplug CPUs")
Changes in V8:
Dropped Zicclsm
s/RISCV_HWPROBE_VECTOR_MISALIGNED/RISCV_HWPROBE_MISALIGNED_VECTOR/g
to match RISCV_HWPROBE_MISALIGNED_SCALAR_*
Rebased onto palmer/fixes (32d5f7add080a936e28ab4142bfeea6b06999789)
Changes in V9:
Missed a RISCV_HWPROBE_VECTOR_MISALIGNED...
Changes in V10:
- I sent on behalf of Jesse
- Remove v0 from clobber args in inline asm and leave comment
---
Jesse Taube (6):
RISC-V: Check scalar unaligned access on all CPUs
RISC-V: Scalar unaligned access emulated on hotplug CPUs
RISC-V: Replace RISCV_MISALIGNED with RISCV_SCALAR_MISALIGNED
RISC-V: Detect unaligned vector accesses supported
RISC-V: Report vector unaligned access speed hwprobe
RISC-V: hwprobe: Document unaligned vector perf key
Documentation/arch/riscv/hwprobe.rst | 16 +++
arch/riscv/Kconfig | 58 ++++++++++-
arch/riscv/include/asm/cpufeature.h | 10 +-
arch/riscv/include/asm/entry-common.h | 11 --
arch/riscv/include/asm/hwprobe.h | 2 +-
arch/riscv/include/asm/vector.h | 2 +
arch/riscv/include/uapi/asm/hwprobe.h | 5 +
arch/riscv/kernel/Makefile | 3 +-
arch/riscv/kernel/copy-unaligned.h | 5 +
arch/riscv/kernel/fpu.S | 4 +-
arch/riscv/kernel/sys_hwprobe.c | 41 ++++++++
arch/riscv/kernel/traps_misaligned.c | 139 +++++++++++++++++++++++--
arch/riscv/kernel/unaligned_access_speed.c | 156 +++++++++++++++++++++++++++--
arch/riscv/kernel/vec-copy-unaligned.S | 58 +++++++++++
arch/riscv/kernel/vector.c | 2 +-
15 files changed, 474 insertions(+), 38 deletions(-)
---
base-commit: 98f7e32f20d28ec452afb208f9cffc08448a2652
change-id: 20240920-jesse_unaligned_vector-7083fd28659c
--
- Charlie
The raw value conversion to obtain a measurement in lux as
INT_PLUS_MICRO does not calculate the decimal part properly to display
it as micro (in this case microlux). It only calculates the modulo to
obtain the decimal part from a resolution that is 10000 times the one
provided in the datasheet (0.5376 lux/cnt for the veml6030). The
resulting value must still be multiplied by 100 to make it micro.
This bug was introduced with the original implementation of the driver.
Cc: stable(a)vger.kernel.org
Fixes: 7b779f573c48 ("iio: light: add driver for veml6030 ambient light sensor")
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
I found this almost by chance while testing new supported devices. The
decimal part was always suspiciously small, and when I compared samples
to the expected value according to the datasheet, it became clear what was
going on.
Example with a veml7700 (same resolution as the veml6030):
Resolution for gain = 1/8, IT = 100 ms: 0.5376 lux/cnt.
cat in_illuminance_raw in_illuminance_input
40
21.005040 -> wrong! 40 * 0.5376 is 21.504.
Tested with a veml6035 and a veml7700, the same will happen with the
original veml6030, as the operation is identical for all devices.
---
drivers/iio/light/veml6030.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/light/veml6030.c b/drivers/iio/light/veml6030.c
index d6f3b104b0e6..a0bf03e37df7 100644
--- a/drivers/iio/light/veml6030.c
+++ b/drivers/iio/light/veml6030.c
@@ -691,7 +691,7 @@ static int veml6030_read_raw(struct iio_dev *indio_dev,
}
if (mask == IIO_CHAN_INFO_PROCESSED) {
*val = (reg * data->cur_resolution) / 10000;
- *val2 = (reg * data->cur_resolution) % 10000;
+ *val2 = (reg * data->cur_resolution) % 10000 * 100;
return IIO_VAL_INT_PLUS_MICRO;
}
*val = reg;
---
base-commit: 15e7d45e786a62a211dd0098fee7c57f84f8c681
change-id: 20241016-veml6030-fix-processed-micro-616d00d555dc
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
Flush the g2h worker explicitly if a TLB timeout happens. Such timeouts
are observed on LNL and point to the recent scheduling issue with
E-cores on LNL.
This is similar to the recent fix:
commit e51527233804 ("drm/xe/guc/ct: Flush g2h worker in case of g2h
response timeout") and should be removed once there is an E-core
scheduling fix.
v2: Add platform check (Himal)
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2687
Cc: Badal Nilawar <badal.nilawar(a)intel.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: Matthew Auld <matthew.auld(a)intel.com>
Cc: John Harrison <John.C.Harrison(a)Intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray(a)intel.com>
Cc: Lucas De Marchi <lucas.demarchi(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.11+
Signed-off-by: Nirmoy Das <nirmoy.das(a)intel.com>
Reviewed-by: Matthew Brost <matthew.brost(a)intel.com>
Acked-by: Badal Nilawar <badal.nilawar(a)intel.com>
---
drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index 773de1f08db9..5aba6ed950b7 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -6,6 +6,7 @@
#include "xe_gt_tlb_invalidation.h"
#include "abi/guc_actions_abi.h"
+#include "compat-i915-headers/i915_drv.h"
#include "xe_device.h"
#include "xe_force_wake.h"
#include "xe_gt.h"
@@ -72,6 +73,16 @@ static void xe_gt_tlb_fence_timeout(struct work_struct *work)
struct xe_device *xe = gt_to_xe(gt);
struct xe_gt_tlb_invalidation_fence *fence, *next;
+ /*
+ * This is analogous to e51527233804 ("drm/xe/guc/ct: Flush g2h worker
+ * in case of g2h response timeout")
+ *
+ * TODO: Drop this change once workqueue scheduling delay issue is
+ * fixed on LNL Hybrid CPU.
+ */
+ if (IS_LUNARLAKE(xe))
+ flush_work(&gt->uc.guc.ct.g2h_worker);
+
spin_lock_irq(&gt->tlb_invalidation.pending_lock);
list_for_each_entry_safe(fence, next,
&gt->tlb_invalidation.pending_fences, link) {
--
2.46.0