From: Yuanfang Zhang <quic_yuanfang(a)quicinc.com>
As recommended by section 4.3.7 ("Synchronization when using system
instructions to progrom the trace unit") of ARM IHI 0064H.b, the
self-hosted trace analyzer must perform a Context synchronization
event between writing to the TRCPRGCTLR and reading the TRCSTATR.
Fixes: ebddaad09e10 ("coresight: etm4x: Add missing single-shot control API to sysfs")
Signed-off-by: Yuanfang Zhang <quic_yuanfang(a)quicinc.com>
---
Changes in V3:
1. Remove dsb(sy) after polling TRCSTATR.
2. Add isb() after polling TRCSTATR.
---
.../hwtracing/coresight/coresight-etm4x-core.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
index 66d44a404ad0..c6ea00bba0cc 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
@@ -531,7 +531,6 @@ static int etm4_enable_hw(struct etmv4_drvdata *drvdata)
* As recommended by section 4.3.7 ("Synchronization when using the
* memory-mapped interface") of ARM IHI 0064D
*/
- dsb(sy);
isb();
done:
@@ -906,10 +905,25 @@ static void etm4_disable_hw(void *info)
tsb_csync();
etm4x_relaxed_write32(csa, control, TRCPRGCTLR);
+ /*
+ * As recommended by section 4.3.7 ("Synchronization when using system
+ * instructions to progrom the trace unit") of ARM IHI 0064H.b, the
+ * self-hosted trace analyzer must perform a Context synchronization
+ * event between writing to the TRCPRGCTLR and reading the TRCSTATR.
+ */
+ if (!csa->io_mem)
+ isb();
+
/* wait for TRCSTATR.PMSTABLE to go to '1' */
if (coresight_timeout(csa, TRCSTATR, TRCSTATR_PMSTABLE_BIT, 1))
dev_err(etm_dev,
"timeout while waiting for PM stable Trace Status\n");
+ /*
+ * As recommended by section 4.3.7 (Synchronization of register updates)
+ * of ARM IHI 0064H.b.
+ */
+ isb();
+
/* read the status of the single shot comparators */
for (i = 0; i < drvdata->nr_ss_cmp; i++) {
config->ss_status[i] =
--
2.34.1
This patch series is rebased on coresight-next-v6.12.rc4
* Patches 1 & 2 adds support for allocation of trace buffer pages from
reserved RAM
* Patches 3 & 4 adds support for saving metadata at the time of kernel panic
* Patch 5 adds support for reading trace data captured at the time of panic
* Patches 6 & 7 adds support for disabling coresight blocks at the time of panic
* Patch 8: Gives the full description about this feature as part of documentation
v12 is posted here,
https://lore.kernel.org/linux-arm-kernel/20241129084714.3057080-1-lcherian@…
Changelog from v12:
* Fixed wrong buffer pointer passed to coresigh_insert_barrier_packet
* tmc_read_prepare/unprepare_crashdata need to be called only once and
hence removed from read path and added to tmc_probe
* tmc_read_prepare_crashdata renamed to tmc_prepare_crashdata and
avoid taking locks as its moved to probe function.
* Introduced read status flag, "reading" specific to reserved buffer to keep the
reserved buffer reading independent of the regular buffer.
* open/release ops for reserved buffer has to take care only about the
set/unset the "reading" status flag as the reserved buffer is prepared
during the probe time itself.
* Few other trivial changes
Changelog from v11:
Convert all commands to literal code blocks, that was missed out in v11.
No other code changes.
Changelog from v10:
* Converted all csdev_access_* to readl functions in tmc_panic_sync_*
* Added "tmc" prefix for register snapshots in struct tmc_crash_metadata
* Converted dev_info to dev_dbg in panic handlers
* Converted dsb to dmb in panic handlers
* Fixed marking metadata as invalid when a user is trying to use the
reserved buffer. Earlier this was wrongly set at the time of reading
reserved trace buffer.
* Moved common validation checks to is_tmc_crashdata_valid and minor
code rearrangements for efficiency
* Got rid of sink specific prepare/unprepare invocations
* Got rid of full from struct tmc_resrv_buf
* While reading crashdata, size is now calculated from metadata instead
of relying on reserved buffer size populated by dtb
* Minor documenation fixes
Changelog from v9:
* Add common helper function of_tmc_get_reserved_resource_by_name
for better code reuse
* Reserved buffer validity and crashdata validity has been separated to
avoid interdependence
* New fields added to crash metadata: version, ffcr, ffsr, mode
* Version checks added for metadata validation
* Special file /dev/crash_tmc_xxx would be available only when
crash metadata is valid
* Removed READ_CRASHDATA mode meant for special casing crashdata reads.
Instead, dedicated read function added for crashdata reads from reserved
buffer which is common for both ETR and ETF sinks as well.
* Documentation added to Documentation/tracing/coresight/panic.rst
Changelog from v8:
* Added missing exit path on error in __tmc_probe.
* Few whitespace fixes, checkpatch fixes.
* With perf sessions honouring stop_on_flush sysfs attribute,
removed redundant variable stop_on_flush_en.
Changelog from v7:
* Fixed breakage on perf test -vvvv "arm coresight".
No issues seen with and without "resrv" buffer mode
* Moved the crashdev registration into a separate function.
* Removed redundant variable in tmc_etr_setup_crashdata_buf
* Avoided a redundant memcpy in tmc_panic_sync_etf.
* Tested kernel panic with trace session started uisng perf.
Please see the title "Perf based testing" below for details.
For this, stop_on_flush sysfs attribute is taken into
consideration while starting perf sessions as well.
Changelog from v6:
* Added special device files for reading crashdata, so that
read_prevboot mode flag is removed.
* Added new sysfs TMC device attribute, stop_on_flush.
Stop on flush trigger event is disabled by default.
User need to explicitly enable this from sysfs for panic stop
to work.
* Address parameter for panicstop ETM configuration is
chosen as kernel "panic" address by default.
* Added missing tmc_wait_for_tmcready during panic handling
* Few other misc code rearrangements.
Changelog from v5:
* Fixed issues reported by CONFIG_DEBUG_ATOMIC_SLEEP
* Fixed a memory leak while reading data from /dev/tmc_etrx in
READ_PREVBOOT mode
* Tested reading trace data from crashdump kernel
Changelog from v4:
* Device tree binding
- Description is made more explicit on the usage of reserved memory
region
- Mismatch in memory region names in dts binding and driver fixed
- Removed "mem" suffix from the memory region names
* Rename "struct tmc_register_snapshot" -> "struct tmc_crash_metadata",
since it contains more than register snapshot.
Related variables are named accordingly.
* Rename struct tmc_drvdata members
resrv_buf -> crash_tbuf
metadata -> crash_mdata
* Size field in metadata refers to RSZ register and hence indicates the
size in 32 bit words. ETR metadata follows this convention, the same
has been extended to ETF metadata as well.
* Added crc32 for more robust metadata and tracedata validation.
* Added/modified dev_dbg messages during metadata validation
* Fixed a typo in patch 5 commit description
Changelog from v3:
* Converted the Coresight ETM driver change to a named configuration.
RFC tag has been removed with this change.
* Fixed yaml issues reported by "make dt_binding_check"
* Added names for reserved memory regions 0 and 1
* Added prevalidation checks for metadata processing
* Fixed a regression introduced in RFC v3
- TMC Status register was getting saved wrongly
* Reverted memremap attribute changes from _WB to _WC to match
with the dma map attributes
* Introduced reserved buffer mode specific .sync op.
This fixes a possible crash when reserved buffer mode was used in
normal trace capture, due to unwanted dma maintenance operations.
Linu Cherian (8):
dt-bindings: arm: coresight-tmc: Add "memory-region" property
coresight: tmc-etr: Add support to use reserved trace memory
coresight: core: Add provision for panic callbacks
coresight: tmc: Enable panic sync handling
coresight: tmc: Add support for reading crash data
coresight: tmc: Stop trace capture on FlIn
coresight: config: Add preloaded configuration
Documentation: coresight: Panic support
.../bindings/arm/arm,coresight-tmc.yaml | 26 ++
Documentation/trace/coresight/panic.rst | 362 ++++++++++++++++++
drivers/hwtracing/coresight/Makefile | 2 +-
.../coresight/coresight-cfg-preload.c | 2 +
.../coresight/coresight-cfg-preload.h | 2 +
.../hwtracing/coresight/coresight-cfg-pstop.c | 83 ++++
drivers/hwtracing/coresight/coresight-core.c | 42 ++
.../hwtracing/coresight/coresight-tmc-core.c | 308 ++++++++++++++-
.../hwtracing/coresight/coresight-tmc-etf.c | 92 ++++-
.../hwtracing/coresight/coresight-tmc-etr.c | 184 ++++++++-
drivers/hwtracing/coresight/coresight-tmc.h | 105 +++++
include/linux/coresight.h | 12 +
12 files changed, 1208 insertions(+), 12 deletions(-)
create mode 100644 Documentation/trace/coresight/panic.rst
create mode 100644 drivers/hwtracing/coresight/coresight-cfg-pstop.c
--
2.34.1
Hi
On 11/11/2024 14:12, Alexander Shishkin wrote:
> Uwe Kleine-König <u.kleine-koenig(a)baylibre.com> writes:
>
>> After commit 0edb555a65d1 ("platform: Make platform_driver::remove()
>> return void") .remove() is (again) the right callback to implement for
>> platform drivers.
>>
>> Convert all platform drivers below drivers/hwtracing to use .remove(),
>> with the eventual goal to drop struct platform_driver::remove_new(). As
>> .remove() and .remove_new() have the same prototypes, conversion is done
>> by just changing the structure member name in the driver initializer.
>>
>> Also adapt some whitespace to make indention consistent.
>>
>> Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)baylibre.com>
>
> Acked-by: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
>
>> ---
>> Hello,
>>
>> I did a single patch for all of drivers/hwtracing. While I usually
>> prefer to do one logical change per patch, this seems to be
>> overengineering here as the individual changes are really trivial and
>> shouldn't be much in the way for stable backports. But I'll happily
>> split the patch if you prefer it split. Maybe split for coresight vs.
>> intel_th? Also if you object the indentation stuff, I can rework that.
>
> I'm fine with it as is.
>
>> This is based on today's next, if conflicts arise when you apply it at
>> some later time and don't want to resolve them, feel free to just drop
>> the changes to the conflicting files. I'll notice and followup at a
>> later time then. Or ask me for a fixed resend. (Having said that, I
>> recommend b4 am -3 + git am -3 which should resolve most conflicts just
>> fine.)
>
> Does anybody want to pick this up or should I? I'm fine either way, but
> if there are any conflicts they won't be from my end of things, so it
> might make sense to take it via the coresight path.
I am happy to take them via coresight tree and queue them for v6.14
Suzuki
>
> Thanks,
> --
> Alex
Introduction of TPDM MCMB(Multi-lane Continuous Multi Bit) subunit
MCMB (Multi-lane CMB) is a special form of CMB dataset type. MCMB
subunit has the same number and usage of registers as CMB subunit.
Just like the CMB subunit, the MCMB subunit must be configured prior
to enablement. This series adds support for TPDM to configure the
MCMB subunit.
Once this series patches are applied properly, the new tpdm nodes for
should be observed at the tpdm path /sys/bus/coresight/devices/tpdm*
which supports MCMB subunit. All sysfs files of CMB subunit TPDM are
included in MCMB subunit TPDM. On this basis, MCMB subunit TPDM will
have new sysfs files to select and enable the lane.
Changes in V3:
1. Update the date in ABI file.
2. Remove the unrelated change.
3. Correct typo.
4. Move the CMB_CR related definitions together.
Changes in V2:
1. Use tdpm_data->cmb instead of (tpdm_has_cmb_dataset(tpdm_data) ||
tpdm_has_mcmb_dataset(tpdm_data)) for cmb dataset support.
2. Embed mcmb_dataset struct into cmb struct.
3. Update the date and version in sysfs-bus-coresight-devices-tpdm
Link: https://patchwork.kernel.org/project/linux-arm-msm/patch/20241105123940.396…
Mao Jinlong (1):
coresight-tpdm: Add MCMB dataset support
Tao Zhang (2):
coresight-tpdm: Add support to select lane
coresight-tpdm: Add support to enable the lane for MCMB TPDM
.../testing/sysfs-bus-coresight-devices-tpdm | 15 +++
drivers/hwtracing/coresight/coresight-tpda.c | 7 +-
drivers/hwtracing/coresight/coresight-tpdm.c | 120 +++++++++++++++++-
drivers/hwtracing/coresight/coresight-tpdm.h | 33 +++--
4 files changed, 155 insertions(+), 20 deletions(-)
--
2.17.1
On 24/12/2024 10:13 am, Yeoreum Yun wrote:
> Hi James.
>>> diff --git a/drivers/hwtracing/coresight/coresight-syscfg.c b/drivers/hwtracing/coresight/coresight-syscfg.c
>>> index a70c1454b410..dfa7dcbaf25d 100644
>>> --- a/drivers/hwtracing/coresight/coresight-syscfg.c
>>> +++ b/drivers/hwtracing/coresight/coresight-syscfg.c
>>> @@ -953,7 +953,8 @@ int cscfg_config_sysfs_activate(struct cscfg_config_desc *config_desc, bool acti
>>> cscfg_mgr->sysfs_active_config = cfg_hash;
>>> } else {
>>> /* disable if matching current value */
>>> - if (cscfg_mgr->sysfs_active_config == cfg_hash) {
>>> + if (cscfg_mgr->sysfs_active_config == cfg_hash &&
>>> + !atomic_read(&cscfg_mgr->sys_enable_cnt)) {
>>> _cscfg_deactivate_config(cfg_hash);
>>
>> So is sys_enable_cnt a global value? If a fix is needed doesn't it need to
>> be a per-config refcount?
>>
>> Say you have two active configs, sys_enable_cnt is now 2, how do you disable
>> one without it always skipping here when the other config is enabled?
>
> Sorry to miss this one!.
> Because when one configuration is enabled,
> cscfg_mgr->sysfs_active_config becomes !NULL, so it wouldn't happen
> there is no two activate configurations. so sys_enable_cnt wouldn't be
> 2.
>
>
>
Maybe "sys_enabled" is a better name then. Count implies that it can be
more than one. And the doc could be updated to say it's only ever 0 or 1.
But what about my other point about enabled always being a subset of
active? Can we not change "sys_active_cnt" to a more generic "refcount",
then both activation and enabling steps increment that same refcount,
because they are both technically users of the config. Then you can
solve the problem without adding another separate counter. I think
that's potentially easier to understand.
Although the easiest is just locking every function with the mutex (or a
spinlock if it also needs to be used from Perf). Obviously all these
atomics are harder to get right than that, and this isn't performance
sensitive in any way.
On 18/12/2024 8:48 am, Yeoreum Yun wrote:
> While enable active config via cscfg_csdev_enable_active_config(),
> active config could be deactivated via configfs' sysfs interface.
> This could make UAF issue in below scenario:
>
> CPU0 CPU1
> (perf or sysfs enable) load module
> cscfg_load_config_sets()
> activate config. // sysfs
> (sys_active_cnt == 1)
> ...
> cscfg_csdev_enable_active_config()
> lock(csdev->cscfg_csdev_lock)
> // here load config activate by CPU1
> unlock(csdev->cscfg_csdev_lock)
>
> deactivate config // sysfs
> (sys_activec_cnt == 0)
Assuming the left side does Perf, are there some steps missing? To get
to enable_active_config() you first need to pass through etm_setup_aux()
-> cscfg_activate_config(). That would also increment sys_active_cnt
which would leave it at 2 if there were two concurrent sessions. Then it
would end up as 1 here after deactivate, rather than 0.
It's not explicitly mentioned in the sequence but I'm assuming the left
and right are the same config, but I suppose it could be an issue with
different configs too.
> cscfg_unload_config_sets()
> unload module
On the left cscfg_activate_config() also bumps the module refcount, so
unload wouldn't cause a UAF here as far as I can see.
>
> // access to config_desc which freed
> // while unloading module.
> cfs_csdev_enable_config
>
> To address this, introduce sys_enable_cnt in cscfg_mgr to prevent
> deactivate while there is enabled configuration.
>
> Signed-off-by: Yeoreum Yun <yeoreum.yun(a)arm.com>
> ---
> .../hwtracing/coresight/coresight-etm4x-core.c | 3 +++
> drivers/hwtracing/coresight/coresight-syscfg.c | 18 ++++++++++++++++--
> drivers/hwtracing/coresight/coresight-syscfg.h | 2 ++
> 3 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> index 86893115df17..6218ef40acbc 100644
> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> @@ -986,6 +986,9 @@ static void etm4_disable_sysfs(struct coresight_device *csdev)
> smp_call_function_single(drvdata->cpu, etm4_disable_hw, drvdata, 1);
>
> raw_spin_unlock(&drvdata->spinlock);
> +
> + cscfg_csdev_disable_active_config(csdev);
> +
> cpus_read_unlock();
>
> /*
> diff --git a/drivers/hwtracing/coresight/coresight-syscfg.c b/drivers/hwtracing/coresight/coresight-syscfg.c
> index a70c1454b410..dfa7dcbaf25d 100644
> --- a/drivers/hwtracing/coresight/coresight-syscfg.c
> +++ b/drivers/hwtracing/coresight/coresight-syscfg.c
> @@ -953,7 +953,8 @@ int cscfg_config_sysfs_activate(struct cscfg_config_desc *config_desc, bool acti
> cscfg_mgr->sysfs_active_config = cfg_hash;
> } else {
> /* disable if matching current value */
> - if (cscfg_mgr->sysfs_active_config == cfg_hash) {
> + if (cscfg_mgr->sysfs_active_config == cfg_hash &&
> + !atomic_read(&cscfg_mgr->sys_enable_cnt)) {
> _cscfg_deactivate_config(cfg_hash);
So is sys_enable_cnt a global value? If a fix is needed doesn't it need
to be a per-config refcount?
Say you have two active configs, sys_enable_cnt is now 2, how do you
disable one without it always skipping here when the other config is
enabled?
> cscfg_mgr->sysfs_active_config = 0;
> } else
> @@ -1055,6 +1056,12 @@ int cscfg_csdev_enable_active_config(struct coresight_device *csdev,
> if (!atomic_read(&cscfg_mgr->sys_active_cnt))
> return 0;
>
> + /*
> + * increment sys_enable_cnt first to prevent deactivate the config
> + * while enable active config.
> + */
> + atomic_inc(&cscfg_mgr->sys_enable_cnt);
> +
> /*
> * Look for matching configuration - set the active configuration
> * context if found.
> @@ -1098,6 +1105,10 @@ int cscfg_csdev_enable_active_config(struct coresight_device *csdev,
> raw_spin_unlock_irqrestore(&csdev->cscfg_csdev_lock, flags);
> }
> }
> +
> + if (!config_csdev_active || err)
> + atomic_dec(&cscfg_mgr->sys_enable_cnt);
> +
> return err;
> }
> EXPORT_SYMBOL_GPL(cscfg_csdev_enable_active_config);
> @@ -1129,8 +1140,10 @@ void cscfg_csdev_disable_active_config(struct coresight_device *csdev)
> if (config_csdev) {
> if (!config_csdev->enabled)
> config_csdev = NULL;
> - else
> + else {
> config_csdev->enabled = false;
> + atomic_dec(&cscfg_mgr->sys_enable_cnt);
> + }
> }
> csdev->active_cscfg_ctxt = NULL;
> raw_spin_unlock_irqrestore(&csdev->cscfg_csdev_lock, flags);
> @@ -1179,6 +1192,7 @@ static int cscfg_create_device(void)
> INIT_LIST_HEAD(&cscfg_mgr->config_desc_list);
> INIT_LIST_HEAD(&cscfg_mgr->load_order_list);
> atomic_set(&cscfg_mgr->sys_active_cnt, 0);
> + atomic_set(&cscfg_mgr->sys_enable_cnt, 0);
> cscfg_mgr->load_state = CSCFG_NONE;
>
> /* setup the device */
> diff --git a/drivers/hwtracing/coresight/coresight-syscfg.h b/drivers/hwtracing/coresight/coresight-syscfg.h
> index 66e2db890d82..2fc397919985 100644
> --- a/drivers/hwtracing/coresight/coresight-syscfg.h
> +++ b/drivers/hwtracing/coresight/coresight-syscfg.h
> @@ -38,6 +38,7 @@ enum cscfg_load_ops {
> * @config_desc_list: List of system configuration descriptors to load into registered devices.
> * @load_order_list: Ordered list of owners for dynamically loaded configurations.
> * @sys_active_cnt: Total number of active config descriptor references.
> + * @sys_enable_cnt: Total number of enable of active config descriptor references.
When these are next to each other it makes me wonder why active_cnt
isn't enough to prevent unloading? Enabled is always a subset of active,
so as long as you gate unloads or modifications on the existing active
count it seems fine?
> * @cfgfs_subsys: configfs subsystem used to manage configurations.
> * @sysfs_active_config:Active config hash used if CoreSight controlled from sysfs.
> * @sysfs_active_preset:Active preset index used if CoreSight controlled from sysfs.
> @@ -50,6 +51,7 @@ struct cscfg_manager {
> struct list_head config_desc_list;
> struct list_head load_order_list;
> atomic_t sys_active_cnt;
> + atomic_t sys_enable_cnt;
> struct configfs_subsystem cfgfs_subsys;
> u32 sysfs_active_config;
> int sysfs_active_preset;
On 21/12/2024 11:54 am, Marc Zyngier wrote:
> On Fri, 20 Dec 2024 17:32:17 +0000,
> James Clark <james.clark(a)linaro.org> wrote:
>>
>>
>>
>> On 20/12/2024 5:05 pm, Marc Zyngier wrote:
>>> On Wed, 27 Nov 2024 10:01:23 +0000,
>>> James Clark <james.clark(a)linaro.org> wrote:
>>>>
>>>> Currently in nVHE, KVM has to check if TRBE is enabled on every guest
>>>> switch even if it was never used. Because it's a debug feature and is
>>>> more likely to not be used than used, give KVM the TRBE buffer status to
>>>> allow a much simpler and faster do-nothing path in the hyp.
>>>>
>>>> This is always called with preemption disabled except for probe/hotplug
>>>> which gets wrapped with preempt_disable().
>>>>
>>>> Protected mode disables trace regardless of TRBE (because
>>>> guest_trfcr_el1 is always 0), which was not previously done. HAS_TRBE
>>>> becomes redundant, but HAS_TRF is now required for this.
>>>>
>>>> Signed-off-by: James Clark <james.clark(a)linaro.org>
>>>> ---
>>>> arch/arm64/include/asm/kvm_host.h | 10 +++-
>>>> arch/arm64/kvm/debug.c | 25 ++++++++--
>>>> arch/arm64/kvm/hyp/nvhe/debug-sr.c | 51 +++++++++++---------
>>>> drivers/hwtracing/coresight/coresight-trbe.c | 5 ++
>>>> 4 files changed, 65 insertions(+), 26 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>>> index 7e3478386351..ba251caa593b 100644
>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>> @@ -611,7 +611,8 @@ struct cpu_sve_state {
>>>> */
>>>> struct kvm_host_data {
>>>> #define KVM_HOST_DATA_FLAG_HAS_SPE 0
>>>> -#define KVM_HOST_DATA_FLAG_HAS_TRBE 1
>>>> +#define KVM_HOST_DATA_FLAG_HAS_TRF 1
>>>> +#define KVM_HOST_DATA_FLAG_TRBE_ENABLED 2
>>>> unsigned long flags;
>>>> struct kvm_cpu_context host_ctxt;
>>>> @@ -657,6 +658,9 @@ struct kvm_host_data {
>>>> u64 mdcr_el2;
>>>> } host_debug_state;
>>>> + /* Guest trace filter value */
>>>> + u64 guest_trfcr_el1;
>>>
>>> Guest value? Or host state while running the guest? If the former,
>>> then this has nothing to do here. If the latter, this should be
>>> spelled out (trfcr_in_guest?), and the comment amended.
>>>
>>>> +
>>>> /* Number of programmable event counters (PMCR_EL0.N) for this CPU */
>>>> unsigned int nr_event_counters;
>>>> };
>>>> @@ -1381,6 +1385,8 @@ static inline bool kvm_pmu_counter_deferred(struct perf_event_attr *attr)
>>>> void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr);
>>>> void kvm_clr_pmu_events(u64 clr);
>>>> bool kvm_set_pmuserenr(u64 val);
>>>> +void kvm_enable_trbe(void);
>>>> +void kvm_disable_trbe(void);
>>>> #else
>>>> static inline void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr) {}
>>>> static inline void kvm_clr_pmu_events(u64 clr) {}
>>>> @@ -1388,6 +1394,8 @@ static inline bool kvm_set_pmuserenr(u64 val)
>>>> {
>>>> return false;
>>>> }
>>>> +static inline void kvm_enable_trbe(void) {}
>>>> +static inline void kvm_disable_trbe(void) {}
>>>> #endif
>>>> void kvm_vcpu_load_vhe(struct kvm_vcpu *vcpu);
>>>> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
>>>> index dd9e139dfd13..0c340ae7b5d1 100644
>>>> --- a/arch/arm64/kvm/debug.c
>>>> +++ b/arch/arm64/kvm/debug.c
>>>> @@ -314,7 +314,26 @@ void kvm_init_host_debug_data(void)
>>>> !(read_sysreg_s(SYS_PMBIDR_EL1) & PMBIDR_EL1_P))
>>>> host_data_set_flag(HAS_SPE);
>>>> - if (cpuid_feature_extract_unsigned_field(dfr0,
>>>> ID_AA64DFR0_EL1_TraceBuffer_SHIFT) &&
>>>> - !(read_sysreg_s(SYS_TRBIDR_EL1) & TRBIDR_EL1_P))
>>>> - host_data_set_flag(HAS_TRBE);
>>>> + if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceFilt_SHIFT))
>>>> + host_data_set_flag(HAS_TRF);
>>>> }
>>>> +
>>>> +void kvm_enable_trbe(void)
>>>> +{
>>>> + if (has_vhe() || is_protected_kvm_enabled() ||
>>>> + WARN_ON_ONCE(preemptible()))
>>>> + return;
>>>> +
>>>> + host_data_set_flag(TRBE_ENABLED);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(kvm_enable_trbe);
>>>> +
>>>> +void kvm_disable_trbe(void)
>>>> +{
>>>> + if (has_vhe() || is_protected_kvm_enabled() ||
>>>> + WARN_ON_ONCE(preemptible()))
>>>> + return;
>>>> +
>>>> + host_data_clear_flag(TRBE_ENABLED);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(kvm_disable_trbe);
>>>> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>>> index 858bb38e273f..9479bee41801 100644
>>>> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>>> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>>>> @@ -51,32 +51,39 @@ static void __debug_restore_spe(u64 pmscr_el1)
>>>> write_sysreg_el1(pmscr_el1, SYS_PMSCR);
>>>> }
>>>> -static void __debug_save_trace(u64 *trfcr_el1)
>>>> +static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
>>>> {
>>>> - *trfcr_el1 = 0;
>>>> + *saved_trfcr = read_sysreg_el1(SYS_TRFCR);
>>>> + write_sysreg_el1(new_trfcr, SYS_TRFCR);
>>>> - /* Check if the TRBE is enabled */
>>>> - if (!(read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E))
>>>> + /* No need to drain if going to an enabled state or from disabled state */
>>>> + if (new_trfcr || !*saved_trfcr)
>>>
>>> What if TRFCR_EL1.TS is set to something non-zero? I'd rather you
>>> check for the E*TRE bits instead of assuming things.
>>>
>>
>> Yeah it's probably better that way. TS is actually always set when any
>> tracing session starts and then never cleared, so doing it the simpler
>> way made it always flush even after tracing finished, which probably
>> wasn't great.
>
> Quite. Can you please *test* these things?
>
> [...]
>
Sorry to confuse things I wasn't 100% accurate here, yes it's tested and
working. It works because of the split set/clear_trfcr() API. The
Coresight driver specifically calls clear at the end of the session
rather than a set of 0. That signals this function not to be called so
there's no excessive swapping.
Secondly, the buffer flushing case is triggered by TRBE_ENABLED, which
forces TRFCR to 0, so "if (new_trfcr)" is an OK way to gate the flush.
>>>> @@ -253,8 +256,10 @@ static void trbe_drain_and_disable_local(struct trbe_cpudata *cpudata)
>>>> static void trbe_reset_local(struct trbe_cpudata *cpudata)
>>>> {
>>>> + preempt_disable();
>>>> trbe_drain_and_disable_local(cpudata);
>>>> write_sysreg_s(0, SYS_TRBLIMITR_EL1);
>>>> + preempt_enable();
>>>
>>> This looks terribly wrong. If you need to disable preemption here, why
>>> doesn't the critical section cover all register accesses? Surely you
>>> don't want to nuke another CPU's context?
>>>
>>> But looking at the calling sites, this makes even less sense. The two
>>> callers of this thing mess with *per-CPU* interrupts. Dealing with
>>> per-CPU interrupts in preemptible context is a big no-no (hint: they
>>> start with a call to smp_processor_id()).
>>>
>>> So what is this supposed to ensure?
>>>
>>> M.
>>>
>>
>> These ones are only intended to silence the
>> WARN_ON_ONCE(preemptible()) in kvm_enable_trbe() when this is called
>> from boot/hotplug (arm_trbe_enable_cpu()). Preemption isn't disabled,
>> but a guest can't run at that point either.
>>
>> The "real" calls to kvm_enable_trbe() _are_ called from an atomic
>> context. I think there was a previous review comment about when it was
>> safe to call the KVM parts of this change, which is why I added the
>> warning making sure it was always called with preemption disabled. But
>> actually I could remove the warning and these preempt_disables() and
>> replace them with a comment.
>
> You should keep the WARN_ON(), and either *never* end-up calling this
> stuff during a CPUHP event, or handle the fact that preemption isn't
> initialised yet. For example by checking whether the current CPU is
> online.
>
> But this sort of random spreading of preemption disabling is not an
> acceptable outcome.
>
> M.
>
I'll look into this again. This was my initial attempt but couldn't find
any easily accessible state that allowed to to be done this way. Maybe I
missed something, but the obvious cpu_online() etc were already true at
this point.
Thanks
James
On 21/12/2024 12:34 pm, Marc Zyngier wrote:
> On Wed, 27 Nov 2024 10:01:24 +0000,
> James Clark <james.clark(a)linaro.org> wrote:
>>
>> For nVHE, switch the filter value in and out if the Coresight driver
>> asks for it. This will support filters for guests when sinks other than
>> TRBE are used.
>>
>> For VHE, just write the filter directly to TRFCR_EL1 where trace can be
>> used even with TRBE sinks.
>>
>> Signed-off-by: James Clark <james.clark(a)linaro.org>
>> ---
>> arch/arm64/include/asm/kvm_host.h | 5 +++++
>> arch/arm64/kvm/debug.c | 28 ++++++++++++++++++++++++++++
>> arch/arm64/kvm/hyp/nvhe/debug-sr.c | 1 +
>> 3 files changed, 34 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index ba251caa593b..cce07887551b 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -613,6 +613,7 @@ struct kvm_host_data {
>> #define KVM_HOST_DATA_FLAG_HAS_SPE 0
>> #define KVM_HOST_DATA_FLAG_HAS_TRF 1
>> #define KVM_HOST_DATA_FLAG_TRBE_ENABLED 2
>> +#define KVM_HOST_DATA_FLAG_GUEST_FILTER 3
>
> Guest filter what? This is meaningless.
>
KVM_HOST_DATA_FLAG_SWAP_TRFCR maybe?
>> unsigned long flags;
>>
>> struct kvm_cpu_context host_ctxt;
>> @@ -1387,6 +1388,8 @@ void kvm_clr_pmu_events(u64 clr);
>> bool kvm_set_pmuserenr(u64 val);
>> void kvm_enable_trbe(void);
>> void kvm_disable_trbe(void);
>> +void kvm_set_trfcr(u64 guest_trfcr);
>> +void kvm_clear_trfcr(void);
>> #else
>> static inline void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr) {}
>> static inline void kvm_clr_pmu_events(u64 clr) {}
>> @@ -1396,6 +1399,8 @@ static inline bool kvm_set_pmuserenr(u64 val)
>> }
>> static inline void kvm_enable_trbe(void) {}
>> static inline void kvm_disable_trbe(void) {}
>> +static inline void kvm_set_trfcr(u64 guest_trfcr) {}
>> +static inline void kvm_clear_trfcr(void) {}
>> #endif
>>
>> void kvm_vcpu_load_vhe(struct kvm_vcpu *vcpu);
>> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
>> index 0c340ae7b5d1..9266f2776991 100644
>> --- a/arch/arm64/kvm/debug.c
>> +++ b/arch/arm64/kvm/debug.c
>> @@ -337,3 +337,31 @@ void kvm_disable_trbe(void)
>> host_data_clear_flag(TRBE_ENABLED);
>> }
>> EXPORT_SYMBOL_GPL(kvm_disable_trbe);
>> +
>> +void kvm_set_trfcr(u64 guest_trfcr)
>
> Again. Is this the guest's view? or the host view while running the
> guest? I asked the question on the previous patch, and you didn't
> reply.
>
Ah sorry missed that one:
> Guest value? Or host state while running the guest? If the former,
> then this has nothing to do here. If the latter, this should be
> spelled out (trfcr_in_guest?), and the comment amended.
Yes, the latter, guest TRFCR reads are undef anyway. I can rename this
and the host_data variable to be trfcr_in_guest.
>> +{
>> + if (is_protected_kvm_enabled() || WARN_ON_ONCE(preemptible()))
>> + return;
>> +
>> + if (has_vhe())
>> + write_sysreg_s(guest_trfcr, SYS_TRFCR_EL12);
>> + else {
>> + *host_data_ptr(guest_trfcr_el1) = guest_trfcr;
>> + host_data_set_flag(GUEST_FILTER);
>> + }
>
> Oh come on. This is basic coding style, see section 3 in
> Documentation/process/coding-style.rst.
>
Oops, I'd have thought checkpatch could catch something like that. Will fix.
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_set_trfcr);
>> +
>> +void kvm_clear_trfcr(void)
>> +{
>> + if (is_protected_kvm_enabled() || WARN_ON_ONCE(preemptible()))
>> + return;
>> +
>> + if (has_vhe())
>> + write_sysreg_s(0, SYS_TRFCR_EL12);
>> + else {
>> + *host_data_ptr(guest_trfcr_el1) = 0;
>> + host_data_clear_flag(GUEST_FILTER);
>> + }
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_clear_trfcr);
>
> Why do we have two helpers? Clearly, calling kvm_set_trfcr() with
> E{1,0}TRE=={0,0} should result in *disabling* things. Except it
> doesn't, and you should fix it. Once that is fixed, it becomes
obvious> that kvm_clear_trfcr() serves no purpose.
>
With only one kvm_set_trfcr() there's no way to distinguish swapping in
a 0 value or stopping swapping altogether. I thought we wanted a single
flag that gated the register accesses so the hyp mostly does nothing?
With only kvm_set_trfcr() you first need to check FEAT_TRF then you need
to compare the real register with trfcr_in_guest to know whether to swap
or not every time.
Actually I think some of the previous versions had something like this
but it was a bit more complicated.
Maybe set/clear_trfcr() aren't great names. Perhaps
kvm_set_trfcr_in_guest() and kvm_disable_trfcr_in_guest()? With the
second one hinting that it stops the swapping regardless of what the
values are.
I don't think calling kvm_set_trfcr() with E{1,0}TRE=={0,0} is actually
broken in this version, it means that the Coresight driver wants that
value to be installed for guests. So it should actually _enable_
swapping in the value of 0, not disable anything.
> To sum it up, KVM's API should reflect the architecture instead of
> making things up.
>
We had kvm_set_trfcr(u64 host_trfcr, u64 guest_trfcr) on the last
version, which also serves the same purpose I mentioned above because
you can check if they're the same or not and disable swapping. I don't
know if that counts as reflecting the architecture better. But Oliver
mentioned he preferred it more "intent" based which is why I added the
clear_trfcr().
>> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>> index 9479bee41801..7edee7ace433 100644
>> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>> @@ -67,6 +67,7 @@ static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
>> static bool __trace_needs_switch(void)
>> {
>> return host_data_test_flag(TRBE_ENABLED) ||
>> + host_data_test_flag(GUEST_FILTER) ||
>> (is_protected_kvm_enabled() && host_data_test_flag(HAS_TRF));
>
> Wouldn't it make more sense to just force the "GUEST_FILTER" flag in
> the pKVM case, and drop the 3rd term altogether?
>
> M.
>
Yep we can set GUEST_FILTER once at startup and it gets dropped along
with HAS_TRF. That's a lot simpler.
Thanks
James
On 20/12/2024 5:05 pm, Marc Zyngier wrote:
> On Wed, 27 Nov 2024 10:01:23 +0000,
> James Clark <james.clark(a)linaro.org> wrote:
>>
>> Currently in nVHE, KVM has to check if TRBE is enabled on every guest
>> switch even if it was never used. Because it's a debug feature and is
>> more likely to not be used than used, give KVM the TRBE buffer status to
>> allow a much simpler and faster do-nothing path in the hyp.
>>
>> This is always called with preemption disabled except for probe/hotplug
>> which gets wrapped with preempt_disable().
>>
>> Protected mode disables trace regardless of TRBE (because
>> guest_trfcr_el1 is always 0), which was not previously done. HAS_TRBE
>> becomes redundant, but HAS_TRF is now required for this.
>>
>> Signed-off-by: James Clark <james.clark(a)linaro.org>
>> ---
>> arch/arm64/include/asm/kvm_host.h | 10 +++-
>> arch/arm64/kvm/debug.c | 25 ++++++++--
>> arch/arm64/kvm/hyp/nvhe/debug-sr.c | 51 +++++++++++---------
>> drivers/hwtracing/coresight/coresight-trbe.c | 5 ++
>> 4 files changed, 65 insertions(+), 26 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 7e3478386351..ba251caa593b 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -611,7 +611,8 @@ struct cpu_sve_state {
>> */
>> struct kvm_host_data {
>> #define KVM_HOST_DATA_FLAG_HAS_SPE 0
>> -#define KVM_HOST_DATA_FLAG_HAS_TRBE 1
>> +#define KVM_HOST_DATA_FLAG_HAS_TRF 1
>> +#define KVM_HOST_DATA_FLAG_TRBE_ENABLED 2
>> unsigned long flags;
>>
>> struct kvm_cpu_context host_ctxt;
>> @@ -657,6 +658,9 @@ struct kvm_host_data {
>> u64 mdcr_el2;
>> } host_debug_state;
>>
>> + /* Guest trace filter value */
>> + u64 guest_trfcr_el1;
>
> Guest value? Or host state while running the guest? If the former,
> then this has nothing to do here. If the latter, this should be
> spelled out (trfcr_in_guest?), and the comment amended.
>
>> +
>> /* Number of programmable event counters (PMCR_EL0.N) for this CPU */
>> unsigned int nr_event_counters;
>> };
>> @@ -1381,6 +1385,8 @@ static inline bool kvm_pmu_counter_deferred(struct perf_event_attr *attr)
>> void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr);
>> void kvm_clr_pmu_events(u64 clr);
>> bool kvm_set_pmuserenr(u64 val);
>> +void kvm_enable_trbe(void);
>> +void kvm_disable_trbe(void);
>> #else
>> static inline void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr) {}
>> static inline void kvm_clr_pmu_events(u64 clr) {}
>> @@ -1388,6 +1394,8 @@ static inline bool kvm_set_pmuserenr(u64 val)
>> {
>> return false;
>> }
>> +static inline void kvm_enable_trbe(void) {}
>> +static inline void kvm_disable_trbe(void) {}
>> #endif
>>
>> void kvm_vcpu_load_vhe(struct kvm_vcpu *vcpu);
>> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
>> index dd9e139dfd13..0c340ae7b5d1 100644
>> --- a/arch/arm64/kvm/debug.c
>> +++ b/arch/arm64/kvm/debug.c
>> @@ -314,7 +314,26 @@ void kvm_init_host_debug_data(void)
>> !(read_sysreg_s(SYS_PMBIDR_EL1) & PMBIDR_EL1_P))
>> host_data_set_flag(HAS_SPE);
>>
>> - if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceBuffer_SHIFT) &&
>> - !(read_sysreg_s(SYS_TRBIDR_EL1) & TRBIDR_EL1_P))
>> - host_data_set_flag(HAS_TRBE);
>> + if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceFilt_SHIFT))
>> + host_data_set_flag(HAS_TRF);
>> }
>> +
>> +void kvm_enable_trbe(void)
>> +{
>> + if (has_vhe() || is_protected_kvm_enabled() ||
>> + WARN_ON_ONCE(preemptible()))
>> + return;
>> +
>> + host_data_set_flag(TRBE_ENABLED);
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_enable_trbe);
>> +
>> +void kvm_disable_trbe(void)
>> +{
>> + if (has_vhe() || is_protected_kvm_enabled() ||
>> + WARN_ON_ONCE(preemptible()))
>> + return;
>> +
>> + host_data_clear_flag(TRBE_ENABLED);
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_disable_trbe);
>> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>> index 858bb38e273f..9479bee41801 100644
>> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
>> @@ -51,32 +51,39 @@ static void __debug_restore_spe(u64 pmscr_el1)
>> write_sysreg_el1(pmscr_el1, SYS_PMSCR);
>> }
>>
>> -static void __debug_save_trace(u64 *trfcr_el1)
>> +static void __trace_do_switch(u64 *saved_trfcr, u64 new_trfcr)
>> {
>> - *trfcr_el1 = 0;
>> + *saved_trfcr = read_sysreg_el1(SYS_TRFCR);
>> + write_sysreg_el1(new_trfcr, SYS_TRFCR);
>>
>> - /* Check if the TRBE is enabled */
>> - if (!(read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E))
>> + /* No need to drain if going to an enabled state or from disabled state */
>> + if (new_trfcr || !*saved_trfcr)
>
> What if TRFCR_EL1.TS is set to something non-zero? I'd rather you
> check for the E*TRE bits instead of assuming things.
>
Yeah it's probably better that way. TS is actually always set when any
tracing session starts and then never cleared, so doing it the simpler
way made it always flush even after tracing finished, which probably
wasn't great.
>> return;
>> - /*
>> - * Prohibit trace generation while we are in guest.
>> - * Since access to TRFCR_EL1 is trapped, the guest can't
>> - * modify the filtering set by the host.
>> - */
>> - *trfcr_el1 = read_sysreg_el1(SYS_TRFCR);
>> - write_sysreg_el1(0, SYS_TRFCR);
>> +
>> isb();
>> - /* Drain the trace buffer to memory */
>> tsb_csync();
>> }
>>
>> -static void __debug_restore_trace(u64 trfcr_el1)
>> +static bool __trace_needs_switch(void)
>> {
>> - if (!trfcr_el1)
>> - return;
>> + return host_data_test_flag(TRBE_ENABLED) ||
>> + (is_protected_kvm_enabled() && host_data_test_flag(HAS_TRF));
>> +}
>>
>> - /* Restore trace filter controls */
>> - write_sysreg_el1(trfcr_el1, SYS_TRFCR);
>> +static void __trace_switch_to_guest(void)
>> +{
>> + /* Unsupported with TRBE so disable */
>> + if (host_data_test_flag(TRBE_ENABLED))
>> + *host_data_ptr(guest_trfcr_el1) = 0;
>> +
>> + __trace_do_switch(host_data_ptr(host_debug_state.trfcr_el1),
>> + *host_data_ptr(guest_trfcr_el1));
>> +}
>> +
>> +static void __trace_switch_to_host(void)
>> +{
>> + __trace_do_switch(host_data_ptr(guest_trfcr_el1),
>> + *host_data_ptr(host_debug_state.trfcr_el1));
>> }
>>
>> void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
>> @@ -84,9 +91,9 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
>> /* Disable and flush SPE data generation */
>> if (host_data_test_flag(HAS_SPE))
>> __debug_save_spe(host_data_ptr(host_debug_state.pmscr_el1));
>> - /* Disable and flush Self-Hosted Trace generation */
>> - if (host_data_test_flag(HAS_TRBE))
>> - __debug_save_trace(host_data_ptr(host_debug_state.trfcr_el1));
>> +
>> + if (__trace_needs_switch())
>> + __trace_switch_to_guest();
>> }
>>
>> void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
>> @@ -98,8 +105,8 @@ void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu)
>> {
>> if (host_data_test_flag(HAS_SPE))
>> __debug_restore_spe(*host_data_ptr(host_debug_state.pmscr_el1));
>> - if (host_data_test_flag(HAS_TRBE))
>> - __debug_restore_trace(*host_data_ptr(host_debug_state.trfcr_el1));
>> + if (__trace_needs_switch())
>> + __trace_switch_to_host();
>> }
>>
>> void __debug_switch_to_host(struct kvm_vcpu *vcpu)
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> index 96a32b213669..9c0f8c43e6fe 100644
>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -18,6 +18,7 @@
>> #include <asm/barrier.h>
>> #include <asm/cpufeature.h>
>> #include <linux/vmalloc.h>
>> +#include <linux/kvm_host.h>
>
> Ordering of include files.
>
>>
>> #include "coresight-self-hosted-trace.h"
>> #include "coresight-trbe.h"
>> @@ -221,6 +222,7 @@ static inline void set_trbe_enabled(struct trbe_cpudata *cpudata, u64 trblimitr)
>> */
>> trblimitr |= TRBLIMITR_EL1_E;
>> write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
>> + kvm_enable_trbe();
>>
>> /* Synchronize the TRBE enable event */
>> isb();
>> @@ -239,6 +241,7 @@ static inline void set_trbe_disabled(struct trbe_cpudata *cpudata)
>> */
>> trblimitr &= ~TRBLIMITR_EL1_E;
>> write_sysreg_s(trblimitr, SYS_TRBLIMITR_EL1);
>> + kvm_disable_trbe();
>>
>> if (trbe_needs_drain_after_disable(cpudata))
>> trbe_drain_buffer();
>> @@ -253,8 +256,10 @@ static void trbe_drain_and_disable_local(struct trbe_cpudata *cpudata)
>>
>> static void trbe_reset_local(struct trbe_cpudata *cpudata)
>> {
>> + preempt_disable();
>> trbe_drain_and_disable_local(cpudata);
>> write_sysreg_s(0, SYS_TRBLIMITR_EL1);
>> + preempt_enable();
>
> This looks terribly wrong. If you need to disable preemption here, why
> doesn't the critical section cover all register accesses? Surely you
> don't want to nuke another CPU's context?
>
> But looking at the calling sites, this makes even less sense. The two
> callers of this thing mess with *per-CPU* interrupts. Dealing with
> per-CPU interrupts in preemptible context is a big no-no (hint: they
> start with a call to smp_processor_id()).
>
> So what is this supposed to ensure?
>
> M.
>
These ones are only intended to silence the WARN_ON_ONCE(preemptible())
in kvm_enable_trbe() when this is called from boot/hotplug
(arm_trbe_enable_cpu()). Preemption isn't disabled, but a guest can't
run at that point either.
The "real" calls to kvm_enable_trbe() _are_ called from an atomic
context. I think there was a previous review comment about when it was
safe to call the KVM parts of this change, which is why I added the
warning making sure it was always called with preemption disabled. But
actually I could remove the warning and these preempt_disables() and
replace them with a comment.
Thanks
James