Hi Mathieu,
In case you don't remember, I work in the ARM architecture team with responsibility for debug and CoreSight. We have met before at Linaro Connect and discussed the Linaro support for CoreSight.
I have to say it's all looking very encouraging, with progress being made on CoreSight and trace support. So that's all good!
Now, I don't normally read the Linux mailing lists, but Al Grant pointed me at this patch to add CoreSight STM support [1].
[1] http://www.spinics.net/lists/arm-kernel/msg479457.html
There are some points that confuse me about this patch. Since I don't follow the mailing lists, Will Deacon suggested I mail you directly. In particular, its behaviour is different for STMv1 vs. STM-500, as well as being different to the Intel driver [2].
[2] https://github.com/torvalds/linux/blob/master/drivers/hwtracing/intel_th/st…
E.g. consider a call for the following:
uint64 data[128]; // A 64b aligned pointer
stm.packet(16, &((char *)data)[1]); // Send 16 bytes, unaligned
The Intel 32bit driver (sth_stm_driver) will:
* Send a D32 packet consisting of data[4..1] (assuming they don't fault misaligned addresses). Because size > 8, it rounds size down to 4.
* Ignore the other data.
* Only ever generates a single packet.
* For other sizes, rounds size down to power of 2 and returns number of bytes written.
The Intel 64bit driver (sth_stm_driver) will:
* Do nothing because size > 8.
* Only ever generates zero or one packets.
* For size <= 8, rounds size down to power of 2 and returns number of bytes written.
The CoreSight 32bit driver (stm_send) will:
* Send a D1=data[1], D2=data[3..2], D4=data[7..4], D4=data[11..8], D4=data[15..12], D1=data[16] stream.
* This is very inefficient use of bandwidth.
The CoreSight 64bit driver (stm_send_64bit) will:
* Send a D8=ZeroExtend(data[7..1]), D8=data[15..8], D8=ZeroExtend(data[16])
* This function only ever sends D8 packets.
* There is no way for the decoder to work out what the original data was.
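The D1/D2/D4 stream listed for the CoreSight 32-bit driver can be reproduced with a small model that, at each step, emits the largest naturally aligned power-of-two packet that still fits. This is only a sketch of the behaviour as reverse engineered above, not the driver's code; `split_aligned` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model, not driver code: split `size` bytes starting at
 * byte address `addr` into naturally aligned power-of-two packets of at
 * most `max_bytes` each. Packet sizes are written to `out`; the number
 * of packets is returned (at most `out_len`).
 */
static size_t split_aligned(uintptr_t addr, size_t size, size_t max_bytes,
                            size_t *out, size_t out_len)
{
    size_t n = 0;

    /* max_bytes must be a non-zero power of two (4 for D4, 8 for D8). */
    assert(max_bytes != 0 && (max_bytes & (max_bytes - 1)) == 0);

    while (size > 0 && n < out_len) {
        size_t chunk = max_bytes;

        /* Shrink until the chunk is aligned at addr and fits in size. */
        while (chunk > 1 && ((addr & (chunk - 1)) != 0 || chunk > size))
            chunk >>= 1;

        out[n++] = chunk;
        addr += chunk;
        size -= chunk;
    }
    return n;
}
```

For a start address at byte offset 1, a size of 16 and `max_bytes` of 4, the model produces packet sizes 1, 2, 4, 4, 4, 1, i.e. six packets to carry 16 bytes.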
I think this function is only called from within the generic driver [3], though. It looks like stm_write() calls it a chunk at a time, with a chunk being at most 4/8 bytes, depending on the capabilities of the STM, and is expecting it to send only a single packet. This means that the code in stm_send/stm_send_64bit to deal with odd sized packets and misaligned addresses looks redundant/wrong.
[3] https://github.com/torvalds/linux/blob/master/drivers/hwtracing/stm/core.c
It looks like there might be support for other sources to link to the driver, but I could only find the stm_console when I looked.
Of course, this is based on my limited understanding of how this is used, and the current generic STM and Intel drivers. It might be that these changes have been agreed and Intel plan to change their driver to match (as I said, I don't generally follow the mailing lists). However, the different 64b and 32b behaviors on the ARM version are weird, and the unaligned pointer handling looks wrong too.
(The different behaviors on the Intel version aren't my problem. :-)
It might also be that I am reverse engineering the behaviour incorrectly.
The other point (which Will has raised on the mailing lists in the past [4]) is this code:
#ifndef CONFIG_64BIT
static inline void __raw_writeq(u64 val, volatile void __iomem *addr)
{
asm volatile("strd %1, %0"
: "+Qo" (*(volatile u64 __force *)addr)
: "r" (val));
}
#undef writeq_relaxed
#define writeq_relaxed(v, c)__raw_writeq((__force u64) cpu_to_le64(v), c)
#endif
This isn't guaranteed to work on the ARM 32 bit architectures. The STM might receive a 64-bit write, or might receive a pair of 32-bit writes to the two addressed words *in either order*. The upshot is that this is not a valid way of writing to the STM. (The data reordering is a killer.)
The driver appears to use this if there is an STM-500 in an AArch32 system. This is because the code interrogates the STM to decide whether it supports 64-bit accesses. It should either (a) not do so, and refuse 64-bit data if AArch32, or (b) use some property of the system to decide. I would still frown on (b) because the architecture makes it clear that this is UNPREDICTABLE, meaning you're not supposed to rely on it and the device isn't allowed to advertise its behaviour.
[4] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/297379.h…
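Option (a) above could be sketched as follows. This is a minimal illustration under stated assumptions: `stm_max_access_bytes` and `stm_access_allowed` are hypothetical helpers, and the two flags stand in for "the STM reports 64-bit support" and "the kernel is built for AArch64".

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper for option (a): the STM's own claim of 64-bit
 * support is honoured only on AArch64; on AArch32 the widest access
 * stays at 4 bytes, whatever the STM advertises.
 */
static unsigned int stm_max_access_bytes(bool stm_has_64bit,
                                         bool cpu_is_aarch64)
{
    return (stm_has_64bit && cpu_is_aarch64) ? 8u : 4u;
}

/* Returns true if a single packet of `size` bytes may be written. */
static bool stm_access_allowed(unsigned int size, bool stm_has_64bit,
                               bool cpu_is_aarch64)
{
    assert(size != 0);
    return size <= stm_max_access_bytes(stm_has_64bit, cpu_is_aarch64);
}
```

With this shape, an STM-500 in an AArch32 system is simply treated as a 32-bit device, so the strd-based writeq_relaxed() fallback is never needed.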
I hope this is useful.
With kind regards,
Mike.
--
Michael Williams Principal Engineer ARM Limited
www.arm.com The Architecture For The Digital World
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Hi All,
What is the best practice for redirecting the Ftrace output over STM?
Can we use the existing "stm_console" module and redirect the Ftrace output
as kernel messages with:
# cat trace_pipe > /dev/kmsg
One more question:
How does the "stm_core" module know the STM base address?
Best regards,
Jonatan
Good afternoon/evening Mike (or anyone else in a position to answer),
I wish we could have that conversation on IRC as I am sure my question
will be inaccurate. I'm also well aware the weekend has started in
the UK so it could also wait until Monday. But I'll try to be as
precise as possible....
When decoding STM traces, how is the library aware of the masterIDs
present on the system? I suppose there is an external way of passing
that information to the decoder... The metadata contained in the
perf.data file is irrelevant when dealing with STM traces.
Some clarification would be appreciated.
Many thanks,
Mathieu
Hi,
What's the plan for CoreSight-related discussion at Connect? It would be a good
opportunity to raise awareness of the CoreSight framework among silicon vendors.
Al
On 28 January 2016 at 02:18, Eric Long <eric.long(a)linaro.org> wrote:
> On 28 January 2016 at 00:02, Mathieu Poirier <mathieu.poirier(a)linaro.org> wrote:
>>
>> Before going down a CPU will receive notifications, at which point ETM
>> configuration registers are saved and tracing is interrupted. After a
>> CPU comes back up the opposite process takes place, i.e. registers are
>> restored and tracing can resume. The problem with the CPUidle
>> approach is that the code that brings up and takes down CPUs isn't traced.
>>
>> When dealing with GPDs, the code that switches off a core is always
>> the last thing to be executed. The same holds in the opposite
>> direction: the code that switches on a core is executed first. As such the
>> whole path is traced. That is why I favoured that solution but again,
>> it is a long way before it can be implemented properly in the kernel.
>> At least some people are working on it and I may very well end up
>> taking part in that effort but for now, I simply don't have the time.
>>
>> Whether CPUidle or GPD gets implemented, both will have to deal with
>> the integration of CoreSight with the perf framework - something that
>> will be delicate.
>>
>
> Hi Mathieu,
>
> Thank you for your detailed explanation. Now I probably understand your
> plan. I am very much looking forward to fixing the ETM retention issue.
Go ahead, it is a very interesting problem to tackle. From here on
let's call the feature "ETM save/restore" - that way people understand
what you are referring to.
Normally conversations such as this one would be held in a (semi)
public forum - the team at Linaro uses "coresight(a)lists.linaro.org".
That way people are aware of what is going on and can provide input.
Simply subscribe to the list [1] and we will grant you access.
>
> My original plan was to add the retention action to secure code, because
> the switching of the core power state is done in secure mode.
> The kernel uses PSCI to communicate with the secure system.
Right, but from a non-secure point of view PSCI is simply an API. All
the processing leading to that call happens in the non-secure world.
Adding save/restore capabilities in the secure world means that you'd
have to synchronise with the CS driver in the non-secure world, and
that would be a lot of spaghetti code.
>
> As for PM runtime and GPD, maybe there are some details I still need to
> understand, so I will spend time reading the PM runtime and GPD code.
> If I have any questions I will let you know. And if there is
> anything I can do, please let me know. Thank you.
Well, I advise starting with CPU notifiers... ETMv3 already has
support for hotplug notification but that code needs revision. ETMv4
doesn't register any notifier.
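To make the "ETM save/restore" idea concrete, here is a user-space model of the sequence: save the configuration and stop tracing when the CPU is about to go down, then restore and resume once it is back up. It is only a sketch; the structures, field names and functions are hypothetical and do not correspond to the actual ETM register set or to the kernel notifier API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a couple of ETM configuration registers. */
struct etm_config {
    uint32_t control;
    uint32_t trace_id;
};

struct etm_model {
    struct etm_config hw;    /* register contents, lost on power down */
    struct etm_config saved; /* copy kept in memory across power down */
    bool tracing;            /* is trace currently being generated?   */
    bool resume_on_up;       /* was tracing active before going down? */
};

/* Notification before the CPU goes down: save state, stop tracing. */
static void etm_cpu_going_down(struct etm_model *m)
{
    assert(m != NULL);
    m->saved = m->hw;
    m->resume_on_up = m->tracing;
    m->tracing = false;
}

/* Models the power loss itself: register contents are gone. */
static void etm_power_cycle(struct etm_model *m)
{
    assert(m != NULL);
    m->hw.control = 0;
    m->hw.trace_id = 0;
}

/* Notification after the CPU is back up: restore state, resume. */
static void etm_cpu_back_up(struct etm_model *m)
{
    assert(m != NULL);
    m->hw = m->saved;
    m->tracing = m->resume_on_up;
}
```

The model also shows the limitation mentioned earlier in the thread: nothing executed between the "going down" and "back up" notifications is traced.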
Mathieu
>
> Best regards,
> Eric
[1]. https://lists.linaro.org/mailman/listinfo/coresight
Hi Mike,
The "Appendix A. CoreSight Port List" in [1] documents that many of
the CoreSight components have a clock signal called 'ATCLK', but I didn't
see a description of ATCLK for the STM. So is there an 'ATCLK' signal on
the STM too? In what situations do the CoreSight components need ATCLK?
Many thanks,
Chunyan
[1] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0314h/Cihejf…
Hey Mike,
A colleague of mine, Linus Walleij, stumbled on a board equipped with
an ETM11 and started looking into it. From his point of view it looks
simple enough to attempt writing a driver for it.
What kind of trace format does it output? How far along is openCSD
in supporting traces generated by an ETM11?
Thanks,
Mathieu
Tor, Mike and others,
A thought dawned on me right after the openCSD meeting on IRC today...
From where I stand things are looking good. Mike has the decoding
part pretty much reined in and Tor is moving fast on the perf front.
All that's left to do is putting things together, which is what this
email is about.
Do you guys think it is a good idea to meet up somewhere before Bangkok?
Serge is away between the 15th and 19th of February - we could take
that opportunity to meet up and work together on openCSD for a week.
That would speed up integration and expose everyone to other
components of the solution. What are your thoughts on that?
I am fully aware that respective company management would have to
approve. We'd also need to find a location... Linaro has offices in
Cambridge and Boston so that's always a possibility. We could also
meet in Blackburn or Houston where ARM and TI are located.
Please think about it and talk to your respective management.
Chunyan, you are welcome to join us.
Best regards,
Mathieu
Good day Mathieu and All,
I revised this documentation a little and am re-sending it to you
(cc'ing the mailing list and Mark this time), simply because I want to
explain how the generic STM code deals with the master IDs you mentioned
in another mail yesterday. To find the answer, you can go directly to 4. -
"Allocate one master and a range of channels for stm_source class device:"
There are a few terms I need to explain here:
i) 'stm_source class device' - a device used to write trace data to an STM
device once linked, like the 'stm_ftrace' you wrote earlier for testing
the integration of Ftrace with STM.
ii) STM device - the real STM hardware device, which can be
found under the /dev/ directory on the target.
* STM policy management of master/channel:
1. Policy management source code:
drivers/hwtracing/stm/policy.c
2. Policy management introduction: (excerpts from Documentation/trace/stm.txt)
1) On the receiving end of this STP stream (the decoder side), trace
sources can only be identified by master/channel combination, so in
order for the decoder to be able to make sense of the trace that
involves multiple trace sources, it needs to be able to map those
master/channel pairs to the trace sources that it understands.
2) To solve this mapping problem, stm class provides a policy
management mechanism via configfs, that allows defining rules that map
string identifiers to ranges of masters and channels. If these rules
(policy) are consistent with what decoder expects, it will be able to
properly process the trace data.
3. Create policy rules on target:
1) mount -t configfs none /config (the directory 'stp-policy/' will
appear under 'config/')
2) Create policy rule for given STM device:
mkdir /config/stp-policy/10006000.stm.xxx
(‘10006000.stm’ is the name of the STM device to which this policy applies;
this is just an example. ‘xxx’ is an arbitrary string, separated from
the device name by a dot. "10006000.stm" must match the device name
found under the /dev directory.)
3) Create policy rules for a given stm_source class device:
mkdir /config/stp-policy/10006000.stm.my_policy/stm_ftrace
(‘stm_ftrace’ is a registered device of the stm_source class which can be
linked with an STM device and then used to write trace data into the STM,
from where it is finally output to the sink buffer. Note that
the rule's name must be the same as the name of the stm_source class device.)
4) After the policy rule has been created, there will be two files, 'masters'
and 'channels', under the rule's directory, for example:
# cat /config/stp-policy/10006000.stm.my_policy/stm_ftrace/masters
0 127
# cat /config/stp-policy/10006000.stm.my_policy/stm_ftrace/channels
0 65535
These values give the range of masters/channels which can be used by
the stm_source device whose name is the same as the rule's name
(stm_ftrace in this case). The default values come from the
configuration [1] of the STM device (i.e. 10006000.stm in this case).
These masters/channels files are configurable, and the rule is
applied to the stm_source class device (stm_ftrace) when it is linked
with the STM device (10006000.stm in this case). For example, if you
want to link 'stm_ftrace' with '10006000.stm', you can create the
directories '10006000.stm.xxx/stm_ftrace/', and when linking happens,
the rule under this directory will be applied to 'stm_ftrace'.
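The naming convention in step 3 can be summarised with a small helper that builds the rule directory path from its parts. This is a sketch only: `stp_rule_path` is a hypothetical name, and rejecting a dot inside the policy name is an assumption made here so that the device name and the policy name can always be told apart.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build
 *   <configfs>/stp-policy/<stm_device>.<policy>/<source>
 * into buf. Returns 0 on success, -1 if the policy name contains a dot
 * (assumption, see above) or the result does not fit in buf.
 */
static int stp_rule_path(char *buf, size_t len, const char *configfs,
                         const char *stm_device, const char *policy,
                         const char *source)
{
    int n;

    assert(buf != NULL && configfs != NULL && stm_device != NULL);
    assert(policy != NULL && source != NULL);

    /* The device name may contain dots ("10006000.stm"); the policy not. */
    if (strchr(policy, '.') != NULL)
        return -1;

    n = snprintf(buf, len, "%s/stp-policy/%s.%s/%s",
                 configfs, stm_device, policy, source);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Called with "/config", "10006000.stm", "my_policy" and "stm_ftrace", it yields the directory used in the example above, under which the 'masters' and 'channels' files appear.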
4. Allocate one master and a range of channels for stm_source class device:
1) As mentioned in 3. above, the policy rule which has the same
name as the stm_source class device is applied to that device.
The device then takes one master and the required number of
channels from the ranges that the policy rule defines in the
"masters" and "channels" files (if there is no policy rule with
the same name, the default configuration of the STM device is
applied) for outputting traces. The number of
required channels is configured in the stm_source class device driver.
2) When an stm_source device is linked with an STM device, the code
walks the masters, starting either from the first master configured in
the "masters" file under the policy rule directory (if a policy rule
was created for this stm_source class device) or otherwise from struct
stm_data::sw_start, which is configured in the STM device driver, to
see whether there are free channels on the current master. The number
of free contiguous channels must be greater than or equal to
the number of required channels. The first eligible master and
channel range is configured as the output path of
this stm_source class device.
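The search described in 4. is essentially a first-fit scan. The following is a simplified sketch of that idea, not the code in the stm core: the array-based bookkeeping, the sizes and the name `stm_find_output` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_MASTERS  4   /* hypothetical sizes, kept small on purpose */
#define NR_CHANNELS 16

struct stm_alloc {
    bool used[NR_MASTERS][NR_CHANNELS];
};

/*
 * First-fit search: walk masters first..last and look for `width`
 * contiguous free channels. On success the channels are marked used,
 * the result is stored in *master / *channel and 0 is returned;
 * otherwise -1 is returned.
 */
static int stm_find_output(struct stm_alloc *a, unsigned int first,
                           unsigned int last, unsigned int width,
                           unsigned int *master, unsigned int *channel)
{
    assert(a != NULL && master != NULL && channel != NULL);

    if (width == 0 || width > NR_CHANNELS ||
        last >= NR_MASTERS || first > last)
        return -1;

    for (unsigned int m = first; m <= last; m++) {
        unsigned int run = 0; /* length of the current free run */

        for (unsigned int c = 0; c < NR_CHANNELS; c++) {
            run = a->used[m][c] ? 0 : run + 1;
            if (run == width) {
                unsigned int start = c + 1 - width;

                for (unsigned int i = start; i <= c; i++)
                    a->used[m][i] = true;
                *master = m;
                *channel = start;
                return 0;
            }
        }
    }
    return -1;
}
```

Here `first` and `last` play the role of the range from the "masters" file (or the device default when no rule exists), and `width` is the number of channels the stm_source driver asks for.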
5. Allocate master/channels for applications:
1) Set a policy rule, which should include the 'assigned master', the
'first assigned channel' and 'the number of required channels', by means
of the stm_file ioctl interface.
2) If an application doesn't set a policy rule for itself, then when
it writes data to the STM device, a rule named 'default' is applied.
If the 'default' policy cannot be found either, then, as described
above, the default configuration
of the STM device is applied.
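The fallback order in 5. (the application's own rule, then 'default', then the STM device configuration) can be written down as a short lookup. This is a sketch with hypothetical types and names; it only models the order of precedence, not the ioctl interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one policy rule. */
struct stp_rule {
    const char *name;
    unsigned int first_master, last_master;
    unsigned int first_channel, last_channel;
};

/* Linear search for a rule by name; returns NULL if there is none. */
static const struct stp_rule *find_rule(const struct stp_rule *rules,
                                        size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(rules[i].name, name) == 0)
            return &rules[i];
    return NULL;
}

/*
 * Precedence: the rule the application asked for (if any), then the
 * rule named "default", then the device's own configuration.
 */
static const struct stp_rule *select_rule(const struct stp_rule *rules,
                                          size_t count,
                                          const char *requested,
                                          const struct stp_rule *device_default)
{
    const struct stp_rule *rule = NULL;

    assert(device_default != NULL);

    if (requested != NULL)
        rule = find_rule(rules, count, requested);
    if (rule == NULL)
        rule = find_rule(rules, count, "default");
    return rule != NULL ? rule : device_default;
}
```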
Regards,
Chunyan
[1]
https://git.linaro.org/people/zhang.chunyan/linux.git/blob/5234c83d13a4eb12…
Hi Mathieu,
I've finished verifying the STM master handling on Juno. We indeed
don't need to care, on the software side, about which master the trace
data goes out on.
The attachments are the results: one is the trace output from Ftrace,
the other is the decoding result from Mike's CS-STM decoding library.
FYI:
The CPUs and associated STM master IDs on Juno are:
CPU  Core       Master ID
0    A53 core0  0x44
1    A57 core0  0x40
2    A57 core1  0x41
3    A53 core1  0x45
4    A53 core2  0x46
5    A53 core3  0x47
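On the decoder side, a table like this is what lets a master ID seen in the STP stream be mapped back to a CPU. A minimal sketch using the values above; the structure and function names are hypothetical, and the mapping is specific to this Juno setup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the Juno table above. */
struct juno_stm_master {
    unsigned int cpu;   /* CPU number as listed above */
    const char *core;   /* physical core               */
    uint8_t master_id;  /* STM master ID for that core */
};

static const struct juno_stm_master juno_masters[] = {
    { 0, "A53 core0", 0x44 },
    { 1, "A57 core0", 0x40 },
    { 2, "A57 core1", 0x41 },
    { 3, "A53 core1", 0x45 },
    { 4, "A53 core2", 0x46 },
    { 5, "A53 core3", 0x47 },
};

/* Returns the CPU number for a master ID, or -1 if it is not listed. */
static int juno_cpu_for_master(uint8_t master_id)
{
    size_t count = sizeof(juno_masters) / sizeof(juno_masters[0]);

    assert(count > 0);
    for (size_t i = 0; i < count; i++)
        if (juno_masters[i].master_id == master_id)
            return (int)juno_masters[i].cpu;
    return -1;
}
```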
Regards,
Chunyan