From suzuki.poulose@arm.com Tue Apr 27 10:00:57 2021
From: Suzuki K Poulose
To: coresight@lists.linaro.org
Subject: Re: [PATCH 1/4] coresight: tmc-etr: Advance buffer pointer in sync buffer.
Date: Tue, 27 Apr 2021 11:00:51 +0100
In-Reply-To: <20210427034531.GA328795@leoy-ThinkPad-X240s>

On 27/04/2021 04:45, Leo Yan wrote:
> On Mon, Apr 26, 2021 at 11:40:44AM +0100, Suzuki Kuruppassery Poulose wrote:
>
> [...]
>
>>> @@ -1442,7 +1442,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
>>>   {
>>>   	long bytes;
>>>   	long pg_idx, pg_offset;
>>> -	unsigned long head = etr_perf->head;
>>> +	unsigned long head;
>>>   	char **dst_pages, *src_buf;
>>>   	struct etr_buf *etr_buf = etr_perf->etr_buf;
>>> @@ -1465,7 +1465,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
>>>   		bytes = tmc_etr_buf_get_data(etr_buf, src_offset, to_copy,
>>>   					     &src_buf);
>>>   		if (WARN_ON_ONCE(bytes <= 0))
>>> -			break;
>>> +			return;
>>>   		bytes = min(bytes, (long)(PAGE_SIZE - pg_offset));
>>>   		memcpy(dst_pages[pg_idx] + pg_offset, src_buf, bytes);
>>> @@ -1483,6 +1483,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
>>>   		/* Move source pointers */
>>>   		src_offset += bytes;
>>>   	}
>>> +	etr_perf->head = (pg_idx << PAGE_SHIFT) + pg_offset;
>>
>>
>> Looking at this patch, I feel the driver is doing a couple wrong things
>> already.
>>
>> 1) We initialise etr_perf->head every time the ETR enable is called,
>> irrespective of whether we actually try to enable the Hardware. e.g,
>>
>> etm_0 on -> .. -> enable_etr :
>> 				etr_perf->head =
>> 				enable_hw()
>>
>> emt_1 on -> ... -> enable_etr:
>> 				etr_perf->head =
>> 				already_enabled, skip enable_hw()
>>
>> etm_2 on -> ... -> enable_etr:
>> 				etr_perf->head =
>> 				already_enable, skip enable_hw()...
>>
>>
>> This doesn't look correct as we don't know which handle is going to get the
>> data. This looks pointless.
>
> I'd like to convert mapping into below diagram (for system wide trace):
>
>    CPU0: AUX RB (perf_output_handle_0) -> etr_perf -> +---------+
>    CPU1: AUX RB (perf_output_handle_1) -> etr_perf -> | etr_buf |
>    CPU2: AUX RB (perf_output_handle_2) -> etr_perf -> |         |
>    CPU3: AUX RB (perf_output_handle_3) -> etr_perf -> +---------+
>

To make it more clear:

    CPU0: AUX RB (perf_output_handle_0) -> etr_perf0 -> +---------+
    CPU1: AUX RB (perf_output_handle_1) -> etr_perf1 -> |etr_buf0 |
    CPU2: AUX RB (perf_output_handle_2) -> etr_perf2 -> |         |
    CPU3: AUX RB (perf_output_handle_3) -> etr_perf3 -> +---------+

> Simply to say, there have two layers for controlling ring buffer, one
> layer is for perf AUX ring buffer, it mainly uses the structure
> perf_output_handle to manage the ring buffer. And in the ETR driver,
> it uses structure etr_perf to manage the header pointer for copying
> data into ETR buffer (tagged as "etr_buf").
>
> ETR buffer is the single one, but the structures "perf_output_handle"
> and "etr_perf" are per CPU. We have multiple copies for the headers and

minor Correction, they are "per-event" to be precise. And there are
events per-CPU in a system wide mode or task mode (but not per-thread
mode). So, you are correct

> tails to manage a single buffer, but the problem is these multiple
> copies have not been synced with each other.
>
>> 2) Even more problematic is where we copy the AUX buffer content to.
>> As mentioned above, we don't know which handle is going to be the last
>> one to consume and we have a "etr_perf->head" that came from one of the
>> handles and the "pages" that came from the first handle which created a
>> etr_perf buffer. In sync_perf_buffer() we copy the hardware buffers to
>> the "pages" (say of handle_0) with "etr_perf->head" (which could be from
>> any other handle, say handle_2) and then we could return the number of bytes
>> copied, which then is used to update the last handle (could be say
>> handle_3), where there is no actual data copied.

This is not valid and am relieved that the driver is correct. The
assumption that there is only one etr_perf per ETR is incorrect
as pictured above.

>>
>> To fix all of these issues, we must
>> 1) Stop using etr_perf->head, and instead use the handle->head where we are
>> called update_buffer on.
>>
>> 2) Keep track of the "pages" that belong to a given "handle" and then use
>> those pages to copy the data to the current handle we are called to update
>> the buffer on.
>
> The "pages" are only allocated once, even they are attached to multiple
> handles. I think the right way is to use the single structure

I assume you mean the pages in the etr_buf and not etr_perf right ?

> "etr_perf" and single "perf_output_handle" to manage the "pages", IOW,
> if there have single buffer, then we just use one copy of header and
> tail to manage it.

I think this is not needed and the way we do things are fine and the
patch as such looks correct to me.

The perf_output_handle is per-event and nothing that we can combine
with. etr_perf captures what the "ouput_handle" stands for and is
something necessary for syncing the buffer.

Now coming back to this patch, I understand that the sync_perf could be
called with the polling patches multiple times. But don't we do a
perf_output_handle_end() each of the time we wake up ?
(I haven't looked at the later patches yet).

I would expect:

perf_aux_output_begin() -> update the etr_perf-> head

when we sync the buffer, we do :

Poll-> sync_buffer-> perf_aux_output_end() and perf_aux_output_begin()
-> update etr_perf->head.

Kind regards
Suzuki
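
For readers following the patch, the head arithmetic it adds can be modelled in user space. This is only a minimal sketch, not the driver code: the names model_perf_buf, model_sync, MODEL_PAGE_SHIFT, MODEL_PAGE_SIZE and MODEL_NR_PAGES are hypothetical, the page size is shrunk to 16 bytes for readability, and the source is a flat array rather than the ETR hardware buffer. It shows a page-by-page copy that leaves the head just past the last byte written, so a second sync continues where the first one stopped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model; sizes are chosen for illustration only. */
#define MODEL_PAGE_SHIFT 4				/* 16-byte "pages" */
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)
#define MODEL_NR_PAGES   4

struct model_perf_buf {
	unsigned long head;	/* byte offset into the paged buffer */
	char pages[MODEL_NR_PAGES][MODEL_PAGE_SIZE];
};

/*
 * Copy len bytes from src into the paged buffer, starting at buf->head and
 * wrapping to page 0 after the last page. On return buf->head points just
 * past the copied data, mirroring the assignment the patch adds after the
 * copy loop. Returns len.
 */
static long model_sync(struct model_perf_buf *buf, const char *src, long len)
{
	unsigned long head = buf->head;
	long pg_idx = (long)(head >> MODEL_PAGE_SHIFT);
	long pg_offset = (long)(head & (MODEL_PAGE_SIZE - 1));
	long to_copy = len;

	assert(pg_idx < MODEL_NR_PAGES);

	while (to_copy > 0) {
		/* Never copy past the end of the current page. */
		long room = (long)MODEL_PAGE_SIZE - pg_offset;
		long bytes = to_copy < room ? to_copy : room;

		memcpy(buf->pages[pg_idx] + pg_offset, src, (size_t)bytes);
		to_copy -= bytes;
		src += bytes;

		/* Move destination pointers, wrapping at the buffer end. */
		pg_offset += bytes;
		if (pg_offset == (long)MODEL_PAGE_SIZE) {
			pg_offset = 0;
			if (++pg_idx == MODEL_NR_PAGES)
				pg_idx = 0;
		}
	}

	/* Advance the head so the next sync continues from here. */
	buf->head = ((unsigned long)pg_idx << MODEL_PAGE_SHIFT) + pg_offset;
	return len;
}
```

With a zeroed model_perf_buf, two consecutive 10-byte calls leave the head at 20, and the second call's data straddles the boundary between page 0 and page 1. Without the final assignment, the second call would start again from the old head and overwrite the first call's data, which is the situation the patch is addressing for repeated syncs.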