From: Leo Yan <leo.yan@linaro.org>
To: coresight@lists.linaro.org
Subject: Re: [PATCH 1/4] coresight: tmc-etr: Advance buffer pointer in sync buffer.
Date: Tue, 27 Apr 2021 11:45:31 +0800
Message-ID: <20210427034531.GA328795@leoy-ThinkPad-X240s>

On Mon, Apr 26, 2021 at 11:40:44AM +0100, Suzuki Kuruppassery Poulose wrote:

[...]

> > @@ -1442,7 +1442,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
> >  {
> >          long bytes;
> >          long pg_idx, pg_offset;
> > -        unsigned long head = etr_perf->head;
> > +        unsigned long head;
> >          char **dst_pages, *src_buf;
> >          struct etr_buf *etr_buf = etr_perf->etr_buf;
> > @@ -1465,7 +1465,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
> >                  bytes = tmc_etr_buf_get_data(etr_buf, src_offset, to_copy,
> >                                               &src_buf);
> >                  if (WARN_ON_ONCE(bytes <= 0))
> > -                        break;
> > +                        return;
> >                  bytes = min(bytes, (long)(PAGE_SIZE - pg_offset));
> >                  memcpy(dst_pages[pg_idx] + pg_offset, src_buf, bytes);
> > @@ -1483,6 +1483,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
> >                  /* Move source pointers */
> >                  src_offset += bytes;
> >          }
> > +        etr_perf->head = (pg_idx << PAGE_SHIFT) + pg_offset;
>
>
> Looking at this patch, I feel the driver is doing a couple wrong things
> already.
>
> 1) We initialise etr_perf->head every time the ETR enable is called,
> irrespective of whether we actually try to enable the Hardware. e.g,
>
> etm_0 on -> .. -> enable_etr :
>         etr_perf->head =
>         enable_hw()
>
> etm_1 on -> ... -> enable_etr:
>         etr_perf->head =
>         already_enabled, skip enable_hw()
>
> etm_2 on -> ... -> enable_etr:
>         etr_perf->head =
>         already_enabled, skip enable_hw()...
>
>
> This doesn't look correct as we don't know which handle is going to get the
> data. This looks pointless.

I'd like to redraw the mapping as the diagram below (for system-wide
trace):

  CPU0: AUX RB (perf_output_handle_0) -> etr_perf -> +---------+
  CPU1: AUX RB (perf_output_handle_1) -> etr_perf -> | etr_buf |
  CPU2: AUX RB (perf_output_handle_2) -> etr_perf -> |         |
  CPU3: AUX RB (perf_output_handle_3) -> etr_perf -> +---------+

Put simply, there are two layers controlling the ring buffer. One layer
is the perf AUX ring buffer, which is mainly managed through the
structure perf_output_handle. The other is in the ETR driver, which uses
the structure etr_perf to manage the head pointer for copying data into
the ETR buffer (tagged as "etr_buf").

There is only a single ETR buffer, but the structures
"perf_output_handle" and "etr_perf" are per CPU. So we have multiple
copies of the head and tail managing a single buffer, and the problem is
that these copies are not synced with each other.

> 2) Even more problematic is where we copy the AUX buffer content to.
> As mentioned above, we don't know which handle is going to be the last
> one to consume and we have a "etr_perf->head" that came from one of the
> handles and the "pages" that came from the first handle which created a
> etr_perf buffer. In sync_perf_buffer() we copy the hardware buffers to
> the "pages" (say of handle_0) with "etr_perf->head" (which could be from
> any other handle, say handle_2) and then we could return the number of bytes
> copied, which then is used to update the last handle (could be say
> handle_3), where there is no actual data copied.
>
> To fix all of these issues, we must
> 1) Stop using etr_perf->head, and instead use the handle->head where we are
> called update_buffer on.
>
> 2) Keep track of the "pages" that belong to a given "handle" and then use
> those pages to copy the data to the current handle we are called to update
> the buffer on.
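
For reference, the page-wise copy that both the patch and the proposal
above rely on can be sketched in plain user-space C. This is only an
illustration, not driver code: sketch_copy_to_pages(), SKETCH_PAGE_SHIFT
and the 4 KiB page size are assumptions made for the example. In the
proposal, "head" would come from the handle that update_buffer is called
on, and "pages" would be the pages that belong to that same handle.

```c
#include <string.h>

#define SKETCH_PAGE_SHIFT 12                         /* assumed: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/*
 * Hypothetical helper (not a function in the driver): copy 'len' bytes
 * from 'src' into an array of 'nr_pages' pages, starting at the linear
 * offset 'head' and wrapping at the end of the last page.
 * Returns the new head, computed the same way as in the patch:
 * (pg_idx << PAGE_SHIFT) + pg_offset.
 */
static unsigned long sketch_copy_to_pages(char **pages, unsigned long nr_pages,
                                          unsigned long head,
                                          const char *src, unsigned long len)
{
    unsigned long pg_idx, pg_offset;

    /* Split the linear head into a page index and an offset in that page */
    head %= nr_pages << SKETCH_PAGE_SHIFT;
    pg_idx = head >> SKETCH_PAGE_SHIFT;
    pg_offset = head & (SKETCH_PAGE_SIZE - 1);

    while (len > 0) {
        /* Never copy past the end of the current destination page */
        unsigned long bytes = SKETCH_PAGE_SIZE - pg_offset;

        if (bytes > len)
            bytes = len;
        memcpy(pages[pg_idx] + pg_offset, src, bytes);

        /* Move source and destination pointers */
        src += bytes;
        len -= bytes;
        pg_offset += bytes;
        if (pg_offset == SKETCH_PAGE_SIZE) {
            pg_offset = 0;
            if (++pg_idx == nr_pages)
                pg_idx = 0;
        }
    }

    return (pg_idx << SKETCH_PAGE_SHIFT) + pg_offset;
}
```

The point of the sketch is that "head" and "pages" are passed in
together, so the data always lands in the pages of the same handle whose
head is then advanced.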
The "pages" are only allocated once, even though they are attached to
multiple handles. I think the right way is to use a single "etr_perf"
structure and a single "perf_output_handle" to manage the "pages". IOW,
if there is a single buffer, then we should use just one copy of the
head and tail to manage it.

The difficult part is how to use a single "perf_output_handle" to manage
the AUX ring buffer and notify user space. I am wondering if we can use
only CPU0's perf_output_handle to manage the AUX ring buffer, so that
when any other CPU reads out the data, it always uses CPU0's
perf_output_handle.

Thanks,
Leo