Re: [Linaro-mm-sig] [PATCH v2] lib/scatterlist: add simple page iterator

12 Feb 2013

On Mon, 2013-02-11 at 12:54 -0800, Andrew Morton wrote:
...
On Mon, 11 Feb 2013 20:50:04 +0200
Imre Deak imre.deak@intel.com wrote:
...
Add an iterator to walk through a scatter list a page at a time starting
at a specific page offset. As opposed to the mapping iterator this is
What is "the mapping iterator"?
It's the one implemented by sg_miter_{start,stop} in scatterlist.c. It
also iterates through a scatterlist a page at a time, but it also kmaps
these pages. Since in our use case we don't need to map the pages we
needed a solution without this overhead.
...
...
meant to be small, performing well even in simple loops like collecting
all pages on the scatterlist into an array or setting up an iommu table
based on the pages' DMA address.
Where will this new macro be used?  What is driving this effort?
At the moment the only user of the macro would be the i915 driver, see
[1] for the patches that takes it into use. In the patchset the macro
was added as a DRM specific macro, but since it might be useful in the
future for other drivers too (anything using dma-buf) I'd like to add it
to a more generic place.
...
...
v2:

In each iteration sg_pgoffset pointed incorrectly at the next page not
the current one.

Signed-off-by: Imre Deak imre.deak@intel.com
include/linux/scatterlist.h |   50 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 4bd6c06..72578b5 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -231,6 +231,56 @@ size_t sg_copy_to_buffer(struct scatterlist *sgl, unsigned int nents,
  */
 #define SG_MAX_SINGLE_ALLOC		(PAGE_SIZE / sizeof(struct scatterlist))
 
+struct sg_page_iter {

struct scatterlist *sg;
int sg_pgoffset;
struct page *page;

+};
Some documentation wouldn't hurt.   What it's used for, why it exists.
Ok, will add it.
...
...
+static inline int
+sg_page_cnt(struct scatterlist *sg)
unneeded newline here.
A more typical name would be "sg_page_count".  Stripping words of their
vowels makes the symbols harder to remember.
Ok, will fix this.
...
...
+{

BUG_ON(sg->offset || sg->length & ~PAGE_MASK);

return sg->length >> PAGE_SHIFT;

+}



+static inline struct page *
+sg_page_iter_get_page(struct sg_page_iter *iter)
+{

while (iter->sg && iter->sg_pgoffset >= sg_page_cnt(iter->sg)) {
iter->sg_pgoffset -= sg_page_cnt(iter->sg);


iter->sg = sg_next(iter->sg);


}

return iter->sg ? nth_page(sg_page(iter->sg), iter->sg_pgoffset) : NULL;

+}



+static inline void
+sg_page_iter_next(struct sg_page_iter *iter)
+{

iter->sg_pgoffset++;
iter->page = sg_page_iter_get_page(iter);

+}



+static inline void
+sg_page_iter_start(struct sg_page_iter *iter, struct scatterlist *sglist,

   unsigned long pgoffset)



+{

iter->sg = sglist;
iter->sg_pgoffset = pgoffset;
iter->page = sg_page_iter_get_page(iter);

+}
All the above are undocumented also.  I guess that's acceptable if they
are only ever to be used by for_each_sg_page().  Although if that's the
case then perhaps the identifiers should be a bit more obscure-looking.
Usually we prefix them with "__" to say "this is in internal thing".
Yes, they are meant to be used only internally, so I'll add the __
prefix.
...
...
+/*


Simple sg page iterator, starting off at the given page offset. Each entry



on the sglist must start at offset 0 and can contain only full pages.



iter->page will point to the current page, iter->sg_pgoffset to the page



offset within the sg holding that page.


*/

+#define for_each_sg_page(sglist, iter, pgoffset)				\

for (sg_page_iter_start((iter), (sglist), (pgoffset));		\
    (iter)->page; sg_page_iter_next(iter))



Because all the helper functions are inlined, this will expand to a
quite large amount of code.  And large code can be slow code due to
I-cache eviction.
I don't know *how* big this thing will be because the patch didn't
include a caller and I can't be bothered writing my own.  (And the lack
of any caller means that the code will not be tested).
So, exactly how big is this thing, and how do we know it's better this
way than if we were to uninline some/all of the helpers?
I admit I only hoped compiler optimization would keep the inlined parts
at a minimum, but now I actually checked (on Intel CPU). I applied the
patchset from [1] and uninlined sg_page_iter_start as it's not
significant for speed:
size drivers/gpu/drm/i915/i915.ko
 514855	  15996	    272	 531123	  81ab3	drivers/gpu/drm/i915/i915.ko
Then uninlined all helpers:
 size drivers/gpu/drm/i915/i915.ko
 513447	  15996	    272	 529715	  81533	drivers/gpu/drm/i915/i915.ko
Since there are 8 invocations of the macro, the overhead for a single
invocation is about (531123 - 529715) / 8 = 191 bytes.
For speed, I benchmarked a simple loop which was basically:
page = vmalloc(sizeof(*page) * 1000, GFP_KERNEL);
for_each_sg_page(sglist, iter, 0)
    *page++ = iter.page;
where each entry on the sglist contained 16 consecutive pages. This
takes ~10% more time for the uninlined version to run. This is a rather
artificial test and I couldn't come up with something more real-life
using only the i915 driver's ioctl interface that would show a
significant change in speed.
So at least for now I'm ok with just uninlining all the helpers.
Thanks for the review,
Imre
[1]
http://lists.freedesktop.org/archives/intel-gfx/2013-February/024589.html

    

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Re: [Linaro-mm-sig] [PATCH v2] lib/scatterlist: add simple page iterator

Signed-off-by: Imre Deak imre.deak@intel.com