Hi,
While working with high-order page allocations (using `alloc_pages') I've
encountered some issues* with certain APIs and wanted to get a better
understanding of support for those APIs with high-order pages on ARM. In
short, I'm trying to give userspace access to those pages by using
`vm_insert_page' in an mmap handler. Without further ado, some
questions:
o vm_insert_page doesn't seem to work with high-order pages (it
eventually calls __flush_dcache_page which assumes pages of size
PAGE_SIZE). Is this analysis correct or am I missing something?
Things work fine if I use `remap_pfn_range' instead of
`vm_insert_page'. Things also seem to work if I use `vm_insert_page'
with an array of struct page * entries, one per PAGE_SIZE page
(derived from the high-order allocation by picking out the individual
pages with nth_page)...
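For reference, the per-page insertion I'm describing looks roughly
like this (a sketch only; my_mmap and stashing the head page in
file->private_data are made up for illustration):

/* Hypothetical mmap handler sketch: instead of handing the head page
 * of the high-order allocation to vm_insert_page() once, insert each
 * constituent PAGE_SIZE page individually.  "head" is the result of
 * alloc_pages(GFP_KERNEL | __GFP_HIGHMEM | __GFP_COMP, order). */
static int my_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct page *head = file->private_data;	/* hypothetical */
	unsigned long addr = vma->vm_start;
	unsigned long i, nr = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
	int ret;

	for (i = 0; i < nr; i++) {
		ret = vm_insert_page(vma, addr, nth_page(head, i));
		if (ret)
			return ret;
		addr += PAGE_SIZE;
	}

	return 0;
}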
o There's a comment in __dma_alloc (arch/arm/mm/dma-mapping.c) to the effect that
__GFP_COMP is not supported on ARM. Is this true? The commit that
introduced this comment (ea2e7057) was actually ported from avr32
(3611553ef) so I'm curious about the basis for this claim...
I've tried pages of order 8 and order 4. The gfp flags I'm passing to
`alloc_pages' are (GFP_KERNEL | __GFP_HIGHMEM | __GFP_COMP).
Thanks!
* Some issues: in userspace, mmap the buffer whose underlying mmap
handler is the one mentioned above, memset it to a known value, and then
immediately check that the bytes are equal to whatever was just written.
(With high-order pages and vm_insert_page this test fails.)
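In code, the test is roughly the following (a sketch; /dev/mydev and
BUF_SIZE stand in for the real device node and buffer size):

#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>

#define BUF_SIZE (1024 * 1024)	/* placeholder size */

int main(void)
{
	int fd = open("/dev/mydev", O_RDWR);
	unsigned char *buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
				  MAP_SHARED, fd, 0);
	size_t i;

	memset(buf, 0xa5, BUF_SIZE);
	for (i = 0; i < BUF_SIZE; i++)
		assert(buf[i] == 0xa5);	/* fails in the vm_insert_page case */

	return 0;
}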
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
Add an iterator to walk through a scatter list a page at a time starting
at a specific page offset. As opposed to the mapping iterator this is
meant to be small, performing well even in simple loops like collecting
all pages on the scatterlist into an array or setting up an iommu table
based on the pages' DMA address.
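As a usage sketch (illustration only, not part of this patch),
collecting all pages of a page-aligned sg table into a caller-provided
array could then look like:

static int collect_sg_pages(struct scatterlist *sgl,
			    struct page **pages, int max_pages)
{
	struct sg_page_iter iter;
	int n = 0;

	for_each_sg_page(sgl, &iter, 0) {
		if (n >= max_pages)
			return -ENOSPC;
		pages[n++] = iter.page;
	}

	return n;
}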
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
---
include/linux/scatterlist.h | 48 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 48 insertions(+)
[ Resending with proper email addresses. ]
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 4bd6c06..d22851c 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -231,6 +231,54 @@ size_t sg_copy_to_buffer(struct scatterlist *sgl, unsigned int nents,
*/
#define SG_MAX_SINGLE_ALLOC (PAGE_SIZE / sizeof(struct scatterlist))
+struct sg_page_iter {
+ struct scatterlist *sg;
+ int sg_pgoffset;
+ struct page *page;
+};
+
+static inline int
+sg_page_cnt(struct scatterlist *sg)
+{
+ BUG_ON(sg->offset || sg->length & ~PAGE_MASK);
+
+ return sg->length >> PAGE_SHIFT;
+}
+
+static inline void
+sg_page_iter_next(struct sg_page_iter *iter)
+{
+ while (iter->sg && iter->sg_pgoffset >= sg_page_cnt(iter->sg)) {
+ iter->sg_pgoffset -= sg_page_cnt(iter->sg);
+ iter->sg = sg_next(iter->sg);
+ }
+
+ if (iter->sg) {
+ iter->page = nth_page(sg_page(iter->sg), iter->sg_pgoffset);
+ iter->sg_pgoffset++;
+ }
+}
+
+static inline void
+sg_page_iter_start(struct sg_page_iter *iter, struct scatterlist *sglist,
+ unsigned long pgoffset)
+{
+ iter->sg = sglist;
+ iter->sg_pgoffset = pgoffset;
+ iter->page = NULL;
+
+ sg_page_iter_next(iter);
+}
+
+/*
+ * Simple sg page iterator, starting off at the given page offset. Each entry
+ * on the sglist must start at offset 0 and can contain only full pages.
+ * iter->page will point to the current page, iter->sg_pgoffset to the page
+ * offset within the sg holding that page.
+ */
+#define for_each_sg_page(sglist, iter, pgoffset) \
+ for (sg_page_iter_start((iter), (sglist), (pgoffset)); \
+ (iter)->sg; sg_page_iter_next(iter))
/*
* Mapping sg iterator
--
1.7.9.5
Hi Imre!
On Sat, Feb 09, 2013 at 05:27:33PM +0200, Imre Deak wrote:
> Add a helper to walk through a scatter list a page at a time. Needed by
> upcoming patches fixing the scatter list walking logic in the i915 driver.
Nice patch, but I think this would make a rather nice addition to the
common scatterlist api in scatterlist.h, maybe called sg_page_iter.
There's already another helper which does cpu mappings, but it has a
different use-case (it gives you the page mapped, which we don't need, and
it can cope with non-page-aligned sg tables). With dma-buf using sg tables I
expect more users of such a sg page iterator to pop up. Most possible
users of this will hang around on linaro-mm-sig, so please also cc that
besides the usual suspects.
Cheers, Daniel
>
> Signed-off-by: Imre Deak <imre.deak(a)intel.com>
> ---
> include/drm/drmP.h | 44 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 44 insertions(+)
>
> diff --git a/include/drm/drmP.h b/include/drm/drmP.h
> index fad21c9..0c0c213 100644
> --- a/include/drm/drmP.h
> +++ b/include/drm/drmP.h
> @@ -1578,6 +1578,50 @@ extern int drm_sg_alloc_ioctl(struct drm_device *dev, void *data,
> extern int drm_sg_alloc(struct drm_device *dev, struct drm_scatter_gather * request);
> extern int drm_sg_free(struct drm_device *dev, void *data,
> struct drm_file *file_priv);
> +struct drm_sg_iter {
> + struct scatterlist *sg;
> + int sg_offset;
Imo using sg_pfn_offset (i.e. sg_offset >> PAGE_SHIFT) would make it
clearer that this is all about iterating page-aligned sg tables.
> + struct page *page;
> +};
> +
> +static inline int
> +__drm_sg_iter_seek(struct drm_sg_iter *iter)
> +{
> + while (iter->sg && iter->sg_offset >= iter->sg->length) {
> + iter->sg_offset -= iter->sg->length;
> + iter->sg = sg_next(iter->sg);
And adding a WARN_ON(sg->length & ~PAGE_MASK); here would enforce that.
> + }
> +
> + return iter->sg ? 0 : -1;
> +}
> +
> +static inline struct page *
> +drm_sg_iter_next(struct drm_sg_iter *iter)
> +{
> + struct page *page;
> +
> + if (__drm_sg_iter_seek(iter))
> + return NULL;
> +
> + page = nth_page(sg_page(iter->sg), iter->sg_offset >> PAGE_SHIFT);
> + iter->sg_offset = (iter->sg_offset + PAGE_SIZE) & PAGE_MASK;
> +
> + return page;
> +}
> +
> +static inline struct page *
> +drm_sg_iter_start(struct drm_sg_iter *iter, struct scatterlist *sg,
> + unsigned long offset)
> +{
> + iter->sg = sg;
> + iter->sg_offset = offset;
> +
> + return drm_sg_iter_next(iter);
> +}
> +
> +#define drm_for_each_sg_page(iter, sg, pgoffset) \
> + for ((iter)->page = drm_sg_iter_start((iter), (sg), (pgoffset));\
> + (iter)->page; (iter)->page = drm_sg_iter_next(iter))
Again, for the initialization I'd go with page numbers, not an offset in
bytes.
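Roughly something like this, just to illustrate (untested sketch):

struct drm_sg_iter {
	struct scatterlist *sg;
	int sg_pfn_offset;	/* offset into the current sg entry, in pages */
	struct page *page;
};

static inline int
__drm_sg_iter_seek(struct drm_sg_iter *iter)
{
	while (iter->sg &&
	       iter->sg_pfn_offset >= (iter->sg->length >> PAGE_SHIFT)) {
		WARN_ON(iter->sg->length & ~PAGE_MASK);
		iter->sg_pfn_offset -= iter->sg->length >> PAGE_SHIFT;
		iter->sg = sg_next(iter->sg);
	}

	return iter->sg ? 0 : -1;
}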
>
> /* ATI PCIGART support (ati_pcigart.h) */
> extern int drm_ati_pcigart_init(struct drm_device *dev,
> --
> 1.7.10.4
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx(a)lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
On an allocation or kmalloc failure in the system heap's allocate path,
the error exit iterates over the already-allocated page infos and frees
the allocated pages and the page info structures. The same page info
structure that is freed is also used as the loop iterator, so use the
safe version of the list iterator.
Signed-off-by: Nishanth Peethambaran <nishanth(a)broadcom.com>
---
drivers/gpu/ion/ion_system_heap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/ion/ion_system_heap.c b/drivers/gpu/ion/ion_system_heap.c
index c1061a8..d079e2b 100644
--- a/drivers/gpu/ion/ion_system_heap.c
+++ b/drivers/gpu/ion/ion_system_heap.c
@@ -200,7 +200,7 @@ static int ion_system_heap_allocate(struct ion_heap *heap,
err1:
kfree(table);
err:
- list_for_each_entry(info, &pages, list) {
+ list_for_each_entry_safe(info, tmp_info, &pages, list) {
free_buffer_page(sys_heap, buffer, info->page, info->order);
kfree(info);
}
--
1.7.9.5
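For context (sketch, not part of the patch): with the plain iterator,
kfree(info) frees the node that the loop then dereferences to find the
next entry, i.e. a use-after-free. The _safe variant caches the next
entry in a second cursor before the body runs, so the function also
needs that cursor declared (e.g. struct page_info *tmp_info), which is
outside this one-line hunk:

	struct page_info *info, *tmp_info;

	list_for_each_entry_safe(info, tmp_info, &pages, list) {
		free_buffer_page(sys_heap, buffer, info->page, info->order);
		kfree(info);	/* safe: tmp_info already holds the next node */
	}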
Hi,
I'm working on Android Linux Kernel version 3.0.15 and seeing a
"deadlock" in the ashmem driver, while handling mmap request. I seek
your support in finding the correct fix.
The locks involved in the deadlock are:
1) mm->mmap_sem
2) ashmem_mutex
The following is the sequence of events that leads to the deadlock.
There are two threads A and B that belong to the same process
(system_server) and hence share the mm struct.
A1) In the A's context an mmap system call is made with an fd of ashmem
A2) The system call sys_mmap_pgoff acquires the mmap_sem of the "mm"
and sleeps before calling ashmem's .mmap handler, i.e. before calling
ashmem_mmap
Now the thread B runs and proceeds to do the following
B1) In the B's context ashmem ioctl with option ASHMEM_SET_NAME is called.
B2) Now the code proceeds to acquire the ashmem_mutex and performs a
"copy_from_user"
B3) copy_from_user takes a valid fault while copying the data from user
space, which is handled gracefully via do_DataAbort -->
do_page_fault
B4) In do_page_fault it finds that the mm->mmap_sem is not available
(Note A & B share the mm) since A has it and sleeps
Now the thread A runs again
A3) It proceeds to call ashmem_mmap and tries to acquire
ashmem_mutex, which is not available (it is held by B), and sleeps.
Now A has acquired mmap_sem and waits for B to release ashmem_mutex,
while B has acquired ashmem_mutex and waits for mmap_sem to become
available, which is held by A.
This creates a deadlock in the system.
I'm not sure how to use these locks in such a way as to prevent this
scenario. Any suggestions would be of great help.
Workaround:
One possible workaround is to replace the mutex_lock call made in
ashmem_mmap with mutex_trylock; if it fails, wait for a few
milliseconds and retry, finally giving up after a few iterations. This
will keep the system out of the deadlock if this scenario happens. I
myself feel that this suggestion is not clean, but I'm unable to think
of anything better.
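Roughly, the workaround would look like this (untested sketch; the
retry count and delay are arbitrary):

/* Bounded retry instead of an unconditional mutex_lock() at the top
 * of ashmem_mmap(); on failure the mmap would return an error. */
static int ashmem_mutex_lock_with_retries(void)
{
	int retries;

	for (retries = 0; retries < 10; retries++) {
		if (mutex_trylock(&ashmem_mutex))
			return 0;
		msleep(5);
	}

	return -EAGAIN;
}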
Is there any suggestion on how to avoid this scenario?
Warm Regards,
Shankar
Hello,
This is the last missing piece to let us efficiently use large DMA
buffers on systems with lots of memory that have highmem support
enabled. The first patch adds support for CMA regions placed in the high
memory zone; the second one also enables allocation of individual
pages from the high memory zone for IOMMU-mapped devices. Those two
changes let us save a significant amount of low memory for other tasks.
Best regards
Marek Szyprowski
Samsung Poland R&D Center
Patch summary:
Marek Szyprowski (2):
ARM: dma-mapping: add support for CMA regions placed in highmem zone
ARM: dma-mapping: use himem for DMA buffers for IOMMU-mapped devices
arch/arm/mm/dma-mapping.c | 70 +++++++++++++++++++++++++++++++++------------
1 file changed, 52 insertions(+), 18 deletions(-)
--
1.7.9.5
Hi,
On 2013-02-06 00:27, Laurent Pinchart wrote:
> Hello,
>
> We've hosted a CDF meeting at the FOSDEM on Sunday morning. Here's a summary
> of the discussions.
Thanks for the summary. I've been on a longish leave, and just got back,
so I haven't read the recent CDF discussions on the lists yet. I thought
I'd start by replying to this summary first =).
> 0. Abbreviations
> ----------------
>
> DBI - Display Bus Interface, a parallel video control and data bus that
> transmits data using parallel data, read/write, chip select and address
> signals, similarly to 8051-style microcontroller parallel busses. This is a
> mixed video control and data bus.
>
> DPI - Display Pixel Interface, a parallel video data bus that transmits data
> using parallel data, h/v sync and clock signals. This is a video data bus
> only.
>
> DSI - Display Serial Interface, a serial video control and data bus that
> transmits data using one or more differential serial lines. This is a mixed
> video control and data bus.
In case you'll re-use these abbrevs in later posts, I think it would be
good to mention that DPI is a one-way bus, whereas DBI and DSI are
two-way (perhaps that's implicit with control bus, though).
> 1. Goals
> --------
>
> The meeting started with a brief discussion about the CDF goals.
>
> Tomi Valkeinen and Tomasz Figa have sent RFC patches to show their views of
> what CDF could/should be. Many others have provided very valuable feedback.
> Given the early development stage propositions were sometimes contradictory,
> and focused on different areas of interest. We have thus started the meeting
> with a discussion about what CDF should try to achieve, and what it shouldn't.
>
> CDF has two main purposes. The original goal was to support display panels in
> a platform- and subsystem-independent way. While mostly useful for embedded
> systems, the emergence of platforms such as Intel Medfield and ARM-based PCs
> that blend the embedded and PC worlds makes panel support useful for the PC
> world as well.
>
> The second purpose is to provide a cross-subsystem interface to support video
> encoders. The idea originally came from a generalisation of the original RFC
> that supported panels only. While encoder support is considered as lower
> priority than display panel support by developers focussed on display
> controller drivers (Intel, Renesas, ST Ericsson, TI), companies that produce
> video encoders (Analog Devices, and likely others) don't share that point of
> view and would like to provide a single encoder driver that can be used in
> both KMS and V4L2 drivers.
What is an encoder? Something that takes a video signal in, and lets the
CPU store the received data to memory? Isn't that a decoder?
Or do you mean something that takes a video signal in, and outputs a
video signal in another format? (transcoder?)
If the latter, I don't see them as lower priority. If we use CDF also
for SoC internal components (which I think would be great), then every
OMAP board has transcoders.
I'm not sure about the vocabulary on this area, but a normal OMAP
scenario could have a following video pipeline:
1. encoder (OMAP's DISPC, reads pixels from memory and outputs parallel RGB)
2. transcoder (OMAP's DSI, gets parallel RGB and outputs DSI)
3. transcoder (external DSI-to-LVDS chip)
4. panel (LVDS panel)
Even in the case where a panel would be connected directly to the OMAP,
there would be the internal transcoder.
> 2. Subsystems
> -------------
>
> Display panels are used in conjunction with FBDEV and KMS drivers. There was
> to the audience knowledge no V4L2 driver that needs to explicitly handle
> display panels. Even though at least one V4L2 output drivers (omap_vout) can
> output video to a display panel, it does so in conjunction with the KMS and/or
> FBDEV APIs that handle panel configuration. Panels are thus not exposed to
> V4L2 drivers.
Hmm, I'm no expert on omap_vout, but it doesn't use KMS nor omapfb. It
uses omapdss directly, and thus accesses the panels.
That said, I'm fine with leaving omap_vout out from the equation.
> 3. KMS Extensions
> -----------------
>
> The usefulness of V4L2 for output devices was questioned, and the possibility
> of using KMS for complex video devices usually associated with V4L2 was
> raised. The TI DaVinci 8xxx family is an example of chips that could benefit
> from KMS support.
>
> The KMS API is lacking support for deep-pipelining ("framebuffers" that are
> sourced from a data stream instead of a memory buffer) today. Extending the
> KMS API with deep-pipelining support was considered as a sensible goal that
> would mostly require the creation of a new KMS source object. Exposing the
> topology of the whole device would then be handled by the Media Controller
> API.
Isn't there also the problem that KMS doesn't support arbitrarily long
chains of display devices? That actually sounds more like
"deep-pipelining" than what you said, getting the source data from a
data stream.
> 5. Bus Model
> ------------
>
> Display panels are connected to a video bus that transmits video data and
> optionally to a control bus. Those two busses can be separate physical
> interfaces or combined into a single physical interface.
>
> The Linux device model represents the system as a tree of devices (not to be
> confused by the device tree, abbreviated as DT). The tree is organized around
> control busses, with every device being a child of its control bus master. For
> instance an I2C device will be a child of its I2C controller device, which can
> itself be a child of its parent PCI device.
>
> Display panels will be represented as Linux devices. They will have a single
> parent from the Linux device model point of view, but will be potentially
> connected to multiple physical busses. CDF thus needs to define what bus to
> select as the Linux parent bus.
>
> In theory any physical bus that the device is attached to can be selected as
> the parent bus. However, selecting a video data bus would depart from the
> traditional Linux device model that uses control busses only. This caused
> concern among several people who argued that not presenting the device to the
> kernel as attached to its control bus would bring issues in embedded system.
> Unlike on PC systems where the control bus master is usually the same physical
> device as the data bus master, embedded systems are made of a potentially
> complex assembly of completely unrelated devices. Not representing an I2C-
> controlled panel as a child of its I2C master in DT was thus frowned upon, even
> though no clear agreement was reached on the subject.
I've been thinking that a good rule of thumb would be that the device
must be somewhat usable after the parent bus is ready. So for, say,
a DPI+SPI panel, when the SPI is set up the driver can send messages to
the panel, perhaps read an ID or such, even if the actual video cannot
be shown yet (presuming DPI bus is still missing).
Of course there are the funny cases, as always. Say, a DSI panel,
controlled via i2c, and the panel gets its functional clock from the DSI
bus's clock. In that case both busses need to be up and running before
the panel can do anything.
> - Combined video and control busses
>
> When the two busses are combined in a single physical bus the panel device
> will obviously be represented as a child of that single physical bus.
>
> In such cases the control bus could expose video bus control methods. This
> would remove the need for a video source as proposed by Tomi Valkeinen in his
> CDF model. However, if the bus can be used for video data transfer in
> combination with a different control bus, a video source corresponding to the
> data bus will be needed.
I think this is always the case. If a bus can be used for control and
video data, you can always use it only for video data.
> No decision has been taken on whether to use a video source in addition to the
> control bus in the combined busses case. Experimentation will be needed, and
> the right solution might depend on the bus type.
>
> - Multiple control busses
>
> One panel was mentioned as being connected to a DSI bus and an I2C bus. The
> DSI bus is used for both control and video, and the I2C bus for control only.
> Configuring the panel requires sending commands through both DSI and I2C. The
I have luckily not encountered such a device. However, many of the DSI
devices do have i2c control as an option. From the device's point of
view, both can be used at the same time, but I think usually it's saner
to just pick one and use it.
The driver for the device should support both control busses, though.
Probably 99% of the driver code is common for both cases.
> 6. Miscellaneous
> ----------------
>
> - If the OMAP3 DSS driver is used as a model for the DSI support
> implementation, Daniel Vetter requested the DSI bus lock semaphore to be
> killed as it prevents lockdep from working correctly (reference needed ;-)).
I don't think OMAP DSS should be used as a model. It has too much legacy
crap that should be rewritten. However, it can be used as a reference to
see what kind of features are needed, as it does support both video and
command mode DSI modes, and has been used with many different kinds of
DSI panels and DSI transcoders.
As for the semaphore, sure, it can be removed, although I'm not aware of
this lockdep problem. If there's a problem it should be fixed in any case.
> - Do we need to support chaining several encoders ? We can come up with
> several theoretical use cases, some of them probably exist in real hardware,
> but the details are still a bit fuzzy.
If encoder means the same as the "transcoder" term I used earlier, then
yes, I think so.
As I wrote, I'd like to model the OMAP DSS internal components with CDF.
The internal IP blocks are in no way different than external IP blocks,
they just happen to be integrated into OMAP. The same display IPs are
used with multiple different TI SoCs.
Also, the IPs vary between TI SoCs (for ex, omap2 doesn't have DSI,
omap3 has one DSI, omap4 has two DSIs), so we'll anyway need to have
some kind of dynamic system inside omapdss driver. If I can't use CDF
for that, I'll need to implement a custom one, which I believe would
resemble CDF in many ways.
I'm guessing that having multiple external transcoders is quite rare on
production hardware, but is a very useful feature with development
boards. It's not just once or twice that we've used a transcoder or two
between a SoC and a panel, because we haven't had the final panel yet.
Also, sometimes there are small simple chips in the video pipeline, that
do things like level shifting or ESD protection. In some cases these
chips just work automatically, but in some cases one needs to setup
regulators and gpios to get them up and running (for example,
http://www.ti.com/product/tpd12s015). And if that's the case, then I
believe having a CDF "transcoder" driver for the chip is the easiest
solution.
Tomi