Hi all!
This is another quick update of the patches which add basic support for
dynamic allocation of memory reserved regions defined in device tree.
This time I hope I've really managed to address all the issues reported
by Grant Likely; see the change log for more details. As a bonus, I've
added support for ARM64 and PowerPC, as those architectures have a
well-defined place where early memory reservation can be done.
The initial code for this feature was posted here [1], merged as commit
9d8eab7af79cb4ce2de5de39f82c455b1f796963 ("drivers: of: add
initialization code for dma reserved memory") and later reverted by
commit 1931ee143b0ab72924944bc06e363d837ba05063. For more information,
see [2]. Finally, new bindings have been proposed [3], and a few days
ago Josh Cartwright prepared some code which implements those
bindings [4]. This finally pushed me to find some time to finish this
task and review the code. Josh agreed to hand ownership of this series
over to me so I can continue preparing it for mainline inclusion.
For more information please refer to the changelog below.
[1]: http://lkml.kernel.org/g/1377527959-5080-1-git-send-email-m.szyprowski@sams…
[2]: http://lkml.kernel.org/g/1381476448-14548-1-git-send-email-m.szyprowski@sam…
[3]: http://lkml.kernel.org/g/20131030134702.19B57C402A0@trevor.secretlab.ca
[4]: http://thread.gmane.org/gmane.linux.documentation/19579
Changelog:
v4:
- dynamic allocations are processed after all static reservations have been
done
- moved code for handling static reservations to drivers/of/fdt.c
- removed node matching by string comparison, now phandle values are used
directly
- moved code for DMA and CMA handling directly to
drivers/base/dma-{coherent,contiguous}.c
- added checks for proper #size-cells, #address-cells, ranges properties
in /reserved-memory node
- even more code cleanup
- added init code for ARM64 and PowerPC
v3: http://article.gmane.org/gmane.linux.documentation/20169/
- refactored memory reservation code, created common code to parse reg, size,
align, alloc-ranges properties
- added support for multiple tuples in 'reg' property
- memory is reserved regardless of whether a driver for its compatible is present
- prepared arch specific hooks for memory reservation (defaults use memblock
calls)
- removed node matching by string during device initialization
- CMA init code: added checks for required region alignment
- more code cleanup here and there
v2: http://thread.gmane.org/gmane.linux.documentation/19870/
- removed copying of the node name
- split shared-dma-pool handling into separate files (one for CMA and one
for dma_declare_coherent based implementations) for making the code easier
to understand
- added support for AMBA devices, changed prototypes to use struct device
instead of struct platform_device
- renamed some functions to better match other names used in drivers/of/
- restructured the rest of the code a bit for better readability
- added 'reusable' property to example linux,cma node in documentation
- exclusive DMA (dma_coherent) is now used only for handling 'shared-dma-pool'
regions without the 'reusable' property, and CMA is used only for handling
'shared-dma-pool' regions with the 'reusable' property
v1: http://thread.gmane.org/gmane.linux.documentation/19579
- initial version prepared by Josh Cartwright
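As a sketch of what these bindings describe (the node names, addresses
and sizes below are invented for illustration, not taken from this
series), a /reserved-memory node combining a static reservation with a
dynamically allocated 'shared-dma-pool' region could look like:

```dts
reserved-memory {
	#address-cells = <1>;
	#size-cells = <1>;
	ranges;

	/* static reservation: 8 MiB at a fixed base address */
	framebuffer@78000000 {
		reg = <0x78000000 0x800000>;
	};

	/* dynamic allocation: size/alloc-ranges instead of 'reg' */
	linux,cma {
		compatible = "shared-dma-pool";
		reusable;
		size = <0x4000000>;			/* 64 MiB */
		alloc-ranges = <0x40000000 0x10000000>;
	};
};
```

The static region is reserved at boot from the 'reg' tuple, while the
dynamic one is allocated by the kernel anywhere inside 'alloc-ranges'.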
Summary:
Grant Likely (1):
of: document bindings for reserved-memory nodes
Marek Szyprowski (6):
drivers: of: add initialization code for reserved memory
drivers: dma-coherent: add initialization from device tree
drivers: dma-contiguous: add initialization from device tree
arm: add support for reserved memory defined by device tree
arm64: add support for reserved memory defined by device tree
powerpc: add support for reserved memory defined by device tree
.../bindings/reserved-memory/reserved-memory.txt | 138 +++++++++
arch/arm/Kconfig | 1 +
arch/arm/mm/init.c | 3 +
arch/arm64/Kconfig | 1 +
arch/arm64/mm/init.c | 2 +
arch/powerpc/Kconfig | 1 +
arch/powerpc/kernel/prom.c | 3 +
drivers/base/dma-coherent.c | 41 +++
drivers/base/dma-contiguous.c | 130 +++++++--
drivers/of/Kconfig | 6 +
drivers/of/Makefile | 1 +
drivers/of/fdt.c | 145 ++++++++++
drivers/of/of_reserved_mem.c | 296 ++++++++++++++++++++
drivers/of/platform.c | 7 +
include/asm-generic/vmlinux.lds.h | 11 +
include/linux/of_reserved_mem.h | 65 +++++
16 files changed, 829 insertions(+), 22 deletions(-)
create mode 100644 Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
create mode 100644 drivers/of/of_reserved_mem.c
create mode 100644 include/linux/of_reserved_mem.h
--
1.7.9.5
Hi all!
This is yet another update of the second attempt to add basic support
for dynamic allocation of memory reserved regions defined in device
tree.
This time I've tried to address all the issues reported by Grant Likely.
The side effect is a complete rewrite of the memory reservation code,
which adds support for multiple tuples in the 'reg' property and
complete support for the 'size', 'align' and 'alloc-ranges' properties.
The initial code for this feature was posted here [1], merged as commit
9d8eab7af79cb4ce2de5de39f82c455b1f796963 ("drivers: of: add
initialization code for dma reserved memory") and later reverted by
commit 1931ee143b0ab72924944bc06e363d837ba05063. For more information,
see [2]. Finally, new bindings have been proposed [3], and a few days
ago Josh Cartwright prepared some code which implements those
bindings [4]. This finally pushed me to find some time to finish this
task and review the code. Josh agreed to hand ownership of this series
over to me so I can continue preparing it for mainline inclusion.
For more information please refer to the changelog below.
[1]: http://lkml.kernel.org/g/1377527959-5080-1-git-send-email-m.szyprowski@sams…
[2]: http://lkml.kernel.org/g/1381476448-14548-1-git-send-email-m.szyprowski@sam…
[3]: http://lkml.kernel.org/g/20131030134702.19B57C402A0@trevor.secretlab.ca
[4]: http://thread.gmane.org/gmane.linux.documentation/19579
Changelog:
v3:
- refactored memory reservation code, created common code to parse reg, size,
align, alloc-ranges properties
- added support for multiple tuples in 'reg' property
- memory is reserved regardless of whether a driver for its compatible is present
- prepared arch specific hooks for memory reservation (defaults use memblock
calls)
- removed node matching by string during device initialization
- CMA init code: added checks for required region alignment
- more code cleanup here and there
v2: http://thread.gmane.org/gmane.linux.documentation/19870/
- removed copying of the node name
- split shared-dma-pool handling into separate files (one for CMA and one
for dma_declare_coherent based implementations) for making the code easier
to understand
- added support for AMBA devices, changed prototypes to use struct device
instead of struct platform_device
- renamed some functions to better match other names used in drivers/of/
- restructured the rest of the code a bit for better readability
- added 'reusable' property to example linux,cma node in documentation
- exclusive DMA (dma_coherent) is now used only for handling 'shared-dma-pool'
regions without the 'reusable' property, and CMA is used only for handling
'shared-dma-pool' regions with the 'reusable' property
v1: http://thread.gmane.org/gmane.linux.documentation/19579
- initial version prepared by Josh Cartwright
Summary:
Grant Likely (1):
of: document bindings for reserved-memory nodes
Josh Cartwright (2):
drivers: of: implement reserved-memory handling for dma
drivers: of: implement reserved-memory handling for cma
Marek Szyprowski (3):
base: dma-contiguous: add dma_contiguous_init_reserved_mem() function
drivers: of: add initialization code for reserved memory
ARM: init: add support for reserved memory defined by device tree
.../bindings/reserved-memory/reserved-memory.txt | 138 +++++++
arch/arm/Kconfig | 1 +
arch/arm/mm/init.c | 3 +
drivers/base/dma-contiguous.c | 70 ++--
drivers/of/Kconfig | 19 +
drivers/of/Makefile | 3 +
drivers/of/fdt.c | 2 +
drivers/of/of_reserved_mem.c | 390 ++++++++++++++++++++
drivers/of/of_reserved_mem_cma.c | 68 ++++
drivers/of/of_reserved_mem_dma.c | 65 ++++
drivers/of/platform.c | 7 +
include/asm-generic/vmlinux.lds.h | 11 +
include/linux/dma-contiguous.h | 7 +
include/linux/of_reserved_mem.h | 65 ++++
14 files changed, 827 insertions(+), 22 deletions(-)
create mode 100644 Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
create mode 100644 drivers/of/of_reserved_mem.c
create mode 100644 drivers/of/of_reserved_mem_cma.c
create mode 100644 drivers/of/of_reserved_mem_dma.c
create mode 100644 include/linux/of_reserved_mem.h
--
1.7.9.5
On Mon, Feb 17, 2014 at 12:36 PM, Christian König
<deathsimple(a)vodafone.de> wrote:
> Am 17.02.2014 18:27, schrieb Rob Clark:
>
>> On Mon, Feb 17, 2014 at 11:56 AM, Christian König
>> <deathsimple(a)vodafone.de> wrote:
>>>
>>> Am 17.02.2014 16:56, schrieb Maarten Lankhorst:
>>>
>>>> This type of fence can be used with hardware synchronization for simple
>>>> hardware that can block execution until the condition
>>>> (dma_buf[offset] - value) >= 0 has been met.
>>>
>>>
>>> Can't we make that just "dma_buf[offset] != 0" instead? As far as I know
>>> this way it would match the definition M$ uses in their WDDM
>>> specification
>>> and so make it much more likely that hardware supports it.
>>
>> well 'buf[offset] >= value' at least means the same slot can be used
>> for multiple operations (with increasing values of 'value').. not sure
>> if that is something people care about.
>>
>>> >=value seems to be possible with adreno and radeon. I'm not really sure
>>> about others (although I presume it as least supported for nv desktop
>>> stuff). For hw that cannot do >=value, we can either have a different fence
>>> implementation which uses the !=0 approach. Or change seqno-fence
>>> implementation later if needed. But if someone has hw that can do !=0 but
>>> not >=value, speak up now ;-)
>
>
> Here! Radeon can only do >=value on the DMA and 3D engine, but not with UVD
> or VCE. And for the 3D engine it means draining the pipe, which isn't really
> a good idea.
hmm, ok.. forgot you have a few extra rings compared to me. Is UVD
re-ordering from decode-order to display-order for you in hw? If not,
I guess you need sw intervention anyways when a frame is done for
frame re-ordering, so maybe hw->hw sync doesn't really matter as much
as compared to gpu/3d->display. For dma<->3d interactions, seems like
you would care more about hw<->hw sync, but I guess you aren't likely
to use GPU A to do a resolve blit for GPU B..
For 3D ring, I assume you probably want a CP_WAIT_FOR_IDLE before a
CP_MEM_WRITE to update fence value in memory (for the one signalling
the fence). But why would you need that before a CP_WAIT_REG_MEM (for
the one waiting for the fence)? I don't exactly have documentation
for adreno version of CP_WAIT_REG_{MEM,EQ,GTE}.. but PFP and ME
appear to be same instruction set as r600, so I'm pretty sure they
should have similar capabilities.. CP_WAIT_REG_MEM appears to be same
but with 32bit gpu addresses vs 64b.
BR,
-R
> Christian.
>
>
>>
>>> Apart from that I still don't like the idea of leaking a drivers IRQ
>>> context
>>> outside of the driver, but without a proper GPU scheduler there probably
>>> isn't much alternative.
>>
>> I guess it will be not uncommon scenario for gpu device to just need
>> to kick display device to write a few registers for a page flip..
>> probably best not to schedule a worker just for this (unless the
>> signalled device otherwise needs to). I think it is better in this
>> case to give the signalee some rope to hang themselves, and make it
>> the responsibility of the callback to kick things off to a worker if
>> needed.
>>
>> BR,
>> -R
>>
>>> Christian.
>>>
>>>> A software fallback still has to be provided in case the fence is used
>>>> with a device that doesn't support this mechanism. It is useful to
>>>> expose
>>>> this for graphics cards that have an op to support this.
>>>>
>>>> Some cards like i915 can export those, but don't have an option to wait,
>>>> so they need the software fallback.
>>>>
>>>> I extended the original patch by Rob Clark.
>>>>
>>>> v1: Original
>>>> v2: Renamed from bikeshed to seqno, moved into dma-fence.c since
>>>> not much was left of the file. Lots of documentation added.
>>>> v3: Use fence_ops instead of custom callbacks. Moved to own file
>>>> to avoid circular dependency between dma-buf.h and fence.h
>>>> v4: Add spinlock pointer to seqno_fence_init
>>>>
>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst(a)canonical.com>
>>>> ---
>>>> Documentation/DocBook/device-drivers.tmpl | 1
>>>> drivers/base/fence.c | 50 +++++++++++++
>>>> include/linux/seqno-fence.h | 109
>>>> +++++++++++++++++++++++++++++
>>>> 3 files changed, 160 insertions(+)
>>>> create mode 100644 include/linux/seqno-fence.h
>>>>
>>>> diff --git a/Documentation/DocBook/device-drivers.tmpl
>>>> b/Documentation/DocBook/device-drivers.tmpl
>>>> index 7a0c9ddb4818..8c85c20942c2 100644
>>>> --- a/Documentation/DocBook/device-drivers.tmpl
>>>> +++ b/Documentation/DocBook/device-drivers.tmpl
>>>> @@ -131,6 +131,7 @@ X!Edrivers/base/interface.c
>>>> !Edrivers/base/dma-buf.c
>>>> !Edrivers/base/fence.c
>>>> !Iinclude/linux/fence.h
>>>> +!Iinclude/linux/seqno-fence.h
>>>> !Edrivers/base/reservation.c
>>>> !Iinclude/linux/reservation.h
>>>> !Edrivers/base/dma-coherent.c
>>>> diff --git a/drivers/base/fence.c b/drivers/base/fence.c
>>>> index 12df2bf62034..cd0937127a89 100644
>>>> --- a/drivers/base/fence.c
>>>> +++ b/drivers/base/fence.c
>>>> @@ -25,6 +25,7 @@
>>>> #include <linux/export.h>
>>>> #include <linux/atomic.h>
>>>> #include <linux/fence.h>
>>>> +#include <linux/seqno-fence.h>
>>>> #define CREATE_TRACE_POINTS
>>>> #include <trace/events/fence.h>
>>>> @@ -413,3 +414,52 @@ __fence_init(struct fence *fence, const struct
>>>> fence_ops *ops,
>>>> trace_fence_init(fence);
>>>> }
>>>> EXPORT_SYMBOL(__fence_init);
>>>> +
>>>> +static const char *seqno_fence_get_driver_name(struct fence *fence) {
>>>> + struct seqno_fence *seqno_fence = to_seqno_fence(fence);
>>>> + return seqno_fence->ops->get_driver_name(fence);
>>>> +}
>>>> +
>>>> +static const char *seqno_fence_get_timeline_name(struct fence *fence) {
>>>> + struct seqno_fence *seqno_fence = to_seqno_fence(fence);
>>>> + return seqno_fence->ops->get_timeline_name(fence);
>>>> +}
>>>> +
>>>> +static bool seqno_enable_signaling(struct fence *fence)
>>>> +{
>>>> + struct seqno_fence *seqno_fence = to_seqno_fence(fence);
>>>> + return seqno_fence->ops->enable_signaling(fence);
>>>> +}
>>>> +
>>>> +static bool seqno_signaled(struct fence *fence)
>>>> +{
>>>> + struct seqno_fence *seqno_fence = to_seqno_fence(fence);
>>>> + return seqno_fence->ops->signaled &&
>>>> seqno_fence->ops->signaled(fence);
>>>> +}
>>>> +
>>>> +static void seqno_release(struct fence *fence)
>>>> +{
>>>> + struct seqno_fence *f = to_seqno_fence(fence);
>>>> +
>>>> + dma_buf_put(f->sync_buf);
>>>> + if (f->ops->release)
>>>> + f->ops->release(fence);
>>>> + else
>>>> + kfree(f);
>>>> +}
>>>> +
>>>> +static long seqno_wait(struct fence *fence, bool intr, signed long
>>>> timeout)
>>>> +{
>>>> + struct seqno_fence *f = to_seqno_fence(fence);
>>>> + return f->ops->wait(fence, intr, timeout);
>>>> +}
>>>> +
>>>> +const struct fence_ops seqno_fence_ops = {
>>>> + .get_driver_name = seqno_fence_get_driver_name,
>>>> + .get_timeline_name = seqno_fence_get_timeline_name,
>>>> + .enable_signaling = seqno_enable_signaling,
>>>> + .signaled = seqno_signaled,
>>>> + .wait = seqno_wait,
>>>> + .release = seqno_release,
>>>> +};
>>>> +EXPORT_SYMBOL(seqno_fence_ops);
>>>> diff --git a/include/linux/seqno-fence.h b/include/linux/seqno-fence.h
>>>> new file mode 100644
>>>> index 000000000000..952f7909128c
>>>> --- /dev/null
>>>> +++ b/include/linux/seqno-fence.h
>>>> @@ -0,0 +1,109 @@
>>>> +/*
>>>> + * seqno-fence, using a dma-buf to synchronize fencing
>>>> + *
>>>> + * Copyright (C) 2012 Texas Instruments
>>>> + * Copyright (C) 2012 Canonical Ltd
>>>> + * Authors:
>>>> + * Rob Clark <robdclark(a)gmail.com>
>>>> + * Maarten Lankhorst <maarten.lankhorst(a)canonical.com>
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or modify
>>>> it
>>>> + * under the terms of the GNU General Public License version 2 as
>>>> published by
>>>> + * the Free Software Foundation.
>>>> + *
>>>> + * This program is distributed in the hope that it will be useful, but
>>>> WITHOUT
>>>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
>>>> or
>>>> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
>>>> License
>>>> for
>>>> + * more details.
>>>> + *
>>>> + * You should have received a copy of the GNU General Public License
>>>> along with
>>>> + * this program. If not, see <http://www.gnu.org/licenses/>.
>>>> + */
>>>> +
>>>> +#ifndef __LINUX_SEQNO_FENCE_H
>>>> +#define __LINUX_SEQNO_FENCE_H
>>>> +
>>>> +#include <linux/fence.h>
>>>> +#include <linux/dma-buf.h>
>>>> +
>>>> +struct seqno_fence {
>>>> + struct fence base;
>>>> +
>>>> + const struct fence_ops *ops;
>>>> + struct dma_buf *sync_buf;
>>>> + uint32_t seqno_ofs;
>>>> +};
>>>> +
>>>> +extern const struct fence_ops seqno_fence_ops;
>>>> +
>>>> +/**
>>>> + * to_seqno_fence - cast a fence to a seqno_fence
>>>> + * @fence: fence to cast to a seqno_fence
>>>> + *
>>>> + * Returns NULL if the fence is not a seqno_fence,
>>>> + * or the seqno_fence otherwise.
>>>> + */
>>>> +static inline struct seqno_fence *
>>>> +to_seqno_fence(struct fence *fence)
>>>> +{
>>>> + if (fence->ops != &seqno_fence_ops)
>>>> + return NULL;
>>>> + return container_of(fence, struct seqno_fence, base);
>>>> +}
>>>> +
>>>> +/**
>>>> + * seqno_fence_init - initialize a seqno fence
>>>> + * @fence: seqno_fence to initialize
>>>> + * @lock: pointer to spinlock to use for fence
>>>> + * @sync_buf: buffer containing the memory location to signal on
>>>> + * @context: the execution context this fence is a part of
>>>> + * @seqno_ofs: the offset within @sync_buf
>>>> + * @seqno: the sequence # to signal on
>>>> + * @ops: the fence_ops for operations on this seqno fence
>>>> + *
>>>> + * This function initializes a struct seqno_fence with passed
>>>> parameters,
>>>> + * and takes a reference on sync_buf which is released on fence
>>>> destruction.
>>>> + *
>>>> + * A seqno_fence is a dma_fence which can complete in software when
>>>> + * enable_signaling is called, but it also completes when
>>>> + * (s32)((sync_buf)[seqno_ofs] - seqno) >= 0 is true
>>>> + *
>>>> + * The seqno_fence will take a refcount on the sync_buf until it's
>>>> + * destroyed, but actual lifetime of sync_buf may be longer if one of
>>>> the
>>>> + * callers take a reference to it.
>>>> + *
>>>> + * Certain hardware have instructions to insert this type of wait
>>>> condition
>>>> + * in the command stream, so no intervention from software would be
>>>> needed.
>>>> + * This type of fence can be destroyed before completed, however a
>>>> reference
>>>> + * on the sync_buf dma-buf can be taken. It is encouraged to re-use the
>>>> same
>>>> + * dma-buf for sync_buf, since mapping or unmapping the sync_buf to the
>>>> + * device's vm can be expensive.
>>>> + *
>>>> + * It is recommended for creators of seqno_fence to call fence_signal
>>>> + * before destruction. This will prevent possible issues from
>>>> wraparound
>>>> at
>>>> + * time of issue vs time of check, since users can check
>>>> fence_is_signaled
>>>> + * before submitting instructions for the hardware to wait on the
>>>> fence.
>>>> + * However, when ops.enable_signaling is not called, it doesn't have to
>>>> be
>>>> + * done as soon as possible, just before there's any real danger of
>>>> seqno
>>>> + * wraparound.
>>>> + */
>>>> +static inline void
>>>> +seqno_fence_init(struct seqno_fence *fence, spinlock_t *lock,
>>>> + struct dma_buf *sync_buf, uint32_t context, uint32_t
>>>> seqno_ofs,
>>>> + uint32_t seqno, const struct fence_ops *ops)
>>>> +{
>>>> + BUG_ON(!fence || !sync_buf || !ops);
>>>> + BUG_ON(!ops->wait || !ops->enable_signaling ||
>>>> !ops->get_driver_name || !ops->get_timeline_name);
>>>> +
>>>> + /*
>>>> + * ops is used in __fence_init for get_driver_name, so needs to
>>>> be
>>>> + * initialized first
>>>> + */
>>>> + fence->ops = ops;
>>>> + __fence_init(&fence->base, &seqno_fence_ops, lock, context,
>>>> seqno);
>>>> + get_dma_buf(sync_buf);
>>>> + fence->sync_buf = sync_buf;
>>>> + fence->seqno_ofs = seqno_ofs;
>>>> +}
>>>> +
>>>> +#endif /* __LINUX_SEQNO_FENCE_H */
>>>>
>>>> _______________________________________________
>>>> dri-devel mailing list
>>>> dri-devel(a)lists.freedesktop.org
>>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>
>>>
>
>
Hi all!
This is an updated version of the second attempt to add basic support
for dynamic allocation of memory reserved regions defined in device
tree.
The initial code for this feature was posted here [1], merged as commit
9d8eab7af79cb4ce2de5de39f82c455b1f796963 ("drivers: of: add
initialization code for dma reserved memory") and later reverted by
commit 1931ee143b0ab72924944bc06e363d837ba05063. For more information,
see [2]. Finally, new bindings have been proposed [3], and a few days
ago Josh Cartwright prepared some code which implements those
bindings [4]. This finally pushed me to find some time to finish this
task and review the code. Josh agreed to hand ownership of this series
over to me so I can continue preparing it for mainline inclusion.
For more information please refer to the changelog below.
[1]: http://lkml.kernel.org/g/1377527959-5080-1-git-send-email-m.szyprowski@sams…
[2]: http://lkml.kernel.org/g/1381476448-14548-1-git-send-email-m.szyprowski@sam…
[3]: http://lkml.kernel.org/g/20131030134702.19B57C402A0@trevor.secretlab.ca
[4]: http://thread.gmane.org/gmane.linux.documentation/19579
Changelog:
v2:
- removed copying of the node name
- split shared-dma-pool handling into separate files (one for CMA and one
for dma_declare_coherent based implementations) for making the code easier
to understand
- added support for AMBA devices, changed prototypes to use struct device
instead of struct platform_device
- renamed some functions to better match other names used in drivers/of/
- restructured the rest of the code a bit for better readability
- added 'reusable' property to example linux,cma node in documentation
- exclusive DMA (dma_coherent) is now used only for handling 'shared-dma-pool'
regions without the 'reusable' property, and CMA is used only for handling
'shared-dma-pool' regions with the 'reusable' property
v1: http://thread.gmane.org/gmane.linux.documentation/19579
- initial version prepared by Josh Cartwright
Summary:
Grant Likely (1):
of: document bindings for reserved-memory nodes
Josh Cartwright (2):
drivers: of: implement reserved-memory handling for dma
drivers: of: implement reserved-memory handling for cma
Marek Szyprowski (2):
drivers: of: add initialization code for reserved memory
ARM: init: add support for reserved memory defined by device tree
.../bindings/reserved-memory/reserved-memory.txt | 138 ++++++++++++
arch/arm/mm/init.c | 3 +
drivers/of/Kconfig | 20 ++
drivers/of/Makefile | 3 +
drivers/of/of_reserved_mem.c | 219 ++++++++++++++++++++
drivers/of/of_reserved_mem_cma.c | 75 +++++++
drivers/of/of_reserved_mem_dma.c | 78 +++++++
drivers/of/platform.c | 7 +
include/asm-generic/vmlinux.lds.h | 11 +
include/linux/of_reserved_mem.h | 62 ++++++
10 files changed, 616 insertions(+)
create mode 100644 Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
create mode 100644 drivers/of/of_reserved_mem.c
create mode 100644 drivers/of/of_reserved_mem_cma.c
create mode 100644 drivers/of/of_reserved_mem_dma.c
create mode 100644 include/linux/of_reserved_mem.h
--
1.7.9.5
Hi Linus,
Here's another tiny pull request for dma-buf framework updates; just
some debugfs output updates. (There's another patch related to
dma-buf, but it'll get upstreamed via Greg-kh's pull request).
Could you please pull?
The following changes since commit 45f7fdc2ffb9d5af4dab593843e89da70d1259e3:
Merge branch 'merge' of
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc (2014-02-11
22:28:47 -0800)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf.git
tags/dma-buf-for-3.14
for you to fetch changes up to c0b00a525c127d0055c1df6283300e17f601a1a1:
dma-buf: update debugfs output (2014-02-13 10:08:52 +0530)
----------------------------------------------------------------
Small dma-buf pull request for 3.14
----------------------------------------------------------------
Sumit Semwal (1):
dma-buf: update debugfs output
drivers/base/dma-buf.c | 25 ++++++++++++-------------
include/linux/dma-buf.h | 2 +-
2 files changed, 13 insertions(+), 14 deletions(-)
Hello,
In pursuit of saving memory on Android, I started experimenting with
Kernel Samepage Merging (KSM).
The number of pages shared because of KSM is reported by
/sys/kernel/mm/ksm/pages_sharing.
Documentation/vm/ksm.txt explains this as:
"pages_sharing - how many more sites are sharing them i.e. how much saved"
After enabling KSM on an Android device, this number was reported as
19666 pages.
The obvious optimization is to find the source of the sharing and see
if we can avoid the duplicate pages in the first place.
In order to collect the needed data, I added a few trace_printk()
statements to mm/ksm.c. Data should be collected from the second cycle
onwards, because that's when KSM starts merging pages. The first KSM
cycle is only used to calculate the checksum; pages are added to the
unstable tree and eventually moved to the stable tree after this.
After analyzing data from the second KSM cycle, a few things stood out:
1. In the same cycle, KSM can scan the same page multiple times.
Scanning a page involves comparing the page with pages in the stable
tree; if no match is found, a checksum is calculated. This looks like
a CPU-intensive operation, and it impacts the dcache as well.
2. A page which is already shared by multiple processes can be replaced
by a KSM page. Say a particular page is mapped 24 times and is replaced
by a KSM page; eventually all 24 entries will point to the KSM page,
and pages_sharing will account for all 24 mappings. So pages_sharing
does not actually report the amount of memory saved: in this example
the actual saving is one page.
Both cases happen very often on Android because of its architecture:
Zygote spawns (forks) multiple applications. To calculate actual
savings, we should account for the same page (pfn) being replaced by
the same KSM page only once. In the case 2 example, pages_sharing
should account for only one page. After recalculating, the memory
saving comes out to 8602 pages (~34MB).
I am trying to find the right way to fix pages_sharing, and eventually
to optimize KSM to scan a page only once even if it is mapped multiple
times.
Comments? Has anyone tried this before?
Thanks,
Pradeep
dma_buf_map_attachment and dma_buf_vmap can return NULL or
ERR_PTR on a error. This encourages a common buggy pattern in
callers:
sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
if (IS_ERR_OR_NULL(sgt))
return PTR_ERR(sgt);
This causes the caller to return 0 on an error. IS_ERR_OR_NULL
is almost always a sign of poorly-defined error handling.
This patch converts dma_buf_map_attachment to always return
ERR_PTR, and fixes the callers that incorrectly handled NULL.
There are a few more callers that were not checking for NULL
at all, which would have dereferenced a NULL pointer later.
There are also a few more callers that correctly handled NULL
and ERR_PTR differently, I left those alone but they could also
be modified to delete the NULL check.
This patch also converts dma_buf_vmap to always return NULL on error.
All the callers to dma_buf_vmap only check for NULL, and would
have dereferenced an ERR_PTR and panic'd if one was ever
returned. This is not consistent with the rest of the dma buf
APIs, but matches the expectations of all of the callers.
Signed-off-by: Colin Cross <ccross(a)android.com>
---
drivers/base/dma-buf.c | 18 +++++++++++-------
drivers/gpu/drm/drm_prime.c | 2 +-
drivers/gpu/drm/exynos/exynos_drm_dmabuf.c | 2 +-
drivers/media/v4l2-core/videobuf2-dma-contig.c | 2 +-
4 files changed, 14 insertions(+), 10 deletions(-)
diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index 1e16cbd61da2..cfe1d8bc7bb8 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -251,9 +251,8 @@ EXPORT_SYMBOL_GPL(dma_buf_put);
* @dmabuf: [in] buffer to attach device to.
* @dev: [in] device to be attached.
*
- * Returns struct dma_buf_attachment * for this attachment; may return negative
- * error codes.
- *
+ * Returns struct dma_buf_attachment * for this attachment; returns ERR_PTR on
+ * error.
*/
struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
struct device *dev)
@@ -319,9 +318,8 @@ EXPORT_SYMBOL_GPL(dma_buf_detach);
* @attach: [in] attachment whose scatterlist is to be returned
* @direction: [in] direction of DMA transfer
*
- * Returns sg_table containing the scatterlist to be returned; may return NULL
- * or ERR_PTR.
- *
+ * Returns sg_table containing the scatterlist to be returned; returns ERR_PTR
+ * on error.
*/
struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
enum dma_data_direction direction)
@@ -334,6 +332,8 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
return ERR_PTR(-EINVAL);
sg_table = attach->dmabuf->ops->map_dma_buf(attach, direction);
+ if (!sg_table)
+ sg_table = ERR_PTR(-ENOMEM);
return sg_table;
}
@@ -544,6 +544,8 @@ EXPORT_SYMBOL_GPL(dma_buf_mmap);
* These calls are optional in drivers. The intended use for them
* is for mapping objects linear in kernel space for high use objects.
* Please attempt to use kmap/kunmap before thinking about these interfaces.
+ *
+ * Returns NULL on error.
*/
void *dma_buf_vmap(struct dma_buf *dmabuf)
{
@@ -566,7 +568,9 @@ void *dma_buf_vmap(struct dma_buf *dmabuf)
BUG_ON(dmabuf->vmap_ptr);
ptr = dmabuf->ops->vmap(dmabuf);
- if (IS_ERR_OR_NULL(ptr))
+ if (WARN_ON_ONCE(IS_ERR(ptr)))
+ ptr = NULL;
+ if (!ptr)
goto out_unlock;
dmabuf->vmap_ptr = ptr;
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 56805c39c906..bb516fdd195d 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -471,7 +471,7 @@ struct drm_gem_object *drm_gem_prime_import(struct drm_device *dev,
get_dma_buf(dma_buf);
sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
- if (IS_ERR_OR_NULL(sgt)) {
+ if (IS_ERR(sgt)) {
ret = PTR_ERR(sgt);
goto fail_detach;
}
diff --git a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
index 59827cc5e770..c786cd4f457b 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
@@ -224,7 +224,7 @@ struct drm_gem_object *exynos_dmabuf_prime_import(struct drm_device *drm_dev,
get_dma_buf(dma_buf);
sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
- if (IS_ERR_OR_NULL(sgt)) {
+ if (IS_ERR(sgt)) {
ret = PTR_ERR(sgt);
goto err_buf_detach;
}
diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
index 33d3871d1e13..880be0782dd9 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
@@ -719,7 +719,7 @@ static int vb2_dc_map_dmabuf(void *mem_priv)
/* get the associated scatterlist for this buffer */
sgt = dma_buf_map_attachment(buf->db_attach, buf->dma_dir);
- if (IS_ERR_OR_NULL(sgt)) {
+ if (IS_ERR(sgt)) {
pr_err("Error getting dmabuf scatterlist\n");
return -EINVAL;
}
--
1.8.5.1
Hi everyone,
I tried looking for IOMMU support on ARM64, but from what I can see
only swiotlb is currently supported.
Based on my understanding, for IOMMU support we need the DMA mapping
API to have an IOMMU ops field, similar to what is present on arm32.
I can see the iommu field added to dev_archdata in the patch mentioned
below (arch/arm64/include/asm/device.h), but there is no corresponding
ops field in arch/arm64/mm/dma-mapping.c?
I also saw a mail discussion between you on what the best place for
adding IOMMU support on ARM64 would be, but couldn't find any
follow-up patches.
Please tell us the current status of this work. Your feedback will be
greatly appreciated.
commit 73150c983ac1f9b7653cfd3823b1ad4a44aad3bf
Author: Will Deacon <will.deacon(a)arm.com>
Date: Mon Jun 10 19:34:42 2013 +0100
arm64: device: add iommu pointer to device archdata
When using an IOMMU for device mappings, it is necessary to keep a
pointer between the device and the IOMMU to which it is attached in
order to obtain the correct IOMMU when attaching the device to a domain.
This patch adds an iommu pointer to the dev_archdata structure, in a
similar manner to other architectures (ARM, PowerPC, x86, ...).
Signed-off-by: Will Deacon <will.deacon(a)arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com>
Thanks
Ritesh