Linaro-mm-sig September 2014

linaro-mm-sig@lists.linaro.org

16 participants
14 discussions

[PATCH v8 00/11] Add DRM for stih4xx platforms

by Benjamin Gaignard

This series of patches add the support of DRM/KMS drivers for STMicroelectronics chipsets stih416 and stih407. version 8: - Allow driver to be build as kernel module. - Use Universal Plane instead of legacy code. patches could be found here: git://git.linaro.org/people/benjamin.gaignard/kernel.git on branch: drm_kms_for_next-v8 version 7: - rebased on top of drm-next (2014-07-24) patches could be found here: git://git.linaro.org/people/benjamin.gaignard/kernel.git on branch: drm_kms_for_next-v7 version 6: - rebased on top of new component framework API. patches could be found here: git://git.linaro.org/people/benjamin.gaignard/kernel.git on branch: drm_kms_for_next-v6 version 5: - Rework sti_drm_drv probes functions to support deferred probe. This allow hdmi to delayed framebuffer creation until I2C is available. - Add ops functions in sti_layer structure to simpify GDP and VID code. - Add NOTES file to describe how the DRM concepts are mapped on hardware. patches could be found here: git://git.linaro.org/people/benjamin.gaignard/kernel.git on branch: drm_kms_for_next-v5 version 4: - Remove depency between TVout it subdevices HDMI and HDA - Rework and simplify VTG and VTAC code - Fix numbers of typo and indentation - Remove debug (will be push in separate patches) - Fix remarks done in previous patcheset patches could be found here: git://git.linaro.org/people/benjamin.gaignard/kernel.git on branch: drm_kms_for_next-v4 version 3: - Correctly split code between probe and bind funtions - Squash some commits - remove HQ-VDP device code to have a smaller patcheset, we will introduce it later. patches could be found here: git://git.linaro.org/people/benjamin.gaignard/kernel.git on branch: drm_kms_for_next-v3 version 2: - Use componentized device instead of register sub-devices in master driver probe function - Fix Makefile and Kconfig to only allow built-in compilation patches could be found here: git://git.linaro.org/people/benjamin.gaignard/kernel.git on branch: drm_kms_for_next-v2 version 1: - First path submission Hardware is split in two main blocks: Compositor and TVout. Each of them includes specific hardware IPs and the display timing are controlled by a specific Video Timing Generator hardware IP (VTG). Compositor is made of the follow hardware IPs: - GDP (Generic Display Pipeline) which is an entry point for graphic (RGB) buffers - VDP (Video Diplay Pipeline) which is an entry point for video (YUV) buffers - HQVDP (High Quality Video Display Processor) that supports scaling, deinterlacing and some miscellaneous image quality improvements. It fetches the Video decoded buffers from memory, processes them and pushes them to the Compositor through a HW dedicated bus. - Mixer is responsible of mixing all the entries depending of their respective z-order and layout TVout is divided in 3 parts: - HDMI to generate HDMI signals, depending of chipset version HDMI phy can change. - HDA to generate signals for HD analog TV - VIP to control/switch data path coming from Compositor On stih416 compositor and Tvout are on different dies so a Video Trafic Advance inter-die Communication mechanism (VTAC) is needed. +---------------------------------------------+ +----------------------------------------+ | +-------------------------------+ +----+ | | +----+ +--------------------------+ | | | | | | | | | | | +---------+ +----+ | | | | +----+ +------+ | | | | | | | | | VIP |---->|HDMI| | | | | |GPD +------------->| | | | | | | | | | | | +----+ | | | | +----+ |Mixer |--|-->| | | | | |---|->| switcher| | | | | | | | | | | | | | | | | +----+ | | | | | | | | | | | | | | | |---->|HDA | | | | | +------+ | |VTAC|========>|VTAC| | +---------+ +----+ | | | | | | | | | | | | | | | | Compositor | | | | | | | | TVout | | | +-------------------------------+ | | | | | | +--------------------------+ | | ^ | | | | | | ^ | | | | | | | | | | | | +--------------+ | | | | | | +-------------+ | | | VTG (master) |----->| | | | | |----->| VTG (slave) | | | +--------------+ +----+ | | +----+ +-------------+ | |Digital die | | Analog Die| +---------------------------------------------+ +----------------------------------------+ On stih407 Compositor and Tvout are on the same die +-----------------------------------------------------------------+ | +-------------------------------+ +--------------------------+ | | | | | +---------+ +----+ | | | | +----+ +------+ | | | VIP |---->|HDMI| | | | | |GPD +------------->| | | | | | +----+ | | | | +----+ |Mixer |--|--|->| switcher| | | | | +----+ +-----+ | | | | | | +----+ | | | | |VDP +-->+HQVDP+--->| | | | | |---->|HDA | | | | | +----+ +-----+ +------+ | | +---------+ +----+ | | | | | | | | | | Compositor | | TVout | | | +-------------------------------+ +--------------------------+ | | ^ ^ | | | | | | +--------------+ | | | VTG | | | +--------------+ | |Digital die | +-----------------------------------------------------------------+ In addition of the drivers for the IPs listed before a thin I2C driver (hdmiddc) is used by HDMI driver to retrieve EDID for monitor. To unify interfaces of GDP and VDP we create a "layer" interface called by compositor to control both GPD and VDP. Hardware have memory contraints (alignment, contiguous) so we use CMA drm helpers functions to allocate frame buffer. File naming convention is: - sti_* for IPs drivers - sti_drm_* for drm functions implementation. Benjamin Gaignard (11): drm: sti: add bindings for DRM driver drm: sti: add VTG driver drm: sti: add VTAC drivers drm: sti: add HDMI driver drm: sti: add HDA driver drm: sti: add TVOut driver drm: sti: add GDP layer drm: sti: add VID layer drm: sti: add Mixer drm: sti: add Compositor drm: sti: Add DRM driver itself .../devicetree/bindings/gpu/st,stih4xx.txt | 189 +++++ drivers/gpu/drm/Kconfig | 2 + drivers/gpu/drm/Makefile | 1 + drivers/gpu/drm/sti/Kconfig | 14 + drivers/gpu/drm/sti/Makefile | 21 + drivers/gpu/drm/sti/NOTES | 58 ++ drivers/gpu/drm/sti/sti_compositor.c | 279 +++++++ drivers/gpu/drm/sti/sti_compositor.h | 90 +++ drivers/gpu/drm/sti/sti_drm_crtc.c | 423 +++++++++++ drivers/gpu/drm/sti/sti_drm_crtc.h | 22 + drivers/gpu/drm/sti/sti_drm_drv.c | 241 ++++++ drivers/gpu/drm/sti/sti_drm_drv.h | 29 + drivers/gpu/drm/sti/sti_drm_plane.c | 195 +++++ drivers/gpu/drm/sti/sti_drm_plane.h | 18 + drivers/gpu/drm/sti/sti_gdp.c | 549 ++++++++++++++ drivers/gpu/drm/sti/sti_gdp.h | 16 + drivers/gpu/drm/sti/sti_hda.c | 794 ++++++++++++++++++++ drivers/gpu/drm/sti/sti_hdmi.c | 810 +++++++++++++++++++++ drivers/gpu/drm/sti/sti_hdmi.h | 88 +++ drivers/gpu/drm/sti/sti_hdmi_tx3g0c55phy.c | 336 +++++++++ drivers/gpu/drm/sti/sti_hdmi_tx3g0c55phy.h | 14 + drivers/gpu/drm/sti/sti_hdmi_tx3g4c28phy.c | 211 ++++++ drivers/gpu/drm/sti/sti_hdmi_tx3g4c28phy.h | 14 + drivers/gpu/drm/sti/sti_layer.c | 197 +++++ drivers/gpu/drm/sti/sti_layer.h | 123 ++++ drivers/gpu/drm/sti/sti_mixer.c | 249 +++++++ drivers/gpu/drm/sti/sti_mixer.h | 54 ++ drivers/gpu/drm/sti/sti_tvout.c | 648 +++++++++++++++++ drivers/gpu/drm/sti/sti_vid.c | 138 ++++ drivers/gpu/drm/sti/sti_vid.h | 12 + drivers/gpu/drm/sti/sti_vtac.c | 215 ++++++ drivers/gpu/drm/sti/sti_vtg.c | 366 ++++++++++ drivers/gpu/drm/sti/sti_vtg.h | 28 + 33 files changed, 6444 insertions(+) create mode 100644 Documentation/devicetree/bindings/gpu/st,stih4xx.txt create mode 100644 drivers/gpu/drm/sti/Kconfig create mode 100644 drivers/gpu/drm/sti/Makefile create mode 100644 drivers/gpu/drm/sti/NOTES create mode 100644 drivers/gpu/drm/sti/sti_compositor.c create mode 100644 drivers/gpu/drm/sti/sti_compositor.h create mode 100644 drivers/gpu/drm/sti/sti_drm_crtc.c create mode 100644 drivers/gpu/drm/sti/sti_drm_crtc.h create mode 100644 drivers/gpu/drm/sti/sti_drm_drv.c create mode 100644 drivers/gpu/drm/sti/sti_drm_drv.h create mode 100644 drivers/gpu/drm/sti/sti_drm_plane.c create mode 100644 drivers/gpu/drm/sti/sti_drm_plane.h create mode 100644 drivers/gpu/drm/sti/sti_gdp.c create mode 100644 drivers/gpu/drm/sti/sti_gdp.h create mode 100644 drivers/gpu/drm/sti/sti_hda.c create mode 100644 drivers/gpu/drm/sti/sti_hdmi.c create mode 100644 drivers/gpu/drm/sti/sti_hdmi.h create mode 100644 drivers/gpu/drm/sti/sti_hdmi_tx3g0c55phy.c create mode 100644 drivers/gpu/drm/sti/sti_hdmi_tx3g0c55phy.h create mode 100644 drivers/gpu/drm/sti/sti_hdmi_tx3g4c28phy.c create mode 100644 drivers/gpu/drm/sti/sti_hdmi_tx3g4c28phy.h create mode 100644 drivers/gpu/drm/sti/sti_layer.c create mode 100644 drivers/gpu/drm/sti/sti_layer.h create mode 100644 drivers/gpu/drm/sti/sti_mixer.c create mode 100644 drivers/gpu/drm/sti/sti_mixer.h create mode 100644 drivers/gpu/drm/sti/sti_tvout.c create mode 100644 drivers/gpu/drm/sti/sti_vid.c create mode 100644 drivers/gpu/drm/sti/sti_vid.h create mode 100644 drivers/gpu/drm/sti/sti_vtac.c create mode 100644 drivers/gpu/drm/sti/sti_vtg.c create mode 100644 drivers/gpu/drm/sti/sti_vtg.h -- 1.9.1

9 years, 2 months

[PATCH v2 00/18] Exynos SYSMMU (IOMMU) integration with DT and DMA-mapping subsystem

by Marek Szyprowski

Hello Everyone, This is yet another attempt to finally make Exynos SYSMMU driver fully integrated with DMA-mapping subsystem. Previous approach is available here: https://lkml.org/lkml/2014/8/5/183 I meantime, there have been a discussion about the way the iommu driver should be integrated with dma-mapping subsystem, which resulted in "[RFC PATCH v3 0/7] Introduce automatic DMA configuration for IOMMU masters" patches prepared by Will Deacon: http://www.spinics.net/lists/arm-kernel/msg362076.html Those patches removed the need to use bus-specific notifiers for initialization. Main changes since previous version of my patches: 1. rebased onto "[RFC PATCH v3 0/7] Introduce automatic DMA configuration for IOMMU masters" patches, changed initialization from bus notifiers to DT related callbacks 2. removed support for separate IO address spaces - this will be discussed separately after the basic support gets merged 3. removed support for power domain notifier-based runtime power management - this also will be discussed separately later I hope that the driver with above changes will be easier to be merged to v3.18. Best regards Marek Szyprowski Samsung R&D Institute Poland Patch summary: Marek Szyprowski (18): arm: dma-mapping: arm_iommu_attach_device: automatically set max_seg_size arm: exynos: bind power domains earlier, on device creation drm: exynos: detach from default dma-mapping domain on init clk: exynos: add missing smmu_g2d clock and update comments ARM: DTS: Exynos4: add System MMU nodes iommu: exynos: don't read version register on every tlb operation iommu: exynos: remove unused functions iommu: exynos: remove useless spinlock iommu: exynos: refactor function parameters to simplify code iommu: exynos: remove unused functions, part 2 iommu: exynos: remove useless device_add/remove callbacks iommu: exynos: add support for binding more than one sysmmu to master device iommu: exynos: add support for runtime_pm iommu: exynos: rename variables to reflect their purpose iommu: exynos: document internal structures iommu: exynos: remove excessive includes and sort others alphabetically iommu: exynos: init from dt-specific callback instead of initcall iommu: exynos: add callback for initializing devices from device tree arch/arm/boot/dts/exynos4.dtsi | 117 +++++++ arch/arm/boot/dts/exynos4210.dtsi | 23 ++ arch/arm/boot/dts/exynos4x12.dtsi | 82 +++++ arch/arm/mach-exynos/pm_domains.c | 12 +- arch/arm/mm/dma-mapping.c | 16 + drivers/clk/samsung/clk-exynos4.c | 1 + drivers/gpu/drm/exynos/exynos_drm_iommu.c | 3 + drivers/iommu/exynos-iommu.c | 494 ++++++++++++++---------------- include/dt-bindings/clock/exynos4.h | 10 +- 9 files changed, 483 insertions(+), 275 deletions(-) -- 1.9.2

10 years, 8 months

[PATCH v2 0/3] CMA & device tree, another approach

by Marek Szyprowski

Hello, This is another approach to finish support for reserved memory regions defined in device tree. Previous attempts (http://lists.linaro.org/pipermail/linaro-mm-sig/2014-February/003738.html and https://lkml.org/lkml/2014/7/14/108) ended in merging parts of the code and documentation. Merged patches allow to reserve memory, but there is still no reserved memory drivers nor any code that actually uses reserved memory regions. The final conclusion from the above mentioned threads is that there is no automated reserved memory initialization. All drivers that want to use reserved memory, should initialize it on their own. This patch series provides two driver for reserved memory regions (one based on CMA and one based on dma_coherent allocator). The main improvement comparing to the previous version is removal of automated reserved memory for every device and support for named memory regions. Those patches are for merging, rebased on top of recent linux-next tree. Best regards Marek Szyprowski Samsung R&D Institute Poland Changes since v1 (https://lkml.org/lkml/2014/8/26/339): - removed patches for named reserved regions - they will be discussed separately - added a check for 'no-map' property to dma coherent allocator (suggested by Laura Abbott) - removed example code for s5p-mfc driver Changes since '[PATCH v2 RESEND 0/4] CMA & device tree, once again' version: (https://lkml.org/lkml/2014/7/14/108) - added return error value to of_reserved_mem_device_init() - added support for named memory regions (so more than one region can be defined per device) - added usage example - converted custom reserved memory code used by s5p-mfc driver to the generic reserved memory handling code Patch summary: Marek Szyprowski (3): drivers: of: add return value to of_reserved_mem_device_init drivers: dma-coherent: add initialization from device tree drivers: dma-contiguous: add initialization from device tree drivers/base/dma-coherent.c | 145 ++++++++++++++++++++++++++++++++++------ drivers/base/dma-contiguous.c | 71 ++++++++++++++++++++ drivers/of/of_reserved_mem.c | 3 +- include/linux/cma.h | 3 + include/linux/of_reserved_mem.h | 9 ++- mm/cma.c | 62 ++++++++++++++--- 6 files changed, 259 insertions(+), 34 deletions(-) -- 1.9.2

10 years, 8 months

Install Ubuntu on Samsung Chromebook 2

by Viswanath Puttagunta

Hi All, Does any one have instructions to install Ubuntu natively on Samsung Chromebook2 that worked for you? Regards, Vish (Viswanath Puttagunta) Cell: 972-342-0205 Technical Program Manager Member Services, Linaro

10 years, 9 months

Changing PAGE_ALLOC_COSTLY_ORDER from 3 to 2

by Pintu Kumar

Hi, I wanted to know about the impact of changing PAGE_ALLOC_COSTLY_ORDER value from 3 to 2. This macro is defined in include/linux/mmzone.h #define PAGE_ALLOC_COSTLY_ORDER 3 As I know this value should never be changed irrespective of the type of the system. Is it good to change this value for RAM size: 512MB, 256MB or 128MB? If anybody have changed this value and experience any kind of problem or benefits please let us know. We noticed that for one of the Android product with 512MB RAM, the PAGE_ALLOC_COSTLY_ORDER was set to 2. We could not figure out why this value was decreased from 3 to 2. As per my analysis, I observed that kmalloc fails little early, if we change this value to 2. This is also visible from the _slowpath_ in page_alloc.c Apart from this we could not find any other impact. If anybody is aware of any other impact, please let us know. Thank you! Regards, Pintu Kumar

10 years, 9 months

Re: [Linaro-mm-sig] Question on UAPI for fences

by Daniel Vetter

On Sun, Sep 14, 2014 at 12:36:43PM +0200, Christian König wrote: > Yeah, right. Providing the fd to reassign to a fence would indeed reduce the > create/close overhead. > > But it would still be more overhead than for example a simple on demand > growing ring buffer which then uses 64bit sequence numbers in userspace to > refer to a fence in the kernel. > > Apart from that I'm pretty sure that when we do the syncing completely in > userspace we need more fences open at the same time than fds are available > by default. If you do the syncing completely in userspace you don't need kernel fences at all. Kernel fences are only required if you sync with a different process (where the pure userspace syncing might not work out) or with different devices. tbh I don't see any use-case at all where you'd need 10k such fences. That means your driver gets to deal with 2 kinds of fences, but so be it. Since not using fds for cross-device or cross-process syncing imo just doesn't make sense, so that one pretty much will have to stick. > As long as our internal handle or sequence based fence are easily > convertible to a fence fd I actually don't really see a problem with that. > Going to hack that approach into my prototype and then we can see how bad > the code looks after all. My plan for i915 is to start out with fd fences only, and once we have some clarity on the exact requirements probably add some pure userspace-controlled fences for tightly coupled stuff. Those might be fully internal to the opencl userspace driver though and never get out of there, ever. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

10 years, 9 months

Re: [Linaro-mm-sig] Question on UAPI for fences

by Jesse Barnes

On Fri, 12 Sep 2014 18:08:23 +0200 Christian König <christian.koenig(a)amd.com> wrote: > > As Daniel said using fd is most likely the way we want to do it but this > > remains vague. > Separating the discussion if it should be an fd or not. Using an fd > sounds fine to me in general, but I have some concerns as well. > > For example what was the maximum number of opened FDs per process again? > Could that become a problem? etc... You can check out the i915 patches I posted if you want to see examples. Max fds may be an issue if userspace doesn't clean up its fences. The implementation is pretty easy with the stuff Maarten has done recently. The changes I still need to make to mine: - sit on top of Chris's request/seqno changes (driver internals really) - switch over to execbuf as the main API on the render side (like you're doing) - add support for display and other timelines As far as compat goes, I don't think it should be too hard. Even with GPU scheduling, a given context's buffers should all be in-order with respect to one another, so we ought to be able to mix & match clients using explicit fencing and implicit fencing. Though in Mesa I still haven't looked at how to handle server vs client side arb_sync with the scheduler and explicit fencing in place; might need some extra work there... -- Jesse Barnes, Intel Open Source Technology Center

10 years, 9 months

Re: [Linaro-mm-sig] Question on UAPI for fences

by Jerome Glisse

On Fri, Sep 12, 2014 at 05:58:09PM +0200, Christian König wrote: > Am 12.09.2014 um 17:48 schrieb Jerome Glisse: > >On Fri, Sep 12, 2014 at 05:42:57PM +0200, Christian König wrote: > >>Am 12.09.2014 um 17:33 schrieb Jerome Glisse: > >>>On Fri, Sep 12, 2014 at 11:25:12AM -0400, Alex Deucher wrote: > >>>>On Fri, Sep 12, 2014 at 10:50 AM, Jerome Glisse <j.glisse(a)gmail.com> wrote: > >>>>>On Fri, Sep 12, 2014 at 04:43:44PM +0200, Daniel Vetter wrote: > >>>>>>On Fri, Sep 12, 2014 at 4:09 PM, Daniel Vetter <daniel(a)ffwll.ch> wrote: > >>>>>>>On Fri, Sep 12, 2014 at 03:23:22PM +0200, Christian König wrote: > >>>>>>>>Hello everyone, > >>>>>>>> > >>>>>>>>to allow concurrent buffer access by different engines beyond the multiple > >>>>>>>>readers/single writer model that we currently use in radeon and other > >>>>>>>>drivers we need some kind of synchonization object exposed to userspace. > >>>>>>>> > >>>>>>>>My initial patch set for this used (or rather abused) zero sized GEM buffers > >>>>>>>>as fence handles. This is obviously isn't the best way of doing this (to > >>>>>>>>much overhead, rather ugly etc...), Jerome commented on this accordingly. > >>>>>>>> > >>>>>>>>So what should a driver expose instead? Android sync points? Something else? > >>>>>>>I think actually exposing the struct fence objects as a fd, using android > >>>>>>>syncpts (or at least something compatible to it) is the way to go. Problem > >>>>>>>is that it's super-hard to get the android guys out of hiding for this :( > >>>>>>> > >>>>>>>Adding a bunch of people in the hopes that something sticks. > >>>>>>More people. > >>>>>Just to re-iterate, exposing such thing while still using command stream > >>>>>ioctl that use implicit synchronization is a waste and you can only get > >>>>>the lowest common denominator which is implicit synchronization. So i do > >>>>>not see the point of such api if you are not also adding a new cs ioctl > >>>>>with explicit contract that it does not do any kind of synchronization > >>>>>(it could be almost the exact same code modulo the do not wait for > >>>>>previous cmd to complete). > >>>>Our thinking was to allow explicit sync from a single process, but > >>>>implicitly sync between processes. > >>>This is a BIG NAK if you are using the same ioctl as it would mean you are > >>>changing userspace API, well at least userspace expectation. Adding a new > >>>cs flag might do the trick but it should not be about inter-process, or any > >>>thing special, it's just implicit sync or no synchronization. Converting > >>>userspace is not that much of a big deal either, it can be broken into > >>>several step. Like mesa use explicit synchronization all time but ddx use > >>>implicit. > >>The thinking here is that we need to be backward compatible for DRI2/3 and > >>support all kind of different use cases like old DDX and new Mesa, or old > >>Mesa and new DDX etc... > >> > >>So for my prototype if the kernel sees any access of a BO from two different > >>clients it falls back to the old behavior of implicit synchronization of > >>access to the same buffer object. That might not be the fastest approach, > >>but is as far as I can see conservative and so should work under all > >>conditions. > >> > >>Apart from that the planning so far was that we just hide this feature > >>behind a couple of command submission flags and new chunks. > >Just to reproduce IRC discussion, i think it's a lot simpler and not that > >complex. For explicit cs ioctl you do not wait for any previous fence of > >any of the buffer referenced in the cs ioctl, but you still associate a > >new fence with all the buffer object referenced in the cs ioctl. So if the > >next ioctl is an implicit sync ioctl it will wait properly and synchronize > >properly with previous explicit cs ioctl. Hence you can easily have a mix > >in userspace thing is you only get benefit once enough of your userspace > >is using explicit. > > Yes, that's exactly what my patches currently implement. > > The only difference is that by current planning I implemented it as a per BO > flag for the command submission, but that was just for testing. Having a > single flag to switch between implicit and explicit synchronization for > whole CS IOCTL would do equally well. Doing it per BO sounds bogus to me. But otherwise yes we are in agreement. As Daniel said using fd is most likely the way we want to do it but this remains vague. > > >Note that you still need a way to have explicit cs ioctl to wait on a > >previos "explicit" fence so you need some api to expose fence per cs > >submission. > > Exactly, that's what this mail thread is all about. > > As Daniel correctly noted you need something like a functionality to get a > fence as the result of a command submission as well as pass in a list of > fences to wait for before beginning a command submission. > > At least it looks like we are all on the same general line here, its just > nobody has a good idea how the details should look like. > > Regards, > Christian. > > > > >Cheers, > >Jérôme > > > >>Regards, > >>Christian. > >> > >>>Cheers, > >>>Jérôme > >>> > >>>>Alex > >>>> > >>>>>Also one thing that the Android sync point does not have, AFAICT, is a > >>>>>way to schedule synchronization as part of a cs ioctl so cpu never have > >>>>>to be involve for cmd stream that deal only one gpu (assuming the driver > >>>>>and hw can do such trick). > >>>>> > >>>>>Cheers, > >>>>>Jérôme > >>>>> > >>>>>>-Daniel > >>>>>>-- > >>>>>>Daniel Vetter > >>>>>>Software Engineer, Intel Corporation > >>>>>>+41 (0) 79 365 57 48 - http://blog.ffwll.ch > >>>>>_______________________________________________ > >>>>>dri-devel mailing list > >>>>>dri-devel(a)lists.freedesktop.org > >>>>>http://lists.freedesktop.org/mailman/listinfo/dri-devel >

10 years, 9 months

Re: [Linaro-mm-sig] Question on UAPI for fences

by Jerome Glisse

On Fri, Sep 12, 2014 at 05:42:57PM +0200, Christian König wrote: > Am 12.09.2014 um 17:33 schrieb Jerome Glisse: > >On Fri, Sep 12, 2014 at 11:25:12AM -0400, Alex Deucher wrote: > >>On Fri, Sep 12, 2014 at 10:50 AM, Jerome Glisse <j.glisse(a)gmail.com> wrote: > >>>On Fri, Sep 12, 2014 at 04:43:44PM +0200, Daniel Vetter wrote: > >>>>On Fri, Sep 12, 2014 at 4:09 PM, Daniel Vetter <daniel(a)ffwll.ch> wrote: > >>>>>On Fri, Sep 12, 2014 at 03:23:22PM +0200, Christian König wrote: > >>>>>>Hello everyone, > >>>>>> > >>>>>>to allow concurrent buffer access by different engines beyond the multiple > >>>>>>readers/single writer model that we currently use in radeon and other > >>>>>>drivers we need some kind of synchonization object exposed to userspace. > >>>>>> > >>>>>>My initial patch set for this used (or rather abused) zero sized GEM buffers > >>>>>>as fence handles. This is obviously isn't the best way of doing this (to > >>>>>>much overhead, rather ugly etc...), Jerome commented on this accordingly. > >>>>>> > >>>>>>So what should a driver expose instead? Android sync points? Something else? > >>>>>I think actually exposing the struct fence objects as a fd, using android > >>>>>syncpts (or at least something compatible to it) is the way to go. Problem > >>>>>is that it's super-hard to get the android guys out of hiding for this :( > >>>>> > >>>>>Adding a bunch of people in the hopes that something sticks. > >>>>More people. > >>>Just to re-iterate, exposing such thing while still using command stream > >>>ioctl that use implicit synchronization is a waste and you can only get > >>>the lowest common denominator which is implicit synchronization. So i do > >>>not see the point of such api if you are not also adding a new cs ioctl > >>>with explicit contract that it does not do any kind of synchronization > >>>(it could be almost the exact same code modulo the do not wait for > >>>previous cmd to complete). > >>Our thinking was to allow explicit sync from a single process, but > >>implicitly sync between processes. > >This is a BIG NAK if you are using the same ioctl as it would mean you are > >changing userspace API, well at least userspace expectation. Adding a new > >cs flag might do the trick but it should not be about inter-process, or any > >thing special, it's just implicit sync or no synchronization. Converting > >userspace is not that much of a big deal either, it can be broken into > >several step. Like mesa use explicit synchronization all time but ddx use > >implicit. > > The thinking here is that we need to be backward compatible for DRI2/3 and > support all kind of different use cases like old DDX and new Mesa, or old > Mesa and new DDX etc... > > So for my prototype if the kernel sees any access of a BO from two different > clients it falls back to the old behavior of implicit synchronization of > access to the same buffer object. That might not be the fastest approach, > but is as far as I can see conservative and so should work under all > conditions. > > Apart from that the planning so far was that we just hide this feature > behind a couple of command submission flags and new chunks. Just to reproduce IRC discussion, i think it's a lot simpler and not that complex. For explicit cs ioctl you do not wait for any previous fence of any of the buffer referenced in the cs ioctl, but you still associate a new fence with all the buffer object referenced in the cs ioctl. So if the next ioctl is an implicit sync ioctl it will wait properly and synchronize properly with previous explicit cs ioctl. Hence you can easily have a mix in userspace thing is you only get benefit once enough of your userspace is using explicit. Note that you still need a way to have explicit cs ioctl to wait on a previos "explicit" fence so you need some api to expose fence per cs submission. Cheers, Jérôme > > Regards, > Christian. > > > > >Cheers, > >Jérôme > > > >>Alex > >> > >>>Also one thing that the Android sync point does not have, AFAICT, is a > >>>way to schedule synchronization as part of a cs ioctl so cpu never have > >>>to be involve for cmd stream that deal only one gpu (assuming the driver > >>>and hw can do such trick). > >>> > >>>Cheers, > >>>Jérôme > >>> > >>>>-Daniel > >>>>-- > >>>>Daniel Vetter > >>>>Software Engineer, Intel Corporation > >>>>+41 (0) 79 365 57 48 - http://blog.ffwll.ch > >>>_______________________________________________ > >>>dri-devel mailing list > >>>dri-devel(a)lists.freedesktop.org > >>>http://lists.freedesktop.org/mailman/listinfo/dri-devel >

10 years, 9 months

Re: [Linaro-mm-sig] Question on UAPI for fences

by Daniel Vetter

On Fri, Sep 12, 2014 at 03:23:22PM +0200, Christian König wrote: > Hello everyone, > > to allow concurrent buffer access by different engines beyond the multiple > readers/single writer model that we currently use in radeon and other > drivers we need some kind of synchonization object exposed to userspace. > > My initial patch set for this used (or rather abused) zero sized GEM buffers > as fence handles. This is obviously isn't the best way of doing this (to > much overhead, rather ugly etc...), Jerome commented on this accordingly. > > So what should a driver expose instead? Android sync points? Something else? I think actually exposing the struct fence objects as a fd, using android syncpts (or at least something compatible to it) is the way to go. Problem is that it's super-hard to get the android guys out of hiding for this :( Adding a bunch of people in the hopes that something sticks. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch

10 years, 9 months

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Linaro-mm-sig September 2014