This adds a DRM driver that implements communication between the CPU and an APU. The driver targets embedded devices that usually run inference using prebuilt models. The goal is to provide a common infrastructure that could be reused to support many accelerators. The kernel, userspace and firmware all try to use standard, existing components to leverage the development and maintenance effort. The series implements two platform drivers, one for simulation and another for the mt8183 (compatible with the mt8365).
For those interested in the firmware or the userspace library, the sources are available here: https://gitlab.baylibre.com/baylibre/libapu/libapu
APU support still has to be upstreamed to libdrm. Until this is done, the sources can be found here: https://gitlab.baylibre.com/baylibre/libapu/libdrm/-/tree/abailon/main
The driver for mt8183 depends on this series (which is currently blocked): https://patchwork.kernel.org/project/linux-arm-kernel/list/?series=620429
Alexandre Bailon (5): drm: Add support of AI Processor Unit (APU) drm/apu: Add memory allocator drm/apu: Add support of requests drm/apu: Add support of IOMMU dt-bindings: Add bindings for mtk,apu-drm
Julien Stephan (2): drm/apu: allow platform driver to implement their own mmap function drm/apu: Add support for a simulated APU
.../devicetree/bindings/gpu/mtk,apu-drm.yaml | 38 ++ drivers/gpu/drm/Kconfig | 2 + drivers/gpu/drm/Makefile | 1 + drivers/gpu/drm/apu/Kconfig | 22 + drivers/gpu/drm/apu/Makefile | 10 + drivers/gpu/drm/apu/apu_drv.c | 282 +++++++++ drivers/gpu/drm/apu/apu_gem.c | 230 +++++++ drivers/gpu/drm/apu/apu_internal.h | 205 ++++++ drivers/gpu/drm/apu/apu_sched.c | 592 ++++++++++++++++++ drivers/gpu/drm/apu/simu_apu.c | 313 +++++++++ include/uapi/drm/apu_drm.h | 81 +++ 11 files changed, 1776 insertions(+) create mode 100644 Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml create mode 100644 drivers/gpu/drm/apu/Kconfig create mode 100644 drivers/gpu/drm/apu/Makefile create mode 100644 drivers/gpu/drm/apu/apu_drv.c create mode 100644 drivers/gpu/drm/apu/apu_gem.c create mode 100644 drivers/gpu/drm/apu/apu_internal.h create mode 100644 drivers/gpu/drm/apu/apu_sched.c create mode 100644 drivers/gpu/drm/apu/simu_apu.c create mode 100644 include/uapi/drm/apu_drm.h
Many AI Processor Units (APUs) have a similar architecture. This driver intends to help support them. It relies on DRM and provides some abstractions useful for AI accelerators. Currently, it provides the infrastructure to allocate an APU device and register one or more cores. The driver takes care of registering itself with DRM.
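For illustration, a platform driver's probe function could use this infrastructure roughly as follows. This is only a sketch; the myapu names are hypothetical and not part of this patch, only the apu_*() calls are:

#include <linux/module.h>
#include <linux/platform_device.h>

#include "apu_internal.h"

static int myapu_is_ready(struct apu_core *core)
{
        /* Platform specific readiness check; always ready in this sketch. */
        return 1;
}

static struct apu_core_ops myapu_ops = {
        .is_ready = myapu_is_ready,
};

static int myapu_probe(struct platform_device *pdev)
{
        struct apu_drm *apu;
        struct apu_core *core;
        int ret;

        apu = apu_dev_alloc(&pdev->dev);
        if (!apu)
                return -ENOMEM;

        /* One core; priv lets the platform driver find the core again. */
        core = apu_core_alloc(apu, &myapu_ops, pdev);
        if (!core)
                return -ENOMEM;

        ret = apu_core_register(&pdev->dev, core, NULL);
        if (ret)
                return ret;

        return apu_dev_register(apu);
}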
Signed-off-by: Alexandre Bailon abailon@baylibre.com Reviewed-by: Julien Stephan jstephan@baylibre.com --- drivers/gpu/drm/Kconfig | 2 + drivers/gpu/drm/Makefile | 1 + drivers/gpu/drm/apu/Kconfig | 12 ++ drivers/gpu/drm/apu/Makefile | 5 + drivers/gpu/drm/apu/apu_drv.c | 272 +++++++++++++++++++++++++++++ drivers/gpu/drm/apu/apu_internal.h | 68 ++++++++ include/uapi/drm/apu_drm.h | 28 +++ 7 files changed, 388 insertions(+) create mode 100644 drivers/gpu/drm/apu/Kconfig create mode 100644 drivers/gpu/drm/apu/Makefile create mode 100644 drivers/gpu/drm/apu/apu_drv.c create mode 100644 drivers/gpu/drm/apu/apu_internal.h create mode 100644 include/uapi/drm/apu_drm.h
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index ba3fb04bb691..32ffa66a8b54 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -371,6 +371,8 @@ source "drivers/gpu/drm/solomon/Kconfig"
source "drivers/gpu/drm/sprd/Kconfig"
+source "drivers/gpu/drm/apu/Kconfig" + config DRM_HYPERV tristate "DRM Support for Hyper-V synthetic video device" depends on DRM && PCI && MMU && HYPERV diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile index a33257d2bc7f..7cd8c0f3936a 100644 --- a/drivers/gpu/drm/Makefile +++ b/drivers/gpu/drm/Makefile @@ -191,6 +191,7 @@ obj-$(CONFIG_DRM_MCDE) += mcde/ obj-$(CONFIG_DRM_TIDSS) += tidss/ obj-y += xlnx/ obj-y += gud/ +obj-$(CONFIG_DRM_APU) += apu/ obj-$(CONFIG_DRM_HYPERV) += hyperv/ obj-y += solomon/ obj-$(CONFIG_DRM_SPRD) += sprd/ diff --git a/drivers/gpu/drm/apu/Kconfig b/drivers/gpu/drm/apu/Kconfig new file mode 100644 index 000000000000..226dcf072115 --- /dev/null +++ b/drivers/gpu/drm/apu/Kconfig @@ -0,0 +1,12 @@ +# SPDX-License-Identifier: GPL-2.0-only +# + +config DRM_APU + tristate "APU (AI Processor Unit)" + select DRM_GEM_DMA_HELPER + select DRM_KMS_HELPER + help + This provides a DRM driver that provides some facilities to + communicate with an AI Processor Unit (APU). + The driver intends to provide a common infrastructure that may be + used to support many different APU. diff --git a/drivers/gpu/drm/apu/Makefile b/drivers/gpu/drm/apu/Makefile new file mode 100644 index 000000000000..ad85b88a8b52 --- /dev/null +++ b/drivers/gpu/drm/apu/Makefile @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: GPL-2.0 + +drm_apu-y += apu_drv.o + +obj-$(CONFIG_DRM_APU) += drm_apu.o diff --git a/drivers/gpu/drm/apu/apu_drv.c b/drivers/gpu/drm/apu/apu_drv.c new file mode 100644 index 000000000000..b420b13a9ffd --- /dev/null +++ b/drivers/gpu/drm/apu/apu_drv.c @@ -0,0 +1,272 @@ +// SPDX-License-Identifier: GPL-2.0 +// +// Copyright 2020 BayLibre SAS + +#include <linux/list.h> +#include <linux/module.h> + +#include <drm/apu_drm.h> +#include <drm/drm_drv.h> +#include <drm/drm_gem_dma_helper.h> +#include <drm/drm_probe_helper.h> + +#include "apu_internal.h" + +static LIST_HEAD(apu_devices); + +static int ioctl_apu_state(struct drm_device *dev, void *data, + struct drm_file *file_priv); + +static const struct drm_ioctl_desc ioctls[] = { + DRM_IOCTL_DEF_DRV(APU_STATE, ioctl_apu_state, + DRM_RENDER_ALLOW), +}; + +DEFINE_DRM_GEM_DMA_FOPS(apu_drm_ops); + +static struct drm_driver apu_drm_driver = { + .driver_features = DRIVER_GEM | DRIVER_SYNCOBJ, + .name = "drm_apu", + .desc = "APU DRM driver", + .date = "20210319", + .major = 1, + .minor = 0, + .patchlevel = 0, + .ioctls = ioctls, + .num_ioctls = ARRAY_SIZE(ioctls), + .fops = &apu_drm_ops, + DRM_GEM_DMA_DRIVER_OPS_WITH_DUMB_CREATE(drm_gem_dma_dumb_create), +}; + +/** + * apu_dev_alloc() - Allocate a new APU device + * + * @dev: Pointer to the device instance. + + * This allocate an APU device. + * The APU describe a hardware accelerator that may have one or more + * core (or unit). + * + * Returns: A pointer or NULL in case of failure. + */ +struct apu_drm *apu_dev_alloc(struct device *dev) +{ + struct drm_device *drm; + struct apu_drm *apu; + + apu = devm_drm_dev_alloc(dev, &apu_drm_driver, typeof(*apu), base); + if (IS_ERR(apu)) + return NULL; + INIT_LIST_HEAD(&apu->cores); + + apu->dev = dev; + ida_init(&apu->ida); + drm = &apu->base; + drm->dev_private = apu; + + dev_set_drvdata(dev, drm); + + return apu; +} +EXPORT_SYMBOL_GPL(apu_dev_alloc); + +/** + * apu_dev_register() - Register the APU to DRM + * + * @apu: Pointer to APU device + * + * Register an APU device to DRM. + * On success, this creates everything required to use the APU. 
+ * Note that at this step, the cores (or units) have not been + * registered so we can't yet perform any operations. + * + * Returns: Zero on success, non-zero value on failure. + */ +int apu_dev_register(struct apu_drm *apu) +{ + struct drm_device *drm = &apu->base; + int ret; + + ret = drm_dev_register(drm, 0); + if (ret) + return ret; + + list_add(&apu->node, &apu_devices); + + return 0; +} +EXPORT_SYMBOL_GPL(apu_dev_register); + +/** + * apu_dev_unregister() - Unregister the APU + * + * @apu: Pointer to APU device + * + * This undoes what has been done by apu_dev_register(). + */ +void apu_dev_unregister(struct apu_drm *apu) +{ + struct drm_device *drm = &apu->base; + + list_del(&apu->node); + drm_dev_unregister(drm); +} +EXPORT_SYMBOL_GPL(apu_dev_unregister); + +/** + * apu_core_alloc() - Allocate an APU core + * + * @apu: Pointer to APU device + * @ops: The operation callbacks to use for this core + * @priv: Private data associated with this core + * + * Allocate an APU core. This represents a computing unit that can + * execute a job. The APU may be composed of different units that don't + * accept the same kind of jobs, so we may have to use different + * callbacks for each core. + * + * Returns: A pointer or NULL in case of failure. + */ +struct apu_core *apu_core_alloc(struct apu_drm *apu, struct apu_core_ops *ops, + void *priv) +{ + struct apu_core *core; + + if (!ops || !ops->is_ready) + return NULL; + + core = devm_kzalloc(apu->dev, sizeof(*core), GFP_KERNEL); + if (!core) + return NULL; + + core->device_id = ida_alloc(&apu->ida, GFP_KERNEL); + if (core->device_id < 0) + return NULL; + + core->apu = apu; + core->priv = priv; + core->ops = ops; + + list_add(&core->node, &apu->cores); + + return core; +} +EXPORT_SYMBOL_GPL(apu_core_alloc); + +/** + * apu_core_free() - Free an APU core allocated using apu_core_alloc() + * + * @core: The APU core to release + */ +void apu_core_free(struct apu_core *core) +{ + ida_free(&core->apu->ida, core->device_id); + list_del(&core->node); +} +EXPORT_SYMBOL_GPL(apu_core_free); + +/** + * apu_core_register() - Register a core with the APU device + * + * @dev: Pointer to APU device + * @core: Pointer to APU core to register + * @priv: Private data attached to this core + * + * Register an APU core and make it available for computing. + * On success, userspace can start using this core. + * + * Returns: Zero on success, non-zero value on failure. + */ +int apu_core_register(struct device *dev, struct apu_core *core, void *priv) +{ + int ret; + + core->dev_priv = priv; + core->dev = dev; + + if (core->ops->register_core) { + ret = core->ops->register_core(core); + if (ret) + return ret; + } + + return 0; +} +EXPORT_SYMBOL_GPL(apu_core_register); + +/** + * apu_core_remove() - Remove a core from the APU device + * + * @core: Pointer to APU core to remove + */ +void apu_core_remove(struct apu_core *core) +{ + core->dev_priv = NULL; +} +EXPORT_SYMBOL_GPL(apu_core_remove); + +/** + * apu_find_core_by_priv() - Find a core allocated by apu_core_alloc() + * + * @priv: The pointer used to allocate the core + * + * All cores allocated using apu_core_alloc() are registered to a list. + * This goes through the list to find the core using the @priv field. + * + * Returns: A pointer or NULL if no core has been found.
+ */ +struct apu_core *apu_find_core_by_priv(void *priv) +{ + struct apu_drm *apu; + struct apu_core *core; + + list_for_each_entry(apu, &apu_devices, node) { + list_for_each_entry(core, &apu->cores, node) { + if (core->priv == priv) + return core; + } + } + + return NULL; +} +EXPORT_SYMBOL_GPL(apu_find_core_by_priv); + +static struct apu_core *get_apu_core(struct apu_drm *apu, int device_id) +{ + struct apu_core *core; + + list_for_each_entry(core, &apu->cores, node) { + if (core->device_id == device_id) + return core; + } + + return NULL; +} + +static void apu_core_update_state(struct apu_core *core) +{ + if (!core->ops->is_ready(core)) + core->flags &= ~APU_ONLINE; +} + +static int ioctl_apu_state(struct drm_device *dev, void *data, + struct drm_file *file_priv) +{ + struct apu_drm *apu = dev->dev_private; + struct drm_apu_state *args = data; + struct apu_core *core; + + args->flags = 0; + + core = get_apu_core(apu, args->device); + if (!core) + return -ENODEV; + + apu_core_update_state(core); + args->flags |= core->flags; + + return 0; +} + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Alexandre Bailon"); diff --git a/drivers/gpu/drm/apu/apu_internal.h b/drivers/gpu/drm/apu/apu_internal.h new file mode 100644 index 000000000000..58d93a16c68f --- /dev/null +++ b/drivers/gpu/drm/apu/apu_internal.h @@ -0,0 +1,68 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __APU_INTERNAL_H__ +#define __APU_INTERNAL_H__ + +#include <drm/drm_drv.h> + +struct apu_core { + int device_id; + struct device *dev; + struct apu_core_ops *ops; + struct apu_drm *apu; + + struct list_head node; + void *priv; + void *dev_priv; + + u32 flags; +}; + +struct apu_drm { + struct drm_device base; + struct device *dev; + + struct list_head cores; + struct list_head node; + + struct ida ida; +}; + +/** + * @apu_core_ops: Provides platform specific callbacks + */ +struct apu_core_ops { + /** + * @register_core: + * + * Optional. Platform specific APU core registration. + */ + int (*register_core)(struct apu_core *core); + + /** + * @is_ready: + * + * Implements platform specific code to test if APU is ready to receive + * commands. + * Basically, an APU core may be running but not be ready to handle + * commands. This allows checking if APU is ready and start executing + * requests. + * + * Returns: + * + * One if the APU is ready or zero. 
+ */ + int (*is_ready)(struct apu_core *core); +}; + +struct apu_drm *apu_dev_alloc(struct device *dev); +int apu_dev_register(struct apu_drm *apu); +void apu_dev_unregister(struct apu_drm *apu); + +struct apu_core *apu_core_alloc(struct apu_drm *apu, struct apu_core_ops *ops, + void *priv); +void apu_core_free(struct apu_core *core); +int apu_core_register(struct device *dev, struct apu_core *core, void *priv); +void apu_core_remove(struct apu_core *core); +struct apu_core *apu_find_core_by_priv(void *priv); + +#endif /* __APU_INTERNAL_H__ */ diff --git a/include/uapi/drm/apu_drm.h b/include/uapi/drm/apu_drm.h new file mode 100644 index 000000000000..d50c63d1b813 --- /dev/null +++ b/include/uapi/drm/apu_drm.h @@ -0,0 +1,28 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ + +#ifndef __UAPI_APU_DRM_H__ +#define __UAPI_APU_DRM_H__ + +#include "drm.h" + +#if defined(__cplusplus) +extern "C" { +#endif + +#define APU_ONLINE (1 << 0) + +struct drm_apu_state { + __u32 device; + __u32 flags; +}; + +#define DRM_APU_STATE 0x00 +#define DRM_APU_NUM_IOCTLS 0x01 + +#define DRM_IOCTL_APU_STATE DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_STATE, struct drm_apu_state) + +#if defined(__cplusplus) +} +#endif + +#endif /* __UAPI_APU_DRM_H__ */
This adds a new ioctl to allocate GEM objects.
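From userspace, the new ioctl can be used roughly as follows (a sketch, assuming a render node already opened as fd; not part of the patch):

#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

#include <drm/apu_drm.h>

/* Allocate a 4 KiB buffer and map it into the process. */
static void *alloc_apu_buffer(int fd, uint32_t *handle)
{
        struct drm_apu_gem_new req = { .size = 4096, .flags = 0 };

        if (ioctl(fd, DRM_IOCTL_APU_GEM_NEW, &req))
                return NULL;

        *handle = req.handle;
        /* req.offset is the fake mmap offset filled in by the driver. */
        return mmap(NULL, req.size, PROT_READ | PROT_WRITE, MAP_SHARED,
                    fd, req.offset);
}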
Signed-off-by: Alexandre Bailon abailon@baylibre.com Reviewed-by: Julien Stephan jstephan@baylibre.com --- drivers/gpu/drm/apu/Makefile | 1 + drivers/gpu/drm/apu/apu_drv.c | 2 ++ drivers/gpu/drm/apu/apu_gem.c | 56 ++++++++++++++++++++++++++++++ drivers/gpu/drm/apu/apu_internal.h | 30 ++++++++++++++++ include/uapi/drm/apu_drm.h | 16 ++++++++- 5 files changed, 104 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/apu/apu_gem.c
diff --git a/drivers/gpu/drm/apu/Makefile b/drivers/gpu/drm/apu/Makefile index ad85b88a8b52..91894250d4c1 100644 --- a/drivers/gpu/drm/apu/Makefile +++ b/drivers/gpu/drm/apu/Makefile @@ -1,5 +1,6 @@ # SPDX-License-Identifier: GPL-2.0
drm_apu-y += apu_drv.o +drm_apu-y += apu_gem.o
obj-$(CONFIG_DRM_APU) += drm_apu.o diff --git a/drivers/gpu/drm/apu/apu_drv.c b/drivers/gpu/drm/apu/apu_drv.c index b420b13a9ffd..323e267b0f53 100644 --- a/drivers/gpu/drm/apu/apu_drv.c +++ b/drivers/gpu/drm/apu/apu_drv.c @@ -20,6 +20,8 @@ static int ioctl_apu_state(struct drm_device *dev, void *data, static const struct drm_ioctl_desc ioctls[] = { DRM_IOCTL_DEF_DRV(APU_STATE, ioctl_apu_state, DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(APU_GEM_NEW, ioctl_gem_new, + DRM_RENDER_ALLOW), };
DEFINE_DRM_GEM_DMA_FOPS(apu_drm_ops); diff --git a/drivers/gpu/drm/apu/apu_gem.c b/drivers/gpu/drm/apu/apu_gem.c new file mode 100644 index 000000000000..0e7b3b27942c --- /dev/null +++ b/drivers/gpu/drm/apu/apu_gem.c @@ -0,0 +1,56 @@ +// SPDX-License-Identifier: GPL-2.0 +// +// Copyright 2020 BayLibre SAS + +#include <drm/drm_gem_dma_helper.h> + +#include <uapi/drm/apu_drm.h> + +#include "apu_internal.h" + +struct drm_gem_object *apu_gem_create_object(struct drm_device *dev, + size_t size) +{ + struct drm_gem_dma_object *dma_obj; + + dma_obj = drm_gem_dma_create(dev, size); + if (IS_ERR(dma_obj)) + return NULL; + + return &dma_obj->base; +} + +int ioctl_gem_new(struct drm_device *dev, void *data, + struct drm_file *file_priv) +{ + struct drm_apu_gem_new *args = data; + struct drm_gem_dma_object *dma_obj; + struct apu_gem_object *apu_obj; + struct drm_gem_object *gem_obj; + int ret; + + dma_obj = drm_gem_dma_create(dev, args->size); + if (IS_ERR(dma_obj)) + return PTR_ERR(dma_obj); + + gem_obj = &dma_obj->base; + apu_obj = to_apu_bo(gem_obj); + + /* + * Save the size of the buffer expected by the application instead + * of the aligned one. + */ + apu_obj->size = args->size; + apu_obj->offset = 0; + mutex_init(&apu_obj->mutex); + + ret = drm_gem_handle_create(file_priv, gem_obj, &args->handle); + /* Drop the creation reference; on success the handle holds one. */ + drm_gem_object_put(gem_obj); + if (ret) + return ret; + args->offset = drm_vma_node_offset_addr(&gem_obj->vma_node); + + return 0; +} diff --git a/drivers/gpu/drm/apu/apu_internal.h b/drivers/gpu/drm/apu/apu_internal.h index 58d93a16c68f..203aadc58b72 100644 --- a/drivers/gpu/drm/apu/apu_internal.h +++ b/drivers/gpu/drm/apu/apu_internal.h @@ -3,6 +3,14 @@ #define __APU_INTERNAL_H__
#include <drm/drm_drv.h> +#include <drm/drm_gem_dma_helper.h> + +struct apu_gem_object { + struct drm_gem_dma_object base; + struct mutex mutex; + size_t size; + u32 offset; +};
struct apu_core { int device_id; @@ -54,6 +62,17 @@ struct apu_core_ops { int (*is_ready)(struct apu_core *core); };
+static inline struct apu_gem_object *to_apu_bo(struct drm_gem_object *obj) +{ + return container_of(to_drm_gem_dma_obj(obj), struct apu_gem_object, + base); +} + +static inline void *apu_drm_priv(struct apu_core *apu_core) +{ + return apu_core->dev_priv; +} + struct apu_drm *apu_dev_alloc(struct device *dev); int apu_dev_register(struct apu_drm *apu); void apu_dev_unregister(struct apu_drm *apu); @@ -65,4 +84,15 @@ int apu_core_register(struct device *dev, struct apu_core *core, void *priv); void apu_core_remove(struct apu_core *core); struct apu_core *apu_find_core_by_priv(void *priv);
+struct apu_gem_object *to_apu_bo(struct drm_gem_object *obj); +struct drm_gem_object *apu_gem_create_object(struct drm_device *dev, + size_t size); + +int ioctl_gem_new(struct drm_device *dev, void *data, + struct drm_file *file_priv); +int ioctl_gem_user_new(struct drm_device *dev, void *data, + struct drm_file *file_priv); +struct dma_buf *apu_gem_prime_export(struct drm_gem_object *gem, + int flags); + #endif /* __APU_INTERNAL_H__ */ diff --git a/include/uapi/drm/apu_drm.h b/include/uapi/drm/apu_drm.h index d50c63d1b813..14fc478f45dc 100644 --- a/include/uapi/drm/apu_drm.h +++ b/include/uapi/drm/apu_drm.h @@ -9,6 +9,18 @@ extern "C" { #endif
+/* + * Please note that modifications to all structs defined here are + * subject to backwards-compatibility constraints. + */ + +struct drm_apu_gem_new { + __u32 size; /* in */ + __u32 flags; /* in */ + __u32 handle; /* out */ + __u64 offset; /* out */ +}; + #define APU_ONLINE (1 << 0)
struct drm_apu_state { @@ -17,9 +29,11 @@ struct drm_apu_state { };
#define DRM_APU_STATE 0x00 -#define DRM_APU_NUM_IOCTLS 0x01 +#define DRM_APU_GEM_NEW 0x01 +#define DRM_APU_NUM_IOCTLS 0x02
#define DRM_IOCTL_APU_STATE DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_STATE, struct drm_apu_state) +#define DRM_IOCTL_APU_GEM_NEW DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_NEW, struct drm_apu_gem_new)
#if defined(__cplusplus) }
This updates the APU driver with two new ioctls to queue and dequeue requests. It uses the DRM scheduler to manage the requests. Request allocation and the send and receive operations are platform specific and must be implemented as callbacks.
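For illustration, a platform driver could implement the new callbacks along these lines. This is only a sketch: the myapu_*() helpers and the wire format are hypothetical, only the apu_core_ops fields and the apu_job members come from this patch (myapu_is_ready is the readiness callback from the earlier sketch):

/* Encode job->cmd and job->data_in in the device's wire format. */
static int myapu_alloc_prepare_request(struct apu_job *job)
{
        job->request_len = sizeof(u32) + job->size_in;
        job->request_data = kzalloc(job->request_len, GFP_KERNEL);
        if (!job->request_data)
                return -ENOMEM;

        *(u32 *)job->request_data = job->cmd;
        memcpy(job->request_data + sizeof(u32), job->data_in, job->size_in);

        return 0;
}

static int myapu_send_request(struct apu_job *job)
{
        /*
         * Hand job->request_data off to the device, e.g. through a
         * mailbox or an rpmsg endpoint. Always succeeds in this sketch.
         */
        return 0;
}

/* Decode a device response into the generic job fields. */
static int myapu_handle_request(struct apu_job *job, void *data, int len)
{
        memcpy(job->data_out, data, min_t(int, len, job->size_out));
        job->result = 0;

        return 0;
}

static struct apu_core_ops myapu_ops = {
        .alloc_prepare_request = myapu_alloc_prepare_request,
        .send_request = myapu_send_request,
        .handle_request = myapu_handle_request,
        .is_ready = myapu_is_ready,
};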
Signed-off-by: Alexandre Bailon abailon@baylibre.com Reviewed-by: Julien Stephan jstephan@baylibre.com --- drivers/gpu/drm/apu/Kconfig | 1 + drivers/gpu/drm/apu/Makefile | 1 + drivers/gpu/drm/apu/apu_drv.c | 52 +-- drivers/gpu/drm/apu/apu_internal.h | 93 ++++- drivers/gpu/drm/apu/apu_sched.c | 564 +++++++++++++++++++++++++++++ include/uapi/drm/apu_drm.h | 31 +- 6 files changed, 697 insertions(+), 45 deletions(-) create mode 100644 drivers/gpu/drm/apu/apu_sched.c
diff --git a/drivers/gpu/drm/apu/Kconfig b/drivers/gpu/drm/apu/Kconfig index 226dcf072115..a769df42091c 100644 --- a/drivers/gpu/drm/apu/Kconfig +++ b/drivers/gpu/drm/apu/Kconfig @@ -5,6 +5,7 @@ config DRM_APU tristate "APU (AI Processor Unit)" select DRM_GEM_DMA_HELPER select DRM_KMS_HELPER + select DRM_SCHED help This provides a DRM driver that provides some facilities to communicate with an AI Processor Unit (APU). diff --git a/drivers/gpu/drm/apu/Makefile b/drivers/gpu/drm/apu/Makefile index 91894250d4c1..fc8d6380fc38 100644 --- a/drivers/gpu/drm/apu/Makefile +++ b/drivers/gpu/drm/apu/Makefile @@ -2,5 +2,6 @@
drm_apu-y += apu_drv.o drm_apu-y += apu_gem.o +drm_apu-y += apu_sched.o
obj-$(CONFIG_DRM_APU) += drm_apu.o diff --git a/drivers/gpu/drm/apu/apu_drv.c b/drivers/gpu/drm/apu/apu_drv.c index 323e267b0f53..b6bd340b2bc8 100644 --- a/drivers/gpu/drm/apu/apu_drv.c +++ b/drivers/gpu/drm/apu/apu_drv.c @@ -14,14 +14,15 @@
static LIST_HEAD(apu_devices);
-static int ioctl_apu_state(struct drm_device *dev, void *data, - struct drm_file *file_priv); - static const struct drm_ioctl_desc ioctls[] = { DRM_IOCTL_DEF_DRV(APU_STATE, ioctl_apu_state, DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(APU_GEM_NEW, ioctl_gem_new, DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(APU_GEM_QUEUE, ioctl_gem_queue, + DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(APU_GEM_DEQUEUE, ioctl_gem_dequeue, + DRM_RENDER_ALLOW), };
DEFINE_DRM_GEM_DMA_FOPS(apu_drm_ops); @@ -134,7 +135,8 @@ struct apu_core *apu_core_alloc(struct apu_drm *apu, struct apu_core_ops *ops, { struct apu_core *core;
- if (!ops || !ops->is_ready) + if (!ops || !ops->alloc_prepare_request || !ops->send_request + || !ops->handle_request || !ops->is_ready) return NULL;
core = devm_kzalloc(apu->dev, sizeof(*core), GFP_KERNEL); @@ -148,6 +150,8 @@ struct apu_core *apu_core_alloc(struct apu_drm *apu, struct apu_core_ops *ops, core->apu = apu; core->priv = priv; core->ops = ops; + spin_lock_init(&core->ctx_lock); + INIT_LIST_HEAD(&core->requests);
list_add(&core->node, &apu->cores);
@@ -192,7 +196,7 @@ int apu_core_register(struct device *dev, struct apu_core *core, void *priv) return ret; }
- return 0; + return apu_drm_job_init(core); } EXPORT_SYMBOL_GPL(apu_core_register);
@@ -203,6 +207,7 @@ EXPORT_SYMBOL_GPL(apu_core_register); */ void apu_core_remove(struct apu_core *core) { + apu_sched_fini(core); core->dev_priv = NULL; } EXPORT_SYMBOL_GPL(apu_core_remove); @@ -233,42 +238,5 @@ struct apu_core *apu_find_core_by_priv(void *priv) } EXPORT_SYMBOL_GPL(apu_find_core_by_priv);
-static struct apu_core *get_apu_core(struct apu_drm *apu, int device_id) -{ - struct apu_core *core; - - list_for_each_entry(core, &apu->cores, node) { - if (core->device_id == device_id) - return core; - } - - return NULL; -} - -static void apu_core_update_state(struct apu_core *core) -{ - if (!core->ops->is_ready(core)) - core->flags &= ~APU_ONLINE; -} - -static int ioctl_apu_state(struct drm_device *dev, void *data, - struct drm_file *file_priv) -{ - struct apu_drm *apu = dev->dev_private; - struct drm_apu_state *args = data; - struct apu_core *core; - - args->flags = 0; - - core = get_apu_core(apu, args->device); - if (!core) - return -ENODEV; - - apu_core_update_state(core); - args->flags |= core->flags; - - return 0; -} - MODULE_LICENSE("GPL"); MODULE_AUTHOR("Alexandre Bailon"); diff --git a/drivers/gpu/drm/apu/apu_internal.h b/drivers/gpu/drm/apu/apu_internal.h index 203aadc58b72..021a3efdedf2 100644 --- a/drivers/gpu/drm/apu/apu_internal.h +++ b/drivers/gpu/drm/apu/apu_internal.h @@ -4,6 +4,7 @@
#include <drm/drm_drv.h> #include <drm/drm_gem_dma_helper.h> +#include <drm/gpu_scheduler.h>
struct apu_gem_object { struct drm_gem_dma_object base; @@ -12,16 +13,21 @@ struct apu_gem_object { u32 offset; };
+struct apu_sched; struct apu_core { int device_id; struct device *dev; struct apu_core_ops *ops; struct apu_drm *apu;
+ spinlock_t ctx_lock; + struct list_head requests; + struct list_head node; void *priv; void *dev_priv;
+ struct apu_sched *sched; u32 flags; };
@@ -35,6 +41,43 @@ struct apu_drm { struct ida ida; };
+struct apu_job { + struct drm_sched_job base; + + struct kref refcount; + + struct apu_core *core; + struct apu_drm *apu; + + /* Fence to be signaled by IRQ handler when the job is complete. */ + struct dma_fence *done_fence; + + __u32 cmd; + + /* Exclusive fences we have taken from the BOs to wait for */ + struct dma_fence **implicit_fences; + struct drm_gem_object **bos; + u32 bo_count; + + /* Fence to be signaled by drm-sched once its done with the job */ + struct dma_fence *render_done_fence; + + void *data_in; + uint16_t size_in; + void *data_out; + uint16_t size_out; + uint16_t result; + uint16_t id; + + struct list_head node; + struct drm_syncobj *sync_out; + + struct apu_event *event; + + void *request_data; + int request_len; +}; + /** * @apu_core_ops: Provides platform specific callbacks */ @@ -46,6 +89,40 @@ struct apu_core_ops { */ int (*register_core)(struct apu_core *core);
+ /** + * @alloc_prepare_request: + * + * Allocate and initialize the data to be sent to the APU. + * Basically, this must convert the job object to something + * that can be understood by the hardware accelerator. + * + * Returns: + * + * Zero on success, non-zero value on failure. + */ + int (*alloc_prepare_request)(struct apu_job *job); + + /** + * @send_request: + * + * Implements platform specific code to send the data previously + * allocated using @alloc_prepare_request. + * + * Returns: + * + * Zero on success, non-zero value on failure. + */ + int (*send_request)(struct apu_job *job); + + /** + * @handle_request: + * + * Implements platform specific code to handle an incoming request. + * This must decode the data encoded in the platform specific format + * and convert it to the generic format. + */ + int (*handle_request)(struct apu_job *job, void *data, int len); + /** * @is_ready: * @@ -68,9 +145,9 @@ static inline struct apu_gem_object *to_apu_bo(struct drm_gem_object *obj) base); }
-static inline void *apu_drm_priv(struct apu_core *apu_core) +static inline void *apu_drm_priv(struct apu_core *core) { - return apu_core->dev_priv; + return core->dev_priv; }
struct apu_drm *apu_dev_alloc(struct device *dev); @@ -94,5 +171,17 @@ int ioctl_gem_user_new(struct drm_device *dev, void *data, struct drm_file *file_priv); struct dma_buf *apu_gem_prime_export(struct drm_gem_object *gem, int flags); +int ioctl_gem_queue(struct drm_device *dev, void *data, + struct drm_file *file_priv); +int ioctl_gem_dequeue(struct drm_device *dev, void *data, + struct drm_file *file_priv); +int ioctl_apu_state(struct drm_device *dev, void *data, + struct drm_file *file_priv); +int apu_drm_callback(struct apu_core *core, uint16_t job_id, void *data, int len); + +struct apu_job; + +int apu_drm_job_init(struct apu_core *core); +void apu_sched_fini(struct apu_core *core);
#endif /* __APU_INTERNAL_H__ */ diff --git a/drivers/gpu/drm/apu/apu_sched.c b/drivers/gpu/drm/apu/apu_sched.c new file mode 100644 index 000000000000..13b6fbd00bd8 --- /dev/null +++ b/drivers/gpu/drm/apu/apu_sched.c @@ -0,0 +1,564 @@ +// SPDX-License-Identifier: GPL-2.0 +// +// Copyright 2020 BayLibre SAS + +#include <drm/apu_drm.h> +#include <drm/drm_syncobj.h> +#include <drm/gpu_scheduler.h> + +#include "apu_internal.h" + +struct apu_queue_state { + struct drm_gpu_scheduler sched; + u64 fence_context; + u64 seqno; +}; + +struct apu_request { + struct list_head node; + struct apu_job *job; +}; + +struct apu_sched { + struct apu_queue_state apu_queue; + spinlock_t job_lock; + struct drm_sched_entity sched_entity; +}; + +struct apu_event { + struct drm_pending_event pending_event; + union { + struct drm_event base; + struct apu_job_event job_event; + }; +}; + +static DEFINE_IDA(req_ida); +static LIST_HEAD(complete_node); + +static void apu_core_update_state(struct apu_core *core) +{ + if (!core->ops->is_ready(core)) + core->flags &= ~APU_ONLINE; +} + +static int apu_core_is_running(struct apu_core *core) +{ + apu_core_update_state(core); + + return core->flags & APU_ONLINE; +} + +static void apu_set_online(struct apu_core *core) +{ + core->flags |= APU_ONLINE; +} + +static void apu_set_offline(struct apu_core *core) +{ + core->flags &= ~APU_ONLINE; +} + +/* + * apu_drm_callback() - Handle the data coming from accelerator + * + * @core: The pointer to the APU core + * @job_id: The job id + * @data: The data coming from the accelerator + * @len: The size of the data + * + * Returns: Zero on success, non-zero value on failure. + */ +int apu_drm_callback(struct apu_core *core, uint16_t job_id, void *data, int len) +{ + struct apu_request *apu_req, *tmp; + unsigned long flags; + int ret = -EINVAL; + + spin_lock_irqsave(&core->ctx_lock, flags); + list_for_each_entry_safe(apu_req, tmp, &core->requests, node) { + struct apu_job *job = apu_req->job; + + if (job && job_id == job->id) { + kref_get(&job->refcount); + ret = core->ops->handle_request(job, data, len); + list_add(&job->node, &complete_node); + list_del(&apu_req->node); + ida_simple_remove(&req_ida, job->id); + kfree(apu_req); + drm_send_event(&job->apu->base, + &job->event->pending_event); + dma_fence_signal_locked(job->done_fence); + break; + } + } + spin_unlock_irqrestore(&core->ctx_lock, flags); + + return ret; +} +EXPORT_SYMBOL_GPL(apu_drm_callback); + +static void apu_job_cleanup(struct kref *ref) +{ + struct apu_job *job = container_of(ref, struct apu_job, + refcount); + unsigned int i; + + if (job->implicit_fences) { + for (i = 0; i < job->bo_count; i++) + dma_fence_put(job->implicit_fences[i]); + kvfree(job->implicit_fences); + } + dma_fence_put(job->done_fence); + dma_fence_put(job->render_done_fence); + + if (job->bos) { + for (i = 0; i < job->bo_count; i++) { + struct apu_gem_object *apu_obj; + + apu_obj = to_apu_bo(job->bos[i]); + drm_gem_object_put(job->bos[i]); + } + + kvfree(job->bos); + } + + kfree(job->data_out); + kfree(job->data_in); + kfree(job); +} + +static void apu_job_put(struct apu_job *job) +{ + kref_put(&job->refcount, apu_job_cleanup); +} + +static void apu_acquire_object_fences(struct drm_gem_object **bos, + int bo_count, + struct dma_fence **implicit_fences) +{ + int i; + + for (i = 0; i < bo_count; i++) + dma_resv_get_singleton(bos[i]->resv, DMA_RESV_USAGE_KERNEL, + &implicit_fences[i]); +} + +static void apu_attach_object_fences(struct drm_gem_object **bos, + int bo_count, struct dma_fence *fence) +{ + int 
i; + + for (i = 0; i < bo_count; i++) { + dma_resv_reserve_fences(bos[i]->resv, 1); + dma_resv_add_fence(bos[i]->resv, fence, DMA_RESV_USAGE_KERNEL); + } +} + +static int apu_job_push(struct apu_job *job) +{ + struct drm_sched_entity *entity = &job->core->sched->sched_entity; + struct ww_acquire_ctx acquire_ctx; + int ret = 0; + int i; + + ret = drm_gem_lock_reservations(job->bos, job->bo_count, &acquire_ctx); + if (ret) + return ret; + + ret = drm_sched_job_init(&job->base, entity, NULL); + if (ret) + goto unlock; + + drm_sched_job_arm(&job->base); + job->render_done_fence = dma_fence_get(&job->base.s_fence->finished); + + kref_get(&job->refcount); /* put by scheduler job completion */ + apu_acquire_object_fences(job->bos, job->bo_count, + job->implicit_fences); + + drm_sched_entity_push_job(&job->base); + + apu_attach_object_fences(job->bos, job->bo_count, + job->render_done_fence); + + for (i = 0; i < job->bo_count; i++) + ret = drm_sched_job_add_implicit_dependencies(&job->base, job->bos[i], + true); +unlock: + drm_gem_unlock_reservations(job->bos, job->bo_count, &acquire_ctx); + + return ret; +} + +static const char *apu_fence_get_driver_name(struct dma_fence *fence) +{ + return "apu"; +} + +static const char *apu_fence_get_timeline_name(struct dma_fence *fence) +{ + return "apu-0"; +} + +static void apu_fence_release(struct dma_fence *f) +{ + kfree(f); +} + +static const struct dma_fence_ops apu_fence_ops = { + .get_driver_name = apu_fence_get_driver_name, + .get_timeline_name = apu_fence_get_timeline_name, + .release = apu_fence_release, +}; + +static struct dma_fence *apu_fence_create(struct apu_sched *sched) +{ + struct dma_fence *fence; + struct apu_queue_state *apu_queue = &sched->apu_queue; + + fence = kzalloc(sizeof(*fence), GFP_KERNEL); + if (!fence) + return ERR_PTR(-ENOMEM); + + dma_fence_init(fence, &apu_fence_ops, &sched->job_lock, + apu_queue->fence_context, apu_queue->seqno++); + + return fence; +} + +static struct apu_job *to_apu_job(struct drm_sched_job *sched_job) +{ + return container_of(sched_job, struct apu_job, base); +} + +static int apu_job_hw_submit(struct apu_job *job) +{ + int ret; + struct apu_core *core = job->core; + struct apu_request *apu_req; + unsigned long flags; + + ret = ida_simple_get(&req_ida, 0, 0xffff, GFP_KERNEL); + if (ret < 0) + return ret; + job->id = ret; + + ret = core->ops->alloc_prepare_request(job); + if (ret || !job->request_data || !job->request_len) { + ret = -ENOMEM; + goto err_ida_remove; + } + + apu_req = kzalloc(sizeof(*apu_req), GFP_KERNEL); + if (!apu_req) { + ret = -ENOMEM; + goto err_free_data; + } + + apu_req->job = job; + spin_lock_irqsave(&core->ctx_lock, flags); + list_add(&apu_req->node, &core->requests); + spin_unlock_irqrestore(&core->ctx_lock, flags); + ret = core->ops->send_request(job); + if (ret < 0) + goto err; + kfree(job->request_data); + + return 0; + +err: + list_del(&apu_req->node); + kfree(apu_req); +err_free_data: + kfree(job->request_data); +err_ida_remove: + ida_simple_remove(&req_ida, job->id); + + return ret; +} + +static struct dma_fence *apu_job_run(struct drm_sched_job *sched_job) +{ + struct apu_job *job = to_apu_job(sched_job); + struct dma_fence *fence = NULL; + + if (unlikely(job->base.s_fence->finished.error)) + return NULL; + + fence = apu_fence_create(job->core->sched); + if (IS_ERR(fence)) + return NULL; + + job->done_fence = dma_fence_get(fence); + + apu_job_hw_submit(job); + + return fence; +} + +static enum drm_gpu_sched_stat apu_job_timedout(struct drm_sched_job *sched_job) +{ + struct 
apu_request *apu_req, *tmp; + struct apu_job *job = to_apu_job(sched_job); + + if (dma_fence_is_signaled(job->done_fence)) + return DRM_GPU_SCHED_STAT_NOMINAL; + + list_for_each_entry_safe(apu_req, tmp, &job->core->requests, node) { + /* Remove the request and notify user about timeout */ + if (apu_req->job == job) { + kref_get(&job->refcount); + job->result = ETIMEDOUT; + list_add(&job->node, &complete_node); + list_del(&apu_req->node); + ida_simple_remove(&req_ida, job->id); + kfree(apu_req); + drm_send_event(&job->apu->base, + &job->event->pending_event); + dma_fence_signal_locked(job->done_fence); + } + } + + return DRM_GPU_SCHED_STAT_NOMINAL; +} + +static void apu_job_free(struct drm_sched_job *sched_job) +{ + struct apu_job *job = to_apu_job(sched_job); + + drm_sched_job_cleanup(sched_job); + + apu_job_put(job); +} + +static const struct drm_sched_backend_ops apu_sched_ops = { + .run_job = apu_job_run, + .timedout_job = apu_job_timedout, + .free_job = apu_job_free +}; + +int apu_drm_job_init(struct apu_core *core) +{ + int ret; + struct apu_sched *apu_sched; + struct drm_gpu_scheduler *sched; + + apu_sched = devm_kzalloc(core->dev, sizeof(*apu_sched), GFP_KERNEL); + if (!apu_sched) + return -ENOMEM; + + sched = &apu_sched->apu_queue.sched; + apu_sched->apu_queue.fence_context = dma_fence_context_alloc(1); + ret = drm_sched_init(sched, &apu_sched_ops, + 1, 0, msecs_to_jiffies(500), + NULL, NULL, "apu_js", core->dev); + if (ret) { + dev_err(core->dev, "Failed to create scheduler"); + return ret; + } + + ret = drm_sched_entity_init(&apu_sched->sched_entity, + DRM_SCHED_PRIORITY_NORMAL, + &sched, 1, NULL); + if (ret) { + dev_err(core->dev, "Failed to initialize scheduler entity"); + drm_sched_fini(&core->sched->apu_queue.sched); + return ret; + } + + core->sched = apu_sched; + apu_set_online(core); + + return 0; +} + +void apu_sched_fini(struct apu_core *core) +{ + apu_set_offline(core); + drm_sched_fini(&core->sched->apu_queue.sched); + devm_kfree(core->dev, core->sched); + core->sched = NULL; +} + +static struct apu_core *get_apu_core(struct apu_drm *apu, int device_id) +{ + struct apu_core *core; + + list_for_each_entry(core, &apu->cores, node) { + if (core->device_id == device_id) + return core; + } + + return NULL; +} + +static int apu_lookup_bos(struct drm_device *dev, struct drm_file *file_priv, + struct drm_apu_gem_queue *args, struct apu_job *job) +{ + void __user *bo_handles; + int ret; + + job->bo_count = args->bo_handle_count; + + if (!job->bo_count) + return 0; + + job->implicit_fences = kvmalloc_array(job->bo_count, + sizeof(struct dma_fence *), + GFP_KERNEL | __GFP_ZERO); + if (!job->implicit_fences) + return -ENOMEM; + + bo_handles = (void __user *)(uintptr_t) args->bo_handles; + ret = drm_gem_objects_lookup(file_priv, bo_handles, + job->bo_count, &job->bos); + + return ret; +} + +int ioctl_gem_queue(struct drm_device *dev, void *data, + struct drm_file *file_priv) +{ + struct apu_drm *apu = dev->dev_private; + struct drm_apu_gem_queue *args = data; + struct apu_event *event; + struct apu_core *core; + struct drm_syncobj *sync_out = NULL; + struct apu_job *job; + int ret = 0; + + core = get_apu_core(apu, args->device); + if (!apu_core_is_running(core)) + return -ENODEV; + + if (args->out_sync > 0) { + sync_out = drm_syncobj_find(file_priv, args->out_sync); + if (!sync_out) + return -ENODEV; + } + + job = kzalloc(sizeof(*job), GFP_KERNEL); + if (!job) { + ret = -ENOMEM; + goto fail_out_sync; + } + + kref_init(&job->refcount); + + job->apu = apu; + job->core = core; + 
job->cmd = args->cmd; + job->size_in = args->size_in; + job->size_out = args->size_out; + job->sync_out = sync_out; + if (job->size_in) { + job->data_in = kmalloc(job->size_in, GFP_KERNEL); + if (!job->data_in) { + ret = -ENOMEM; + goto fail_job; + } + + ret = + copy_from_user(job->data_in, + (void __user *)(uintptr_t) args->data, + job->size_in); + if (ret) + goto fail_job; + } + + if (job->size_out) { + job->data_out = kmalloc(job->size_out, GFP_KERNEL); + if (!job->data_out) { + ret = -ENOMEM; + goto fail_job; + } + } + + ret = apu_lookup_bos(dev, file_priv, args, job); + if (ret) + goto fail_job; + + event = kzalloc(sizeof(*event), GFP_KERNEL); + event->base.length = sizeof(struct apu_job_event); + event->base.type = APU_JOB_COMPLETED; + event->job_event.out_sync = args->out_sync; + job->event = event; + ret = drm_event_reserve_init(dev, file_priv, &job->event->pending_event, + &job->event->base); + if (ret) + goto fail_job; + + ret = apu_job_push(job); + if (ret) { + drm_event_cancel_free(dev, &job->event->pending_event); + goto fail_job; + } + + if (sync_out) + drm_syncobj_replace_fence(sync_out, job->render_done_fence); + +fail_job: + apu_job_put(job); +fail_out_sync: + if (sync_out) + drm_syncobj_put(sync_out); + + return ret; +} + +int ioctl_gem_dequeue(struct drm_device *dev, void *data, + struct drm_file *file_priv) +{ + struct drm_apu_gem_dequeue *args = data; + struct drm_syncobj *sync_out = NULL; + struct apu_job *job; + int ret = 0; + + if (args->out_sync > 0) { + sync_out = drm_syncobj_find(file_priv, args->out_sync); + if (!sync_out) + return -ENODEV; + } + + list_for_each_entry(job, &complete_node, node) { + if (job->sync_out == sync_out) { + if (job->data_out) { + ret = copy_to_user((void __user *)(uintptr_t) + args->data, job->data_out, + job->size_out); + args->size = job->size_out; + } + args->result = job->result; + list_del(&job->node); + apu_job_put(job); + drm_syncobj_put(sync_out); + + return ret; + } + } + + if (sync_out) + drm_syncobj_put(sync_out); + + return 0; +} + +int ioctl_apu_state(struct drm_device *dev, void *data, + struct drm_file *file_priv) +{ + struct apu_drm *apu = dev->dev_private; + struct drm_apu_state *args = data; + struct apu_core *core; + + args->flags = 0; + + core = get_apu_core(apu, args->device); + if (!core) + return -ENODEV; + + apu_core_update_state(core); + args->flags |= core->flags; + + return 0; +} diff --git a/include/uapi/drm/apu_drm.h b/include/uapi/drm/apu_drm.h index 14fc478f45dc..c47000097040 100644 --- a/include/uapi/drm/apu_drm.h +++ b/include/uapi/drm/apu_drm.h @@ -9,6 +9,8 @@ extern "C" { #endif
+#define APU_JOB_COMPLETED 0x80000000 + /* * Please note that modifications to all structs defined here are * subject to backwards-compatibility constraints. @@ -21,6 +23,29 @@ struct drm_apu_gem_new { __u64 offset; /* out */ };
+struct drm_apu_gem_queue { + __u32 device; + __u32 cmd; + __u32 out_sync; + __u64 bo_handles; + __u32 bo_handle_count; + __u16 size_in; + __u16 size_out; + __u64 data; +}; + +struct drm_apu_gem_dequeue { + __u32 out_sync; + __u16 result; + __u16 size; + __u64 data; +}; + +struct apu_job_event { + struct drm_event base; + __u32 out_sync; +}; + #define APU_ONLINE (1 << 0)
struct drm_apu_state { @@ -30,10 +55,14 @@ struct drm_apu_state {
#define DRM_APU_STATE 0x00 #define DRM_APU_GEM_NEW 0x01 -#define DRM_APU_NUM_IOCTLS 0x02 +#define DRM_APU_GEM_QUEUE 0x02 +#define DRM_APU_GEM_DEQUEUE 0x03 +#define DRM_APU_NUM_IOCTLS 0x04
#define DRM_IOCTL_APU_STATE DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_STATE, struct drm_apu_state) #define DRM_IOCTL_APU_GEM_NEW DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_NEW, struct drm_apu_gem_new) +#define DRM_IOCTL_APU_GEM_QUEUE DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_QUEUE, struct drm_apu_gem_queue) +#define DRM_IOCTL_APU_GEM_DEQUEUE DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_DEQUEUE, struct drm_apu_gem_dequeue)
#if defined(__cplusplus) }
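For completeness, the queue/dequeue uapi could be driven from userspace roughly as follows (a sketch with hypothetical command id and buffers; the syncobj is created beforehand with DRM_IOCTL_SYNCOBJ_CREATE):

uint8_t params[64];     /* device specific command payload */
uint8_t result[64];
uint32_t handles[2] = { in_bo, out_bo };

struct drm_apu_gem_queue queue = {
        .device = 0,
        .cmd = 1,                       /* hypothetical command id */
        .out_sync = syncobj_handle,
        .bo_handles = (uint64_t)(uintptr_t)handles,
        .bo_handle_count = 2,
        .size_in = sizeof(params),
        .size_out = sizeof(result),
        .data = (uint64_t)(uintptr_t)params,
};
ioctl(fd, DRM_IOCTL_APU_GEM_QUEUE, &queue);

/* After the APU_JOB_COMPLETED event is read from fd, fetch the result. */
struct drm_apu_gem_dequeue deq = {
        .out_sync = syncobj_handle,
        .data = (uint64_t)(uintptr_t)result,
};
ioctl(fd, DRM_IOCTL_APU_GEM_DEQUEUE, &deq);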
Some APU devices sit behind an IOMMU. For some of these devices, we can't use the DMA API because they rely on static addresses, so we have to use the IOMMU API directly to map the buffers correctly. This adds IOMMU support.
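From userspace, the mapping ioctl can be used roughly like this (a sketch; the handle values are placeholders):

uint32_t handles[2] = { bo0_handle, bo1_handle };
uint64_t device_addrs[2];

struct drm_apu_gem_iommu_map req = {
        .bo_handles = (uint64_t)(uintptr_t)handles,
        .bo_handle_count = 2,
        .bo_device_addresses = (uint64_t)(uintptr_t)device_addrs,
};

/* On success, device_addrs[] holds the addresses to pass to the APU. */
ret = ioctl(fd, DRM_IOCTL_APU_GEM_IOMMU_MAP, &req);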
Signed-off-by: Alexandre Bailon abailon@baylibre.com Reviewed-by: Julien Stephan jstephan@baylibre.com --- drivers/gpu/drm/apu/apu_drv.c | 4 + drivers/gpu/drm/apu/apu_gem.c | 174 +++++++++++++++++++++++++++++ drivers/gpu/drm/apu/apu_internal.h | 16 +++ drivers/gpu/drm/apu/apu_sched.c | 28 +++++ include/uapi/drm/apu_drm.h | 12 +- 5 files changed, 233 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/apu/apu_drv.c b/drivers/gpu/drm/apu/apu_drv.c index b6bd340b2bc8..a0dce785a02a 100644 --- a/drivers/gpu/drm/apu/apu_drv.c +++ b/drivers/gpu/drm/apu/apu_drv.c @@ -23,6 +23,10 @@ static const struct drm_ioctl_desc ioctls[] = { DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(APU_GEM_DEQUEUE, ioctl_gem_dequeue, DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_MAP, ioctl_gem_iommu_map, + DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_UNMAP, ioctl_gem_iommu_unmap, + DRM_RENDER_ALLOW), };
DEFINE_DRM_GEM_DMA_FOPS(apu_drm_ops); diff --git a/drivers/gpu/drm/apu/apu_gem.c b/drivers/gpu/drm/apu/apu_gem.c index 0e7b3b27942c..0a91363754c5 100644 --- a/drivers/gpu/drm/apu/apu_gem.c +++ b/drivers/gpu/drm/apu/apu_gem.c @@ -2,6 +2,9 @@ // // Copyright 2020 BayLibre SAS
+#include <linux/iommu.h> +#include <linux/iova.h> + #include <drm/drm_gem_dma_helper.h>
#include <uapi/drm/apu_drm.h> @@ -42,6 +45,7 @@ int ioctl_gem_new(struct drm_device *dev, void *data, */ apu_obj->size = args->size; apu_obj->offset = 0; + apu_obj->iommu_refcount = 0; mutex_init(&apu_obj->mutex);
ret = drm_gem_handle_create(file_priv, gem_obj, &args->handle); @@ -54,3 +58,173 @@ int ioctl_gem_new(struct drm_device *dev, void *data,
return 0; } + +void apu_bo_iommu_unmap(struct apu_drm *apu_drm, struct apu_gem_object *obj) +{ + int iova_pfn; + int i; + + if (!obj->iommu_sgt) + return; + + mutex_lock(&obj->mutex); + obj->iommu_refcount--; + if (obj->iommu_refcount) { + mutex_unlock(&obj->mutex); + return; + } + + iova_pfn = PHYS_PFN(obj->iova); + for (i = 0; i < obj->iommu_sgt->nents; i++) { + iommu_unmap(apu_drm->domain, PFN_PHYS(iova_pfn), + PAGE_ALIGN(obj->iommu_sgt->sgl[i].length)); + iova_pfn += PHYS_PFN(PAGE_ALIGN(obj->iommu_sgt->sgl[i].length)); + } + + sg_free_table(obj->iommu_sgt); + kfree(obj->iommu_sgt); + + free_iova(&apu_drm->iovad, PHYS_PFN(obj->iova)); + mutex_unlock(&obj->mutex); +} + +static struct sg_table *apu_get_sg_table(struct drm_gem_object *obj) +{ + if (obj->funcs) + return obj->funcs->get_sg_table(obj); + return NULL; +} + +int apu_bo_iommu_map(struct apu_drm *apu_drm, struct drm_gem_object *obj) +{ + struct apu_gem_object *apu_obj = to_apu_bo(obj); + struct scatterlist *sgl; + phys_addr_t phys; + int total_buf_space; + int iova_pfn; + int iova; + int ret; + int i; + + mutex_lock(&apu_obj->mutex); + apu_obj->iommu_refcount++; + if (apu_obj->iommu_refcount != 1) { + mutex_unlock(&apu_obj->mutex); + return 0; + } + + apu_obj->iommu_sgt = apu_get_sg_table(obj); + if (IS_ERR(apu_obj->iommu_sgt)) { + mutex_unlock(&apu_obj->mutex); + return PTR_ERR(apu_obj->iommu_sgt); + } + + total_buf_space = obj->size; + iova_pfn = alloc_iova_fast(&apu_drm->iovad, + total_buf_space >> PAGE_SHIFT, + apu_drm->iova_limit_pfn, true); + apu_obj->iova = PFN_PHYS(iova_pfn); + + if (!iova_pfn) { + dev_err(apu_drm->dev, "Failed to allocate iova address\n"); + mutex_unlock(&apu_obj->mutex); + return -ENOMEM; + } + + iova = apu_obj->iova; + sgl = apu_obj->iommu_sgt->sgl; + for (i = 0; i < apu_obj->iommu_sgt->nents; i++) { + phys = page_to_phys(sg_page(&sgl[i])); + ret = + iommu_map(apu_drm->domain, PFN_PHYS(iova_pfn), phys, + PAGE_ALIGN(sgl[i].length), IOMMU_READ | IOMMU_WRITE, + GFP_KERNEL); + if (ret) { + dev_err(apu_drm->dev, "Failed to iommu map\n"); + free_iova(&apu_drm->iovad, iova_pfn); + mutex_unlock(&apu_obj->mutex); + return ret; + } + iova += sgl[i].offset + sgl[i].length; + iova_pfn += PHYS_PFN(PAGE_ALIGN(sgl[i].length)); + } + mutex_unlock(&apu_obj->mutex); + + return 0; +} + +int ioctl_gem_iommu_map(struct drm_device *dev, void *data, + struct drm_file *file_priv) +{ + struct apu_drm *apu_drm = dev->dev_private; + struct drm_apu_gem_iommu_map *args = data; + struct drm_gem_object **bos; + void __user *bo_handles; + u64 *das; + int ret; + int i; + + if (!apu_drm->domain) + return -ENODEV; + + das = kvmalloc_array(args->bo_handle_count, sizeof(*das), GFP_KERNEL); + if (!das) + return -ENOMEM; + + bo_handles = (void __user *)(uintptr_t) args->bo_handles; + ret = drm_gem_objects_lookup(file_priv, bo_handles, + args->bo_handle_count, &bos); + if (ret) { + kvfree(das); + return ret; + } + + for (i = 0; i < args->bo_handle_count; i++) { + ret = apu_bo_iommu_map(apu_drm, bos[i]); + if (ret) { + /* TODO: handle error */ + break; + } + das[i] = to_apu_bo(bos[i])->iova + to_apu_bo(bos[i])->offset; + } + + if (copy_to_user((void *)args->bo_device_addresses, das, + args->bo_handle_count * sizeof(u64))) { + ret = -EFAULT; + DRM_DEBUG("Failed to copy device addresses\n"); + goto out; + } + +out: + kvfree(das); + kvfree(bos); + + return 0; +} + +int ioctl_gem_iommu_unmap(struct drm_device *dev, void *data, + struct drm_file *file_priv) +{ + struct apu_drm *apu_drm = dev->dev_private; + struct drm_apu_gem_iommu_map *args = 
data; + struct drm_gem_object **bos; + void __user *bo_handles; + int ret; + int i; + + if (!apu_drm->domain) + return -ENODEV; + + bo_handles = (void __user *)(uintptr_t) args->bo_handles; + ret = drm_gem_objects_lookup(file_priv, bo_handles, + args->bo_handle_count, &bos); + if (ret) + return ret; + + for (i = 0; i < args->bo_handle_count; i++) + apu_bo_iommu_unmap(apu_drm, to_apu_bo(bos[i])); + + kvfree(bos); + + return 0; +} diff --git a/drivers/gpu/drm/apu/apu_internal.h b/drivers/gpu/drm/apu/apu_internal.h index 021a3efdedf2..ea4183f3fb15 100644 --- a/drivers/gpu/drm/apu/apu_internal.h +++ b/drivers/gpu/drm/apu/apu_internal.h @@ -2,6 +2,9 @@ #ifndef __APU_INTERNAL_H__ #define __APU_INTERNAL_H__
+#include <linux/iommu.h> +#include <linux/iova.h> + #include <drm/drm_drv.h> #include <drm/drm_gem_dma_helper.h> #include <drm/gpu_scheduler.h> @@ -9,7 +12,10 @@ struct apu_gem_object { struct drm_gem_dma_object base; struct mutex mutex; + struct sg_table *iommu_sgt; + int iommu_refcount; size_t size; + u32 iova; u32 offset; };
@@ -35,6 +41,10 @@ struct apu_drm { struct drm_device base; struct device *dev;
+ struct iommu_domain *domain; + struct iova_domain iovad; + int iova_limit_pfn; + struct list_head cores; struct list_head node;
@@ -165,12 +175,18 @@ struct apu_gem_object *to_apu_bo(struct drm_gem_object *obj); struct drm_gem_object *apu_gem_create_object(struct drm_device *dev, size_t size);
+int apu_bo_iommu_map(struct apu_drm *apu_drm, struct drm_gem_object *obj); +void apu_bo_iommu_unmap(struct apu_drm *apu_drm, struct apu_gem_object *obj); int ioctl_gem_new(struct drm_device *dev, void *data, struct drm_file *file_priv); int ioctl_gem_user_new(struct drm_device *dev, void *data, struct drm_file *file_priv); struct dma_buf *apu_gem_prime_export(struct drm_gem_object *gem, int flags); +int ioctl_gem_iommu_map(struct drm_device *dev, void *data, + struct drm_file *file_priv); +int ioctl_gem_iommu_unmap(struct drm_device *dev, void *data, + struct drm_file *file_priv); int ioctl_gem_queue(struct drm_device *dev, void *data, struct drm_file *file_priv); int ioctl_gem_dequeue(struct drm_device *dev, void *data, diff --git a/drivers/gpu/drm/apu/apu_sched.c b/drivers/gpu/drm/apu/apu_sched.c index 13b6fbd00bd8..716d4b7f2d55 100644 --- a/drivers/gpu/drm/apu/apu_sched.c +++ b/drivers/gpu/drm/apu/apu_sched.c @@ -117,6 +117,8 @@ static void apu_job_cleanup(struct kref *ref) struct apu_gem_object *apu_obj;
apu_obj = to_apu_bo(job->bos[i]); + if (job->apu->domain) + apu_bo_iommu_unmap(job->apu, apu_obj); drm_gem_object_put(job->bos[i]); }
@@ -397,6 +399,7 @@ static int apu_lookup_bos(struct drm_device *dev, struct drm_file *file_priv, struct drm_apu_gem_queue *args, struct apu_job *job) { void __user *bo_handles; + unsigned int i; int ret;
job->bo_count = args->bo_handle_count; @@ -413,6 +416,31 @@ static int apu_lookup_bos(struct drm_device *dev, struct drm_file *file_priv, bo_handles = (void __user *)(uintptr_t) args->bo_handles; ret = drm_gem_objects_lookup(file_priv, bo_handles, job->bo_count, &job->bos); + if (ret) + return ret; + + if (!job->apu->domain) + return 0; + + for (i = 0; i < job->bo_count; i++) { + ret = apu_bo_iommu_map(job->apu, job->bos[i]); + if (ret) + goto err_iommu_map; + } + + return ret; + +err_iommu_map: + kvfree(job->implicit_fences); + for (i = 0; i < job->bo_count; i++) { + struct apu_gem_object *apu_obj; + + apu_obj = to_apu_bo(job->bos[i]); + if (job->apu->domain) + apu_bo_iommu_unmap(job->apu, apu_obj); + drm_gem_object_put(job->bos[i]); + } + kvfree(job->bos);
return ret; } diff --git a/include/uapi/drm/apu_drm.h b/include/uapi/drm/apu_drm.h index c47000097040..0ecc739d8aed 100644 --- a/include/uapi/drm/apu_drm.h +++ b/include/uapi/drm/apu_drm.h @@ -41,6 +41,12 @@ struct drm_apu_gem_dequeue { __u64 data; };
+struct drm_apu_gem_iommu_map { + __u64 bo_handles; + __u32 bo_handle_count; + __u64 bo_device_addresses; +}; + struct apu_job_event { struct drm_event base; __u32 out_sync; @@ -57,12 +63,16 @@ struct drm_apu_state { #define DRM_APU_GEM_NEW 0x01 #define DRM_APU_GEM_QUEUE 0x02 #define DRM_APU_GEM_DEQUEUE 0x03 -#define DRM_APU_NUM_IOCTLS 0x04 +#define DRM_APU_GEM_IOMMU_MAP 0x04 +#define DRM_APU_GEM_IOMMU_UNMAP 0x05 +#define DRM_APU_NUM_IOCTLS 0x06
#define DRM_IOCTL_APU_STATE DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_STATE, struct drm_apu_state) #define DRM_IOCTL_APU_GEM_NEW DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_NEW, struct drm_apu_gem_new) #define DRM_IOCTL_APU_GEM_QUEUE DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_QUEUE, struct drm_apu_gem_queue) #define DRM_IOCTL_APU_GEM_DEQUEUE DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_DEQUEUE, struct drm_apu_gem_dequeue) +#define DRM_IOCTL_APU_GEM_IOMMU_MAP DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_IOMMU_MAP, struct drm_apu_gem_iommu_map) +#define DRM_IOCTL_APU_GEM_IOMMU_UNMAP DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_IOMMU_UNMAP, struct drm_apu_gem_iommu_map)
#if defined(__cplusplus) }
On 2023-05-17 15:52, Alexandre Bailon wrote:
Some APU devices sit behind an IOMMU. For some of these devices, we can't use the DMA API because they rely on static addresses, so we have to use the IOMMU API directly to map the buffers correctly.
Except you still need to use the DMA API for the sake of cache coherency and any other aspects :(
This adds IOMMU support.
Signed-off-by: Alexandre Bailon abailon@baylibre.com Reviewed-by: Julien Stephan jstephan@baylibre.com
drivers/gpu/drm/apu/apu_drv.c | 4 + drivers/gpu/drm/apu/apu_gem.c | 174 +++++++++++++++++++++++++++++ drivers/gpu/drm/apu/apu_internal.h | 16 +++ drivers/gpu/drm/apu/apu_sched.c | 28 +++++ include/uapi/drm/apu_drm.h | 12 +- 5 files changed, 233 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/apu/apu_drv.c b/drivers/gpu/drm/apu/apu_drv.c index b6bd340b2bc8..a0dce785a02a 100644 --- a/drivers/gpu/drm/apu/apu_drv.c +++ b/drivers/gpu/drm/apu/apu_drv.c @@ -23,6 +23,10 @@ static const struct drm_ioctl_desc ioctls[] = { DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(APU_GEM_DEQUEUE, ioctl_gem_dequeue, DRM_RENDER_ALLOW),
- DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_MAP, ioctl_gem_iommu_map,
DRM_RENDER_ALLOW),
- DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_UNMAP, ioctl_gem_iommu_unmap,
DRM_RENDER_ALLOW), };
DEFINE_DRM_GEM_DMA_FOPS(apu_drm_ops); diff --git a/drivers/gpu/drm/apu/apu_gem.c b/drivers/gpu/drm/apu/apu_gem.c index 0e7b3b27942c..0a91363754c5 100644 --- a/drivers/gpu/drm/apu/apu_gem.c +++ b/drivers/gpu/drm/apu/apu_gem.c @@ -2,6 +2,9 @@ // // Copyright 2020 BayLibre SAS +#include <linux/iommu.h> +#include <linux/iova.h>
- #include <drm/drm_gem_dma_helper.h>
#include <uapi/drm/apu_drm.h> @@ -42,6 +45,7 @@ int ioctl_gem_new(struct drm_device *dev, void *data, */ apu_obj->size = args->size; apu_obj->offset = 0;
- apu_obj->iommu_refcount = 0; mutex_init(&apu_obj->mutex);
ret = drm_gem_handle_create(file_priv, gem_obj, &args->handle); @@ -54,3 +58,173 @@ int ioctl_gem_new(struct drm_device *dev, void *data, return 0; }
+void apu_bo_iommu_unmap(struct apu_drm *apu_drm, struct apu_gem_object *obj) +{
- int iova_pfn;
- int i;
- if (!obj->iommu_sgt)
return;
- mutex_lock(&obj->mutex);
- obj->iommu_refcount--;
- if (obj->iommu_refcount) {
mutex_unlock(&obj->mutex);
return;
- }
- iova_pfn = PHYS_PFN(obj->iova);
Using mm layer operations on IOVAs looks wrong. In practice I don't think it's ultimately harmful, other than potentially making less efficient use of IOVA space if the CPU page size is larger than the IOMMU page size, but it's still a bad code smell when you're using an IOVA abstraction that is deliberately decoupled from CPU pages.
- for (i = 0; i < obj->iommu_sgt->nents; i++) {
iommu_unmap(apu_drm->domain, PFN_PHYS(iova_pfn),
PAGE_ALIGN(obj->iommu_sgt->sgl[i].length));
iova_pfn += PHYS_PFN(PAGE_ALIGN(obj->iommu_sgt->sgl[i].length));
You can unmap a set of IOVA-contiguous mappings as a single range with one call.
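Something like this, for instance (a sketch of that suggestion; total is the same padded length that was mapped):

size_t total = 0;
int i;

for (i = 0; i < obj->iommu_sgt->nents; i++)
        total += PAGE_ALIGN(obj->iommu_sgt->sgl[i].length);

/* One call instead of one iommu_unmap() per segment. */
iommu_unmap(apu_drm->domain, obj->iova, total);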
- }
- sg_free_table(obj->iommu_sgt);
- kfree(obj->iommu_sgt);
- free_iova(&apu_drm->iovad, PHYS_PFN(obj->iova));
- mutex_unlock(&obj->mutex);
+}
+static struct sg_table *apu_get_sg_table(struct drm_gem_object *obj) +{
- if (obj->funcs)
return obj->funcs->get_sg_table(obj);
- return NULL;
+}
+int apu_bo_iommu_map(struct apu_drm *apu_drm, struct drm_gem_object *obj) +{
- struct apu_gem_object *apu_obj = to_apu_bo(obj);
- struct scatterlist *sgl;
- phys_addr_t phys;
- int total_buf_space;
- int iova_pfn;
- int iova;
- int ret;
- int i;
- mutex_lock(&apu_obj->mutex);
- apu_obj->iommu_refcount++;
- if (apu_obj->iommu_refcount != 1) {
mutex_unlock(&apu_obj->mutex);
return 0;
- }
- apu_obj->iommu_sgt = apu_get_sg_table(obj);
- if (IS_ERR(apu_obj->iommu_sgt)) {
mutex_unlock(&apu_obj->mutex);
return PTR_ERR(apu_obj->iommu_sgt);
- }
- total_buf_space = obj->size;
- iova_pfn = alloc_iova_fast(&apu_drm->iovad,
total_buf_space >> PAGE_SHIFT,
apu_drm->iova_limit_pfn, true);
If you need things mapped at specific addresses like the commit message claims, the DMA IOVA allocator is a terrible tool for the job. DRM already has its own more flexible abstraction for address space management in the form of drm_mm, so as a DRM driver it would seem a lot more sensible to use one of those.
And even if you could justify using this allocator, I can't imagine there's any way you'd need the _fast version (further illustrated by the fact that you're freeing the IOVAs wrongly for that).
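With drm_mm, the allocation could look roughly like the sketch below. It assumes apu_drm gains a struct drm_mm instance (from <drm/drm_mm.h>) initialised at probe time with drm_mm_init(), plus a hypothetical mm_lock mutex protecting it:

struct drm_mm_node *node;

node = kzalloc(sizeof(*node), GFP_KERNEL);
if (!node)
        return -ENOMEM;

mutex_lock(&apu_drm->mm_lock);
ret = drm_mm_insert_node(&apu_drm->mm, node, obj->size);
mutex_unlock(&apu_drm->mm_lock);
if (ret) {
        kfree(node);
        return ret;
}

apu_obj->iova = node->start;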
- apu_obj->iova = PFN_PHYS(iova_pfn);
- if (!iova_pfn) {
dev_err(apu_drm->dev, "Failed to allocate iova address\n");
mutex_unlock(&apu_obj->mutex);
return -ENOMEM;
- }
- iova = apu_obj->iova;
- sgl = apu_obj->iommu_sgt->sgl;
- for (i = 0; i < apu_obj->iommu_sgt->nents; i++) {
phys = page_to_phys(sg_page(&sgl[i]));
ret =
iommu_map(apu_drm->domain, PFN_PHYS(iova_pfn), phys,
PAGE_ALIGN(sgl[i].length), IOMMU_READ | IOMMU_WRITE,
GFP_KERNEL);
if (ret) {
dev_err(apu_drm->dev, "Failed to iommu map\n");
free_iova(&apu_drm->iovad, iova_pfn);
mutex_unlock(&apu_obj->mutex);
return ret;
}
iova += sgl[i].offset + sgl[i].length;
iova_pfn += PHYS_PFN(PAGE_ALIGN(sgl[i].length));
This looks a lot like it should just be iommu_map_sg(). Also it makes me suspicious of the relationship between obj->size and the sgtable - if the size is already pre-calculated to include any required padding then why can't the caller just provide aligned SG segments in the first place? Conversely if it's the original un-padded size, then any padding you *do* add at this point means you're going to overrun the allocated IOVA space.
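That is, something along these lines (a sketch, assuming the iommu_map_sg() variant that takes a gfp_t, matching the iommu_map() call above):

ssize_t mapped;

/* Map the whole scatterlist with a single call. */
mapped = iommu_map_sg(apu_drm->domain, apu_obj->iova,
                      apu_obj->iommu_sgt->sgl, apu_obj->iommu_sgt->nents,
                      IOMMU_READ | IOMMU_WRITE, GFP_KERNEL);
if (mapped < 0) {
        free_iova(&apu_drm->iovad, iova_pfn);
        mutex_unlock(&apu_obj->mutex);
        return mapped;
}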
+	}
+	mutex_unlock(&apu_obj->mutex);
+
+	return 0;
+}
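A sketch of the iommu_map_sg() form Robin points at; it maps the whole scatterlist at a contiguous IOVA in one call and returns the number of bytes mapped or a negative errno:

	ssize_t mapped;

	mapped = iommu_map_sg(apu_drm->domain, apu_obj->iova,
			      apu_obj->iommu_sgt->sgl,
			      apu_obj->iommu_sgt->nents,
			      IOMMU_READ | IOMMU_WRITE, GFP_KERNEL);
	if (mapped < 0) {
		mutex_unlock(&apu_obj->mutex);
		return mapped;
	}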
+
+int ioctl_gem_iommu_map(struct drm_device *dev, void *data,
+			struct drm_file *file_priv)
+{
+	struct apu_drm *apu_drm = dev->dev_private;
+	struct drm_apu_gem_iommu_map *args = data;
+	struct drm_gem_object **bos;
+	void __user *bo_handles;
+	u64 *das;
+	int ret;
+	int i;
+
+	if (!apu_drm->domain)
+		return -ENODEV;
+
+	das = kvmalloc_array(args->bo_handle_count, sizeof(*das), GFP_KERNEL);
Does anything prevent userspace passing random numbers and being able to cause arbitrarily large allocations of unaccounted kernel memory here?
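Nothing in the series does; a conventional guard would bound the count before allocating. A sketch (the cap of 1024 is an arbitrary assumption, not a value from the patch):

	if (!args->bo_handle_count || args->bo_handle_count > 1024)
		return -EINVAL;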
+	if (!das)
+		return -ENOMEM;
+
+	bo_handles = (void __user *)(uintptr_t)args->bo_handles;
+	ret = drm_gem_objects_lookup(file_priv, bo_handles,
+				     args->bo_handle_count, &bos);
+	if (ret) {
+		kvfree(das);
+		return ret;
+	}
+
+	for (i = 0; i < args->bo_handle_count; i++) {
+		ret = apu_bo_iommu_map(apu_drm, bos[i]);
+		if (ret) {
+			/* TODO: handle error */
Yes, that would be a good thing to do.
+			break;
+		}
+		das[i] = to_apu_bo(bos[i])->iova + to_apu_bo(bos[i])->offset;
+	}
+
+	if (copy_to_user((void __user *)(uintptr_t)args->bo_device_addresses,
+			 das, args->bo_handle_count * sizeof(u64))) {
+		ret = -EFAULT;
+		DRM_DEBUG("Failed to copy device addresses\n");
+		goto out;
+	}
+
+out:
+	kvfree(das);
+	kvfree(bos);
+
+	return ret;
+}
+
+int ioctl_gem_iommu_unmap(struct drm_device *dev, void *data,
+			  struct drm_file *file_priv)
+{
+	struct apu_drm *apu_drm = dev->dev_private;
+	struct drm_apu_gem_iommu_map *args = data;
+	struct drm_gem_object **bos;
+	void __user *bo_handles;
+	int ret;
+	int i;
+
+	if (!apu_drm->domain)
+		return -ENODEV;
+
+	bo_handles = (void __user *)(uintptr_t)args->bo_handles;
+	ret = drm_gem_objects_lookup(file_priv, bo_handles,
+				     args->bo_handle_count, &bos);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < args->bo_handle_count; i++)
+		apu_bo_iommu_unmap(apu_drm, to_apu_bo(bos[i]));
+
+	kvfree(bos);
+
+	return 0;
+}

diff --git a/drivers/gpu/drm/apu/apu_internal.h b/drivers/gpu/drm/apu/apu_internal.h index 021a3efdedf2..ea4183f3fb15 100644 --- a/drivers/gpu/drm/apu/apu_internal.h +++ b/drivers/gpu/drm/apu/apu_internal.h
@@ -2,6 +2,9 @@
 #ifndef __APU_INTERNAL_H__
 #define __APU_INTERNAL_H__

+#include <linux/iommu.h>
+#include <linux/iova.h>
+
 #include <drm/drm_drv.h>
 #include <drm/drm_gem_dma_helper.h>
 #include <drm/gpu_scheduler.h>
@@ -9,7 +12,10 @@
 struct apu_gem_object {
 	struct drm_gem_dma_object base;
 	struct mutex mutex;
+	struct sg_table *iommu_sgt;
+	int iommu_refcount;
 	size_t size;
+	u32 iova;
Really? "Common infrastructure that could be re-used to support many accelerators", in 2023, that still assumes 32-bit addressing?
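The width fix itself would be mechanical, e.g. as a sketch:

	u64 iova;	/* device address; not limited to 32-bit IOMMUs */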
 	u32 offset;
 };

@@ -35,6 +41,10 @@ struct apu_drm {
 	struct drm_device base;
 	struct device *dev;
+	struct iommu_domain *domain;
Oh, nothing ever allocates this domain or attaches to it, so this is all dead code :(
+	struct iova_domain iovad;
+	int iova_limit_pfn;
(and nothing initialises these either)
+
 	struct list_head cores;
 	struct list_head node;
@@ -165,12 +175,18 @@
 struct apu_gem_object *to_apu_bo(struct drm_gem_object *obj);
 struct drm_gem_object *apu_gem_create_object(struct drm_device *dev,
					     size_t size);
+int apu_bo_iommu_map(struct apu_drm *apu_drm, struct drm_gem_object *obj);
+void apu_bo_iommu_unmap(struct apu_drm *apu_drm, struct apu_gem_object *obj);
 int ioctl_gem_new(struct drm_device *dev, void *data,
		   struct drm_file *file_priv);
 int ioctl_gem_user_new(struct drm_device *dev, void *data,
			struct drm_file *file_priv);
 struct dma_buf *apu_gem_prime_export(struct drm_gem_object *gem, int flags);
+int ioctl_gem_iommu_map(struct drm_device *dev, void *data,
+			struct drm_file *file_priv);
+int ioctl_gem_iommu_unmap(struct drm_device *dev, void *data,
+			  struct drm_file *file_priv);
 int ioctl_gem_queue(struct drm_device *dev, void *data,
		     struct drm_file *file_priv);
 int ioctl_gem_dequeue(struct drm_device *dev, void *data,
		       struct drm_file *file_priv);
diff --git a/drivers/gpu/drm/apu/apu_sched.c b/drivers/gpu/drm/apu/apu_sched.c index 13b6fbd00bd8..716d4b7f2d55 100644 --- a/drivers/gpu/drm/apu/apu_sched.c +++ b/drivers/gpu/drm/apu/apu_sched.c
@@ -117,6 +117,8 @@ static void apu_job_cleanup(struct kref *ref)
 		struct apu_gem_object *apu_obj;

 		apu_obj = to_apu_bo(job->bos[i]);
+		if (job->apu->domain)
+			apu_bo_iommu_unmap(job->apu, apu_obj);
 		drm_gem_object_put(job->bos[i]);
@@ -397,6 +399,7 @@ static int apu_lookup_bos(struct drm_device *dev, struct drm_file *file_priv, struct drm_apu_gem_queue *args, struct apu_job *job)
 {
 	void __user *bo_handles;
+	unsigned int i;
 	int ret;

 	job->bo_count = args->bo_handle_count;
@@ -413,6 +416,31 @@
 	bo_handles = (void __user *)(uintptr_t)args->bo_handles;
 	ret = drm_gem_objects_lookup(file_priv, bo_handles,
				     job->bo_count, &job->bos);
+	if (ret)
+		return ret;
+
+	if (!job->apu->domain)
+		return 0;
+
+	for (i = 0; i < job->bo_count; i++) {
+		ret = apu_bo_iommu_map(job->apu, job->bos[i]);
+		if (ret)
+			goto err_iommu_map;
+	}
+
+	return ret;
+
+err_iommu_map:
+	kvfree(job->implicit_fences);
+	for (i = 0; i < job->bo_count; i++) {
+		struct apu_gem_object *apu_obj;
+
+		apu_obj = to_apu_bo(job->bos[i]);
+		if (job->apu->domain)
If the domain *did* ever exist, but could suddenly disappear at any point after you've decided to go ahead and start mapping things into it, then there is a heck of a lot of synchronisation missing from this whole infrastructure.
Thanks, Robin.
+			apu_bo_iommu_unmap(job->apu, apu_obj);
+		drm_gem_object_put(job->bos[i]);
+	}
+	kvfree(job->bos);
 	return ret;
 }

diff --git a/include/uapi/drm/apu_drm.h b/include/uapi/drm/apu_drm.h index c47000097040..0ecc739d8aed 100644 --- a/include/uapi/drm/apu_drm.h +++ b/include/uapi/drm/apu_drm.h
@@ -41,6 +41,12 @@ struct drm_apu_gem_dequeue {
 	__u64 data;
 };

+struct drm_apu_gem_iommu_map {
+	__u64 bo_handles;
+	__u32 bo_handle_count;
+	__u64 bo_device_addresses;
+};
+
 struct apu_job_event {
 	struct drm_event base;
 	__u32 out_sync;
@@ -57,12 +63,16 @@ struct drm_apu_state { #define DRM_APU_GEM_NEW 0x01 #define DRM_APU_GEM_QUEUE 0x02 #define DRM_APU_GEM_DEQUEUE 0x03 -#define DRM_APU_NUM_IOCTLS 0x04 +#define DRM_APU_GEM_IOMMU_MAP 0x04 +#define DRM_APU_GEM_IOMMU_UNMAP 0x05 +#define DRM_APU_NUM_IOCTLS 0x06 #define DRM_IOCTL_APU_STATE DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_STATE, struct drm_apu_state) #define DRM_IOCTL_APU_GEM_NEW DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_NEW, struct drm_apu_gem_new) #define DRM_IOCTL_APU_GEM_QUEUE DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_QUEUE, struct drm_apu_gem_queue) #define DRM_IOCTL_APU_GEM_DEQUEUE DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_DEQUEUE, struct drm_apu_gem_dequeue) +#define DRM_IOCTL_APU_GEM_IOMMU_MAP DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_IOMMU_MAP, struct drm_apu_gem_iommu_map) +#define DRM_IOCTL_APU_GEM_IOMMU_UNMAP DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_IOMMU_UNMAP, struct drm_apu_gem_iommu_map) #if defined(__cplusplus) }
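For reference, a hedged userspace sketch of driving the new ioctl; error handling is elided, drm_fd is assumed to be an open render-node fd, and bo_a/bo_b are assumed to be handles returned earlier by DRM_IOCTL_APU_GEM_NEW:

	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <drm/apu_drm.h>

	uint32_t handles[2] = { bo_a, bo_b };
	uint64_t addrs[2] = { 0 };
	struct drm_apu_gem_iommu_map req = {
		.bo_handles = (uintptr_t)handles,
		.bo_handle_count = 2,
		.bo_device_addresses = (uintptr_t)addrs,
	};

	if (ioctl(drm_fd, DRM_IOCTL_APU_GEM_IOMMU_MAP, &req) == 0)
		/* addrs[] now holds the APU-visible device addresses */;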
From: Julien Stephan jstephan@baylibre.com
By default we will call drm_gem_mmap() unless the apu driver has declared its own mmap handler.
Signed-off-by: Julien Stephan jstephan@baylibre.com Reviewed-by: Julien Stephan jstephan@baylibre.com --- drivers/gpu/drm/apu/apu_drv.c | 38 +++++++++++++++++++++++++++++- drivers/gpu/drm/apu/apu_internal.h | 2 ++ 2 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/apu/apu_drv.c b/drivers/gpu/drm/apu/apu_drv.c index a0dce785a02a..703d4515f075 100644 --- a/drivers/gpu/drm/apu/apu_drv.c +++ b/drivers/gpu/drm/apu/apu_drv.c @@ -29,7 +29,20 @@ static const struct drm_ioctl_desc ioctls[] = { DRM_RENDER_ALLOW), };
-DEFINE_DRM_GEM_DMA_FOPS(apu_drm_ops); +static int apu_drm_gem_mmap(struct file *filp, struct vm_area_struct *vma); + +static const struct file_operations apu_drm_ops = { + .owner = THIS_MODULE, + .open = drm_open, + .release = drm_release, + .unlocked_ioctl = drm_ioctl, + .compat_ioctl = drm_compat_ioctl, + .poll = drm_poll, + .read = drm_read, + .llseek = noop_llseek, + .mmap = apu_drm_gem_mmap, + DRM_GEM_DMA_UNMAPPED_AREA_FOPS +};
static struct drm_driver apu_drm_driver = { .driver_features = DRIVER_GEM | DRIVER_SYNCOBJ, @@ -45,6 +58,29 @@ static struct drm_driver apu_drm_driver = { DRM_GEM_DMA_DRIVER_OPS_WITH_DUMB_CREATE(drm_gem_dma_dumb_create), };
+/**
+ * apu_drm_gem_mmap()
+ *
+ * @filp: DRM file pointer
+ * @vma: VMA for the area to be mapped
+ *
+ * By default this calls drm_gem_mmap(), unless the apu driver has declared
+ * its own mmap handler.
+ */
+static int apu_drm_gem_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+	struct drm_file *priv = filp->private_data;
+	struct drm_device *dev = priv->minor->dev;
+	struct apu_drm *apu = dev->dev_private;
+
+	if (apu->mmap)
+		return apu->mmap(filp, vma);
+
+	return drm_gem_mmap(filp, vma);
+}
+
 /**
  * apu_dev_alloc() - Allocate a new APU device
  *
diff --git a/drivers/gpu/drm/apu/apu_internal.h b/drivers/gpu/drm/apu/apu_internal.h index ea4183f3fb15..46e0b2be7821 100644 --- a/drivers/gpu/drm/apu/apu_internal.h +++ b/drivers/gpu/drm/apu/apu_internal.h
@@ -45,6 +45,8 @@ struct apu_drm {
 	struct iova_domain iovad;
 	int iova_limit_pfn;

+	int (*mmap)(struct file *filp, struct vm_area_struct *vma);
+
 	struct list_head cores;
 	struct list_head node;
On 17/05/2023 16:52, Alexandre Bailon wrote:
From: Julien Stephan jstephan@baylibre.com
By default we will call drm_gem_mmap() unless the apu driver has declared its own mmap handler.
Signed-off-by: Julien Stephan jstephan@baylibre.com Reviewed-by: Julien Stephan jstephan@baylibre.com
One does not have to review own code. We all assume that we send good code which we do not have to review by ourselves (by the author). We also assume we make mistakes, which we cannot find, thus other person's review is important.
Adding own review tag suggests you added them mechanically, so I doubt that they really happened.
Anyway, your SoB is missing.
Best regards, Krzysztof
On 5/17/23 21:45, Krzysztof Kozlowski wrote:
On 17/05/2023 16:52, Alexandre Bailon wrote:
From: Julien Stephan jstephan@baylibre.com
By default we will call drm_gem_mmap() unless the apu driver has declared its own mmap handler.
Signed-off-by: Julien Stephan jstephan@baylibre.com Reviewed-by: Julien Stephan jstephan@baylibre.com
One does not have to review own code. We all assume that we send good code which we do not have to review by ourselves (by the author). We also assume we make mistakes, which we cannot find, thus other person's review is important.
I am sorry, I am the one who made the mistake. I squashed this patch with another one I made, lost my Signed-off-by and left the Reviewed-by, which indeed doesn't make any sense.
Best Regards, Alexandre
Adding own review tag suggests you added them mechanically, so I doubt that they really happened.
Anyway, your SoB is missing.
Best regards, Krzysztof
From: Julien Stephan jstephan@baylibre.com
This implements a driver to use with a simulated APU.
This is useful for testing purposes and can be used as a basis to implement a real platform driver. Communication between the simulated APU and the driver is done using a netlink socket.
Signed-off-by: Julien Stephan jstephan@baylibre.com --- drivers/gpu/drm/apu/Kconfig | 9 + drivers/gpu/drm/apu/Makefile | 3 + drivers/gpu/drm/apu/simu_apu.c | 313 +++++++++++++++++++++++++++++++++ 3 files changed, 325 insertions(+) create mode 100644 drivers/gpu/drm/apu/simu_apu.c
diff --git a/drivers/gpu/drm/apu/Kconfig b/drivers/gpu/drm/apu/Kconfig index a769df42091c..e0ffc166497c 100644 --- a/drivers/gpu/drm/apu/Kconfig +++ b/drivers/gpu/drm/apu/Kconfig @@ -11,3 +11,12 @@ config DRM_APU communicate with an AI Processor Unit (APU). The driver intends to provide a common infrastructure that may be used to support many different APU. + +config DRM_SIMU_APU + tristate "SIMULATION APU DRM driver" + depends on DRM_APU + default n + help + This provides a driver using netlink socket to communicate + with a simu APU. + This is useful for simulation and testing of libAPU stack. diff --git a/drivers/gpu/drm/apu/Makefile b/drivers/gpu/drm/apu/Makefile index fc8d6380fc38..0b007854a07f 100644 --- a/drivers/gpu/drm/apu/Makefile +++ b/drivers/gpu/drm/apu/Makefile @@ -4,4 +4,7 @@ drm_apu-y += apu_drv.o drm_apu-y += apu_gem.o drm_apu-y += apu_sched.o
+drm_simu_apu-y += simu_apu.o + obj-$(CONFIG_DRM_APU) += drm_apu.o +obj-$(CONFIG_DRM_SIMU_APU) += drm_simu_apu.o diff --git a/drivers/gpu/drm/apu/simu_apu.c b/drivers/gpu/drm/apu/simu_apu.c new file mode 100644 index 000000000000..5557f8b78a83 --- /dev/null +++ b/drivers/gpu/drm/apu/simu_apu.c @@ -0,0 +1,313 @@ +// SPDX-License-Identifier: GPL-2.0 +// +// Copyright 2023 BayLibre SAS + +#include <linux/module.h> +#include <linux/netlink.h> +#include <linux/platform_device.h> +#include <linux/skbuff.h> + +#include <net/sock.h> + +#include <drm/apu_drm.h> + +#include "apu_internal.h" + + +#define MYPROTO 17 +#define MYGRP 17 + +#define DRIVER_NAME "SIMU APU DRIVER" + +/* + * Firmware request, must be aligned with the one defined in firmware. + * @id: Request id, used in the case of reply, to find the pending request + * @cmd: The command id to execute in the firmware + * @result: The result of the command executed on the firmware + * @size: The size of the data available in this request + * @count: The number of shared buffer + * @data: Contains the data attached with the request if size is greater than + * zero, and the addresses of shared buffers if count is greater than + * zero. Both the data and the shared buffer could be read and write + * by the APU. + */ +struct apu_dev_request { + u16 id; + u16 cmd; + u16 result; + u16 size_in; + u16 size_out; + u16 count; + u8 data[0]; +} __packed; + +struct platform_device *platform; +struct apu_core *apu_core; +static int pid = -1; +struct sock *nl_sock; + +static int apu_netlink_read(struct sk_buff *skb, struct apu_dev_request **msg_ptr, int *pid) +{ + struct nlmsghdr *nlh; + + nlh = (struct nlmsghdr *)skb->data; + *pid = nlh->nlmsg_pid; /* pid of sending process */ + *msg_ptr = nlmsg_data(nlh); + + return nlh->nlmsg_len - NLMSG_HDRLEN; +} + +static int apu_netlink_write(void *msg_ptr, int msg_size, int pid) +{ + struct sk_buff *skb_out; + struct nlmsghdr *nlh; + int res; + + skb_out = nlmsg_new(msg_size, 0); + if (!skb_out) + return -ENOMEM; + + nlh = nlmsg_put(skb_out, 0, 0, NLMSG_DONE, msg_size, 0); + NETLINK_CB(skb_out).dst_group = 0; /* not in multicast group */ + memcpy(nlmsg_data(nlh), msg_ptr, msg_size); + + res = nlmsg_unicast(nl_sock, skb_out, pid); + + if (res < 0) + return res; + else + return nlh->nlmsg_len - NLMSG_HDRLEN; +} + +static void netlink_recv_msg(struct sk_buff *skb) +{ + int msg_size; + struct apu_dev_request *hdr; + int nlmsg_pid; + + msg_size = apu_netlink_read(skb, &hdr, &nlmsg_pid); + + if (pid == -1) { + // No device registered yet, the first message should be + // "READY" + if (!strncmp((char *)hdr, "READY", strlen("READY"))) { + + pid = nlmsg_pid; + if (apu_core_register(&platform->dev, apu_core, apu_core->apu)) + pr_err("cannot register SIMU APU\n"); + } + } else if (pid == nlmsg_pid) { + if (!strncmp((char *)hdr, "STOP", strlen("STOP"))) { + pid = -1; + apu_core_remove(apu_core); + } else + apu_drm_callback(apu_core, hdr->id, hdr, msg_size); + } else { + pr_err("%s: Only one core is supported for now\n", DRIVER_NAME); + } +} + +static int netlink_setup(void) +{ + int ret = 0; + struct netlink_kernel_cfg cfg = { + .input = netlink_recv_msg, + }; + + nl_sock = netlink_kernel_create(&init_net, MYPROTO, &cfg); + if (!nl_sock) + ret = -ENOMEM; + + return ret; +} + +static int simu_apu_send(struct apu_job *job) +{ + return apu_netlink_write((void *)(job->request_data), job->request_len, pid); +} + +static int simu_apu_handle_request(struct apu_job *job, void *data, int len) +{ + struct apu_dev_request *hdr = data; + + 
job->result = hdr->result; + if (job->size_out) + memcpy(job->data_out, hdr->data + job->size_in, + min(job->size_out, hdr->size_out)); + job->size_out = hdr->size_out; + return 0; +} + +static int simu_apu_alloc_request(struct apu_job *job) +{ + struct apu_dev_request *dev_req; + + int size; + u64 *dev_req_da; + u32 *dev_req_buffer_size; + int i; + + size = sizeof(*dev_req) + (sizeof(u64) + sizeof(u32)) * job->bo_count * 2 + + job->size_in + job->size_out; + dev_req = kmalloc(size, GFP_KERNEL); + if (!dev_req) + return -ENOMEM; + + dev_req->cmd = job->cmd; + dev_req->size_in = job->size_in; + dev_req->size_out = job->size_out; + dev_req->count = job->bo_count; + dev_req_da = + (u64 *) (dev_req->data + dev_req->size_in + dev_req->size_out); + dev_req_buffer_size = (u32 *) (dev_req_da + dev_req->count); + memcpy(dev_req->data, job->data_in, job->size_in); + + for (i = 0; i < job->bo_count; i++) { + struct apu_gem_object *obj = to_apu_bo(job->bos[i]); + + dev_req_da[i] = drm_vma_node_offset_addr(&obj->base.base.vma_node); + dev_req_buffer_size[i] = obj->size; + } + + dev_req->id = job->id; + + job->request_data = dev_req; + job->request_len = size; + return 0; +} + +static int simu_apu_ready(struct apu_core *core) +{ + if (pid == -1) + return 0; + + return 1; +} + +/** + * simu_apu_gem_mmap + * + * this is directly based on drm_gem_mmap() function but removing the permission + * check before mapping a buffer. This is useful here to be able to easily + * share buffers between libapu host application and libapu device application + * (simulation use case) + * + */ +static int simu_apu_gem_mmap(struct file *filp, struct vm_area_struct *vma) +{ + struct drm_file *priv = filp->private_data; + struct drm_device *dev = priv->minor->dev; + struct drm_gem_object *obj = NULL; + struct drm_vma_offset_node *node; + int ret; + + if (drm_dev_is_unplugged(dev)) + return -ENODEV; + + drm_vma_offset_lock_lookup(dev->vma_offset_manager); + node = drm_vma_offset_exact_lookup_locked(dev->vma_offset_manager, + vma->vm_pgoff, + vma_pages(vma)); + if (likely(node)) { + obj = container_of(node, struct drm_gem_object, vma_node); + /* + * When the object is being freed, after it hits 0-refcnt it + * proceeds to tear down the object. In the process it will + * attempt to remove the VMA offset and so acquire this + * mgr->vm_lock. Therefore if we find an object with a 0-refcnt + * that matches our range, we know it is in the process of being + * destroyed and will be freed as soon as we release the lock - + * so we have to check for the 0-refcnted object and treat it as + * invalid. 
+ */ + if (!kref_get_unless_zero(&obj->refcount)) { + obj = NULL; + pr_err("DTC: %s: %d\n", __func__, __LINE__); + } + } + drm_vma_offset_unlock_lookup(dev->vma_offset_manager); + + if (!obj) + return -EINVAL; + + ret = drm_gem_mmap_obj(obj, drm_vma_node_size(node) << PAGE_SHIFT, + vma); + + drm_gem_object_put(obj); + + return ret; +} + +static struct apu_core_ops simu_apu_ops = { + .alloc_prepare_request = simu_apu_alloc_request, + .send_request = simu_apu_send, + .handle_request = simu_apu_handle_request, + .is_ready = simu_apu_ready, +}; + +static int __init apu_platform_init(void) +{ + int ret; + struct apu_drm *apu; + + platform = platform_device_register_simple(DRIVER_NAME, -1, NULL, 0); + if (IS_ERR(platform)) + return PTR_ERR(platform); + + if (!devres_open_group(&platform->dev, NULL, GFP_KERNEL)) { + ret = -ENOMEM; + goto out_unregister; + } + + apu = apu_dev_alloc(&platform->dev); + if (!apu) { + ret = -ENOMEM; + goto out_devres; + } + + apu_core = apu_core_alloc(apu, &simu_apu_ops, apu); + if (!apu_core) { + ret = -ENOMEM; + goto out_devres; + } + + ret = apu_dev_register(apu); + if (ret) + goto out_apu_core_free; + + apu->mmap = simu_apu_gem_mmap; + + ret = netlink_setup(); + if (ret) + goto out_apu_dev_unregister; + + return 0; + +out_apu_dev_unregister: + apu_dev_unregister(apu); +out_apu_core_free: + apu_core_free(apu_core); +out_devres: + devres_release_group(&platform->dev, NULL); +out_unregister: + platform_device_unregister(platform); + return ret; +} + +static void __exit apu_platform_exit(void) +{ + netlink_kernel_release(nl_sock); + apu_core_remove(apu_core); + apu_core_free(apu_core); + apu_dev_unregister((struct apu_drm *)apu_core->apu); + devres_release_group(&platform->dev, NULL); + platform_device_unregister(platform); +} + + +module_init(apu_platform_init); +module_exit(apu_platform_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Julien Stephan"); +MODULE_DESCRIPTION(DRIVER_NAME);
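To make the handshake concrete, a hedged userspace sketch of the simulator side announcing itself, assuming the protocol number 17 and "READY" string defined above (error handling elided):

	#include <string.h>
	#include <unistd.h>
	#include <sys/socket.h>
	#include <linux/netlink.h>

	int fd = socket(AF_NETLINK, SOCK_RAW, 17 /* MYPROTO */);
	struct sockaddr_nl src = { .nl_family = AF_NETLINK, .nl_pid = getpid() };
	bind(fd, (struct sockaddr *)&src, sizeof(src));

	/* The first unicast must be "READY"; the driver then registers the
	 * simulated core and records our pid for later replies. */
	struct { struct nlmsghdr nlh; char body[8]; } msg = { 0 };
	msg.nlh.nlmsg_len = NLMSG_LENGTH(sizeof("READY"));
	msg.nlh.nlmsg_pid = getpid();
	strcpy(msg.body, "READY");

	struct sockaddr_nl dst = { .nl_family = AF_NETLINK }; /* pid 0: kernel */
	sendto(fd, &msg, msg.nlh.nlmsg_len, 0,
	       (struct sockaddr *)&dst, sizeof(dst));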
This adds the device tree bindings for the APU DRM driver.
Signed-off-by: Alexandre Bailon abailon@baylibre.com Reviewed-by: Julien Stephan jstephan@baylibre.com --- .../devicetree/bindings/gpu/mtk,apu-drm.yaml | 38 +++++++++++++++++++ 1 file changed, 38 insertions(+) create mode 100644 Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
diff --git a/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml b/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml new file mode 100644 index 000000000000..6f432d3ea478 --- /dev/null +++ b/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
@@ -0,0 +1,38 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/gpu/mediatek,apu-drm.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: AI Processor Unit DRM
+
+properties:
+  compatible:
+    const: mediatek,apu-drm
+
+  remoteproc:
+    maxItems: 2
+    description:
+      Handle to remoteproc devices controlling the APU
+
+  iova:
+    maxItems: 1
+    description:
+      Address and size of virtual memory that could be used by the APU
+
+required:
+  - compatible
+  - remoteproc
+  - iova
+
+additionalProperties: false
+
+examples:
+  - |
+    apu@0 {
+      compatible = "mediatek,apu-drm";
+      remoteproc = <&vpu0>, <&vpu1>;
+      iova = <0 0x60000000 0 0x10000000>;
+    };
+
+...
Il 17/05/23 16:52, Alexandre Bailon ha scritto:
This adds the device tree bindings for the APU DRM driver.
Signed-off-by: Alexandre Bailon abailon@baylibre.com Reviewed-by: Julien Stephan jstephan@baylibre.com
.../devicetree/bindings/gpu/mtk,apu-drm.yaml | 38 +++++++++++++++++++
mediatek,mt(model)-apu.yaml
1 file changed, 38 insertions(+) create mode 100644 Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
diff --git a/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml b/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml new file mode 100644 index 000000000000..6f432d3ea478 --- /dev/null +++ b/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml @@ -0,0 +1,38 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/gpu/mediatek,apu-drm.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml#
+title: AI Processor Unit DRM
+properties:
+  compatible:
+    const: mediatek,apu-drm
const: mediatek,mt8195-apu (or whatever else).
...besides, I don't think that this patch even belongs to this series? :-) Spoiler alert! :-)
Cheers, Angelo
On Wed, May 17, 2023 at 05:04:00PM +0200, AngeloGioacchino Del Regno wrote:
Il 17/05/23 16:52, Alexandre Bailon ha scritto:
This adds the device tree bindings for the APU DRM driver.
Signed-off-by: Alexandre Bailon abailon@baylibre.com Reviewed-by: Julien Stephan jstephan@baylibre.com
.../devicetree/bindings/gpu/mtk,apu-drm.yaml | 38 +++++++++++++++++++
mediatek,mt(model)-apu.yaml
1 file changed, 38 insertions(+) create mode 100644 Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
diff --git a/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml b/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml new file mode 100644 index 000000000000..6f432d3ea478 --- /dev/null +++ b/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml @@ -0,0 +1,38 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/gpu/mediatek,apu-drm.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml#
+title: AI Processor Unit DRM
+properties:
+  compatible:
+    const: mediatek,apu-drm
const: mediatek,mt8195-apu (or whatever else).
Aye, and drop the references to DRM in the title field too (and add the vendor name?).
...besides, I don't think that this patch even belongs to this series? :-) Spoiler alert! :-)
Well, I do not know what this means - but if it is being respun as part of some other work, a description field should be added to the binding.
Cheers, Conor.
On 5/17/23 17:04, AngeloGioacchino Del Regno wrote:
Il 17/05/23 16:52, Alexandre Bailon ha scritto:
This adds the device tree bindings for the APU DRM driver.
Signed-off-by: Alexandre Bailon abailon@baylibre.com Reviewed-by: Julien Stephan jstephan@baylibre.com
.../devicetree/bindings/gpu/mtk,apu-drm.yaml | 38 +++++++++++++++++++
mediatek,mt(model)-apu.yaml
1 file changed, 38 insertions(+) create mode 100644 Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
diff --git a/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml b/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml new file mode 100644 index 000000000000..6f432d3ea478 --- /dev/null +++ b/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml @@ -0,0 +1,38 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/gpu/mediatek,apu-drm.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml#
+title: AI Processor Unit DRM
+properties: + compatible: + const: mediatek,apu-drm
const: mediatek,mt8195-apu (or whatever else).
...besides, I don't think that this patch even belongs to this series? :-) Spoiler alert! :-)
Actually, it does! I forgot to send the patch that adds the platform driver ^^'
Thanks, Alexandre
Cheers, Angelo
On Wed, 17 May 2023 16:52:37 +0200, Alexandre Bailon wrote:
This adds the device tree bindings for the APU DRM driver.
Signed-off-by: Alexandre Bailon abailon@baylibre.com Reviewed-by: Julien Stephan jstephan@baylibre.com
.../devicetree/bindings/gpu/mtk,apu-drm.yaml | 38 +++++++++++++++++++ 1 file changed, 38 insertions(+) create mode 100644 Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check' on your patch (DT_CHECKER_FLAGS is new in v5.13):
yamllint warnings/errors:
dtschema/dtc warnings/errors:
/builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml: 'maintainers' is a required property
	hint: Metaschema for devicetree binding documentation
	from schema $id: http://devicetree.org/meta-schemas/base.yaml#
./Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml: $id: relative path/filename doesn't match actual path or filename
	expected: http://devicetree.org/schemas/gpu/mtk,apu-drm.yaml#
Documentation/devicetree/bindings/gpu/mtk,apu-drm.example.dts:18.15-22.11: Warning (unit_address_vs_reg): /example-0/apu@0: node has a unit name, but no reg or ranges property
/builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/gpu/mtk,apu-drm.example.dtb: apu@0: remoteproc: [[4294967295, 4294967295]] is too short
	from schema: /builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
doc reference errors (make refcheckdocs):
See https://patchwork.ozlabs.org/project/devicetree-bindings/patch/2023051714523...
The base for the series is generally the latest rc1. A different dependency should be noted in *this* patch.
If you already ran 'make dt_binding_check' and didn't see the above error(s), then make sure 'yamllint' is installed and dt-schema is up to date:
pip3 install dtschema --upgrade
Please check and re-submit after running the above command yourself. Note that DT_SCHEMA_FILES can be set to your schema file to speed up checking your schema. However, it must be unset to test all examples with your schema.
On 17/05/2023 16:52, Alexandre Bailon wrote:
This adds the device tree bindings for the APU DRM driver.
Signed-off-by: Alexandre Bailon abailon@baylibre.com Reviewed-by: Julien Stephan jstephan@baylibre.com
There are so many errors in this patch... that for sure it was not tested. Reduced review, except what was already said:
.../devicetree/bindings/gpu/mtk,apu-drm.yaml | 38 +++++++++++++++++++ 1 file changed, 38 insertions(+) create mode 100644 Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
diff --git a/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml b/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml new file mode 100644 index 000000000000..6f432d3ea478 --- /dev/null +++ b/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml @@ -0,0 +1,38 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/gpu/mediatek,apu-drm.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml#
+title: AI Processor Unit DRM
+properties:
+  compatible:
+    const: mediatek,apu-drm
drm is not hardware. Drop everywhere or explain the acronym. If you explain it like Linux explains, then: drm is not hardware.
+  remoteproc:
+    maxItems: 2
+    description:
+      Handle to remoteproc devices controlling the APU
Missing type/ref. Does not look like generic property, so missing vendor prefix.
+  iova:
+    maxItems: 1
+    description:
+      Address and size of virtual memory that could be used by the APU
So it is a reg?
+required:
+  - compatible
+  - remoteproc
+  - iova
+additionalProperties: false
+examples:
+  - |
+    apu@0 {
Where is reg? @0 says you have it...
compatible = "mediatek,apu-drm";
remoteproc = <&vpu0>, <&vpu1>;
iova = <0 0x60000000 0 0x10000000>;
Why would you store virtual address, not real, in DT? Let's say you have some randomization like KASLR. How is it going to work? Drop, it is not hardware property.
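Taken together, the review points toward a node shaped more like the following sketch; every name in it (the per-SoC compatible string, the vendor-prefixed phandle property, the use of reg for the window) illustrates the direction requested by reviewers and is not an accepted binding:

	apu@60000000 {
		compatible = "mediatek,mt8183-apu";
		reg = <0 0x60000000 0 0x10000000>;
		mediatek,remoteproc = <&vpu0>, <&vpu1>;
	};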
Best regards, Krzysztof
On 17/05/2023 21:38, Krzysztof Kozlowski wrote:
Why would you store virtual address, not real, in DT? Let's say you have some randomization like KASLR. How is it going to work? Drop, it is not hardware property.
Actually RANDOMIZE_BASE. KASLR randomizes the physical.
Best regards, Krzysztof
Hi,
it looks like this driver belongs in drivers/accel.
Best regards Thomas
Am 17.05.23 um 16:52 schrieb Alexandre Bailon:
[...]
On 5/17/2023 8:52 AM, Alexandre Bailon wrote:
This adds a DRM driver that implements communication between the CPU and an APU. The driver target embedded device that usually run inference using some prebuilt models. The goal is to provide common infrastructure that could be re-used to support many accelerators. Both kernel, userspace and firmware tries to use standard and existing to leverage the development and maintenance effort. The series implements two platform drivers, one for simulation and another one for the mt8183 (compatible with mt8365).
This looks like the 3 existing Accel drivers. Why is this in DRM?
For the people interested by the firmware or userspace library, the sources are available here: https://gitlab.baylibre.com/baylibre/libapu/libapu
I don't see a compiler. What am I missing?
[...]
.../devicetree/bindings/gpu/mtk,apu-drm.yaml | 38 ++ drivers/gpu/drm/Kconfig | 2 + drivers/gpu/drm/Makefile | 1 + drivers/gpu/drm/apu/Kconfig | 22 + drivers/gpu/drm/apu/Makefile | 10 + drivers/gpu/drm/apu/apu_drv.c | 282 +++++++++ drivers/gpu/drm/apu/apu_gem.c | 230 +++++++ drivers/gpu/drm/apu/apu_internal.h | 205 ++++++ drivers/gpu/drm/apu/apu_sched.c | 592 ++++++++++++++++++ drivers/gpu/drm/apu/simu_apu.c | 313 +++++++++ include/uapi/drm/apu_drm.h | 81 +++
"apu" seems too generic. We already have 3 "AI processing units" over in drivers/accel already...
11 files changed, 1776 insertions(+) create mode 100644 Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml create mode 100644 drivers/gpu/drm/apu/Kconfig create mode 100644 drivers/gpu/drm/apu/Makefile create mode 100644 drivers/gpu/drm/apu/apu_drv.c create mode 100644 drivers/gpu/drm/apu/apu_gem.c create mode 100644 drivers/gpu/drm/apu/apu_internal.h create mode 100644 drivers/gpu/drm/apu/apu_sched.c create mode 100644 drivers/gpu/drm/apu/simu_apu.c create mode 100644 include/uapi/drm/apu_drm.h
I feel like device/driver based documentation in Documentation/ would really help in reviews.
-Jeff
Jeffrey Hugo quic_jhugo@quicinc.com writes:
On 5/17/2023 8:52 AM, Alexandre Bailon wrote:
This adds a DRM driver that implements communication between the CPU and an APU. The driver target embedded device that usually run inference using some prebuilt models. The goal is to provide common infrastructure that could be re-used to support many accelerators. Both kernel, userspace and firmware tries to use standard and existing to leverage the development and maintenance effort. The series implements two platform drivers, one for simulation and another one for the mt8183 (compatible with mt8365).
This looks like the 3 existing Accel drivers. Why is this in DRM?
Yes, this belongs in accel. I think Alex had some issues around the infra in accel with device nodes not appearing/opening properly, but I'll let him comment there. But either way, the right approach should be to fix any issues in accel and move it there.
[...]
.../devicetree/bindings/gpu/mtk,apu-drm.yaml | 38 ++ drivers/gpu/drm/Kconfig | 2 + drivers/gpu/drm/Makefile | 1 + drivers/gpu/drm/apu/Kconfig | 22 + drivers/gpu/drm/apu/Makefile | 10 + drivers/gpu/drm/apu/apu_drv.c | 282 +++++++++ drivers/gpu/drm/apu/apu_gem.c | 230 +++++++ drivers/gpu/drm/apu/apu_internal.h | 205 ++++++ drivers/gpu/drm/apu/apu_sched.c | 592 ++++++++++++++++++ drivers/gpu/drm/apu/simu_apu.c | 313 +++++++++ include/uapi/drm/apu_drm.h | 81 +++
"apu" seems too generic. We already have 3 "AI processing units" over in drivers/accel already...
Indeed, it is generic, but that's kind of the point for this driver since it's targeted at generalizing the interface with "AI processing units" on a growing number of embedded SoCs (ARM, RISC-V, etc.) In addition, the generic naming is intentional because the goal is bigger than the kernel and is working towards a generic, shared "libAPU" userspace[1], but also common firmware for DSP-style inference engines (e.g. analogous to Sound Open Firmware for audio DSPs.)
As usual, the various SoC vendors use different names (APU, NPU, NN unit, etc.) but we'd like a generic name for the class of devices targeted by this driver. And unfortunately, it looks like the equally generic "Versatile processing unit" is already taken by Intel's drivers/accel/ivpu. :)
Maybe since this is more about generalizing the interface between the CPU running linux and the APU, what about the name apu_if? But I guess that applies to the other 3 drivers in drivers/accel also. Hmmm...
Naming things is hard[2], so we're definitely open to other ideas. Any suggestions?
Kevin
[1] https://gitlab.baylibre.com/baylibre/libapu/libapu
[2] "There are 2 hard problems in computer science: cache invalidation, naming things and off-by-1 errors." -- https://twitter.com/secretGeek/status/7269997868
On Wed, May 24, 2023 at 2:34 AM Kevin Hilman khilman@baylibre.com wrote:
Jeffrey Hugo quic_jhugo@quicinc.com writes:
[...]
Naming things is hard[2], so we're definitely open to other ideas. Any suggestions?
Maybe model it according to the tiny driver in drm display? You can then call it tiny_apu :-) Disclosure: It was Daniel's suggestion, he can chime in with more details on the tiny driver concept. Oded
On Wed, May 24, 2023 at 01:27:00PM +0300, Oded Gabbay wrote:
On Wed, May 24, 2023 at 2:34 AM Kevin Hilman khilman@baylibre.com wrote:
Jeffrey Hugo quic_jhugo@quicinc.com writes:
[...]
Maybe model it according to the tiny driver in drm display? You can then call it tiny_apu :-) Disclosure: It was Daniel's suggestion, he can chime in with more details on the tiny driver concept.
Yeah so maybe a bit more detail on my thoughts:
First this smells like a neat bypass of the entire "we want open userspace for accel drivers" rule. The rule isn't quite as strict as for drm gpu drivers (not sure we ended up documenting exactly what, but iirc the consensus was that for build-time only dependencies we're ok with downstream compilers), but it's still there.
And at least from a quick look apu.ko and libapu just look like a generic accel interface, and that's not enough.
For the big training engines it's more or less "enough to run pytorch, but it can be really slow", not sure what the right standard for these inference-only drivers should be.
So that's the first reason why I don't like this.
The other is that I think if we do end up with a pile of tiny accel drivers, we should probably look into something like simpledrm for the tiny display drivers. Probably still IP specific ioctls (at least most) so that IP specific job submission and all that are easy, but then just pass to a framework that simplifies a drm gem driver to "write ptes" and "run job" callback, maybe with an optional "create/destroy vm/ctx" for hw which can do that.
So maybe we end up with a drivers/accel/tiny and a bunch more helpers around the existing gem ones. The rule we have for drm/tiny is "1 file, less than 1kloc", and there's a bunch of them. I do think we can achieve the same for tiny accel inference engines (but it's still a bit a road). Maybe tiny accel is more like "less than 5kloc" since you need a bit more glue for the driver specific ioctl stuff - maybe that's only needed for the submit ioctl, maybe also for buffer map/unmap and creation.
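Purely as an illustration of the contract Daniel describes, a sketch of what such helper callbacks could look like; every name here is hypothetical and nothing like this exists in the tree:

	struct accel_tiny_device;
	struct accel_tiny_job;

	struct accel_tiny_ops {
		/* Write device page-table entries for a mapped GEM object. */
		int  (*write_ptes)(struct accel_tiny_device *adev, u64 iova,
				   struct sg_table *sgt);
		/* Kick one validated job on the hardware. */
		int  (*run_job)(struct accel_tiny_device *adev,
				struct accel_tiny_job *job);
		/* Optional, for hardware with per-context address spaces. */
		int  (*create_vm)(struct accel_tiny_device *adev);
		void (*destroy_vm)(struct accel_tiny_device *adev);
	};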
Also note that there's an entire pile of in-flight work for adding new helpers to the gem world to make this all easier. Once we have gpuva and exec helpers there not much glue left to tie it all together with the scheduler.
But the real crux is that an accel inference driver really needs to have enough userspace to do an actual inference job with some android/cros/whatever framework for inference (there's just too many). -Daniel
On 5/24/23 12:40, Daniel Vetter wrote:
On Wed, May 24, 2023 at 01:27:00PM +0300, Oded Gabbay wrote:
[...]
Yeah so maybe a bit more detail on my thoughts:
First this smells like a neat bypass of the entire "we want open userspace for accel drivers" rule. The rule isn't quite as strict as for drm gpu drivers (not sure we ended up documenting exactly what, but iirc the consensus was that for build-time only dependencies we're ok with downstream compilers), but it's still there.
What makes you think that we want to bypass the open source requirements? Although the neural network firmware and userspace application are not yet open source, our intention is to develop a fully open source stack. Currently, we only support the Mediatek APU (an Xtensa VP6) and we have to use closed source software to execute inferences on the accelerator. As far as I know, there is no software stack similar to mesa where we could add support for a new accelerator (this is also true for firmware). That is actually what we would like to do. But this will take a lot of time and we consider this driver as a first (small) step.
And at least from a quick look apu.ko and libapu just look like a generic accel interface, and that's not enough.
For the big training engines it's more or less "enough to run pytorch, but it can be really slow", not sure what the right standard for these inference-only drivers should be.
To be honest, I don't know what would be required for training engines. We only target accelerators for embedded devices that usually only run inference. In my opinion, these are two different use cases and I don't think we could address them in the same way.
So that's the first reason why I don't like this.
The other is that I think if we do end up with a pile of tiny accel drivers, we should probably look into something like simpledrm for the tiny display drivers. Probably still IP specific ioctls (at least most) so that IP specific job submission and all that are easy, but then just pass to a framework that simplifies a drm gem driver to "write ptes" and "run job" callback, maybe with an optional "create/destroy vm/ctx" for hw which can do that.
So maybe we end up with a drivers/accel/tiny and a bunch more helpers around the existing gem ones. The rule we have for drm/tiny is "1 file, less than 1kloc", and there's a bunch of them. I do think we can achieve the same for tiny accel inference engines (but it's still a bit a road). Maybe tiny accel is more like "less than 5kloc" since you need a bit more glue for the driver specific ioctl stuff - maybe that's only needed for the submit ioctl, maybe also for buffer map/unmap and creation.
This makes sense to me.
Also note that there's an entire pile of in-flight work for adding new helpers to the gem world to make this all easier. Once we have gpuva and exec helpers there not much glue left to tie it all together with the scheduler.
I wrote this series a long time ago and just rebased it recently. I will take some time to look at the in-flight work and see if that is something I could start using.
But the real crux is that an accel inference driver really needs to have enough userspace to do an actual inference job with some android/cros/whatever framework for inference (there's just too many).
We are currently stuck with closed source firmware, userspace applications and toolchains (they work with Android and Linux). We are looking for a solution but implementing something will take some time.
Alexandre