+static void nvme_sync_dma(struct nvme_dev *nvme_dev, struct request *req,
+                enum dma_data_direction dir)
+{
+        struct blk_mq_dma_map *map = req->dma_map;
+        int length = blk_rq_payload_bytes(req);
+        bool for_cpu = dir == DMA_FROM_DEVICE;
+        struct device *dev = nvme_dev->dev;
+        dma_addr_t *dma_list = map->private;
+        struct bio *bio = req->bio;
+        int offset, map_idx;
+
+        offset = bio->bi_iter.bi_bvec_done;
+        map_idx = offset / NVME_CTRL_PAGE_SIZE;
+        length += offset & (NVME_CTRL_PAGE_SIZE - 1);
+
+        while (length > 0) {
+                u64 dma_addr = dma_list[map_idx++];
+
+                if (for_cpu)
+                        __dma_sync_single_for_cpu(dev, dma_addr,
+                                        NVME_CTRL_PAGE_SIZE, dir);
+                else
+                        __dma_sync_single_for_device(dev, dma_addr,
+                                        NVME_CTRL_PAGE_SIZE, dir);
+
+                length -= NVME_CTRL_PAGE_SIZE;
+        }
This looks really inefficient. Usually the ranges in the dmabuf should be much larger than a controller page.
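Something along these lines (illustrative only, reusing the locals from the quoted hunk and the plain dma_sync_single_* helpers) would merge adjacent pages and issue one sync call per contiguous run instead of one per controller page:

        while (length > 0) {
                dma_addr_t start = dma_list[map_idx++];
                int run = NVME_CTRL_PAGE_SIZE;

                /* extend the run while the next entry is physically adjacent */
                while (length > run && dma_list[map_idx] == start + run) {
                        run += NVME_CTRL_PAGE_SIZE;
                        map_idx++;
                }

                if (for_cpu)
                        dma_sync_single_for_cpu(dev, start, run, dir);
                else
                        dma_sync_single_for_device(dev, start, run, dir);

                length -= run;
        }

For a dmabuf that maps as one contiguous DMA range this collapses to a single sync call for the whole payload.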
+static void nvme_unmap_premapped_data(struct nvme_dev *dev,
+                struct request *req)
+{
+        struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+
+        if (rq_data_dir(req) == READ)
+                nvme_sync_dma(dev, req, DMA_FROM_DEVICE);
+
+        if (!(iod->flags & IOD_SINGLE_SEGMENT))
+                nvme_free_descriptors(req);
+}
This doesn't really unmap anything :)
Also, the DMA ownership rules say that you always need to call the sync_for_device helpers before I/O and the sync_for_cpu helpers after I/O, no matter whether it is a read or a write. The implementations then make them no-ops where possible.
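In other words, roughly this pattern in terms of the generic DMA API (sketch only; dma_addr and len stand in for whatever range the request has mapped):

        /*
         * dir describes the data transfer; the for_device/for_cpu pair
         * describes the ownership hand-off.  Both calls are made for
         * reads and writes alike, and coherent platforms turn them into
         * no-ops.
         */
        enum dma_data_direction dir = rq_dma_dir(req);

        /* hand the buffer to the device before submitting the command */
        dma_sync_single_for_device(dev, dma_addr, len, dir);

        /* ... command executes ... */

        /* take the buffer back once the command has completed */
        dma_sync_single_for_cpu(dev, dma_addr, len, dir);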
+        offset = bio->bi_iter.bi_bvec_done;
+        map_idx = offset / NVME_CTRL_PAGE_SIZE;
+        offset &= (NVME_CTRL_PAGE_SIZE - 1);
+
+        prp1_dma = dma_list[map_idx++] + offset;
+        length -= (NVME_CTRL_PAGE_SIZE - offset);
+        if (length <= 0) {
+                prp2_dma = 0;
Urgg, why is this building PRPs instead of SGLs? Yes, SGLs are an optional feature, but for devices where you want to micro-optimize like this I think we should simply require them. This should cut down on both the memory use and the amount of special mapping code.
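For a premapped range that is DMA-contiguous, a single SGL data block descriptor can cover the whole transfer. Rough sketch only: nvme_pci_sgl_set_premapped and first_dma are made-up names here, while nvme_sgl_desc, NVME_SGL_FMT_DATA_DESC and cpu_to_le*() are the existing definitions:

        /* hypothetical helper, not in the patch */
        static void nvme_pci_sgl_set_premapped(struct nvme_sgl_desc *sge,
                        dma_addr_t addr, u32 len)
        {
                sge->addr = cpu_to_le64(addr);
                sge->length = cpu_to_le32(len);
                sge->type = NVME_SGL_FMT_DATA_DESC << 4;
        }

        /*
         * One descriptor for the whole payload instead of a per-page PRP
         * list; first_dma is assumed to be the start of the premapped
         * range, and the command's PSDT/flags still has to indicate SGL
         * use (e.g. NVME_CMD_SGL_METABUF).
         */
        nvme_pci_sgl_set_premapped(&cmnd->rw.dptr.sgl, first_dma,
                        blk_rq_payload_bytes(req));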