On Mon, Jan 05, 2026 at 08:44:48AM -0800, Caleb Sander Mateos wrote:
On Sat, Jan 3, 2026 at 6:14 AM Ming Lei ming.lei@redhat.com wrote:
On Fri, Jan 02, 2026 at 05:45:14PM -0700, Caleb Sander Mateos wrote:
Indicate to the ublk server when an incoming request has integrity data by setting UBLK_IO_F_INTEGRITY in the ublksrv_io_desc's op_flags field. If the ublk device doesn't support integrity, the request will never provide integrity data. If the ublk device supports integrity, the request may omit the integrity buffer only if metadata_size matches the PI tuple size determined by csum_type. In this case, the ublk server should internally generate/verify the protection information from the data and sector offset.

Set the UBLK_IO_F_CHECK_{GUARD,REFTAG,APPTAG} flags based on the request's BIP_CHECK_{GUARD,REFTAG,APPTAG} flags, indicating whether to verify the guard, reference, and app tags in the protection information. The expected reference tag (32 or 48 bits) and app tag (16 bits) are indicated in ublksrv_io_desc's new struct ublksrv_io_integrity integrity field. This field is unioned with the addr field to avoid changing the
It might be fine to set a per-rq app_tag, but bios in one request might actually have different app_tags in the case of an io merge.
I based this logic largely on the code under if (ns->head->ms) in nvme_setup_rw(). That also assumes a single app_tag for the request. Sounds like an existing bug if bios with different app_tags can be merged together?
Looks like it is true.
Also block layer builds ref_tag for each internal, please see
What do you mean by "internal"? "interval"?
t10_pi_generate() and ext_pi_crc64_generate().
Yes, the reftag increases by 1 for each integrity interval. That's why it suffices for an NVMe command reading multiple blocks to specify only the expected reftag for the first block; the reftags for subsequent blocks are incremented accordingly.
Actually, I think we probably don't need to communicate the reftag seed to the ublk server. NVMe doesn't use the reftag seed (which can be overridden by struct uio_meta's seed field). Instead, nvme_set_ref_tag() always uses the offset into the block device divided by the integrity interval size, as required by all the existing csum_type formats the kernel supports. So a ublk server could just use the start_sector field of struct ublksrv_io_desc to compute the expected reftags. And using start_sector as the reftag also means merging requests would preserve their expected reftags.
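To make the above concrete, here is a minimal sketch of how a ublk server could derive the expected reference tags purely from ublksrv_io_desc's start_sector, assuming the seed convention used by nvme_set_ref_tag() (device offset divided by the integrity interval size). The helper name and the interval_exp parameter are assumptions for illustration:

/*
 * Illustrative only: derive the expected reftag for a given
 * integrity interval from start_sector. interval_exp is log2 of
 * the integrity interval size in bytes.
 */
#include <stdint.h>

#define SECTOR_SHIFT 9

static uint64_t expected_ref_tag(uint64_t start_sector,
                                 unsigned int interval_exp,
                                 unsigned int interval_idx)
{
        uint64_t seed = start_sector >> (interval_exp - SECTOR_SHIFT);

        /* The reftag increments by one for each integrity interval. */
        return seed + interval_idx;
}

For 32-bit reftag formats the server would keep only the low 32 bits of the result; the 48-bit CRC64 format keeps the low 48 bits.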
IMO, this way looks fine from the user viewpoint, especially since it aligns with NVMe.
So it looks like this way is wrong.
More importantly, reusing iod->addr for a purpose not related to the IO buffer is very unfriendly for adding new features; one lesson is the ZONED support, which reuses ublksrv_io_cmd->addr for the zoned append LBA.
That's a fair point.
One candidate is to add a per-IO mmapped meta area, which is flexible enough to cover more use cases.
For example, there is a chance to support dma-buf based zero copy for ublk; please see the io-uring dma-buf support[1]. iod->addr might carry IO buffer info in dma-buf format in the future.
[1] https://lore.kernel.org/io-uring/cover.1763725387.git.asml.silence@gmail.com...
BTW, the PI data size is often small and it belongs to the kernel, so there could be a chance to define the PI data as a pre-mapped DMA-BUF; then almost all drivers could benefit from avoiding the runtime dma mapping for meta. But that may be a bigger project.
size of struct ublksrv_io_desc. UBLK_F_INTEGRITY requires UBLK_F_USER_COPY and the addr field isn't used for UBLK_F_USER_COPY, so the two fields aren't needed simultaneously.
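As a rough sketch (not the actual patch contents) of how the fields described above could share the existing 8-byte addr slot, with field widths assumed to be consistent with a 32- or 48-bit reference tag and a 16-bit application tag:

struct ublksrv_io_integrity {
        __u32   ref_tag_lo;     /* low 32 bits of the expected reftag */
        __u16   ref_tag_hi;     /* high bits for 48-bit reftag formats */
        __u16   app_tag;        /* expected application tag */
};

struct ublksrv_io_desc {
        __u32   op_flags;
        __u32   nr_sectors;
        __u64   start_sector;
        union {
                __u64   addr;   /* unused with UBLK_F_USER_COPY */
                struct ublksrv_io_integrity     integrity;
        };
};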
Signed-off-by: Caleb Sander Mateos csander@purestorage.com
 drivers/block/ublk_drv.c      | 43 +++++++++++++++++++++++++++++++----
 include/uapi/linux/ublk_cmd.h | 27 ++++++++++++++++++++--
 2 files changed, 64 insertions(+), 6 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 2f9316febf83..51469e0627ff 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -316,10 +316,36 @@ static inline bool ublk_dev_is_zoned(const struct ublk_device *ub)
 static inline bool ublk_queue_is_zoned(const struct ublk_queue *ubq)
 {
        return ubq->flags & UBLK_F_ZONED;
 }
+static void ublk_setup_iod_buf(const struct ublk_queue *ubq,
+                                const struct request *req,
+                                struct ublksrv_io_desc *iod)
+{
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+       if (ubq->flags & UBLK_F_INTEGRITY) {
+               struct bio_integrity_payload *bip;
+               sector_t ref_tag_seed;
+
+               if (!blk_integrity_rq(req))
+                       return;
+
+               bip = bio_integrity(req->bio);
+               ref_tag_seed = bip_get_seed(bip);

As mentioned, t10_pi_generate() and ext_pi_crc64_generate() build a per-interval ref tag.
As mentioned, the reftags for subsequent intervals can be computed by simply incrementing the seed. If the seed is assumed to always be start_sector >> (interval_exp - SECTOR_SHIFT), then it may not be necessary to communicate ref_tag_seed at all.
Fair enough, but this should be documented in the UAPI interface.
+               iod->integrity.ref_tag_lo = ref_tag_seed;
+               iod->integrity.ref_tag_hi = ref_tag_seed >> 32;
+               iod->integrity.app_tag = bip->app_tag;

In case of an io merge, each bio may have a different ->app_tag.
It seems like it would make more sense to prevent merging bios with different app_tags. In the common case where a request contains a single bio, which has a single app_tag, it would be much more efficient to communicate only the 1 app_tag instead of having to pass a separate app_tag for every logical block/integrity interval.
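For illustration, the kind of merge gating described here might look roughly like the following; the helper name and its placement in the merge path are hypothetical, not an existing block-layer API:

/*
 * Illustrative sketch: reject merging bios whose app_tag metadata
 * differs, so a single per-request app_tag remains valid.
 */
static bool ublk_example_integrity_mergeable(struct bio *a, struct bio *b)
{
        struct bio_integrity_payload *bip_a = bio_integrity(a);
        struct bio_integrity_payload *bip_b = bio_integrity(b);

        /* Both bios must agree on whether they carry integrity data. */
        if (!bip_a || !bip_b)
                return !bip_a && !bip_b;

        /* Both must apply the same app tag check. */
        if ((bip_a->bip_flags & BIP_CHECK_APPTAG) !=
            (bip_b->bip_flags & BIP_CHECK_APPTAG))
                return false;

        /* A single per-request app_tag only works if the tags match. */
        if ((bip_a->bip_flags & BIP_CHECK_APPTAG) &&
            bip_a->app_tag != bip_b->app_tag)
                return false;

        return true;
}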
OK.
Given you have to copy the metadata via user copy, I suggest following the PI standard and making it per-interval.
How are you suggesting the ublk server access bip->app_tag and bip_get_seed(bip) (if overriding the reftag seed is supported)? Would the ublk server need to make another user copy syscall?
Or would you prefer I drop the BIP_CHECK_* flag support from this patch set for now?
I can understand the motivation, and an extra syscall should be avoided for communicating only the reftag & apptag, given you have explained that both can be per-request instead of per-interval.
But iod->addr should be avoided for this purpose; otherwise a new feature can easily conflict with this usage.
But a per-io mmapped area can solve this issue; the meta size can be a parameter of `ublksrv_ctrl_dev_info` with a feature flag of UBLK_F_MMAPED_IO_META. What do you think of this approach?
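To make the idea concrete, a hypothetical sketch of the server side of such a per-IO mmapped meta area follows; the mmap offset macro and layout are made up for illustration and are not an existing UAPI:

/*
 * Hypothetical only: map one meta slot per tag for a queue, with
 * meta_size assumed to be advertised via ublksrv_ctrl_dev_info.
 */
#include <sys/types.h>
#include <sys/mman.h>

#define UBLKSRV_IO_META_BUF_OFFSET      (1ULL << 30)    /* made-up offset */

static void *map_io_meta(int cdev_fd, unsigned int q_id,
                         unsigned int queue_depth, unsigned int meta_size)
{
        size_t len = (size_t)queue_depth * meta_size;
        off_t off = UBLKSRV_IO_META_BUF_OFFSET + (off_t)q_id * len;

        return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                    cdev_fd, off);
}

/* The meta (e.g. PI tuples) for a given tag would then live at: */
static inline void *io_meta(void *meta_base, unsigned int tag,
                            unsigned int meta_size)
{
        return (char *)meta_base + (size_t)tag * meta_size;
}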
Thanks,
Ming