Much work has recently gone into supporting block device integrity data
(sometimes called "metadata") in Linux. Many NVMe devices these days
support metadata transfers and/or automatic protection information
generation and verification. However, ublk devices can't yet advertise
integrity data capabilities. This patch series wires up support for
integrity data in ublk. The ublk feature is referred to as "integrity"
rather than "metadata" to match the block layer's name for it and to
avoid confusion with the existing and unrelated UBLK_IO_F_META.
To advertise support for integrity data, a ublk server fills out the
struct ublk_params's integrity field and sets UBLK_PARAM_TYPE_INTEGRITY.
The struct ublk_param_integrity flags and csum_type fields use the
existing LBMD_PI_* constants from the linux/fs.h UAPI header. The ublk
driver fills out a corresponding struct blk_integrity.
When a request with integrity data is issued to the ublk device, the
ublk driver sets UBLK_IO_F_INTEGRITY in struct ublksrv_io_desc's
op_flags field. This is necessary for a ublk server for which
bi_offload_capable() returns true to distinguish requests with integrity
data from those without.
Integrity data transfers can currently only be performed via the ublk
user copy mechanism. The overhead of zero-copy buffer registration makes
it less appealing for the small transfers typical of integrity data.
Additionally, neither io_uring NVMe passthru nor IORING_RW_ATTR_FLAG_PI
currently allow an io_uring registered buffer for the integrity data.
The ki_pos field of the struct kiocb passed to the user copy
->{read,write}_iter() callback gains a bit UBLKSRV_IO_INTEGRITY_FLAG for
a ublk server to indicate whether to access the request's data or
integrity data.
Not yet supported is an analogue for the IO_INTEGRITY_CHK_*/BIP_CHECK_*
flags to ask the ublk server to verify the guard, reftag, and/or apptag
of a request's protection information. The user copy mechanism currently
forbids a ublk server from reading the data/integrity buffer of a
read-direction request. We could potentially relax this restriction for
integrity data on reads. Alternatively, the ublk driver could verify the
requested fields as part of the user copy operation.
v3:
- Drop support for communicating BIP_CHECK_* for now until the interface
is decided
- Add Reviewed-by tags
v2:
- Communicate BIP_CHECK_* flags and expected reftag seed and app tag to
ublk server
- Add UBLK_F_INTEGRITY feature flag (Ming)
- Don't change the definition of UBLKSRV_IO_BUF_TOTAL_BITS (Ming)
- Drop patches already applied
- Add Reviewed-by tags
Caleb Sander Mateos (16):
blk-integrity: take const pointer in blk_integrity_rq()
ublk: move ublk flag check functions earlier
ublk: set UBLK_IO_F_INTEGRITY in ublksrv_io_desc
ublk: add ublk_copy_user_bvec() helper
ublk: split out ublk_user_copy() helper
ublk: inline ublk_check_and_get_req() into ublk_user_copy()
ublk: move offset check out of __ublk_check_and_get_req()
ublk: optimize ublk_user_copy() on daemon task
selftests: ublk: display UBLK_F_INTEGRITY support
selftests: ublk: add utility to get block device metadata size
selftests: ublk: add kublk support for integrity params
selftests: ublk: implement integrity user copy in kublk
selftests: ublk: support non-O_DIRECT backing files
selftests: ublk: add integrity data support to loop target
selftests: ublk: add integrity params test
selftests: ublk: add end-to-end integrity test
Stanley Zhang (3):
ublk: support UBLK_PARAM_TYPE_INTEGRITY in device creation
ublk: implement integrity user copy
ublk: support UBLK_F_INTEGRITY
drivers/block/ublk_drv.c | 350 +++++++++++++------
include/linux/blk-integrity.h | 6 +-
include/uapi/linux/ublk_cmd.h | 24 ++
tools/testing/selftests/ublk/Makefile | 6 +-
tools/testing/selftests/ublk/common.c | 4 +-
tools/testing/selftests/ublk/fault_inject.c | 1 +
tools/testing/selftests/ublk/file_backed.c | 61 +++-
tools/testing/selftests/ublk/kublk.c | 90 ++++-
tools/testing/selftests/ublk/kublk.h | 37 +-
tools/testing/selftests/ublk/metadata_size.c | 36 ++
tools/testing/selftests/ublk/null.c | 1 +
tools/testing/selftests/ublk/stripe.c | 6 +-
tools/testing/selftests/ublk/test_common.sh | 10 +
tools/testing/selftests/ublk/test_loop_08.sh | 111 ++++++
tools/testing/selftests/ublk/test_null_04.sh | 166 +++++++++
15 files changed, 777 insertions(+), 132 deletions(-)
create mode 100644 tools/testing/selftests/ublk/metadata_size.c
create mode 100755 tools/testing/selftests/ublk/test_loop_08.sh
create mode 100755 tools/testing/selftests/ublk/test_null_04.sh
--
2.45.2
The primary goal is to add test validation for GSO when operating over
UDP tunnels, a scenario which is not currently covered.
The design strategy is to extend the existing tun/tap testing infrastructure
to support this new use-case, rather than introducing a new or parallel framework.
This allows for better integration and re-use of existing test logic.
---
v3 -> v4:
- Rebase onto the latest net-next tree to resolve merge conflicts.
v3: https://lore.kernel.org/netdev/cover.1767580224.git.xudu@redhat.com/
- Re-send the patch series becasue Patchwork don't update them.
v2: https://lore.kernel.org/netdev/cover.1767074545.git.xudu@redhat.com/
- Addresse sporadic failures due to too early send.
- Refactor environment address assign helper function.
- Fix incorrect argument passing in build packet functions.
v1: https://lore.kernel.org/netdev/cover.1763345426.git.xudu@redhat.com/
Xu Du (8):
selftest: tun: Format tun.c existing code
selftest: tun: Introduce tuntap_helpers.h header for TUN/TAP testing
selftest: tun: Refactor tun_delete to use tuntap_helpers
selftest: tap: Refactor tap test to use tuntap_helpers
selftest: tun: Add helpers for GSO over UDP tunnel
selftest: tun: Add test for sending gso packet into tun
selftest: tun: Add test for receiving gso packet from tun
selftest: tun: Add test data for success and failure paths
tools/testing/selftests/net/tap.c | 281 +-----
tools/testing/selftests/net/tun.c | 919 ++++++++++++++++++-
tools/testing/selftests/net/tuntap_helpers.h | 602 ++++++++++++
3 files changed, 1526 insertions(+), 276 deletions(-)
create mode 100644 tools/testing/selftests/net/tuntap_helpers.h
base-commit: c303e8b86d9dbd6868f5216272973292f7f3b7f1
--
2.49.0
Much work has recently gone into supporting block device integrity data
(sometimes called "metadata") in Linux. Many NVMe devices these days
support metadata transfers and/or automatic protection information
generation and verification. However, ublk devices can't yet advertise
integrity data capabilities. This patch series wires up support for
integrity data in ublk. The ublk feature is referred to as "integrity"
rather than "metadata" to match the block layer's name for it and to
avoid confusion with the existing and unrelated UBLK_IO_F_META.
To advertise support for integrity data, a ublk server fills out the
struct ublk_params's integrity field and sets UBLK_PARAM_TYPE_INTEGRITY.
The struct ublk_param_integrity flags and csum_type fields use the
existing LBMD_PI_* constants from the linux/fs.h UAPI header. The ublk
driver fills out a corresponding struct blk_integrity.
When a request with integrity data is issued to the ublk device, the
ublk driver sets UBLK_IO_F_INTEGRITY in struct ublksrv_io_desc's
op_flags field. This is necessary for a ublk server for which
bi_offload_capable() returns true to distinguish requests with integrity
data from those without.
Integrity data transfers can currently only be performed via the ublk
user copy mechanism. The overhead of zero-copy buffer registration makes
it less appealing for the small transfers typical of integrity data.
Additionally, neither io_uring NVMe passthru nor IORING_RW_ATTR_FLAG_PI
currently allow an io_uring registered buffer for the integrity data.
The ki_pos field of the struct kiocb passed to the user copy
->{read,write}_iter() callback gains a bit UBLKSRV_IO_INTEGRITY_FLAG for
a ublk server to indicate whether to access the request's data or
integrity data.
v2:
- Communicate BIP_CHECK_* flags and expected reftag seed and app tag to
ublk server
- Add UBLK_F_INTEGRITY feature flag (Ming)
- Don't change the definition of UBLKSRV_IO_BUF_TOTAL_BITS (Ming)
- Drop patches already applied
- Add Reviewed-by tags
Caleb Sander Mateos (16):
blk-integrity: take const pointer in blk_integrity_rq()
ublk: move ublk flag check functions earlier
ublk: set request integrity params in ublksrv_io_desc
ublk: add ublk_copy_user_bvec() helper
ublk: split out ublk_user_copy() helper
ublk: inline ublk_check_and_get_req() into ublk_user_copy()
ublk: move offset check out of __ublk_check_and_get_req()
ublk: optimize ublk_user_copy() on daemon task
selftests: ublk: display UBLK_F_INTEGRITY support
selftests: ublk: add utility to get block device metadata size
selftests: ublk: add kublk support for integrity params
selftests: ublk: implement integrity user copy in kublk
selftests: ublk: support non-O_DIRECT backing files
selftests: ublk: add integrity data support to loop target
selftests: ublk: add integrity params test
selftests: ublk: add end-to-end integrity test
Stanley Zhang (3):
ublk: support UBLK_PARAM_TYPE_INTEGRITY in device creation
ublk: implement integrity user copy
ublk: support UBLK_F_INTEGRITY
drivers/block/ublk_drv.c | 387 ++++++++++++++-----
include/linux/blk-integrity.h | 6 +-
include/uapi/linux/ublk_cmd.h | 49 ++-
tools/testing/selftests/ublk/Makefile | 6 +-
tools/testing/selftests/ublk/common.c | 4 +-
tools/testing/selftests/ublk/fault_inject.c | 1 +
tools/testing/selftests/ublk/file_backed.c | 61 ++-
tools/testing/selftests/ublk/kublk.c | 90 ++++-
tools/testing/selftests/ublk/kublk.h | 37 +-
tools/testing/selftests/ublk/metadata_size.c | 36 ++
tools/testing/selftests/ublk/null.c | 1 +
tools/testing/selftests/ublk/stripe.c | 6 +-
tools/testing/selftests/ublk/test_common.sh | 10 +
tools/testing/selftests/ublk/test_loop_08.sh | 111 ++++++
tools/testing/selftests/ublk/test_null_04.sh | 166 ++++++++
15 files changed, 833 insertions(+), 138 deletions(-)
create mode 100644 tools/testing/selftests/ublk/metadata_size.c
create mode 100755 tools/testing/selftests/ublk/test_loop_08.sh
create mode 100755 tools/testing/selftests/ublk/test_null_04.sh
--
2.45.2
This unintended LRU eviction issue was observed while developing the
selftest for
"[PATCH bpf-next v10 0/8] bpf: Introduce BPF_F_CPU and BPF_F_ALL_CPUS flags for percpu maps" [1].
When updating an existing element in lru_hash or lru_percpu_hash maps,
the current implementation calls prealloc_lru_pop() to get a new node
before checking if the key already exists. If the map is full, this
triggers LRU eviction and removes an existing element, even though the
update operation only needs to modify the value in-place.
In the selftest of the aforementioned patch, this was to be worked around by
reserving an extra entry to
void triggering eviction in __htab_lru_percpu_map_update_elem(). However, the
underlying issue remains problematic because:
1. Users may unexpectedly lose entries when updating existing keys in a
full map.
2. The eviction overhead is unnecessary for existing key updates.
This patchset fixes the issue by first checking if the key exists before
allocating a new node. If the key is found, update the value using the extra
LRU node without triggering any eviction. Only proceed with node allocation
if the key does not exist.
Links:
[1] https://lore.kernel.org/bpf/20251117162033.6296-1-leon.hwang@linux.dev/
Changes:
v2 -> v3:
* Rebase onto the latest tree to fix CI build failures.
* Free special fields of 'l_old' on the non-error path (per bot).
v1 -> v2:
* Tidy hash handling in LRU code.
* Factor out bpf_lru_node_reset_state helper.
* Factor out bpf_lru_move_next_inactive_rotation helper.
* Update element using preallocated extra elements in order to avoid
breaking the update atomicity (per Alexei).
* Check values on other CPUs in tests (per bot).
* v1: https://lore.kernel.org/bpf/20251202153032.10118-1-leon.hwang@linux.dev/
Leon Hwang (5):
bpf: lru: Tidy hash handling in LRU code
bpf: lru: Factor out bpf_lru_node_reset_state helper
bpf: lru: Factor out bpf_lru_move_next_inactive_rotation helper
bpf: lru: Fix unintended eviction when updating lru hash maps
selftests/bpf: Add tests to verify no unintended eviction when
updating lru_[percpu_,]hash maps
kernel/bpf/bpf_lru_list.c | 228 ++++++++++++++----
kernel/bpf/bpf_lru_list.h | 10 +-
kernel/bpf/hashtab.c | 96 +++++++-
.../selftests/bpf/prog_tests/htab_update.c | 129 ++++++++++
4 files changed, 408 insertions(+), 55 deletions(-)
--
2.52.0