On Wed, May 07, 2025 at 03:49:42PM -0600, Uday Shankar wrote:
+Load balancing
+--------------
+A simple approach to designing a ublk server might involve selecting a
+number of I/O handler threads N, creating devices with N queues, and
+pairing up I/O handler threads with queues, so that each thread gets a
+unique qid, and it issues ``FETCH_REQ``s against all tags for that qid.
``FETCH_REQ``\s (escape s)
+Indeed, before the introduction of the ``UBLK_F_RR_TAGS`` feature, this
+was essentially the only option (*)
Use reST footnotes syntax, i.e.:
---- >8 ----
diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
index 440b63be4ea8b6..b1d29fceff4e80 100644
--- a/Documentation/block/ublk.rst
+++ b/Documentation/block/ublk.rst
@@ -325,7 +325,7 @@ number of I/O handler threads N, creating devices with N queues, and
 pairing up I/O handler threads with queues, so that each thread gets a
 unique qid, and it issues ``FETCH_REQ``\s against all tags for that qid.
 Indeed, before the introduction of the ``UBLK_F_RR_TAGS`` feature, this
-was essentially the only option (*)
+was essentially the only option [#]_
 
 This approach can run into performance issues under imbalanced load.
 This architecture taken together with the `blk-mq architecture
@@ -368,8 +368,8 @@ With this setup, I/O submitted on a CPU which maps to queue 0 will be
 balanced across all threads instead of all landing on the same thread.
 Thus, a potential bottleneck is avoided.
 
-(*) technically, one I/O handling thread could service multiple queues
-if it wanted to, but that doesn't help with imbalanced load
+.. [#] Technically, one I/O handling thread could service multiple queues
+   if it wanted to, but that doesn't help with imbalanced load
 
 Zero copy
 ---------
+This approach can run into performance issues under imbalanced load.
+This architecture taken together with the `blk-mq architecture
+https://docs.kernel.org/block/blk-mq.html`_ implies that there is a
This architecture, taken together with the :doc:`blk-mq architecture </block/blk-mq>`, implies that ...
+fixed mapping from I/O submission CPU to the ublk server thread that
+handles it. If the workload is CPU-bottlenecked, only allowing one ublk
+server thread to handle all the I/O generated from a single CPU can
+limit peak bandwidth.

<snipped>...
+With these changes, a ublk server can balance load as follows:
+- create the device with ``UBLK_F_RR_TAGS`` set in
+  ``ublksrv_ctrl_dev_info::flags`` when issuing the ``ADD_DEV`` command
+- issue ``FETCH_REQ``s from ublk server threads to (qid,tag) pairs in
+  a round-robin manner. For example, for a device configured with
+  ``nr_hw_queues=2`` and ``queue_depth=4``, and a ublk server having 4
+  I/O handling threads, ``FETCH_REQ``s could be issued as follows, where
+  each entry in the table is the pair (``ublksrv_io_cmd::q_id``,
+  ``ublksrv_io_cmd::tag``) in the payload of the ``FETCH_REQ``.
s/``FETCH_REQ``s/``FETCH_REQ``\s/ (escape the s after ``FETCH_REQ``, here and in the other occurrences above).
+  ======== ======== ======== ========
+  thread 0 thread 1 thread 2 thread 3
+  ======== ======== ======== ========
+  (0, 0)   (0, 1)   (0, 2)   (0, 3)
+  (1, 3)   (1, 0)   (1, 1)   (1, 2)
The table is missing its bottom border. Add it, i.e.:
---- >8 ----
diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
index e9cbabdd69c553..dc6fdfedba9ab4 100644
--- a/Documentation/block/ublk.rst
+++ b/Documentation/block/ublk.rst
@@ -362,6 +362,7 @@ With these changes, a ublk server can balance load as follows:
   ======== ======== ======== ========
   (0, 0)   (0, 1)   (0, 2)   (0, 3)
   (1, 3)   (1, 0)   (1, 1)   (1, 2)
+  ======== ======== ======== ========
 
 With this setup, I/O submitted on a CPU which maps to queue 0 will be
 balanced across all threads instead of all landing on the same thread.
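
As an aside, it might help readers to see one rule that produces the (qid, tag) assignment in the example table. The sketch below is only an illustration: ``rr_thread_for()`` and ``rr_check_example()`` are hypothetical names, not part of the ublk UAPI or of the patch, and the "shift the starting thread by q_id" rule is merely one assumption that happens to match the table for ``nr_hw_queues=2``, ``queue_depth=4`` and 4 threads.

```c
#include <assert.h>

/*
 * Hypothetical helper (not ublk UAPI): choose which I/O handler thread
 * issues FETCH_REQ for a given (q_id, tag) pair. Tags of each queue are
 * spread over the threads round-robin, and the starting thread is
 * shifted by q_id so that queues do not all start at thread 0.
 */
static unsigned int rr_thread_for(unsigned int q_id, unsigned int tag,
				  unsigned int nr_threads)
{
	return (q_id + tag) % nr_threads;
}

/*
 * Check the rule against the example table from the documentation:
 * 2 queues, queue depth 4, 4 I/O handler threads.
 */
static void rr_check_example(void)
{
	static const unsigned int expect[2][4] = {
		{ 0, 1, 2, 3 },	/* q_id 0: tag t goes to thread t */
		{ 1, 2, 3, 0 },	/* q_id 1: tag t goes to thread (t + 1) % 4 */
	};
	unsigned int q, tag;

	for (q = 0; q < 2; q++)
		for (tag = 0; tag < 4; tag++)
			assert(rr_thread_for(q, tag, 4) == expect[q][tag]);
}
```

With this rule each thread ends up owning one tag from every queue, which is what lets I/O submitted to a single queue fan out across all threads. Any other assignment that spreads each queue's tags over the threads would presumably balance just as well.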
Thanks.