Hi Andres,
With this commit applied to the 6.1 and later kernels (others not tested), the iowait time (the "wa" field in top) in an ARM64 build running on a 4-core CPU (a Raspberry Pi 4 B) increases to 25%, as if one core were permanently blocked on I/O. The change can be observed after installing mariadb-server (no configuration or use is required). After reverting just this commit, "wa" drops to zero again.
I can believe that this change hasn't negatively affected performance, but the result is misleading. I also think it's pushing the boundaries of what a back-port to stable should do.
Phil
On Mon, Jul 24, 2023 at 04:35:43PM +0100, Phil Elwell wrote:
> Hi Andres,
>
> With this commit applied to the 6.1 and later kernels (others not tested), the iowait time (the "wa" field in top) in an ARM64 build running on a 4-core CPU (a Raspberry Pi 4 B) increases to 25%, as if one core were permanently blocked on I/O. The change can be observed after installing mariadb-server (no configuration or use is required). After reverting just this commit, "wa" drops to zero again.
This has been discussed already: https://lore.kernel.org/r/12251678.O9o76ZdvQC@natalenko.name
It's not a bug: mariadb does have pending I/O, so the report is correct, but the CPU isn't blocked at all.
thanks,
greg k-h
On 7/24/23 9:48 AM, Greg KH wrote:
> On Mon, Jul 24, 2023 at 04:35:43PM +0100, Phil Elwell wrote:
>> Hi Andres,
>>
>> With this commit applied to the 6.1 and later kernels (others not tested), the iowait time (the "wa" field in top) in an ARM64 build running on a 4-core CPU (a Raspberry Pi 4 B) increases to 25%, as if one core were permanently blocked on I/O. The change can be observed after installing mariadb-server (no configuration or use is required). After reverting just this commit, "wa" drops to zero again.
>
> This has been discussed already: https://lore.kernel.org/r/12251678.O9o76ZdvQC@natalenko.name
>
> It's not a bug: mariadb does have pending I/O, so the report is correct, but the CPU isn't blocked at all.
Indeed - the only thing I can think of is that perhaps mariadb has a separate thread waiting on the ring in perpetuity, regardless of whether or not it currently has IO.
But yes, this is very much ado about nothing...
On 7/24/23 9:50 AM, Jens Axboe wrote:
> On 7/24/23 9:48 AM, Greg KH wrote:
>> On Mon, Jul 24, 2023 at 04:35:43PM +0100, Phil Elwell wrote:
>>> Hi Andres,
>>>
>>> With this commit applied to the 6.1 and later kernels (others not tested), the iowait time (the "wa" field in top) in an ARM64 build running on a 4-core CPU (a Raspberry Pi 4 B) increases to 25%, as if one core were permanently blocked on I/O. The change can be observed after installing mariadb-server (no configuration or use is required). After reverting just this commit, "wa" drops to zero again.
>>
>> This has been discussed already: https://lore.kernel.org/r/12251678.O9o76ZdvQC@natalenko.name
>>
>> It's not a bug: mariadb does have pending I/O, so the report is correct, but the CPU isn't blocked at all.
>
> Indeed - the only thing I can think of is that perhaps mariadb has a separate thread waiting on the ring in perpetuity, regardless of whether or not it currently has IO.
>
> But yes, this is very much ado about nothing...
Current -git and having mariadb idle:
Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     all    0.00    0.00    0.04   12.47    0.04    0.00    0.00    0.00    0.00   87.44
Average:       0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       2    0.00    0.00    0.00    0.00    0.33    0.00    0.00    0.00    0.00   99.67
Average:       3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       4    0.00    0.00    0.33    0.00    0.00    0.00    0.00    0.00    0.00   99.67
Average:       5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       6    0.00    0.00    0.00  100.00    0.00    0.00    0.00    0.00    0.00    0.00
Average:       7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
which shows 100% iowait on one CPU, as mariadb has a thread waiting on IO. That is obviously a valid use case if you split submission and completion into separate threads: the latter is then always waiting for something to process.
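To make that pattern concrete, here is a hypothetical liburing sketch (not mariadb's actual code; assumes liburing and pthreads, built with -luring -lpthread). The reaper thread sits in io_uring_wait_cqe() in perpetuity:

#include <liburing.h>
#include <pthread.h>

static struct io_uring ring;

/* Completion thread: blocks on the CQ ring forever, even when nothing is
 * in flight. This is the wait that shows up as 100% iowait above. */
static void *reaper(void *arg)
{
	struct io_uring_cqe *cqe;

	for (;;) {
		if (io_uring_wait_cqe(&ring, &cqe) < 0)
			break;
		/* ... hand the completion off to a worker ... */
		io_uring_cqe_seen(&ring, cqe);
	}
	return NULL;
}

int main(void)
{
	pthread_t t;

	if (io_uring_queue_init(8, &ring, 0) < 0)
		return 1;
	pthread_create(&t, NULL, reaper, NULL);
	/* the submission side would prepare SQEs and call io_uring_submit() */
	pthread_join(t, NULL);
	io_uring_queue_exit(&ring);
	return 0;
}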
With the suggested patch, we do eliminate that case and the iowait on that task is gone. Here's current -git with the patch and mariadb also running:
09:53:49 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
09:53:50 AM  all    0.00    0.00    0.00    0.00    0.00    0.75    0.00    0.00    0.00   99.25
09:53:50 AM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
09:53:50 AM    1    0.00    0.00    0.00    0.00    0.00    1.00    0.00    0.00    0.00   99.00
09:53:50 AM    2    0.00    0.00    0.00    0.00    0.00    1.00    0.00    0.00    0.00   99.00
09:53:50 AM    3    0.00    0.00    0.00    0.00    0.00    1.00    0.00    0.00    0.00   99.00
09:53:50 AM    4    0.00    0.00    0.00    0.00    0.00    0.99    0.00    0.00    0.00   99.01
09:53:50 AM    5    0.00    0.00    0.00    0.00    0.00    1.00    0.00    0.00    0.00   99.00
09:53:50 AM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
09:53:50 AM    7    0.00    0.00    0.00    0.00    0.00    1.00    0.00    0.00    0.00   99.00
Even though I don't think this is an actual problem, it is a bit confusing that you get 100% iowait while waiting without having IO pending. So I do think the suggested patch is probably worth pursuing. I'll post it and hopefully have Andres test it too, if he's available.
Jens, Greg,
On Mon, 24 Jul 2023 at 16:58, Jens Axboe <axboe@kernel.dk> wrote:
> On 7/24/23 9:50 AM, Jens Axboe wrote:
>> On 7/24/23 9:48 AM, Greg KH wrote:
>>> On Mon, Jul 24, 2023 at 04:35:43PM +0100, Phil Elwell wrote:
>>>> Hi Andres,
>>>>
>>>> With this commit applied to the 6.1 and later kernels (others not tested), the iowait time (the "wa" field in top) in an ARM64 build running on a 4-core CPU (a Raspberry Pi 4 B) increases to 25%, as if one core were permanently blocked on I/O. The change can be observed after installing mariadb-server (no configuration or use is required). After reverting just this commit, "wa" drops to zero again.
>>>
>>> This has been discussed already: https://lore.kernel.org/r/12251678.O9o76ZdvQC@natalenko.name

Sorry - a brief search failed to find that.

>>> It's not a bug: mariadb does have pending I/O, so the report is correct, but the CPU isn't blocked at all.
>>
>> Indeed - the only thing I can think of is that perhaps mariadb has a separate thread waiting on the ring in perpetuity, regardless of whether or not it currently has IO.
>>
>> But yes, this is very much ado about nothing...
>
> Current -git and having mariadb idle:
>
> Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> Average:     all    0.00    0.00    0.04   12.47    0.04    0.00    0.00    0.00    0.00   87.44
> Average:       0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> Average:       1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> Average:       2    0.00    0.00    0.00    0.00    0.33    0.00    0.00    0.00    0.00   99.67
> Average:       3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> Average:       4    0.00    0.00    0.33    0.00    0.00    0.00    0.00    0.00    0.00   99.67
> Average:       5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> Average:       6    0.00    0.00    0.00  100.00    0.00    0.00    0.00    0.00    0.00    0.00
> Average:       7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
>
> which shows 100% iowait on one CPU, as mariadb has a thread waiting on IO. That is obviously a valid use case if you split submission and completion into separate threads: the latter is then always waiting for something to process.
>
> With the suggested patch, we do eliminate that case and the iowait on that task is gone. Here's current -git with the patch and mariadb also running:
>
> 09:53:49 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> 09:53:50 AM  all    0.00    0.00    0.00    0.00    0.00    0.75    0.00    0.00    0.00   99.25
> 09:53:50 AM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> 09:53:50 AM    1    0.00    0.00    0.00    0.00    0.00    1.00    0.00    0.00    0.00   99.00
> 09:53:50 AM    2    0.00    0.00    0.00    0.00    0.00    1.00    0.00    0.00    0.00   99.00
> 09:53:50 AM    3    0.00    0.00    0.00    0.00    0.00    1.00    0.00    0.00    0.00   99.00
> 09:53:50 AM    4    0.00    0.00    0.00    0.00    0.00    0.99    0.00    0.00    0.00   99.01
> 09:53:50 AM    5    0.00    0.00    0.00    0.00    0.00    1.00    0.00    0.00    0.00   99.00
> 09:53:50 AM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> 09:53:50 AM    7    0.00    0.00    0.00    0.00    0.00    1.00    0.00    0.00    0.00   99.00
>
> Even though I don't think this is an actual problem, it is a bit confusing that you get 100% iowait while waiting without having IO pending. So I do think the suggested patch is probably worth pursuing. I'll post it and hopefully have Andres test it too, if he's available.
If you CC me I'll happily test it for you.
Thanks,
Phil
> --
> Jens Axboe
On 7/24/23 10:07 AM, Phil Elwell wrote:
>> Even though I don't think this is an actual problem, it is a bit confusing that you get 100% iowait while waiting without having IO pending. So I do think the suggested patch is probably worth pursuing. I'll post it and hopefully have Andres test it too, if he's available.
>
> If you CC me I'll happily test it for you.
Here it is.
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 89a611541bc4..f4591b912ea8 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2493,11 +2493,20 @@ int io_run_task_work_sig(struct io_ring_ctx *ctx)
 	return 0;
 }
 
+static bool current_pending_io(void)
+{
+	struct io_uring_task *tctx = current->io_uring;
+
+	if (!tctx)
+		return false;
+	return percpu_counter_read_positive(&tctx->inflight);
+}
+
 /* when returns >0, the caller should retry */
 static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 					  struct io_wait_queue *iowq)
 {
-	int token, ret;
+	int io_wait, ret;
 
 	if (unlikely(READ_ONCE(ctx->check_cq)))
 		return 1;
@@ -2511,17 +2520,19 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 		return 0;
 
 	/*
-	 * Use io_schedule_prepare/finish, so cpufreq can take into account
-	 * that the task is waiting for IO - turns out to be important for low
-	 * QD IO.
+	 * Mark us as being in io_wait if we have pending requests, so cpufreq
+	 * can take into account that the task is waiting for IO - turns out
+	 * to be important for low QD IO.
 	 */
-	token = io_schedule_prepare();
+	io_wait = current->in_iowait;
+	if (current_pending_io())
+		current->in_iowait = 1;
 	ret = 0;
 	if (iowq->timeout == KTIME_MAX)
 		schedule();
 	else if (!schedule_hrtimeout(&iowq->timeout, HRTIMER_MODE_ABS))
 		ret = -ETIME;
-	io_schedule_finish(token);
+	current->in_iowait = io_wait;
 	return ret;
 }
Hi Jens,
On Mon, 24 Jul 2023 at 17:08, Jens Axboe <axboe@kernel.dk> wrote:
> On 7/24/23 10:07 AM, Phil Elwell wrote:
>>> Even though I don't think this is an actual problem, it is a bit confusing that you get 100% iowait while waiting without having IO pending. So I do think the suggested patch is probably worth pursuing. I'll post it and hopefully have Andres test it too, if he's available.
>>
>> If you CC me I'll happily test it for you.
>
> Here it is.

< snip >
Thanks, that works for me on top of 6.5-rc3. Going to 6.1 is a non-trivial (for me) back-port - the switch from "ret = 0" in 6.5 to "ret = 1" in 6.1 is surprising.
Phil
On 7/24/23 10:48 AM, Phil Elwell wrote:
> Hi Jens,
>
> On Mon, 24 Jul 2023 at 17:08, Jens Axboe <axboe@kernel.dk> wrote:
>> On 7/24/23 10:07 AM, Phil Elwell wrote:
>>>> Even though I don't think this is an actual problem, it is a bit confusing that you get 100% iowait while waiting without having IO pending. So I do think the suggested patch is probably worth pursuing. I'll post it and hopefully have Andres test it too, if he's available.
>>>
>>> If you CC me I'll happily test it for you.
>>
>> Here it is.
>
> < snip >
>
> Thanks, that works for me on top of 6.5-rc3. Going to 6.1 is a non-trivial (for me) back-port - the switch from "ret = 0" in 6.5 to "ret = 1" in 6.1 is surprising.
Great, thanks for testing. I'll take care of the stable backports once the patch lands in upstream -git later this week.
On 7/24/23 16:58, Jens Axboe wrote:
> On 7/24/23 9:50 AM, Jens Axboe wrote:
>> On 7/24/23 9:48 AM, Greg KH wrote:
>>> On Mon, Jul 24, 2023 at 04:35:43PM +0100, Phil Elwell wrote:
>>>> Hi Andres,
>>>>
>>>> With this commit applied to the 6.1 and later kernels (others not tested), the iowait time (the "wa" field in top) in an ARM64 build running on a 4-core CPU (a Raspberry Pi 4 B) increases to 25%, as if one core were permanently blocked on I/O. The change can be observed after installing mariadb-server (no configuration or use is required). After reverting just this commit, "wa" drops to zero again.
>>>
>>> This has been discussed already: https://lore.kernel.org/r/12251678.O9o76ZdvQC@natalenko.name
>>>
>>> It's not a bug: mariadb does have pending I/O, so the report is correct, but the CPU isn't blocked at all.
>>
>> Indeed - the only thing I can think of is that perhaps mariadb has a separate thread waiting on the ring in perpetuity, regardless of whether or not it currently has IO.
>>
>> But yes, this is very much ado about nothing...
>
> Current -git and having mariadb idle:
>
> Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> Average:     all    0.00    0.00    0.04   12.47    0.04    0.00    0.00    0.00    0.00   87.44
> Average:       0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> Average:       1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> Average:       2    0.00    0.00    0.00    0.00    0.33    0.00    0.00    0.00    0.00   99.67
> Average:       3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> Average:       4    0.00    0.00    0.33    0.00    0.00    0.00    0.00    0.00    0.00   99.67
> Average:       5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> Average:       6    0.00    0.00    0.00  100.00    0.00    0.00    0.00    0.00    0.00    0.00
> Average:       7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
>
> which shows 100% iowait on one CPU, as mariadb has a thread waiting on IO. That is obviously a valid use case if you split submission and completion into separate threads: the latter is then always waiting for something to process.
>
> With the suggested patch, we do eliminate that case and the iowait on that task is gone. Here's current -git with the patch and mariadb also running:
>
> 09:53:49 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> 09:53:50 AM  all    0.00    0.00    0.00    0.00    0.00    0.75    0.00    0.00    0.00   99.25
> 09:53:50 AM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> 09:53:50 AM    1    0.00    0.00    0.00    0.00    0.00    1.00    0.00    0.00    0.00   99.00
> 09:53:50 AM    2    0.00    0.00    0.00    0.00    0.00    1.00    0.00    0.00    0.00   99.00
> 09:53:50 AM    3    0.00    0.00    0.00    0.00    0.00    1.00    0.00    0.00    0.00   99.00
> 09:53:50 AM    4    0.00    0.00    0.00    0.00    0.00    0.99    0.00    0.00    0.00   99.01
> 09:53:50 AM    5    0.00    0.00    0.00    0.00    0.00    1.00    0.00    0.00    0.00   99.00
> 09:53:50 AM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
> 09:53:50 AM    7    0.00    0.00    0.00    0.00    0.00    1.00    0.00    0.00    0.00   99.00
>
> Even though I don't think this is an actual problem, it is a bit confusing that you get 100% iowait while waiting without having IO pending. So I do think the suggested patch is probably worth pursuing. I'll post it and hopefully have Andres test it too, if he's available.
Emmm, what's the definition of the "IO" state? Unless we can say exactly what it is, there will be no end to adjustments, because I can easily argue that CQ waiting by itself is IO. Do we consider sleep(N) to be "IO"? I don't think the kernel uses io_schedule() around that, so it'd be different from io_uring waiting for a timeout request. What about epoll waiting, etc.?
Pavel Begunkov <asml.silence@gmail.com> writes:
> On 7/24/23 16:58, Jens Axboe wrote:
>> Even though I don't think this is an actual problem, it is a bit confusing that you get 100% iowait while waiting without having IO pending. So I do think the suggested patch is probably worth pursuing. I'll post it and hopefully have Andres test it too, if he's available.
>
> Emmm, what's the definition of the "IO" state? Unless we can say exactly what it is, there will be no end to adjustments, because I can easily argue that CQ waiting by itself is IO. Do we consider sleep(N) to be "IO"? I don't think the kernel uses io_schedule() around that, so it'd be different from io_uring waiting for a timeout request. What about epoll waiting, etc.?
See Documentation/filesystems/proc.rst (and mainly commit 9c240d757658 ("Change the document about iowait")):
  - iowait: In a word, iowait stands for waiting for I/O to complete. But there are several problems:

    1. CPU will not wait for I/O to complete, iowait is the time that a task is waiting for I/O to complete. When CPU goes into idle state for outstanding task I/O, another task will be scheduled on this CPU.
    2. In a multi-core CPU, the task waiting for I/O to complete is not running on any CPU, so the iowait of each CPU is difficult to calculate.
    3. The value of iowait field in /proc/stat will decrease in certain conditions.

  So, the iowait is not reliable by reading from /proc/stat.
Also, vmstat(8):

  wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.

iostat/mpstat man pages:

  %iowait: Show the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.

sar(1):

  %iowait: Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
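All of those views come from the same place: the fifth field of the "cpu" lines in /proc/stat, accumulated in USER_HZ ticks, which top/mpstat/sar sample and diff over time. A minimal reader, purely for illustration:

/* Illustration only: print the aggregate iowait counter, the fifth field
 * of the "cpu" line in /proc/stat (units of USER_HZ ticks). */
#include <stdio.h>

int main(void)
{
	unsigned long long user, nice, sys, idle, iowait;
	FILE *f = fopen("/proc/stat", "r");

	if (!f)
		return 1;
	if (fscanf(f, "cpu %llu %llu %llu %llu %llu",
		   &user, &nice, &sys, &idle, &iowait) == 5)
		printf("iowait ticks: %llu\n", iowait);
	fclose(f);
	return 0;
}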
iowait was initially introduced in 2002 by Rik van Riel in historical git commit 7b88e5e0bdf25 ("[PATCH] "io wait" process accounting"). The changelog from akpm reads:
Patch from Rik adds "I/O wait" statistics to /proc/stat.
This allows us to determine how much system time is being spent awaiting IO completion. This is an important statistic, as it tends to directly subtract from job completion time.
procps-2.0.9 is OK with this, but doesn't report it.
I vaguely recall there was confusion from users about why the system was idle when running database workloads. Maybe Rik can remember more clearly.
Anyway, as you can see, the definition is murky, at best. I don't think we should overthink it. I agree with the principle of Jens' patch - let's just not surprise users with a change in behavior.
Cheers,
Jeff
On 7/24/23 9:35 AM, Phil Elwell wrote:
> Hi Andres,
>
> With this commit applied to the 6.1 and later kernels (others not tested), the iowait time (the "wa" field in top) in an ARM64 build running on a 4-core CPU (a Raspberry Pi 4 B) increases to 25%, as if one core were permanently blocked on I/O. The change can be observed after installing mariadb-server (no configuration or use is required). After reverting just this commit, "wa" drops to zero again.

There are a few other threads on this...

> I can believe that this change hasn't negatively affected performance, but the result is misleading. I also think it's pushing the boundaries of what a back-port to stable should do.
It's just a cosmetic thing, to be fair, and it makes quite a large difference on important cases. This is why it also went to stable, which btw was not Andres's decision at all. I've posted this patch in another thread as well, but here it is in this thread too - this will limit the cases that are marked as iowait.
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 89a611541bc4..f4591b912ea8 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2493,11 +2493,20 @@ int io_run_task_work_sig(struct io_ring_ctx *ctx)
 	return 0;
 }
 
+static bool current_pending_io(void)
+{
+	struct io_uring_task *tctx = current->io_uring;
+
+	if (!tctx)
+		return false;
+	return percpu_counter_read_positive(&tctx->inflight);
+}
+
 /* when returns >0, the caller should retry */
 static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 					  struct io_wait_queue *iowq)
 {
-	int token, ret;
+	int io_wait, ret;
 
 	if (unlikely(READ_ONCE(ctx->check_cq)))
 		return 1;
@@ -2511,17 +2520,19 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 		return 0;
 
 	/*
-	 * Use io_schedule_prepare/finish, so cpufreq can take into account
-	 * that the task is waiting for IO - turns out to be important for low
-	 * QD IO.
+	 * Mark us as being in io_wait if we have pending requests, so cpufreq
+	 * can take into account that the task is waiting for IO - turns out
+	 * to be important for low QD IO.
 	 */
-	token = io_schedule_prepare();
+	io_wait = current->in_iowait;
+	if (current_pending_io())
+		current->in_iowait = 1;
 	ret = 0;
 	if (iowq->timeout == KTIME_MAX)
 		schedule();
 	else if (!schedule_hrtimeout(&iowq->timeout, HRTIMER_MODE_ABS))
 		ret = -ETIME;
-	io_schedule_finish(token);
+	current->in_iowait = io_wait;
 	return ret;
 }
Hi,
On 2023-07-24 09:48:58 -0600, Jens Axboe wrote:
> On 7/24/23 9:35 AM, Phil Elwell wrote:
>> Hi Andres,
>>
>> With this commit applied to the 6.1 and later kernels (others not tested), the iowait time (the "wa" field in top) in an ARM64 build running on a 4-core CPU (a Raspberry Pi 4 B) increases to 25%, as if one core were permanently blocked on I/O. The change can be observed after installing mariadb-server (no configuration or use is required). After reverting just this commit, "wa" drops to zero again.
>
> There are a few other threads on this...
>
>> I can believe that this change hasn't negatively affected performance, but the result is misleading. I also think it's pushing the boundaries of what a back-port to stable should do.
FWIW, I think this is partially just mpstat reporting something quite bogus. It makes no sense to say that a CPU is 100% busy waiting for IO when the one process that is doing IO is just waiting.
> +static bool current_pending_io(void)
> +{
> +	struct io_uring_task *tctx = current->io_uring;
> +
> +	if (!tctx)
> +		return false;
> +	return percpu_counter_read_positive(&tctx->inflight);
> +}
> +
>  /* when returns >0, the caller should retry */
>  static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
>  					  struct io_wait_queue *iowq)
>  {
> -	int token, ret;
> +	int io_wait, ret;
>
>  	if (unlikely(READ_ONCE(ctx->check_cq)))
>  		return 1;
> @@ -2511,17 +2520,19 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
>  		return 0;
>
>  	/*
> -	 * Use io_schedule_prepare/finish, so cpufreq can take into account
> -	 * that the task is waiting for IO - turns out to be important for low
> -	 * QD IO.
> +	 * Mark us as being in io_wait if we have pending requests, so cpufreq
> +	 * can take into account that the task is waiting for IO - turns out
> +	 * to be important for low QD IO.
>  	 */
> -	token = io_schedule_prepare();
> +	io_wait = current->in_iowait;
I don't know the kernel "rules" around this, but ->in_iowait is only modified in kernel/sched, so it seemed a tad "unfriendly" to scribble on it here...
Building a kernel to test with the patch applied; I'll reboot into it once the call I am on has finished. Unfortunately the performance difference didn't reproduce nicely in a VM...
Greetings,
Andres Freund
On 7/24/23 10:16 AM, Andres Freund wrote:
> Hi,
>
> On 2023-07-24 09:48:58 -0600, Jens Axboe wrote:
>> On 7/24/23 9:35 AM, Phil Elwell wrote:
>>> Hi Andres,
>>>
>>> With this commit applied to the 6.1 and later kernels (others not tested), the iowait time (the "wa" field in top) in an ARM64 build running on a 4-core CPU (a Raspberry Pi 4 B) increases to 25%, as if one core were permanently blocked on I/O. The change can be observed after installing mariadb-server (no configuration or use is required). After reverting just this commit, "wa" drops to zero again.
>>
>> There are a few other threads on this...
>>
>>> I can believe that this change hasn't negatively affected performance, but the result is misleading. I also think it's pushing the boundaries of what a back-port to stable should do.
>
> FWIW, I think this is partially just mpstat reporting something quite bogus. It makes no sense to say that a CPU is 100% busy waiting for IO when the one process that is doing IO is just waiting.
Indeed... It really just means it's spending 100% of its time _waiting_ on IO, not that it's doing anything. This is largely to save myself from future emails on this subject - saving my own time.
>> +static bool current_pending_io(void)
>> +{
>> +	struct io_uring_task *tctx = current->io_uring;
>> +
>> +	if (!tctx)
>> +		return false;
>> +	return percpu_counter_read_positive(&tctx->inflight);
>> +}
>> +
>>  /* when returns >0, the caller should retry */
>>  static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
>>  					  struct io_wait_queue *iowq)
>>  {
>> -	int token, ret;
>> +	int io_wait, ret;
>>
>>  	if (unlikely(READ_ONCE(ctx->check_cq)))
>>  		return 1;
>> @@ -2511,17 +2520,19 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
>>  		return 0;
>>
>>  	/*
>> -	 * Use io_schedule_prepare/finish, so cpufreq can take into account
>> -	 * that the task is waiting for IO - turns out to be important for low
>> -	 * QD IO.
>> +	 * Mark us as being in io_wait if we have pending requests, so cpufreq
>> +	 * can take into account that the task is waiting for IO - turns out
>> +	 * to be important for low QD IO.
>>  	 */
>> -	token = io_schedule_prepare();
>> +	io_wait = current->in_iowait;
>
> I don't know the kernel "rules" around this, but ->in_iowait is only modified in kernel/sched, so it seemed a tad "unfriendly" to scribble on it here...
It's either that or add new helpers for this, at least for the initial one. Calling blk_flush_plug() (and with async == true, no less) is not something we need or want to do.
So we could add an io_schedule_prepare_noflush() for this, but it also seems silly to add a single-use helper for that, imho.
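For reference, the helpers being avoided look roughly like this (v6.4-era kernel/sched/core.c, quoted from memory - check the tree for the authoritative version):

/* io_schedule_prepare() both marks the task as in_iowait and flushes any
 * plugged block requests; the CQ wait path only wants the former. */
int io_schedule_prepare(void)
{
	int old_iowait = current->in_iowait;

	current->in_iowait = 1;
	blk_flush_plug(current->plug, true);	/* the async flush in question */
	return old_iowait;
}

void io_schedule_finish(int token)
{
	current->in_iowait = token;
}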
> Building a kernel to test with the patch applied; I'll reboot into it once the call I am on has finished. Unfortunately the performance difference didn't reproduce nicely in a VM...
Thanks!
Hi,
On 2023-07-24 09:16:56 -0700, Andres Freund wrote:
> Building a kernel to test with the patch applied; I'll reboot into it once the call I am on has finished. Unfortunately the performance difference didn't reproduce nicely in a VM...
Performance is good with the patch applied. Results are even slightly better, but I think that's likely just noise.
Greetings,
Andres Freund
On 7/24/23 11:24 AM, Andres Freund wrote:
> Hi,
>
> On 2023-07-24 09:16:56 -0700, Andres Freund wrote:
>> Building a kernel to test with the patch applied; I'll reboot into it once the call I am on has finished. Unfortunately the performance difference didn't reproduce nicely in a VM...
>
> Performance is good with the patch applied. Results are even slightly better, but I think that's likely just noise.
Could be - it's avoiding a few function calls and the flush, but it would probably take a pretty targeted test setup to show that for storage IO. Thanks for testing!