On Tue, 15 Apr 2025 18:58:51 -0700, akpm@linux-foundation.org wrote:
On Tue, 15 Apr 2025 17:02:32 +0800 alexjlzheng@gmail.com wrote:
From: Jinliang Zheng <alexjlzheng@tencent.com>
In the dirty_ratio_handler() function, vm_dirty_bytes must be set to zero before calling writeback_set_ratelimit(), as global_dirty_limits() always prioritizes the value of vm_dirty_bytes.
Can you please tell us precisely where global_dirty_limits() prioritizes vm_dirty_bytes? I spent a while chasing code and didn't see how global_dirty_limits() gets to node_dirty_ok()(?).
Thank you for your reply.
It's domain_dirty_limits() that's relevant here, not node_dirty_ok:
dirty_ratio_handler
  writeback_set_ratelimit
    global_dirty_limits(&dirty_thresh)    <- ratelimit_pages based on dirty_thresh
      domain_dirty_limits
        if (bytes)                        <- bytes = vm_dirty_bytes <--------+
          thresh = f1(bytes)              <- prioritizes vm_dirty_bytes      |
        else                                                                 |
          thresh = f2(ratio)                                                 |
    ratelimit_pages = f3(dirty_thresh)                                       |
  vm_dirty_bytes = 0                      <- it's late! ---------------------+
That causes ratelimit_pages to still use the value calculated based on vm_dirty_bytes, which is wrong now.
Fixes: 9d823e8f6b1b ("writeback: per task dirty rate limit")
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: MengEn Sun <mengensun@tencent.com>
Cc: stable@vger.kernel.org
Please, as always, provide a description of the userspace-visible effects of this bug?
The impact visible to userspace is difficult to capture directly because there is no procfs/sysfs interface exported to user space. However, it will have a real impact on the balance of dirty pages.
For example:

1. By default, we have vm_dirty_ratio=40, vm_dirty_bytes=0
2. echo 8192 > dirty_bytes, then vm_dirty_bytes=8192, vm_dirty_ratio=0, and
   ratelimit_pages is calculated based on vm_dirty_bytes now.
3. echo 20 > dirty_ratio, then since vm_dirty_bytes is not reset to zero when
   writeback_set_ratelimit() -> global_dirty_limits() -> domain_dirty_limits()
   is called, ratelimit_pages is still calculated based on vm_dirty_bytes
   instead of vm_dirty_ratio.

This does not conform to the actual intention of the user.
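To make the ordering problem concrete, here is a minimal userspace sketch that replays the three steps. It is a toy model, not the kernel code: every name ending in _model, the pretend amount of dirtyable memory, and the formulas standing in for f1/f2/f3 are assumptions chosen only for illustration.

```c
#include <assert.h>

/* Toy userspace model of the ordering issue; NOT the kernel implementation.
 * All *_model / *_MODEL names and the formulas below are hypothetical. */

#define PAGE_SIZE_MODEL        4096UL
#define DIRTYABLE_PAGES_MODEL  1000000UL  /* pretend dirtyable memory, in pages */

static unsigned long vm_dirty_bytes;
static int vm_dirty_ratio;
static unsigned long ratelimit_pages;

/* Stand-in for domain_dirty_limits(): a non-zero vm_dirty_bytes wins. */
static unsigned long dirty_thresh_model(void)
{
	if (vm_dirty_bytes)  /* placeholder f1(bytes): round up to pages */
		return (vm_dirty_bytes + PAGE_SIZE_MODEL - 1) / PAGE_SIZE_MODEL;
	/* placeholder f2(ratio): percentage of dirtyable pages */
	return DIRTYABLE_PAGES_MODEL * (unsigned long)vm_dirty_ratio / 100;
}

/* Stand-in for writeback_set_ratelimit(): placeholder f3 with a floor of 16. */
static void set_ratelimit_model(void)
{
	ratelimit_pages = dirty_thresh_model() / 32;
	if (ratelimit_pages < 16)
		ratelimit_pages = 16;
}

/* Models "echo N > dirty_bytes". */
static void write_dirty_bytes_model(unsigned long bytes)
{
	vm_dirty_bytes = bytes;
	vm_dirty_ratio = 0;
	set_ratelimit_model();
}

/* Models "echo N > dirty_ratio".
 * clear_bytes_first == 0: ratelimit is recomputed before vm_dirty_bytes is
 *                         cleared (the ordering described as buggy).
 * clear_bytes_first != 0: vm_dirty_bytes is cleared first (proposed ordering). */
static void write_dirty_ratio_model(int ratio, int clear_bytes_first)
{
	vm_dirty_ratio = ratio;
	if (clear_bytes_first)
		vm_dirty_bytes = 0;
	set_ratelimit_model();
	vm_dirty_bytes = 0;
}

/* Replays steps 1-3 and returns the resulting ratelimit_pages. */
unsigned long replay_model(int clear_bytes_first)
{
	/* Step 1: starting state, ratio=40 and bytes=0. */
	vm_dirty_bytes = 0;
	vm_dirty_ratio = 40;
	set_ratelimit_model();

	write_dirty_bytes_model(8192);                   /* Step 2 */
	write_dirty_ratio_model(20, clear_bytes_first);  /* Step 3 */

	assert(ratelimit_pages >= 16);  /* floor applied by the model */
	return ratelimit_pages;
}
```

With these placeholder formulas, replay_model(0) returns 16 (still derived from the stale 8192 bytes), while replay_model(1) returns 6250 (derived from ratio=20). The absolute numbers are meaningless outside this model; only the fact that the two orderings disagree matters.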
thanks,
Jinliang Zheng :)