[BUG] There is a bug report that with zoned device and sectorsize is smaller than page size (aka, subpage), btrfs would crash with a very basic workload:
# getconfig PAGESIZE 16384 # mkfs.btrfs -f $dev -s 4k # mount $dev $mnt # $fsstress -w -n 8 -s 1707820327 -v -d $mnt # umount $mnt
The crash would look like this (with CONFIG_BTRFS_ASSERT enabled):
assertion failed: block_start != EXTENT_MAP_HOLE, in fs/btrfs/extent_io.c:1384 ------------[ cut here ]------------ kernel BUG at fs/btrfs/extent_io.c:1384! CPU: 0 PID: 872 Comm: kworker/u9:2 Tainted: G OE 6.8.0-rc3-custom+ #7 Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20231122-12.fc39 11/22/2023 Workqueue: writeback wb_workfn (flush-btrfs-8) pc : __extent_writepage_io+0x404/0x460 [btrfs] lr : __extent_writepage_io+0x404/0x460 [btrfs] Call trace: __extent_writepage_io+0x404/0x460 [btrfs] extent_write_locked_range+0x16c/0x460 [btrfs] run_delalloc_cow+0x88/0x118 [btrfs] btrfs_run_delalloc_range+0x128/0x228 [btrfs] writepage_delalloc+0xb8/0x178 [btrfs] __extent_writepage+0xc8/0x3a0 [btrfs] extent_write_cache_pages+0x1cc/0x460 [btrfs] extent_writepages+0x8c/0x120 [btrfs] btrfs_writepages+0x18/0x30 [btrfs] do_writepages+0x94/0x1f8 __writeback_single_inode+0x4c/0x388 writeback_sb_inodes+0x208/0x4b0 wb_writeback+0x118/0x3c0 wb_do_writeback+0xbc/0x388 wb_workfn+0x80/0x240 process_one_work+0x154/0x3c8 worker_thread+0x2bc/0x3e0 kthread+0xf4/0x108 ret_from_fork+0x10/0x20 Code: 9102c021 90000be0 91378000 9402bf53 (d4210000) ---[ end trace 0000000000000000 ]---
[CAUSE] There are several factors causing the problem:
1. __extent_writepage_io() requires all dirty ranges to have delalloc executed This can be solved by adding @start and @len parameter to only submit IO for a subset of the page, and update several involved helpers to do subpage checks.
So this is not a big deal.
2. Subpage only accepts for full page aligned ranges for extent_write_locked_range() For zoned device, regular COW is switched to utilize extent_write_locked_range() to submit the IO.
But the caller, run_delalloc_cow() can be called to run on a subpage range, e.g.
0 4K 8K 12K 16K |/////| |/////|
Where |///| is the dirtied range.
In that case, btrfs_run_delalloc_range() would call run_delalloc_cow(), which would call extent_write_locked_range() for [0, 4K), and unlock the whole [0, 16K) page.
But btrfs_run_delalloc_range() would again be called for range [8K, 12K), as there are still dirty range left. In that case, since the whole page is already unlocked by previous iteration, and would cause different ASSERT()s inside extent_write_locked_range().
That's also why compression for subpage cases require fully page aligned range.
[WORKAROUND] A proper fix requires some big changes to delalloc workload, to allow extent_write_locked_range() to handle multiple different entries with the same @locked_page.
So for now, disable read-write support for subpage zoned btrfs.
The problem can only be solved if subpage btrfs can handle subpage compression, which need quite some work on the delalloc procedure for the @locked_page handling.
Reported-by: HAN Yuwei hrx@bupt.moe Link: https://lore.kernel.org/all/1ACD2E3643008A17+da260584-2c7f-432a-9e22-9d390aa... CC: stable@vger.kernel.org # 5.10+ Signed-off-by: Qu Wenruo wqu@suse.com --- fs/btrfs/disk-io.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index c3ab268533ca..85cd23aebdd6 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3193,7 +3193,8 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount) * part of @locked_page. * That's also why compression for subpage only work for page aligned ranges. */ - if (fs_info->sectorsize < PAGE_SIZE && btrfs_is_zoned(fs_info) && is_rw_mount) { + if (fs_info->sectorsize < PAGE_SIZE && + btrfs_fs_incompat(fs_info, ZONED) && is_rw_mount) { btrfs_warn(fs_info, "no zoned read-write support for page size %lu with sectorsize %u", PAGE_SIZE, fs_info->sectorsize);
On 12.02.24 06:16, Qu Wenruo wrote:
Reported-by: HAN Yuwei hrx@bupt.moe Link: https://lore.kernel.org/all/1ACD2E3643008A17+da260584-2c7f-432a-9e22-9d390aa... CC: stable@vger.kernel.org # 5.10+ Signed-off-by: Qu Wenruo wqu@suse.com
fs/btrfs/disk-io.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index c3ab268533ca..85cd23aebdd6 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3193,7 +3193,8 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount) * part of @locked_page. * That's also why compression for subpage only work for page aligned ranges. */
- if (fs_info->sectorsize < PAGE_SIZE && btrfs_is_zoned(fs_info) && is_rw_mount) {
- if (fs_info->sectorsize < PAGE_SIZE &&
btrfs_warn(fs_info, "no zoned read-write support for page size %lu with sectorsize %u", PAGE_SIZE, fs_info->sectorsize);btrfs_fs_incompat(fs_info, ZONED) && is_rw_mount) {
Please keep btrfs_is_zoned(fs_info) instead of using btrfs_fs_incompat(fs_info, ZONED).
Otherwise, Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com
On 2024/2/12 20:15, Johannes Thumshirn wrote:
On 12.02.24 06:16, Qu Wenruo wrote:
Reported-by: HAN Yuwei hrx@bupt.moe Link: https://lore.kernel.org/all/1ACD2E3643008A17+da260584-2c7f-432a-9e22-9d390aa... CC: stable@vger.kernel.org # 5.10+ Signed-off-by: Qu Wenruo wqu@suse.com
fs/btrfs/disk-io.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index c3ab268533ca..85cd23aebdd6 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3193,7 +3193,8 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount) * part of @locked_page. * That's also why compression for subpage only work for page aligned ranges. */
- if (fs_info->sectorsize < PAGE_SIZE && btrfs_is_zoned(fs_info) && is_rw_mount) {
- if (fs_info->sectorsize < PAGE_SIZE &&
"no zoned read-write support for page size %lu with sectorsize %u", PAGE_SIZE, fs_info->sectorsize);btrfs_fs_incompat(fs_info, ZONED) && is_rw_mount) { btrfs_warn(fs_info,
Please keep btrfs_is_zoned(fs_info) instead of using btrfs_fs_incompat(fs_info, ZONED).
At the time of calling, we haven't yet populate fs_info->zone_size, thus we have to use super flags to verify if it's zoned.
If needed, I can add a comment for it.
Thanks, Qu
Otherwise, Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com
On 12.02.24 10:50, Qu Wenruo wrote:
On 2024/2/12 20:15, Johannes Thumshirn wrote:
On 12.02.24 06:16, Qu Wenruo wrote:
Reported-by: HAN Yuwei hrx@bupt.moe Link: https://lore.kernel.org/all/1ACD2E3643008A17+da260584-2c7f-432a-9e22-9d390aa... CC: stable@vger.kernel.org # 5.10+ Signed-off-by: Qu Wenruo wqu@suse.com
fs/btrfs/disk-io.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index c3ab268533ca..85cd23aebdd6 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3193,7 +3193,8 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount) * part of @locked_page. * That's also why compression for subpage only work for page aligned ranges. */
- if (fs_info->sectorsize < PAGE_SIZE && btrfs_is_zoned(fs_info) && is_rw_mount) {
- if (fs_info->sectorsize < PAGE_SIZE &&
btrfs_fs_incompat(fs_info, ZONED) && is_rw_mount) { btrfs_warn(fs_info, "no zoned read-write support for page size %lu with sectorsize %u", PAGE_SIZE, fs_info->sectorsize);
Please keep btrfs_is_zoned(fs_info) instead of using btrfs_fs_incompat(fs_info, ZONED).
At the time of calling, we haven't yet populate fs_info->zone_size, thus we have to use super flags to verify if it's zoned.
If needed, I can add a comment for it.
All good, I didn't have enough coffee.
On Mon, Feb 12, 2024 at 08:20:21PM +1030, Qu Wenruo wrote:
On 2024/2/12 20:15, Johannes Thumshirn wrote:
On 12.02.24 06:16, Qu Wenruo wrote:
Reported-by: HAN Yuwei hrx@bupt.moe Link: https://lore.kernel.org/all/1ACD2E3643008A17+da260584-2c7f-432a-9e22-9d390aa... CC: stable@vger.kernel.org # 5.10+ Signed-off-by: Qu Wenruo wqu@suse.com
fs/btrfs/disk-io.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index c3ab268533ca..85cd23aebdd6 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3193,7 +3193,8 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount) * part of @locked_page. * That's also why compression for subpage only work for page aligned ranges. */
- if (fs_info->sectorsize < PAGE_SIZE && btrfs_is_zoned(fs_info) && is_rw_mount) {
- if (fs_info->sectorsize < PAGE_SIZE &&
"no zoned read-write support for page size %lu with sectorsize %u", PAGE_SIZE, fs_info->sectorsize);btrfs_fs_incompat(fs_info, ZONED) && is_rw_mount) { btrfs_warn(fs_info,
Please keep btrfs_is_zoned(fs_info) instead of using btrfs_fs_incompat(fs_info, ZONED).
At the time of calling, we haven't yet populate fs_info->zone_size, thus we have to use super flags to verify if it's zoned.
If needed, I can add a comment for it.
Yes please add a comment the difference is quite subtle.
在 2024/2/14 17:59, David Sterba 写道:
On Mon, Feb 12, 2024 at 08:20:21PM +1030, Qu Wenruo wrote:
On 2024/2/12 20:15, Johannes Thumshirn wrote:
On 12.02.24 06:16, Qu Wenruo wrote:
Reported-by: HAN Yuwei hrx@bupt.moe Link: https://lore.kernel.org/all/1ACD2E3643008A17+da260584-2c7f-432a-9e22-9d390aa... CC: stable@vger.kernel.org # 5.10+ Signed-off-by: Qu Wenruo wqu@suse.com
fs/btrfs/disk-io.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index c3ab268533ca..85cd23aebdd6 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3193,7 +3193,8 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount) * part of @locked_page. * That's also why compression for subpage only work for page aligned ranges. */
- if (fs_info->sectorsize < PAGE_SIZE && btrfs_is_zoned(fs_info) && is_rw_mount) {
- if (fs_info->sectorsize < PAGE_SIZE &&
btrfs_fs_incompat(fs_info, ZONED) && is_rw_mount) { btrfs_warn(fs_info, "no zoned read-write support for page size %lu with sectorsize %u", PAGE_SIZE, fs_info->sectorsize);
Please keep btrfs_is_zoned(fs_info) instead of using btrfs_fs_incompat(fs_info, ZONED).
At the time of calling, we haven't yet populate fs_info->zone_size, thus we have to use super flags to verify if it's zoned.
If needed, I can add a comment for it.
Yes please add a comment the difference is quite subtle.
I'd say this patch can be dropped.
The reason is:
- I'm already working on the proper subpage handling for the @locked_page of a delalloc range
The patchset is under testing now, the results looks fine for both regular and subpage cases. Will queue extra testing for subpage zoned.
- The rejection would cause future problems for detecting whether we have proper subpage + zoned support. Either we do not detect, or introduce a complex mechanism only for this one edge case.
Thus I prefer not to detect.
- Subage + zoned is too niche for now The most common subpage usage would be aarch64 (especially for Apple M1/2 chips). For those Apple based ones, they have no ability to expand, thus won't hit any real zoned devices. For other aarch64 servers, they should have the ability to choose a 4K page size kernel if they really want to go zoned devices.
This bug is only exposed with a helpful reporter using 16K page sized loongson board with SATA zoned disk.
Due to above reasons, I really prefer to keep the current situation, and focus on the proper fix instead.
Thanks, Qu
On Mon, Feb 12, 2024 at 03:46:15PM +1030, Qu Wenruo wrote:
[BUG] There is a bug report that with zoned device and sectorsize is smaller than page size (aka, subpage), btrfs would crash with a very basic workload:
# getconfig PAGESIZE 16384 # mkfs.btrfs -f $dev -s 4k # mount $dev $mnt # $fsstress -w -n 8 -s 1707820327 -v -d $mnt # umount $mnt
The crash would look like this (with CONFIG_BTRFS_ASSERT enabled):
assertion failed: block_start != EXTENT_MAP_HOLE, in fs/btrfs/extent_io.c:1384
This is the same as what Josef fixed in https://github.com/btrfs/linux/commit/400bb013912dac637c7d6826407be580ea8ef9... "btrfs: don't drop extent_map for free space inode on write error" but not on zoned+subpage, is it really a different error you're fixing?
[...]
[WORKAROUND] A proper fix requires some big changes to delalloc workload, to allow extent_write_locked_range() to handle multiple different entries with the same @locked_page.
So for now, disable read-write support for subpage zoned btrfs.
The problem can only be solved if subpage btrfs can handle subpage compression, which need quite some work on the delalloc procedure for the @locked_page handling.
Ok, this on itself is a valid reason to disable the support.
On 2024/2/12 21:44, David Sterba wrote:
On Mon, Feb 12, 2024 at 03:46:15PM +1030, Qu Wenruo wrote:
[BUG] There is a bug report that with zoned device and sectorsize is smaller than page size (aka, subpage), btrfs would crash with a very basic workload:
# getconfig PAGESIZE 16384 # mkfs.btrfs -f $dev -s 4k # mount $dev $mnt # $fsstress -w -n 8 -s 1707820327 -v -d $mnt # umount $mnt
The crash would look like this (with CONFIG_BTRFS_ASSERT enabled):
assertion failed: block_start != EXTENT_MAP_HOLE, in fs/btrfs/extent_io.c:1384
This is the same as what Josef fixed in https://github.com/btrfs/linux/commit/400bb013912dac637c7d6826407be580ea8ef9... "btrfs: don't drop extent_map for free space inode on write error" but not on zoned+subpage, is it really a different error you're fixing?
Yes, it's really a different bug.
Although the root cause is a little more complex, it has to be explained through the whole CAUSE section.
[...]
[WORKAROUND] A proper fix requires some big changes to delalloc workload, to allow extent_write_locked_range() to handle multiple different entries with the same @locked_page.
So for now, disable read-write support for subpage zoned btrfs.
The problem can only be solved if subpage btrfs can handle subpage compression, which need quite some work on the delalloc procedure for the @locked_page handling.
Ok, this on itself is a valid reason to disable the support.
But this reminds me that, I have to work on the proper delalloc support for subpage.
My current plan is:
1. Introduce subpage LOCK bitmap Not just only an atomic counter.
2. Make find_lock_delalloc_range() to populate that lock bitmap
3. Make find_lock_delalloc_range() to find all delalloc range of a page and properly lock them This would not change the behavior for page sized sectorsize.
4. Make find_lock_delalloc_range() to skip the page if it's already being locked before. This is also a subpage specific behavior
With above planned work, btrfs should be able to handle subpage unlock, so that we won't need weird @locked_page handling.
Hopefully this would solve both the subpage+zoned and full sector aligned subpage compression support.
Thanks, Qu
On Mon, Feb 12, 2024 at 03:46:15PM +1030, Qu Wenruo wrote:
[BUG] There is a bug report that with zoned device and sectorsize is smaller than page size (aka, subpage), btrfs would crash with a very basic workload:
# getconfig PAGESIZE 16384 # mkfs.btrfs -f $dev -s 4k # mount $dev $mnt # $fsstress -w -n 8 -s 1707820327 -v -d $mnt # umount $mnt
The crash would look like this (with CONFIG_BTRFS_ASSERT enabled):
assertion failed: block_start != EXTENT_MAP_HOLE, in fs/btrfs/extent_io.c:1384 ------------[ cut here ]------------ kernel BUG at fs/btrfs/extent_io.c:1384! CPU: 0 PID: 872 Comm: kworker/u9:2 Tainted: G OE 6.8.0-rc3-custom+ #7 Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20231122-12.fc39 11/22/2023 Workqueue: writeback wb_workfn (flush-btrfs-8) pc : __extent_writepage_io+0x404/0x460 [btrfs] lr : __extent_writepage_io+0x404/0x460 [btrfs] Call trace: __extent_writepage_io+0x404/0x460 [btrfs] extent_write_locked_range+0x16c/0x460 [btrfs] run_delalloc_cow+0x88/0x118 [btrfs] btrfs_run_delalloc_range+0x128/0x228 [btrfs] writepage_delalloc+0xb8/0x178 [btrfs] __extent_writepage+0xc8/0x3a0 [btrfs] extent_write_cache_pages+0x1cc/0x460 [btrfs] extent_writepages+0x8c/0x120 [btrfs] btrfs_writepages+0x18/0x30 [btrfs] do_writepages+0x94/0x1f8 __writeback_single_inode+0x4c/0x388 writeback_sb_inodes+0x208/0x4b0 wb_writeback+0x118/0x3c0 wb_do_writeback+0xbc/0x388 wb_workfn+0x80/0x240 process_one_work+0x154/0x3c8 worker_thread+0x2bc/0x3e0 kthread+0xf4/0x108 ret_from_fork+0x10/0x20 Code: 9102c021 90000be0 91378000 9402bf53 (d4210000) ---[ end trace 0000000000000000 ]---
[CAUSE] There are several factors causing the problem:
__extent_writepage_io() requires all dirty ranges to have delalloc executed This can be solved by adding @start and @len parameter to only submit IO for a subset of the page, and update several involved helpers to do subpage checks.
So this is not a big deal.
Subpage only accepts for full page aligned ranges for extent_write_locked_range() For zoned device, regular COW is switched to utilize extent_write_locked_range() to submit the IO.
But the caller, run_delalloc_cow() can be called to run on a subpage range, e.g.
0 4K 8K 12K 16K |/////| |/////|
Where |///| is the dirtied range.
In that case, btrfs_run_delalloc_range() would call run_delalloc_cow(), which would call extent_write_locked_range() for [0, 4K), and unlock the whole [0, 16K) page.
But btrfs_run_delalloc_range() would again be called for range [8K, 12K), as there are still dirty range left. In that case, since the whole page is already unlocked by previous iteration, and would cause different ASSERT()s inside extent_write_locked_range().
That's also why compression for subpage cases require fully page aligned range.
[WORKAROUND] A proper fix requires some big changes to delalloc workload, to allow extent_write_locked_range() to handle multiple different entries with the same @locked_page.
So for now, disable read-write support for subpage zoned btrfs.
The problem can only be solved if subpage btrfs can handle subpage compression, which need quite some work on the delalloc procedure for the @locked_page handling.
Reported-by: HAN Yuwei hrx@bupt.moe Link: https://lore.kernel.org/all/1ACD2E3643008A17+da260584-2c7f-432a-9e22-9d390aa... CC: stable@vger.kernel.org # 5.10+ Signed-off-by: Qu Wenruo wqu@suse.com
fs/btrfs/disk-io.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index c3ab268533ca..85cd23aebdd6 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3193,7 +3193,8 @@ int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount) * part of @locked_page. * That's also why compression for subpage only work for page aligned ranges. */
- if (fs_info->sectorsize < PAGE_SIZE && btrfs_is_zoned(fs_info) && is_rw_mount) {
- if (fs_info->sectorsize < PAGE_SIZE &&
btrfs_warn(fs_info, "no zoned read-write support for page size %lu with sectorsize %u", PAGE_SIZE, fs_info->sectorsize);btrfs_fs_incompat(fs_info, ZONED) && is_rw_mount) {
This does not match any code I see in usual branches and it also does not apply so it can't be picked as a fix.
linux-stable-mirror@lists.linaro.org