On Mon, Jun 28, 2021 at 05:57:28PM +0900, Naohiro Aota wrote:
Damien reported a test failure with btrfs/209. The test itself ran fine, but the fsck run afterwards reported a corrupted filesystem.
The filesystem corruption happens because we're splitting an extent and then writing the extent twice. We have to split the extent, though, because we're creating extents that are too large for a REQ_OP_ZONE_APPEND operation.
When dumping the extent tree, we can see two EXTENT_ITEMs at the same start address but different lengths.
  $ btrfs inspect dump-tree /dev/nullb1 -t extent
  ...
     item 19 key (269484032 EXTENT_ITEM 126976) itemoff 15470 itemsize 53
             refs 1 gen 7 flags DATA
             extent data backref root FS_TREE objectid 257 offset 786432 count 1
     item 20 key (269484032 EXTENT_ITEM 262144) itemoff 15417 itemsize 53
             refs 1 gen 7 flags DATA
             extent data backref root FS_TREE objectid 257 offset 786432 count 1
The duplicated EXTENT_ITEMs originate from an incorrectly split extent_map in extract_ordered_extent(). Since extract_ordered_extent() uses create_io_em() to split an existing extent_map, we end up with split->orig_start != split->start. The split is then logged with a non-zero "extent data offset". Finally, the logged entries are replayed into a duplicated EXTENT_ITEM.
Introduce and use a proper splitting function for extent_map. The function is intentionally simple and specific to extract_ordered_extent(), e.g. it does not support the compression case (we do not allow splitting a compressed extent_map anyway).
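As a rough illustration of the orig_start mismatch described above, here is a minimal user-space sketch. It is not the kernel code: struct em_model, struct logged_ref, log_em() and both split helpers are hypothetical stand-ins, and the way the logged disk_bytenr is derived (block_start minus the extent data offset) is an assumption made for the model. The numbers in the usage note are borrowed from the dump above; the pre-split layout is inferred from it, not stated in the changelog.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for the kernel's struct extent_map. */
struct em_model {
	uint64_t start;          /* file offset covered by this mapping */
	uint64_t len;            /* length of the mapping */
	uint64_t orig_start;     /* file offset where the on-disk extent begins */
	uint64_t block_start;    /* disk address of this mapping's first byte */
	uint64_t orig_block_len; /* length of the on-disk extent */
};

/* What a logged file extent item would reference in this model. */
struct logged_ref {
	uint64_t disk_bytenr;    /* start address of the referenced extent */
	uint64_t disk_num_bytes; /* length of the referenced extent */
	uint64_t offset;         /* the "extent data offset" */
};

/* Assumed derivation: the offset is the distance from orig_start. */
static struct logged_ref log_em(const struct em_model *em)
{
	struct logged_ref ref;

	ref.offset = em->start - em->orig_start;
	ref.disk_bytenr = em->block_start - ref.offset;
	ref.disk_num_bytes = em->orig_block_len;
	return ref;
}

/* Tail split that inherits the parent's orig_start and orig_block_len. */
static struct em_model split_tail_keep_orig(const struct em_model *em,
					    uint64_t pre)
{
	struct em_model tail = *em;

	assert(pre > 0 && pre < em->len);
	tail.start = em->start + pre;
	tail.len = em->len - pre;
	tail.block_start = em->block_start + pre;
	return tail;
}

/* Tail split that turns the tail into a standalone extent. */
static struct em_model split_tail_standalone(const struct em_model *em,
					     uint64_t pre)
{
	struct em_model tail = split_tail_keep_orig(em, pre);

	tail.orig_start = tail.start;
	tail.orig_block_len = tail.len;
	return tail;
}
```

With a 262144-byte mapping at file offset 786432 and disk address 269484032, split at pre = 126976, the inherited-orig_start tail is logged in this model with offset 126976 against an extent starting at 269484032 with length 262144, i.e. the same start address as the head but a different length. The standalone tail is logged with offset 0 against an extent starting at 269611008 with length 135168.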
Fixes: d22002fd37bd ("btrfs: zoned: split ordered extent when bio is sent")
Cc: stable@vger.kernel.org # 5.12+
Reported-by: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Added to a topic branch. I think I've hit the problem this patch is supposed to fix, so I'll try to reproduce it before adding it to misc-next. I've added Damien's answer to the changelog as it's really helpful for understanding why it's fixed that way.