The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
260ad3de7183 ("platform/x86/amd: pmc: Add a workaround for an s0i3 issue on Cezanne")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 260ad3de718301ed8c22e28558e3a31c99f54cf6 Mon Sep 17 00:00:00 2001
From: Mario Limonciello <mario.limonciello(a)amd.com>
Date: Wed, 16 Nov 2022 09:43:41 -0600
Subject: [PATCH] platform/x86/amd: pmc: Add a workaround for an s0i3 issue on
Cezanne
Cezanne platforms under the right circumstances have a synchronization
problem where attempting to enter s2idle may fail if the x86 cores are
put into HLT before hardware resume from the previous attempt has
completed.
To avoid this issue add a 10-20ms delay before entering s2idle another
time. This workaround will only be applied on interrupts that wake the
hardware but don't break the s2idle loop.
Cc: stable(a)vger.kernel.org # 6.1
Cc: "Mahapatra, Rajib" <Rajib.Mahapatra(a)amd.com>
Cc: "Raul Rangel" <rrangel(a)chromium.org>
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Link: https://lore.kernel.org/r/20221116154341.13382-1-mario.limonciello@amd.com
Reviewed-by: Hans de Goede <hdegoede(a)redhat.com>
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
diff --git a/drivers/platform/x86/amd/pmc.c b/drivers/platform/x86/amd/pmc.c
index ef4ae977b8e0..439d282aafd1 100644
--- a/drivers/platform/x86/amd/pmc.c
+++ b/drivers/platform/x86/amd/pmc.c
@@ -739,8 +739,14 @@ static void amd_pmc_s2idle_prepare(void)
static void amd_pmc_s2idle_check(void)
{
struct amd_pmc_dev *pdev = &pmc;
+ struct smu_metrics table;
int rc;
+ /* CZN: Ensure that future s0i3 entry attempts at least 10ms passed */
+ if (pdev->cpu_id == AMD_CPU_ID_CZN && !get_metrics_table(pdev, &table) &&
+ table.s0i3_last_entry_status)
+ usleep_range(10000, 20000);
+
/* Dump the IdleMask before we add to the STB */
amd_pmc_idlemask_read(pdev, pdev->dev, NULL);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
bc1b705b0eee ("x86/MCE/AMD: Clear DFR errors found in THR handler")
8121b8f947be ("x86/mce: Get rid of msr_ops")
c9bf318f77b3 ("x86/mce/amd: Init thresholding machinery only on relevant vendors")
6e5cf31fbe65 ("x86/mce/amd: Publish the bank pointer only after setup has succeeded")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc1b705b0eee4c645ad8b3bbff3c8a66e9688362 Mon Sep 17 00:00:00 2001
From: Yazen Ghannam <yazen.ghannam(a)amd.com>
Date: Tue, 21 Jun 2022 15:59:43 +0000
Subject: [PATCH] x86/MCE/AMD: Clear DFR errors found in THR handler
AMD's MCA Thresholding feature counts errors of all severity levels, not
just correctable errors. If a deferred error causes the threshold limit
to be reached (it was the error that caused the overflow), then both a
deferred error interrupt and a thresholding interrupt will be triggered.
The order of the interrupts is not guaranteed. If the threshold
interrupt handler is executed first, then it will clear MCA_STATUS for
the error. It will not check or clear MCA_DESTAT which also holds a copy
of the deferred error. When the deferred error interrupt handler runs it
will not find an error in MCA_STATUS, but it will find the error in
MCA_DESTAT. This will cause two errors to be logged.
Check for deferred errors when handling a threshold interrupt. If a bank
contains a deferred error, then clear the bank's MCA_DESTAT register.
Define a new helper function to do the deferred error check and clearing
of MCA_DESTAT.
[ bp: Simplify, convert comment to passive voice. ]
Fixes: 37d43acfd79f ("x86/mce/AMD: Redo error logging from APIC LVT interrupt handlers")
Signed-off-by: Yazen Ghannam <yazen.ghannam(a)amd.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20220621155943.33623-1-yazen.ghannam@amd.com
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 1c87501e0fa3..10fb5b5c9efa 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -788,6 +788,24 @@ _log_error_bank(unsigned int bank, u32 msr_stat, u32 msr_addr, u64 misc)
return status & MCI_STATUS_DEFERRED;
}
+static bool _log_error_deferred(unsigned int bank, u32 misc)
+{
+ if (!_log_error_bank(bank, mca_msr_reg(bank, MCA_STATUS),
+ mca_msr_reg(bank, MCA_ADDR), misc))
+ return false;
+
+ /*
+ * Non-SMCA systems don't have MCA_DESTAT/MCA_DEADDR registers.
+ * Return true here to avoid accessing these registers.
+ */
+ if (!mce_flags.smca)
+ return true;
+
+ /* Clear MCA_DESTAT if the deferred error was logged from MCA_STATUS. */
+ wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0);
+ return true;
+}
+
/*
* We have three scenarios for checking for Deferred errors:
*
@@ -799,19 +817,8 @@ _log_error_bank(unsigned int bank, u32 msr_stat, u32 msr_addr, u64 misc)
*/
static void log_error_deferred(unsigned int bank)
{
- bool defrd;
-
- defrd = _log_error_bank(bank, mca_msr_reg(bank, MCA_STATUS),
- mca_msr_reg(bank, MCA_ADDR), 0);
-
- if (!mce_flags.smca)
- return;
-
- /* Clear MCA_DESTAT if we logged the deferred error from MCA_STATUS. */
- if (defrd) {
- wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0);
+ if (_log_error_deferred(bank, 0))
return;
- }
/*
* Only deferred errors are logged in MCA_DE{STAT,ADDR} so just check
@@ -832,7 +839,7 @@ static void amd_deferred_error_interrupt(void)
static void log_error_thresholding(unsigned int bank, u64 misc)
{
- _log_error_bank(bank, mca_msr_reg(bank, MCA_STATUS), mca_msr_reg(bank, MCA_ADDR), misc);
+ _log_error_deferred(bank, misc);
}
static void log_and_reset_block(struct threshold_block *block)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
bc1b705b0eee ("x86/MCE/AMD: Clear DFR errors found in THR handler")
8121b8f947be ("x86/mce: Get rid of msr_ops")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc1b705b0eee4c645ad8b3bbff3c8a66e9688362 Mon Sep 17 00:00:00 2001
From: Yazen Ghannam <yazen.ghannam(a)amd.com>
Date: Tue, 21 Jun 2022 15:59:43 +0000
Subject: [PATCH] x86/MCE/AMD: Clear DFR errors found in THR handler
AMD's MCA Thresholding feature counts errors of all severity levels, not
just correctable errors. If a deferred error causes the threshold limit
to be reached (it was the error that caused the overflow), then both a
deferred error interrupt and a thresholding interrupt will be triggered.
The order of the interrupts is not guaranteed. If the threshold
interrupt handler is executed first, then it will clear MCA_STATUS for
the error. It will not check or clear MCA_DESTAT which also holds a copy
of the deferred error. When the deferred error interrupt handler runs it
will not find an error in MCA_STATUS, but it will find the error in
MCA_DESTAT. This will cause two errors to be logged.
Check for deferred errors when handling a threshold interrupt. If a bank
contains a deferred error, then clear the bank's MCA_DESTAT register.
Define a new helper function to do the deferred error check and clearing
of MCA_DESTAT.
[ bp: Simplify, convert comment to passive voice. ]
Fixes: 37d43acfd79f ("x86/mce/AMD: Redo error logging from APIC LVT interrupt handlers")
Signed-off-by: Yazen Ghannam <yazen.ghannam(a)amd.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20220621155943.33623-1-yazen.ghannam@amd.com
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 1c87501e0fa3..10fb5b5c9efa 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -788,6 +788,24 @@ _log_error_bank(unsigned int bank, u32 msr_stat, u32 msr_addr, u64 misc)
return status & MCI_STATUS_DEFERRED;
}
+static bool _log_error_deferred(unsigned int bank, u32 misc)
+{
+ if (!_log_error_bank(bank, mca_msr_reg(bank, MCA_STATUS),
+ mca_msr_reg(bank, MCA_ADDR), misc))
+ return false;
+
+ /*
+ * Non-SMCA systems don't have MCA_DESTAT/MCA_DEADDR registers.
+ * Return true here to avoid accessing these registers.
+ */
+ if (!mce_flags.smca)
+ return true;
+
+ /* Clear MCA_DESTAT if the deferred error was logged from MCA_STATUS. */
+ wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0);
+ return true;
+}
+
/*
* We have three scenarios for checking for Deferred errors:
*
@@ -799,19 +817,8 @@ _log_error_bank(unsigned int bank, u32 msr_stat, u32 msr_addr, u64 misc)
*/
static void log_error_deferred(unsigned int bank)
{
- bool defrd;
-
- defrd = _log_error_bank(bank, mca_msr_reg(bank, MCA_STATUS),
- mca_msr_reg(bank, MCA_ADDR), 0);
-
- if (!mce_flags.smca)
- return;
-
- /* Clear MCA_DESTAT if we logged the deferred error from MCA_STATUS. */
- if (defrd) {
- wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0);
+ if (_log_error_deferred(bank, 0))
return;
- }
/*
* Only deferred errors are logged in MCA_DE{STAT,ADDR} so just check
@@ -832,7 +839,7 @@ static void amd_deferred_error_interrupt(void)
static void log_error_thresholding(unsigned int bank, u64 misc)
{
- _log_error_bank(bank, mca_msr_reg(bank, MCA_STATUS), mca_msr_reg(bank, MCA_ADDR), misc);
+ _log_error_deferred(bank, misc);
}
static void log_and_reset_block(struct threshold_block *block)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
bc1b705b0eee ("x86/MCE/AMD: Clear DFR errors found in THR handler")
8121b8f947be ("x86/mce: Get rid of msr_ops")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bc1b705b0eee4c645ad8b3bbff3c8a66e9688362 Mon Sep 17 00:00:00 2001
From: Yazen Ghannam <yazen.ghannam(a)amd.com>
Date: Tue, 21 Jun 2022 15:59:43 +0000
Subject: [PATCH] x86/MCE/AMD: Clear DFR errors found in THR handler
AMD's MCA Thresholding feature counts errors of all severity levels, not
just correctable errors. If a deferred error causes the threshold limit
to be reached (it was the error that caused the overflow), then both a
deferred error interrupt and a thresholding interrupt will be triggered.
The order of the interrupts is not guaranteed. If the threshold
interrupt handler is executed first, then it will clear MCA_STATUS for
the error. It will not check or clear MCA_DESTAT which also holds a copy
of the deferred error. When the deferred error interrupt handler runs it
will not find an error in MCA_STATUS, but it will find the error in
MCA_DESTAT. This will cause two errors to be logged.
Check for deferred errors when handling a threshold interrupt. If a bank
contains a deferred error, then clear the bank's MCA_DESTAT register.
Define a new helper function to do the deferred error check and clearing
of MCA_DESTAT.
[ bp: Simplify, convert comment to passive voice. ]
Fixes: 37d43acfd79f ("x86/mce/AMD: Redo error logging from APIC LVT interrupt handlers")
Signed-off-by: Yazen Ghannam <yazen.ghannam(a)amd.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20220621155943.33623-1-yazen.ghannam@amd.com
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 1c87501e0fa3..10fb5b5c9efa 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -788,6 +788,24 @@ _log_error_bank(unsigned int bank, u32 msr_stat, u32 msr_addr, u64 misc)
return status & MCI_STATUS_DEFERRED;
}
+static bool _log_error_deferred(unsigned int bank, u32 misc)
+{
+ if (!_log_error_bank(bank, mca_msr_reg(bank, MCA_STATUS),
+ mca_msr_reg(bank, MCA_ADDR), misc))
+ return false;
+
+ /*
+ * Non-SMCA systems don't have MCA_DESTAT/MCA_DEADDR registers.
+ * Return true here to avoid accessing these registers.
+ */
+ if (!mce_flags.smca)
+ return true;
+
+ /* Clear MCA_DESTAT if the deferred error was logged from MCA_STATUS. */
+ wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0);
+ return true;
+}
+
/*
* We have three scenarios for checking for Deferred errors:
*
@@ -799,19 +817,8 @@ _log_error_bank(unsigned int bank, u32 msr_stat, u32 msr_addr, u64 misc)
*/
static void log_error_deferred(unsigned int bank)
{
- bool defrd;
-
- defrd = _log_error_bank(bank, mca_msr_reg(bank, MCA_STATUS),
- mca_msr_reg(bank, MCA_ADDR), 0);
-
- if (!mce_flags.smca)
- return;
-
- /* Clear MCA_DESTAT if we logged the deferred error from MCA_STATUS. */
- if (defrd) {
- wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0);
+ if (_log_error_deferred(bank, 0))
return;
- }
/*
* Only deferred errors are logged in MCA_DE{STAT,ADDR} so just check
@@ -832,7 +839,7 @@ static void amd_deferred_error_interrupt(void)
static void log_error_thresholding(unsigned int bank, u64 misc)
{
- _log_error_bank(bank, mca_msr_reg(bank, MCA_STATUS), mca_msr_reg(bank, MCA_ADDR), misc);
+ _log_error_deferred(bank, misc);
}
static void log_and_reset_block(struct threshold_block *block)
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
560840afc3e6 ("btrfs: fix resolving backrefs for inline extent followed by prealloc")
7ac8b88ee668 ("btrfs: backref, only collect file extent items matching backref offset")
de47c9d3ff87 ("btrfs: replace hardcoded value with SEQ_LAST macro")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 560840afc3e63bbe5d9c5ef6b2ecf8f3589adff6 Mon Sep 17 00:00:00 2001
From: Boris Burkov <boris(a)bur.io>
Date: Wed, 14 Dec 2022 15:05:08 -0800
Subject: [PATCH] btrfs: fix resolving backrefs for inline extent followed by
prealloc
If a file consists of an inline extent followed by a regular or prealloc
extent, then a legitimate attempt to resolve a logical address in the
non-inline region will result in add_all_parents reading the invalid
offset field of the inline extent. If the inline extent item is placed
in the leaf eb s.t. it is the first item, attempting to access the
offset field will not only be meaningless, it will go past the end of
the eb and cause this panic:
[17.626048] BTRFS warning (device dm-2): bad eb member end: ptr 0x3fd4 start 30834688 member offset 16377 size 8
[17.631693] general protection fault, probably for non-canonical address 0x5088000000000: 0000 [#1] SMP PTI
[17.635041] CPU: 2 PID: 1267 Comm: btrfs Not tainted 5.12.0-07246-g75175d5adc74-dirty #199
[17.637969] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[17.641995] RIP: 0010:btrfs_get_64+0xe7/0x110
[17.649890] RSP: 0018:ffffc90001f73a08 EFLAGS: 00010202
[17.651652] RAX: 0000000000000001 RBX: ffff88810c42d000 RCX: 0000000000000000
[17.653921] RDX: 0005088000000000 RSI: ffffc90001f73a0f RDI: 0000000000000001
[17.656174] RBP: 0000000000000ff9 R08: 0000000000000007 R09: c0000000fffeffff
[17.658441] R10: ffffc90001f73790 R11: ffffc90001f73788 R12: ffff888106afe918
[17.661070] R13: 0000000000003fd4 R14: 0000000000003f6f R15: cdcdcdcdcdcdcdcd
[17.663617] FS: 00007f64e7627d80(0000) GS:ffff888237c80000(0000) knlGS:0000000000000000
[17.666525] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17.668664] CR2: 000055d4a39152e8 CR3: 000000010c596002 CR4: 0000000000770ee0
[17.671253] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17.673634] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[17.676034] PKRU: 55555554
[17.677004] Call Trace:
[17.677877] add_all_parents+0x276/0x480
[17.679325] find_parent_nodes+0xfae/0x1590
[17.680771] btrfs_find_all_leafs+0x5e/0xa0
[17.682217] iterate_extent_inodes+0xce/0x260
[17.683809] ? btrfs_inode_flags_to_xflags+0x50/0x50
[17.685597] ? iterate_inodes_from_logical+0xa1/0xd0
[17.687404] iterate_inodes_from_logical+0xa1/0xd0
[17.689121] ? btrfs_inode_flags_to_xflags+0x50/0x50
[17.691010] btrfs_ioctl_logical_to_ino+0x131/0x190
[17.692946] btrfs_ioctl+0x104a/0x2f60
[17.694384] ? selinux_file_ioctl+0x182/0x220
[17.695995] ? __x64_sys_ioctl+0x84/0xc0
[17.697394] __x64_sys_ioctl+0x84/0xc0
[17.698697] do_syscall_64+0x33/0x40
[17.700017] entry_SYSCALL_64_after_hwframe+0x44/0xae
[17.701753] RIP: 0033:0x7f64e72761b7
[17.709355] RSP: 002b:00007ffefb067f58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[17.712088] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f64e72761b7
[17.714667] RDX: 00007ffefb067fb0 RSI: 00000000c0389424 RDI: 0000000000000003
[17.717386] RBP: 00007ffefb06d188 R08: 000055d4a390d2b0 R09: 00007f64e7340a60
[17.719938] R10: 0000000000000231 R11: 0000000000000246 R12: 0000000000000001
[17.722383] R13: 0000000000000000 R14: 00000000c0389424 R15: 000055d4a38fd2a0
[17.724839] Modules linked in:
Fix the bug by detecting the inline extent item in add_all_parents and
skipping to the next extent item.
CC: stable(a)vger.kernel.org # 4.9+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 21c92c74bf71..46851511b661 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -484,6 +484,7 @@ static int add_all_parents(struct btrfs_backref_walk_ctx *ctx,
u64 wanted_disk_byte = ref->wanted_disk_byte;
u64 count = 0;
u64 data_offset;
+ u8 type;
if (level != 0) {
eb = path->nodes[level];
@@ -538,6 +539,9 @@ static int add_all_parents(struct btrfs_backref_walk_ctx *ctx,
continue;
}
fi = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item);
+ type = btrfs_file_extent_type(eb, fi);
+ if (type == BTRFS_FILE_EXTENT_INLINE)
+ goto next;
disk_byte = btrfs_file_extent_disk_bytenr(eb, fi);
data_offset = btrfs_file_extent_offset(eb, fi);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
560840afc3e6 ("btrfs: fix resolving backrefs for inline extent followed by prealloc")
7ac8b88ee668 ("btrfs: backref, only collect file extent items matching backref offset")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 560840afc3e63bbe5d9c5ef6b2ecf8f3589adff6 Mon Sep 17 00:00:00 2001
From: Boris Burkov <boris(a)bur.io>
Date: Wed, 14 Dec 2022 15:05:08 -0800
Subject: [PATCH] btrfs: fix resolving backrefs for inline extent followed by
prealloc
If a file consists of an inline extent followed by a regular or prealloc
extent, then a legitimate attempt to resolve a logical address in the
non-inline region will result in add_all_parents reading the invalid
offset field of the inline extent. If the inline extent item is placed
in the leaf eb s.t. it is the first item, attempting to access the
offset field will not only be meaningless, it will go past the end of
the eb and cause this panic:
[17.626048] BTRFS warning (device dm-2): bad eb member end: ptr 0x3fd4 start 30834688 member offset 16377 size 8
[17.631693] general protection fault, probably for non-canonical address 0x5088000000000: 0000 [#1] SMP PTI
[17.635041] CPU: 2 PID: 1267 Comm: btrfs Not tainted 5.12.0-07246-g75175d5adc74-dirty #199
[17.637969] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[17.641995] RIP: 0010:btrfs_get_64+0xe7/0x110
[17.649890] RSP: 0018:ffffc90001f73a08 EFLAGS: 00010202
[17.651652] RAX: 0000000000000001 RBX: ffff88810c42d000 RCX: 0000000000000000
[17.653921] RDX: 0005088000000000 RSI: ffffc90001f73a0f RDI: 0000000000000001
[17.656174] RBP: 0000000000000ff9 R08: 0000000000000007 R09: c0000000fffeffff
[17.658441] R10: ffffc90001f73790 R11: ffffc90001f73788 R12: ffff888106afe918
[17.661070] R13: 0000000000003fd4 R14: 0000000000003f6f R15: cdcdcdcdcdcdcdcd
[17.663617] FS: 00007f64e7627d80(0000) GS:ffff888237c80000(0000) knlGS:0000000000000000
[17.666525] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17.668664] CR2: 000055d4a39152e8 CR3: 000000010c596002 CR4: 0000000000770ee0
[17.671253] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17.673634] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[17.676034] PKRU: 55555554
[17.677004] Call Trace:
[17.677877] add_all_parents+0x276/0x480
[17.679325] find_parent_nodes+0xfae/0x1590
[17.680771] btrfs_find_all_leafs+0x5e/0xa0
[17.682217] iterate_extent_inodes+0xce/0x260
[17.683809] ? btrfs_inode_flags_to_xflags+0x50/0x50
[17.685597] ? iterate_inodes_from_logical+0xa1/0xd0
[17.687404] iterate_inodes_from_logical+0xa1/0xd0
[17.689121] ? btrfs_inode_flags_to_xflags+0x50/0x50
[17.691010] btrfs_ioctl_logical_to_ino+0x131/0x190
[17.692946] btrfs_ioctl+0x104a/0x2f60
[17.694384] ? selinux_file_ioctl+0x182/0x220
[17.695995] ? __x64_sys_ioctl+0x84/0xc0
[17.697394] __x64_sys_ioctl+0x84/0xc0
[17.698697] do_syscall_64+0x33/0x40
[17.700017] entry_SYSCALL_64_after_hwframe+0x44/0xae
[17.701753] RIP: 0033:0x7f64e72761b7
[17.709355] RSP: 002b:00007ffefb067f58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[17.712088] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f64e72761b7
[17.714667] RDX: 00007ffefb067fb0 RSI: 00000000c0389424 RDI: 0000000000000003
[17.717386] RBP: 00007ffefb06d188 R08: 000055d4a390d2b0 R09: 00007f64e7340a60
[17.719938] R10: 0000000000000231 R11: 0000000000000246 R12: 0000000000000001
[17.722383] R13: 0000000000000000 R14: 00000000c0389424 R15: 000055d4a38fd2a0
[17.724839] Modules linked in:
Fix the bug by detecting the inline extent item in add_all_parents and
skipping to the next extent item.
CC: stable(a)vger.kernel.org # 4.9+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 21c92c74bf71..46851511b661 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -484,6 +484,7 @@ static int add_all_parents(struct btrfs_backref_walk_ctx *ctx,
u64 wanted_disk_byte = ref->wanted_disk_byte;
u64 count = 0;
u64 data_offset;
+ u8 type;
if (level != 0) {
eb = path->nodes[level];
@@ -538,6 +539,9 @@ static int add_all_parents(struct btrfs_backref_walk_ctx *ctx,
continue;
}
fi = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item);
+ type = btrfs_file_extent_type(eb, fi);
+ if (type == BTRFS_FILE_EXTENT_INLINE)
+ goto next;
disk_byte = btrfs_file_extent_disk_bytenr(eb, fi);
data_offset = btrfs_file_extent_offset(eb, fi);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
560840afc3e6 ("btrfs: fix resolving backrefs for inline extent followed by prealloc")
7ac8b88ee668 ("btrfs: backref, only collect file extent items matching backref offset")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 560840afc3e63bbe5d9c5ef6b2ecf8f3589adff6 Mon Sep 17 00:00:00 2001
From: Boris Burkov <boris(a)bur.io>
Date: Wed, 14 Dec 2022 15:05:08 -0800
Subject: [PATCH] btrfs: fix resolving backrefs for inline extent followed by
prealloc
If a file consists of an inline extent followed by a regular or prealloc
extent, then a legitimate attempt to resolve a logical address in the
non-inline region will result in add_all_parents reading the invalid
offset field of the inline extent. If the inline extent item is placed
in the leaf eb s.t. it is the first item, attempting to access the
offset field will not only be meaningless, it will go past the end of
the eb and cause this panic:
[17.626048] BTRFS warning (device dm-2): bad eb member end: ptr 0x3fd4 start 30834688 member offset 16377 size 8
[17.631693] general protection fault, probably for non-canonical address 0x5088000000000: 0000 [#1] SMP PTI
[17.635041] CPU: 2 PID: 1267 Comm: btrfs Not tainted 5.12.0-07246-g75175d5adc74-dirty #199
[17.637969] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[17.641995] RIP: 0010:btrfs_get_64+0xe7/0x110
[17.649890] RSP: 0018:ffffc90001f73a08 EFLAGS: 00010202
[17.651652] RAX: 0000000000000001 RBX: ffff88810c42d000 RCX: 0000000000000000
[17.653921] RDX: 0005088000000000 RSI: ffffc90001f73a0f RDI: 0000000000000001
[17.656174] RBP: 0000000000000ff9 R08: 0000000000000007 R09: c0000000fffeffff
[17.658441] R10: ffffc90001f73790 R11: ffffc90001f73788 R12: ffff888106afe918
[17.661070] R13: 0000000000003fd4 R14: 0000000000003f6f R15: cdcdcdcdcdcdcdcd
[17.663617] FS: 00007f64e7627d80(0000) GS:ffff888237c80000(0000) knlGS:0000000000000000
[17.666525] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17.668664] CR2: 000055d4a39152e8 CR3: 000000010c596002 CR4: 0000000000770ee0
[17.671253] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17.673634] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[17.676034] PKRU: 55555554
[17.677004] Call Trace:
[17.677877] add_all_parents+0x276/0x480
[17.679325] find_parent_nodes+0xfae/0x1590
[17.680771] btrfs_find_all_leafs+0x5e/0xa0
[17.682217] iterate_extent_inodes+0xce/0x260
[17.683809] ? btrfs_inode_flags_to_xflags+0x50/0x50
[17.685597] ? iterate_inodes_from_logical+0xa1/0xd0
[17.687404] iterate_inodes_from_logical+0xa1/0xd0
[17.689121] ? btrfs_inode_flags_to_xflags+0x50/0x50
[17.691010] btrfs_ioctl_logical_to_ino+0x131/0x190
[17.692946] btrfs_ioctl+0x104a/0x2f60
[17.694384] ? selinux_file_ioctl+0x182/0x220
[17.695995] ? __x64_sys_ioctl+0x84/0xc0
[17.697394] __x64_sys_ioctl+0x84/0xc0
[17.698697] do_syscall_64+0x33/0x40
[17.700017] entry_SYSCALL_64_after_hwframe+0x44/0xae
[17.701753] RIP: 0033:0x7f64e72761b7
[17.709355] RSP: 002b:00007ffefb067f58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[17.712088] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f64e72761b7
[17.714667] RDX: 00007ffefb067fb0 RSI: 00000000c0389424 RDI: 0000000000000003
[17.717386] RBP: 00007ffefb06d188 R08: 000055d4a390d2b0 R09: 00007f64e7340a60
[17.719938] R10: 0000000000000231 R11: 0000000000000246 R12: 0000000000000001
[17.722383] R13: 0000000000000000 R14: 00000000c0389424 R15: 000055d4a38fd2a0
[17.724839] Modules linked in:
Fix the bug by detecting the inline extent item in add_all_parents and
skipping to the next extent item.
CC: stable(a)vger.kernel.org # 4.9+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 21c92c74bf71..46851511b661 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -484,6 +484,7 @@ static int add_all_parents(struct btrfs_backref_walk_ctx *ctx,
u64 wanted_disk_byte = ref->wanted_disk_byte;
u64 count = 0;
u64 data_offset;
+ u8 type;
if (level != 0) {
eb = path->nodes[level];
@@ -538,6 +539,9 @@ static int add_all_parents(struct btrfs_backref_walk_ctx *ctx,
continue;
}
fi = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item);
+ type = btrfs_file_extent_type(eb, fi);
+ if (type == BTRFS_FILE_EXTENT_INLINE)
+ goto next;
disk_byte = btrfs_file_extent_disk_bytenr(eb, fi);
data_offset = btrfs_file_extent_offset(eb, fi);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
1742e1c90c3d ("btrfs: fix extent map use-after-free when handling missing device in read_one_chunk")
ff37c89f94be ("btrfs: move missing device handling in a dedicate function")
562d7b1512f7 ("btrfs: handle device lookup with btrfs_dev_lookup_args")
1a9fd4172d5c ("btrfs: fix typos in comments")
e9306ad4ef5c ("btrfs: more graceful errors/warnings on 32bit systems when reaching limits")
bc03f39ec3c1 ("btrfs: use a bit to track the existence of tree mod log users")
406808ab2f0b ("btrfs: use booleans where appropriate for the tree mod log functions")
f3a84ccd28d0 ("btrfs: move the tree mod log code into its own file")
dbcc7d57bffc ("btrfs: fix race when cloning extent buffer during rewind of an old root")
cac06d843f25 ("btrfs: introduce the skeleton of btrfs_subpage structure")
2f96e40212d4 ("btrfs: fix possible free space tree corruption with online conversion")
1aaac38c83a2 ("btrfs: don't allow tree block to cross page boundary for subpage support")
948462294577 ("btrfs: keep sb cache_generation consistent with space_cache")
8b228324a8ce ("btrfs: clear free space tree on ro->rw remount")
8cd2908846d1 ("btrfs: clear oneshot options on mount and remount")
5011139a4718 ("btrfs: create free space tree on ro->rw remount")
8f1c21d7490f ("btrfs: start orphan cleanup on ro->rw remount")
44c0ca211a4d ("btrfs: lift read-write mount setup from mount and remount")
5297199a8bca ("btrfs: remove inode number cache feature")
ec7d6dfd73b2 ("btrfs: move btrfs_find_highest_objectid/btrfs_find_free_objectid to disk-io.c")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1742e1c90c3da344f3bb9b1f1309b3f47482756a Mon Sep 17 00:00:00 2001
From: void0red <void0red(a)gmail.com>
Date: Wed, 23 Nov 2022 22:39:45 +0800
Subject: [PATCH] btrfs: fix extent map use-after-free when handling missing
device in read_one_chunk
Store the error code before freeing the extent_map. Though it's
reference counted structure, in that function it's the first and last
allocation so this would lead to a potential use-after-free.
The error can happen eg. when chunk is stored on a missing device and
the degraded mount option is missing.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216721
Reported-by: eriri <1527030098(a)qq.com>
Fixes: adfb69af7d8c ("btrfs: add_missing_dev() should return the actual error")
CC: stable(a)vger.kernel.org # 4.9+
Signed-off-by: void0red <void0red(a)gmail.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index acab20f2863d..aa25fa335d3e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6976,8 +6976,9 @@ static int read_one_chunk(struct btrfs_key *key, struct extent_buffer *leaf,
map->stripes[i].dev = handle_missing_device(fs_info,
devid, uuid);
if (IS_ERR(map->stripes[i].dev)) {
+ ret = PTR_ERR(map->stripes[i].dev);
free_extent_map(em);
- return PTR_ERR(map->stripes[i].dev);
+ return ret;
}
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
1742e1c90c3d ("btrfs: fix extent map use-after-free when handling missing device in read_one_chunk")
ff37c89f94be ("btrfs: move missing device handling in a dedicate function")
562d7b1512f7 ("btrfs: handle device lookup with btrfs_dev_lookup_args")
1a9fd4172d5c ("btrfs: fix typos in comments")
e9306ad4ef5c ("btrfs: more graceful errors/warnings on 32bit systems when reaching limits")
bc03f39ec3c1 ("btrfs: use a bit to track the existence of tree mod log users")
406808ab2f0b ("btrfs: use booleans where appropriate for the tree mod log functions")
f3a84ccd28d0 ("btrfs: move the tree mod log code into its own file")
dbcc7d57bffc ("btrfs: fix race when cloning extent buffer during rewind of an old root")
cac06d843f25 ("btrfs: introduce the skeleton of btrfs_subpage structure")
2f96e40212d4 ("btrfs: fix possible free space tree corruption with online conversion")
1aaac38c83a2 ("btrfs: don't allow tree block to cross page boundary for subpage support")
948462294577 ("btrfs: keep sb cache_generation consistent with space_cache")
8b228324a8ce ("btrfs: clear free space tree on ro->rw remount")
8cd2908846d1 ("btrfs: clear oneshot options on mount and remount")
5011139a4718 ("btrfs: create free space tree on ro->rw remount")
8f1c21d7490f ("btrfs: start orphan cleanup on ro->rw remount")
44c0ca211a4d ("btrfs: lift read-write mount setup from mount and remount")
5297199a8bca ("btrfs: remove inode number cache feature")
ec7d6dfd73b2 ("btrfs: move btrfs_find_highest_objectid/btrfs_find_free_objectid to disk-io.c")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1742e1c90c3da344f3bb9b1f1309b3f47482756a Mon Sep 17 00:00:00 2001
From: void0red <void0red(a)gmail.com>
Date: Wed, 23 Nov 2022 22:39:45 +0800
Subject: [PATCH] btrfs: fix extent map use-after-free when handling missing
device in read_one_chunk
Store the error code before freeing the extent_map. Though it's
reference counted structure, in that function it's the first and last
allocation so this would lead to a potential use-after-free.
The error can happen eg. when chunk is stored on a missing device and
the degraded mount option is missing.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216721
Reported-by: eriri <1527030098(a)qq.com>
Fixes: adfb69af7d8c ("btrfs: add_missing_dev() should return the actual error")
CC: stable(a)vger.kernel.org # 4.9+
Signed-off-by: void0red <void0red(a)gmail.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index acab20f2863d..aa25fa335d3e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6976,8 +6976,9 @@ static int read_one_chunk(struct btrfs_key *key, struct extent_buffer *leaf,
map->stripes[i].dev = handle_missing_device(fs_info,
devid, uuid);
if (IS_ERR(map->stripes[i].dev)) {
+ ret = PTR_ERR(map->stripes[i].dev);
free_extent_map(em);
- return PTR_ERR(map->stripes[i].dev);
+ return ret;
}
}