When converting to directly create the vfio_device the mdev driver has to
put a vfio_register_emulated_iommu_dev() in the probe() and a pairing
vfio_unregister_group_dev() in the remove.
This was missed for gvt, add it.
Cc: stable(a)vger.kernel.org
Fixes: 978cf586ac35 ("drm/i915/gvt: convert to use vfio_register_emulated_iommu_dev")
Reported-by: Alex Williamson <alex.williamson(a)redhat.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
---
drivers/gpu/drm/i915/gvt/kvmgt.c | 1 +
1 file changed, 1 insertion(+)
Should go through Alex's tree.
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 41bba40feef8f4..9003145adb5a93 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1615,6 +1615,7 @@ static void intel_vgpu_remove(struct mdev_device *mdev)
if (WARN_ON_ONCE(vgpu->attached))
return;
+ vfio_unregister_group_dev(&vgpu->vfio_device);
vfio_put_device(&vgpu->vfio_device);
}
base-commit: c72e0034e6d4c36322d958b997d11d2627c6056c
--
2.37.3
Following build error noticed while building arm64 defconfig on
stable-rc queue/5.10.
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
drivers/dma/ti/k3-udma.c: In function 'udma_decrement_byte_counters':
drivers/dma/ti/k3-udma.c:666:26: error: 'struct udma_chan' has no
member named 'bchan'; did you mean 'tchan'?
666 | if (!uc->bchan)
| ^~~~~
| tchan
make[4]: *** [scripts/Makefile.build:286: drivers/dma/ti/k3-udma.o] Error 1
--
Linaro LKFT
https://lkft.linaro.org
commit d80ca810f096ff66f451e7a3ed2f0cd9ef1ff519 upstream
Currently, the non-x86 stub code calls get_memory_map() redundantly,
given that the data it returns is never used anywhere. So drop the call.
Cc: <stable(a)vger.kernel.org> # v4.14+
Fixes: 24d7c494ce46 ("efi/arm-stub: Round up FDT allocation to mapping size")
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
---
drivers/firmware/efi/libstub/fdt.c | 8 --------
1 file changed, 8 deletions(-)
This is a backport for v5.4 and older, where the patch in question did
not apply cleanly on the first attempt. Please apply.
diff --git a/drivers/firmware/efi/libstub/fdt.c b/drivers/firmware/efi/libstub/fdt.c
index dba296a44f4e..2a1a587edef9 100644
--- a/drivers/firmware/efi/libstub/fdt.c
+++ b/drivers/firmware/efi/libstub/fdt.c
@@ -301,14 +301,6 @@ efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table,
goto fail;
}
- /*
- * Now that we have done our final memory allocation (and free)
- * we can get the memory map key needed for exit_boot_services().
- */
- status = efi_get_memory_map(sys_table, &map);
- if (status != EFI_SUCCESS)
- goto fail_free_new_fdt;
-
status = update_fdt(sys_table, (void *)fdt_addr, fdt_size,
(void *)*new_fdt_addr, MAX_FDT_SIZE, cmdline_ptr,
initrd_addr, initrd_size);
--
2.35.1
From: Marek Bykowski <marek.bykowski(a)gmail.com>
[ Upstream commit d5e3050c0feb8bf7b9a75482fafcc31b90257926 ]
If the properties 'linux,initrd-start' and 'linux,initrd-end' of
the chosen node populated from the bootloader, eg. U-Boot, are so that
start > end, then the phys_initrd_size calculated from end - start is
negative that subsequently gets converted to a high positive value for
being unsigned long long. Then, the memory region with the (invalid)
size is added to the bootmem and attempted being paged in paging_init()
that results in the kernel fault.
For example, on the FVP ARM64 system I'm running, the U-Boot populates
the 'linux,initrd-start' with 8800_0000 and 'linux,initrd-end' with 0.
The phys_initrd_size calculated is then ffff_ffff_7800_0000
(= 0 - 8800_0000 = -8800_0000 + ULLONG_MAX + 1). paging_init() then
attempts to map the address 8800_0000 + ffff_ffff_7800_0000 and oops'es
as below.
It should be stressed, it is generally a fault of the bootloader's with
the kernel relying on it, however we should not allow the bootloader's
misconfiguration to lead to the kernel oops. Not only the kernel should be
bullet proof against it but also finding the root cause of the paging
fault spanning over the bootloader, DT, and kernel may happen is not so
easy.
Unable to handle kernel paging request at virtual address fffffffefe43c000
Mem abort info:
ESR = 0x96000007
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000007
CM = 0, WnR = 0
swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000080e3d000
[fffffffefe43c000] pgd=0000000080de9003, pud=0000000080de9003
Unable to handle kernel paging request at virtual address ffffff8000de9f90
Mem abort info:
ESR = 0x96000005
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000005
CM = 0, WnR = 0
swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000080e3d000
[ffffff8000de9f90] pgd=0000000000000000, pud=0000000000000000
Internal error: Oops: 96000005 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.51-yocto-standard #1
Hardware name: FVP Base (DT)
pstate: 60000085 (nZCv daIf -PAN -UAO)
pc : show_pte+0x12c/0x1b4
lr : show_pte+0x100/0x1b4
sp : ffffffc010ce3b30
x29: ffffffc010ce3b30 x28: ffffffc010ceed80
x27: fffffffefe43c000 x26: fffffffefe43a028
x25: 0000000080bf0000 x24: 0000000000000025
x23: ffffffc010b8d000 x22: ffffffc010e3d000
x23: ffffffc010b8d000 x22: ffffffc010e3d000
x21: 0000000080de9000 x20: ffffff7f80000f90
x19: fffffffefe43c000 x18: 0000000000000030
x17: 0000000000001400 x16: 0000000000001c00
x15: ffffffc010cef1b8 x14: ffffffffffffffff
x13: ffffffc010df1f40 x12: ffffffc010df1b70
x11: ffffffc010ce3b30 x10: ffffffc010ce3b30
x9 : 00000000ffffffc8 x8 : 0000000000000000
x7 : 000000000000000f x6 : ffffffc010df16e8
x5 : 0000000000000000 x4 : 0000000000000000
x3 : 00000000ffffffff x2 : 0000000000000000
x1 : 0000008080000000 x0 : ffffffc010af1d68
Call trace:
show_pte+0x12c/0x1b4
die_kernel_fault+0x54/0x78
__do_kernel_fault+0x11c/0x128
do_translation_fault+0x58/0xac
do_mem_abort+0x50/0xb0
el1_da+0x1c/0x90
__create_pgd_mapping+0x348/0x598
paging_init+0x3f0/0x70d0
setup_arch+0x2c0/0x5d4
start_kernel+0x94/0x49c
Code: 92748eb5 900052a0 9135a000 cb010294 (f8756a96)
Signed-off-by: Marek Bykowski <marek.bykowski(a)gmail.com>
Link: https://lore.kernel.org/r/20220909023358.76881-1-marek.bykowski@gmail.com
Signed-off-by: Rob Herring <robh(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/of/fdt.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 513558eecfd6..44903f94d0cd 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -917,6 +917,8 @@ static void __init early_init_dt_check_for_initrd(unsigned long node)
if (!prop)
return;
end = of_read_number(prop, len/4);
+ if (start > end)
+ return;
__early_init_dt_declare_initrd(start, end);
--
2.35.1
The gadget driver may wait on the request completion when it sets the
USB_GADGET_DELAYED_STATUS. Make sure that the End Transfer command can
go through if the dwc->delayed_status is set so that the request can
complete. When the delayed_status is set, the Setup packet is already
processed, and the next phase should be either Data or Status. It's
unlikely that the host would cancel the control transfer and send a new
Setup packet during End Transfer command. But if that's the case, we can
try again when ep0state returns to EP0_SETUP_PHASE.
Fixes: e1ee843488d5 ("usb: dwc3: gadget: Force sending delayed status during soft disconnect")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
---
drivers/usb/dwc3/gadget.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 079cd333632e..f4fe6ba3dd83 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1698,6 +1698,16 @@ static int __dwc3_stop_active_transfer(struct dwc3_ep *dep, bool force, bool int
cmd |= DWC3_DEPCMD_PARAM(dep->resource_index);
memset(¶ms, 0, sizeof(params));
ret = dwc3_send_gadget_ep_cmd(dep, cmd, ¶ms);
+ /*
+ * If the End Transfer command was timed out while the device is
+ * not in SETUP phase, it's possible that an incoming Setup packet
+ * may prevent the command's completion. Let's retry when the
+ * ep0state returns to EP0_SETUP_PHASE.
+ */
+ if (ret == -ETIMEDOUT && dwc->ep0state != EP0_SETUP_PHASE) {
+ dep->flags |= DWC3_EP_DELAY_STOP;
+ return 0;
+ }
WARN_ON_ONCE(ret);
dep->resource_index = 0;
@@ -3719,7 +3729,7 @@ void dwc3_stop_active_transfer(struct dwc3_ep *dep, bool force,
* timeout. Delay issuing the End Transfer command until the Setup TRB is
* prepared.
*/
- if (dwc->ep0state != EP0_SETUP_PHASE) {
+ if (dwc->ep0state != EP0_SETUP_PHASE && !dwc->delayed_status) {
dep->flags |= DWC3_EP_DELAY_STOP;
return;
}
base-commit: 9abf2313adc1ca1b6180c508c25f22f9395cc780
--
2.28.0
The patch titled
Subject: mm,hugetlb: take hugetlb_lock before decrementing h->resv_huge_pages
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mmhugetlb-take-hugetlb_lock-before-decrementing-h-resv_huge_pages.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Rik van Riel <riel(a)surriel.com>
Subject: mm,hugetlb: take hugetlb_lock before decrementing h->resv_huge_pages
Date: Mon, 17 Oct 2022 20:25:05 -0400
The h->*_huge_pages counters are protected by the hugetlb_lock, but
alloc_huge_page has a corner case where it can decrement the counter
outside of the lock.
This could lead to a corrupted value of h->resv_huge_pages, which we have
observed on our systems.
Take the hugetlb_lock before decrementing h->resv_huge_pages to avoid a
potential race.
Link: https://lkml.kernel.org/r/20221017202505.0e6a4fcd@imladris.surriel.com
Fixes: a88c76954804 ("mm: hugetlb: fix hugepage memory leak caused by wrong reserve count")
Signed-off-by: Rik van Riel <riel(a)surriel.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: Glen McCready <gkmccready(a)meta.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/hugetlb.c~mmhugetlb-take-hugetlb_lock-before-decrementing-h-resv_huge_pages
+++ a/mm/hugetlb.c
@@ -2924,11 +2924,11 @@ struct page *alloc_huge_page(struct vm_a
page = alloc_buddy_huge_page_with_mpol(h, vma, addr);
if (!page)
goto out_uncharge_cgroup;
+ spin_lock_irq(&hugetlb_lock);
if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
SetHPageRestoreReserve(page);
h->resv_huge_pages--;
}
- spin_lock_irq(&hugetlb_lock);
list_add(&page->lru, &h->hugepage_activelist);
set_page_refcounted(page);
/* Fall through */
_
Patches currently in -mm which might be from riel(a)surriel.com are
mmhugetlb-take-hugetlb_lock-before-decrementing-h-resv_huge_pages.patch
From: Jeff Vanhoof <qjv001(a)motorola.com>
In uvc_video_encode_isoc_sg, the uvc_request's sg list is
incorrectly being populated leading to corrupt video being
received by the remote end. When building the sg list the
usage of buf->sg's 'dma_length' field is not correct and
instead its 'length' field should be used.
Fixes: e81e7f9a0eb9 ("usb: gadget: uvc: add scatter gather support")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Jeff Vanhoof <qjv001(a)motorola.com>
Signed-off-by: Dan Vacura <w36195(a)motorola.com>
---
V1 -> V3:
- no change, new patch in series
V3 -> V4:
- no change
drivers/usb/gadget/function/uvc_video.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
index 5993e083819c..dd1c6b2ca7c6 100644
--- a/drivers/usb/gadget/function/uvc_video.c
+++ b/drivers/usb/gadget/function/uvc_video.c
@@ -157,10 +157,10 @@ uvc_video_encode_isoc_sg(struct usb_request *req, struct uvc_video *video,
sg = sg_next(sg);
for_each_sg(sg, iter, ureq->sgt.nents - 1, i) {
- if (!len || !buf->sg || !sg_dma_len(buf->sg))
+ if (!len || !buf->sg || !buf->sg->length)
break;
- sg_left = sg_dma_len(buf->sg) - buf->offset;
+ sg_left = buf->sg->length - buf->offset;
part = min_t(unsigned int, len, sg_left);
sg_set_page(iter, sg_page(buf->sg), part, buf->offset);
--
2.34.1
With the re-use of the previous completion status in 0d1c407b1a749
("usb: dwc3: gadget: Return proper request status") it could be possible
that the next frame would also get dropped if the current frame has a
missed isoc error. Ensure that an interrupt is requested for the start
of a new frame.
Fixes: fc78941d8169 ("usb: gadget: uvc: decrease the interrupt load to a quarter")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Dan Vacura <w36195(a)motorola.com>
---
V1 -> V3:
- no change, new patch in series
V3 -> V4:
- no change
drivers/usb/gadget/function/uvc_video.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
index bb037fcc90e6..323977716f5a 100644
--- a/drivers/usb/gadget/function/uvc_video.c
+++ b/drivers/usb/gadget/function/uvc_video.c
@@ -431,7 +431,8 @@ static void uvcg_video_pump(struct work_struct *work)
/* Endpoint now owns the request */
req = NULL;
- video->req_int_count++;
+ if (buf->state != UVC_BUF_STATE_DONE)
+ video->req_int_count++;
}
if (!req)
--
2.34.1
SDHCI_RESET_ALL resets will reset the hardware CQE state, but we aren't
tracking that properly in software. When out of sync, we may trigger
various timeouts.
It's not typical to perform resets while CQE is enabled, but one
particular case I hit commonly enough: mmc_suspend() -> mmc_power_off().
Typically we will eventually deactivate CQE (cqhci_suspend() ->
cqhci_deactivate()), but that's not guaranteed -- in particular, if
we perform a partial (e.g., interrupted) system suspend.
The same bug was already found and fixed for two other drivers, in v5.7
and v5.9:
5cf583f1fb9c mmc: sdhci-msm: Deactivate CQE during SDHC reset
df57d73276b8 mmc: sdhci-pci: Fix SDHCI_RESET_ALL for CQHCI for Intel GLK-based controllers
The latter is especially prescient, saying "other drivers using CQHCI
might benefit from a similar change, if they also have CQHCI reset by
SDHCI_RESET_ALL."
So like these other patches, deactivate CQHCI when resetting the
controller. Also, move around the DT/caps handling, because
sdhci_setup_host() performs resets before we've initialized CQHCI. This
is the pattern followed in other SDHCI/CQHCI drivers.
Fixes: 84362d79f436 ("mmc: sdhci-of-arasan: Add CQHCI support for arasan,sdhci-5.1")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
---
drivers/mmc/host/sdhci-of-arasan.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/mmc/host/sdhci-of-arasan.c b/drivers/mmc/host/sdhci-of-arasan.c
index 3997cad1f793..1988a703781a 100644
--- a/drivers/mmc/host/sdhci-of-arasan.c
+++ b/drivers/mmc/host/sdhci-of-arasan.c
@@ -366,6 +366,10 @@ static void sdhci_arasan_reset(struct sdhci_host *host, u8 mask)
struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
struct sdhci_arasan_data *sdhci_arasan = sdhci_pltfm_priv(pltfm_host);
+ if ((host->mmc->caps2 & MMC_CAP2_CQE) && (mask & SDHCI_RESET_ALL) &&
+ sdhci_arasan->has_cqe)
+ cqhci_deactivate(host->mmc);
+
sdhci_reset(host, mask);
if (sdhci_arasan->quirks & SDHCI_ARASAN_QUIRK_FORCE_CDTEST) {
@@ -1521,7 +1525,8 @@ static int sdhci_arasan_register_sdclk(struct sdhci_arasan_data *sdhci_arasan,
return 0;
}
-static int sdhci_arasan_add_host(struct sdhci_arasan_data *sdhci_arasan)
+static int sdhci_arasan_add_host(struct sdhci_arasan_data *sdhci_arasan,
+ struct device_node *np)
{
struct sdhci_host *host = sdhci_arasan->host;
struct cqhci_host *cq_host;
@@ -1549,6 +1554,10 @@ static int sdhci_arasan_add_host(struct sdhci_arasan_data *sdhci_arasan)
if (dma64)
cq_host->caps |= CQHCI_TASK_DESC_SZ_128;
+ host->mmc->caps2 |= MMC_CAP2_CQE;
+ if (!of_property_read_bool(np, "disable-cqe-dcmd"))
+ host->mmc->caps2 |= MMC_CAP2_CQE_DCMD;
+
ret = cqhci_init(cq_host, host->mmc, dma64);
if (ret)
goto cleanup;
@@ -1705,13 +1714,9 @@ static int sdhci_arasan_probe(struct platform_device *pdev)
host->mmc_host_ops.start_signal_voltage_switch =
sdhci_arasan_voltage_switch;
sdhci_arasan->has_cqe = true;
- host->mmc->caps2 |= MMC_CAP2_CQE;
-
- if (!of_property_read_bool(np, "disable-cqe-dcmd"))
- host->mmc->caps2 |= MMC_CAP2_CQE_DCMD;
}
- ret = sdhci_arasan_add_host(sdhci_arasan);
+ ret = sdhci_arasan_add_host(sdhci_arasan, np);
if (ret)
goto err_add_host;
--
2.38.0.413.g74048e4d9e-goog
For G200_SE_A, PLL M setting is wrong, which leads to blank screen,
or "signal out of range" on VGA display.
previous code had "m |= 0x80" which was changed to
m |= ((pixpllcn & BIT(8)) >> 1);
Tested on G200_SE_A rev 42
This line of code was moved to another file with
commit 85397f6bc4ff ("drm/mgag200: Initialize each model in separate
function") but can be easily backported before this commit.
v2: * put BIT(7) First to respect MSB-to-LSB (Thomas)
* Add a comment to explain that this bit must be set (Thomas)
Fixes: 2dd040946ecf ("drm/mgag200: Store values (not bits) in struct mgag200_pll_values")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jocelyn Falempe <jfalempe(a)redhat.com>
---
drivers/gpu/drm/mgag200/mgag200_g200se.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/mgag200/mgag200_g200se.c b/drivers/gpu/drm/mgag200/mgag200_g200se.c
index be389ed91cbd..bd6e573c9a1a 100644
--- a/drivers/gpu/drm/mgag200/mgag200_g200se.c
+++ b/drivers/gpu/drm/mgag200/mgag200_g200se.c
@@ -284,7 +284,8 @@ static void mgag200_g200se_04_pixpllc_atomic_update(struct drm_crtc *crtc,
pixpllcp = pixpllc->p - 1;
pixpllcs = pixpllc->s;
- xpixpllcm = pixpllcm | ((pixpllcn & BIT(8)) >> 1);
+ // For G200SE A, BIT(7) should be set unconditionally.
+ xpixpllcm = BIT(7) | pixpllcm;
xpixpllcn = pixpllcn;
xpixpllcp = (pixpllcs << 3) | pixpllcp;
--
2.37.3
From: Zong-Zhe Yang <kevin_yang(a)realtek.com>
[ Upstream commit 86331c7e0cd819bf0c1d0dcf895e0c90b0aa9a6f ]
reported by smatch
phy.c:854 rtw_phy_linear_2_db() error: buffer overflow 'db_invert_table[i]'
8 <= 8 (assuming for loop doesn't break)
However, it seems to be a false alarm because we prevent it originally via
if (linear >= db_invert_table[11][7])
return 96; /* maximum 96 dB */
Still, we adjust the code to be more readable and avoid smatch warning.
Signed-off-by: Zong-Zhe Yang <kevin_yang(a)realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih(a)realtek.com>
Signed-off-by: Kalle Valo <kvalo(a)kernel.org>
Link: https://lore.kernel.org/r/20220727065003.28340-5-pkshih@realtek.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/wireless/realtek/rtw88/phy.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)
diff --git a/drivers/net/wireless/realtek/rtw88/phy.c b/drivers/net/wireless/realtek/rtw88/phy.c
index af8b703d11d4..0fc5a893c395 100644
--- a/drivers/net/wireless/realtek/rtw88/phy.c
+++ b/drivers/net/wireless/realtek/rtw88/phy.c
@@ -604,23 +604,18 @@ static u8 rtw_phy_linear_2_db(u64 linear)
u8 j;
u32 dB;
- if (linear >= db_invert_table[11][7])
- return 96; /* maximum 96 dB */
-
for (i = 0; i < 12; i++) {
- if (i <= 2 && (linear << FRAC_BITS) <= db_invert_table[i][7])
- break;
- else if (i > 2 && linear <= db_invert_table[i][7])
- break;
+ for (j = 0; j < 8; j++) {
+ if (i <= 2 && (linear << FRAC_BITS) <= db_invert_table[i][j])
+ goto cnt;
+ else if (i > 2 && linear <= db_invert_table[i][j])
+ goto cnt;
+ }
}
- for (j = 0; j < 8; j++) {
- if (i <= 2 && (linear << FRAC_BITS) <= db_invert_table[i][j])
- break;
- else if (i > 2 && linear <= db_invert_table[i][j])
- break;
- }
+ return 96; /* maximum 96 dB */
+cnt:
if (j == 0 && i == 0)
goto end;
--
2.35.1
There is some code in the parser that tries to read 0x8000
bytes into a block to "read in the middle" of the block. Well
that only works if the block is also 0x10000 bytes all the time,
else we get these parse errors as we reach the end of the flash:
spi-nor spi0.0: mx25l1606e (2048 Kbytes)
mtd_read error while parsing (offset: 0x200000): -22
mtd_read error while parsing (offset: 0x201000): -22
(...)
Fix the code to do what I think was intended.
Cc: stable(a)vger.kernel.org
Fixes: f0501e81fbaa ("mtd: bcm47xxpart: alternative MAGIC for board_data partition")
Cc: Rafał Miłecki <zajec5(a)gmail.com>
Cc: Florian Fainelli <f.fainelli(a)gmail.com>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
---
ChangeLog v1->v2:
- Add some fixes and stable tags.
---
drivers/mtd/parsers/bcm47xxpart.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/mtd/parsers/bcm47xxpart.c b/drivers/mtd/parsers/bcm47xxpart.c
index 50fcf4c2174b..13daf9bffd08 100644
--- a/drivers/mtd/parsers/bcm47xxpart.c
+++ b/drivers/mtd/parsers/bcm47xxpart.c
@@ -233,11 +233,11 @@ static int bcm47xxpart_parse(struct mtd_info *master,
}
/* Read middle of the block */
- err = mtd_read(master, offset + 0x8000, 0x4, &bytes_read,
+ err = mtd_read(master, offset + (blocksize / 2), 0x4, &bytes_read,
(uint8_t *)buf);
if (err && !mtd_is_bitflip(err)) {
pr_err("mtd_read error while parsing (offset: 0x%X): %d\n",
- offset + 0x8000, err);
+ offset + (blocksize / 2), err);
continue;
}
--
2.34.1
A typical DP-MST unplug removes a KMS connector. However care must
be taken to properly synchronize with user-space. The expected
sequence of events is the following:
1. The kernel notices that the DP-MST port is gone.
2. The kernel marks the connector as disconnected, then sends a
uevent to make user-space re-scan the connector list.
3. User-space notices the connector goes from connected to disconnected,
disables it.
4. Kernel handles the the IOCTL disabling the connector. On success,
the very last reference to the struct drm_connector is dropped and
drm_connector_cleanup() is called.
5. The connector is removed from the list, and a uevent is sent to tell
user-space that the connector disappeared.
The very last step was missing. As a result, user-space thought the
connector still existed and could try to disable it again. Since the
kernel no longer knows about the connector, that would end up with
EINVAL and confused user-space.
Fix this by sending a hotplug uevent from drm_connector_cleanup().
Signed-off-by: Simon Ser <contact(a)emersion.fr>
Cc: stable(a)vger.kernel.org
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Lyude Paul <lyude(a)redhat.com>
Cc: Jonas Ådahl <jadahl(a)redhat.com>
---
drivers/gpu/drm/drm_connector.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/drm_connector.c b/drivers/gpu/drm/drm_connector.c
index e3142c8142b3..90dad87e9ad0 100644
--- a/drivers/gpu/drm/drm_connector.c
+++ b/drivers/gpu/drm/drm_connector.c
@@ -582,6 +582,9 @@ void drm_connector_cleanup(struct drm_connector *connector)
mutex_destroy(&connector->mutex);
memset(connector, 0, sizeof(*connector));
+
+ if (dev->registered)
+ drm_sysfs_hotplug_event(dev);
}
EXPORT_SYMBOL(drm_connector_cleanup);
--
2.38.0
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: e7ad18d1169c62e6c78c01ff693fd362d9d65278
Gitweb: https://git.kernel.org/tip/e7ad18d1169c62e6c78c01ff693fd362d9d65278
Author: Borislav Petkov <bp(a)suse.de>
AuthorDate: Wed, 05 Oct 2022 12:00:08 +02:00
Committer: Borislav Petkov <bp(a)suse.de>
CommitterDate: Tue, 18 Oct 2022 11:03:27 +02:00
x86/microcode/AMD: Apply the patch early on every logical thread
Currently, the patch application logic checks whether the revision
needs to be applied on each logical CPU (SMT thread). Therefore, on SMT
designs where the microcode engine is shared between the two threads,
the application happens only on one of them as that is enough to update
the shared microcode engine.
However, there are microcode patches which do per-thread modification,
see Link tag below.
Therefore, drop the revision check and try applying on each thread. This
is what the BIOS does too so this method is very much tested.
Btw, change only the early paths. On the late loading paths, there's no
point in doing per-thread modification because if is it some case like
in the bugzilla below - removing a CPUID flag - the kernel cannot go and
un-use features it has detected are there early. For that, one should
use early loading anyway.
[ bp: Fixes does not contain the oldest commit which did check for
equality but that is good enough. ]
Fixes: 8801b3fcb574 ("x86/microcode/AMD: Rework container parsing")
Reported-by: Ștefan Talpalaru <stefantalpalaru(a)yahoo.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Tested-by: Ștefan Talpalaru <stefantalpalaru(a)yahoo.com>
Cc: <stable(a)vger.kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216211
---
arch/x86/kernel/cpu/microcode/amd.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index e7410e9..3a35dec 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -440,7 +440,13 @@ apply_microcode_early_amd(u32 cpuid_1_eax, void *ucode, size_t size, bool save_p
return ret;
native_rdmsr(MSR_AMD64_PATCH_LEVEL, rev, dummy);
- if (rev >= mc->hdr.patch_id)
+
+ /*
+ * Allow application of the same revision to pick up SMT-specific
+ * changes even if the revision of the other SMT thread is already
+ * up-to-date.
+ */
+ if (rev > mc->hdr.patch_id)
return ret;
if (!__apply_microcode_amd(mc)) {
@@ -528,8 +534,12 @@ void load_ucode_amd_ap(unsigned int cpuid_1_eax)
native_rdmsr(MSR_AMD64_PATCH_LEVEL, rev, dummy);
- /* Check whether we have saved a new patch already: */
- if (*new_rev && rev < mc->hdr.patch_id) {
+ /*
+ * Check whether a new patch has been saved already. Also, allow application of
+ * the same revision in order to pick up SMT-thread-specific configuration even
+ * if the sibling SMT thread already has an up-to-date revision.
+ */
+ if (*new_rev && rev <= mc->hdr.patch_id) {
if (!__apply_microcode_amd(mc)) {
*new_rev = mc->hdr.patch_id;
return;
Hi Greg and Sasha,
Please consider applying the following commits to 5.19 and 6.0:
4f001a21080f ("Kconfig.debug: simplify the dependency of DEBUG_INFO_DWARF4/5")
bb1435f3f575 ("Kconfig.debug: add toolchain checks for DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT")
0a6de78cff60 ("lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5")
The main change is the final one but the second patch is a prerequisite
and a useful fix in its own right. The first change allows the second to
apply cleanly and it is a good clean up as well.
I have verified that they apply cleanly to those trees with 'patch -p1'.
Attached is a manual backport for 5.15; the changes should not be
necessary for older trees, as they do not have support for DWARF 5.
If there are any problems or questions, please let me know.
Cheers,
Nathan
hi,
new version of pahole (1.24) is causing compilation fail for 5.15
stable kernel, discussed in here [1][2]. Sending fix for that plus
one dependency patch.
Note for patch 1:
there was one extra line change in scripts/pahole-flags.sh file in
its linux tree merge commit:
fc02cb2b37fe Merge tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
not sure how/if to track that, I squashed the change in.
thanks,
jirka
[1] https://lore.kernel.org/bpf/20220825163538.vajnsv3xcpbhl47v@altlinux.org/
[2] https://lore.kernel.org/bpf/YwQRKkmWqsf%2FDu6A@kernel.org/
---
Jiri Olsa (1):
kbuild: Unify options for BTF generation for vmlinux and modules
Martin Rodriguez Reboredo (1):
kbuild: Add skip_encoding_btf_enum64 option to pahole
Makefile | 3 +++
scripts/Makefile.modfinal | 2 +-
scripts/link-vmlinux.sh | 11 +----------
scripts/pahole-flags.sh | 24 ++++++++++++++++++++++++
4 files changed, 29 insertions(+), 11 deletions(-)
create mode 100755 scripts/pahole-flags.sh
Hi Sasha,
Please stop backporting my commits to stable without asking, especially without
Cc'ing to any mailing list that is archived on lore.kernel.org.
"fscrypt: stop using keyrings subsystem for fscrypt_master_key" is a big change,
and there's already a fix for it pending
(https://lore.kernel.org/r/20221011213838.209879-1-ebiggers@kernel.org). I've
been planning on doing the backports myself once the change has had a bit more
time to soak, which is why I intentionally didn't add Cc stable.
It appears you also backported several other commits just to get it apply
cleanly to 5.10, so please drop those too for now.
- Eric
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f9929f69de94 ("drm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f9929f69de94212f98b3ad72a3e81c3bd3d333e0 Mon Sep 17 00:00:00 2001
From: Nathan Chancellor <nathan(a)kernel.org>
Date: Mon, 25 Jul 2022 16:36:29 -0700
Subject: [PATCH] drm/simpledrm: Fix return type of
simpledrm_simple_display_pipe_mode_valid()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When booting a kernel compiled with clang's CFI protection
(CONFIG_CFI_CLANG), there is a CFI failure in
drm_simple_kms_crtc_mode_valid() when trying to call
simpledrm_simple_display_pipe_mode_valid() through ->mode_valid():
[ 0.322802] CFI failure (target: simpledrm_simple_display_pipe_mode_valid+0x0/0x8):
...
[ 0.324928] Call trace:
[ 0.324969] __ubsan_handle_cfi_check_fail+0x58/0x60
[ 0.325053] __cfi_check_fail+0x3c/0x44
[ 0.325120] __cfi_slowpath_diag+0x178/0x200
[ 0.325192] drm_simple_kms_crtc_mode_valid+0x58/0x80
[ 0.325279] __drm_helper_update_and_validate+0x31c/0x464
...
The ->mode_valid() member in 'struct drm_simple_display_pipe_funcs'
expects a return type of 'enum drm_mode_status', not 'int'. Correct it
to fix the CFI failure.
Cc: stable(a)vger.kernel.org
Fixes: 11e8f5fd223b ("drm: Add simpledrm driver")
Link: https://github.com/ClangBuiltLinux/linux/issues/1647
Reported-by: Tomasz Paweł Gajc <tpgxyz(a)gmail.com>
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Reviewed-by: Sami Tolvanen <samitolvanen(a)google.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220725233629.223223-1-natha…
(cherry picked from commit 0c09bc33aa8e9dc867300acaadc318c2f0d85a1e)
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
diff --git a/drivers/gpu/drm/tiny/simpledrm.c b/drivers/gpu/drm/tiny/simpledrm.c
index 768242a78e2b..5422363690e7 100644
--- a/drivers/gpu/drm/tiny/simpledrm.c
+++ b/drivers/gpu/drm/tiny/simpledrm.c
@@ -627,7 +627,7 @@ static const struct drm_connector_funcs simpledrm_connector_funcs = {
.atomic_destroy_state = drm_atomic_helper_connector_destroy_state,
};
-static int
+static enum drm_mode_status
simpledrm_simple_display_pipe_mode_valid(struct drm_simple_display_pipe *pipe,
const struct drm_display_mode *mode)
{
With the re-use of the previous completion status in 0d1c407b1a749
("usb: dwc3: gadget: Return proper request status") it could be possible
that the next frame would also get dropped if the current frame has a
missed isoc error. Ensure that an interrupt is requested for the start
of a new frame.
Fixes: fc78941d8169 ("usb: gadget: uvc: decrease the interrupt load to a quarter")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Dan Vacura <w36195(a)motorola.com>
---
V1 -> V3:
- no change, new patch in series
drivers/usb/gadget/function/uvc_video.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
index bb037fcc90e6..323977716f5a 100644
--- a/drivers/usb/gadget/function/uvc_video.c
+++ b/drivers/usb/gadget/function/uvc_video.c
@@ -431,7 +431,8 @@ static void uvcg_video_pump(struct work_struct *work)
/* Endpoint now owns the request */
req = NULL;
- video->req_int_count++;
+ if (buf->state != UVC_BUF_STATE_DONE)
+ video->req_int_count++;
}
if (!req)
--
2.34.1
From: Marek Bykowski <marek.bykowski(a)gmail.com>
[ Upstream commit d5e3050c0feb8bf7b9a75482fafcc31b90257926 ]
If the properties 'linux,initrd-start' and 'linux,initrd-end' of
the chosen node populated from the bootloader, eg. U-Boot, are so that
start > end, then the phys_initrd_size calculated from end - start is
negative that subsequently gets converted to a high positive value for
being unsigned long long. Then, the memory region with the (invalid)
size is added to the bootmem and attempted being paged in paging_init()
that results in the kernel fault.
For example, on the FVP ARM64 system I'm running, the U-Boot populates
the 'linux,initrd-start' with 8800_0000 and 'linux,initrd-end' with 0.
The phys_initrd_size calculated is then ffff_ffff_7800_0000
(= 0 - 8800_0000 = -8800_0000 + ULLONG_MAX + 1). paging_init() then
attempts to map the address 8800_0000 + ffff_ffff_7800_0000 and oops'es
as below.
It should be stressed, it is generally a fault of the bootloader's with
the kernel relying on it, however we should not allow the bootloader's
misconfiguration to lead to the kernel oops. Not only the kernel should be
bullet proof against it but also finding the root cause of the paging
fault spanning over the bootloader, DT, and kernel may happen is not so
easy.
Unable to handle kernel paging request at virtual address fffffffefe43c000
Mem abort info:
ESR = 0x96000007
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000007
CM = 0, WnR = 0
swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000080e3d000
[fffffffefe43c000] pgd=0000000080de9003, pud=0000000080de9003
Unable to handle kernel paging request at virtual address ffffff8000de9f90
Mem abort info:
ESR = 0x96000005
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000005
CM = 0, WnR = 0
swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000080e3d000
[ffffff8000de9f90] pgd=0000000000000000, pud=0000000000000000
Internal error: Oops: 96000005 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.51-yocto-standard #1
Hardware name: FVP Base (DT)
pstate: 60000085 (nZCv daIf -PAN -UAO)
pc : show_pte+0x12c/0x1b4
lr : show_pte+0x100/0x1b4
sp : ffffffc010ce3b30
x29: ffffffc010ce3b30 x28: ffffffc010ceed80
x27: fffffffefe43c000 x26: fffffffefe43a028
x25: 0000000080bf0000 x24: 0000000000000025
x23: ffffffc010b8d000 x22: ffffffc010e3d000
x23: ffffffc010b8d000 x22: ffffffc010e3d000
x21: 0000000080de9000 x20: ffffff7f80000f90
x19: fffffffefe43c000 x18: 0000000000000030
x17: 0000000000001400 x16: 0000000000001c00
x15: ffffffc010cef1b8 x14: ffffffffffffffff
x13: ffffffc010df1f40 x12: ffffffc010df1b70
x11: ffffffc010ce3b30 x10: ffffffc010ce3b30
x9 : 00000000ffffffc8 x8 : 0000000000000000
x7 : 000000000000000f x6 : ffffffc010df16e8
x5 : 0000000000000000 x4 : 0000000000000000
x3 : 00000000ffffffff x2 : 0000000000000000
x1 : 0000008080000000 x0 : ffffffc010af1d68
Call trace:
show_pte+0x12c/0x1b4
die_kernel_fault+0x54/0x78
__do_kernel_fault+0x11c/0x128
do_translation_fault+0x58/0xac
do_mem_abort+0x50/0xb0
el1_da+0x1c/0x90
__create_pgd_mapping+0x348/0x598
paging_init+0x3f0/0x70d0
setup_arch+0x2c0/0x5d4
start_kernel+0x94/0x49c
Code: 92748eb5 900052a0 9135a000 cb010294 (f8756a96)
Signed-off-by: Marek Bykowski <marek.bykowski(a)gmail.com>
Link: https://lore.kernel.org/r/20220909023358.76881-1-marek.bykowski@gmail.com
Signed-off-by: Rob Herring <robh(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/of/fdt.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index cc9b8c699da4..253680c84c8c 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -957,6 +957,8 @@ static void __init early_init_dt_check_for_initrd(unsigned long node)
if (!prop)
return;
end = of_read_number(prop, len/4);
+ if (start > end)
+ return;
__early_init_dt_declare_initrd(start, end);
--
2.35.1
The patch titled
Subject: ocfs2: clear dinode links count in case of error
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
ocfs2-clear-dinode-links-count-in-case-of-error.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Subject: ocfs2: clear dinode links count in case of error
Date: Mon, 17 Oct 2022 21:02:27 +0800
In ocfs2_mknod(), if error occurs after dinode successfully allocated,
ocfs2 i_links_count will not be 0.
So even though we clear inode i_nlink before iput in error handling, it
still won't wipe inode since we'll refresh inode from dinode during inode
lock. So just like clear inode i_nlink, we clear ocfs2 i_links_count as
well. Also do the same change for ocfs2_symlink().
Link: https://lkml.kernel.org/r/20221017130227.234480-2-joseph.qi@linux.alibaba.c…
Signed-off-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reported-by: Yan Wang <wangyan122(a)huawei.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/namei.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
--- a/fs/ocfs2/namei.c~ocfs2-clear-dinode-links-count-in-case-of-error
+++ a/fs/ocfs2/namei.c
@@ -232,6 +232,7 @@ static int ocfs2_mknod(struct user_names
handle_t *handle = NULL;
struct ocfs2_super *osb;
struct ocfs2_dinode *dirfe;
+ struct ocfs2_dinode *fe = NULL;
struct buffer_head *new_fe_bh = NULL;
struct inode *inode = NULL;
struct ocfs2_alloc_context *inode_ac = NULL;
@@ -382,6 +383,7 @@ static int ocfs2_mknod(struct user_names
goto leave;
}
+ fe = (struct ocfs2_dinode *) new_fe_bh->b_data;
if (S_ISDIR(mode)) {
status = ocfs2_fill_new_dir(osb, handle, dir, inode,
new_fe_bh, data_ac, meta_ac);
@@ -454,8 +456,11 @@ roll_back:
leave:
if (status < 0 && did_quota_inode)
dquot_free_inode(inode);
- if (handle)
+ if (handle) {
+ if (status < 0 && fe)
+ ocfs2_set_links_count(fe, 0);
ocfs2_commit_trans(osb, handle);
+ }
ocfs2_inode_unlock(dir, 1);
if (did_block_signals)
@@ -2019,8 +2024,11 @@ bail:
ocfs2_clusters_to_bytes(osb->sb, 1));
if (status < 0 && did_quota_inode)
dquot_free_inode(inode);
- if (handle)
+ if (handle) {
+ if (status < 0 && fe)
+ ocfs2_set_links_count(fe, 0);
ocfs2_commit_trans(osb, handle);
+ }
ocfs2_inode_unlock(dir, 1);
if (did_block_signals)
_
Patches currently in -mm which might be from joseph.qi(a)linux.alibaba.com are
ocfs2-fix-bug-when-iput-after-ocfs2_mknod-fails.patch
ocfs2-clear-dinode-links-count-in-case-of-error.patch
The patch titled
Subject: ocfs2: fix BUG when iput after ocfs2_mknod fails
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
ocfs2-fix-bug-when-iput-after-ocfs2_mknod-fails.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Subject: ocfs2: fix BUG when iput after ocfs2_mknod fails
Date: Mon, 17 Oct 2022 21:02:26 +0800
Commit b1529a41f777 "ocfs2: should reclaim the inode if
'__ocfs2_mknod_locked' returns an error" tried to reclaim the claimed
inode if __ocfs2_mknod_locked() fails later. But this introduce a race,
the freed bit may be reused immediately by another thread, which will
update dinode, e.g. i_generation. Then iput this inode will lead to BUG:
inode->i_generation != le32_to_cpu(fe->i_generation)
We could make this inode as bad, but we did want to do operations like
wipe in some cases. Since the claimed inode bit can only affect that an
dinode is missing and will return back after fsck, it seems not a big
problem. So just leave it as is by revert the reclaim logic.
Link: https://lkml.kernel.org/r/20221017130227.234480-1-joseph.qi@linux.alibaba.c…
Fixes: b1529a41f777 ("ocfs2: should reclaim the inode if '__ocfs2_mknod_locked' returns an error")
Signed-off-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reported-by: Yan Wang <wangyan122(a)huawei.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/namei.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)
--- a/fs/ocfs2/namei.c~ocfs2-fix-bug-when-iput-after-ocfs2_mknod-fails
+++ a/fs/ocfs2/namei.c
@@ -632,18 +632,9 @@ static int ocfs2_mknod_locked(struct ocf
return status;
}
- status = __ocfs2_mknod_locked(dir, inode, dev, new_fe_bh,
+ return __ocfs2_mknod_locked(dir, inode, dev, new_fe_bh,
parent_fe_bh, handle, inode_ac,
fe_blkno, suballoc_loc, suballoc_bit);
- if (status < 0) {
- u64 bg_blkno = ocfs2_which_suballoc_group(fe_blkno, suballoc_bit);
- int tmp = ocfs2_free_suballoc_bits(handle, inode_ac->ac_inode,
- inode_ac->ac_bh, suballoc_bit, bg_blkno, 1);
- if (tmp)
- mlog_errno(tmp);
- }
-
- return status;
}
static int ocfs2_mkdir(struct user_namespace *mnt_userns,
_
Patches currently in -mm which might be from joseph.qi(a)linux.alibaba.com are
ocfs2-fix-bug-when-iput-after-ocfs2_mknod-fails.patch
ocfs2-clear-dinode-links-count-in-case-of-error.patch
Since commit fc274c1e9973 ("USB: gadget: Add a new bus for gadgets"),
the gadget devices are proper driver core devices, which caused each
device to request pinmux settings:
aspeed_vhub 1e6a0000.usb-vhub: Initialized virtual hub in USB2 mode
aspeed-g5-pinctrl 1e6e2080.pinctrl: pin A7 already requested by 1e6a0000.usb-vhub; cannot claim for gadget.0
aspeed-g5-pinctrl 1e6e2080.pinctrl: pin-232 (gadget.0) status -22
aspeed-g5-pinctrl 1e6e2080.pinctrl: could not request pin 232 (A7) from group USB2AD on device aspeed-g5-pinctrl
g_mass_storage gadget.0: Error applying setting, reverse things back
The vhub driver has already claimed the pins, so prevent the gadgets
from requesting them too by setting the magic of_node_reused flag. This
causes the driver core to skip the mux request.
Reported-by: Zev Weiss <zev(a)bewilderbeest.net>
Reported-by: Jae Hyun Yoo <quic_jaehyoo(a)quicinc.com>
Fixes: fc274c1e9973 ("USB: gadget: Add a new bus for gadgets")
Cc: stable(a)vger.kernel.org
Signed-off-by: Joel Stanley <joel(a)jms.id.au>
---
drivers/usb/gadget/udc/aspeed-vhub/dev.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/usb/gadget/udc/aspeed-vhub/dev.c b/drivers/usb/gadget/udc/aspeed-vhub/dev.c
index b0dfca43fbdc..4f3bc27c1c62 100644
--- a/drivers/usb/gadget/udc/aspeed-vhub/dev.c
+++ b/drivers/usb/gadget/udc/aspeed-vhub/dev.c
@@ -591,6 +591,7 @@ int ast_vhub_init_dev(struct ast_vhub *vhub, unsigned int idx)
d->gadget.max_speed = USB_SPEED_HIGH;
d->gadget.speed = USB_SPEED_UNKNOWN;
d->gadget.dev.of_node = vhub->pdev->dev.of_node;
+ d->gadget.dev.of_node_reused = true;
rc = usb_add_gadget_udc(d->port_dev, &d->gadget);
if (rc != 0)
--
2.35.1
From: Jeff Vanhoof <qjv001(a)motorola.com>
In uvc_video_encode_isoc_sg, the uvc_request's sg list is
incorrectly being populated leading to corrupt video being
received by the remote end. When building the sg list the
usage of buf->sg's 'dma_length' field is not correct and
instead its 'length' field should be used.
Fixes: e81e7f9a0eb9 ("usb: gadget: uvc: add scatter gather support")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Jeff Vanhoof <qjv001(a)motorola.com>
Signed-off-by: Dan Vacura <w36195(a)motorola.com>
---
V1 -> V3:
- no change, new patch in series
drivers/usb/gadget/function/uvc_video.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
index dd54841b0b3e..7d4508a83d5d 100644
--- a/drivers/usb/gadget/function/uvc_video.c
+++ b/drivers/usb/gadget/function/uvc_video.c
@@ -157,10 +157,10 @@ uvc_video_encode_isoc_sg(struct usb_request *req, struct uvc_video *video,
sg = sg_next(sg);
for_each_sg(sg, iter, ureq->sgt.nents - 1, i) {
- if (!len || !buf->sg || !sg_dma_len(buf->sg))
+ if (!len || !buf->sg || !buf->sg->length)
break;
- sg_left = sg_dma_len(buf->sg) - buf->offset;
+ sg_left = buf->sg->length - buf->offset;
part = min_t(unsigned int, len, sg_left);
sg_set_page(iter, sg_page(buf->sg), part, buf->offset);
--
2.34.1
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 2b12a7a126d62bdbd81f4923c21bf6e9a7fbd069
Gitweb: https://git.kernel.org/tip/2b12a7a126d62bdbd81f4923c21bf6e9a7fbd069
Author: Zhang Rui <rui.zhang(a)intel.com>
AuthorDate: Fri, 14 Oct 2022 17:01:46 +08:00
Committer: Dave Hansen <dave.hansen(a)linux.intel.com>
CommitterDate: Mon, 17 Oct 2022 11:58:52 -07:00
x86/topology: Fix multiple packages shown on a single-package system
CPUID.1F/B does not enumerate Package level explicitly, instead, all the
APIC-ID bits above the enumerated levels are assumed to be package ID
bits.
Current code gets package ID by shifting out all the APIC-ID bits that
Linux supports, rather than shifting out all the APIC-ID bits that
CPUID.1F enumerates. This introduces problems when CPUID.1F enumerates a
level that Linux does not support.
For example, on a single package AlderLake-N, there are 2 Ecore Modules
with 4 atom cores in each module. Linux does not support the Module
level and interprets the Module ID bits as package ID and erroneously
reports a multi module system as a multi-package system.
Fix this by using APIC-ID bits above all the CPUID.1F enumerated levels
as package ID.
[ dhansen: spelling fix ]
Fixes: 7745f03eb395 ("x86/topology: Add CPUID.1F multi-die/package support")
Suggested-by: Len Brown <len.brown(a)intel.com>
Signed-off-by: Zhang Rui <rui.zhang(a)intel.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Len Brown <len.brown(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/20221014090147.1836-4-rui.zhang@intel.com
---
arch/x86/kernel/cpu/topology.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 132a2de..f759281 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -96,6 +96,7 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
unsigned int ht_mask_width, core_plus_mask_width, die_plus_mask_width;
unsigned int core_select_mask, core_level_siblings;
unsigned int die_select_mask, die_level_siblings;
+ unsigned int pkg_mask_width;
bool die_level_present = false;
int leaf;
@@ -111,10 +112,10 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
- die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
+ pkg_mask_width = die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
sub_index = 1;
- do {
+ while (true) {
cpuid_count(leaf, sub_index, &eax, &ebx, &ecx, &edx);
/*
@@ -132,8 +133,13 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
}
+ if (LEAFB_SUBTYPE(ecx) != INVALID_TYPE)
+ pkg_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
+ else
+ break;
+
sub_index++;
- } while (LEAFB_SUBTYPE(ecx) != INVALID_TYPE);
+ }
core_select_mask = (~(-1 << core_plus_mask_width)) >> ht_mask_width;
die_select_mask = (~(-1 << die_plus_mask_width)) >>
@@ -148,7 +154,7 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
}
c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid,
- die_plus_mask_width);
+ pkg_mask_width);
/*
* Reinit the apicid, now that we have extended initial_apicid.
*/
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 71eac7063698b7d7b8fafb1683ac24a034541141
Gitweb: https://git.kernel.org/tip/71eac7063698b7d7b8fafb1683ac24a034541141
Author: Zhang Rui <rui.zhang(a)intel.com>
AuthorDate: Fri, 14 Oct 2022 17:01:47 +08:00
Committer: Dave Hansen <dave.hansen(a)linux.intel.com>
CommitterDate: Mon, 17 Oct 2022 11:58:52 -07:00
x86/topology: Fix duplicated core ID within a package
Today, core ID is assumed to be unique within each package.
But an AlderLake-N platform adds a Module level between core and package,
Linux excludes the unknown modules bits from the core ID, resulting in
duplicate core ID's.
To keep core ID unique within a package, Linux must include all APIC-ID
bits for known or unknown levels above the core and below the package
in the core ID.
It is important to understand that core ID's have always come directly
from the APIC-ID encoding, which comes from the BIOS. Thus there is no
guarantee that they start at 0, or that they are contiguous.
As such, naively using them for array indexes can be problematic.
[ dhansen: un-known -> unknown ]
Fixes: 7745f03eb395 ("x86/topology: Add CPUID.1F multi-die/package support")
Suggested-by: Len Brown <len.brown(a)intel.com>
Signed-off-by: Zhang Rui <rui.zhang(a)intel.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Len Brown <len.brown(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/20221014090147.1836-5-rui.zhang@intel.com
---
arch/x86/kernel/cpu/topology.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index f759281..5e868b6 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -141,7 +141,7 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
sub_index++;
}
- core_select_mask = (~(-1 << core_plus_mask_width)) >> ht_mask_width;
+ core_select_mask = (~(-1 << pkg_mask_width)) >> ht_mask_width;
die_select_mask = (~(-1 << die_plus_mask_width)) >>
core_plus_mask_width;
This is the start of the stable review cycle for the 5.10.149 release.
There are 4 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Tue, 18 Oct 2022 06:44:46 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.149-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.10.149-rc1
Johannes Berg <johannes.berg(a)intel.com>
wifi: mac80211: fix MBSSID parsing use-after-free
Johannes Berg <johannes.berg(a)intel.com>
wifi: mac80211: don't parse mbssid in assoc response
Johannes Berg <johannes.berg(a)intel.com>
mac80211: mlme: find auth challenge directly
Sasha Levin <sashal(a)kernel.org>
Revert "fs: check FMODE_LSEEK to control internal pipe splicing"
-------------
Diffstat:
Makefile | 4 ++--
fs/splice.c | 10 ++++++----
net/mac80211/ieee80211_i.h | 4 ++--
net/mac80211/mlme.c | 21 +++++++++++++--------
net/mac80211/scan.c | 2 ++
net/mac80211/util.c | 11 ++++++-----
6 files changed, 31 insertions(+), 21 deletions(-)
This is the start of the stable review cycle for the 5.4.219 release.
There are 4 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Tue, 18 Oct 2022 06:44:46 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.219-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.4.219-rc1
Johannes Berg <johannes.berg(a)intel.com>
wifi: mac80211: fix MBSSID parsing use-after-free
Johannes Berg <johannes.berg(a)intel.com>
wifi: mac80211: don't parse mbssid in assoc response
Johannes Berg <johannes.berg(a)intel.com>
mac80211: mlme: find auth challenge directly
Sasha Levin <sashal(a)kernel.org>
Revert "fs: check FMODE_LSEEK to control internal pipe splicing"
-------------
Diffstat:
Makefile | 4 ++--
fs/splice.c | 10 ++++++----
net/mac80211/ieee80211_i.h | 4 ++--
net/mac80211/mlme.c | 21 +++++++++++++--------
net/mac80211/scan.c | 2 ++
net/mac80211/util.c | 11 ++++++-----
6 files changed, 31 insertions(+), 21 deletions(-)
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 33806e7cb8d50379f55c3e8f335e91e1b359dc7b
Gitweb: https://git.kernel.org/tip/33806e7cb8d50379f55c3e8f335e91e1b359dc7b
Author: Nathan Chancellor <nathan(a)kernel.org>
AuthorDate: Thu, 29 Sep 2022 08:20:10 -07:00
Committer: Borislav Petkov <bp(a)suse.de>
CommitterDate: Mon, 17 Oct 2022 19:11:16 +02:00
x86/Kconfig: Drop check for -mabi=ms for CONFIG_EFI_STUB
A recent change in LLVM made CONFIG_EFI_STUB unselectable because it no
longer pretends to support -mabi=ms, breaking the dependency in
Kconfig. Lack of CONFIG_EFI_STUB can prevent kernels from booting via
EFI in certain circumstances.
This check was added by
8f24f8c2fc82 ("efi/libstub: Annotate firmware routines as __efiapi")
to ensure that __attribute__((ms_abi)) was available, as -mabi=ms is
not actually used in any cflags.
According to the GCC documentation, this attribute has been supported
since GCC 4.4.7. The kernel currently requires GCC 5.1 so this check is
not necessary; even when that change landed in 5.6, the kernel required
GCC 4.9 so it was unnecessary then as well.
Clang supports __attribute__((ms_abi)) for all versions that are
supported for building the kernel so no additional check is needed.
Remove the 'depends on' line altogether to allow CONFIG_EFI_STUB to be
selected when CONFIG_EFI is enabled, regardless of compiler.
Fixes: 8f24f8c2fc82 ("efi/libstub: Annotate firmware routines as __efiapi")
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Nick Desaulniers <ndesaulniers(a)google.com>
Acked-by: Ard Biesheuvel <ardb(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://github.com/llvm/llvm-project/commit/d1ad006a8f64bdc17f618deffa9e7c9…
---
arch/x86/Kconfig | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 6d1879e..67745ce 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1973,7 +1973,6 @@ config EFI
config EFI_STUB
bool "EFI stub support"
depends on EFI
- depends on $(cc-option,-mabi=ms) || X86_32
select RELOCATABLE
help
This kernel feature allows a bzImage to be loaded directly
I'm announcing the release of the 5.10.149 kernel.
All users of the 5.10 kernel series must upgrade.
The updated 5.10.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.10.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
fs/splice.c | 10 ++++++----
net/mac80211/ieee80211_i.h | 4 ++--
net/mac80211/mlme.c | 21 +++++++++++++--------
net/mac80211/scan.c | 2 ++
net/mac80211/util.c | 11 ++++++-----
6 files changed, 30 insertions(+), 20 deletions(-)
Greg Kroah-Hartman (1):
Linux 5.10.149
Johannes Berg (3):
mac80211: mlme: find auth challenge directly
wifi: mac80211: don't parse mbssid in assoc response
wifi: mac80211: fix MBSSID parsing use-after-free
Sasha Levin (1):
Revert "fs: check FMODE_LSEEK to control internal pipe splicing"
I'm announcing the release of the 5.4.219 kernel.
All users of the 5.4 kernel series must upgrade.
The updated 5.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.4.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
fs/splice.c | 10 ++++++----
net/mac80211/ieee80211_i.h | 4 ++--
net/mac80211/mlme.c | 21 +++++++++++++--------
net/mac80211/scan.c | 2 ++
net/mac80211/util.c | 11 ++++++-----
6 files changed, 30 insertions(+), 20 deletions(-)
Greg Kroah-Hartman (1):
Linux 5.4.219
Johannes Berg (3):
mac80211: mlme: find auth challenge directly
wifi: mac80211: don't parse mbssid in assoc response
wifi: mac80211: fix MBSSID parsing use-after-free
Sasha Levin (1):
Revert "fs: check FMODE_LSEEK to control internal pipe splicing"
A recent change in LLVM made CONFIG_EFI_STUB unselectable because it no
longer pretends to support '-mabi=ms', breaking the dependency in
Kconfig. Lack of CONFIG_EFI_STUB can prevent kernels from booting via
EFI in certain circumstances.
This check was added by commit 8f24f8c2fc82 ("efi/libstub: Annotate
firmware routines as __efiapi") to ensure that '__attribute__((ms_abi))'
was available, as '-mabi=ms' is not actually used in any cflags.
According to the GCC documentation, this attribute has been supported
since GCC 4.4.7. The kernel currently requires GCC 5.1 so this check is
not necessary; even when that change landed in 5.6, the kernel required
GCC 4.9 so it was unnecessary then as well. Clang supports
'__attribute__((ms_abi))' for all versions that are supported for
building the kernel so no additional check is needed. Remove the
'depends on' line altogether to allow CONFIG_EFI_STUB to be selected
when CONFIG_EFI is enabled, regardless of compiler.
Cc: stable(a)vger.kernel.org
Fixes: 8f24f8c2fc82 ("efi/libstub: Annotate firmware routines as __efiapi")
Link: https://github.com/ClangBuiltLinux/linux/issues/1725
Link: https://gcc.gnu.org/onlinedocs/gcc-4.4.7/gcc/Function-Attributes.html
Link: https://github.com/llvm/llvm-project/commit/d1ad006a8f64bdc17f618deffa9e7c9…
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
arch/x86/Kconfig | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f9920f1341c8..81012154d9ed 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1956,7 +1956,6 @@ config EFI
config EFI_STUB
bool "EFI stub support"
depends on EFI
- depends on $(cc-option,-mabi=ms) || X86_32
select RELOCATABLE
help
This kernel feature allows a bzImage to be loaded directly
base-commit: f76349cf41451c5c42a99f18a9163377e4b364ff
--
2.37.3
This reverts commit ad2bea79ef0144043721d4893eef719c907e2e63.
On systems with older PMUFW (Xilinx ZynqMP Platform Management Firmware)
using these pinctrl properties can cause system hang because there is
missing feature autodetection.
When this feature is implemented in the PMUFW, support for these two
properties should bring back.
Cc: stable(a)vger.kernel.org
Signed-off-by: Sai Krishna Potthuri <sai.krishna.potthuri(a)amd.com>
---
drivers/pinctrl/pinctrl-zynqmp.c | 9 ---------
1 file changed, 9 deletions(-)
diff --git a/drivers/pinctrl/pinctrl-zynqmp.c b/drivers/pinctrl/pinctrl-zynqmp.c
index 7d2fbf8a02cd..c98f35ad8921 100644
--- a/drivers/pinctrl/pinctrl-zynqmp.c
+++ b/drivers/pinctrl/pinctrl-zynqmp.c
@@ -412,10 +412,6 @@ static int zynqmp_pinconf_cfg_set(struct pinctrl_dev *pctldev,
break;
case PIN_CONFIG_BIAS_HIGH_IMPEDANCE:
- param = PM_PINCTRL_CONFIG_TRI_STATE;
- arg = PM_PINCTRL_TRI_STATE_ENABLE;
- ret = zynqmp_pm_pinctrl_set_config(pin, param, arg);
- break;
case PIN_CONFIG_MODE_LOW_POWER:
/*
* These cases are mentioned in dts but configurable
@@ -424,11 +420,6 @@ static int zynqmp_pinconf_cfg_set(struct pinctrl_dev *pctldev,
*/
ret = 0;
break;
- case PIN_CONFIG_OUTPUT_ENABLE:
- param = PM_PINCTRL_CONFIG_TRI_STATE;
- arg = PM_PINCTRL_TRI_STATE_DISABLE;
- ret = zynqmp_pm_pinctrl_set_config(pin, param, arg);
- break;
default:
dev_warn(pctldev->dev,
"unsupported configuration parameter '%u'\n",
--
2.17.1
From: Mike Pattrick <mkp(a)redhat.com>
[ Upstream commit 1100248a5c5ccd57059eb8d02ec077e839a23826 ]
Frames sent to userspace can be reported as dropped in
ovs_dp_process_packet, however, if they are dropped in the netlink code
then netlink_attachskb will report the same frame as dropped.
This patch checks for error codes which indicate that the frame has
already been freed.
Signed-off-by: Mike Pattrick <mkp(a)redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2109946
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
net/openvswitch/datapath.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index c28f0e2a7c3c..ab318844a19b 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -278,10 +278,17 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key)
upcall.portid = ovs_vport_find_upcall_portid(p, skb);
upcall.mru = OVS_CB(skb)->mru;
error = ovs_dp_upcall(dp, skb, key, &upcall, 0);
- if (unlikely(error))
- kfree_skb(skb);
- else
+ switch (error) {
+ case 0:
+ case -EAGAIN:
+ case -ERESTARTSYS:
+ case -EINTR:
consume_skb(skb);
+ break;
+ default:
+ kfree_skb(skb);
+ break;
+ }
stats_counter = &stats->n_missed;
goto out;
}
--
2.35.1
From: Haibo Chen <haibo.chen(a)nxp.com>
[ Upstream commit e7c4ebe2f9cd68588eb24ba4ed122e696e2d5272 ]
Use the general touchscreen method to config the max pressure for
touch tsc2046(data sheet suggest 8 bit pressure), otherwise, for
ABS_PRESSURE, when config the same max and min value, weston will
meet the following issue,
[17:19:39.183] event1 - ADS7846 Touchscreen: is tagged by udev as: Touchscreen
[17:19:39.183] event1 - ADS7846 Touchscreen: kernel bug: device has min == max on ABS_PRESSURE
[17:19:39.183] event1 - ADS7846 Touchscreen: was rejected
[17:19:39.183] event1 - not using input device '/dev/input/event1'
This will then cause the APP weston-touch-calibrator can't list touch devices.
root@imx6ul7d:~# weston-touch-calibrator
could not load cursor 'dnd-move'
could not load cursor 'dnd-copy'
could not load cursor 'dnd-none'
No devices listed.
And accroding to binding Doc, "ti,x-max", "ti,y-max", "ti,pressure-max"
belong to the deprecated properties, so remove them. Also for "ti,x-min",
"ti,y-min", "ti,x-plate-ohms", the value set in dts equal to the default
value in driver, so are redundant, also remove here.
Signed-off-by: Haibo Chen <haibo.chen(a)nxp.com>
Signed-off-by: Shawn Guo <shawnguo(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arm/boot/dts/imx7d-sdb.dts | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/arch/arm/boot/dts/imx7d-sdb.dts b/arch/arm/boot/dts/imx7d-sdb.dts
index 2f33c463cbce..83867357f135 100644
--- a/arch/arm/boot/dts/imx7d-sdb.dts
+++ b/arch/arm/boot/dts/imx7d-sdb.dts
@@ -126,12 +126,7 @@ tsc2046@0 {
interrupt-parent = <&gpio2>;
interrupts = <29 0>;
pendown-gpio = <&gpio2 29 GPIO_ACTIVE_HIGH>;
- ti,x-min = /bits/ 16 <0>;
- ti,x-max = /bits/ 16 <0>;
- ti,y-min = /bits/ 16 <0>;
- ti,y-max = /bits/ 16 <0>;
- ti,pressure-max = /bits/ 16 <0>;
- ti,x-plate-ohms = /bits/ 16 <400>;
+ touchscreen-max-pressure = <255>;
wakeup-source;
};
};
--
2.35.1
Hello,
Good morning and how are you?
I have an important and favourable information/proposal which might
interest you to know,
let me hear from you to detail you, it's important
Sincerely,
M.Cheickna
tourecheickna(a)consultant.com
[ upstream commit 0091bfc81741b8d3aeb3b7ab8636f911b2de6e80 ]
Instead of putting io_uring's registered files in unix_gc() we want it
to be done by io_uring itself. The trick here is to consider io_uring
registered files for cycle detection but not actually putting them down.
Because io_uring can't register other ring instances, this will remove
all refs to the ring file triggering the ->release path and clean up
with io_ring_ctx_free().
Cc: stable(a)vger.kernel.org
Fixes: 6b06314c47e1 ("io_uring: add file set registration")
Reported-and-tested-by: David Bouman <dbouman03(a)gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
[axboe: add kerneldoc comment to skb, fold in skb leak fix]
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
---
fs/io_uring.c | 1 +
include/linux/skbuff.h | 2 ++
net/unix/garbage.c | 20 ++++++++++++++++++++
3 files changed, 23 insertions(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 501c7e14c07c..e8df6345a812 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3172,6 +3172,7 @@ static int __io_sqe_files_scm(struct io_ring_ctx *ctx, int nr, int offset)
}
skb->sk = sk;
+ skb->scm_io_uring = 1;
skb->destructor = io_destruct_skb;
fpl->user = get_uid(ctx->user);
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 680f71ecdc08..eab3a4d02f32 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -659,6 +659,7 @@ typedef unsigned char *sk_buff_data_t;
* @wifi_acked: whether frame was acked on wifi or not
* @no_fcs: Request NIC to treat last 4 bytes as Ethernet FCS
* @csum_not_inet: use CRC32c to resolve CHECKSUM_PARTIAL
+ * @scm_io_uring: SKB holds io_uring registered files
* @dst_pending_confirm: need to confirm neighbour
* @decrypted: Decrypted SKB
* @napi_id: id of the NAPI struct this skb came from
@@ -824,6 +825,7 @@ struct sk_buff {
#ifdef CONFIG_TLS_DEVICE
__u8 decrypted:1;
#endif
+ __u8 scm_io_uring:1;
#ifdef CONFIG_NET_SCHED
__u16 tc_index; /* traffic control index */
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index d45d5366115a..dc2763540393 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -204,6 +204,7 @@ void wait_for_unix_gc(void)
/* The external entry point: unix_gc() */
void unix_gc(void)
{
+ struct sk_buff *next_skb, *skb;
struct unix_sock *u;
struct unix_sock *next;
struct sk_buff_head hitlist;
@@ -297,11 +298,30 @@ void unix_gc(void)
spin_unlock(&unix_gc_lock);
+ /* We need io_uring to clean its registered files, ignore all io_uring
+ * originated skbs. It's fine as io_uring doesn't keep references to
+ * other io_uring instances and so killing all other files in the cycle
+ * will put all io_uring references forcing it to go through normal
+ * release.path eventually putting registered files.
+ */
+ skb_queue_walk_safe(&hitlist, skb, next_skb) {
+ if (skb->scm_io_uring) {
+ __skb_unlink(skb, &hitlist);
+ skb_queue_tail(&skb->sk->sk_receive_queue, skb);
+ }
+ }
+
/* Here we are. Hitlist is filled. Die. */
__skb_queue_purge(&hitlist);
spin_lock(&unix_gc_lock);
+ /* There could be io_uring registered files, just push them back to
+ * the inflight list
+ */
+ list_for_each_entry_safe(u, next, &gc_candidates, link)
+ list_move_tail(&u->link, &gc_inflight_list);
+
/* All candidates should have been detached by now. */
BUG_ON(!list_empty(&gc_candidates));
--
2.38.0
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
81d5d61454c3 ("btrfs: enhance unsupported compat RO flags handling")
dfe8aec4520b ("btrfs: add a btrfs_block_group_root() helper")
b6e9f16c5fda ("btrfs: replace open coded while loop with proper construct")
42437a6386ff ("btrfs: introduce mount option rescue=ignorebadroots")
68319c18cb21 ("btrfs: show rescue=usebackuproot in /proc/mounts")
ab0b4a3ebf14 ("btrfs: add a helper to print out rescue= options")
ceafe3cc3992 ("btrfs: sysfs: export supported rescue= mount options")
334c16d82cfe ("btrfs: push the NODATASUM check into btrfs_lookup_bio_sums")
d70bf7484f72 ("btrfs: unify the ro checking for mount options")
7573df5547c0 ("btrfs: sysfs: export supported send stream version")
3ef3959b29c4 ("btrfs: don't show full path of bind mounts in subvol=")
74ef00185eb8 ("btrfs: introduce "rescue=" mount option")
e3ba67a108ff ("btrfs: factor out reading of bg from find_frist_block_group")
89d7da9bc592 ("btrfs: get mapping tree directly from fsinfo in find_first_block_group")
c730ae0c6bb3 ("btrfs: convert comments to fallthrough annotations")
aeb935a45581 ("btrfs: don't set SHAREABLE flag for data reloc tree")
92a7cc425223 ("btrfs: rename BTRFS_ROOT_REF_COWS to BTRFS_ROOT_SHAREABLE")
3be4d8efe3cf ("btrfs: block-group: rename write_one_cache_group()")
97f4728af888 ("btrfs: block-group: refactor how we insert a block group item")
7357623a7f4b ("btrfs: block-group: refactor how we delete one block group item")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 81d5d61454c365718655cfc87d8200c84e25d596 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Tue, 9 Aug 2022 13:02:16 +0800
Subject: [PATCH] btrfs: enhance unsupported compat RO flags handling
Currently there are two corner cases not handling compat RO flags
correctly:
- Remount
We can still mount the fs RO with compat RO flags, then remount it RW.
We should not allow any write into a fs with unsupported RO flags.
- Still try to search block group items
In fact, behavior/on-disk format change to extent tree should not
need a full incompat flag.
And since we can ensure fs with unsupported RO flags never got any
writes (with above case fixed), then we can even skip block group
items search at mount time.
This patch will enhance the unsupported RO compat flags by:
- Reject read-write remount if there are unsupported RO compat flags
- Go dummy block group items directly for unsupported RO compat flags
In fact, only changes to chunk/subvolume/root/csum trees should go
incompat flags.
The latter part should allow future change to extent tree to be compat
RO flags.
Thus this patch also needs to be backported to all stable trees.
CC: stable(a)vger.kernel.org # 4.9+
Reviewed-by: Nikolay Borisov <nborisov(a)suse.com>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 53c44c52cb79..e7b5a54c8258 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2164,7 +2164,16 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
int need_clear = 0;
u64 cache_gen;
- if (!root)
+ /*
+ * Either no extent root (with ibadroots rescue option) or we have
+ * unsupported RO options. The fs can never be mounted read-write, so no
+ * need to waste time searching block group items.
+ *
+ * This also allows new extent tree related changes to be RO compat,
+ * no need for a full incompat flag.
+ */
+ if (!root || (btrfs_super_compat_ro_flags(info->super_copy) &
+ ~BTRFS_FEATURE_COMPAT_RO_SUPP))
return fill_dummy_bgs(info);
key.objectid = 0;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 7291e9d67e92..eb0ae7e396ef 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2117,6 +2117,15 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
ret = -EINVAL;
goto restore;
}
+ if (btrfs_super_compat_ro_flags(fs_info->super_copy) &
+ ~BTRFS_FEATURE_COMPAT_RO_SUPP) {
+ btrfs_err(fs_info,
+ "can not remount read-write due to unsupported optional flags 0x%llx",
+ btrfs_super_compat_ro_flags(fs_info->super_copy) &
+ ~BTRFS_FEATURE_COMPAT_RO_SUPP);
+ ret = -EINVAL;
+ goto restore;
+ }
if (fs_info->fs_devices->rw_devices == 0) {
ret = -EACCES;
goto restore;
Commit d4955c0ad77dbc684fc716387070ac24801b8bca upstream.
cpufreq_get_hw_max_freq() returns max frequency in kHz as *unsigned int*,
while freq_inv_set_max_ratio() gets passed this frequency in Hz as 'u64'.
Multiplying max frequency by 1000 can potentially result in overflow --
multiplying by 1000ULL instead should avoid that...
Found by Linux Verification Center (linuxtesting.org) with the SVACE static
analysis tool.
Fixes: cd0ed03a8903 ("arm64: use activity monitors for frequency invariance")
Signed-off-by: Sergey Shtylyov <s.shtylyov(a)omp.ru>
Link: https://lore.kernel.org/r/01493d64-2bce-d968-86dc-11a122a9c07d@omp.ru
Signed-off-by: Will Deacon <will(a)kernel.org>
---
arch/arm64/kernel/topology.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux-stable/arch/arm64/kernel/topology.c
===================================================================
--- linux-stable.orig/arch/arm64/kernel/topology.c
+++ linux-stable/arch/arm64/kernel/topology.c
@@ -158,7 +158,7 @@ static int validate_cpu_freq_invariance_
}
/* Convert maximum frequency from KHz to Hz and validate */
- max_freq_hz = cpufreq_get_hw_max_freq(cpu) * 1000;
+ max_freq_hz = cpufreq_get_hw_max_freq(cpu) * 1000ULL;
if (unlikely(!max_freq_hz)) {
pr_debug("CPU%d: invalid maximum frequency.\n", cpu);
return -EINVAL;
Hi Sasha,
Please don't add this patch: it is not a fix, it is an internal change preparing for
a new feature (see commit 0975274557d1). So no need to backport this patch.
Regards,
Hans
On 10/17/22 04:24, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> media: v4l2-ctrls: allocate space for arrays
>
> to the 6.0-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> media-v4l2-ctrls-allocate-space-for-arrays.patch
> and it can be found in the queue-6.0 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit 5cc036de01c402cf40cccf04dcb95af5e18e8313
> Author: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
> Date: Mon Jul 11 12:21:07 2022 +0200
>
> media: v4l2-ctrls: allocate space for arrays
>
> [ Upstream commit 5f2c5c69a61dc5411d436c1a422f8a1ee195a924 ]
>
> Just like dynamic arrays, also allocate space for regular arrays.
>
> This is in preparation for allowing to change the array size from
> a driver.
>
> Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas(a)ideasonboard.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab(a)kernel.org>
> Stable-dep-of: 211f8304fa21 ("media: exynos4-is: fimc-is: Add of_node_put() when breaking out of loop")
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/drivers/media/v4l2-core/v4l2-ctrls-api.c b/drivers/media/v4l2-core/v4l2-ctrls-api.c
> index 50d012ba3c02..1b90bd7c4010 100644
> --- a/drivers/media/v4l2-core/v4l2-ctrls-api.c
> +++ b/drivers/media/v4l2-core/v4l2-ctrls-api.c
> @@ -105,8 +105,8 @@ static int user_to_new(struct v4l2_ext_control *c, struct v4l2_ctrl *ctrl)
>
> ctrl->is_new = 0;
> if (ctrl->is_dyn_array &&
> - c->size > ctrl->p_dyn_alloc_elems * ctrl->elem_size) {
> - void *old = ctrl->p_dyn;
> + c->size > ctrl->p_array_alloc_elems * ctrl->elem_size) {
> + void *old = ctrl->p_array;
> void *tmp = kvzalloc(2 * c->size, GFP_KERNEL);
>
> if (!tmp)
> @@ -115,8 +115,8 @@ static int user_to_new(struct v4l2_ext_control *c, struct v4l2_ctrl *ctrl)
> memcpy(tmp + c->size, ctrl->p_cur.p, ctrl->elems * ctrl->elem_size);
> ctrl->p_new.p = tmp;
> ctrl->p_cur.p = tmp + c->size;
> - ctrl->p_dyn = tmp;
> - ctrl->p_dyn_alloc_elems = c->size / ctrl->elem_size;
> + ctrl->p_array = tmp;
> + ctrl->p_array_alloc_elems = c->size / ctrl->elem_size;
> kvfree(old);
> }
>
> diff --git a/drivers/media/v4l2-core/v4l2-ctrls-core.c b/drivers/media/v4l2-core/v4l2-ctrls-core.c
> index 1f85828d6694..9871c77f559b 100644
> --- a/drivers/media/v4l2-core/v4l2-ctrls-core.c
> +++ b/drivers/media/v4l2-core/v4l2-ctrls-core.c
> @@ -1135,14 +1135,14 @@ int req_to_new(struct v4l2_ctrl_ref *ref)
>
> /*
> * Check if the number of elements in the request is more than the
> - * elements in ctrl->p_dyn. If so, attempt to realloc ctrl->p_dyn.
> - * Note that p_dyn is allocated with twice the number of elements
> + * elements in ctrl->p_array. If so, attempt to realloc ctrl->p_array.
> + * Note that p_array is allocated with twice the number of elements
> * in the dynamic array since it has to store both the current and
> * new value of such a control.
> */
> - if (ref->p_req_elems > ctrl->p_dyn_alloc_elems) {
> + if (ref->p_req_elems > ctrl->p_array_alloc_elems) {
> unsigned int sz = ref->p_req_elems * ctrl->elem_size;
> - void *old = ctrl->p_dyn;
> + void *old = ctrl->p_array;
> void *tmp = kvzalloc(2 * sz, GFP_KERNEL);
>
> if (!tmp)
> @@ -1151,8 +1151,8 @@ int req_to_new(struct v4l2_ctrl_ref *ref)
> memcpy(tmp + sz, ctrl->p_cur.p, ctrl->elems * ctrl->elem_size);
> ctrl->p_new.p = tmp;
> ctrl->p_cur.p = tmp + sz;
> - ctrl->p_dyn = tmp;
> - ctrl->p_dyn_alloc_elems = ref->p_req_elems;
> + ctrl->p_array = tmp;
> + ctrl->p_array_alloc_elems = ref->p_req_elems;
> kvfree(old);
> }
>
> @@ -1252,7 +1252,7 @@ void v4l2_ctrl_handler_free(struct v4l2_ctrl_handler *hdl)
> list_del(&ctrl->node);
> list_for_each_entry_safe(sev, next_sev, &ctrl->ev_subs, node)
> list_del(&sev->node);
> - kvfree(ctrl->p_dyn);
> + kvfree(ctrl->p_array);
> kvfree(ctrl);
> }
> kvfree(hdl->buckets);
> @@ -1584,11 +1584,10 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct v4l2_ctrl_handler *hdl,
> V4L2_CTRL_FLAG_EXECUTE_ON_WRITE;
> else if (type == V4L2_CTRL_TYPE_CTRL_CLASS)
> flags |= V4L2_CTRL_FLAG_READ_ONLY;
> - else if (!(flags & V4L2_CTRL_FLAG_DYNAMIC_ARRAY) &&
> + else if (!is_array &&
> (type == V4L2_CTRL_TYPE_INTEGER64 ||
> type == V4L2_CTRL_TYPE_STRING ||
> - type >= V4L2_CTRL_COMPOUND_TYPES ||
> - is_array))
> + type >= V4L2_CTRL_COMPOUND_TYPES))
> sz_extra += 2 * tot_ctrl_size;
>
> if (type >= V4L2_CTRL_COMPOUND_TYPES && p_def.p_const)
> @@ -1632,14 +1631,14 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct v4l2_ctrl_handler *hdl,
> ctrl->cur.val = ctrl->val = def;
> data = &ctrl[1];
>
> - if (ctrl->is_dyn_array) {
> - ctrl->p_dyn_alloc_elems = elems;
> - ctrl->p_dyn = kvzalloc(2 * elems * elem_size, GFP_KERNEL);
> - if (!ctrl->p_dyn) {
> + if (ctrl->is_array) {
> + ctrl->p_array_alloc_elems = elems;
> + ctrl->p_array = kvzalloc(2 * elems * elem_size, GFP_KERNEL);
> + if (!ctrl->p_array) {
> kvfree(ctrl);
> return NULL;
> }
> - data = ctrl->p_dyn;
> + data = ctrl->p_array;
> }
>
> if (!ctrl->is_int) {
> @@ -1651,7 +1650,7 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct v4l2_ctrl_handler *hdl,
> }
>
> if (type >= V4L2_CTRL_COMPOUND_TYPES && p_def.p_const) {
> - if (ctrl->is_dyn_array)
> + if (ctrl->is_array)
> ctrl->p_def.p = &ctrl[1];
> else
> ctrl->p_def.p = ctrl->p_cur.p + tot_ctrl_size;
> @@ -1664,7 +1663,7 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct v4l2_ctrl_handler *hdl,
> }
>
> if (handler_new_ref(hdl, ctrl, NULL, false, false)) {
> - kvfree(ctrl->p_dyn);
> + kvfree(ctrl->p_array);
> kvfree(ctrl);
> return NULL;
> }
> diff --git a/include/media/v4l2-ctrls.h b/include/media/v4l2-ctrls.h
> index 00828a4f9404..5ddd506ae7b9 100644
> --- a/include/media/v4l2-ctrls.h
> +++ b/include/media/v4l2-ctrls.h
> @@ -203,7 +203,7 @@ typedef void (*v4l2_ctrl_notify_fnc)(struct v4l2_ctrl *ctrl, void *priv);
> * @elem_size: The size in bytes of the control.
> * @new_elems: The number of elements in p_new. This is the same as @elems,
> * except for dynamic arrays. In that case it is in the range of
> - * 1 to @p_dyn_alloc_elems.
> + * 1 to @p_array_alloc_elems.
> * @dims: The size of each dimension.
> * @nr_of_dims:The number of dimensions in @dims.
> * @menu_skip_mask: The control's skip mask for menu controls. This makes it
> @@ -227,12 +227,11 @@ typedef void (*v4l2_ctrl_notify_fnc)(struct v4l2_ctrl *ctrl, void *priv);
> * not freed when the control is deleted. Should this be needed
> * then a new internal bitfield can be added to tell the framework
> * to free this pointer.
> - * @p_dyn: Pointer to the dynamically allocated array. Only valid if
> - * @is_dyn_array is true.
> - * @p_dyn_alloc_elems: The number of elements in the dynamically allocated
> - * array for both the cur and new values. So @p_dyn is actually
> - * sized for 2 * @p_dyn_alloc_elems * @elem_size. Only valid if
> - * @is_dyn_array is true.
> + * @p_array: Pointer to the allocated array. Only valid if @is_array is true.
> + * @p_array_alloc_elems: The number of elements in the allocated
> + * array for both the cur and new values. So @p_array is actually
> + * sized for 2 * @p_array_alloc_elems * @elem_size. Only valid if
> + * @is_array is true.
> * @cur: Structure to store the current value.
> * @cur.val: The control's current value, if the @type is represented via
> * a u32 integer (see &enum v4l2_ctrl_type).
> @@ -291,8 +290,8 @@ struct v4l2_ctrl {
> };
> unsigned long flags;
> void *priv;
> - void *p_dyn;
> - u32 p_dyn_alloc_elems;
> + void *p_array;
> + u32 p_array_alloc_elems;
> s32 val;
> struct {
> s32 val;
Шановний бенефіціар!
Ви отримали 1 000 000,00 доларів США як благодійні пожертви/допомогу від Фонду сім’ї Пола Г. Аллена. Вашу електронну адресу було вибрано в Інтернеті під час випадкового пошуку. Будь ласка, зв’яжіться з нами якнайшвидше. Для отримання додаткової інформації відвідайте: https://pgafamilyfoundation.org/
Від імені фонду вітаємо вас.
З найкращими побажаннями,
Доктор Вел Буш
This is the backport of the RS485 polarity fixes discussed here:
https://lkml.kernel.org/r/20221010085305.GA32599@wunner.de
It fixes RS485 DE initially set wrong on driver init, blocking other
devices from transmitting on the bus.
Mizobuchi-san did the backport and tested on our imx-based platform, but
we do not have any hardware to test other drivers.
The commits also apply cleanly on 5.15, and for 5.19 the second commit
does (first one has already been picked up), but these have not been
tested so would require more checking.
Kernels older than 5.10 do not have this particular polarity inversion
problem and do not need this as far as I can see.
(there might be other problems this addresses that I am not aware of
though)
Thanks,
Lino Sanfilippo (1):
serial: core: move RS485 configuration tasks from drivers into core
Lukas Wunner (1):
serial: Deassert Transmit Enable on probe in driver-specific way
drivers/tty/serial/8250/8250_omap.c | 3 ++
drivers/tty/serial/8250/8250_pci.c | 9 +----
drivers/tty/serial/8250/8250_port.c | 12 +++---
drivers/tty/serial/fsl_lpuart.c | 7 ++--
drivers/tty/serial/imx.c | 8 +---
drivers/tty/serial/serial_core.c | 61 +++++++++++++++++++++++------
6 files changed, 65 insertions(+), 35 deletions(-)
--
2.35.1
The quilt patch titled
Subject: ocfs2: fix ocfs2 corrupt when iputting an inode
has been removed from the -mm tree. Its filename was
ocfs2-fix-ocfs2-corrupt-when-iputting-an-inode.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Wangyan <wangyan122(a)huawei.com>
Subject: ocfs2: fix ocfs2 corrupt when iputting an inode
In this condition, it will cause an bug on error.
ocfs2_mkdir()
->ocfs2_mknod()
->ocfs2_mknod_locked()
->__ocfs2_mknod_locked()
//Assume inode->i_generation is genN.
->inode->i_generation = osb->s_next_generation++;
// The inode lockres has been initialized.
->ocfs2_populate_inode()
->ocfs2_create_new_inode_locks()
->An error happened, returned value is non-zero
// free the start_bit x in bg_blkno
->ocfs2_free_suballoc_bits()
->... /* Another process execute mkdir success in this place,
and it occupied the start_bit x in bg_blkno
which has been freed before. Its inode->i_generation
is genN + 1 */
->iput(inode)
->evict()
->ocfs2_evict_inode()
->ocfs2_delete_inode()
->ocfs2_inode_lock()
->ocfs2_inode_lock_update()
/* Bug on here, genN != genN + 1 */
->mlog_bug_on_msg(inode->i_generation !=
le32_to_cpu(fe->i_generation))
So, we need not to reclaim the inode when the inode->ip_inode_lockres
has been initialized. It will be freed in iput().
Link: http://lkml.kernel.org/r/ef080ca3-5d74-e276-17a1-d9e7c7e662c9@huawei.com
Fixes: b1529a41f777 ("ocfs2: should reclaim the inode if '__ocfs2_mknod_locked' returns an error")
Signed-off-by: Yan Wang <wangyan122(a)huawei.com>
Reviewed-by: Jun Piao <piaojun(a)huawei.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Joseph Qi <jiangqi903(a)gmail.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/namei.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/ocfs2/namei.c~ocfs2-fix-ocfs2-corrupt-when-iputting-an-inode
+++ a/fs/ocfs2/namei.c
@@ -641,7 +641,8 @@ static int ocfs2_mknod_locked(struct ocf
status = __ocfs2_mknod_locked(dir, inode, dev, new_fe_bh,
parent_fe_bh, handle, inode_ac,
fe_blkno, suballoc_loc, suballoc_bit);
- if (status < 0) {
+ if (status < 0 && !(OCFS2_I(inode)->ip_inode_lockres.l_flags &
+ OCFS2_LOCK_INITIALIZED)) {
u64 bg_blkno = ocfs2_which_suballoc_group(fe_blkno, suballoc_bit);
int tmp = ocfs2_free_suballoc_bits(handle, inode_ac->ac_inode,
inode_ac->ac_bh, suballoc_bit, bg_blkno, 1);
_
Patches currently in -mm which might be from wangyan122(a)huawei.com are
The quilt patch titled
Subject: ocfs2: clear links count in ocfs2_mknod() if an error occurs
has been removed from the -mm tree. Its filename was
ocfs2-clear-links-count-in-ocfs2_mknod-if-an-error-occurs.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Wangyan <wangyan122(a)huawei.com>
Subject: ocfs2: clear links count in ocfs2_mknod() if an error occurs
In this condition, the inode can not be wiped when error happened.
ocfs2_mkdir()
->ocfs2_mknod()
->ocfs2_mknod_locked()
->__ocfs2_mknod_locked()
->ocfs2_set_links_count() // i_links_count is 2
-> ... // an error accrue, goto roll_back or leave.
->ocfs2_commit_trans()
->iput(inode)
->evict()
->ocfs2_evict_inode()
->ocfs2_delete_inode()
->ocfs2_inode_lock()
->ocfs2_inode_lock_update()
->ocfs2_refresh_inode()
->set_nlink(); // inode->i_nlink is 2 now.
/* if wipe is 0, it will goto bail_unlock_inode */
->ocfs2_query_inode_wipe()
->if (inode->i_nlink) return; // wipe is 0.
/* inode can not be wiped */
->ocfs2_wipe_inode()
So, we need clear links before the transaction committed.
Link: http://lkml.kernel.org/r/d8147c41-fb2b-bdf7-b660-1f3c8448c33f@huawei.com
Signed-off-by: Yan Wang <wangyan122(a)huawei.com>
Reviewed-by: Jun Piao <piaojun(a)huawei.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Joseph Qi <jiangqi903(a)gmail.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/namei.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
--- a/fs/ocfs2/namei.c~ocfs2-clear-links-count-in-ocfs2_mknod-if-an-error-occurs
+++ a/fs/ocfs2/namei.c
@@ -454,8 +454,12 @@ roll_back:
leave:
if (status < 0 && did_quota_inode)
dquot_free_inode(inode);
- if (handle)
+ if (handle) {
+ if (status < 0 && new_fe_bh != NULL)
+ ocfs2_set_links_count((struct ocfs2_dinode *)
+ new_fe_bh->b_data, 0);
ocfs2_commit_trans(osb, handle);
+ }
ocfs2_inode_unlock(dir, 1);
if (did_block_signals)
@@ -599,6 +603,8 @@ static int __ocfs2_mknod_locked(struct i
leave:
if (status < 0) {
if (*new_fe_bh) {
+ if (fe)
+ ocfs2_set_links_count(fe, 0);
brelse(*new_fe_bh);
*new_fe_bh = NULL;
}
@@ -2028,8 +2034,12 @@ bail:
ocfs2_clusters_to_bytes(osb->sb, 1));
if (status < 0 && did_quota_inode)
dquot_free_inode(inode);
- if (handle)
+ if (handle) {
+ if (status < 0 && new_fe_bh != NULL)
+ ocfs2_set_links_count((struct ocfs2_dinode *)
+ new_fe_bh->b_data, 0);
ocfs2_commit_trans(osb, handle);
+ }
ocfs2_inode_unlock(dir, 1);
if (did_block_signals)
_
Patches currently in -mm which might be from wangyan122(a)huawei.com are
ocfs2-fix-ocfs2-corrupt-when-iputting-an-inode.patch
Hi,
So I looked at this, and it wasn't great one way or the other...
The first patch here is obvious, let's just take it and get one
of the parsings out of the way.
The second one removes a couple of more cases where this is done,
since it only happens when the last argument is non-NULL.
The third then is to avoid the UAF, and is simpler now since only
a few places can even allocate it.
johannes
This is the start of the stable review cycle for the 5.4.218 release.
There are 38 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 15 Oct 2022 17:51:33 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.218-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.4.218-rc1
Cameron Gutman <aicommander(a)gmail.com>
Input: xpad - fix wireless 360 controller breaking after suspend
Pavel Rojtberg <rojtberg(a)gmail.com>
Input: xpad - add supported devices as contributed on github
Johannes Berg <johannes.berg(a)intel.com>
wifi: cfg80211: update hidden BSSes to avoid WARN_ON
Johannes Berg <johannes.berg(a)intel.com>
wifi: mac80211_hwsim: avoid mac80211 warning on bad rate
Johannes Berg <johannes.berg(a)intel.com>
wifi: cfg80211: avoid nontransmitted BSS list corruption
Johannes Berg <johannes.berg(a)intel.com>
wifi: cfg80211: fix BSS refcounting bugs
Johannes Berg <johannes.berg(a)intel.com>
wifi: cfg80211: ensure length byte is present before access
Johannes Berg <johannes.berg(a)intel.com>
wifi: cfg80211/mac80211: reject bad MBSSID elements
Johannes Berg <johannes.berg(a)intel.com>
wifi: cfg80211: fix u8 overflow in cfg80211_update_notlisted_nontrans()
Jason A. Donenfeld <Jason(a)zx2c4.com>
random: use expired timer rather than wq for mixing fast pool
Jason A. Donenfeld <Jason(a)zx2c4.com>
random: avoid reading two cache lines on irq randomness
Jason A. Donenfeld <Jason(a)zx2c4.com>
random: restore O_NONBLOCK support
Frank Wunderlich <frank-w(a)public-files.de>
USB: serial: qcserial: add new usb-id for Dell branded EM7455
Linus Torvalds <torvalds(a)linux-foundation.org>
scsi: stex: Properly zero out the passthrough command structure
Orlando Chamberlain <redecorating(a)protonmail.com>
efi: Correct Macmini DMI match in uefi cert quirk
Takashi Iwai <tiwai(a)suse.de>
ALSA: hda: Fix position reporting on Poulsbo
Jason A. Donenfeld <Jason(a)zx2c4.com>
random: clamp credited irq bits to maximum mixed
Hu Weiwen <sehuww(a)mail.scut.edu.cn>
ceph: don't truncate file in atomic_open
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix leak of nilfs_root in case of writer thread creation failure
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level()
Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
rpmsg: qcom: glink: replace strncpy() with strscpy_pad()
Brian Norris <briannorris(a)chromium.org>
mmc: core: Terminate infinite loop in SD-UHS voltage switch
ChanWoo Lee <cw9316.lee(a)samsung.com>
mmc: core: Replace with already defined values for readability
Johan Hovold <johan(a)kernel.org>
USB: serial: ftdi_sio: fix 300 bps rate for SIO
Tadeusz Struk <tadeusz.struk(a)linaro.org>
usb: mon: make mmapped memory read only
David Gow <davidgow(a)google.com>
arch: um: Mark the stack non-executable to fix a binutils warning
Lukas Straub <lukasstraub2(a)web.de>
um: Cleanup compiler warning in arch/x86/um/tls_32.c
Lukas Straub <lukasstraub2(a)web.de>
um: Cleanup syscall_handler_t cast in syscalls_32.h
Haimin Zhang <tcs.kernel(a)gmail.com>
net/ieee802154: fix uninit value bug in dgram_sendmsg
Letu Ren <fantasquex(a)gmail.com>
scsi: qedf: Fix a UAF bug in __qedf_probe()
Sergei Antonov <saproj(a)gmail.com>
ARM: dts: fix Moxa SDIO 'compatible', remove 'sdhci' misnomer
Swati Agarwal <swati.agarwal(a)xilinx.com>
dmaengine: xilinx_dma: Report error in case of dma_set_mask_and_coherent API failure
Swati Agarwal <swati.agarwal(a)xilinx.com>
dmaengine: xilinx_dma: cleanup for fetching xlnx,num-fstores property
Cristian Marussi <cristian.marussi(a)arm.com>
firmware: arm_scmi: Add SCMI PM driver remove routine
Dongliang Mu <mudongliangabcd(a)gmail.com>
fs: fix UAF/GPF bug in nilfs_mdt_destroy
Alexey Dobriyan <adobriyan(a)gmail.com>
perf tools: Fixup get_current_dir_name() compilation
Steven Price <steven.price(a)arm.com>
mm: pagewalk: Fix race between unmap and page walker
-------------
Diffstat:
.../devicetree/bindings/dma/moxa,moxart-dma.txt | 4 +-
Makefile | 4 +-
arch/arm/boot/dts/moxart-uc7112lx.dts | 2 +-
arch/arm/boot/dts/moxart.dtsi | 4 +-
arch/um/Makefile | 8 +++
arch/x86/um/shared/sysdep/syscalls_32.h | 5 +-
arch/x86/um/tls_32.c | 6 --
arch/x86/um/vdso/Makefile | 2 +-
drivers/char/mem.c | 4 +-
drivers/char/random.c | 25 ++++---
drivers/dma/xilinx/xilinx_dma.c | 8 ++-
drivers/firmware/arm_scmi/scmi_pm_domain.c | 20 ++++++
drivers/input/joystick/xpad.c | 20 +++++-
drivers/mmc/core/sd.c | 3 +-
drivers/net/wireless/mac80211_hwsim.c | 2 +
drivers/rpmsg/qcom_glink_native.c | 2 +-
drivers/rpmsg/qcom_smd.c | 4 +-
drivers/scsi/qedf/qedf_main.c | 5 --
drivers/scsi/stex.c | 17 ++---
drivers/usb/mon/mon_bin.c | 5 ++
drivers/usb/serial/ftdi_sio.c | 3 +-
drivers/usb/serial/qcserial.c | 1 +
fs/ceph/file.c | 10 ++-
fs/inode.c | 7 +-
fs/nilfs2/inode.c | 2 +
fs/nilfs2/segment.c | 21 +++---
include/net/ieee802154_netdev.h | 37 +++++++++++
include/scsi/scsi_cmnd.h | 2 +-
mm/pagewalk.c | 13 ++--
net/ieee802154/socket.c | 42 ++++++------
net/mac80211/util.c | 2 +
net/wireless/scan.c | 77 ++++++++++++++--------
security/integrity/platform_certs/load_uefi.c | 2 +-
sound/pci/hda/hda_intel.c | 3 +-
tools/perf/util/get_current_dir_name.c | 3 +-
35 files changed, 256 insertions(+), 119 deletions(-)
This is the start of the stable review cycle for the 5.10.148 release.
There are 54 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 15 Oct 2022 17:51:33 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.148-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.10.148-rc1
Shunsuke Mie <mie(a)igel.co.jp>
misc: pci_endpoint_test: Fix pci_endpoint_test_{copy,write,read}() panic
Shunsuke Mie <mie(a)igel.co.jp>
misc: pci_endpoint_test: Aggregate params checking for xfer
Cameron Gutman <aicommander(a)gmail.com>
Input: xpad - fix wireless 360 controller breaking after suspend
Pavel Rojtberg <rojtberg(a)gmail.com>
Input: xpad - add supported devices as contributed on github
Johannes Berg <johannes.berg(a)intel.com>
wifi: cfg80211: update hidden BSSes to avoid WARN_ON
Johannes Berg <johannes.berg(a)intel.com>
wifi: mac80211: fix crash in beacon protection for P2P-device
Johannes Berg <johannes.berg(a)intel.com>
wifi: mac80211_hwsim: avoid mac80211 warning on bad rate
Johannes Berg <johannes.berg(a)intel.com>
wifi: cfg80211: avoid nontransmitted BSS list corruption
Johannes Berg <johannes.berg(a)intel.com>
wifi: cfg80211: fix BSS refcounting bugs
Johannes Berg <johannes.berg(a)intel.com>
wifi: cfg80211: ensure length byte is present before access
Johannes Berg <johannes.berg(a)intel.com>
wifi: cfg80211/mac80211: reject bad MBSSID elements
Johannes Berg <johannes.berg(a)intel.com>
wifi: cfg80211: fix u8 overflow in cfg80211_update_notlisted_nontrans()
Jason A. Donenfeld <Jason(a)zx2c4.com>
random: use expired timer rather than wq for mixing fast pool
Jason A. Donenfeld <Jason(a)zx2c4.com>
random: avoid reading two cache lines on irq randomness
Frank Wunderlich <frank-w(a)public-files.de>
USB: serial: qcserial: add new usb-id for Dell branded EM7455
Linus Torvalds <torvalds(a)linux-foundation.org>
scsi: stex: Properly zero out the passthrough command structure
Orlando Chamberlain <redecorating(a)protonmail.com>
efi: Correct Macmini DMI match in uefi cert quirk
Takashi Iwai <tiwai(a)suse.de>
ALSA: hda: Fix position reporting on Poulsbo
Jason A. Donenfeld <Jason(a)zx2c4.com>
random: clamp credited irq bits to maximum mixed
Jason A. Donenfeld <Jason(a)zx2c4.com>
random: restore O_NONBLOCK support
Sasha Levin <sashal(a)kernel.org>
Revert "clk: ti: Stop using legacy clkctrl names for omap4 and 5"
Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
rpmsg: qcom: glink: replace strncpy() with strscpy_pad()
Johan Hovold <johan(a)kernel.org>
USB: serial: ftdi_sio: fix 300 bps rate for SIO
Tadeusz Struk <tadeusz.struk(a)linaro.org>
usb: mon: make mmapped memory read only
Brian Norris <briannorris(a)chromium.org>
mmc: core: Terminate infinite loop in SD-UHS voltage switch
ChanWoo Lee <cw9316.lee(a)samsung.com>
mmc: core: Replace with already defined values for readability
zhikzhai <zhikai.zhai(a)amd.com>
drm/amd/display: skip audio setup when audio stream is enabled
Hugo Hu <hugo.hu(a)amd.com>
drm/amd/display: update gamut remap if plane has changed
Jianglei Nie <niejianglei2021(a)163.com>
net: atlantic: fix potential memory leak in aq_ndev_close()
David Gow <davidgow(a)google.com>
arch: um: Mark the stack non-executable to fix a binutils warning
Lukas Straub <lukasstraub2(a)web.de>
um: Cleanup compiler warning in arch/x86/um/tls_32.c
Lukas Straub <lukasstraub2(a)web.de>
um: Cleanup syscall_handler_t cast in syscalls_32.h
Jaroslav Kysela <perex(a)perex.cz>
ALSA: hda/hdmi: Fix the converter reuse for the silent stream
Haimin Zhang <tcs.kernel(a)gmail.com>
net/ieee802154: fix uninit value bug in dgram_sendmsg
Letu Ren <fantasquex(a)gmail.com>
scsi: qedf: Fix a UAF bug in __qedf_probe()
Sergei Antonov <saproj(a)gmail.com>
ARM: dts: fix Moxa SDIO 'compatible', remove 'sdhci' misnomer
Swati Agarwal <swati.agarwal(a)xilinx.com>
dmaengine: xilinx_dma: Report error in case of dma_set_mask_and_coherent API failure
Swati Agarwal <swati.agarwal(a)xilinx.com>
dmaengine: xilinx_dma: cleanup for fetching xlnx,num-fstores property
Swati Agarwal <swati.agarwal(a)xilinx.com>
dmaengine: xilinx_dma: Fix devm_platform_ioremap_resource error handling
Cristian Marussi <cristian.marussi(a)arm.com>
firmware: arm_scmi: Add SCMI PM driver remove routine
Nick Desaulniers <ndesaulniers(a)google.com>
compiler_attributes.h: move __compiletime_{error|warning}
Dongliang Mu <mudongliangabcd(a)gmail.com>
fs: fix UAF/GPF bug in nilfs_mdt_destroy
Yang Shi <shy828301(a)gmail.com>
powerpc/64s/radix: don't need to broadcast IPI for radix pmd collapse flush
Yang Shi <shy828301(a)gmail.com>
mm: gup: fix the fast GUP race against THP collapse
Takashi Iwai <tiwai(a)suse.de>
ALSA: pcm: oss: Fix race at SNDCTL_DSP_SYNC
Jalal Mostafa <jalal.a.mostapha(a)gmail.com>
xsk: Inherit need_wakeup flag for shared sockets
Alexey Dobriyan <adobriyan(a)gmail.com>
perf tools: Fixup get_current_dir_name() compilation
Shuah Khan <skhan(a)linuxfoundation.org>
docs: update mediator information in CoC docs
Sami Tolvanen <samitolvanen(a)google.com>
Makefile.extrawarn: Move -Wcast-function-type-strict to W=1
Hu Weiwen <sehuww(a)mail.scut.edu.cn>
ceph: don't truncate file in atomic_open
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix leak of nilfs_root in case of writer thread creation failure
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix use-after-free bug of struct nilfs_root
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level()
-------------
Diffstat:
.../devicetree/bindings/dma/moxa,moxart-dma.txt | 4 +-
.../process/code-of-conduct-interpretation.rst | 2 +-
Makefile | 4 +-
arch/arm/boot/dts/moxart-uc7112lx.dts | 2 +-
arch/arm/boot/dts/moxart.dtsi | 4 +-
arch/powerpc/mm/book3s64/radix_pgtable.c | 9 -
arch/um/Makefile | 8 +
arch/x86/um/shared/sysdep/syscalls_32.h | 5 +-
arch/x86/um/tls_32.c | 6 -
arch/x86/um/vdso/Makefile | 2 +-
drivers/char/mem.c | 4 +-
drivers/char/random.c | 25 ++-
drivers/clk/ti/clk-44xx.c | 210 ++++++++++-----------
drivers/clk/ti/clk-54xx.c | 160 ++++++++--------
drivers/clk/ti/clkctrl.c | 4 +
drivers/dma/xilinx/xilinx_dma.c | 21 ++-
drivers/firmware/arm_scmi/scmi_pm_domain.c | 20 ++
.../amd/display/dc/dce110/dce110_hw_sequencer.c | 6 +-
drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c | 1 +
drivers/input/joystick/xpad.c | 20 +-
drivers/misc/pci_endpoint_test.c | 34 +++-
drivers/mmc/core/sd.c | 3 +-
drivers/net/ethernet/aquantia/atlantic/aq_main.c | 3 -
drivers/net/wireless/mac80211_hwsim.c | 2 +
drivers/rpmsg/qcom_glink_native.c | 2 +-
drivers/rpmsg/qcom_smd.c | 4 +-
drivers/scsi/qedf/qedf_main.c | 5 -
drivers/scsi/stex.c | 17 +-
drivers/usb/mon/mon_bin.c | 5 +
drivers/usb/serial/ftdi_sio.c | 3 +-
drivers/usb/serial/qcserial.c | 1 +
fs/ceph/file.c | 10 +-
fs/inode.c | 7 +-
fs/nilfs2/inode.c | 19 +-
fs/nilfs2/segment.c | 21 ++-
include/linux/compiler-gcc.h | 3 -
include/linux/compiler_attributes.h | 24 +++
include/linux/compiler_types.h | 6 -
include/net/ieee802154_netdev.h | 37 ++++
include/net/xsk_buff_pool.h | 2 +-
include/scsi/scsi_cmnd.h | 2 +-
mm/gup.c | 34 +++-
mm/khugepaged.c | 10 +-
net/ieee802154/socket.c | 42 +++--
net/mac80211/rx.c | 12 +-
net/mac80211/util.c | 2 +
net/wireless/scan.c | 77 +++++---
net/xdp/xsk.c | 4 +-
net/xdp/xsk_buff_pool.c | 5 +-
scripts/Makefile.extrawarn | 1 +
security/integrity/platform_certs/load_uefi.c | 2 +-
sound/core/oss/pcm_oss.c | 5 +-
sound/pci/hda/hda_intel.c | 3 +-
sound/pci/hda/patch_hdmi.c | 1 +
tools/perf/util/get_current_dir_name.c | 3 +-
55 files changed, 570 insertions(+), 358 deletions(-)
The patch titled
Subject: gcov: support GCC 12.1 and newer compilers
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
gcov-support-gcc-121-and-newer-compilers.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Martin Liska <mliska(a)suse.cz>
Subject: gcov: support GCC 12.1 and newer compilers
Date: Thu, 13 Oct 2022 09:40:59 +0200
Starting with GCC 12.1, there are 2 significant changes to the .gcda file
format that needs to be supported:
a) [gcov: Use system IO buffering] (23eb66d1d46a34cb28c4acbdf8a1deb80a7c5a05) changed
that all sizes in the format are in bytes and not in words (4B)
b) [gcov: make profile merging smarter] (72e0c742bd01f8e7e6dcca64042b9ad7e75979de)
add a new checksum to the file header.
Tested with GCC 7.5, 10.4, 12.2 and the current master.
Link: https://lkml.kernel.org/r/624bda92-f307-30e9-9aaa-8cc678b2dfb2@suse.cz
Signed-off-by: Martin Liska <mliska(a)suse.cz>
Tested-by: Peter Oberparleiter <oberpar(a)linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/gcov/gcc_4_7.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
--- a/kernel/gcov/gcc_4_7.c~gcov-support-gcc-121-and-newer-compilers
+++ a/kernel/gcov/gcc_4_7.c
@@ -30,6 +30,13 @@
#define GCOV_TAG_FUNCTION_LENGTH 3
+/* Since GCC 12.1 sizes are in BYTES and not in WORDS (4B). */
+#if (__GNUC__ >= 12)
+#define GCOV_UNIT_SIZE 4
+#else
+#define GCOV_UNIT_SIZE 1
+#endif
+
static struct gcov_info *gcov_info_head;
/**
@@ -383,12 +390,18 @@ size_t convert_to_gcda(char *buffer, str
pos += store_gcov_u32(buffer, pos, info->version);
pos += store_gcov_u32(buffer, pos, info->stamp);
+#if (__GNUC__ >= 12)
+ /* Use zero as checksum of the compilation unit. */
+ pos += store_gcov_u32(buffer, pos, 0);
+#endif
+
for (fi_idx = 0; fi_idx < info->n_functions; fi_idx++) {
fi_ptr = info->functions[fi_idx];
/* Function record. */
pos += store_gcov_u32(buffer, pos, GCOV_TAG_FUNCTION);
- pos += store_gcov_u32(buffer, pos, GCOV_TAG_FUNCTION_LENGTH);
+ pos += store_gcov_u32(buffer, pos,
+ GCOV_TAG_FUNCTION_LENGTH * GCOV_UNIT_SIZE);
pos += store_gcov_u32(buffer, pos, fi_ptr->ident);
pos += store_gcov_u32(buffer, pos, fi_ptr->lineno_checksum);
pos += store_gcov_u32(buffer, pos, fi_ptr->cfg_checksum);
@@ -402,7 +415,8 @@ size_t convert_to_gcda(char *buffer, str
/* Counter record. */
pos += store_gcov_u32(buffer, pos,
GCOV_TAG_FOR_COUNTER(ct_idx));
- pos += store_gcov_u32(buffer, pos, ci_ptr->num * 2);
+ pos += store_gcov_u32(buffer, pos,
+ ci_ptr->num * 2 * GCOV_UNIT_SIZE);
for (cv_idx = 0; cv_idx < ci_ptr->num; cv_idx++) {
pos += store_gcov_u64(buffer, pos,
_
Patches currently in -mm which might be from mliska(a)suse.cz are
gcov-support-gcc-121-and-newer-compilers.patch
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
c0a581d7126c ("tracing: Disable interrupt or preemption before acquiring arch_spinlock_t")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c0a581d7126c0bbc96163276f585fd7b4e4d8d0e Mon Sep 17 00:00:00 2001
From: Waiman Long <longman(a)redhat.com>
Date: Thu, 22 Sep 2022 10:56:22 -0400
Subject: [PATCH] tracing: Disable interrupt or preemption before acquiring
arch_spinlock_t
It was found that some tracing functions in kernel/trace/trace.c acquire
an arch_spinlock_t with preemption and irqs enabled. An example is the
tracing_saved_cmdlines_size_read() function which intermittently causes
a "BUG: using smp_processor_id() in preemptible" warning when the LTP
read_all_proc test is run.
That can be problematic in case preemption happens after acquiring the
lock. Add the necessary preemption or interrupt disabling code in the
appropriate places before acquiring an arch_spinlock_t.
The convention here is to disable preemption for trace_cmdline_lock and
interupt for max_lock.
Link: https://lkml.kernel.org/r/20220922145622.1744826-1-longman@redhat.com
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Cc: stable(a)vger.kernel.org
Fixes: a35873a0993b ("tracing: Add conditional snapshot")
Fixes: 939c7a4f04fc ("tracing: Introduce saved_cmdlines_size file")
Suggested-by: Steven Rostedt <rostedt(a)goodmis.org>
Signed-off-by: Waiman Long <longman(a)redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index d3005279165d..aed7ea6e6045 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1193,12 +1193,14 @@ void *tracing_cond_snapshot_data(struct trace_array *tr)
{
void *cond_data = NULL;
+ local_irq_disable();
arch_spin_lock(&tr->max_lock);
if (tr->cond_snapshot)
cond_data = tr->cond_snapshot->cond_data;
arch_spin_unlock(&tr->max_lock);
+ local_irq_enable();
return cond_data;
}
@@ -1334,9 +1336,11 @@ int tracing_snapshot_cond_enable(struct trace_array *tr, void *cond_data,
goto fail_unlock;
}
+ local_irq_disable();
arch_spin_lock(&tr->max_lock);
tr->cond_snapshot = cond_snapshot;
arch_spin_unlock(&tr->max_lock);
+ local_irq_enable();
mutex_unlock(&trace_types_lock);
@@ -1363,6 +1367,7 @@ int tracing_snapshot_cond_disable(struct trace_array *tr)
{
int ret = 0;
+ local_irq_disable();
arch_spin_lock(&tr->max_lock);
if (!tr->cond_snapshot)
@@ -1373,6 +1378,7 @@ int tracing_snapshot_cond_disable(struct trace_array *tr)
}
arch_spin_unlock(&tr->max_lock);
+ local_irq_enable();
return ret;
}
@@ -2200,6 +2206,11 @@ static size_t tgid_map_max;
#define SAVED_CMDLINES_DEFAULT 128
#define NO_CMDLINE_MAP UINT_MAX
+/*
+ * Preemption must be disabled before acquiring trace_cmdline_lock.
+ * The various trace_arrays' max_lock must be acquired in a context
+ * where interrupt is disabled.
+ */
static arch_spinlock_t trace_cmdline_lock = __ARCH_SPIN_LOCK_UNLOCKED;
struct saved_cmdlines_buffer {
unsigned map_pid_to_cmdline[PID_MAX_DEFAULT+1];
@@ -2412,7 +2423,11 @@ static int trace_save_cmdline(struct task_struct *tsk)
* the lock, but we also don't want to spin
* nor do we want to disable interrupts,
* so if we miss here, then better luck next time.
+ *
+ * This is called within the scheduler and wake up, so interrupts
+ * had better been disabled and run queue lock been held.
*/
+ lockdep_assert_preemption_disabled();
if (!arch_spin_trylock(&trace_cmdline_lock))
return 0;
@@ -5890,9 +5905,11 @@ tracing_saved_cmdlines_size_read(struct file *filp, char __user *ubuf,
char buf[64];
int r;
+ preempt_disable();
arch_spin_lock(&trace_cmdline_lock);
r = scnprintf(buf, sizeof(buf), "%u\n", savedcmd->cmdline_num);
arch_spin_unlock(&trace_cmdline_lock);
+ preempt_enable();
return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
}
@@ -5917,10 +5934,12 @@ static int tracing_resize_saved_cmdlines(unsigned int val)
return -ENOMEM;
}
+ preempt_disable();
arch_spin_lock(&trace_cmdline_lock);
savedcmd_temp = savedcmd;
savedcmd = s;
arch_spin_unlock(&trace_cmdline_lock);
+ preempt_enable();
free_saved_cmdlines_buffer(savedcmd_temp);
return 0;
@@ -6373,10 +6392,12 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
#ifdef CONFIG_TRACER_SNAPSHOT
if (t->use_max_tr) {
+ local_irq_disable();
arch_spin_lock(&tr->max_lock);
if (tr->cond_snapshot)
ret = -EBUSY;
arch_spin_unlock(&tr->max_lock);
+ local_irq_enable();
if (ret)
goto out;
}
@@ -7436,10 +7457,12 @@ tracing_snapshot_write(struct file *filp, const char __user *ubuf, size_t cnt,
goto out;
}
+ local_irq_disable();
arch_spin_lock(&tr->max_lock);
if (tr->cond_snapshot)
ret = -EBUSY;
arch_spin_unlock(&tr->max_lock);
+ local_irq_enable();
if (ret)
goto out;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
6094b9136ca9 ("drm/amd/display: explicitly disable psr_feature_enable appropriately")
cd9a0d026baa ("drm/amd/display: parse and check PSR SU caps")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6094b9136ca9038b61e9c4b5d25cd5512ce50b34 Mon Sep 17 00:00:00 2001
From: Shirish S <shirish.s(a)amd.com>
Date: Fri, 7 Oct 2022 20:31:49 +0530
Subject: [PATCH] drm/amd/display: explicitly disable psr_feature_enable
appropriately
[Why]
If psr_feature_enable is set to true by default, it continues to be enabled
for non capable links.
[How]
explicitly disable the feature on links that are not capable of the same.
Fixes: 8c322309e48e9 ("drm/amd/display: Enable PSR")
Signed-off-by: Shirish S <shirish.s(a)amd.com>
Reviewed-by: Leo Li <sunpeng.li(a)amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org # 5.15+
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
index 8ca10ab3dfc1..26291db0a3cf 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
@@ -60,11 +60,15 @@ static bool link_supports_psrsu(struct dc_link *link)
*/
void amdgpu_dm_set_psr_caps(struct dc_link *link)
{
- if (!(link->connector_signal & SIGNAL_TYPE_EDP))
+ if (!(link->connector_signal & SIGNAL_TYPE_EDP)) {
+ link->psr_settings.psr_feature_enabled = false;
return;
+ }
- if (link->type == dc_connection_none)
+ if (link->type == dc_connection_none) {
+ link->psr_settings.psr_feature_enabled = false;
return;
+ }
if (link->dpcd_caps.psr_info.psr_version == 0) {
link->psr_settings.psr_version = DC_PSR_VERSION_UNSUPPORTED;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
91db7a3fc7fe ("media: cedrus: Fix endless loop in cedrus_h265_skip_bits()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 91db7a3fc7fe670cf1770a398a43bb4a1f776bf1 Mon Sep 17 00:00:00 2001
From: Dmitry Osipenko <dmitry.osipenko(a)collabora.com>
Date: Thu, 18 Aug 2022 22:33:08 +0200
Subject: [PATCH] media: cedrus: Fix endless loop in cedrus_h265_skip_bits()
The busy status bit may never de-assert if number of programmed skip
bits is incorrect, resulting in a kernel hang because the bit is polled
endlessly in the code. Fix it by adding timeout for the bit-polling.
This problem is reproducible by setting the data_bit_offset field of
the HEVC slice params to a wrong value by userspace.
Cc: stable(a)vger.kernel.org
Fixes: 7678c5462680 (media: cedrus: Fix decoding for some HEVC videos)
Reported-by: Nicolas Dufresne <nicolas.dufresne(a)collabora.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko(a)collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne(a)collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab(a)kernel.org>
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_h265.c b/drivers/staging/media/sunxi/cedrus/cedrus_h265.c
index f703c585d91c..4952fc17f3e6 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_h265.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_h265.c
@@ -234,8 +234,9 @@ static void cedrus_h265_skip_bits(struct cedrus_dev *dev, int num)
cedrus_write(dev, VE_DEC_H265_TRIGGER,
VE_DEC_H265_TRIGGER_FLUSH_BITS |
VE_DEC_H265_TRIGGER_TYPE_N_BITS(tmp));
- while (cedrus_read(dev, VE_DEC_H265_STATUS) & VE_DEC_H265_STATUS_VLD_BUSY)
- udelay(1);
+
+ if (cedrus_wait_for(dev, VE_DEC_H265_STATUS, VE_DEC_H265_STATUS_VLD_BUSY))
+ dev_err_ratelimited(dev->dev, "timed out waiting to skip bits\n");
count += tmp;
}
The patch below does not apply to the 5.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
6c482c62a635 ("drm/i915: Fix display problems after resume")
2ef6efa79fec ("drm/i915: Improve on suspend / resume time with VT-d enabled")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6c482c62a635aa4f534d2439fbf8afa37452b986 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Thomas=20Hellstr=C3=B6m?= <thomas.hellstrom(a)linux.intel.com>
Date: Wed, 5 Oct 2022 14:11:59 +0200
Subject: [PATCH] drm/i915: Fix display problems after resume
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Commit 39a2bd34c933 ("drm/i915: Use the vma resource as argument for gtt
binding / unbinding") introduced a regression that due to the vma resource
tracking of the binding state, dpt ptes were not correctly repopulated.
Fix this by clearing the vma resource state before repopulating.
The state will subsequently be restored by the bind_vma operation.
Fixes: 39a2bd34c933 ("drm/i915: Use the vma resource as argument for gtt binding / unbinding")
Signed-off-by: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220912121957.31310-1-thomas…
Cc: Matthew Auld <matthew.auld(a)intel.com>
Cc: intel-gfx(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.18+
Reported-and-tested-by: Kevin Boulain <kevinboulain(a)gmail.com>
Tested-by: David de Sousa <davidesousa(a)gmail.com>
Reviewed-by: Matthew Auld <matthew.auld(a)intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda(a)intel.com>
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221005121159.340245-1-thoma…
(cherry picked from commit bc2472538c0d1cce334ffc9e97df0614cd2b1469)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 30cf5c3369d9..2049a00417af 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -1275,10 +1275,16 @@ bool i915_ggtt_resume_vm(struct i915_address_space *vm)
atomic_read(&vma->flags) & I915_VMA_BIND_MASK;
GEM_BUG_ON(!was_bound);
- if (!retained_ptes)
+ if (!retained_ptes) {
+ /*
+ * Clear the bound flags of the vma resource to allow
+ * ptes to be repopulated.
+ */
+ vma->resource->bound_flags = 0;
vma->ops->bind_vma(vm, NULL, vma->resource,
obj ? obj->cache_level : 0,
was_bound);
+ }
if (obj) { /* only used during resume => exclusive access */
write_domain_objs |= fetch_and_zero(&obj->write_domain);
obj->read_domains |= I915_GEM_DOMAIN_GTT;
The patch below does not apply to the 5.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f423fa1bc9fe ("drm/i915/gvt: Add missing vfio_unregister_group_dev() call")
a5ddd2a99a7a ("drm/i915/gvt: Use the new device life cycle helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f423fa1bc9fe1978e6b9f54927411b62cb43eb04 Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgg(a)ziepe.ca>
Date: Thu, 29 Sep 2022 14:48:35 -0300
Subject: [PATCH] drm/i915/gvt: Add missing vfio_unregister_group_dev() call
When converting to directly create the vfio_device the mdev driver has to
put a vfio_register_emulated_iommu_dev() in the probe() and a pairing
vfio_unregister_group_dev() in the remove.
This was missed for gvt, add it.
Cc: stable(a)vger.kernel.org
Fixes: 978cf586ac35 ("drm/i915/gvt: convert to use vfio_register_emulated_iommu_dev")
Reported-by: Alex Williamson <alex.williamson(a)redhat.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian(a)intel.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Link: https://lore.kernel.org/r/0-v1-013609965fe8+9d-vfio_gvt_unregister_jgg@nvid…
Signed-off-by: Alex Williamson <alex.williamson(a)redhat.com>
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 41bba40feef8..9003145adb5a 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1615,6 +1615,7 @@ static void intel_vgpu_remove(struct mdev_device *mdev)
if (WARN_ON_ONCE(vgpu->attached))
return;
+ vfio_unregister_group_dev(&vgpu->vfio_device);
vfio_put_device(&vgpu->vfio_device);
}
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f423fa1bc9fe ("drm/i915/gvt: Add missing vfio_unregister_group_dev() call")
a5ddd2a99a7a ("drm/i915/gvt: Use the new device life cycle helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f423fa1bc9fe1978e6b9f54927411b62cb43eb04 Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgg(a)ziepe.ca>
Date: Thu, 29 Sep 2022 14:48:35 -0300
Subject: [PATCH] drm/i915/gvt: Add missing vfio_unregister_group_dev() call
When converting to directly create the vfio_device the mdev driver has to
put a vfio_register_emulated_iommu_dev() in the probe() and a pairing
vfio_unregister_group_dev() in the remove.
This was missed for gvt, add it.
Cc: stable(a)vger.kernel.org
Fixes: 978cf586ac35 ("drm/i915/gvt: convert to use vfio_register_emulated_iommu_dev")
Reported-by: Alex Williamson <alex.williamson(a)redhat.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian(a)intel.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Link: https://lore.kernel.org/r/0-v1-013609965fe8+9d-vfio_gvt_unregister_jgg@nvid…
Signed-off-by: Alex Williamson <alex.williamson(a)redhat.com>
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 41bba40feef8..9003145adb5a 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1615,6 +1615,7 @@ static void intel_vgpu_remove(struct mdev_device *mdev)
if (WARN_ON_ONCE(vgpu->attached))
return;
+ vfio_unregister_group_dev(&vgpu->vfio_device);
vfio_put_device(&vgpu->vfio_device);
}
Using an InterTech DMG-02 dongle, the led remains on when the system goes
into standby mode. After wakeup, it's no longer possible to control the
led.
It turned out that the register settings to enable or disable the led were
not correct. They worked for some dongles like the Edimax V2 but not for
others like the InterTech DMG-02.
This patch fixes the register settings. Bit 3 in the led_cfg2 register
controls the led status, bit 5 must always be set to be able to control
the led, bit 6 has no influence on the led. Setting the mac_pinmux_cfg
register is not necessary.
These settings were tested with Edimax V2 and InterTech DMG-02.
Cc: stable(a)vger.kernel.org
Fixes: 8cd574e6af54 ("staging: r8188eu: introduce new hal dir for RTL8188eu driver")
Suggested-by: Michael Straube <straube.linux(a)gmail.com>
Signed-off-by: Martin Kaiser <martin(a)kaiser.cx>
---
drivers/staging/r8188eu/core/rtw_led.c | 25 ++-----------------------
1 file changed, 2 insertions(+), 23 deletions(-)
diff --git a/drivers/staging/r8188eu/core/rtw_led.c b/drivers/staging/r8188eu/core/rtw_led.c
index 2527c252c3e9..5b214488571b 100644
--- a/drivers/staging/r8188eu/core/rtw_led.c
+++ b/drivers/staging/r8188eu/core/rtw_led.c
@@ -31,40 +31,19 @@ static void ResetLedStatus(struct led_priv *pLed)
static void SwLedOn(struct adapter *padapter, struct led_priv *pLed)
{
- u8 LedCfg;
- int res;
-
if (padapter->bDriverStopped)
return;
- res = rtw_read8(padapter, REG_LEDCFG2, &LedCfg);
- if (res)
- return;
-
- rtw_write8(padapter, REG_LEDCFG2, (LedCfg & 0xf0) | BIT(5) | BIT(6)); /* SW control led0 on. */
+ rtw_write8(padapter, REG_LEDCFG2, BIT(5)); /* SW control led0 on. */
pLed->bLedOn = true;
}
static void SwLedOff(struct adapter *padapter, struct led_priv *pLed)
{
- u8 LedCfg;
- int res;
-
if (padapter->bDriverStopped)
goto exit;
- res = rtw_read8(padapter, REG_LEDCFG2, &LedCfg);/* 0x4E */
- if (res)
- goto exit;
-
- LedCfg &= 0x90; /* Set to software control. */
- rtw_write8(padapter, REG_LEDCFG2, (LedCfg | BIT(3)));
- res = rtw_read8(padapter, REG_MAC_PINMUX_CFG, &LedCfg);
- if (res)
- goto exit;
-
- LedCfg &= 0xFE;
- rtw_write8(padapter, REG_MAC_PINMUX_CFG, LedCfg);
+ rtw_write8(padapter, REG_LEDCFG2, BIT(5) | BIT(3));
exit:
pLed->bLedOn = false;
}
--
2.30.2
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f9929f69de94 ("drm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f9929f69de94212f98b3ad72a3e81c3bd3d333e0 Mon Sep 17 00:00:00 2001
From: Nathan Chancellor <nathan(a)kernel.org>
Date: Mon, 25 Jul 2022 16:36:29 -0700
Subject: [PATCH] drm/simpledrm: Fix return type of
simpledrm_simple_display_pipe_mode_valid()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When booting a kernel compiled with clang's CFI protection
(CONFIG_CFI_CLANG), there is a CFI failure in
drm_simple_kms_crtc_mode_valid() when trying to call
simpledrm_simple_display_pipe_mode_valid() through ->mode_valid():
[ 0.322802] CFI failure (target: simpledrm_simple_display_pipe_mode_valid+0x0/0x8):
...
[ 0.324928] Call trace:
[ 0.324969] __ubsan_handle_cfi_check_fail+0x58/0x60
[ 0.325053] __cfi_check_fail+0x3c/0x44
[ 0.325120] __cfi_slowpath_diag+0x178/0x200
[ 0.325192] drm_simple_kms_crtc_mode_valid+0x58/0x80
[ 0.325279] __drm_helper_update_and_validate+0x31c/0x464
...
The ->mode_valid() member in 'struct drm_simple_display_pipe_funcs'
expects a return type of 'enum drm_mode_status', not 'int'. Correct it
to fix the CFI failure.
Cc: stable(a)vger.kernel.org
Fixes: 11e8f5fd223b ("drm: Add simpledrm driver")
Link: https://github.com/ClangBuiltLinux/linux/issues/1647
Reported-by: Tomasz Paweł Gajc <tpgxyz(a)gmail.com>
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Reviewed-by: Sami Tolvanen <samitolvanen(a)google.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220725233629.223223-1-natha…
(cherry picked from commit 0c09bc33aa8e9dc867300acaadc318c2f0d85a1e)
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
diff --git a/drivers/gpu/drm/tiny/simpledrm.c b/drivers/gpu/drm/tiny/simpledrm.c
index 768242a78e2b..5422363690e7 100644
--- a/drivers/gpu/drm/tiny/simpledrm.c
+++ b/drivers/gpu/drm/tiny/simpledrm.c
@@ -627,7 +627,7 @@ static const struct drm_connector_funcs simpledrm_connector_funcs = {
.atomic_destroy_state = drm_atomic_helper_connector_destroy_state,
};
-static int
+static enum drm_mode_status
simpledrm_simple_display_pipe_mode_valid(struct drm_simple_display_pipe *pipe,
const struct drm_display_mode *mode)
{
The patch below does not apply to the 5.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f9929f69de94 ("drm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f9929f69de94212f98b3ad72a3e81c3bd3d333e0 Mon Sep 17 00:00:00 2001
From: Nathan Chancellor <nathan(a)kernel.org>
Date: Mon, 25 Jul 2022 16:36:29 -0700
Subject: [PATCH] drm/simpledrm: Fix return type of
simpledrm_simple_display_pipe_mode_valid()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When booting a kernel compiled with clang's CFI protection
(CONFIG_CFI_CLANG), there is a CFI failure in
drm_simple_kms_crtc_mode_valid() when trying to call
simpledrm_simple_display_pipe_mode_valid() through ->mode_valid():
[ 0.322802] CFI failure (target: simpledrm_simple_display_pipe_mode_valid+0x0/0x8):
...
[ 0.324928] Call trace:
[ 0.324969] __ubsan_handle_cfi_check_fail+0x58/0x60
[ 0.325053] __cfi_check_fail+0x3c/0x44
[ 0.325120] __cfi_slowpath_diag+0x178/0x200
[ 0.325192] drm_simple_kms_crtc_mode_valid+0x58/0x80
[ 0.325279] __drm_helper_update_and_validate+0x31c/0x464
...
The ->mode_valid() member in 'struct drm_simple_display_pipe_funcs'
expects a return type of 'enum drm_mode_status', not 'int'. Correct it
to fix the CFI failure.
Cc: stable(a)vger.kernel.org
Fixes: 11e8f5fd223b ("drm: Add simpledrm driver")
Link: https://github.com/ClangBuiltLinux/linux/issues/1647
Reported-by: Tomasz Paweł Gajc <tpgxyz(a)gmail.com>
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Reviewed-by: Sami Tolvanen <samitolvanen(a)google.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220725233629.223223-1-natha…
(cherry picked from commit 0c09bc33aa8e9dc867300acaadc318c2f0d85a1e)
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
diff --git a/drivers/gpu/drm/tiny/simpledrm.c b/drivers/gpu/drm/tiny/simpledrm.c
index 768242a78e2b..5422363690e7 100644
--- a/drivers/gpu/drm/tiny/simpledrm.c
+++ b/drivers/gpu/drm/tiny/simpledrm.c
@@ -627,7 +627,7 @@ static const struct drm_connector_funcs simpledrm_connector_funcs = {
.atomic_destroy_state = drm_atomic_helper_connector_destroy_state,
};
-static int
+static enum drm_mode_status
simpledrm_simple_display_pipe_mode_valid(struct drm_simple_display_pipe *pipe,
const struct drm_display_mode *mode)
{
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
2ea89c7f7f7b ("KVM: nVMX: Make an event request when pending an MTF nested VM-Exit")
7709aba8f716 ("KVM: x86: Morph pending exceptions to pending VM-Exits at queue time")
28360f887068 ("KVM: x86: Evaluate ability to inject SMI/NMI/IRQ after potential VM-Exit")
6c593b5276e6 ("KVM: x86: Hoist nested event checks above event injection logic")
72c14e00bdc4 ("KVM: x86: Formalize blocking of nested pending exceptions")
d4963e319f1f ("KVM: x86: Make kvm_queued_exception a properly named, visible struct")
593a5c2e3c12 ("KVM: nVMX: Unconditionally clear mtf_pending on nested VM-Exit")
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
b9d44f9091ac ("KVM: nVMX: Prioritize TSS T-flag #DBs over Monitor Trap Flag")
8d178f460772 ("KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like")
eba9799b5a6e ("KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS")
2d61391270a3 ("KVM: x86: Differentiate Soft vs. Hard IRQs vs. reinjected in tracepoint")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
3741aec4c38f ("KVM: SVM: Stuff next_rip on emulated INT3 injection if NRIPS is supported")
cd9e6da8048c ("KVM: SVM: Unwind "speculative" RIP advancement if INTn injection "fails"")
00f08d99dd7d ("KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02")
b699da3dc279 ("Merge tag 'kvm-riscv-5.19-1' of https://github.com/kvm-riscv/linux into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2ea89c7f7f7b192e32d1842dafc2e972cd14329b Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Sep 2022 00:31:51 +0000
Subject: [PATCH] KVM: nVMX: Make an event request when pending an MTF nested
VM-Exit
Set KVM_REQ_EVENT when MTF becomes pending to ensure that KVM will run
through inject_pending_event() and thus vmx_check_nested_events() prior
to re-entering the guest.
MTF currently works by virtue of KVM's hack that calls
kvm_check_nested_events() from kvm_vcpu_running(), but that hack will
be removed in the near future. Until that call is removed, the patch
introduces no real functional change.
Fixes: 5ef8acbdd687 ("KVM: nVMX: Emulate MTF when performing instruction emulation")
Cc: stable(a)vger.kernel.org
Reviewed-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20220921003201.1441511-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 85318d803f4f..3a080051a4ec 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6632,6 +6632,9 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
if (ret)
goto error_guest_mode;
+ if (vmx->nested.mtf_pending)
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
+
return 0;
error_guest_mode:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 94c314dc2393..9dba04b6b019 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1665,10 +1665,12 @@ static void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu)
(!vcpu->arch.exception.pending ||
vcpu->arch.exception.vector == DB_VECTOR) &&
(!vcpu->arch.exception_vmexit.pending ||
- vcpu->arch.exception_vmexit.vector == DB_VECTOR))
+ vcpu->arch.exception_vmexit.vector == DB_VECTOR)) {
vmx->nested.mtf_pending = true;
- else
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
+ } else {
vmx->nested.mtf_pending = false;
+ }
}
static int vmx_skip_emulated_instruction(struct kvm_vcpu *vcpu)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
2ea89c7f7f7b ("KVM: nVMX: Make an event request when pending an MTF nested VM-Exit")
7709aba8f716 ("KVM: x86: Morph pending exceptions to pending VM-Exits at queue time")
28360f887068 ("KVM: x86: Evaluate ability to inject SMI/NMI/IRQ after potential VM-Exit")
6c593b5276e6 ("KVM: x86: Hoist nested event checks above event injection logic")
72c14e00bdc4 ("KVM: x86: Formalize blocking of nested pending exceptions")
d4963e319f1f ("KVM: x86: Make kvm_queued_exception a properly named, visible struct")
593a5c2e3c12 ("KVM: nVMX: Unconditionally clear mtf_pending on nested VM-Exit")
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
b9d44f9091ac ("KVM: nVMX: Prioritize TSS T-flag #DBs over Monitor Trap Flag")
8d178f460772 ("KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like")
eba9799b5a6e ("KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS")
2d61391270a3 ("KVM: x86: Differentiate Soft vs. Hard IRQs vs. reinjected in tracepoint")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction")
3741aec4c38f ("KVM: SVM: Stuff next_rip on emulated INT3 injection if NRIPS is supported")
cd9e6da8048c ("KVM: SVM: Unwind "speculative" RIP advancement if INTn injection "fails"")
00f08d99dd7d ("KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02")
b699da3dc279 ("Merge tag 'kvm-riscv-5.19-1' of https://github.com/kvm-riscv/linux into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2ea89c7f7f7b192e32d1842dafc2e972cd14329b Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Sep 2022 00:31:51 +0000
Subject: [PATCH] KVM: nVMX: Make an event request when pending an MTF nested
VM-Exit
Set KVM_REQ_EVENT when MTF becomes pending to ensure that KVM will run
through inject_pending_event() and thus vmx_check_nested_events() prior
to re-entering the guest.
MTF currently works by virtue of KVM's hack that calls
kvm_check_nested_events() from kvm_vcpu_running(), but that hack will
be removed in the near future. Until that call is removed, the patch
introduces no real functional change.
Fixes: 5ef8acbdd687 ("KVM: nVMX: Emulate MTF when performing instruction emulation")
Cc: stable(a)vger.kernel.org
Reviewed-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20220921003201.1441511-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 85318d803f4f..3a080051a4ec 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6632,6 +6632,9 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
if (ret)
goto error_guest_mode;
+ if (vmx->nested.mtf_pending)
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
+
return 0;
error_guest_mode:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 94c314dc2393..9dba04b6b019 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1665,10 +1665,12 @@ static void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu)
(!vcpu->arch.exception.pending ||
vcpu->arch.exception.vector == DB_VECTOR) &&
(!vcpu->arch.exception_vmexit.pending ||
- vcpu->arch.exception_vmexit.vector == DB_VECTOR))
+ vcpu->arch.exception_vmexit.vector == DB_VECTOR)) {
vmx->nested.mtf_pending = true;
- else
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
+ } else {
vmx->nested.mtf_pending = false;
+ }
}
static int vmx_skip_emulated_instruction(struct kvm_vcpu *vcpu)
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
2ea89c7f7f7b ("KVM: nVMX: Make an event request when pending an MTF nested VM-Exit")
7709aba8f716 ("KVM: x86: Morph pending exceptions to pending VM-Exits at queue time")
28360f887068 ("KVM: x86: Evaluate ability to inject SMI/NMI/IRQ after potential VM-Exit")
6c593b5276e6 ("KVM: x86: Hoist nested event checks above event injection logic")
72c14e00bdc4 ("KVM: x86: Formalize blocking of nested pending exceptions")
d4963e319f1f ("KVM: x86: Make kvm_queued_exception a properly named, visible struct")
593a5c2e3c12 ("KVM: nVMX: Unconditionally clear mtf_pending on nested VM-Exit")
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
b9d44f9091ac ("KVM: nVMX: Prioritize TSS T-flag #DBs over Monitor Trap Flag")
8d178f460772 ("KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like")
eba9799b5a6e ("KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2ea89c7f7f7b192e32d1842dafc2e972cd14329b Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Sep 2022 00:31:51 +0000
Subject: [PATCH] KVM: nVMX: Make an event request when pending an MTF nested
VM-Exit
Set KVM_REQ_EVENT when MTF becomes pending to ensure that KVM will run
through inject_pending_event() and thus vmx_check_nested_events() prior
to re-entering the guest.
MTF currently works by virtue of KVM's hack that calls
kvm_check_nested_events() from kvm_vcpu_running(), but that hack will
be removed in the near future. Until that call is removed, the patch
introduces no real functional change.
Fixes: 5ef8acbdd687 ("KVM: nVMX: Emulate MTF when performing instruction emulation")
Cc: stable(a)vger.kernel.org
Reviewed-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20220921003201.1441511-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 85318d803f4f..3a080051a4ec 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6632,6 +6632,9 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
if (ret)
goto error_guest_mode;
+ if (vmx->nested.mtf_pending)
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
+
return 0;
error_guest_mode:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 94c314dc2393..9dba04b6b019 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1665,10 +1665,12 @@ static void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu)
(!vcpu->arch.exception.pending ||
vcpu->arch.exception.vector == DB_VECTOR) &&
(!vcpu->arch.exception_vmexit.pending ||
- vcpu->arch.exception_vmexit.vector == DB_VECTOR))
+ vcpu->arch.exception_vmexit.vector == DB_VECTOR)) {
vmx->nested.mtf_pending = true;
- else
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
+ } else {
vmx->nested.mtf_pending = false;
+ }
}
static int vmx_skip_emulated_instruction(struct kvm_vcpu *vcpu)
The patch below does not apply to the 5.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
2ea89c7f7f7b ("KVM: nVMX: Make an event request when pending an MTF nested VM-Exit")
7709aba8f716 ("KVM: x86: Morph pending exceptions to pending VM-Exits at queue time")
28360f887068 ("KVM: x86: Evaluate ability to inject SMI/NMI/IRQ after potential VM-Exit")
6c593b5276e6 ("KVM: x86: Hoist nested event checks above event injection logic")
72c14e00bdc4 ("KVM: x86: Formalize blocking of nested pending exceptions")
d4963e319f1f ("KVM: x86: Make kvm_queued_exception a properly named, visible struct")
593a5c2e3c12 ("KVM: nVMX: Unconditionally clear mtf_pending on nested VM-Exit")
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
b9d44f9091ac ("KVM: nVMX: Prioritize TSS T-flag #DBs over Monitor Trap Flag")
8d178f460772 ("KVM: nVMX: Treat General Detect #DB (DR7.GD=1) as fault-like")
eba9799b5a6e ("KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS")
2d61391270a3 ("KVM: x86: Differentiate Soft vs. Hard IRQs vs. reinjected in tracepoint")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2ea89c7f7f7b192e32d1842dafc2e972cd14329b Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 21 Sep 2022 00:31:51 +0000
Subject: [PATCH] KVM: nVMX: Make an event request when pending an MTF nested
VM-Exit
Set KVM_REQ_EVENT when MTF becomes pending to ensure that KVM will run
through inject_pending_event() and thus vmx_check_nested_events() prior
to re-entering the guest.
MTF currently works by virtue of KVM's hack that calls
kvm_check_nested_events() from kvm_vcpu_running(), but that hack will
be removed in the near future. Until that call is removed, the patch
introduces no real functional change.
Fixes: 5ef8acbdd687 ("KVM: nVMX: Emulate MTF when performing instruction emulation")
Cc: stable(a)vger.kernel.org
Reviewed-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20220921003201.1441511-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 85318d803f4f..3a080051a4ec 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6632,6 +6632,9 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
if (ret)
goto error_guest_mode;
+ if (vmx->nested.mtf_pending)
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
+
return 0;
error_guest_mode:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 94c314dc2393..9dba04b6b019 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1665,10 +1665,12 @@ static void vmx_update_emulated_instruction(struct kvm_vcpu *vcpu)
(!vcpu->arch.exception.pending ||
vcpu->arch.exception.vector == DB_VECTOR) &&
(!vcpu->arch.exception_vmexit.pending ||
- vcpu->arch.exception_vmexit.vector == DB_VECTOR))
+ vcpu->arch.exception_vmexit.vector == DB_VECTOR)) {
vmx->nested.mtf_pending = true;
- else
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
+ } else {
vmx->nested.mtf_pending = false;
+ }
}
static int vmx_skip_emulated_instruction(struct kvm_vcpu *vcpu)
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
b97f07458373 ("KVM: x86: determine if an exception has an error code only when injecting it.")
cb6a32c2b877 ("KVM: x86: Handle triple fault in L2 without killing L1")
63129754178c ("KVM: SVM: Pass struct kvm_vcpu to exit handlers (and many, many other places)")
2a32a77cefa6 ("KVM: SVM: merge update_cr0_intercept into svm_set_cr0")
11f0cbf0c605 ("KVM: nSVM: Trace VM-Enter consistency check failures")
6906e06db9b0 ("KVM: nSVM: Add missing checks for reserved bits to svm_set_nested_state()")
c08f390a75c1 ("KVM: nSVM: only copy L1 non-VMLOAD/VMSAVE data in svm_set_nested_state()")
9e8f0fbfff1a ("KVM: nSVM: rename functions and variables according to vmcbXY nomenclature")
193015adf40d ("KVM: nSVM: Track the ASID generation of the vmcb vmrun through the vmcb")
af18fa775d07 ("KVM: nSVM: Track the physical cpu of the vmcb vmrun through the vmcb")
4995a3685f1b ("KVM: SVM: Use a separate vmcb for the nested L2 guest")
6d1b867d0456 ("KVM: SVM: Don't strip the C-bit from CR2 on #PF interception")
43c11d91fb1e ("KVM: x86: to track if L1 is running L2 VM")
9e46f6c6c959 ("KVM: SVM: Clear the CR4 register on reset")
2df8d3807ce7 ("KVM: SVM: Fix nested VM-Exit on #GP interception handling")
d2df592fd8c6 ("KVM: nSVM: prepare guest save area while is_guest_mode is true")
a04aead144fd ("KVM: nSVM: fix running nested guests when npt=0")
996ff5429e98 ("KVM: x86: move kvm_inject_gp up from kvm_set_dr to callers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5623f751bd9c438ed12840e086f33c4646440d19 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 30 Aug 2022 23:15:55 +0000
Subject: [PATCH] KVM: x86: Treat #DBs from the emulator as fault-like (code
and DR7.GD=1)
Add a dedicated "exception type" for #DBs, as #DBs can be fault-like or
trap-like depending the sub-type of #DB, and effectively defer the
decision of what to do with the #DB to the caller.
For the emulator's two calls to exception_type(), treat the #DB as
fault-like, as the emulator handles only code breakpoint and general
detect #DBs, both of which are fault-like.
For event injection, which uses exception_type() to determine whether to
set EFLAGS.RF=1 on the stack, keep the current behavior of not setting
RF=1 for #DBs. Intel and AMD explicitly state RF isn't set on code #DBs,
so exempting by failing the "== EXCPT_FAULT" check is correct. The only
other fault-like #DB is General Detect, and despite Intel and AMD both
strongly implying (through omission) that General Detect #DBs should set
RF=1, hardware (multiple generations of both Intel and AMD), in fact does
not. Through insider knowledge, extreme foresight, sheer dumb luck, or
some combination thereof, KVM correctly handled RF for General Detect #DBs.
Fixes: 38827dbd3fb8 ("KVM: x86: Do not update EFLAGS on faulting emulation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Reviewed-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Link: https://lore.kernel.org/r/20220830231614.3580124-9-seanjc@google.com
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ee22264ab471..ee3041da222b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -533,6 +533,7 @@ static int exception_class(int vector)
#define EXCPT_TRAP 1
#define EXCPT_ABORT 2
#define EXCPT_INTERRUPT 3
+#define EXCPT_DB 4
static int exception_type(int vector)
{
@@ -543,8 +544,14 @@ static int exception_type(int vector)
mask = 1 << vector;
- /* #DB is trap, as instruction watchpoints are handled elsewhere */
- if (mask & ((1 << DB_VECTOR) | (1 << BP_VECTOR) | (1 << OF_VECTOR)))
+ /*
+ * #DBs can be trap-like or fault-like, the caller must check other CPU
+ * state, e.g. DR6, to determine whether a #DB is a trap or fault.
+ */
+ if (mask & (1 << DB_VECTOR))
+ return EXCPT_DB;
+
+ if (mask & ((1 << BP_VECTOR) | (1 << OF_VECTOR)))
return EXCPT_TRAP;
if (mask & ((1 << DF_VECTOR) | (1 << MC_VECTOR)))
@@ -8844,6 +8851,12 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
unsigned long rflags = static_call(kvm_x86_get_rflags)(vcpu);
toggle_interruptibility(vcpu, ctxt->interruptibility);
vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
+
+ /*
+ * Note, EXCPT_DB is assumed to be fault-like as the emulator
+ * only supports code breakpoints and general detect #DB, both
+ * of which are fault-like.
+ */
if (!ctxt->have_exception ||
exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
kvm_pmu_trigger_event(vcpu, PERF_COUNT_HW_INSTRUCTIONS);
@@ -9767,6 +9780,16 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
/* try to inject new event if pending */
if (vcpu->arch.exception.pending) {
+ /*
+ * Fault-class exceptions, except #DBs, set RF=1 in the RFLAGS
+ * value pushed on the stack. Trap-like exception and all #DBs
+ * leave RF as-is (KVM follows Intel's behavior in this regard;
+ * AMD states that code breakpoint #DBs excplitly clear RF=0).
+ *
+ * Note, most versions of Intel's SDM and AMD's APM incorrectly
+ * describe the behavior of General Detect #DBs, which are
+ * fault-like. They do _not_ set RF, a la code breakpoints.
+ */
if (exception_type(vcpu->arch.exception.nr) == EXCPT_FAULT)
__kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) |
X86_EFLAGS_RF);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
b97f07458373 ("KVM: x86: determine if an exception has an error code only when injecting it.")
cb6a32c2b877 ("KVM: x86: Handle triple fault in L2 without killing L1")
63129754178c ("KVM: SVM: Pass struct kvm_vcpu to exit handlers (and many, many other places)")
2a32a77cefa6 ("KVM: SVM: merge update_cr0_intercept into svm_set_cr0")
11f0cbf0c605 ("KVM: nSVM: Trace VM-Enter consistency check failures")
6906e06db9b0 ("KVM: nSVM: Add missing checks for reserved bits to svm_set_nested_state()")
c08f390a75c1 ("KVM: nSVM: only copy L1 non-VMLOAD/VMSAVE data in svm_set_nested_state()")
9e8f0fbfff1a ("KVM: nSVM: rename functions and variables according to vmcbXY nomenclature")
193015adf40d ("KVM: nSVM: Track the ASID generation of the vmcb vmrun through the vmcb")
af18fa775d07 ("KVM: nSVM: Track the physical cpu of the vmcb vmrun through the vmcb")
4995a3685f1b ("KVM: SVM: Use a separate vmcb for the nested L2 guest")
6d1b867d0456 ("KVM: SVM: Don't strip the C-bit from CR2 on #PF interception")
43c11d91fb1e ("KVM: x86: to track if L1 is running L2 VM")
9e46f6c6c959 ("KVM: SVM: Clear the CR4 register on reset")
2df8d3807ce7 ("KVM: SVM: Fix nested VM-Exit on #GP interception handling")
d2df592fd8c6 ("KVM: nSVM: prepare guest save area while is_guest_mode is true")
a04aead144fd ("KVM: nSVM: fix running nested guests when npt=0")
996ff5429e98 ("KVM: x86: move kvm_inject_gp up from kvm_set_dr to callers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5623f751bd9c438ed12840e086f33c4646440d19 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 30 Aug 2022 23:15:55 +0000
Subject: [PATCH] KVM: x86: Treat #DBs from the emulator as fault-like (code
and DR7.GD=1)
Add a dedicated "exception type" for #DBs, as #DBs can be fault-like or
trap-like depending the sub-type of #DB, and effectively defer the
decision of what to do with the #DB to the caller.
For the emulator's two calls to exception_type(), treat the #DB as
fault-like, as the emulator handles only code breakpoint and general
detect #DBs, both of which are fault-like.
For event injection, which uses exception_type() to determine whether to
set EFLAGS.RF=1 on the stack, keep the current behavior of not setting
RF=1 for #DBs. Intel and AMD explicitly state RF isn't set on code #DBs,
so exempting by failing the "== EXCPT_FAULT" check is correct. The only
other fault-like #DB is General Detect, and despite Intel and AMD both
strongly implying (through omission) that General Detect #DBs should set
RF=1, hardware (multiple generations of both Intel and AMD), in fact does
not. Through insider knowledge, extreme foresight, sheer dumb luck, or
some combination thereof, KVM correctly handled RF for General Detect #DBs.
Fixes: 38827dbd3fb8 ("KVM: x86: Do not update EFLAGS on faulting emulation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Reviewed-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Link: https://lore.kernel.org/r/20220830231614.3580124-9-seanjc@google.com
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ee22264ab471..ee3041da222b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -533,6 +533,7 @@ static int exception_class(int vector)
#define EXCPT_TRAP 1
#define EXCPT_ABORT 2
#define EXCPT_INTERRUPT 3
+#define EXCPT_DB 4
static int exception_type(int vector)
{
@@ -543,8 +544,14 @@ static int exception_type(int vector)
mask = 1 << vector;
- /* #DB is trap, as instruction watchpoints are handled elsewhere */
- if (mask & ((1 << DB_VECTOR) | (1 << BP_VECTOR) | (1 << OF_VECTOR)))
+ /*
+ * #DBs can be trap-like or fault-like, the caller must check other CPU
+ * state, e.g. DR6, to determine whether a #DB is a trap or fault.
+ */
+ if (mask & (1 << DB_VECTOR))
+ return EXCPT_DB;
+
+ if (mask & ((1 << BP_VECTOR) | (1 << OF_VECTOR)))
return EXCPT_TRAP;
if (mask & ((1 << DF_VECTOR) | (1 << MC_VECTOR)))
@@ -8844,6 +8851,12 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
unsigned long rflags = static_call(kvm_x86_get_rflags)(vcpu);
toggle_interruptibility(vcpu, ctxt->interruptibility);
vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
+
+ /*
+ * Note, EXCPT_DB is assumed to be fault-like as the emulator
+ * only supports code breakpoints and general detect #DB, both
+ * of which are fault-like.
+ */
if (!ctxt->have_exception ||
exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
kvm_pmu_trigger_event(vcpu, PERF_COUNT_HW_INSTRUCTIONS);
@@ -9767,6 +9780,16 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
/* try to inject new event if pending */
if (vcpu->arch.exception.pending) {
+ /*
+ * Fault-class exceptions, except #DBs, set RF=1 in the RFLAGS
+ * value pushed on the stack. Trap-like exception and all #DBs
+ * leave RF as-is (KVM follows Intel's behavior in this regard;
+ * AMD states that code breakpoint #DBs excplitly clear RF=0).
+ *
+ * Note, most versions of Intel's SDM and AMD's APM incorrectly
+ * describe the behavior of General Detect #DBs, which are
+ * fault-like. They do _not_ set RF, a la code breakpoints.
+ */
if (exception_type(vcpu->arch.exception.nr) == EXCPT_FAULT)
__kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) |
X86_EFLAGS_RF);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
b97f07458373 ("KVM: x86: determine if an exception has an error code only when injecting it.")
cb6a32c2b877 ("KVM: x86: Handle triple fault in L2 without killing L1")
63129754178c ("KVM: SVM: Pass struct kvm_vcpu to exit handlers (and many, many other places)")
2a32a77cefa6 ("KVM: SVM: merge update_cr0_intercept into svm_set_cr0")
11f0cbf0c605 ("KVM: nSVM: Trace VM-Enter consistency check failures")
6906e06db9b0 ("KVM: nSVM: Add missing checks for reserved bits to svm_set_nested_state()")
c08f390a75c1 ("KVM: nSVM: only copy L1 non-VMLOAD/VMSAVE data in svm_set_nested_state()")
9e8f0fbfff1a ("KVM: nSVM: rename functions and variables according to vmcbXY nomenclature")
193015adf40d ("KVM: nSVM: Track the ASID generation of the vmcb vmrun through the vmcb")
af18fa775d07 ("KVM: nSVM: Track the physical cpu of the vmcb vmrun through the vmcb")
4995a3685f1b ("KVM: SVM: Use a separate vmcb for the nested L2 guest")
6d1b867d0456 ("KVM: SVM: Don't strip the C-bit from CR2 on #PF interception")
43c11d91fb1e ("KVM: x86: to track if L1 is running L2 VM")
9e46f6c6c959 ("KVM: SVM: Clear the CR4 register on reset")
2df8d3807ce7 ("KVM: SVM: Fix nested VM-Exit on #GP interception handling")
d2df592fd8c6 ("KVM: nSVM: prepare guest save area while is_guest_mode is true")
a04aead144fd ("KVM: nSVM: fix running nested guests when npt=0")
996ff5429e98 ("KVM: x86: move kvm_inject_gp up from kvm_set_dr to callers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5623f751bd9c438ed12840e086f33c4646440d19 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 30 Aug 2022 23:15:55 +0000
Subject: [PATCH] KVM: x86: Treat #DBs from the emulator as fault-like (code
and DR7.GD=1)
Add a dedicated "exception type" for #DBs, as #DBs can be fault-like or
trap-like depending the sub-type of #DB, and effectively defer the
decision of what to do with the #DB to the caller.
For the emulator's two calls to exception_type(), treat the #DB as
fault-like, as the emulator handles only code breakpoint and general
detect #DBs, both of which are fault-like.
For event injection, which uses exception_type() to determine whether to
set EFLAGS.RF=1 on the stack, keep the current behavior of not setting
RF=1 for #DBs. Intel and AMD explicitly state RF isn't set on code #DBs,
so exempting by failing the "== EXCPT_FAULT" check is correct. The only
other fault-like #DB is General Detect, and despite Intel and AMD both
strongly implying (through omission) that General Detect #DBs should set
RF=1, hardware (multiple generations of both Intel and AMD), in fact does
not. Through insider knowledge, extreme foresight, sheer dumb luck, or
some combination thereof, KVM correctly handled RF for General Detect #DBs.
Fixes: 38827dbd3fb8 ("KVM: x86: Do not update EFLAGS on faulting emulation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Reviewed-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Link: https://lore.kernel.org/r/20220830231614.3580124-9-seanjc@google.com
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ee22264ab471..ee3041da222b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -533,6 +533,7 @@ static int exception_class(int vector)
#define EXCPT_TRAP 1
#define EXCPT_ABORT 2
#define EXCPT_INTERRUPT 3
+#define EXCPT_DB 4
static int exception_type(int vector)
{
@@ -543,8 +544,14 @@ static int exception_type(int vector)
mask = 1 << vector;
- /* #DB is trap, as instruction watchpoints are handled elsewhere */
- if (mask & ((1 << DB_VECTOR) | (1 << BP_VECTOR) | (1 << OF_VECTOR)))
+ /*
+ * #DBs can be trap-like or fault-like, the caller must check other CPU
+ * state, e.g. DR6, to determine whether a #DB is a trap or fault.
+ */
+ if (mask & (1 << DB_VECTOR))
+ return EXCPT_DB;
+
+ if (mask & ((1 << BP_VECTOR) | (1 << OF_VECTOR)))
return EXCPT_TRAP;
if (mask & ((1 << DF_VECTOR) | (1 << MC_VECTOR)))
@@ -8844,6 +8851,12 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
unsigned long rflags = static_call(kvm_x86_get_rflags)(vcpu);
toggle_interruptibility(vcpu, ctxt->interruptibility);
vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
+
+ /*
+ * Note, EXCPT_DB is assumed to be fault-like as the emulator
+ * only supports code breakpoints and general detect #DB, both
+ * of which are fault-like.
+ */
if (!ctxt->have_exception ||
exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
kvm_pmu_trigger_event(vcpu, PERF_COUNT_HW_INSTRUCTIONS);
@@ -9767,6 +9780,16 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
/* try to inject new event if pending */
if (vcpu->arch.exception.pending) {
+ /*
+ * Fault-class exceptions, except #DBs, set RF=1 in the RFLAGS
+ * value pushed on the stack. Trap-like exception and all #DBs
+ * leave RF as-is (KVM follows Intel's behavior in this regard;
+ * AMD states that code breakpoint #DBs excplitly clear RF=0).
+ *
+ * Note, most versions of Intel's SDM and AMD's APM incorrectly
+ * describe the behavior of General Detect #DBs, which are
+ * fault-like. They do _not_ set RF, a la code breakpoints.
+ */
if (exception_type(vcpu->arch.exception.nr) == EXCPT_FAULT)
__kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) |
X86_EFLAGS_RF);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
b97f07458373 ("KVM: x86: determine if an exception has an error code only when injecting it.")
cb6a32c2b877 ("KVM: x86: Handle triple fault in L2 without killing L1")
63129754178c ("KVM: SVM: Pass struct kvm_vcpu to exit handlers (and many, many other places)")
2a32a77cefa6 ("KVM: SVM: merge update_cr0_intercept into svm_set_cr0")
11f0cbf0c605 ("KVM: nSVM: Trace VM-Enter consistency check failures")
6906e06db9b0 ("KVM: nSVM: Add missing checks for reserved bits to svm_set_nested_state()")
c08f390a75c1 ("KVM: nSVM: only copy L1 non-VMLOAD/VMSAVE data in svm_set_nested_state()")
9e8f0fbfff1a ("KVM: nSVM: rename functions and variables according to vmcbXY nomenclature")
193015adf40d ("KVM: nSVM: Track the ASID generation of the vmcb vmrun through the vmcb")
af18fa775d07 ("KVM: nSVM: Track the physical cpu of the vmcb vmrun through the vmcb")
4995a3685f1b ("KVM: SVM: Use a separate vmcb for the nested L2 guest")
6d1b867d0456 ("KVM: SVM: Don't strip the C-bit from CR2 on #PF interception")
43c11d91fb1e ("KVM: x86: to track if L1 is running L2 VM")
9e46f6c6c959 ("KVM: SVM: Clear the CR4 register on reset")
2df8d3807ce7 ("KVM: SVM: Fix nested VM-Exit on #GP interception handling")
d2df592fd8c6 ("KVM: nSVM: prepare guest save area while is_guest_mode is true")
a04aead144fd ("KVM: nSVM: fix running nested guests when npt=0")
996ff5429e98 ("KVM: x86: move kvm_inject_gp up from kvm_set_dr to callers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5623f751bd9c438ed12840e086f33c4646440d19 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 30 Aug 2022 23:15:55 +0000
Subject: [PATCH] KVM: x86: Treat #DBs from the emulator as fault-like (code
and DR7.GD=1)
Add a dedicated "exception type" for #DBs, as #DBs can be fault-like or
trap-like depending the sub-type of #DB, and effectively defer the
decision of what to do with the #DB to the caller.
For the emulator's two calls to exception_type(), treat the #DB as
fault-like, as the emulator handles only code breakpoint and general
detect #DBs, both of which are fault-like.
For event injection, which uses exception_type() to determine whether to
set EFLAGS.RF=1 on the stack, keep the current behavior of not setting
RF=1 for #DBs. Intel and AMD explicitly state RF isn't set on code #DBs,
so exempting by failing the "== EXCPT_FAULT" check is correct. The only
other fault-like #DB is General Detect, and despite Intel and AMD both
strongly implying (through omission) that General Detect #DBs should set
RF=1, hardware (multiple generations of both Intel and AMD), in fact does
not. Through insider knowledge, extreme foresight, sheer dumb luck, or
some combination thereof, KVM correctly handled RF for General Detect #DBs.
Fixes: 38827dbd3fb8 ("KVM: x86: Do not update EFLAGS on faulting emulation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Reviewed-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Link: https://lore.kernel.org/r/20220830231614.3580124-9-seanjc@google.com
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ee22264ab471..ee3041da222b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -533,6 +533,7 @@ static int exception_class(int vector)
#define EXCPT_TRAP 1
#define EXCPT_ABORT 2
#define EXCPT_INTERRUPT 3
+#define EXCPT_DB 4
static int exception_type(int vector)
{
@@ -543,8 +544,14 @@ static int exception_type(int vector)
mask = 1 << vector;
- /* #DB is trap, as instruction watchpoints are handled elsewhere */
- if (mask & ((1 << DB_VECTOR) | (1 << BP_VECTOR) | (1 << OF_VECTOR)))
+ /*
+ * #DBs can be trap-like or fault-like, the caller must check other CPU
+ * state, e.g. DR6, to determine whether a #DB is a trap or fault.
+ */
+ if (mask & (1 << DB_VECTOR))
+ return EXCPT_DB;
+
+ if (mask & ((1 << BP_VECTOR) | (1 << OF_VECTOR)))
return EXCPT_TRAP;
if (mask & ((1 << DF_VECTOR) | (1 << MC_VECTOR)))
@@ -8844,6 +8851,12 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
unsigned long rflags = static_call(kvm_x86_get_rflags)(vcpu);
toggle_interruptibility(vcpu, ctxt->interruptibility);
vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
+
+ /*
+ * Note, EXCPT_DB is assumed to be fault-like as the emulator
+ * only supports code breakpoints and general detect #DB, both
+ * of which are fault-like.
+ */
if (!ctxt->have_exception ||
exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
kvm_pmu_trigger_event(vcpu, PERF_COUNT_HW_INSTRUCTIONS);
@@ -9767,6 +9780,16 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
/* try to inject new event if pending */
if (vcpu->arch.exception.pending) {
+ /*
+ * Fault-class exceptions, except #DBs, set RF=1 in the RFLAGS
+ * value pushed on the stack. Trap-like exception and all #DBs
+ * leave RF as-is (KVM follows Intel's behavior in this regard;
+ * AMD states that code breakpoint #DBs excplitly clear RF=0).
+ *
+ * Note, most versions of Intel's SDM and AMD's APM incorrectly
+ * describe the behavior of General Detect #DBs, which are
+ * fault-like. They do _not_ set RF, a la code breakpoints.
+ */
if (exception_type(vcpu->arch.exception.nr) == EXCPT_FAULT)
__kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) |
X86_EFLAGS_RF);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
b97f07458373 ("KVM: x86: determine if an exception has an error code only when injecting it.")
cb6a32c2b877 ("KVM: x86: Handle triple fault in L2 without killing L1")
63129754178c ("KVM: SVM: Pass struct kvm_vcpu to exit handlers (and many, many other places)")
2a32a77cefa6 ("KVM: SVM: merge update_cr0_intercept into svm_set_cr0")
11f0cbf0c605 ("KVM: nSVM: Trace VM-Enter consistency check failures")
6906e06db9b0 ("KVM: nSVM: Add missing checks for reserved bits to svm_set_nested_state()")
c08f390a75c1 ("KVM: nSVM: only copy L1 non-VMLOAD/VMSAVE data in svm_set_nested_state()")
9e8f0fbfff1a ("KVM: nSVM: rename functions and variables according to vmcbXY nomenclature")
193015adf40d ("KVM: nSVM: Track the ASID generation of the vmcb vmrun through the vmcb")
af18fa775d07 ("KVM: nSVM: Track the physical cpu of the vmcb vmrun through the vmcb")
4995a3685f1b ("KVM: SVM: Use a separate vmcb for the nested L2 guest")
6d1b867d0456 ("KVM: SVM: Don't strip the C-bit from CR2 on #PF interception")
43c11d91fb1e ("KVM: x86: to track if L1 is running L2 VM")
9e46f6c6c959 ("KVM: SVM: Clear the CR4 register on reset")
2df8d3807ce7 ("KVM: SVM: Fix nested VM-Exit on #GP interception handling")
d2df592fd8c6 ("KVM: nSVM: prepare guest save area while is_guest_mode is true")
a04aead144fd ("KVM: nSVM: fix running nested guests when npt=0")
996ff5429e98 ("KVM: x86: move kvm_inject_gp up from kvm_set_dr to callers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5623f751bd9c438ed12840e086f33c4646440d19 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 30 Aug 2022 23:15:55 +0000
Subject: [PATCH] KVM: x86: Treat #DBs from the emulator as fault-like (code
and DR7.GD=1)
Add a dedicated "exception type" for #DBs, as #DBs can be fault-like or
trap-like depending the sub-type of #DB, and effectively defer the
decision of what to do with the #DB to the caller.
For the emulator's two calls to exception_type(), treat the #DB as
fault-like, as the emulator handles only code breakpoint and general
detect #DBs, both of which are fault-like.
For event injection, which uses exception_type() to determine whether to
set EFLAGS.RF=1 on the stack, keep the current behavior of not setting
RF=1 for #DBs. Intel and AMD explicitly state RF isn't set on code #DBs,
so exempting by failing the "== EXCPT_FAULT" check is correct. The only
other fault-like #DB is General Detect, and despite Intel and AMD both
strongly implying (through omission) that General Detect #DBs should set
RF=1, hardware (multiple generations of both Intel and AMD), in fact does
not. Through insider knowledge, extreme foresight, sheer dumb luck, or
some combination thereof, KVM correctly handled RF for General Detect #DBs.
Fixes: 38827dbd3fb8 ("KVM: x86: Do not update EFLAGS on faulting emulation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Reviewed-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Link: https://lore.kernel.org/r/20220830231614.3580124-9-seanjc@google.com
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ee22264ab471..ee3041da222b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -533,6 +533,7 @@ static int exception_class(int vector)
#define EXCPT_TRAP 1
#define EXCPT_ABORT 2
#define EXCPT_INTERRUPT 3
+#define EXCPT_DB 4
static int exception_type(int vector)
{
@@ -543,8 +544,14 @@ static int exception_type(int vector)
mask = 1 << vector;
- /* #DB is trap, as instruction watchpoints are handled elsewhere */
- if (mask & ((1 << DB_VECTOR) | (1 << BP_VECTOR) | (1 << OF_VECTOR)))
+ /*
+ * #DBs can be trap-like or fault-like, the caller must check other CPU
+ * state, e.g. DR6, to determine whether a #DB is a trap or fault.
+ */
+ if (mask & (1 << DB_VECTOR))
+ return EXCPT_DB;
+
+ if (mask & ((1 << BP_VECTOR) | (1 << OF_VECTOR)))
return EXCPT_TRAP;
if (mask & ((1 << DF_VECTOR) | (1 << MC_VECTOR)))
@@ -8844,6 +8851,12 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
unsigned long rflags = static_call(kvm_x86_get_rflags)(vcpu);
toggle_interruptibility(vcpu, ctxt->interruptibility);
vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
+
+ /*
+ * Note, EXCPT_DB is assumed to be fault-like as the emulator
+ * only supports code breakpoints and general detect #DB, both
+ * of which are fault-like.
+ */
if (!ctxt->have_exception ||
exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
kvm_pmu_trigger_event(vcpu, PERF_COUNT_HW_INSTRUCTIONS);
@@ -9767,6 +9780,16 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
/* try to inject new event if pending */
if (vcpu->arch.exception.pending) {
+ /*
+ * Fault-class exceptions, except #DBs, set RF=1 in the RFLAGS
+ * value pushed on the stack. Trap-like exception and all #DBs
+ * leave RF as-is (KVM follows Intel's behavior in this regard;
+ * AMD states that code breakpoint #DBs excplitly clear RF=0).
+ *
+ * Note, most versions of Intel's SDM and AMD's APM incorrectly
+ * describe the behavior of General Detect #DBs, which are
+ * fault-like. They do _not_ set RF, a la code breakpoints.
+ */
if (exception_type(vcpu->arch.exception.nr) == EXCPT_FAULT)
__kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) |
X86_EFLAGS_RF);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5623f751bd9c438ed12840e086f33c4646440d19 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 30 Aug 2022 23:15:55 +0000
Subject: [PATCH] KVM: x86: Treat #DBs from the emulator as fault-like (code
and DR7.GD=1)
Add a dedicated "exception type" for #DBs, as #DBs can be fault-like or
trap-like depending the sub-type of #DB, and effectively defer the
decision of what to do with the #DB to the caller.
For the emulator's two calls to exception_type(), treat the #DB as
fault-like, as the emulator handles only code breakpoint and general
detect #DBs, both of which are fault-like.
For event injection, which uses exception_type() to determine whether to
set EFLAGS.RF=1 on the stack, keep the current behavior of not setting
RF=1 for #DBs. Intel and AMD explicitly state RF isn't set on code #DBs,
so exempting by failing the "== EXCPT_FAULT" check is correct. The only
other fault-like #DB is General Detect, and despite Intel and AMD both
strongly implying (through omission) that General Detect #DBs should set
RF=1, hardware (multiple generations of both Intel and AMD), in fact does
not. Through insider knowledge, extreme foresight, sheer dumb luck, or
some combination thereof, KVM correctly handled RF for General Detect #DBs.
Fixes: 38827dbd3fb8 ("KVM: x86: Do not update EFLAGS on faulting emulation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Reviewed-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Link: https://lore.kernel.org/r/20220830231614.3580124-9-seanjc@google.com
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ee22264ab471..ee3041da222b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -533,6 +533,7 @@ static int exception_class(int vector)
#define EXCPT_TRAP 1
#define EXCPT_ABORT 2
#define EXCPT_INTERRUPT 3
+#define EXCPT_DB 4
static int exception_type(int vector)
{
@@ -543,8 +544,14 @@ static int exception_type(int vector)
mask = 1 << vector;
- /* #DB is trap, as instruction watchpoints are handled elsewhere */
- if (mask & ((1 << DB_VECTOR) | (1 << BP_VECTOR) | (1 << OF_VECTOR)))
+ /*
+ * #DBs can be trap-like or fault-like, the caller must check other CPU
+ * state, e.g. DR6, to determine whether a #DB is a trap or fault.
+ */
+ if (mask & (1 << DB_VECTOR))
+ return EXCPT_DB;
+
+ if (mask & ((1 << BP_VECTOR) | (1 << OF_VECTOR)))
return EXCPT_TRAP;
if (mask & ((1 << DF_VECTOR) | (1 << MC_VECTOR)))
@@ -8844,6 +8851,12 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
unsigned long rflags = static_call(kvm_x86_get_rflags)(vcpu);
toggle_interruptibility(vcpu, ctxt->interruptibility);
vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
+
+ /*
+ * Note, EXCPT_DB is assumed to be fault-like as the emulator
+ * only supports code breakpoints and general detect #DB, both
+ * of which are fault-like.
+ */
if (!ctxt->have_exception ||
exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
kvm_pmu_trigger_event(vcpu, PERF_COUNT_HW_INSTRUCTIONS);
@@ -9767,6 +9780,16 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
/* try to inject new event if pending */
if (vcpu->arch.exception.pending) {
+ /*
+ * Fault-class exceptions, except #DBs, set RF=1 in the RFLAGS
+ * value pushed on the stack. Trap-like exception and all #DBs
+ * leave RF as-is (KVM follows Intel's behavior in this regard;
+ * AMD states that code breakpoint #DBs excplitly clear RF=0).
+ *
+ * Note, most versions of Intel's SDM and AMD's APM incorrectly
+ * describe the behavior of General Detect #DBs, which are
+ * fault-like. They do _not_ set RF, a la code breakpoints.
+ */
if (exception_type(vcpu->arch.exception.nr) == EXCPT_FAULT)
__kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) |
X86_EFLAGS_RF);
The patch below does not apply to the 5.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
5623f751bd9c ("KVM: x86: Treat #DBs from the emulator as fault-like (code and DR7.GD=1)")
a61d7c5432ac ("KVM: x86: Trace re-injected exceptions")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5623f751bd9c438ed12840e086f33c4646440d19 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 30 Aug 2022 23:15:55 +0000
Subject: [PATCH] KVM: x86: Treat #DBs from the emulator as fault-like (code
and DR7.GD=1)
Add a dedicated "exception type" for #DBs, as #DBs can be fault-like or
trap-like depending the sub-type of #DB, and effectively defer the
decision of what to do with the #DB to the caller.
For the emulator's two calls to exception_type(), treat the #DB as
fault-like, as the emulator handles only code breakpoint and general
detect #DBs, both of which are fault-like.
For event injection, which uses exception_type() to determine whether to
set EFLAGS.RF=1 on the stack, keep the current behavior of not setting
RF=1 for #DBs. Intel and AMD explicitly state RF isn't set on code #DBs,
so exempting by failing the "== EXCPT_FAULT" check is correct. The only
other fault-like #DB is General Detect, and despite Intel and AMD both
strongly implying (through omission) that General Detect #DBs should set
RF=1, hardware (multiple generations of both Intel and AMD), in fact does
not. Through insider knowledge, extreme foresight, sheer dumb luck, or
some combination thereof, KVM correctly handled RF for General Detect #DBs.
Fixes: 38827dbd3fb8 ("KVM: x86: Do not update EFLAGS on faulting emulation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Reviewed-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Link: https://lore.kernel.org/r/20220830231614.3580124-9-seanjc@google.com
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ee22264ab471..ee3041da222b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -533,6 +533,7 @@ static int exception_class(int vector)
#define EXCPT_TRAP 1
#define EXCPT_ABORT 2
#define EXCPT_INTERRUPT 3
+#define EXCPT_DB 4
static int exception_type(int vector)
{
@@ -543,8 +544,14 @@ static int exception_type(int vector)
mask = 1 << vector;
- /* #DB is trap, as instruction watchpoints are handled elsewhere */
- if (mask & ((1 << DB_VECTOR) | (1 << BP_VECTOR) | (1 << OF_VECTOR)))
+ /*
+ * #DBs can be trap-like or fault-like, the caller must check other CPU
+ * state, e.g. DR6, to determine whether a #DB is a trap or fault.
+ */
+ if (mask & (1 << DB_VECTOR))
+ return EXCPT_DB;
+
+ if (mask & ((1 << BP_VECTOR) | (1 << OF_VECTOR)))
return EXCPT_TRAP;
if (mask & ((1 << DF_VECTOR) | (1 << MC_VECTOR)))
@@ -8844,6 +8851,12 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
unsigned long rflags = static_call(kvm_x86_get_rflags)(vcpu);
toggle_interruptibility(vcpu, ctxt->interruptibility);
vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
+
+ /*
+ * Note, EXCPT_DB is assumed to be fault-like as the emulator
+ * only supports code breakpoints and general detect #DB, both
+ * of which are fault-like.
+ */
if (!ctxt->have_exception ||
exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
kvm_pmu_trigger_event(vcpu, PERF_COUNT_HW_INSTRUCTIONS);
@@ -9767,6 +9780,16 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
/* try to inject new event if pending */
if (vcpu->arch.exception.pending) {
+ /*
+ * Fault-class exceptions, except #DBs, set RF=1 in the RFLAGS
+ * value pushed on the stack. Trap-like exception and all #DBs
+ * leave RF as-is (KVM follows Intel's behavior in this regard;
+ * AMD states that code breakpoint #DBs excplitly clear RF=0).
+ *
+ * Note, most versions of Intel's SDM and AMD's APM incorrectly
+ * describe the behavior of General Detect #DBs, which are
+ * fault-like. They do _not_ set RF, a la code breakpoints.
+ */
if (exception_type(vcpu->arch.exception.nr) == EXCPT_FAULT)
__kvm_set_rflags(vcpu, kvm_get_rflags(vcpu) |
X86_EFLAGS_RF);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
def9d705c05e ("KVM: nVMX: Don't propagate vmcs12's PERF_GLOBAL_CTRL settings to vmcs02")
389ab25216c9 ("KVM: nVMX: Pull KVM L0's desired controls directly from vmcs01")
d041b5ea9335 ("KVM: nVMX: Enable nested TSC scaling")
5e3d394fdd9e ("KVM: VMX: Fix the spelling of CPU_BASED_USE_TSC_OFFSETTING")
4e2a0bc56ad1 ("KVM: VMX: Rename NMI_PENDING to NMI_WINDOW")
9dadc2f918df ("KVM: VMX: Rename INTERRUPT_PENDING to INTERRUPT_WINDOW")
4289d2728664 ("KVM: retpolines: x86: eliminate retpoline from vmx.c exit handlers")
f399e60c45f6 ("KVM: x86: optimize more exit handlers in vmx.c")
b7ad61084842 ("tools headers kvm: Sync kvm headers with the kernel sources")
4a53d99dd0c2 ("KVM: VMX: Introduce exit reason for receiving INIT signal on guest-mode")
5497b95567c1 ("KVM: nVMX: add tracepoint for failed nested VM-Enter")
1edce0a9eb23 ("KVM: x86: Add kvm_emulate_{rd,wr}msr() to consolidate VXM/SVM code")
f20935d85a23 ("KVM: x86: Refactor up kvm_{g,s}et_msr() to simplify callers")
32d1d15c52c1 ("Merge tag 'kvmarm-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From def9d705c05eab3fdedeb10ad67907513b12038e Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 30 Aug 2022 15:37:21 +0200
Subject: [PATCH] KVM: nVMX: Don't propagate vmcs12's PERF_GLOBAL_CTRL settings
to vmcs02
Don't propagate vmcs12's VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL to vmcs02.
KVM doesn't disallow L1 from using VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL
even when KVM itself doesn't use the control, e.g. due to the various
CPU errata that where the MSR can be corrupted on VM-Exit.
Preserve KVM's (vmcs01) setting to hopefully avoid having to toggle the
bit in vmcs02 at a later point. E.g. if KVM is loading PERF_GLOBAL_CTRL
when running L1, then odds are good KVM will also load the MSR when
running L2.
Fixes: 8bf00a529967 ("KVM: VMX: add support for switching of PERF_GLOBAL_CTRL")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Link: https://lore.kernel.org/r/20220830133737.1539624-18-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 1743319015b7..400e015afa64 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2359,9 +2359,14 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
* are emulated by vmx_set_efer() in prepare_vmcs02(), but speculate
* on the related bits (if supported by the CPU) in the hope that
* we can avoid VMWrites during vmx_set_efer().
+ *
+ * Similarly, take vmcs01's PERF_GLOBAL_CTRL in the hope that if KVM is
+ * loading PERF_GLOBAL_CTRL via the VMCS for L1, then KVM will want to
+ * do the same for L2.
*/
exec_control = __vm_entry_controls_get(vmcs01);
- exec_control |= vmcs12->vm_entry_controls;
+ exec_control |= (vmcs12->vm_entry_controls &
+ ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
exec_control &= ~(VM_ENTRY_IA32E_MODE | VM_ENTRY_LOAD_IA32_EFER);
if (cpu_has_load_ia32_efer()) {
if (guest_efer & EFER_LMA)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
def9d705c05e ("KVM: nVMX: Don't propagate vmcs12's PERF_GLOBAL_CTRL settings to vmcs02")
389ab25216c9 ("KVM: nVMX: Pull KVM L0's desired controls directly from vmcs01")
d041b5ea9335 ("KVM: nVMX: Enable nested TSC scaling")
5e3d394fdd9e ("KVM: VMX: Fix the spelling of CPU_BASED_USE_TSC_OFFSETTING")
4e2a0bc56ad1 ("KVM: VMX: Rename NMI_PENDING to NMI_WINDOW")
9dadc2f918df ("KVM: VMX: Rename INTERRUPT_PENDING to INTERRUPT_WINDOW")
4289d2728664 ("KVM: retpolines: x86: eliminate retpoline from vmx.c exit handlers")
f399e60c45f6 ("KVM: x86: optimize more exit handlers in vmx.c")
b7ad61084842 ("tools headers kvm: Sync kvm headers with the kernel sources")
4a53d99dd0c2 ("KVM: VMX: Introduce exit reason for receiving INIT signal on guest-mode")
5497b95567c1 ("KVM: nVMX: add tracepoint for failed nested VM-Enter")
1edce0a9eb23 ("KVM: x86: Add kvm_emulate_{rd,wr}msr() to consolidate VXM/SVM code")
f20935d85a23 ("KVM: x86: Refactor up kvm_{g,s}et_msr() to simplify callers")
32d1d15c52c1 ("Merge tag 'kvmarm-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From def9d705c05eab3fdedeb10ad67907513b12038e Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 30 Aug 2022 15:37:21 +0200
Subject: [PATCH] KVM: nVMX: Don't propagate vmcs12's PERF_GLOBAL_CTRL settings
to vmcs02
Don't propagate vmcs12's VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL to vmcs02.
KVM doesn't disallow L1 from using VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL
even when KVM itself doesn't use the control, e.g. due to the various
CPU errata that where the MSR can be corrupted on VM-Exit.
Preserve KVM's (vmcs01) setting to hopefully avoid having to toggle the
bit in vmcs02 at a later point. E.g. if KVM is loading PERF_GLOBAL_CTRL
when running L1, then odds are good KVM will also load the MSR when
running L2.
Fixes: 8bf00a529967 ("KVM: VMX: add support for switching of PERF_GLOBAL_CTRL")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Link: https://lore.kernel.org/r/20220830133737.1539624-18-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 1743319015b7..400e015afa64 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2359,9 +2359,14 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
* are emulated by vmx_set_efer() in prepare_vmcs02(), but speculate
* on the related bits (if supported by the CPU) in the hope that
* we can avoid VMWrites during vmx_set_efer().
+ *
+ * Similarly, take vmcs01's PERF_GLOBAL_CTRL in the hope that if KVM is
+ * loading PERF_GLOBAL_CTRL via the VMCS for L1, then KVM will want to
+ * do the same for L2.
*/
exec_control = __vm_entry_controls_get(vmcs01);
- exec_control |= vmcs12->vm_entry_controls;
+ exec_control |= (vmcs12->vm_entry_controls &
+ ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
exec_control &= ~(VM_ENTRY_IA32E_MODE | VM_ENTRY_LOAD_IA32_EFER);
if (cpu_has_load_ia32_efer()) {
if (guest_efer & EFER_LMA)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
def9d705c05e ("KVM: nVMX: Don't propagate vmcs12's PERF_GLOBAL_CTRL settings to vmcs02")
389ab25216c9 ("KVM: nVMX: Pull KVM L0's desired controls directly from vmcs01")
d041b5ea9335 ("KVM: nVMX: Enable nested TSC scaling")
5e3d394fdd9e ("KVM: VMX: Fix the spelling of CPU_BASED_USE_TSC_OFFSETTING")
4e2a0bc56ad1 ("KVM: VMX: Rename NMI_PENDING to NMI_WINDOW")
9dadc2f918df ("KVM: VMX: Rename INTERRUPT_PENDING to INTERRUPT_WINDOW")
4289d2728664 ("KVM: retpolines: x86: eliminate retpoline from vmx.c exit handlers")
f399e60c45f6 ("KVM: x86: optimize more exit handlers in vmx.c")
b7ad61084842 ("tools headers kvm: Sync kvm headers with the kernel sources")
4a53d99dd0c2 ("KVM: VMX: Introduce exit reason for receiving INIT signal on guest-mode")
5497b95567c1 ("KVM: nVMX: add tracepoint for failed nested VM-Enter")
1edce0a9eb23 ("KVM: x86: Add kvm_emulate_{rd,wr}msr() to consolidate VXM/SVM code")
f20935d85a23 ("KVM: x86: Refactor up kvm_{g,s}et_msr() to simplify callers")
32d1d15c52c1 ("Merge tag 'kvmarm-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From def9d705c05eab3fdedeb10ad67907513b12038e Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 30 Aug 2022 15:37:21 +0200
Subject: [PATCH] KVM: nVMX: Don't propagate vmcs12's PERF_GLOBAL_CTRL settings
to vmcs02
Don't propagate vmcs12's VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL to vmcs02.
KVM doesn't disallow L1 from using VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL
even when KVM itself doesn't use the control, e.g. due to the various
CPU errata that where the MSR can be corrupted on VM-Exit.
Preserve KVM's (vmcs01) setting to hopefully avoid having to toggle the
bit in vmcs02 at a later point. E.g. if KVM is loading PERF_GLOBAL_CTRL
when running L1, then odds are good KVM will also load the MSR when
running L2.
Fixes: 8bf00a529967 ("KVM: VMX: add support for switching of PERF_GLOBAL_CTRL")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Link: https://lore.kernel.org/r/20220830133737.1539624-18-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 1743319015b7..400e015afa64 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2359,9 +2359,14 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
* are emulated by vmx_set_efer() in prepare_vmcs02(), but speculate
* on the related bits (if supported by the CPU) in the hope that
* we can avoid VMWrites during vmx_set_efer().
+ *
+ * Similarly, take vmcs01's PERF_GLOBAL_CTRL in the hope that if KVM is
+ * loading PERF_GLOBAL_CTRL via the VMCS for L1, then KVM will want to
+ * do the same for L2.
*/
exec_control = __vm_entry_controls_get(vmcs01);
- exec_control |= vmcs12->vm_entry_controls;
+ exec_control |= (vmcs12->vm_entry_controls &
+ ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
exec_control &= ~(VM_ENTRY_IA32E_MODE | VM_ENTRY_LOAD_IA32_EFER);
if (cpu_has_load_ia32_efer()) {
if (guest_efer & EFER_LMA)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
def9d705c05e ("KVM: nVMX: Don't propagate vmcs12's PERF_GLOBAL_CTRL settings to vmcs02")
389ab25216c9 ("KVM: nVMX: Pull KVM L0's desired controls directly from vmcs01")
d041b5ea9335 ("KVM: nVMX: Enable nested TSC scaling")
5e3d394fdd9e ("KVM: VMX: Fix the spelling of CPU_BASED_USE_TSC_OFFSETTING")
4e2a0bc56ad1 ("KVM: VMX: Rename NMI_PENDING to NMI_WINDOW")
9dadc2f918df ("KVM: VMX: Rename INTERRUPT_PENDING to INTERRUPT_WINDOW")
4289d2728664 ("KVM: retpolines: x86: eliminate retpoline from vmx.c exit handlers")
f399e60c45f6 ("KVM: x86: optimize more exit handlers in vmx.c")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From def9d705c05eab3fdedeb10ad67907513b12038e Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 30 Aug 2022 15:37:21 +0200
Subject: [PATCH] KVM: nVMX: Don't propagate vmcs12's PERF_GLOBAL_CTRL settings
to vmcs02
Don't propagate vmcs12's VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL to vmcs02.
KVM doesn't disallow L1 from using VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL
even when KVM itself doesn't use the control, e.g. due to the various
CPU errata that where the MSR can be corrupted on VM-Exit.
Preserve KVM's (vmcs01) setting to hopefully avoid having to toggle the
bit in vmcs02 at a later point. E.g. if KVM is loading PERF_GLOBAL_CTRL
when running L1, then odds are good KVM will also load the MSR when
running L2.
Fixes: 8bf00a529967 ("KVM: VMX: add support for switching of PERF_GLOBAL_CTRL")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Link: https://lore.kernel.org/r/20220830133737.1539624-18-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 1743319015b7..400e015afa64 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2359,9 +2359,14 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
* are emulated by vmx_set_efer() in prepare_vmcs02(), but speculate
* on the related bits (if supported by the CPU) in the hope that
* we can avoid VMWrites during vmx_set_efer().
+ *
+ * Similarly, take vmcs01's PERF_GLOBAL_CTRL in the hope that if KVM is
+ * loading PERF_GLOBAL_CTRL via the VMCS for L1, then KVM will want to
+ * do the same for L2.
*/
exec_control = __vm_entry_controls_get(vmcs01);
- exec_control |= vmcs12->vm_entry_controls;
+ exec_control |= (vmcs12->vm_entry_controls &
+ ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
exec_control &= ~(VM_ENTRY_IA32E_MODE | VM_ENTRY_LOAD_IA32_EFER);
if (cpu_has_load_ia32_efer()) {
if (guest_efer & EFER_LMA)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
def9d705c05e ("KVM: nVMX: Don't propagate vmcs12's PERF_GLOBAL_CTRL settings to vmcs02")
389ab25216c9 ("KVM: nVMX: Pull KVM L0's desired controls directly from vmcs01")
d041b5ea9335 ("KVM: nVMX: Enable nested TSC scaling")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From def9d705c05eab3fdedeb10ad67907513b12038e Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 30 Aug 2022 15:37:21 +0200
Subject: [PATCH] KVM: nVMX: Don't propagate vmcs12's PERF_GLOBAL_CTRL settings
to vmcs02
Don't propagate vmcs12's VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL to vmcs02.
KVM doesn't disallow L1 from using VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL
even when KVM itself doesn't use the control, e.g. due to the various
CPU errata that where the MSR can be corrupted on VM-Exit.
Preserve KVM's (vmcs01) setting to hopefully avoid having to toggle the
bit in vmcs02 at a later point. E.g. if KVM is loading PERF_GLOBAL_CTRL
when running L1, then odds are good KVM will also load the MSR when
running L2.
Fixes: 8bf00a529967 ("KVM: VMX: add support for switching of PERF_GLOBAL_CTRL")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Link: https://lore.kernel.org/r/20220830133737.1539624-18-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 1743319015b7..400e015afa64 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2359,9 +2359,14 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
* are emulated by vmx_set_efer() in prepare_vmcs02(), but speculate
* on the related bits (if supported by the CPU) in the hope that
* we can avoid VMWrites during vmx_set_efer().
+ *
+ * Similarly, take vmcs01's PERF_GLOBAL_CTRL in the hope that if KVM is
+ * loading PERF_GLOBAL_CTRL via the VMCS for L1, then KVM will want to
+ * do the same for L2.
*/
exec_control = __vm_entry_controls_get(vmcs01);
- exec_control |= vmcs12->vm_entry_controls;
+ exec_control |= (vmcs12->vm_entry_controls &
+ ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
exec_control &= ~(VM_ENTRY_IA32E_MODE | VM_ENTRY_LOAD_IA32_EFER);
if (cpu_has_load_ia32_efer()) {
if (guest_efer & EFER_LMA)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8c5035dfbb94 ("blk-wbt: call rq_qos_add() after wb_normal is initialized")
5a20d073ec54 ("block: wbt: Remove unnecessary invoking of wbt_update_limits in wbt_init")
4d89e1d112a9 ("blk-wbt: rename __wbt_update_limits to wbt_update_limits")
9677a3e01f83 ("block/rq_qos: implement rq_qos_ops->queue_depth_changed()")
d3e65ffff61c ("block/rq_qos: add rq_qos_merge()")
14ccb66b3f58 ("block: remove the bi_phys_segments field in struct bio")
f924cddebc90 ("block: remove blk_init_request_from_bio")
0c8cf8c2a553 ("block: initialize the write priority in blk_rq_bio_prep")
47cdee29ef9d ("block: move blk_exit_queue into __blk_release_queue")
6869875fbc04 ("block: remove the bi_seg_{front,back}_size fields in struct bio")
eded341c085b ("block: don't decrement nr_phys_segments for physically contigous segments")
dcdca753c152 ("block: clean up __bio_add_pc_page a bit")
5c61ee2cd586 ("Merge tag 'v5.1-rc6' into for-5.2/block")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8c5035dfbb9475b67c82b3fdb7351236525bf52b Mon Sep 17 00:00:00 2001
From: Yu Kuai <yukuai3(a)huawei.com>
Date: Tue, 13 Sep 2022 18:57:49 +0800
Subject: [PATCH] blk-wbt: call rq_qos_add() after wb_normal is initialized
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Our test found a problem that wbt inflight counter is negative, which
will cause io hang(noted that this problem doesn't exist in mainline):
t1: device create t2: issue io
add_disk
blk_register_queue
wbt_enable_default
wbt_init
rq_qos_add
// wb_normal is still 0
/*
* in mainline, disk can't be opened before
* bdev_add(), however, in old kernels, disk
* can be opened before blk_register_queue().
*/
blkdev_issue_flush
// disk size is 0, however, it's not checked
submit_bio_wait
submit_bio
blk_mq_submit_bio
rq_qos_throttle
wbt_wait
bio_to_wbt_flags
rwb_enabled
// wb_normal is 0, inflight is not increased
wbt_queue_depth_changed(&rwb->rqos);
wbt_update_limits
// wb_normal is initialized
rq_qos_track
wbt_track
rq->wbt_flags |= bio_to_wbt_flags(rwb, bio);
// wb_normal is not 0,wbt_flags will be set
t3: io completion
blk_mq_free_request
rq_qos_done
wbt_done
wbt_is_tracked
// return true
__wbt_done
wbt_rqw_done
atomic_dec_return(&rqw->inflight);
// inflight is decreased
commit 8235b5c1e8c1 ("block: call bdev_add later in device_add_disk") can
avoid this problem, however it's better to fix this problem in wbt:
1) Lower kernel can't backport this patch due to lots of refactor.
2) Root cause is that wbt call rq_qos_add() before wb_normal is
initialized.
Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Yu Kuai <yukuai3(a)huawei.com>
Link: https://lore.kernel.org/r/20220913105749.3086243-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index a9982000b667..246467926253 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -843,6 +843,10 @@ int wbt_init(struct request_queue *q)
rwb->enable_state = WBT_STATE_ON_DEFAULT;
rwb->wc = 1;
rwb->rq_depth.default_depth = RWB_DEF_DEPTH;
+ rwb->min_lat_nsec = wbt_default_latency_nsec(q);
+
+ wbt_queue_depth_changed(&rwb->rqos);
+ wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
/*
* Assign rwb and add the stats callback.
@@ -853,11 +857,6 @@ int wbt_init(struct request_queue *q)
blk_stat_add_callback(q, rwb->cb);
- rwb->min_lat_nsec = wbt_default_latency_nsec(q);
-
- wbt_queue_depth_changed(&rwb->rqos);
- wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
-
return 0;
err_free:
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8c5035dfbb94 ("blk-wbt: call rq_qos_add() after wb_normal is initialized")
5a20d073ec54 ("block: wbt: Remove unnecessary invoking of wbt_update_limits in wbt_init")
4d89e1d112a9 ("blk-wbt: rename __wbt_update_limits to wbt_update_limits")
9677a3e01f83 ("block/rq_qos: implement rq_qos_ops->queue_depth_changed()")
d3e65ffff61c ("block/rq_qos: add rq_qos_merge()")
14ccb66b3f58 ("block: remove the bi_phys_segments field in struct bio")
f924cddebc90 ("block: remove blk_init_request_from_bio")
0c8cf8c2a553 ("block: initialize the write priority in blk_rq_bio_prep")
47cdee29ef9d ("block: move blk_exit_queue into __blk_release_queue")
6869875fbc04 ("block: remove the bi_seg_{front,back}_size fields in struct bio")
eded341c085b ("block: don't decrement nr_phys_segments for physically contigous segments")
dcdca753c152 ("block: clean up __bio_add_pc_page a bit")
5c61ee2cd586 ("Merge tag 'v5.1-rc6' into for-5.2/block")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8c5035dfbb9475b67c82b3fdb7351236525bf52b Mon Sep 17 00:00:00 2001
From: Yu Kuai <yukuai3(a)huawei.com>
Date: Tue, 13 Sep 2022 18:57:49 +0800
Subject: [PATCH] blk-wbt: call rq_qos_add() after wb_normal is initialized
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Our test found a problem that wbt inflight counter is negative, which
will cause io hang(noted that this problem doesn't exist in mainline):
t1: device create t2: issue io
add_disk
blk_register_queue
wbt_enable_default
wbt_init
rq_qos_add
// wb_normal is still 0
/*
* in mainline, disk can't be opened before
* bdev_add(), however, in old kernels, disk
* can be opened before blk_register_queue().
*/
blkdev_issue_flush
// disk size is 0, however, it's not checked
submit_bio_wait
submit_bio
blk_mq_submit_bio
rq_qos_throttle
wbt_wait
bio_to_wbt_flags
rwb_enabled
// wb_normal is 0, inflight is not increased
wbt_queue_depth_changed(&rwb->rqos);
wbt_update_limits
// wb_normal is initialized
rq_qos_track
wbt_track
rq->wbt_flags |= bio_to_wbt_flags(rwb, bio);
// wb_normal is not 0,wbt_flags will be set
t3: io completion
blk_mq_free_request
rq_qos_done
wbt_done
wbt_is_tracked
// return true
__wbt_done
wbt_rqw_done
atomic_dec_return(&rqw->inflight);
// inflight is decreased
commit 8235b5c1e8c1 ("block: call bdev_add later in device_add_disk") can
avoid this problem, however it's better to fix this problem in wbt:
1) Lower kernel can't backport this patch due to lots of refactor.
2) Root cause is that wbt call rq_qos_add() before wb_normal is
initialized.
Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Yu Kuai <yukuai3(a)huawei.com>
Link: https://lore.kernel.org/r/20220913105749.3086243-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index a9982000b667..246467926253 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -843,6 +843,10 @@ int wbt_init(struct request_queue *q)
rwb->enable_state = WBT_STATE_ON_DEFAULT;
rwb->wc = 1;
rwb->rq_depth.default_depth = RWB_DEF_DEPTH;
+ rwb->min_lat_nsec = wbt_default_latency_nsec(q);
+
+ wbt_queue_depth_changed(&rwb->rqos);
+ wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
/*
* Assign rwb and add the stats callback.
@@ -853,11 +857,6 @@ int wbt_init(struct request_queue *q)
blk_stat_add_callback(q, rwb->cb);
- rwb->min_lat_nsec = wbt_default_latency_nsec(q);
-
- wbt_queue_depth_changed(&rwb->rqos);
- wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
-
return 0;
err_free:
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8c5035dfbb94 ("blk-wbt: call rq_qos_add() after wb_normal is initialized")
5a20d073ec54 ("block: wbt: Remove unnecessary invoking of wbt_update_limits in wbt_init")
4d89e1d112a9 ("blk-wbt: rename __wbt_update_limits to wbt_update_limits")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8c5035dfbb9475b67c82b3fdb7351236525bf52b Mon Sep 17 00:00:00 2001
From: Yu Kuai <yukuai3(a)huawei.com>
Date: Tue, 13 Sep 2022 18:57:49 +0800
Subject: [PATCH] blk-wbt: call rq_qos_add() after wb_normal is initialized
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Our test found a problem that wbt inflight counter is negative, which
will cause io hang(noted that this problem doesn't exist in mainline):
t1: device create t2: issue io
add_disk
blk_register_queue
wbt_enable_default
wbt_init
rq_qos_add
// wb_normal is still 0
/*
* in mainline, disk can't be opened before
* bdev_add(), however, in old kernels, disk
* can be opened before blk_register_queue().
*/
blkdev_issue_flush
// disk size is 0, however, it's not checked
submit_bio_wait
submit_bio
blk_mq_submit_bio
rq_qos_throttle
wbt_wait
bio_to_wbt_flags
rwb_enabled
// wb_normal is 0, inflight is not increased
wbt_queue_depth_changed(&rwb->rqos);
wbt_update_limits
// wb_normal is initialized
rq_qos_track
wbt_track
rq->wbt_flags |= bio_to_wbt_flags(rwb, bio);
// wb_normal is not 0,wbt_flags will be set
t3: io completion
blk_mq_free_request
rq_qos_done
wbt_done
wbt_is_tracked
// return true
__wbt_done
wbt_rqw_done
atomic_dec_return(&rqw->inflight);
// inflight is decreased
commit 8235b5c1e8c1 ("block: call bdev_add later in device_add_disk") can
avoid this problem, however it's better to fix this problem in wbt:
1) Lower kernel can't backport this patch due to lots of refactor.
2) Root cause is that wbt call rq_qos_add() before wb_normal is
initialized.
Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Yu Kuai <yukuai3(a)huawei.com>
Link: https://lore.kernel.org/r/20220913105749.3086243-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index a9982000b667..246467926253 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -843,6 +843,10 @@ int wbt_init(struct request_queue *q)
rwb->enable_state = WBT_STATE_ON_DEFAULT;
rwb->wc = 1;
rwb->rq_depth.default_depth = RWB_DEF_DEPTH;
+ rwb->min_lat_nsec = wbt_default_latency_nsec(q);
+
+ wbt_queue_depth_changed(&rwb->rqos);
+ wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
/*
* Assign rwb and add the stats callback.
@@ -853,11 +857,6 @@ int wbt_init(struct request_queue *q)
blk_stat_add_callback(q, rwb->cb);
- rwb->min_lat_nsec = wbt_default_latency_nsec(q);
-
- wbt_queue_depth_changed(&rwb->rqos);
- wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
-
return 0;
err_free:
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8c5035dfbb94 ("blk-wbt: call rq_qos_add() after wb_normal is initialized")
5a20d073ec54 ("block: wbt: Remove unnecessary invoking of wbt_update_limits in wbt_init")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8c5035dfbb9475b67c82b3fdb7351236525bf52b Mon Sep 17 00:00:00 2001
From: Yu Kuai <yukuai3(a)huawei.com>
Date: Tue, 13 Sep 2022 18:57:49 +0800
Subject: [PATCH] blk-wbt: call rq_qos_add() after wb_normal is initialized
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Our test found a problem that wbt inflight counter is negative, which
will cause io hang(noted that this problem doesn't exist in mainline):
t1: device create t2: issue io
add_disk
blk_register_queue
wbt_enable_default
wbt_init
rq_qos_add
// wb_normal is still 0
/*
* in mainline, disk can't be opened before
* bdev_add(), however, in old kernels, disk
* can be opened before blk_register_queue().
*/
blkdev_issue_flush
// disk size is 0, however, it's not checked
submit_bio_wait
submit_bio
blk_mq_submit_bio
rq_qos_throttle
wbt_wait
bio_to_wbt_flags
rwb_enabled
// wb_normal is 0, inflight is not increased
wbt_queue_depth_changed(&rwb->rqos);
wbt_update_limits
// wb_normal is initialized
rq_qos_track
wbt_track
rq->wbt_flags |= bio_to_wbt_flags(rwb, bio);
// wb_normal is not 0,wbt_flags will be set
t3: io completion
blk_mq_free_request
rq_qos_done
wbt_done
wbt_is_tracked
// return true
__wbt_done
wbt_rqw_done
atomic_dec_return(&rqw->inflight);
// inflight is decreased
commit 8235b5c1e8c1 ("block: call bdev_add later in device_add_disk") can
avoid this problem, however it's better to fix this problem in wbt:
1) Lower kernel can't backport this patch due to lots of refactor.
2) Root cause is that wbt call rq_qos_add() before wb_normal is
initialized.
Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Yu Kuai <yukuai3(a)huawei.com>
Link: https://lore.kernel.org/r/20220913105749.3086243-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index a9982000b667..246467926253 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -843,6 +843,10 @@ int wbt_init(struct request_queue *q)
rwb->enable_state = WBT_STATE_ON_DEFAULT;
rwb->wc = 1;
rwb->rq_depth.default_depth = RWB_DEF_DEPTH;
+ rwb->min_lat_nsec = wbt_default_latency_nsec(q);
+
+ wbt_queue_depth_changed(&rwb->rqos);
+ wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
/*
* Assign rwb and add the stats callback.
@@ -853,11 +857,6 @@ int wbt_init(struct request_queue *q)
blk_stat_add_callback(q, rwb->cb);
- rwb->min_lat_nsec = wbt_default_latency_nsec(q);
-
- wbt_queue_depth_changed(&rwb->rqos);
- wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
-
return 0;
err_free:
The patch below does not apply to the 5.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
fe8b81fde69a ("media: cedrus: Fix watchdog race condition")
4af46bcc4915 ("media: cedrus: Add error handling for failed setup")
f1a413902aa7 ("media: cedrus: h265: Fix logic for not low delay flag")
104a70e1d0bc ("media: cedrus: h265: Fix flag name")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fe8b81fde69acfcbb5af9e85328e5b9549999fdb Mon Sep 17 00:00:00 2001
From: Nicolas Dufresne <nicolas.dufresne(a)collabora.com>
Date: Thu, 18 Aug 2022 22:33:06 +0200
Subject: [PATCH] media: cedrus: Fix watchdog race condition
The watchdog needs to be scheduled before we trigger the decode
operation, otherwise there is a risk that the decoder IRQ will be
called before we have schedule the watchdog. As a side effect, the
watchdog would never be cancelled and its function would be called
at an inappropriate time.
This was observed while running Fluster with GStreamer as a backend.
Some programming error would cause the decoder IRQ to be call very
quickly after the trigger. Later calls into the driver would deadlock
due to the unbalanced state.
Cc: stable(a)vger.kernel.org
Fixes: 7c38a551bda1 ("media: cedrus: Add watchdog for job completion")
Signed-off-by: Nicolas Dufresne <nicolas.dufresne(a)collabora.com>
Reviewed-by: Paul Kocialkowski <paul.kocialkowski(a)bootlin.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab(a)kernel.org>
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
index 3b6aa78a2985..e7f7602a5ab4 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
@@ -106,11 +106,11 @@ void cedrus_device_run(void *priv)
/* Trigger decoding if setup went well, bail out otherwise. */
if (!error) {
- dev->dec_ops[ctx->current_codec]->trigger(ctx);
-
/* Start the watchdog timer. */
schedule_delayed_work(&dev->watchdog_work,
msecs_to_jiffies(2000));
+
+ dev->dec_ops[ctx->current_codec]->trigger(ctx);
} else {
v4l2_m2m_buf_done_and_job_finish(ctx->dev->m2m_dev,
ctx->fh.m2m_ctx,
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
d80ca810f096 ("efi: libstub: drop pointless get_memory_map() call")
cd33a5c1d53e ("efi/libstub: Remove 'sys_table_arg' from all function prototypes")
8173ec7905b5 ("efi/libstub: Drop sys_table_arg from printk routines")
c3710de5065d ("efi/libstub/x86: Drop __efi_early() export and efi_config struct")
dc29da14ed94 ("efi/libstub: Unify the efi_char16_printk implementations")
2fcdad2a80a6 ("efi/libstub: Get rid of 'sys_table_arg' macro parameter")
afc4cc71cf78 ("efi/libstub/x86: Avoid thunking for native firmware calls")
f958efe97596 ("efi/libstub: Distinguish between native/mixed not 32/64 bit")
1786e8301164 ("efi/libstub: Extend native protocol definitions with mixed_mode aliases")
2732ea0d5c0a ("efi/libstub: Use a helper to iterate over a EFI handle array")
58ec655a7573 ("efi/libstub: Remove unused __efi_call_early() macro")
8de8788d2182 ("efi/gop: Unify 32/64-bit functions")
44c84b4ada73 ("efi/gop: Convert GOP structures to typedef and clean up some types")
8d62af177812 ("efi/gop: Remove bogus packed attribute from GOP structures")
4911ee401b7c ("x86/efistub: Disable paging at mixed mode entry")
818c7ce72477 ("efi/libstub/random: Initialize pointer variables to zero for mixed mode")
9fa76ca7b8bd ("efi: Fix efi_loaded_image_t::unload type")
ff397be685e4 ("efi/gop: Fix memory leak in __gop_query32/64()")
dbd89c303b44 ("efi/gop: Return EFI_SUCCESS if a usable GOP was found")
6fc3cec30dfe ("efi/gop: Return EFI_NOT_FOUND if there are no usable GOPs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d80ca810f096ff66f451e7a3ed2f0cd9ef1ff519 Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ardb(a)kernel.org>
Date: Thu, 15 Sep 2022 19:00:24 +0200
Subject: [PATCH] efi: libstub: drop pointless get_memory_map() call
Currently, the non-x86 stub code calls get_memory_map() redundantly,
given that the data it returns is never used anywhere. So drop the call.
Cc: <stable(a)vger.kernel.org> # v4.14+
Fixes: 24d7c494ce46 ("efi/arm-stub: Round up FDT allocation to mapping size")
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
diff --git a/drivers/firmware/efi/libstub/fdt.c b/drivers/firmware/efi/libstub/fdt.c
index fe567be0f118..804f542be3f2 100644
--- a/drivers/firmware/efi/libstub/fdt.c
+++ b/drivers/firmware/efi/libstub/fdt.c
@@ -280,14 +280,6 @@ efi_status_t allocate_new_fdt_and_exit_boot(void *handle,
goto fail;
}
- /*
- * Now that we have done our final memory allocation (and free)
- * we can get the memory map key needed for exit_boot_services().
- */
- status = efi_get_memory_map(&map);
- if (status != EFI_SUCCESS)
- goto fail_free_new_fdt;
-
status = update_fdt((void *)fdt_addr, fdt_size,
(void *)*new_fdt_addr, MAX_FDT_SIZE, cmdline_ptr,
initrd_addr, initrd_size);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
d80ca810f096 ("efi: libstub: drop pointless get_memory_map() call")
cd33a5c1d53e ("efi/libstub: Remove 'sys_table_arg' from all function prototypes")
8173ec7905b5 ("efi/libstub: Drop sys_table_arg from printk routines")
c3710de5065d ("efi/libstub/x86: Drop __efi_early() export and efi_config struct")
dc29da14ed94 ("efi/libstub: Unify the efi_char16_printk implementations")
2fcdad2a80a6 ("efi/libstub: Get rid of 'sys_table_arg' macro parameter")
afc4cc71cf78 ("efi/libstub/x86: Avoid thunking for native firmware calls")
f958efe97596 ("efi/libstub: Distinguish between native/mixed not 32/64 bit")
1786e8301164 ("efi/libstub: Extend native protocol definitions with mixed_mode aliases")
2732ea0d5c0a ("efi/libstub: Use a helper to iterate over a EFI handle array")
58ec655a7573 ("efi/libstub: Remove unused __efi_call_early() macro")
8de8788d2182 ("efi/gop: Unify 32/64-bit functions")
44c84b4ada73 ("efi/gop: Convert GOP structures to typedef and clean up some types")
8d62af177812 ("efi/gop: Remove bogus packed attribute from GOP structures")
4911ee401b7c ("x86/efistub: Disable paging at mixed mode entry")
818c7ce72477 ("efi/libstub/random: Initialize pointer variables to zero for mixed mode")
9fa76ca7b8bd ("efi: Fix efi_loaded_image_t::unload type")
ff397be685e4 ("efi/gop: Fix memory leak in __gop_query32/64()")
dbd89c303b44 ("efi/gop: Return EFI_SUCCESS if a usable GOP was found")
6fc3cec30dfe ("efi/gop: Return EFI_NOT_FOUND if there are no usable GOPs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d80ca810f096ff66f451e7a3ed2f0cd9ef1ff519 Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ardb(a)kernel.org>
Date: Thu, 15 Sep 2022 19:00:24 +0200
Subject: [PATCH] efi: libstub: drop pointless get_memory_map() call
Currently, the non-x86 stub code calls get_memory_map() redundantly,
given that the data it returns is never used anywhere. So drop the call.
Cc: <stable(a)vger.kernel.org> # v4.14+
Fixes: 24d7c494ce46 ("efi/arm-stub: Round up FDT allocation to mapping size")
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
diff --git a/drivers/firmware/efi/libstub/fdt.c b/drivers/firmware/efi/libstub/fdt.c
index fe567be0f118..804f542be3f2 100644
--- a/drivers/firmware/efi/libstub/fdt.c
+++ b/drivers/firmware/efi/libstub/fdt.c
@@ -280,14 +280,6 @@ efi_status_t allocate_new_fdt_and_exit_boot(void *handle,
goto fail;
}
- /*
- * Now that we have done our final memory allocation (and free)
- * we can get the memory map key needed for exit_boot_services().
- */
- status = efi_get_memory_map(&map);
- if (status != EFI_SUCCESS)
- goto fail_free_new_fdt;
-
status = update_fdt((void *)fdt_addr, fdt_size,
(void *)*new_fdt_addr, MAX_FDT_SIZE, cmdline_ptr,
initrd_addr, initrd_size);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
d80ca810f096 ("efi: libstub: drop pointless get_memory_map() call")
cd33a5c1d53e ("efi/libstub: Remove 'sys_table_arg' from all function prototypes")
8173ec7905b5 ("efi/libstub: Drop sys_table_arg from printk routines")
c3710de5065d ("efi/libstub/x86: Drop __efi_early() export and efi_config struct")
dc29da14ed94 ("efi/libstub: Unify the efi_char16_printk implementations")
2fcdad2a80a6 ("efi/libstub: Get rid of 'sys_table_arg' macro parameter")
afc4cc71cf78 ("efi/libstub/x86: Avoid thunking for native firmware calls")
f958efe97596 ("efi/libstub: Distinguish between native/mixed not 32/64 bit")
1786e8301164 ("efi/libstub: Extend native protocol definitions with mixed_mode aliases")
2732ea0d5c0a ("efi/libstub: Use a helper to iterate over a EFI handle array")
58ec655a7573 ("efi/libstub: Remove unused __efi_call_early() macro")
8de8788d2182 ("efi/gop: Unify 32/64-bit functions")
44c84b4ada73 ("efi/gop: Convert GOP structures to typedef and clean up some types")
8d62af177812 ("efi/gop: Remove bogus packed attribute from GOP structures")
4911ee401b7c ("x86/efistub: Disable paging at mixed mode entry")
818c7ce72477 ("efi/libstub/random: Initialize pointer variables to zero for mixed mode")
9fa76ca7b8bd ("efi: Fix efi_loaded_image_t::unload type")
ff397be685e4 ("efi/gop: Fix memory leak in __gop_query32/64()")
dbd89c303b44 ("efi/gop: Return EFI_SUCCESS if a usable GOP was found")
6fc3cec30dfe ("efi/gop: Return EFI_NOT_FOUND if there are no usable GOPs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d80ca810f096ff66f451e7a3ed2f0cd9ef1ff519 Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ardb(a)kernel.org>
Date: Thu, 15 Sep 2022 19:00:24 +0200
Subject: [PATCH] efi: libstub: drop pointless get_memory_map() call
Currently, the non-x86 stub code calls get_memory_map() redundantly,
given that the data it returns is never used anywhere. So drop the call.
Cc: <stable(a)vger.kernel.org> # v4.14+
Fixes: 24d7c494ce46 ("efi/arm-stub: Round up FDT allocation to mapping size")
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
diff --git a/drivers/firmware/efi/libstub/fdt.c b/drivers/firmware/efi/libstub/fdt.c
index fe567be0f118..804f542be3f2 100644
--- a/drivers/firmware/efi/libstub/fdt.c
+++ b/drivers/firmware/efi/libstub/fdt.c
@@ -280,14 +280,6 @@ efi_status_t allocate_new_fdt_and_exit_boot(void *handle,
goto fail;
}
- /*
- * Now that we have done our final memory allocation (and free)
- * we can get the memory map key needed for exit_boot_services().
- */
- status = efi_get_memory_map(&map);
- if (status != EFI_SUCCESS)
- goto fail_free_new_fdt;
-
status = update_fdt((void *)fdt_addr, fdt_size,
(void *)*new_fdt_addr, MAX_FDT_SIZE, cmdline_ptr,
initrd_addr, initrd_size);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
01b2a5217173 ("tracing: Add ioctl() to force ring buffer waiters to wake up")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 01b2a52171735c6eea80ee2f355f32bea6c41418 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 29 Sep 2022 09:50:29 -0400
Subject: [PATCH] tracing: Add ioctl() to force ring buffer waiters to wake up
If a process is waiting on the ring buffer for data, there currently isn't
a clean way to force it to wake up. Add an ioctl call that will force any
tasks that are waiting on the trace_pipe_raw file to wake up.
Link: https://lkml.kernel.org/r/20220929095029.117f913f@gandalf.local.home
Cc: stable(a)vger.kernel.org
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index e101b0764b39..58afc83afc9d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8349,12 +8349,34 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
return ret;
}
+/* An ioctl call with cmd 0 to the ring buffer file will wake up all waiters */
+static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ struct ftrace_buffer_info *info = file->private_data;
+ struct trace_iterator *iter = &info->iter;
+
+ if (cmd)
+ return -ENOIOCTLCMD;
+
+ mutex_lock(&trace_types_lock);
+
+ iter->wait_index++;
+ /* Make sure the waiters see the new wait_index */
+ smp_wmb();
+
+ ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
+
+ mutex_unlock(&trace_types_lock);
+ return 0;
+}
+
static const struct file_operations tracing_buffers_fops = {
.open = tracing_buffers_open,
.read = tracing_buffers_read,
.poll = tracing_buffers_poll,
.release = tracing_buffers_release,
.splice_read = tracing_buffers_splice_read,
+ .unlocked_ioctl = tracing_buffers_ioctl,
.llseek = no_llseek,
};
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
7e9fbbb1b776 ("ring-buffer: Add ring_buffer_wake_waiters()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7e9fbbb1b776d8d7969551565bc246f74ec53b27 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Wed, 28 Sep 2022 13:39:38 -0400
Subject: [PATCH] ring-buffer: Add ring_buffer_wake_waiters()
On closing of a file that represents a ring buffer or flushing the file,
there may be waiters on the ring buffer that needs to be woken up and exit
the ring_buffer_wait() function.
Add ring_buffer_wake_waiters() to wake up the waiters on the ring buffer
and allow them to exit the wait loop.
Link: https://lkml.kernel.org/r/20220928133938.28dc2c27@gandalf.local.home
Cc: stable(a)vger.kernel.org
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Fixes: 15693458c4bc0 ("tracing/ring-buffer: Move poll wake ups into ring buffer code")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index dac53fd3afea..2504df9a0453 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -101,7 +101,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table);
-
+void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);
#define RING_BUFFER_ALL_CPUS -1
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 5a7d818ca3ea..3046deacf7b3 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -413,6 +413,7 @@ struct rb_irq_work {
struct irq_work work;
wait_queue_head_t waiters;
wait_queue_head_t full_waiters;
+ long wait_index;
bool waiters_pending;
bool full_waiters_pending;
bool wakeup_full;
@@ -924,6 +925,37 @@ static void rb_wake_up_waiters(struct irq_work *work)
}
}
+/**
+ * ring_buffer_wake_waiters - wake up any waiters on this ring buffer
+ * @buffer: The ring buffer to wake waiters on
+ *
+ * In the case of a file that represents a ring buffer is closing,
+ * it is prudent to wake up any waiters that are on this.
+ */
+void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ struct rb_irq_work *rbwork;
+
+ if (cpu == RING_BUFFER_ALL_CPUS) {
+
+ /* Wake up individual ones too. One level recursion */
+ for_each_buffer_cpu(buffer, cpu)
+ ring_buffer_wake_waiters(buffer, cpu);
+
+ rbwork = &buffer->irq_work;
+ } else {
+ cpu_buffer = buffer->buffers[cpu];
+ rbwork = &cpu_buffer->irq_work;
+ }
+
+ rbwork->wait_index++;
+ /* make sure the waiters see the new index */
+ smp_wmb();
+
+ rb_wake_up_waiters(&rbwork->work);
+}
+
/**
* ring_buffer_wait - wait for input to the ring buffer
* @buffer: buffer to wait on
@@ -939,6 +971,7 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
struct ring_buffer_per_cpu *cpu_buffer;
DEFINE_WAIT(wait);
struct rb_irq_work *work;
+ long wait_index;
int ret = 0;
/*
@@ -957,6 +990,7 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
work = &cpu_buffer->irq_work;
}
+ wait_index = READ_ONCE(work->wait_index);
while (true) {
if (full)
@@ -1021,6 +1055,11 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
}
schedule();
+
+ /* Make sure to see the new wait index */
+ smp_rmb();
+ if (wait_index != work->wait_index)
+ break;
}
if (full)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
0934ae9977c2 ("tracing: Fix reading strings from synthetic events")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0934ae9977c27133449b6dd8c6213970e7eece38 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Wed, 12 Oct 2022 06:40:58 -0400
Subject: [PATCH] tracing: Fix reading strings from synthetic events
The follow commands caused a crash:
# cd /sys/kernel/tracing
# echo 's:open char file[]' > dynamic_events
# echo 'hist:keys=common_pid:file=filename:onchange($file).trace(open,$file)' > events/syscalls/sys_enter_openat/trigger'
# echo 1 > events/synthetic/open/enable
BOOM!
The problem is that the synthetic event field "char file[]" will read
the value given to it as a string without any memory checks to make sure
the address is valid. The above example will pass in the user space
address and the sythetic event code will happily call strlen() on it
and then strscpy() where either one will cause an oops when accessing
user space addresses.
Use the helper functions from trace_kprobe and trace_eprobe that can
read strings safely (and actually succeed when the address is from user
space and the memory is mapped in).
Now the above can show:
packagekitd-1721 [000] ...2. 104.597170: open: file=/usr/lib/rpm/fileattrs/cmake.attr
in:imjournal-978 [006] ...2. 104.599642: open: file=/var/lib/rsyslog/imjournal.state.tmp
packagekitd-1721 [000] ...2. 104.626308: open: file=/usr/lib/rpm/fileattrs/debuginfo.attr
Link: https://lkml.kernel.org/r/20221012104534.826549315@goodmis.org
Cc: stable(a)vger.kernel.org
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Tom Zanussi <zanussi(a)kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Reviewed-by: Tom Zanussi <zanussi(a)kernel.org>
Fixes: bd82631d7ccdc ("tracing: Add support for dynamic strings to synthetic events")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 5e8c07aef071..e310052dc83c 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -17,6 +17,8 @@
/* for gfp flag names */
#include <linux/trace_events.h>
#include <trace/events/mmflags.h>
+#include "trace_probe.h"
+#include "trace_probe_kernel.h"
#include "trace_synth.h"
@@ -409,6 +411,7 @@ static unsigned int trace_string(struct synth_trace_event *entry,
{
unsigned int len = 0;
char *str_field;
+ int ret;
if (is_dynamic) {
u32 data_offset;
@@ -417,19 +420,27 @@ static unsigned int trace_string(struct synth_trace_event *entry,
data_offset += event->n_u64 * sizeof(u64);
data_offset += data_size;
- str_field = (char *)entry + data_offset;
-
- len = strlen(str_val) + 1;
- strscpy(str_field, str_val, len);
+ len = kern_fetch_store_strlen((unsigned long)str_val);
data_offset |= len << 16;
*(u32 *)&entry->fields[*n_u64] = data_offset;
+ ret = kern_fetch_store_string((unsigned long)str_val, &entry->fields[*n_u64], entry);
+
(*n_u64)++;
} else {
str_field = (char *)&entry->fields[*n_u64];
- strscpy(str_field, str_val, STR_VAR_LEN_MAX);
+#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
+ if ((unsigned long)str_val < TASK_SIZE)
+ ret = strncpy_from_user_nofault(str_field, str_val, STR_VAR_LEN_MAX);
+ else
+#endif
+ ret = strncpy_from_kernel_nofault(str_field, str_val, STR_VAR_LEN_MAX);
+
+ if (ret < 0)
+ strcpy(str_field, FAULT_STRING);
+
(*n_u64) += STR_VAR_LEN_MAX / sizeof(u64);
}
@@ -462,7 +473,7 @@ static notrace void trace_event_raw_event_synth(void *__data,
val_idx = var_ref_idx[field_pos];
str_val = (char *)(long)var_ref_vals[val_idx];
- len = strlen(str_val) + 1;
+ len = kern_fetch_store_strlen((unsigned long)str_val);
fields_size += len;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
01b2a5217173 ("tracing: Add ioctl() to force ring buffer waiters to wake up")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 01b2a52171735c6eea80ee2f355f32bea6c41418 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 29 Sep 2022 09:50:29 -0400
Subject: [PATCH] tracing: Add ioctl() to force ring buffer waiters to wake up
If a process is waiting on the ring buffer for data, there currently isn't
a clean way to force it to wake up. Add an ioctl call that will force any
tasks that are waiting on the trace_pipe_raw file to wake up.
Link: https://lkml.kernel.org/r/20220929095029.117f913f@gandalf.local.home
Cc: stable(a)vger.kernel.org
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index e101b0764b39..58afc83afc9d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8349,12 +8349,34 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
return ret;
}
+/* An ioctl call with cmd 0 to the ring buffer file will wake up all waiters */
+static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ struct ftrace_buffer_info *info = file->private_data;
+ struct trace_iterator *iter = &info->iter;
+
+ if (cmd)
+ return -ENOIOCTLCMD;
+
+ mutex_lock(&trace_types_lock);
+
+ iter->wait_index++;
+ /* Make sure the waiters see the new wait_index */
+ smp_wmb();
+
+ ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
+
+ mutex_unlock(&trace_types_lock);
+ return 0;
+}
+
static const struct file_operations tracing_buffers_fops = {
.open = tracing_buffers_open,
.read = tracing_buffers_read,
.poll = tracing_buffers_poll,
.release = tracing_buffers_release,
.splice_read = tracing_buffers_splice_read,
+ .unlocked_ioctl = tracing_buffers_ioctl,
.llseek = no_llseek,
};
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
01b2a5217173 ("tracing: Add ioctl() to force ring buffer waiters to wake up")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 01b2a52171735c6eea80ee2f355f32bea6c41418 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 29 Sep 2022 09:50:29 -0400
Subject: [PATCH] tracing: Add ioctl() to force ring buffer waiters to wake up
If a process is waiting on the ring buffer for data, there currently isn't
a clean way to force it to wake up. Add an ioctl call that will force any
tasks that are waiting on the trace_pipe_raw file to wake up.
Link: https://lkml.kernel.org/r/20220929095029.117f913f@gandalf.local.home
Cc: stable(a)vger.kernel.org
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index e101b0764b39..58afc83afc9d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8349,12 +8349,34 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
return ret;
}
+/* An ioctl call with cmd 0 to the ring buffer file will wake up all waiters */
+static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ struct ftrace_buffer_info *info = file->private_data;
+ struct trace_iterator *iter = &info->iter;
+
+ if (cmd)
+ return -ENOIOCTLCMD;
+
+ mutex_lock(&trace_types_lock);
+
+ iter->wait_index++;
+ /* Make sure the waiters see the new wait_index */
+ smp_wmb();
+
+ ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
+
+ mutex_unlock(&trace_types_lock);
+ return 0;
+}
+
static const struct file_operations tracing_buffers_fops = {
.open = tracing_buffers_open,
.read = tracing_buffers_read,
.poll = tracing_buffers_poll,
.release = tracing_buffers_release,
.splice_read = tracing_buffers_splice_read,
+ .unlocked_ioctl = tracing_buffers_ioctl,
.llseek = no_llseek,
};
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
7e9fbbb1b776 ("ring-buffer: Add ring_buffer_wake_waiters()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7e9fbbb1b776d8d7969551565bc246f74ec53b27 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Wed, 28 Sep 2022 13:39:38 -0400
Subject: [PATCH] ring-buffer: Add ring_buffer_wake_waiters()
On closing of a file that represents a ring buffer or flushing the file,
there may be waiters on the ring buffer that needs to be woken up and exit
the ring_buffer_wait() function.
Add ring_buffer_wake_waiters() to wake up the waiters on the ring buffer
and allow them to exit the wait loop.
Link: https://lkml.kernel.org/r/20220928133938.28dc2c27@gandalf.local.home
Cc: stable(a)vger.kernel.org
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Fixes: 15693458c4bc0 ("tracing/ring-buffer: Move poll wake ups into ring buffer code")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index dac53fd3afea..2504df9a0453 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -101,7 +101,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table);
-
+void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);
#define RING_BUFFER_ALL_CPUS -1
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 5a7d818ca3ea..3046deacf7b3 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -413,6 +413,7 @@ struct rb_irq_work {
struct irq_work work;
wait_queue_head_t waiters;
wait_queue_head_t full_waiters;
+ long wait_index;
bool waiters_pending;
bool full_waiters_pending;
bool wakeup_full;
@@ -924,6 +925,37 @@ static void rb_wake_up_waiters(struct irq_work *work)
}
}
+/**
+ * ring_buffer_wake_waiters - wake up any waiters on this ring buffer
+ * @buffer: The ring buffer to wake waiters on
+ *
+ * In the case of a file that represents a ring buffer is closing,
+ * it is prudent to wake up any waiters that are on this.
+ */
+void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ struct rb_irq_work *rbwork;
+
+ if (cpu == RING_BUFFER_ALL_CPUS) {
+
+ /* Wake up individual ones too. One level recursion */
+ for_each_buffer_cpu(buffer, cpu)
+ ring_buffer_wake_waiters(buffer, cpu);
+
+ rbwork = &buffer->irq_work;
+ } else {
+ cpu_buffer = buffer->buffers[cpu];
+ rbwork = &cpu_buffer->irq_work;
+ }
+
+ rbwork->wait_index++;
+ /* make sure the waiters see the new index */
+ smp_wmb();
+
+ rb_wake_up_waiters(&rbwork->work);
+}
+
/**
* ring_buffer_wait - wait for input to the ring buffer
* @buffer: buffer to wait on
@@ -939,6 +971,7 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
struct ring_buffer_per_cpu *cpu_buffer;
DEFINE_WAIT(wait);
struct rb_irq_work *work;
+ long wait_index;
int ret = 0;
/*
@@ -957,6 +990,7 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
work = &cpu_buffer->irq_work;
}
+ wait_index = READ_ONCE(work->wait_index);
while (true) {
if (full)
@@ -1021,6 +1055,11 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
}
schedule();
+
+ /* Make sure to see the new wait index */
+ smp_rmb();
+ if (wait_index != work->wait_index)
+ break;
}
if (full)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
01b2a5217173 ("tracing: Add ioctl() to force ring buffer waiters to wake up")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 01b2a52171735c6eea80ee2f355f32bea6c41418 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 29 Sep 2022 09:50:29 -0400
Subject: [PATCH] tracing: Add ioctl() to force ring buffer waiters to wake up
If a process is waiting on the ring buffer for data, there currently isn't
a clean way to force it to wake up. Add an ioctl call that will force any
tasks that are waiting on the trace_pipe_raw file to wake up.
Link: https://lkml.kernel.org/r/20220929095029.117f913f@gandalf.local.home
Cc: stable(a)vger.kernel.org
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index e101b0764b39..58afc83afc9d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8349,12 +8349,34 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
return ret;
}
+/* An ioctl call with cmd 0 to the ring buffer file will wake up all waiters */
+static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ struct ftrace_buffer_info *info = file->private_data;
+ struct trace_iterator *iter = &info->iter;
+
+ if (cmd)
+ return -ENOIOCTLCMD;
+
+ mutex_lock(&trace_types_lock);
+
+ iter->wait_index++;
+ /* Make sure the waiters see the new wait_index */
+ smp_wmb();
+
+ ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
+
+ mutex_unlock(&trace_types_lock);
+ return 0;
+}
+
static const struct file_operations tracing_buffers_fops = {
.open = tracing_buffers_open,
.read = tracing_buffers_read,
.poll = tracing_buffers_poll,
.release = tracing_buffers_release,
.splice_read = tracing_buffers_splice_read,
+ .unlocked_ioctl = tracing_buffers_ioctl,
.llseek = no_llseek,
};
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
7e9fbbb1b776 ("ring-buffer: Add ring_buffer_wake_waiters()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7e9fbbb1b776d8d7969551565bc246f74ec53b27 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Wed, 28 Sep 2022 13:39:38 -0400
Subject: [PATCH] ring-buffer: Add ring_buffer_wake_waiters()
On closing of a file that represents a ring buffer or flushing the file,
there may be waiters on the ring buffer that needs to be woken up and exit
the ring_buffer_wait() function.
Add ring_buffer_wake_waiters() to wake up the waiters on the ring buffer
and allow them to exit the wait loop.
Link: https://lkml.kernel.org/r/20220928133938.28dc2c27@gandalf.local.home
Cc: stable(a)vger.kernel.org
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Fixes: 15693458c4bc0 ("tracing/ring-buffer: Move poll wake ups into ring buffer code")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index dac53fd3afea..2504df9a0453 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -101,7 +101,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table);
-
+void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);
#define RING_BUFFER_ALL_CPUS -1
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 5a7d818ca3ea..3046deacf7b3 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -413,6 +413,7 @@ struct rb_irq_work {
struct irq_work work;
wait_queue_head_t waiters;
wait_queue_head_t full_waiters;
+ long wait_index;
bool waiters_pending;
bool full_waiters_pending;
bool wakeup_full;
@@ -924,6 +925,37 @@ static void rb_wake_up_waiters(struct irq_work *work)
}
}
+/**
+ * ring_buffer_wake_waiters - wake up any waiters on this ring buffer
+ * @buffer: The ring buffer to wake waiters on
+ *
+ * In the case of a file that represents a ring buffer is closing,
+ * it is prudent to wake up any waiters that are on this.
+ */
+void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ struct rb_irq_work *rbwork;
+
+ if (cpu == RING_BUFFER_ALL_CPUS) {
+
+ /* Wake up individual ones too. One level recursion */
+ for_each_buffer_cpu(buffer, cpu)
+ ring_buffer_wake_waiters(buffer, cpu);
+
+ rbwork = &buffer->irq_work;
+ } else {
+ cpu_buffer = buffer->buffers[cpu];
+ rbwork = &cpu_buffer->irq_work;
+ }
+
+ rbwork->wait_index++;
+ /* make sure the waiters see the new index */
+ smp_wmb();
+
+ rb_wake_up_waiters(&rbwork->work);
+}
+
/**
* ring_buffer_wait - wait for input to the ring buffer
* @buffer: buffer to wait on
@@ -939,6 +971,7 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
struct ring_buffer_per_cpu *cpu_buffer;
DEFINE_WAIT(wait);
struct rb_irq_work *work;
+ long wait_index;
int ret = 0;
/*
@@ -957,6 +990,7 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
work = &cpu_buffer->irq_work;
}
+ wait_index = READ_ONCE(work->wait_index);
while (true) {
if (full)
@@ -1021,6 +1055,11 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
}
schedule();
+
+ /* Make sure to see the new wait index */
+ smp_rmb();
+ if (wait_index != work->wait_index)
+ break;
}
if (full)
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
01b2a5217173 ("tracing: Add ioctl() to force ring buffer waiters to wake up")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 01b2a52171735c6eea80ee2f355f32bea6c41418 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 29 Sep 2022 09:50:29 -0400
Subject: [PATCH] tracing: Add ioctl() to force ring buffer waiters to wake up
If a process is waiting on the ring buffer for data, there currently isn't
a clean way to force it to wake up. Add an ioctl call that will force any
tasks that are waiting on the trace_pipe_raw file to wake up.
Link: https://lkml.kernel.org/r/20220929095029.117f913f@gandalf.local.home
Cc: stable(a)vger.kernel.org
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index e101b0764b39..58afc83afc9d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8349,12 +8349,34 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
return ret;
}
+/* An ioctl call with cmd 0 to the ring buffer file will wake up all waiters */
+static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ struct ftrace_buffer_info *info = file->private_data;
+ struct trace_iterator *iter = &info->iter;
+
+ if (cmd)
+ return -ENOIOCTLCMD;
+
+ mutex_lock(&trace_types_lock);
+
+ iter->wait_index++;
+ /* Make sure the waiters see the new wait_index */
+ smp_wmb();
+
+ ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
+
+ mutex_unlock(&trace_types_lock);
+ return 0;
+}
+
static const struct file_operations tracing_buffers_fops = {
.open = tracing_buffers_open,
.read = tracing_buffers_read,
.poll = tracing_buffers_poll,
.release = tracing_buffers_release,
.splice_read = tracing_buffers_splice_read,
+ .unlocked_ioctl = tracing_buffers_ioctl,
.llseek = no_llseek,
};
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
7e9fbbb1b776 ("ring-buffer: Add ring_buffer_wake_waiters()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7e9fbbb1b776d8d7969551565bc246f74ec53b27 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Wed, 28 Sep 2022 13:39:38 -0400
Subject: [PATCH] ring-buffer: Add ring_buffer_wake_waiters()
On closing of a file that represents a ring buffer or flushing the file,
there may be waiters on the ring buffer that needs to be woken up and exit
the ring_buffer_wait() function.
Add ring_buffer_wake_waiters() to wake up the waiters on the ring buffer
and allow them to exit the wait loop.
Link: https://lkml.kernel.org/r/20220928133938.28dc2c27@gandalf.local.home
Cc: stable(a)vger.kernel.org
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Fixes: 15693458c4bc0 ("tracing/ring-buffer: Move poll wake ups into ring buffer code")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index dac53fd3afea..2504df9a0453 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -101,7 +101,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table);
-
+void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);
#define RING_BUFFER_ALL_CPUS -1
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 5a7d818ca3ea..3046deacf7b3 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -413,6 +413,7 @@ struct rb_irq_work {
struct irq_work work;
wait_queue_head_t waiters;
wait_queue_head_t full_waiters;
+ long wait_index;
bool waiters_pending;
bool full_waiters_pending;
bool wakeup_full;
@@ -924,6 +925,37 @@ static void rb_wake_up_waiters(struct irq_work *work)
}
}
+/**
+ * ring_buffer_wake_waiters - wake up any waiters on this ring buffer
+ * @buffer: The ring buffer to wake waiters on
+ *
+ * In the case of a file that represents a ring buffer is closing,
+ * it is prudent to wake up any waiters that are on this.
+ */
+void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ struct rb_irq_work *rbwork;
+
+ if (cpu == RING_BUFFER_ALL_CPUS) {
+
+ /* Wake up individual ones too. One level recursion */
+ for_each_buffer_cpu(buffer, cpu)
+ ring_buffer_wake_waiters(buffer, cpu);
+
+ rbwork = &buffer->irq_work;
+ } else {
+ cpu_buffer = buffer->buffers[cpu];
+ rbwork = &cpu_buffer->irq_work;
+ }
+
+ rbwork->wait_index++;
+ /* make sure the waiters see the new index */
+ smp_wmb();
+
+ rb_wake_up_waiters(&rbwork->work);
+}
+
/**
* ring_buffer_wait - wait for input to the ring buffer
* @buffer: buffer to wait on
@@ -939,6 +971,7 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
struct ring_buffer_per_cpu *cpu_buffer;
DEFINE_WAIT(wait);
struct rb_irq_work *work;
+ long wait_index;
int ret = 0;
/*
@@ -957,6 +990,7 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
work = &cpu_buffer->irq_work;
}
+ wait_index = READ_ONCE(work->wait_index);
while (true) {
if (full)
@@ -1021,6 +1055,11 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
}
schedule();
+
+ /* Make sure to see the new wait index */
+ smp_rmb();
+ if (wait_index != work->wait_index)
+ break;
}
if (full)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
2e9906f84fc7 ("tracing: Add "(fault)" name injection to kernel probes")
f1d3cbfaafc1 ("tracing: Move duplicate code of trace_kprobe/eprobe.c into header")
7491e2c44278 ("tracing: Add a probe that attaches to trace events")
007517a01995 ("tracing/probe: Change traceprobe_set_print_fmt() to take a type")
bc87cf0a08d4 ("trace: Add a generic function to read/write u64 values from tracefs")
d262271d0483 ("tracing/dynevent: Delegate parsing to create function")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2e9906f84fc7c99388bb7123ade167250d50f1c0 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Wed, 12 Oct 2022 06:40:57 -0400
Subject: [PATCH] tracing: Add "(fault)" name injection to kernel probes
Have the specific functions for kernel probes that read strings to inject
the "(fault)" name directly. trace_probes.c does this too (for uprobes)
but as the code to read strings are going to be used by synthetic events
(and perhaps other utilities), it simplifies the code by making sure those
other uses do not need to implement the "(fault)" name injection as well.
Link: https://lkml.kernel.org/r/20221012104534.644803645@goodmis.org
Cc: stable(a)vger.kernel.org
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Tom Zanussi <zanussi(a)kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Reviewed-by: Tom Zanussi <zanussi(a)kernel.org>
Fixes: bd82631d7ccdc ("tracing: Add support for dynamic strings to synthetic events")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_probe_kernel.h b/kernel/trace/trace_probe_kernel.h
index 1d43df29a1f8..77dbd9ff9782 100644
--- a/kernel/trace/trace_probe_kernel.h
+++ b/kernel/trace/trace_probe_kernel.h
@@ -2,6 +2,8 @@
#ifndef __TRACE_PROBE_KERNEL_H_
#define __TRACE_PROBE_KERNEL_H_
+#define FAULT_STRING "(fault)"
+
/*
* This depends on trace_probe.h, but can not include it due to
* the way trace_probe_tmpl.h is used by trace_kprobe.c and trace_eprobe.c.
@@ -13,8 +15,16 @@ static nokprobe_inline int
kern_fetch_store_strlen_user(unsigned long addr)
{
const void __user *uaddr = (__force const void __user *)addr;
+ int ret;
- return strnlen_user_nofault(uaddr, MAX_STRING_SIZE);
+ ret = strnlen_user_nofault(uaddr, MAX_STRING_SIZE);
+ /*
+ * strnlen_user_nofault returns zero on fault, insert the
+ * FAULT_STRING when that occurs.
+ */
+ if (ret <= 0)
+ return strlen(FAULT_STRING) + 1;
+ return ret;
}
/* Return the length of string -- including null terminal byte */
@@ -34,7 +44,18 @@ kern_fetch_store_strlen(unsigned long addr)
len++;
} while (c && ret == 0 && len < MAX_STRING_SIZE);
- return (ret < 0) ? ret : len;
+ /* For faults, return enough to hold the FAULT_STRING */
+ return (ret < 0) ? strlen(FAULT_STRING) + 1 : len;
+}
+
+static nokprobe_inline void set_data_loc(int ret, void *dest, void *__dest, void *base, int len)
+{
+ if (ret >= 0) {
+ *(u32 *)dest = make_data_loc(ret, __dest - base);
+ } else {
+ strscpy(__dest, FAULT_STRING, len);
+ ret = strlen(__dest) + 1;
+ }
}
/*
@@ -55,8 +76,7 @@ kern_fetch_store_string_user(unsigned long addr, void *dest, void *base)
__dest = get_loc_data(dest, base);
ret = strncpy_from_user_nofault(__dest, uaddr, maxlen);
- if (ret >= 0)
- *(u32 *)dest = make_data_loc(ret, __dest - base);
+ set_data_loc(ret, dest, __dest, base, maxlen);
return ret;
}
@@ -87,8 +107,7 @@ kern_fetch_store_string(unsigned long addr, void *dest, void *base)
* probing.
*/
ret = strncpy_from_kernel_nofault(__dest, (void *)addr, maxlen);
- if (ret >= 0)
- *(u32 *)dest = make_data_loc(ret, __dest - base);
+ set_data_loc(ret, dest, __dest, base, maxlen);
return ret;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
a541a9559bb0 ("tracing: Do not free snapshot if tracer is on cmdline")
f4b0d318097e ("tracing: Simplify conditional compilation code in tracing_set_tracer()")
7440172974e8 ("tracing: Replace synchronize_sched() and call_rcu_sched()")
f8a79d5c7ef4 ("tracepoints: Free early tracepoints after RCU is initialized")
e0a568dcd18b ("tracing: Fix synchronizing to event changes with tracepoint_synchronize_unregister()")
e6753f23d961 ("tracepoint: Make rcuidle tracepoint callers use SRCU")
2824f5033248 ("tracing: Make the snapshot trigger work with instances")
80765597bc58 ("tracing: Rewrite filter logic to be simpler and faster")
478325f18865 ("tracing: Clean up and document pred_funcs_##type creation and use")
e9baef0d8616 ("tracing: Combine enum and arrays into single macro in filter code")
567f6989fd2a ("tracing: Embed replace_filter_string() helper function")
404a3add43c9 ("tracing: Only add filter list when needed")
c7399708b3cd ("tracing: Remove filter allocator helper")
559d421267d1 ("tracing: Use trace_seq instead of open code string appending")
a0ff08fd4e3f ("tracing: Remove BUG_ON() from append_filter_string()")
844ccdd7dce2 ("rcu: Eliminate rcu_irq_enter_disabled()")
51a1fd30f130 ("rcu: Make ->dynticks_nesting be a simple counter")
58721f5da4bc ("rcu: Define rcu_irq_{enter,exit}() in terms of rcu_nmi_{enter,exit}()")
6136d6e48a01 ("rcu: Clamp ->dynticks_nmi_nesting at eqs entry/exit")
a0eb22bf64a7 ("rcu: Reduce dyntick-idle state space")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a541a9559bb0a8ecc434de01d3e4826c32e8bb53 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Wed, 5 Oct 2022 11:37:57 -0400
Subject: [PATCH] tracing: Do not free snapshot if tracer is on cmdline
The ftrace_boot_snapshot and alloc_snapshot cmdline options allocate the
snapshot buffer at boot up for use later. The ftrace_boot_snapshot in
particular requires the snapshot to be allocated because it will take a
snapshot at the end of boot up allowing to see the traces that happened
during boot so that it's not lost when user space takes over.
When a tracer is registered (started) there's a path that checks if it
requires the snapshot buffer or not, and if it does not and it was
allocated it will do a synchronization and free the snapshot buffer.
This is only required if the previous tracer was using it for "max
latency" snapshots, as it needs to make sure all max snapshots are
complete before freeing. But this is only needed if the previous tracer
was using the snapshot buffer for latency (like irqoff tracer and
friends). But it does not make sense to free it, if the previous tracer
was not using it, and the snapshot was allocated by the cmdline
parameters. This basically takes away the point of allocating it in the
first place!
Note, the allocated snapshot worked fine for just trace events, but fails
when a tracer is enabled on the cmdline.
Further investigation, this goes back even further and it does not require
a tracer on the cmdline to fail. Simply enable snapshots and then enable a
tracer, and it will remove the snapshot.
Link: https://lkml.kernel.org/r/20221005113757.041df7fe@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: stable(a)vger.kernel.org
Fixes: 45ad21ca5530 ("tracing: Have trace_array keep track if snapshot buffer is allocated")
Reported-by: Ross Zwisler <zwisler(a)kernel.org>
Tested-by: Ross Zwisler <zwisler(a)kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index def721de68a0..47a44b055a1d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6428,12 +6428,12 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
if (tr->current_trace->reset)
tr->current_trace->reset(tr);
+#ifdef CONFIG_TRACER_MAX_TRACE
+ had_max_tr = tr->current_trace->use_max_tr;
+
/* Current trace needs to be nop_trace before synchronize_rcu */
tr->current_trace = &nop_trace;
-#ifdef CONFIG_TRACER_MAX_TRACE
- had_max_tr = tr->allocated_snapshot;
-
if (had_max_tr && !t->use_max_tr) {
/*
* We need to make sure that the update_max_tr sees that
@@ -6446,11 +6446,13 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
free_snapshot(tr);
}
- if (t->use_max_tr && !had_max_tr) {
+ if (t->use_max_tr && !tr->allocated_snapshot) {
ret = tracing_alloc_snapshot_instance(tr);
if (ret < 0)
goto out;
}
+#else
+ tr->current_trace = &nop_trace;
#endif
if (t->init) {
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
a541a9559bb0 ("tracing: Do not free snapshot if tracer is on cmdline")
f4b0d318097e ("tracing: Simplify conditional compilation code in tracing_set_tracer()")
7440172974e8 ("tracing: Replace synchronize_sched() and call_rcu_sched()")
f8a79d5c7ef4 ("tracepoints: Free early tracepoints after RCU is initialized")
e0a568dcd18b ("tracing: Fix synchronizing to event changes with tracepoint_synchronize_unregister()")
e6753f23d961 ("tracepoint: Make rcuidle tracepoint callers use SRCU")
2824f5033248 ("tracing: Make the snapshot trigger work with instances")
80765597bc58 ("tracing: Rewrite filter logic to be simpler and faster")
478325f18865 ("tracing: Clean up and document pred_funcs_##type creation and use")
e9baef0d8616 ("tracing: Combine enum and arrays into single macro in filter code")
567f6989fd2a ("tracing: Embed replace_filter_string() helper function")
404a3add43c9 ("tracing: Only add filter list when needed")
c7399708b3cd ("tracing: Remove filter allocator helper")
559d421267d1 ("tracing: Use trace_seq instead of open code string appending")
a0ff08fd4e3f ("tracing: Remove BUG_ON() from append_filter_string()")
844ccdd7dce2 ("rcu: Eliminate rcu_irq_enter_disabled()")
51a1fd30f130 ("rcu: Make ->dynticks_nesting be a simple counter")
58721f5da4bc ("rcu: Define rcu_irq_{enter,exit}() in terms of rcu_nmi_{enter,exit}()")
6136d6e48a01 ("rcu: Clamp ->dynticks_nmi_nesting at eqs entry/exit")
a0eb22bf64a7 ("rcu: Reduce dyntick-idle state space")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a541a9559bb0a8ecc434de01d3e4826c32e8bb53 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Wed, 5 Oct 2022 11:37:57 -0400
Subject: [PATCH] tracing: Do not free snapshot if tracer is on cmdline
The ftrace_boot_snapshot and alloc_snapshot cmdline options allocate the
snapshot buffer at boot up for use later. The ftrace_boot_snapshot in
particular requires the snapshot to be allocated because it will take a
snapshot at the end of boot up allowing to see the traces that happened
during boot so that it's not lost when user space takes over.
When a tracer is registered (started) there's a path that checks if it
requires the snapshot buffer or not, and if it does not and it was
allocated it will do a synchronization and free the snapshot buffer.
This is only required if the previous tracer was using it for "max
latency" snapshots, as it needs to make sure all max snapshots are
complete before freeing. But this is only needed if the previous tracer
was using the snapshot buffer for latency (like irqoff tracer and
friends). But it does not make sense to free it, if the previous tracer
was not using it, and the snapshot was allocated by the cmdline
parameters. This basically takes away the point of allocating it in the
first place!
Note, the allocated snapshot worked fine for just trace events, but fails
when a tracer is enabled on the cmdline.
Further investigation, this goes back even further and it does not require
a tracer on the cmdline to fail. Simply enable snapshots and then enable a
tracer, and it will remove the snapshot.
Link: https://lkml.kernel.org/r/20221005113757.041df7fe@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: stable(a)vger.kernel.org
Fixes: 45ad21ca5530 ("tracing: Have trace_array keep track if snapshot buffer is allocated")
Reported-by: Ross Zwisler <zwisler(a)kernel.org>
Tested-by: Ross Zwisler <zwisler(a)kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index def721de68a0..47a44b055a1d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6428,12 +6428,12 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
if (tr->current_trace->reset)
tr->current_trace->reset(tr);
+#ifdef CONFIG_TRACER_MAX_TRACE
+ had_max_tr = tr->current_trace->use_max_tr;
+
/* Current trace needs to be nop_trace before synchronize_rcu */
tr->current_trace = &nop_trace;
-#ifdef CONFIG_TRACER_MAX_TRACE
- had_max_tr = tr->allocated_snapshot;
-
if (had_max_tr && !t->use_max_tr) {
/*
* We need to make sure that the update_max_tr sees that
@@ -6446,11 +6446,13 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
free_snapshot(tr);
}
- if (t->use_max_tr && !had_max_tr) {
+ if (t->use_max_tr && !tr->allocated_snapshot) {
ret = tracing_alloc_snapshot_instance(tr);
if (ret < 0)
goto out;
}
+#else
+ tr->current_trace = &nop_trace;
#endif
if (t->init) {
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
a541a9559bb0 ("tracing: Do not free snapshot if tracer is on cmdline")
f4b0d318097e ("tracing: Simplify conditional compilation code in tracing_set_tracer()")
7440172974e8 ("tracing: Replace synchronize_sched() and call_rcu_sched()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a541a9559bb0a8ecc434de01d3e4826c32e8bb53 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Wed, 5 Oct 2022 11:37:57 -0400
Subject: [PATCH] tracing: Do not free snapshot if tracer is on cmdline
The ftrace_boot_snapshot and alloc_snapshot cmdline options allocate the
snapshot buffer at boot up for use later. The ftrace_boot_snapshot in
particular requires the snapshot to be allocated because it will take a
snapshot at the end of boot up allowing to see the traces that happened
during boot so that it's not lost when user space takes over.
When a tracer is registered (started) there's a path that checks if it
requires the snapshot buffer or not, and if it does not and it was
allocated it will do a synchronization and free the snapshot buffer.
This is only required if the previous tracer was using it for "max
latency" snapshots, as it needs to make sure all max snapshots are
complete before freeing. But this is only needed if the previous tracer
was using the snapshot buffer for latency (like irqoff tracer and
friends). But it does not make sense to free it, if the previous tracer
was not using it, and the snapshot was allocated by the cmdline
parameters. This basically takes away the point of allocating it in the
first place!
Note, the allocated snapshot worked fine for just trace events, but fails
when a tracer is enabled on the cmdline.
Further investigation, this goes back even further and it does not require
a tracer on the cmdline to fail. Simply enable snapshots and then enable a
tracer, and it will remove the snapshot.
Link: https://lkml.kernel.org/r/20221005113757.041df7fe@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: stable(a)vger.kernel.org
Fixes: 45ad21ca5530 ("tracing: Have trace_array keep track if snapshot buffer is allocated")
Reported-by: Ross Zwisler <zwisler(a)kernel.org>
Tested-by: Ross Zwisler <zwisler(a)kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index def721de68a0..47a44b055a1d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6428,12 +6428,12 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
if (tr->current_trace->reset)
tr->current_trace->reset(tr);
+#ifdef CONFIG_TRACER_MAX_TRACE
+ had_max_tr = tr->current_trace->use_max_tr;
+
/* Current trace needs to be nop_trace before synchronize_rcu */
tr->current_trace = &nop_trace;
-#ifdef CONFIG_TRACER_MAX_TRACE
- had_max_tr = tr->allocated_snapshot;
-
if (had_max_tr && !t->use_max_tr) {
/*
* We need to make sure that the update_max_tr sees that
@@ -6446,11 +6446,13 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
free_snapshot(tr);
}
- if (t->use_max_tr && !had_max_tr) {
+ if (t->use_max_tr && !tr->allocated_snapshot) {
ret = tracing_alloc_snapshot_instance(tr);
if (ret < 0)
goto out;
}
+#else
+ tr->current_trace = &nop_trace;
#endif
if (t->init) {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
a541a9559bb0 ("tracing: Do not free snapshot if tracer is on cmdline")
f4b0d318097e ("tracing: Simplify conditional compilation code in tracing_set_tracer()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a541a9559bb0a8ecc434de01d3e4826c32e8bb53 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Wed, 5 Oct 2022 11:37:57 -0400
Subject: [PATCH] tracing: Do not free snapshot if tracer is on cmdline
The ftrace_boot_snapshot and alloc_snapshot cmdline options allocate the
snapshot buffer at boot up for use later. The ftrace_boot_snapshot in
particular requires the snapshot to be allocated because it will take a
snapshot at the end of boot up allowing to see the traces that happened
during boot so that it's not lost when user space takes over.
When a tracer is registered (started) there's a path that checks if it
requires the snapshot buffer or not, and if it does not and it was
allocated it will do a synchronization and free the snapshot buffer.
This is only required if the previous tracer was using it for "max
latency" snapshots, as it needs to make sure all max snapshots are
complete before freeing. But this is only needed if the previous tracer
was using the snapshot buffer for latency (like irqoff tracer and
friends). But it does not make sense to free it, if the previous tracer
was not using it, and the snapshot was allocated by the cmdline
parameters. This basically takes away the point of allocating it in the
first place!
Note, the allocated snapshot worked fine for just trace events, but fails
when a tracer is enabled on the cmdline.
Further investigation, this goes back even further and it does not require
a tracer on the cmdline to fail. Simply enable snapshots and then enable a
tracer, and it will remove the snapshot.
Link: https://lkml.kernel.org/r/20221005113757.041df7fe@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: stable(a)vger.kernel.org
Fixes: 45ad21ca5530 ("tracing: Have trace_array keep track if snapshot buffer is allocated")
Reported-by: Ross Zwisler <zwisler(a)kernel.org>
Tested-by: Ross Zwisler <zwisler(a)kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index def721de68a0..47a44b055a1d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6428,12 +6428,12 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
if (tr->current_trace->reset)
tr->current_trace->reset(tr);
+#ifdef CONFIG_TRACER_MAX_TRACE
+ had_max_tr = tr->current_trace->use_max_tr;
+
/* Current trace needs to be nop_trace before synchronize_rcu */
tr->current_trace = &nop_trace;
-#ifdef CONFIG_TRACER_MAX_TRACE
- had_max_tr = tr->allocated_snapshot;
-
if (had_max_tr && !t->use_max_tr) {
/*
* We need to make sure that the update_max_tr sees that
@@ -6446,11 +6446,13 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
free_snapshot(tr);
}
- if (t->use_max_tr && !had_max_tr) {
+ if (t->use_max_tr && !tr->allocated_snapshot) {
ret = tracing_alloc_snapshot_instance(tr);
if (ret < 0)
goto out;
}
+#else
+ tr->current_trace = &nop_trace;
#endif
if (t->init) {
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
a541a9559bb0 ("tracing: Do not free snapshot if tracer is on cmdline")
f4b0d318097e ("tracing: Simplify conditional compilation code in tracing_set_tracer()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a541a9559bb0a8ecc434de01d3e4826c32e8bb53 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Wed, 5 Oct 2022 11:37:57 -0400
Subject: [PATCH] tracing: Do not free snapshot if tracer is on cmdline
The ftrace_boot_snapshot and alloc_snapshot cmdline options allocate the
snapshot buffer at boot up for use later. The ftrace_boot_snapshot in
particular requires the snapshot to be allocated because it will take a
snapshot at the end of boot up allowing to see the traces that happened
during boot so that it's not lost when user space takes over.
When a tracer is registered (started) there's a path that checks if it
requires the snapshot buffer or not, and if it does not and it was
allocated it will do a synchronization and free the snapshot buffer.
This is only required if the previous tracer was using it for "max
latency" snapshots, as it needs to make sure all max snapshots are
complete before freeing. But this is only needed if the previous tracer
was using the snapshot buffer for latency (like irqoff tracer and
friends). But it does not make sense to free it, if the previous tracer
was not using it, and the snapshot was allocated by the cmdline
parameters. This basically takes away the point of allocating it in the
first place!
Note, the allocated snapshot worked fine for just trace events, but fails
when a tracer is enabled on the cmdline.
Further investigation, this goes back even further and it does not require
a tracer on the cmdline to fail. Simply enable snapshots and then enable a
tracer, and it will remove the snapshot.
Link: https://lkml.kernel.org/r/20221005113757.041df7fe@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: stable(a)vger.kernel.org
Fixes: 45ad21ca5530 ("tracing: Have trace_array keep track if snapshot buffer is allocated")
Reported-by: Ross Zwisler <zwisler(a)kernel.org>
Tested-by: Ross Zwisler <zwisler(a)kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index def721de68a0..47a44b055a1d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6428,12 +6428,12 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
if (tr->current_trace->reset)
tr->current_trace->reset(tr);
+#ifdef CONFIG_TRACER_MAX_TRACE
+ had_max_tr = tr->current_trace->use_max_tr;
+
/* Current trace needs to be nop_trace before synchronize_rcu */
tr->current_trace = &nop_trace;
-#ifdef CONFIG_TRACER_MAX_TRACE
- had_max_tr = tr->allocated_snapshot;
-
if (had_max_tr && !t->use_max_tr) {
/*
* We need to make sure that the update_max_tr sees that
@@ -6446,11 +6446,13 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
free_snapshot(tr);
}
- if (t->use_max_tr && !had_max_tr) {
+ if (t->use_max_tr && !tr->allocated_snapshot) {
ret = tracing_alloc_snapshot_instance(tr);
if (ret < 0)
goto out;
}
+#else
+ tr->current_trace = &nop_trace;
#endif
if (t->init) {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
a541a9559bb0 ("tracing: Do not free snapshot if tracer is on cmdline")
f4b0d318097e ("tracing: Simplify conditional compilation code in tracing_set_tracer()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a541a9559bb0a8ecc434de01d3e4826c32e8bb53 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Wed, 5 Oct 2022 11:37:57 -0400
Subject: [PATCH] tracing: Do not free snapshot if tracer is on cmdline
The ftrace_boot_snapshot and alloc_snapshot cmdline options allocate the
snapshot buffer at boot up for use later. The ftrace_boot_snapshot in
particular requires the snapshot to be allocated because it will take a
snapshot at the end of boot up allowing to see the traces that happened
during boot so that it's not lost when user space takes over.
When a tracer is registered (started) there's a path that checks if it
requires the snapshot buffer or not, and if it does not and it was
allocated it will do a synchronization and free the snapshot buffer.
This is only required if the previous tracer was using it for "max
latency" snapshots, as it needs to make sure all max snapshots are
complete before freeing. But this is only needed if the previous tracer
was using the snapshot buffer for latency (like irqoff tracer and
friends). But it does not make sense to free it, if the previous tracer
was not using it, and the snapshot was allocated by the cmdline
parameters. This basically takes away the point of allocating it in the
first place!
Note, the allocated snapshot worked fine for just trace events, but fails
when a tracer is enabled on the cmdline.
Further investigation, this goes back even further and it does not require
a tracer on the cmdline to fail. Simply enable snapshots and then enable a
tracer, and it will remove the snapshot.
Link: https://lkml.kernel.org/r/20221005113757.041df7fe@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: stable(a)vger.kernel.org
Fixes: 45ad21ca5530 ("tracing: Have trace_array keep track if snapshot buffer is allocated")
Reported-by: Ross Zwisler <zwisler(a)kernel.org>
Tested-by: Ross Zwisler <zwisler(a)kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index def721de68a0..47a44b055a1d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6428,12 +6428,12 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
if (tr->current_trace->reset)
tr->current_trace->reset(tr);
+#ifdef CONFIG_TRACER_MAX_TRACE
+ had_max_tr = tr->current_trace->use_max_tr;
+
/* Current trace needs to be nop_trace before synchronize_rcu */
tr->current_trace = &nop_trace;
-#ifdef CONFIG_TRACER_MAX_TRACE
- had_max_tr = tr->allocated_snapshot;
-
if (had_max_tr && !t->use_max_tr) {
/*
* We need to make sure that the update_max_tr sees that
@@ -6446,11 +6446,13 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
free_snapshot(tr);
}
- if (t->use_max_tr && !had_max_tr) {
+ if (t->use_max_tr && !tr->allocated_snapshot) {
ret = tracing_alloc_snapshot_instance(tr);
if (ret < 0)
goto out;
}
+#else
+ tr->current_trace = &nop_trace;
#endif
if (t->init) {
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
2b0fd9a59b79 ("tracing: Wake up waiters when tracing is disabled")
f3ddb74ad079 ("tracing: Wake up ring buffer waiters on closing of the file")
efbbdaa22bb7 ("tracing: Show real address for trace event arguments")
8e99cf91b99b ("tracing: Do not allocate buffer in trace_find_next_entry() in atomic")
ff895103a84a ("tracing: Save off entry when peeking at next entry")
03329f993978 ("tracing: Add tracefs file buffer_percentage")
2c2b0a78b373 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
2c1ea60b195d ("tracing: Add timestamp_mode trace file")
00b4145298ae ("ring-buffer: Add interface for setting absolute time stamps")
ecf927000ce3 ("ring_buffer_poll_wait() return value used as return value of ->poll()")
065e63f95143 ("tracing: Only have rmmod clear buffers that its events were active in")
dc8d387210e3 ("tracing: Update Documentation/trace/ftrace.txt")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2b0fd9a59b7990c161fa1cb7b79edb22847c87c2 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Wed, 28 Sep 2022 18:22:20 -0400
Subject: [PATCH] tracing: Wake up waiters when tracing is disabled
When tracing is disabled, there's no reason that waiters should stay
waiting, wake them up, otherwise tasks get stuck when they should be
flushing the buffers.
Cc: stable(a)vger.kernel.org
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 58afc83afc9d..bb5597c6bfc1 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8334,6 +8334,10 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if (ret)
goto out;
+ /* No need to wait after waking up when tracing is off */
+ if (!tracer_tracing_is_on(iter->tr))
+ goto out;
+
/* Make sure we see the new wait_index */
smp_rmb();
if (wait_index != iter->wait_index)
@@ -9065,6 +9069,8 @@ rb_simple_write(struct file *filp, const char __user *ubuf,
tracer_tracing_off(tr);
if (tr->current_trace->stop)
tr->current_trace->stop(tr);
+ /* Wake up any waiters */
+ ring_buffer_wake_waiters(buffer, RING_BUFFER_ALL_CPUS);
}
mutex_unlock(&trace_types_lock);
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
2b0fd9a59b79 ("tracing: Wake up waiters when tracing is disabled")
f3ddb74ad079 ("tracing: Wake up ring buffer waiters on closing of the file")
efbbdaa22bb7 ("tracing: Show real address for trace event arguments")
8e99cf91b99b ("tracing: Do not allocate buffer in trace_find_next_entry() in atomic")
ff895103a84a ("tracing: Save off entry when peeking at next entry")
03329f993978 ("tracing: Add tracefs file buffer_percentage")
2c2b0a78b373 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
2c1ea60b195d ("tracing: Add timestamp_mode trace file")
00b4145298ae ("ring-buffer: Add interface for setting absolute time stamps")
ecf927000ce3 ("ring_buffer_poll_wait() return value used as return value of ->poll()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2b0fd9a59b7990c161fa1cb7b79edb22847c87c2 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Wed, 28 Sep 2022 18:22:20 -0400
Subject: [PATCH] tracing: Wake up waiters when tracing is disabled
When tracing is disabled, there's no reason that waiters should stay
waiting, wake them up, otherwise tasks get stuck when they should be
flushing the buffers.
Cc: stable(a)vger.kernel.org
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 58afc83afc9d..bb5597c6bfc1 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8334,6 +8334,10 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if (ret)
goto out;
+ /* No need to wait after waking up when tracing is off */
+ if (!tracer_tracing_is_on(iter->tr))
+ goto out;
+
/* Make sure we see the new wait_index */
smp_rmb();
if (wait_index != iter->wait_index)
@@ -9065,6 +9069,8 @@ rb_simple_write(struct file *filp, const char __user *ubuf,
tracer_tracing_off(tr);
if (tr->current_trace->stop)
tr->current_trace->stop(tr);
+ /* Wake up any waiters */
+ ring_buffer_wake_waiters(buffer, RING_BUFFER_ALL_CPUS);
}
mutex_unlock(&trace_types_lock);
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
2b0fd9a59b79 ("tracing: Wake up waiters when tracing is disabled")
f3ddb74ad079 ("tracing: Wake up ring buffer waiters on closing of the file")
efbbdaa22bb7 ("tracing: Show real address for trace event arguments")
8e99cf91b99b ("tracing: Do not allocate buffer in trace_find_next_entry() in atomic")
ff895103a84a ("tracing: Save off entry when peeking at next entry")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2b0fd9a59b7990c161fa1cb7b79edb22847c87c2 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Wed, 28 Sep 2022 18:22:20 -0400
Subject: [PATCH] tracing: Wake up waiters when tracing is disabled
When tracing is disabled, there's no reason that waiters should stay
waiting, wake them up, otherwise tasks get stuck when they should be
flushing the buffers.
Cc: stable(a)vger.kernel.org
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 58afc83afc9d..bb5597c6bfc1 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8334,6 +8334,10 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if (ret)
goto out;
+ /* No need to wait after waking up when tracing is off */
+ if (!tracer_tracing_is_on(iter->tr))
+ goto out;
+
/* Make sure we see the new wait_index */
smp_rmb();
if (wait_index != iter->wait_index)
@@ -9065,6 +9069,8 @@ rb_simple_write(struct file *filp, const char __user *ubuf,
tracer_tracing_off(tr);
if (tr->current_trace->stop)
tr->current_trace->stop(tr);
+ /* Wake up any waiters */
+ ring_buffer_wake_waiters(buffer, RING_BUFFER_ALL_CPUS);
}
mutex_unlock(&trace_types_lock);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
2b0fd9a59b79 ("tracing: Wake up waiters when tracing is disabled")
f3ddb74ad079 ("tracing: Wake up ring buffer waiters on closing of the file")
efbbdaa22bb7 ("tracing: Show real address for trace event arguments")
8e99cf91b99b ("tracing: Do not allocate buffer in trace_find_next_entry() in atomic")
ff895103a84a ("tracing: Save off entry when peeking at next entry")
03329f993978 ("tracing: Add tracefs file buffer_percentage")
2c2b0a78b373 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2b0fd9a59b7990c161fa1cb7b79edb22847c87c2 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Wed, 28 Sep 2022 18:22:20 -0400
Subject: [PATCH] tracing: Wake up waiters when tracing is disabled
When tracing is disabled, there's no reason that waiters should stay
waiting, wake them up, otherwise tasks get stuck when they should be
flushing the buffers.
Cc: stable(a)vger.kernel.org
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 58afc83afc9d..bb5597c6bfc1 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8334,6 +8334,10 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if (ret)
goto out;
+ /* No need to wait after waking up when tracing is off */
+ if (!tracer_tracing_is_on(iter->tr))
+ goto out;
+
/* Make sure we see the new wait_index */
smp_rmb();
if (wait_index != iter->wait_index)
@@ -9065,6 +9069,8 @@ rb_simple_write(struct file *filp, const char __user *ubuf,
tracer_tracing_off(tr);
if (tr->current_trace->stop)
tr->current_trace->stop(tr);
+ /* Wake up any waiters */
+ ring_buffer_wake_waiters(buffer, RING_BUFFER_ALL_CPUS);
}
mutex_unlock(&trace_types_lock);
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
2b0fd9a59b79 ("tracing: Wake up waiters when tracing is disabled")
f3ddb74ad079 ("tracing: Wake up ring buffer waiters on closing of the file")
efbbdaa22bb7 ("tracing: Show real address for trace event arguments")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2b0fd9a59b7990c161fa1cb7b79edb22847c87c2 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Wed, 28 Sep 2022 18:22:20 -0400
Subject: [PATCH] tracing: Wake up waiters when tracing is disabled
When tracing is disabled, there's no reason that waiters should stay
waiting, wake them up, otherwise tasks get stuck when they should be
flushing the buffers.
Cc: stable(a)vger.kernel.org
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 58afc83afc9d..bb5597c6bfc1 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8334,6 +8334,10 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if (ret)
goto out;
+ /* No need to wait after waking up when tracing is off */
+ if (!tracer_tracing_is_on(iter->tr))
+ goto out;
+
/* Make sure we see the new wait_index */
smp_rmb();
if (wait_index != iter->wait_index)
@@ -9065,6 +9069,8 @@ rb_simple_write(struct file *filp, const char __user *ubuf,
tracer_tracing_off(tr);
if (tr->current_trace->stop)
tr->current_trace->stop(tr);
+ /* Wake up any waiters */
+ ring_buffer_wake_waiters(buffer, RING_BUFFER_ALL_CPUS);
}
mutex_unlock(&trace_types_lock);
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f3ddb74ad079 ("tracing: Wake up ring buffer waiters on closing of the file")
efbbdaa22bb7 ("tracing: Show real address for trace event arguments")
8e99cf91b99b ("tracing: Do not allocate buffer in trace_find_next_entry() in atomic")
ff895103a84a ("tracing: Save off entry when peeking at next entry")
03329f993978 ("tracing: Add tracefs file buffer_percentage")
2c2b0a78b373 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
2c1ea60b195d ("tracing: Add timestamp_mode trace file")
00b4145298ae ("ring-buffer: Add interface for setting absolute time stamps")
ecf927000ce3 ("ring_buffer_poll_wait() return value used as return value of ->poll()")
065e63f95143 ("tracing: Only have rmmod clear buffers that its events were active in")
dc8d387210e3 ("tracing: Update Documentation/trace/ftrace.txt")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f3ddb74ad0790030c9592229fb14d8c451f4e9a8 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Tue, 27 Sep 2022 19:15:27 -0400
Subject: [PATCH] tracing: Wake up ring buffer waiters on closing of the file
When the file that represents the ring buffer is closed, there may be
waiters waiting on more input from the ring buffer. Call
ring_buffer_wake_waiters() to wake up any waiters when the file is
closed.
Link: https://lkml.kernel.org/r/20220927231825.182416969@goodmis.org
Cc: stable(a)vger.kernel.org
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 8401dec93c15..20749bd9db71 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -92,6 +92,7 @@ struct trace_iterator {
unsigned int temp_size;
char *fmt; /* modified format holder */
unsigned int fmt_size;
+ long wait_index;
/* trace_seq for __print_flags() and __print_symbolic() etc. */
struct trace_seq tmp_seq;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index aed7ea6e6045..e101b0764b39 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8160,6 +8160,12 @@ static int tracing_buffers_release(struct inode *inode, struct file *file)
__trace_array_put(iter->tr);
+ iter->wait_index++;
+ /* Make sure the waiters see the new wait_index */
+ smp_wmb();
+
+ ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
+
if (info->spare)
ring_buffer_free_read_page(iter->array_buffer->buffer,
info->spare_cpu, info->spare);
@@ -8313,6 +8319,8 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
/* did we read anything? */
if (!spd.nr_pages) {
+ long wait_index;
+
if (ret)
goto out;
@@ -8320,10 +8328,17 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK))
goto out;
+ wait_index = READ_ONCE(iter->wait_index);
+
ret = wait_on_pipe(iter, iter->tr->buffer_percent);
if (ret)
goto out;
+ /* Make sure we see the new wait_index */
+ smp_rmb();
+ if (wait_index != iter->wait_index)
+ goto out;
+
goto again;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f3ddb74ad079 ("tracing: Wake up ring buffer waiters on closing of the file")
efbbdaa22bb7 ("tracing: Show real address for trace event arguments")
8e99cf91b99b ("tracing: Do not allocate buffer in trace_find_next_entry() in atomic")
ff895103a84a ("tracing: Save off entry when peeking at next entry")
03329f993978 ("tracing: Add tracefs file buffer_percentage")
2c2b0a78b373 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
2c1ea60b195d ("tracing: Add timestamp_mode trace file")
00b4145298ae ("ring-buffer: Add interface for setting absolute time stamps")
ecf927000ce3 ("ring_buffer_poll_wait() return value used as return value of ->poll()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f3ddb74ad0790030c9592229fb14d8c451f4e9a8 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Tue, 27 Sep 2022 19:15:27 -0400
Subject: [PATCH] tracing: Wake up ring buffer waiters on closing of the file
When the file that represents the ring buffer is closed, there may be
waiters waiting on more input from the ring buffer. Call
ring_buffer_wake_waiters() to wake up any waiters when the file is
closed.
Link: https://lkml.kernel.org/r/20220927231825.182416969@goodmis.org
Cc: stable(a)vger.kernel.org
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 8401dec93c15..20749bd9db71 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -92,6 +92,7 @@ struct trace_iterator {
unsigned int temp_size;
char *fmt; /* modified format holder */
unsigned int fmt_size;
+ long wait_index;
/* trace_seq for __print_flags() and __print_symbolic() etc. */
struct trace_seq tmp_seq;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index aed7ea6e6045..e101b0764b39 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8160,6 +8160,12 @@ static int tracing_buffers_release(struct inode *inode, struct file *file)
__trace_array_put(iter->tr);
+ iter->wait_index++;
+ /* Make sure the waiters see the new wait_index */
+ smp_wmb();
+
+ ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
+
if (info->spare)
ring_buffer_free_read_page(iter->array_buffer->buffer,
info->spare_cpu, info->spare);
@@ -8313,6 +8319,8 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
/* did we read anything? */
if (!spd.nr_pages) {
+ long wait_index;
+
if (ret)
goto out;
@@ -8320,10 +8328,17 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK))
goto out;
+ wait_index = READ_ONCE(iter->wait_index);
+
ret = wait_on_pipe(iter, iter->tr->buffer_percent);
if (ret)
goto out;
+ /* Make sure we see the new wait_index */
+ smp_rmb();
+ if (wait_index != iter->wait_index)
+ goto out;
+
goto again;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f3ddb74ad079 ("tracing: Wake up ring buffer waiters on closing of the file")
efbbdaa22bb7 ("tracing: Show real address for trace event arguments")
8e99cf91b99b ("tracing: Do not allocate buffer in trace_find_next_entry() in atomic")
ff895103a84a ("tracing: Save off entry when peeking at next entry")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f3ddb74ad0790030c9592229fb14d8c451f4e9a8 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Tue, 27 Sep 2022 19:15:27 -0400
Subject: [PATCH] tracing: Wake up ring buffer waiters on closing of the file
When the file that represents the ring buffer is closed, there may be
waiters waiting on more input from the ring buffer. Call
ring_buffer_wake_waiters() to wake up any waiters when the file is
closed.
Link: https://lkml.kernel.org/r/20220927231825.182416969@goodmis.org
Cc: stable(a)vger.kernel.org
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 8401dec93c15..20749bd9db71 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -92,6 +92,7 @@ struct trace_iterator {
unsigned int temp_size;
char *fmt; /* modified format holder */
unsigned int fmt_size;
+ long wait_index;
/* trace_seq for __print_flags() and __print_symbolic() etc. */
struct trace_seq tmp_seq;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index aed7ea6e6045..e101b0764b39 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8160,6 +8160,12 @@ static int tracing_buffers_release(struct inode *inode, struct file *file)
__trace_array_put(iter->tr);
+ iter->wait_index++;
+ /* Make sure the waiters see the new wait_index */
+ smp_wmb();
+
+ ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
+
if (info->spare)
ring_buffer_free_read_page(iter->array_buffer->buffer,
info->spare_cpu, info->spare);
@@ -8313,6 +8319,8 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
/* did we read anything? */
if (!spd.nr_pages) {
+ long wait_index;
+
if (ret)
goto out;
@@ -8320,10 +8328,17 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK))
goto out;
+ wait_index = READ_ONCE(iter->wait_index);
+
ret = wait_on_pipe(iter, iter->tr->buffer_percent);
if (ret)
goto out;
+ /* Make sure we see the new wait_index */
+ smp_rmb();
+ if (wait_index != iter->wait_index)
+ goto out;
+
goto again;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f3ddb74ad079 ("tracing: Wake up ring buffer waiters on closing of the file")
efbbdaa22bb7 ("tracing: Show real address for trace event arguments")
8e99cf91b99b ("tracing: Do not allocate buffer in trace_find_next_entry() in atomic")
ff895103a84a ("tracing: Save off entry when peeking at next entry")
03329f993978 ("tracing: Add tracefs file buffer_percentage")
2c2b0a78b373 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f3ddb74ad0790030c9592229fb14d8c451f4e9a8 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Tue, 27 Sep 2022 19:15:27 -0400
Subject: [PATCH] tracing: Wake up ring buffer waiters on closing of the file
When the file that represents the ring buffer is closed, there may be
waiters waiting on more input from the ring buffer. Call
ring_buffer_wake_waiters() to wake up any waiters when the file is
closed.
Link: https://lkml.kernel.org/r/20220927231825.182416969@goodmis.org
Cc: stable(a)vger.kernel.org
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 8401dec93c15..20749bd9db71 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -92,6 +92,7 @@ struct trace_iterator {
unsigned int temp_size;
char *fmt; /* modified format holder */
unsigned int fmt_size;
+ long wait_index;
/* trace_seq for __print_flags() and __print_symbolic() etc. */
struct trace_seq tmp_seq;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index aed7ea6e6045..e101b0764b39 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8160,6 +8160,12 @@ static int tracing_buffers_release(struct inode *inode, struct file *file)
__trace_array_put(iter->tr);
+ iter->wait_index++;
+ /* Make sure the waiters see the new wait_index */
+ smp_wmb();
+
+ ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
+
if (info->spare)
ring_buffer_free_read_page(iter->array_buffer->buffer,
info->spare_cpu, info->spare);
@@ -8313,6 +8319,8 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
/* did we read anything? */
if (!spd.nr_pages) {
+ long wait_index;
+
if (ret)
goto out;
@@ -8320,10 +8328,17 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK))
goto out;
+ wait_index = READ_ONCE(iter->wait_index);
+
ret = wait_on_pipe(iter, iter->tr->buffer_percent);
if (ret)
goto out;
+ /* Make sure we see the new wait_index */
+ smp_rmb();
+ if (wait_index != iter->wait_index)
+ goto out;
+
goto again;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f3ddb74ad079 ("tracing: Wake up ring buffer waiters on closing of the file")
efbbdaa22bb7 ("tracing: Show real address for trace event arguments")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f3ddb74ad0790030c9592229fb14d8c451f4e9a8 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Tue, 27 Sep 2022 19:15:27 -0400
Subject: [PATCH] tracing: Wake up ring buffer waiters on closing of the file
When the file that represents the ring buffer is closed, there may be
waiters waiting on more input from the ring buffer. Call
ring_buffer_wake_waiters() to wake up any waiters when the file is
closed.
Link: https://lkml.kernel.org/r/20220927231825.182416969@goodmis.org
Cc: stable(a)vger.kernel.org
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 8401dec93c15..20749bd9db71 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -92,6 +92,7 @@ struct trace_iterator {
unsigned int temp_size;
char *fmt; /* modified format holder */
unsigned int fmt_size;
+ long wait_index;
/* trace_seq for __print_flags() and __print_symbolic() etc. */
struct trace_seq tmp_seq;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index aed7ea6e6045..e101b0764b39 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8160,6 +8160,12 @@ static int tracing_buffers_release(struct inode *inode, struct file *file)
__trace_array_put(iter->tr);
+ iter->wait_index++;
+ /* Make sure the waiters see the new wait_index */
+ smp_wmb();
+
+ ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
+
if (info->spare)
ring_buffer_free_read_page(iter->array_buffer->buffer,
info->spare_cpu, info->spare);
@@ -8313,6 +8319,8 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
/* did we read anything? */
if (!spd.nr_pages) {
+ long wait_index;
+
if (ret)
goto out;
@@ -8320,10 +8328,17 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK))
goto out;
+ wait_index = READ_ONCE(iter->wait_index);
+
ret = wait_on_pipe(iter, iter->tr->buffer_percent);
if (ret)
goto out;
+ /* Make sure we see the new wait_index */
+ smp_rmb();
+ if (wait_index != iter->wait_index)
+ goto out;
+
goto again;
}
From: Javier Martinez Canillas <javierm(a)redhat.com>
[ Upstream commit 94dc3471d1b2b58b3728558d0e3f264e9ce6ff59 ]
The strlen() function returns a size_t which is an unsigned int on 32-bit
arches and an unsigned long on 64-bit arches. But in the drm_copy_field()
function, the strlen() return value is assigned to an 'int len' variable.
Later, the len variable is passed as copy_from_user() third argument that
is an unsigned long parameter as well.
In theory, this can lead to an integer overflow via type conversion. Since
the assignment happens to a signed int lvalue instead of a size_t lvalue.
In practice though, that's unlikely since the values copied are set by DRM
drivers and not controlled by userspace. But using a size_t for len is the
correct thing to do anyways.
Signed-off-by: Javier Martinez Canillas <javierm(a)redhat.com>
Tested-by: Peter Robinson <pbrobinson(a)gmail.com>
Reviewed-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20220705100215.572498-2-javie…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/drm_ioctl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index ce26e8fea9c2..335fad8b209a 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -437,7 +437,7 @@ EXPORT_SYMBOL(drm_invalid_op);
*/
static int drm_copy_field(char __user *buf, size_t *buf_len, const char *value)
{
- int len;
+ size_t len;
/* don't overflow userbuf */
len = strlen(value);
--
2.35.1
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
1b45cc5c7b92 ("ext4: fix potential out of bound read in ext4_fc_replay_scan()")
dcc5827484d6 ("ext4: factor out ext4_fc_get_tl()")
fdc2a3c75dd8 ("ext4: introduce EXT4_FC_TAG_BASE_LEN helper")
ccbf8eeb39f2 ("ext4: fix miss release buffer head in ext4_fc_write_inode")
4978c659e7b5 ("ext4: use ext4_debug() instead of jbd_debug()")
d9bf099cb980 ("ext4: add commit_tid info in jbd debug log")
0915e464cb27 ("ext4: simplify updating of fast commit stats")
7bbbe241ec7c ("ext4: drop ineligible txn start stop APIs")
02f310fcf47f ("ext4: Speedup ext4 orphan inode handling")
25c6d98fc4c2 ("ext4: Move orphan inode handling into a separate file")
188c299e2a26 ("ext4: Support for checksumming from journal triggers")
bd2c38cf1726 ("ext4: Make sure quota files are not grabbed accidentally")
facec450a824 ("ext4: reduce arguments of ext4_fc_add_dentry_tlv")
b9a037b7f3c4 ("ext4: cleanup in-core orphan list if ext4_truncate() failed to get a transaction handle")
a7ba36bc94f2 ("ext4: fix fast commit alignment issues")
fcdf3c34b7ab ("ext4: fix debug format string warning")
3088e5a5153c ("ext4: fix various seppling typos")
72ffb49a7b62 ("ext4: do not set SB_ACTIVE in ext4_orphan_cleanup()")
c915fb80eaa6 ("ext4: fix bh ref count on error paths")
efc61345274d ("ext4: shrink race window in ext4_should_retry_alloc()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1b45cc5c7b920fd8bf72e5a888ec7abeadf41e09 Mon Sep 17 00:00:00 2001
From: Ye Bin <yebin10(a)huawei.com>
Date: Sat, 24 Sep 2022 15:52:33 +0800
Subject: [PATCH] ext4: fix potential out of bound read in
ext4_fc_replay_scan()
For scan loop must ensure that at least EXT4_FC_TAG_BASE_LEN space. If remain
space less than EXT4_FC_TAG_BASE_LEN which will lead to out of bound read
when mounting corrupt file system image.
ADD_RANGE/HEAD/TAIL is needed to add extra check when do journal scan, as this
three tags will read data during scan, tag length couldn't less than data length
which will read.
Cc: stable(a)kernel.org
Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Link: https://lore.kernel.org/r/20220924075233.2315259-4-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 54622005a0c8..ef05bfa87798 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -1976,6 +1976,34 @@ void ext4_fc_replay_cleanup(struct super_block *sb)
kfree(sbi->s_fc_replay_state.fc_modified_inodes);
}
+static inline bool ext4_fc_tag_len_isvalid(struct ext4_fc_tl *tl,
+ u8 *val, u8 *end)
+{
+ if (val + tl->fc_len > end)
+ return false;
+
+ /* Here only check ADD_RANGE/TAIL/HEAD which will read data when do
+ * journal rescan before do CRC check. Other tags length check will
+ * rely on CRC check.
+ */
+ switch (tl->fc_tag) {
+ case EXT4_FC_TAG_ADD_RANGE:
+ return (sizeof(struct ext4_fc_add_range) == tl->fc_len);
+ case EXT4_FC_TAG_TAIL:
+ return (sizeof(struct ext4_fc_tail) <= tl->fc_len);
+ case EXT4_FC_TAG_HEAD:
+ return (sizeof(struct ext4_fc_head) == tl->fc_len);
+ case EXT4_FC_TAG_DEL_RANGE:
+ case EXT4_FC_TAG_LINK:
+ case EXT4_FC_TAG_UNLINK:
+ case EXT4_FC_TAG_CREAT:
+ case EXT4_FC_TAG_INODE:
+ case EXT4_FC_TAG_PAD:
+ default:
+ return true;
+ }
+}
+
/*
* Recovery Scan phase handler
*
@@ -2032,10 +2060,15 @@ static int ext4_fc_replay_scan(journal_t *journal,
}
state->fc_replay_expected_off++;
- for (cur = start; cur < end;
+ for (cur = start; cur < end - EXT4_FC_TAG_BASE_LEN;
cur = cur + EXT4_FC_TAG_BASE_LEN + tl.fc_len) {
ext4_fc_get_tl(&tl, cur);
val = cur + EXT4_FC_TAG_BASE_LEN;
+ if (!ext4_fc_tag_len_isvalid(&tl, val, end)) {
+ ret = state->fc_replay_num_tags ?
+ JBD2_FC_REPLAY_STOP : -ECANCELED;
+ goto out_err;
+ }
ext4_debug("Scan phase, tag:%s, blk %lld\n",
tag2str(tl.fc_tag), bh->b_blocknr);
switch (tl.fc_tag) {
@@ -2146,7 +2179,7 @@ static int ext4_fc_replay(journal_t *journal, struct buffer_head *bh,
start = (u8 *)bh->b_data;
end = (__u8 *)bh->b_data + journal->j_blocksize - 1;
- for (cur = start; cur < end;
+ for (cur = start; cur < end - EXT4_FC_TAG_BASE_LEN;
cur = cur + EXT4_FC_TAG_BASE_LEN + tl.fc_len) {
ext4_fc_get_tl(&tl, cur);
val = cur + EXT4_FC_TAG_BASE_LEN;
@@ -2156,6 +2189,7 @@ static int ext4_fc_replay(journal_t *journal, struct buffer_head *bh,
ext4_fc_set_bitmaps_and_counters(sb);
break;
}
+
ext4_debug("Replay phase, tag:%s\n", tag2str(tl.fc_tag));
state->fc_replay_num_tags--;
switch (tl.fc_tag) {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
1b45cc5c7b92 ("ext4: fix potential out of bound read in ext4_fc_replay_scan()")
dcc5827484d6 ("ext4: factor out ext4_fc_get_tl()")
fdc2a3c75dd8 ("ext4: introduce EXT4_FC_TAG_BASE_LEN helper")
ccbf8eeb39f2 ("ext4: fix miss release buffer head in ext4_fc_write_inode")
4978c659e7b5 ("ext4: use ext4_debug() instead of jbd_debug()")
d9bf099cb980 ("ext4: add commit_tid info in jbd debug log")
0915e464cb27 ("ext4: simplify updating of fast commit stats")
7bbbe241ec7c ("ext4: drop ineligible txn start stop APIs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1b45cc5c7b920fd8bf72e5a888ec7abeadf41e09 Mon Sep 17 00:00:00 2001
From: Ye Bin <yebin10(a)huawei.com>
Date: Sat, 24 Sep 2022 15:52:33 +0800
Subject: [PATCH] ext4: fix potential out of bound read in
ext4_fc_replay_scan()
For scan loop must ensure that at least EXT4_FC_TAG_BASE_LEN space. If remain
space less than EXT4_FC_TAG_BASE_LEN which will lead to out of bound read
when mounting corrupt file system image.
ADD_RANGE/HEAD/TAIL is needed to add extra check when do journal scan, as this
three tags will read data during scan, tag length couldn't less than data length
which will read.
Cc: stable(a)kernel.org
Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Link: https://lore.kernel.org/r/20220924075233.2315259-4-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 54622005a0c8..ef05bfa87798 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -1976,6 +1976,34 @@ void ext4_fc_replay_cleanup(struct super_block *sb)
kfree(sbi->s_fc_replay_state.fc_modified_inodes);
}
+static inline bool ext4_fc_tag_len_isvalid(struct ext4_fc_tl *tl,
+ u8 *val, u8 *end)
+{
+ if (val + tl->fc_len > end)
+ return false;
+
+ /* Here only check ADD_RANGE/TAIL/HEAD which will read data when do
+ * journal rescan before do CRC check. Other tags length check will
+ * rely on CRC check.
+ */
+ switch (tl->fc_tag) {
+ case EXT4_FC_TAG_ADD_RANGE:
+ return (sizeof(struct ext4_fc_add_range) == tl->fc_len);
+ case EXT4_FC_TAG_TAIL:
+ return (sizeof(struct ext4_fc_tail) <= tl->fc_len);
+ case EXT4_FC_TAG_HEAD:
+ return (sizeof(struct ext4_fc_head) == tl->fc_len);
+ case EXT4_FC_TAG_DEL_RANGE:
+ case EXT4_FC_TAG_LINK:
+ case EXT4_FC_TAG_UNLINK:
+ case EXT4_FC_TAG_CREAT:
+ case EXT4_FC_TAG_INODE:
+ case EXT4_FC_TAG_PAD:
+ default:
+ return true;
+ }
+}
+
/*
* Recovery Scan phase handler
*
@@ -2032,10 +2060,15 @@ static int ext4_fc_replay_scan(journal_t *journal,
}
state->fc_replay_expected_off++;
- for (cur = start; cur < end;
+ for (cur = start; cur < end - EXT4_FC_TAG_BASE_LEN;
cur = cur + EXT4_FC_TAG_BASE_LEN + tl.fc_len) {
ext4_fc_get_tl(&tl, cur);
val = cur + EXT4_FC_TAG_BASE_LEN;
+ if (!ext4_fc_tag_len_isvalid(&tl, val, end)) {
+ ret = state->fc_replay_num_tags ?
+ JBD2_FC_REPLAY_STOP : -ECANCELED;
+ goto out_err;
+ }
ext4_debug("Scan phase, tag:%s, blk %lld\n",
tag2str(tl.fc_tag), bh->b_blocknr);
switch (tl.fc_tag) {
@@ -2146,7 +2179,7 @@ static int ext4_fc_replay(journal_t *journal, struct buffer_head *bh,
start = (u8 *)bh->b_data;
end = (__u8 *)bh->b_data + journal->j_blocksize - 1;
- for (cur = start; cur < end;
+ for (cur = start; cur < end - EXT4_FC_TAG_BASE_LEN;
cur = cur + EXT4_FC_TAG_BASE_LEN + tl.fc_len) {
ext4_fc_get_tl(&tl, cur);
val = cur + EXT4_FC_TAG_BASE_LEN;
@@ -2156,6 +2189,7 @@ static int ext4_fc_replay(journal_t *journal, struct buffer_head *bh,
ext4_fc_set_bitmaps_and_counters(sb);
break;
}
+
ext4_debug("Replay phase, tag:%s\n", tag2str(tl.fc_tag));
state->fc_replay_num_tags--;
switch (tl.fc_tag) {
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
1b45cc5c7b92 ("ext4: fix potential out of bound read in ext4_fc_replay_scan()")
dcc5827484d6 ("ext4: factor out ext4_fc_get_tl()")
fdc2a3c75dd8 ("ext4: introduce EXT4_FC_TAG_BASE_LEN helper")
ccbf8eeb39f2 ("ext4: fix miss release buffer head in ext4_fc_write_inode")
4978c659e7b5 ("ext4: use ext4_debug() instead of jbd_debug()")
d9bf099cb980 ("ext4: add commit_tid info in jbd debug log")
0915e464cb27 ("ext4: simplify updating of fast commit stats")
7bbbe241ec7c ("ext4: drop ineligible txn start stop APIs")
02f310fcf47f ("ext4: Speedup ext4 orphan inode handling")
25c6d98fc4c2 ("ext4: Move orphan inode handling into a separate file")
188c299e2a26 ("ext4: Support for checksumming from journal triggers")
bd2c38cf1726 ("ext4: Make sure quota files are not grabbed accidentally")
facec450a824 ("ext4: reduce arguments of ext4_fc_add_dentry_tlv")
b9a037b7f3c4 ("ext4: cleanup in-core orphan list if ext4_truncate() failed to get a transaction handle")
a7ba36bc94f2 ("ext4: fix fast commit alignment issues")
fcdf3c34b7ab ("ext4: fix debug format string warning")
3088e5a5153c ("ext4: fix various seppling typos")
72ffb49a7b62 ("ext4: do not set SB_ACTIVE in ext4_orphan_cleanup()")
c915fb80eaa6 ("ext4: fix bh ref count on error paths")
efc61345274d ("ext4: shrink race window in ext4_should_retry_alloc()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1b45cc5c7b920fd8bf72e5a888ec7abeadf41e09 Mon Sep 17 00:00:00 2001
From: Ye Bin <yebin10(a)huawei.com>
Date: Sat, 24 Sep 2022 15:52:33 +0800
Subject: [PATCH] ext4: fix potential out of bound read in
ext4_fc_replay_scan()
For scan loop must ensure that at least EXT4_FC_TAG_BASE_LEN space. If remain
space less than EXT4_FC_TAG_BASE_LEN which will lead to out of bound read
when mounting corrupt file system image.
ADD_RANGE/HEAD/TAIL is needed to add extra check when do journal scan, as this
three tags will read data during scan, tag length couldn't less than data length
which will read.
Cc: stable(a)kernel.org
Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Link: https://lore.kernel.org/r/20220924075233.2315259-4-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 54622005a0c8..ef05bfa87798 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -1976,6 +1976,34 @@ void ext4_fc_replay_cleanup(struct super_block *sb)
kfree(sbi->s_fc_replay_state.fc_modified_inodes);
}
+static inline bool ext4_fc_tag_len_isvalid(struct ext4_fc_tl *tl,
+ u8 *val, u8 *end)
+{
+ if (val + tl->fc_len > end)
+ return false;
+
+ /* Here only check ADD_RANGE/TAIL/HEAD which will read data when do
+ * journal rescan before do CRC check. Other tags length check will
+ * rely on CRC check.
+ */
+ switch (tl->fc_tag) {
+ case EXT4_FC_TAG_ADD_RANGE:
+ return (sizeof(struct ext4_fc_add_range) == tl->fc_len);
+ case EXT4_FC_TAG_TAIL:
+ return (sizeof(struct ext4_fc_tail) <= tl->fc_len);
+ case EXT4_FC_TAG_HEAD:
+ return (sizeof(struct ext4_fc_head) == tl->fc_len);
+ case EXT4_FC_TAG_DEL_RANGE:
+ case EXT4_FC_TAG_LINK:
+ case EXT4_FC_TAG_UNLINK:
+ case EXT4_FC_TAG_CREAT:
+ case EXT4_FC_TAG_INODE:
+ case EXT4_FC_TAG_PAD:
+ default:
+ return true;
+ }
+}
+
/*
* Recovery Scan phase handler
*
@@ -2032,10 +2060,15 @@ static int ext4_fc_replay_scan(journal_t *journal,
}
state->fc_replay_expected_off++;
- for (cur = start; cur < end;
+ for (cur = start; cur < end - EXT4_FC_TAG_BASE_LEN;
cur = cur + EXT4_FC_TAG_BASE_LEN + tl.fc_len) {
ext4_fc_get_tl(&tl, cur);
val = cur + EXT4_FC_TAG_BASE_LEN;
+ if (!ext4_fc_tag_len_isvalid(&tl, val, end)) {
+ ret = state->fc_replay_num_tags ?
+ JBD2_FC_REPLAY_STOP : -ECANCELED;
+ goto out_err;
+ }
ext4_debug("Scan phase, tag:%s, blk %lld\n",
tag2str(tl.fc_tag), bh->b_blocknr);
switch (tl.fc_tag) {
@@ -2146,7 +2179,7 @@ static int ext4_fc_replay(journal_t *journal, struct buffer_head *bh,
start = (u8 *)bh->b_data;
end = (__u8 *)bh->b_data + journal->j_blocksize - 1;
- for (cur = start; cur < end;
+ for (cur = start; cur < end - EXT4_FC_TAG_BASE_LEN;
cur = cur + EXT4_FC_TAG_BASE_LEN + tl.fc_len) {
ext4_fc_get_tl(&tl, cur);
val = cur + EXT4_FC_TAG_BASE_LEN;
@@ -2156,6 +2189,7 @@ static int ext4_fc_replay(journal_t *journal, struct buffer_head *bh,
ext4_fc_set_bitmaps_and_counters(sb);
break;
}
+
ext4_debug("Replay phase, tag:%s\n", tag2str(tl.fc_tag));
state->fc_replay_num_tags--;
switch (tl.fc_tag) {
The patch below does not apply to the 5.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
1b45cc5c7b92 ("ext4: fix potential out of bound read in ext4_fc_replay_scan()")
dcc5827484d6 ("ext4: factor out ext4_fc_get_tl()")
fdc2a3c75dd8 ("ext4: introduce EXT4_FC_TAG_BASE_LEN helper")
ccbf8eeb39f2 ("ext4: fix miss release buffer head in ext4_fc_write_inode")
4978c659e7b5 ("ext4: use ext4_debug() instead of jbd_debug()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1b45cc5c7b920fd8bf72e5a888ec7abeadf41e09 Mon Sep 17 00:00:00 2001
From: Ye Bin <yebin10(a)huawei.com>
Date: Sat, 24 Sep 2022 15:52:33 +0800
Subject: [PATCH] ext4: fix potential out of bound read in
ext4_fc_replay_scan()
For scan loop must ensure that at least EXT4_FC_TAG_BASE_LEN space. If remain
space less than EXT4_FC_TAG_BASE_LEN which will lead to out of bound read
when mounting corrupt file system image.
ADD_RANGE/HEAD/TAIL is needed to add extra check when do journal scan, as this
three tags will read data during scan, tag length couldn't less than data length
which will read.
Cc: stable(a)kernel.org
Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Link: https://lore.kernel.org/r/20220924075233.2315259-4-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 54622005a0c8..ef05bfa87798 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -1976,6 +1976,34 @@ void ext4_fc_replay_cleanup(struct super_block *sb)
kfree(sbi->s_fc_replay_state.fc_modified_inodes);
}
+static inline bool ext4_fc_tag_len_isvalid(struct ext4_fc_tl *tl,
+ u8 *val, u8 *end)
+{
+ if (val + tl->fc_len > end)
+ return false;
+
+ /* Here only check ADD_RANGE/TAIL/HEAD which will read data when do
+ * journal rescan before do CRC check. Other tags length check will
+ * rely on CRC check.
+ */
+ switch (tl->fc_tag) {
+ case EXT4_FC_TAG_ADD_RANGE:
+ return (sizeof(struct ext4_fc_add_range) == tl->fc_len);
+ case EXT4_FC_TAG_TAIL:
+ return (sizeof(struct ext4_fc_tail) <= tl->fc_len);
+ case EXT4_FC_TAG_HEAD:
+ return (sizeof(struct ext4_fc_head) == tl->fc_len);
+ case EXT4_FC_TAG_DEL_RANGE:
+ case EXT4_FC_TAG_LINK:
+ case EXT4_FC_TAG_UNLINK:
+ case EXT4_FC_TAG_CREAT:
+ case EXT4_FC_TAG_INODE:
+ case EXT4_FC_TAG_PAD:
+ default:
+ return true;
+ }
+}
+
/*
* Recovery Scan phase handler
*
@@ -2032,10 +2060,15 @@ static int ext4_fc_replay_scan(journal_t *journal,
}
state->fc_replay_expected_off++;
- for (cur = start; cur < end;
+ for (cur = start; cur < end - EXT4_FC_TAG_BASE_LEN;
cur = cur + EXT4_FC_TAG_BASE_LEN + tl.fc_len) {
ext4_fc_get_tl(&tl, cur);
val = cur + EXT4_FC_TAG_BASE_LEN;
+ if (!ext4_fc_tag_len_isvalid(&tl, val, end)) {
+ ret = state->fc_replay_num_tags ?
+ JBD2_FC_REPLAY_STOP : -ECANCELED;
+ goto out_err;
+ }
ext4_debug("Scan phase, tag:%s, blk %lld\n",
tag2str(tl.fc_tag), bh->b_blocknr);
switch (tl.fc_tag) {
@@ -2146,7 +2179,7 @@ static int ext4_fc_replay(journal_t *journal, struct buffer_head *bh,
start = (u8 *)bh->b_data;
end = (__u8 *)bh->b_data + journal->j_blocksize - 1;
- for (cur = start; cur < end;
+ for (cur = start; cur < end - EXT4_FC_TAG_BASE_LEN;
cur = cur + EXT4_FC_TAG_BASE_LEN + tl.fc_len) {
ext4_fc_get_tl(&tl, cur);
val = cur + EXT4_FC_TAG_BASE_LEN;
@@ -2156,6 +2189,7 @@ static int ext4_fc_replay(journal_t *journal, struct buffer_head *bh,
ext4_fc_set_bitmaps_and_counters(sb);
break;
}
+
ext4_debug("Replay phase, tag:%s\n", tag2str(tl.fc_tag));
state->fc_replay_num_tags--;
switch (tl.fc_tag) {
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
1b45cc5c7b92 ("ext4: fix potential out of bound read in ext4_fc_replay_scan()")
dcc5827484d6 ("ext4: factor out ext4_fc_get_tl()")
fdc2a3c75dd8 ("ext4: introduce EXT4_FC_TAG_BASE_LEN helper")
ccbf8eeb39f2 ("ext4: fix miss release buffer head in ext4_fc_write_inode")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1b45cc5c7b920fd8bf72e5a888ec7abeadf41e09 Mon Sep 17 00:00:00 2001
From: Ye Bin <yebin10(a)huawei.com>
Date: Sat, 24 Sep 2022 15:52:33 +0800
Subject: [PATCH] ext4: fix potential out of bound read in
ext4_fc_replay_scan()
For scan loop must ensure that at least EXT4_FC_TAG_BASE_LEN space. If remain
space less than EXT4_FC_TAG_BASE_LEN which will lead to out of bound read
when mounting corrupt file system image.
ADD_RANGE/HEAD/TAIL is needed to add extra check when do journal scan, as this
three tags will read data during scan, tag length couldn't less than data length
which will read.
Cc: stable(a)kernel.org
Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Link: https://lore.kernel.org/r/20220924075233.2315259-4-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 54622005a0c8..ef05bfa87798 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -1976,6 +1976,34 @@ void ext4_fc_replay_cleanup(struct super_block *sb)
kfree(sbi->s_fc_replay_state.fc_modified_inodes);
}
+static inline bool ext4_fc_tag_len_isvalid(struct ext4_fc_tl *tl,
+ u8 *val, u8 *end)
+{
+ if (val + tl->fc_len > end)
+ return false;
+
+ /* Here only check ADD_RANGE/TAIL/HEAD which will read data when do
+ * journal rescan before do CRC check. Other tags length check will
+ * rely on CRC check.
+ */
+ switch (tl->fc_tag) {
+ case EXT4_FC_TAG_ADD_RANGE:
+ return (sizeof(struct ext4_fc_add_range) == tl->fc_len);
+ case EXT4_FC_TAG_TAIL:
+ return (sizeof(struct ext4_fc_tail) <= tl->fc_len);
+ case EXT4_FC_TAG_HEAD:
+ return (sizeof(struct ext4_fc_head) == tl->fc_len);
+ case EXT4_FC_TAG_DEL_RANGE:
+ case EXT4_FC_TAG_LINK:
+ case EXT4_FC_TAG_UNLINK:
+ case EXT4_FC_TAG_CREAT:
+ case EXT4_FC_TAG_INODE:
+ case EXT4_FC_TAG_PAD:
+ default:
+ return true;
+ }
+}
+
/*
* Recovery Scan phase handler
*
@@ -2032,10 +2060,15 @@ static int ext4_fc_replay_scan(journal_t *journal,
}
state->fc_replay_expected_off++;
- for (cur = start; cur < end;
+ for (cur = start; cur < end - EXT4_FC_TAG_BASE_LEN;
cur = cur + EXT4_FC_TAG_BASE_LEN + tl.fc_len) {
ext4_fc_get_tl(&tl, cur);
val = cur + EXT4_FC_TAG_BASE_LEN;
+ if (!ext4_fc_tag_len_isvalid(&tl, val, end)) {
+ ret = state->fc_replay_num_tags ?
+ JBD2_FC_REPLAY_STOP : -ECANCELED;
+ goto out_err;
+ }
ext4_debug("Scan phase, tag:%s, blk %lld\n",
tag2str(tl.fc_tag), bh->b_blocknr);
switch (tl.fc_tag) {
@@ -2146,7 +2179,7 @@ static int ext4_fc_replay(journal_t *journal, struct buffer_head *bh,
start = (u8 *)bh->b_data;
end = (__u8 *)bh->b_data + journal->j_blocksize - 1;
- for (cur = start; cur < end;
+ for (cur = start; cur < end - EXT4_FC_TAG_BASE_LEN;
cur = cur + EXT4_FC_TAG_BASE_LEN + tl.fc_len) {
ext4_fc_get_tl(&tl, cur);
val = cur + EXT4_FC_TAG_BASE_LEN;
@@ -2156,6 +2189,7 @@ static int ext4_fc_replay(journal_t *journal, struct buffer_head *bh,
ext4_fc_set_bitmaps_and_counters(sb);
break;
}
+
ext4_debug("Replay phase, tag:%s\n", tag2str(tl.fc_tag));
state->fc_replay_num_tags--;
switch (tl.fc_tag) {
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
7177dd009c7c ("ext4: fix dir corruption when ext4_dx_add_entry() fails")
188c299e2a26 ("ext4: Support for checksumming from journal triggers")
c915fb80eaa6 ("ext4: fix bh ref count on error paths")
a3f5cf14ff91 ("ext4: drop ext4_handle_dirty_super()")
05c2c00f3769 ("ext4: protect superblock modifications with a buffer lock")
4392fbc4bab5 ("ext4: drop sync argument of ext4_commit_super()")
c92dc856848f ("ext4: defer saving error info from atomic context")
02a7780e4d2f ("ext4: simplify ext4 error translation")
4067662388f9 ("ext4: move functions in super.c")
014c9caa29d3 ("ext4: make ext4_abort() use __ext4_error()")
b08070eca9e2 ("ext4: don't remount read-only with errors=continue on reboot")
ca9b404ff137 ("ext4: print quota journalling mode on (re-)mount")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7177dd009c7c04290891e9a534cd47d1b620bd04 Mon Sep 17 00:00:00 2001
From: Zhihao Cheng <chengzhihao1(a)huawei.com>
Date: Sun, 11 Sep 2022 12:52:04 +0800
Subject: [PATCH] ext4: fix dir corruption when ext4_dx_add_entry() fails
Following process may lead to fs corruption:
1. ext4_create(dir/foo)
ext4_add_nondir
ext4_add_entry
ext4_dx_add_entry
a. add_dirent_to_buf
ext4_mark_inode_dirty
ext4_handle_dirty_metadata // dir inode bh is recorded into journal
b. ext4_append // dx_get_count(entries) == dx_get_limit(entries)
ext4_bread(EXT4_GET_BLOCKS_CREATE)
ext4_getblk
ext4_map_blocks
ext4_ext_map_blocks
ext4_mb_new_blocks
dquot_alloc_block
dquot_alloc_space_nodirty
inode_add_bytes // update dir's i_blocks
ext4_ext_insert_extent
ext4_ext_dirty // record extent bh into journal
ext4_handle_dirty_metadata(bh)
// record new block into journal
inode->i_size += inode->i_sb->s_blocksize // new size(in mem)
c. ext4_handle_dirty_dx_node(bh2)
// record dir's new block(dx_node) into journal
d. ext4_handle_dirty_dx_node((frame - 1)->bh)
e. ext4_handle_dirty_dx_node(frame->bh)
f. do_split // ret err!
g. add_dirent_to_buf
ext4_mark_inode_dirty(dir) // update raw_inode on disk(skipped)
2. fsck -a /dev/sdb
drop last block(dx_node) which beyonds dir's i_size.
/dev/sdb: recovering journal
/dev/sdb contains a file system with errors, check forced.
/dev/sdb: Inode 12, end of extent exceeds allowed value
(logical block 128, physical block 3938, len 1)
3. fsck -fn /dev/sdb
dx_node->entry[i].blk > dir->i_size
Pass 2: Checking directory structure
Problem in HTREE directory inode 12 (/dir): bad block number 128.
Clear HTree index? no
Problem in HTREE directory inode 12: block #3 has invalid depth (2)
Problem in HTREE directory inode 12: block #3 has bad max hash
Problem in HTREE directory inode 12: block #3 not referenced
Fix it by marking inode dirty directly inside ext4_append().
Fetch a reproducer in [Link].
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216466
Cc: stable(a)vger.kernel.org
Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20220911045204.516460-1-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index bc2e0612ec32..4183a4cb4a21 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -85,15 +85,20 @@ static struct buffer_head *ext4_append(handle_t *handle,
return bh;
inode->i_size += inode->i_sb->s_blocksize;
EXT4_I(inode)->i_disksize = inode->i_size;
+ err = ext4_mark_inode_dirty(handle, inode);
+ if (err)
+ goto out;
BUFFER_TRACE(bh, "get_write_access");
err = ext4_journal_get_write_access(handle, inode->i_sb, bh,
EXT4_JTR_NONE);
- if (err) {
- brelse(bh);
- ext4_std_error(inode->i_sb, err);
- return ERR_PTR(err);
- }
+ if (err)
+ goto out;
return bh;
+
+out:
+ brelse(bh);
+ ext4_std_error(inode->i_sb, err);
+ return ERR_PTR(err);
}
static int ext4_dx_csum_verify(struct inode *inode,
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
7177dd009c7c ("ext4: fix dir corruption when ext4_dx_add_entry() fails")
188c299e2a26 ("ext4: Support for checksumming from journal triggers")
c915fb80eaa6 ("ext4: fix bh ref count on error paths")
a3f5cf14ff91 ("ext4: drop ext4_handle_dirty_super()")
05c2c00f3769 ("ext4: protect superblock modifications with a buffer lock")
4392fbc4bab5 ("ext4: drop sync argument of ext4_commit_super()")
c92dc856848f ("ext4: defer saving error info from atomic context")
02a7780e4d2f ("ext4: simplify ext4 error translation")
4067662388f9 ("ext4: move functions in super.c")
014c9caa29d3 ("ext4: make ext4_abort() use __ext4_error()")
b08070eca9e2 ("ext4: don't remount read-only with errors=continue on reboot")
ca9b404ff137 ("ext4: print quota journalling mode on (re-)mount")
9b5f6c9b83d9 ("ext4: make s_mount_flags modifications atomic")
ababea77bc50 ("ext4: use s_mount_flags instead of s_mount_state for fast commit state")
8016e29f4362 ("ext4: fast commit recovery path")
5b849b5f96b4 ("jbd2: fast commit recovery path")
aa75f4d3daae ("ext4: main fast-commit commit path")
ff780b91efe9 ("jbd2: add fast commit machinery")
6866d7b3f2bb ("ext4 / jbd2: add fast commit initialization")
995a3ed67fc8 ("ext4: add fast_commit feature and handling for extended mount options")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7177dd009c7c04290891e9a534cd47d1b620bd04 Mon Sep 17 00:00:00 2001
From: Zhihao Cheng <chengzhihao1(a)huawei.com>
Date: Sun, 11 Sep 2022 12:52:04 +0800
Subject: [PATCH] ext4: fix dir corruption when ext4_dx_add_entry() fails
Following process may lead to fs corruption:
1. ext4_create(dir/foo)
ext4_add_nondir
ext4_add_entry
ext4_dx_add_entry
a. add_dirent_to_buf
ext4_mark_inode_dirty
ext4_handle_dirty_metadata // dir inode bh is recorded into journal
b. ext4_append // dx_get_count(entries) == dx_get_limit(entries)
ext4_bread(EXT4_GET_BLOCKS_CREATE)
ext4_getblk
ext4_map_blocks
ext4_ext_map_blocks
ext4_mb_new_blocks
dquot_alloc_block
dquot_alloc_space_nodirty
inode_add_bytes // update dir's i_blocks
ext4_ext_insert_extent
ext4_ext_dirty // record extent bh into journal
ext4_handle_dirty_metadata(bh)
// record new block into journal
inode->i_size += inode->i_sb->s_blocksize // new size(in mem)
c. ext4_handle_dirty_dx_node(bh2)
// record dir's new block(dx_node) into journal
d. ext4_handle_dirty_dx_node((frame - 1)->bh)
e. ext4_handle_dirty_dx_node(frame->bh)
f. do_split // ret err!
g. add_dirent_to_buf
ext4_mark_inode_dirty(dir) // update raw_inode on disk(skipped)
2. fsck -a /dev/sdb
drop last block(dx_node) which beyonds dir's i_size.
/dev/sdb: recovering journal
/dev/sdb contains a file system with errors, check forced.
/dev/sdb: Inode 12, end of extent exceeds allowed value
(logical block 128, physical block 3938, len 1)
3. fsck -fn /dev/sdb
dx_node->entry[i].blk > dir->i_size
Pass 2: Checking directory structure
Problem in HTREE directory inode 12 (/dir): bad block number 128.
Clear HTree index? no
Problem in HTREE directory inode 12: block #3 has invalid depth (2)
Problem in HTREE directory inode 12: block #3 has bad max hash
Problem in HTREE directory inode 12: block #3 not referenced
Fix it by marking inode dirty directly inside ext4_append().
Fetch a reproducer in [Link].
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216466
Cc: stable(a)vger.kernel.org
Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20220911045204.516460-1-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index bc2e0612ec32..4183a4cb4a21 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -85,15 +85,20 @@ static struct buffer_head *ext4_append(handle_t *handle,
return bh;
inode->i_size += inode->i_sb->s_blocksize;
EXT4_I(inode)->i_disksize = inode->i_size;
+ err = ext4_mark_inode_dirty(handle, inode);
+ if (err)
+ goto out;
BUFFER_TRACE(bh, "get_write_access");
err = ext4_journal_get_write_access(handle, inode->i_sb, bh,
EXT4_JTR_NONE);
- if (err) {
- brelse(bh);
- ext4_std_error(inode->i_sb, err);
- return ERR_PTR(err);
- }
+ if (err)
+ goto out;
return bh;
+
+out:
+ brelse(bh);
+ ext4_std_error(inode->i_sb, err);
+ return ERR_PTR(err);
}
static int ext4_dx_csum_verify(struct inode *inode,
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
a642c2c0827f ("ext4: fix i_version handling in ext4")
1ff20307393e ("ext4: unconditionally enable the i_version counter")
50f094a5580e ("ext4: don't increase iversion counter for ea_inodes")
960e0ab63b2e ("ext4: fix i_version handling on remount")
4437992be7ca ("ext4: remove lazytime/nolazytime mount options handled by MS_LAZYTIME")
cebe85d570cf ("ext4: switch to the new mount api")
02f960f8db1c ("ext4: clean up return values in handle_mount_opt()")
7edfd85b1ffd ("ext4: Completely separate options parsing and sb setup")
6e47a3cc68fc ("ext4: get rid of super block and sbi from handle_mount_ops()")
b6bd243500b6 ("ext4: check ext2/3 compatibility outside handle_mount_opt()")
e6e268cb6822 ("ext4: move quota configuration out of handle_mount_opt()")
da812f611934 ("ext4: Allow sb to be NULL in ext4_msg()")
461c3af045d3 ("ext4: Change handle_mount_opt() to use fs_parameter")
4c94bff967d9 ("ext4: move option validation to a separate function")
e5a185c26c11 ("ext4: Add fs parameter specifications for mount options")
9a089b21f79b ("ext4: Send notifications on error")
2e5fd489a4e5 ("Merge tag 'libnvdimm-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a642c2c0827f5604a93f9fa1e5701eecdce4ae22 Mon Sep 17 00:00:00 2001
From: Jeff Layton <jlayton(a)kernel.org>
Date: Thu, 8 Sep 2022 13:24:42 -0400
Subject: [PATCH] ext4: fix i_version handling in ext4
ext4 currently updates the i_version counter when the atime is updated
during a read. This is less than ideal as it can cause unnecessary cache
invalidations with NFSv4 and unnecessary remeasurements for IMA.
The increment in ext4_mark_iloc_dirty is also problematic since it can
corrupt the i_version counter for ea_inodes. We aren't bumping the file
times in ext4_mark_iloc_dirty, so changing the i_version there seems
wrong, and is the cause of both problems.
Remove that callsite and add increments to the setattr, setxattr and
ioctl codepaths, at the same times that we update the ctime. The
i_version bump that already happens during timestamp updates should take
care of the rest.
In ext4_move_extents, increment the i_version on both inodes, and also
add in missing ctime updates.
[ Some minor updates since we've already enabled the i_version counter
unconditionally already via another patch series. -- TYT ]
Cc: stable(a)kernel.org
Cc: Lukas Czerner <lczerner(a)redhat.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Christian Brauner (Microsoft) <brauner(a)kernel.org>
Signed-off-by: Jeff Layton <jlayton(a)kernel.org>
Link: https://lore.kernel.org/r/20220908172448.208585-3-jlayton@kernel.org
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index a56c57375456..6da73be32bff 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5349,6 +5349,7 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
int error, rc = 0;
int orphan = 0;
const unsigned int ia_valid = attr->ia_valid;
+ bool inc_ivers = true;
if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
return -EIO;
@@ -5432,8 +5433,8 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
return -EINVAL;
}
- if (attr->ia_size != inode->i_size)
- inode_inc_iversion(inode);
+ if (attr->ia_size == inode->i_size)
+ inc_ivers = false;
if (shrink) {
if (ext4_should_order_data(inode)) {
@@ -5535,6 +5536,8 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
}
if (!error) {
+ if (inc_ivers)
+ inode_inc_iversion(inode);
setattr_copy(mnt_userns, inode, attr);
mark_inode_dirty(inode);
}
@@ -5738,13 +5741,6 @@ int ext4_mark_iloc_dirty(handle_t *handle,
}
ext4_fc_track_inode(handle, inode);
- /*
- * ea_inodes are using i_version for storing reference count, don't
- * mess with it
- */
- if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
- inode_inc_iversion(inode);
-
/* the do_update_inode consumes one bh->b_count */
get_bh(iloc->bh);
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 3cf3ec4b1c21..ad3a294a88eb 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -452,6 +452,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
swap_inode_data(inode, inode_bl);
inode->i_ctime = inode_bl->i_ctime = current_time(inode);
+ inode_inc_iversion(inode);
inode->i_generation = prandom_u32();
inode_bl->i_generation = prandom_u32();
@@ -665,6 +666,7 @@ static int ext4_ioctl_setflags(struct inode *inode,
ext4_set_inode_flags(inode, false);
inode->i_ctime = current_time(inode);
+ inode_inc_iversion(inode);
err = ext4_mark_iloc_dirty(handle, inode, &iloc);
flags_err:
@@ -775,6 +777,7 @@ static int ext4_ioctl_setproject(struct inode *inode, __u32 projid)
EXT4_I(inode)->i_projid = kprojid;
inode->i_ctime = current_time(inode);
+ inode_inc_iversion(inode);
out_dirty:
rc = ext4_mark_iloc_dirty(handle, inode, &iloc);
if (!err)
@@ -1257,6 +1260,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
err = ext4_reserve_inode_write(handle, inode, &iloc);
if (err == 0) {
inode->i_ctime = current_time(inode);
+ inode_inc_iversion(inode);
inode->i_generation = generation;
err = ext4_mark_iloc_dirty(handle, inode, &iloc);
}
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 533216e80fa2..36d6ba7190b6 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -2412,6 +2412,7 @@ ext4_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index,
if (!error) {
ext4_xattr_update_super_block(handle, inode->i_sb);
inode->i_ctime = current_time(inode);
+ inode_inc_iversion(inode);
if (!value)
no_expand = 0;
error = ext4_mark_iloc_dirty(handle, inode, &is.iloc);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
a642c2c0827f ("ext4: fix i_version handling in ext4")
1ff20307393e ("ext4: unconditionally enable the i_version counter")
50f094a5580e ("ext4: don't increase iversion counter for ea_inodes")
960e0ab63b2e ("ext4: fix i_version handling on remount")
4437992be7ca ("ext4: remove lazytime/nolazytime mount options handled by MS_LAZYTIME")
cebe85d570cf ("ext4: switch to the new mount api")
02f960f8db1c ("ext4: clean up return values in handle_mount_opt()")
7edfd85b1ffd ("ext4: Completely separate options parsing and sb setup")
6e47a3cc68fc ("ext4: get rid of super block and sbi from handle_mount_ops()")
b6bd243500b6 ("ext4: check ext2/3 compatibility outside handle_mount_opt()")
e6e268cb6822 ("ext4: move quota configuration out of handle_mount_opt()")
da812f611934 ("ext4: Allow sb to be NULL in ext4_msg()")
461c3af045d3 ("ext4: Change handle_mount_opt() to use fs_parameter")
4c94bff967d9 ("ext4: move option validation to a separate function")
e5a185c26c11 ("ext4: Add fs parameter specifications for mount options")
9a089b21f79b ("ext4: Send notifications on error")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a642c2c0827f5604a93f9fa1e5701eecdce4ae22 Mon Sep 17 00:00:00 2001
From: Jeff Layton <jlayton(a)kernel.org>
Date: Thu, 8 Sep 2022 13:24:42 -0400
Subject: [PATCH] ext4: fix i_version handling in ext4
ext4 currently updates the i_version counter when the atime is updated
during a read. This is less than ideal as it can cause unnecessary cache
invalidations with NFSv4 and unnecessary remeasurements for IMA.
The increment in ext4_mark_iloc_dirty is also problematic since it can
corrupt the i_version counter for ea_inodes. We aren't bumping the file
times in ext4_mark_iloc_dirty, so changing the i_version there seems
wrong, and is the cause of both problems.
Remove that callsite and add increments to the setattr, setxattr and
ioctl codepaths, at the same times that we update the ctime. The
i_version bump that already happens during timestamp updates should take
care of the rest.
In ext4_move_extents, increment the i_version on both inodes, and also
add in missing ctime updates.
[ Some minor updates since we've already enabled the i_version counter
unconditionally already via another patch series. -- TYT ]
Cc: stable(a)kernel.org
Cc: Lukas Czerner <lczerner(a)redhat.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Christian Brauner (Microsoft) <brauner(a)kernel.org>
Signed-off-by: Jeff Layton <jlayton(a)kernel.org>
Link: https://lore.kernel.org/r/20220908172448.208585-3-jlayton@kernel.org
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index a56c57375456..6da73be32bff 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5349,6 +5349,7 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
int error, rc = 0;
int orphan = 0;
const unsigned int ia_valid = attr->ia_valid;
+ bool inc_ivers = true;
if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
return -EIO;
@@ -5432,8 +5433,8 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
return -EINVAL;
}
- if (attr->ia_size != inode->i_size)
- inode_inc_iversion(inode);
+ if (attr->ia_size == inode->i_size)
+ inc_ivers = false;
if (shrink) {
if (ext4_should_order_data(inode)) {
@@ -5535,6 +5536,8 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
}
if (!error) {
+ if (inc_ivers)
+ inode_inc_iversion(inode);
setattr_copy(mnt_userns, inode, attr);
mark_inode_dirty(inode);
}
@@ -5738,13 +5741,6 @@ int ext4_mark_iloc_dirty(handle_t *handle,
}
ext4_fc_track_inode(handle, inode);
- /*
- * ea_inodes are using i_version for storing reference count, don't
- * mess with it
- */
- if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
- inode_inc_iversion(inode);
-
/* the do_update_inode consumes one bh->b_count */
get_bh(iloc->bh);
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 3cf3ec4b1c21..ad3a294a88eb 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -452,6 +452,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
swap_inode_data(inode, inode_bl);
inode->i_ctime = inode_bl->i_ctime = current_time(inode);
+ inode_inc_iversion(inode);
inode->i_generation = prandom_u32();
inode_bl->i_generation = prandom_u32();
@@ -665,6 +666,7 @@ static int ext4_ioctl_setflags(struct inode *inode,
ext4_set_inode_flags(inode, false);
inode->i_ctime = current_time(inode);
+ inode_inc_iversion(inode);
err = ext4_mark_iloc_dirty(handle, inode, &iloc);
flags_err:
@@ -775,6 +777,7 @@ static int ext4_ioctl_setproject(struct inode *inode, __u32 projid)
EXT4_I(inode)->i_projid = kprojid;
inode->i_ctime = current_time(inode);
+ inode_inc_iversion(inode);
out_dirty:
rc = ext4_mark_iloc_dirty(handle, inode, &iloc);
if (!err)
@@ -1257,6 +1260,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
err = ext4_reserve_inode_write(handle, inode, &iloc);
if (err == 0) {
inode->i_ctime = current_time(inode);
+ inode_inc_iversion(inode);
inode->i_generation = generation;
err = ext4_mark_iloc_dirty(handle, inode, &iloc);
}
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 533216e80fa2..36d6ba7190b6 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -2412,6 +2412,7 @@ ext4_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index,
if (!error) {
ext4_xattr_update_super_block(handle, inode->i_sb);
inode->i_ctime = current_time(inode);
+ inode_inc_iversion(inode);
if (!value)
no_expand = 0;
error = ext4_mark_iloc_dirty(handle, inode, &is.iloc);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
a642c2c0827f ("ext4: fix i_version handling in ext4")
1ff20307393e ("ext4: unconditionally enable the i_version counter")
50f094a5580e ("ext4: don't increase iversion counter for ea_inodes")
960e0ab63b2e ("ext4: fix i_version handling on remount")
4437992be7ca ("ext4: remove lazytime/nolazytime mount options handled by MS_LAZYTIME")
cebe85d570cf ("ext4: switch to the new mount api")
02f960f8db1c ("ext4: clean up return values in handle_mount_opt()")
7edfd85b1ffd ("ext4: Completely separate options parsing and sb setup")
6e47a3cc68fc ("ext4: get rid of super block and sbi from handle_mount_ops()")
b6bd243500b6 ("ext4: check ext2/3 compatibility outside handle_mount_opt()")
e6e268cb6822 ("ext4: move quota configuration out of handle_mount_opt()")
da812f611934 ("ext4: Allow sb to be NULL in ext4_msg()")
461c3af045d3 ("ext4: Change handle_mount_opt() to use fs_parameter")
4c94bff967d9 ("ext4: move option validation to a separate function")
e5a185c26c11 ("ext4: Add fs parameter specifications for mount options")
9a089b21f79b ("ext4: Send notifications on error")
2e5fd489a4e5 ("Merge tag 'libnvdimm-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a642c2c0827f5604a93f9fa1e5701eecdce4ae22 Mon Sep 17 00:00:00 2001
From: Jeff Layton <jlayton(a)kernel.org>
Date: Thu, 8 Sep 2022 13:24:42 -0400
Subject: [PATCH] ext4: fix i_version handling in ext4
ext4 currently updates the i_version counter when the atime is updated
during a read. This is less than ideal as it can cause unnecessary cache
invalidations with NFSv4 and unnecessary remeasurements for IMA.
The increment in ext4_mark_iloc_dirty is also problematic since it can
corrupt the i_version counter for ea_inodes. We aren't bumping the file
times in ext4_mark_iloc_dirty, so changing the i_version there seems
wrong, and is the cause of both problems.
Remove that callsite and add increments to the setattr, setxattr and
ioctl codepaths, at the same times that we update the ctime. The
i_version bump that already happens during timestamp updates should take
care of the rest.
In ext4_move_extents, increment the i_version on both inodes, and also
add in missing ctime updates.
[ Some minor updates since we've already enabled the i_version counter
unconditionally already via another patch series. -- TYT ]
Cc: stable(a)kernel.org
Cc: Lukas Czerner <lczerner(a)redhat.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Christian Brauner (Microsoft) <brauner(a)kernel.org>
Signed-off-by: Jeff Layton <jlayton(a)kernel.org>
Link: https://lore.kernel.org/r/20220908172448.208585-3-jlayton@kernel.org
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index a56c57375456..6da73be32bff 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5349,6 +5349,7 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
int error, rc = 0;
int orphan = 0;
const unsigned int ia_valid = attr->ia_valid;
+ bool inc_ivers = true;
if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
return -EIO;
@@ -5432,8 +5433,8 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
return -EINVAL;
}
- if (attr->ia_size != inode->i_size)
- inode_inc_iversion(inode);
+ if (attr->ia_size == inode->i_size)
+ inc_ivers = false;
if (shrink) {
if (ext4_should_order_data(inode)) {
@@ -5535,6 +5536,8 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
}
if (!error) {
+ if (inc_ivers)
+ inode_inc_iversion(inode);
setattr_copy(mnt_userns, inode, attr);
mark_inode_dirty(inode);
}
@@ -5738,13 +5741,6 @@ int ext4_mark_iloc_dirty(handle_t *handle,
}
ext4_fc_track_inode(handle, inode);
- /*
- * ea_inodes are using i_version for storing reference count, don't
- * mess with it
- */
- if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
- inode_inc_iversion(inode);
-
/* the do_update_inode consumes one bh->b_count */
get_bh(iloc->bh);
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 3cf3ec4b1c21..ad3a294a88eb 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -452,6 +452,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
swap_inode_data(inode, inode_bl);
inode->i_ctime = inode_bl->i_ctime = current_time(inode);
+ inode_inc_iversion(inode);
inode->i_generation = prandom_u32();
inode_bl->i_generation = prandom_u32();
@@ -665,6 +666,7 @@ static int ext4_ioctl_setflags(struct inode *inode,
ext4_set_inode_flags(inode, false);
inode->i_ctime = current_time(inode);
+ inode_inc_iversion(inode);
err = ext4_mark_iloc_dirty(handle, inode, &iloc);
flags_err:
@@ -775,6 +777,7 @@ static int ext4_ioctl_setproject(struct inode *inode, __u32 projid)
EXT4_I(inode)->i_projid = kprojid;
inode->i_ctime = current_time(inode);
+ inode_inc_iversion(inode);
out_dirty:
rc = ext4_mark_iloc_dirty(handle, inode, &iloc);
if (!err)
@@ -1257,6 +1260,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
err = ext4_reserve_inode_write(handle, inode, &iloc);
if (err == 0) {
inode->i_ctime = current_time(inode);
+ inode_inc_iversion(inode);
inode->i_generation = generation;
err = ext4_mark_iloc_dirty(handle, inode, &iloc);
}
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 533216e80fa2..36d6ba7190b6 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -2412,6 +2412,7 @@ ext4_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index,
if (!error) {
ext4_xattr_update_super_block(handle, inode->i_sb);
inode->i_ctime = current_time(inode);
+ inode_inc_iversion(inode);
if (!value)
no_expand = 0;
error = ext4_mark_iloc_dirty(handle, inode, &is.iloc);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
1ff20307393e ("ext4: unconditionally enable the i_version counter")
50f094a5580e ("ext4: don't increase iversion counter for ea_inodes")
960e0ab63b2e ("ext4: fix i_version handling on remount")
4437992be7ca ("ext4: remove lazytime/nolazytime mount options handled by MS_LAZYTIME")
cebe85d570cf ("ext4: switch to the new mount api")
02f960f8db1c ("ext4: clean up return values in handle_mount_opt()")
7edfd85b1ffd ("ext4: Completely separate options parsing and sb setup")
6e47a3cc68fc ("ext4: get rid of super block and sbi from handle_mount_ops()")
b6bd243500b6 ("ext4: check ext2/3 compatibility outside handle_mount_opt()")
e6e268cb6822 ("ext4: move quota configuration out of handle_mount_opt()")
da812f611934 ("ext4: Allow sb to be NULL in ext4_msg()")
461c3af045d3 ("ext4: Change handle_mount_opt() to use fs_parameter")
4c94bff967d9 ("ext4: move option validation to a separate function")
e5a185c26c11 ("ext4: Add fs parameter specifications for mount options")
9a089b21f79b ("ext4: Send notifications on error")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1ff20307393e17dc57fde62226df625a3a3c36e9 Mon Sep 17 00:00:00 2001
From: Jeff Layton <jlayton(a)kernel.org>
Date: Wed, 24 Aug 2022 18:03:49 +0200
Subject: [PATCH] ext4: unconditionally enable the i_version counter
The original i_version implementation was pretty expensive, requiring a
log flush on every change. Because of this, it was gated behind a mount
option (implemented via the MS_I_VERSION mountoption flag).
Commit ae5e165d855d (fs: new API for handling inode->i_version) made the
i_version flag much less expensive, so there is no longer a performance
penalty from enabling it. xfs and btrfs already enable it
unconditionally when the on-disk format can support it.
Have ext4 ignore the SB_I_VERSION flag, and just enable it
unconditionally. While we're in here, mark the i_version mount
option Opt_removed.
[ Removed leftover bits of i_version from ext4_apply_options() since it
now can't ever be set in ctx->mask_s_flags -- lczerner ]
Cc: stable(a)kernel.org
Cc: Dave Chinner <david(a)fromorbit.com>
Cc: Benjamin Coddington <bcodding(a)redhat.com>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Lukas Czerner <lczerner(a)redhat.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner(a)kernel.org>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20220824160349.39664-3-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2a220be34caa..c77d40f05763 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5425,7 +5425,7 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
return -EINVAL;
}
- if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size)
+ if (attr->ia_size != inode->i_size)
inode_inc_iversion(inode);
if (shrink) {
@@ -5735,8 +5735,7 @@ int ext4_mark_iloc_dirty(handle_t *handle,
* ea_inodes are using i_version for storing reference count, don't
* mess with it
*/
- if (IS_I_VERSION(inode) &&
- !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
+ if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
inode_inc_iversion(inode);
/* the do_update_inode consumes one bh->b_count */
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 6bb0f493221c..2a7af6cbe5e0 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1585,7 +1585,7 @@ enum {
Opt_inlinecrypt,
Opt_usrjquota, Opt_grpjquota, Opt_quota,
Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
- Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version,
+ Opt_usrquota, Opt_grpquota, Opt_prjquota,
Opt_dax, Opt_dax_always, Opt_dax_inode, Opt_dax_never,
Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_warn_on_error,
Opt_nowarn_on_error, Opt_mblk_io_submit, Opt_debug_want_extra_isize,
@@ -1692,7 +1692,7 @@ static const struct fs_parameter_spec ext4_param_specs[] = {
fsparam_flag ("barrier", Opt_barrier),
fsparam_u32 ("barrier", Opt_barrier),
fsparam_flag ("nobarrier", Opt_nobarrier),
- fsparam_flag ("i_version", Opt_i_version),
+ fsparam_flag ("i_version", Opt_removed),
fsparam_flag ("dax", Opt_dax),
fsparam_enum ("dax", Opt_dax_type, ext4_param_dax),
fsparam_u32 ("stripe", Opt_stripe),
@@ -2131,11 +2131,6 @@ static int ext4_parse_param(struct fs_context *fc, struct fs_parameter *param)
case Opt_abort:
ctx_set_mount_flag(ctx, EXT4_MF_FS_ABORTED);
return 0;
- case Opt_i_version:
- ext4_msg(NULL, KERN_WARNING, deprecated_msg, param->key, "5.20");
- ext4_msg(NULL, KERN_WARNING, "Use iversion instead\n");
- ctx_set_flags(ctx, SB_I_VERSION);
- return 0;
case Opt_inlinecrypt:
#ifdef CONFIG_FS_ENCRYPTION_INLINE_CRYPT
ctx_set_flags(ctx, SB_INLINECRYPT);
@@ -2805,14 +2800,6 @@ static void ext4_apply_options(struct fs_context *fc, struct super_block *sb)
sb->s_flags &= ~ctx->mask_s_flags;
sb->s_flags |= ctx->vals_s_flags;
- /*
- * i_version differs from common mount option iversion so we have
- * to let vfs know that it was set, otherwise it would get cleared
- * on remount
- */
- if (ctx->mask_s_flags & SB_I_VERSION)
- fc->sb_flags |= SB_I_VERSION;
-
#define APPLY(X) ({ if (ctx->spec & EXT4_SPEC_##X) sbi->X = ctx->X; })
APPLY(s_commit_interval);
APPLY(s_stripe);
@@ -2961,8 +2948,6 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
SEQ_OPTS_PRINT("min_batch_time=%u", sbi->s_min_batch_time);
if (nodefs || sbi->s_max_batch_time != EXT4_DEF_MAX_BATCH_TIME)
SEQ_OPTS_PRINT("max_batch_time=%u", sbi->s_max_batch_time);
- if (sb->s_flags & SB_I_VERSION)
- SEQ_OPTS_PUTS("i_version");
if (nodefs || sbi->s_stripe)
SEQ_OPTS_PRINT("stripe=%lu", sbi->s_stripe);
if (nodefs || EXT4_MOUNT_DATA_FLAGS &
@@ -4632,6 +4617,9 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
sb->s_flags = (sb->s_flags & ~SB_POSIXACL) |
(test_opt(sb, POSIX_ACL) ? SB_POSIXACL : 0);
+ /* i_version is always enabled now */
+ sb->s_flags |= SB_I_VERSION;
+
if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV &&
(ext4_has_compat_features(sb) ||
ext4_has_ro_compat_features(sb) ||
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
1ff20307393e ("ext4: unconditionally enable the i_version counter")
50f094a5580e ("ext4: don't increase iversion counter for ea_inodes")
960e0ab63b2e ("ext4: fix i_version handling on remount")
4437992be7ca ("ext4: remove lazytime/nolazytime mount options handled by MS_LAZYTIME")
cebe85d570cf ("ext4: switch to the new mount api")
02f960f8db1c ("ext4: clean up return values in handle_mount_opt()")
7edfd85b1ffd ("ext4: Completely separate options parsing and sb setup")
6e47a3cc68fc ("ext4: get rid of super block and sbi from handle_mount_ops()")
b6bd243500b6 ("ext4: check ext2/3 compatibility outside handle_mount_opt()")
e6e268cb6822 ("ext4: move quota configuration out of handle_mount_opt()")
da812f611934 ("ext4: Allow sb to be NULL in ext4_msg()")
461c3af045d3 ("ext4: Change handle_mount_opt() to use fs_parameter")
4c94bff967d9 ("ext4: move option validation to a separate function")
e5a185c26c11 ("ext4: Add fs parameter specifications for mount options")
9a089b21f79b ("ext4: Send notifications on error")
2e5fd489a4e5 ("Merge tag 'libnvdimm-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1ff20307393e17dc57fde62226df625a3a3c36e9 Mon Sep 17 00:00:00 2001
From: Jeff Layton <jlayton(a)kernel.org>
Date: Wed, 24 Aug 2022 18:03:49 +0200
Subject: [PATCH] ext4: unconditionally enable the i_version counter
The original i_version implementation was pretty expensive, requiring a
log flush on every change. Because of this, it was gated behind a mount
option (implemented via the MS_I_VERSION mountoption flag).
Commit ae5e165d855d (fs: new API for handling inode->i_version) made the
i_version flag much less expensive, so there is no longer a performance
penalty from enabling it. xfs and btrfs already enable it
unconditionally when the on-disk format can support it.
Have ext4 ignore the SB_I_VERSION flag, and just enable it
unconditionally. While we're in here, mark the i_version mount
option Opt_removed.
[ Removed leftover bits of i_version from ext4_apply_options() since it
now can't ever be set in ctx->mask_s_flags -- lczerner ]
Cc: stable(a)kernel.org
Cc: Dave Chinner <david(a)fromorbit.com>
Cc: Benjamin Coddington <bcodding(a)redhat.com>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Lukas Czerner <lczerner(a)redhat.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner(a)kernel.org>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20220824160349.39664-3-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2a220be34caa..c77d40f05763 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5425,7 +5425,7 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
return -EINVAL;
}
- if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size)
+ if (attr->ia_size != inode->i_size)
inode_inc_iversion(inode);
if (shrink) {
@@ -5735,8 +5735,7 @@ int ext4_mark_iloc_dirty(handle_t *handle,
* ea_inodes are using i_version for storing reference count, don't
* mess with it
*/
- if (IS_I_VERSION(inode) &&
- !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
+ if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
inode_inc_iversion(inode);
/* the do_update_inode consumes one bh->b_count */
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 6bb0f493221c..2a7af6cbe5e0 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1585,7 +1585,7 @@ enum {
Opt_inlinecrypt,
Opt_usrjquota, Opt_grpjquota, Opt_quota,
Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
- Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version,
+ Opt_usrquota, Opt_grpquota, Opt_prjquota,
Opt_dax, Opt_dax_always, Opt_dax_inode, Opt_dax_never,
Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_warn_on_error,
Opt_nowarn_on_error, Opt_mblk_io_submit, Opt_debug_want_extra_isize,
@@ -1692,7 +1692,7 @@ static const struct fs_parameter_spec ext4_param_specs[] = {
fsparam_flag ("barrier", Opt_barrier),
fsparam_u32 ("barrier", Opt_barrier),
fsparam_flag ("nobarrier", Opt_nobarrier),
- fsparam_flag ("i_version", Opt_i_version),
+ fsparam_flag ("i_version", Opt_removed),
fsparam_flag ("dax", Opt_dax),
fsparam_enum ("dax", Opt_dax_type, ext4_param_dax),
fsparam_u32 ("stripe", Opt_stripe),
@@ -2131,11 +2131,6 @@ static int ext4_parse_param(struct fs_context *fc, struct fs_parameter *param)
case Opt_abort:
ctx_set_mount_flag(ctx, EXT4_MF_FS_ABORTED);
return 0;
- case Opt_i_version:
- ext4_msg(NULL, KERN_WARNING, deprecated_msg, param->key, "5.20");
- ext4_msg(NULL, KERN_WARNING, "Use iversion instead\n");
- ctx_set_flags(ctx, SB_I_VERSION);
- return 0;
case Opt_inlinecrypt:
#ifdef CONFIG_FS_ENCRYPTION_INLINE_CRYPT
ctx_set_flags(ctx, SB_INLINECRYPT);
@@ -2805,14 +2800,6 @@ static void ext4_apply_options(struct fs_context *fc, struct super_block *sb)
sb->s_flags &= ~ctx->mask_s_flags;
sb->s_flags |= ctx->vals_s_flags;
- /*
- * i_version differs from common mount option iversion so we have
- * to let vfs know that it was set, otherwise it would get cleared
- * on remount
- */
- if (ctx->mask_s_flags & SB_I_VERSION)
- fc->sb_flags |= SB_I_VERSION;
-
#define APPLY(X) ({ if (ctx->spec & EXT4_SPEC_##X) sbi->X = ctx->X; })
APPLY(s_commit_interval);
APPLY(s_stripe);
@@ -2961,8 +2948,6 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
SEQ_OPTS_PRINT("min_batch_time=%u", sbi->s_min_batch_time);
if (nodefs || sbi->s_max_batch_time != EXT4_DEF_MAX_BATCH_TIME)
SEQ_OPTS_PRINT("max_batch_time=%u", sbi->s_max_batch_time);
- if (sb->s_flags & SB_I_VERSION)
- SEQ_OPTS_PUTS("i_version");
if (nodefs || sbi->s_stripe)
SEQ_OPTS_PRINT("stripe=%lu", sbi->s_stripe);
if (nodefs || EXT4_MOUNT_DATA_FLAGS &
@@ -4632,6 +4617,9 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
sb->s_flags = (sb->s_flags & ~SB_POSIXACL) |
(test_opt(sb, POSIX_ACL) ? SB_POSIXACL : 0);
+ /* i_version is always enabled now */
+ sb->s_flags |= SB_I_VERSION;
+
if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV &&
(ext4_has_compat_features(sb) ||
ext4_has_ro_compat_features(sb) ||
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
d766f2d1e3e3 ("ext2: Add sanity checks for group and filesystem size")
d9e9866803f7 ("ext2: Adjust indentation in ext2_fill_super")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d766f2d1e3e3bd44024a7f971ffcf8b8fbb7c5d2 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 14 Sep 2022 17:24:42 +0200
Subject: [PATCH] ext2: Add sanity checks for group and filesystem size
Add sanity check that filesystem size does not exceed the underlying
device size and that group size is big enough so that metadata can fit
into it. This avoid trying to mount some crafted filesystems with
extremely large group counts.
Reported-by: syzbot+0f2f7e65a3007d39539f(a)syzkaller.appspotmail.com
Reported-by: kernel test robot <oliver.sang(a)intel.com> # Test fixup
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 252c742379cf..afb31af9302d 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -1052,6 +1052,13 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
sbi->s_blocks_per_group);
goto failed_mount;
}
+ /* At least inode table, bitmaps, and sb have to fit in one group */
+ if (sbi->s_blocks_per_group <= sbi->s_itb_per_group + 3) {
+ ext2_msg(sb, KERN_ERR,
+ "error: #blocks per group smaller than metadata size: %lu <= %lu",
+ sbi->s_blocks_per_group, sbi->s_inodes_per_group + 3);
+ goto failed_mount;
+ }
if (sbi->s_frags_per_group > sb->s_blocksize * 8) {
ext2_msg(sb, KERN_ERR,
"error: #fragments per group too big: %lu",
@@ -1065,9 +1072,14 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
sbi->s_inodes_per_group);
goto failed_mount;
}
+ if (sb_bdev_nr_blocks(sb) < le32_to_cpu(es->s_blocks_count)) {
+ ext2_msg(sb, KERN_ERR,
+ "bad geometry: block count %u exceeds size of device (%u blocks)",
+ le32_to_cpu(es->s_blocks_count),
+ (unsigned)sb_bdev_nr_blocks(sb));
+ goto failed_mount;
+ }
- if (EXT2_BLOCKS_PER_GROUP(sb) == 0)
- goto cantfind_ext2;
sbi->s_groups_count = ((le32_to_cpu(es->s_blocks_count) -
le32_to_cpu(es->s_first_data_block) - 1)
/ EXT2_BLOCKS_PER_GROUP(sb)) + 1;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
d766f2d1e3e3 ("ext2: Add sanity checks for group and filesystem size")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d766f2d1e3e3bd44024a7f971ffcf8b8fbb7c5d2 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 14 Sep 2022 17:24:42 +0200
Subject: [PATCH] ext2: Add sanity checks for group and filesystem size
Add sanity check that filesystem size does not exceed the underlying
device size and that group size is big enough so that metadata can fit
into it. This avoid trying to mount some crafted filesystems with
extremely large group counts.
Reported-by: syzbot+0f2f7e65a3007d39539f(a)syzkaller.appspotmail.com
Reported-by: kernel test robot <oliver.sang(a)intel.com> # Test fixup
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 252c742379cf..afb31af9302d 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -1052,6 +1052,13 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
sbi->s_blocks_per_group);
goto failed_mount;
}
+ /* At least inode table, bitmaps, and sb have to fit in one group */
+ if (sbi->s_blocks_per_group <= sbi->s_itb_per_group + 3) {
+ ext2_msg(sb, KERN_ERR,
+ "error: #blocks per group smaller than metadata size: %lu <= %lu",
+ sbi->s_blocks_per_group, sbi->s_inodes_per_group + 3);
+ goto failed_mount;
+ }
if (sbi->s_frags_per_group > sb->s_blocksize * 8) {
ext2_msg(sb, KERN_ERR,
"error: #fragments per group too big: %lu",
@@ -1065,9 +1072,14 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
sbi->s_inodes_per_group);
goto failed_mount;
}
+ if (sb_bdev_nr_blocks(sb) < le32_to_cpu(es->s_blocks_count)) {
+ ext2_msg(sb, KERN_ERR,
+ "bad geometry: block count %u exceeds size of device (%u blocks)",
+ le32_to_cpu(es->s_blocks_count),
+ (unsigned)sb_bdev_nr_blocks(sb));
+ goto failed_mount;
+ }
- if (EXT2_BLOCKS_PER_GROUP(sb) == 0)
- goto cantfind_ext2;
sbi->s_groups_count = ((le32_to_cpu(es->s_blocks_count) -
le32_to_cpu(es->s_first_data_block) - 1)
/ EXT2_BLOCKS_PER_GROUP(sb)) + 1;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
d766f2d1e3e3 ("ext2: Add sanity checks for group and filesystem size")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d766f2d1e3e3bd44024a7f971ffcf8b8fbb7c5d2 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 14 Sep 2022 17:24:42 +0200
Subject: [PATCH] ext2: Add sanity checks for group and filesystem size
Add sanity check that filesystem size does not exceed the underlying
device size and that group size is big enough so that metadata can fit
into it. This avoid trying to mount some crafted filesystems with
extremely large group counts.
Reported-by: syzbot+0f2f7e65a3007d39539f(a)syzkaller.appspotmail.com
Reported-by: kernel test robot <oliver.sang(a)intel.com> # Test fixup
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 252c742379cf..afb31af9302d 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -1052,6 +1052,13 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
sbi->s_blocks_per_group);
goto failed_mount;
}
+ /* At least inode table, bitmaps, and sb have to fit in one group */
+ if (sbi->s_blocks_per_group <= sbi->s_itb_per_group + 3) {
+ ext2_msg(sb, KERN_ERR,
+ "error: #blocks per group smaller than metadata size: %lu <= %lu",
+ sbi->s_blocks_per_group, sbi->s_inodes_per_group + 3);
+ goto failed_mount;
+ }
if (sbi->s_frags_per_group > sb->s_blocksize * 8) {
ext2_msg(sb, KERN_ERR,
"error: #fragments per group too big: %lu",
@@ -1065,9 +1072,14 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
sbi->s_inodes_per_group);
goto failed_mount;
}
+ if (sb_bdev_nr_blocks(sb) < le32_to_cpu(es->s_blocks_count)) {
+ ext2_msg(sb, KERN_ERR,
+ "bad geometry: block count %u exceeds size of device (%u blocks)",
+ le32_to_cpu(es->s_blocks_count),
+ (unsigned)sb_bdev_nr_blocks(sb));
+ goto failed_mount;
+ }
- if (EXT2_BLOCKS_PER_GROUP(sb) == 0)
- goto cantfind_ext2;
sbi->s_groups_count = ((le32_to_cpu(es->s_blocks_count) -
le32_to_cpu(es->s_first_data_block) - 1)
/ EXT2_BLOCKS_PER_GROUP(sb)) + 1;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
d766f2d1e3e3 ("ext2: Add sanity checks for group and filesystem size")
d9e9866803f7 ("ext2: Adjust indentation in ext2_fill_super")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d766f2d1e3e3bd44024a7f971ffcf8b8fbb7c5d2 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 14 Sep 2022 17:24:42 +0200
Subject: [PATCH] ext2: Add sanity checks for group and filesystem size
Add sanity check that filesystem size does not exceed the underlying
device size and that group size is big enough so that metadata can fit
into it. This avoid trying to mount some crafted filesystems with
extremely large group counts.
Reported-by: syzbot+0f2f7e65a3007d39539f(a)syzkaller.appspotmail.com
Reported-by: kernel test robot <oliver.sang(a)intel.com> # Test fixup
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 252c742379cf..afb31af9302d 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -1052,6 +1052,13 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
sbi->s_blocks_per_group);
goto failed_mount;
}
+ /* At least inode table, bitmaps, and sb have to fit in one group */
+ if (sbi->s_blocks_per_group <= sbi->s_itb_per_group + 3) {
+ ext2_msg(sb, KERN_ERR,
+ "error: #blocks per group smaller than metadata size: %lu <= %lu",
+ sbi->s_blocks_per_group, sbi->s_inodes_per_group + 3);
+ goto failed_mount;
+ }
if (sbi->s_frags_per_group > sb->s_blocksize * 8) {
ext2_msg(sb, KERN_ERR,
"error: #fragments per group too big: %lu",
@@ -1065,9 +1072,14 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
sbi->s_inodes_per_group);
goto failed_mount;
}
+ if (sb_bdev_nr_blocks(sb) < le32_to_cpu(es->s_blocks_count)) {
+ ext2_msg(sb, KERN_ERR,
+ "bad geometry: block count %u exceeds size of device (%u blocks)",
+ le32_to_cpu(es->s_blocks_count),
+ (unsigned)sb_bdev_nr_blocks(sb));
+ goto failed_mount;
+ }
- if (EXT2_BLOCKS_PER_GROUP(sb) == 0)
- goto cantfind_ext2;
sbi->s_groups_count = ((le32_to_cpu(es->s_blocks_count) -
le32_to_cpu(es->s_first_data_block) - 1)
/ EXT2_BLOCKS_PER_GROUP(sb)) + 1;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
d766f2d1e3e3 ("ext2: Add sanity checks for group and filesystem size")
d9e9866803f7 ("ext2: Adjust indentation in ext2_fill_super")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d766f2d1e3e3bd44024a7f971ffcf8b8fbb7c5d2 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 14 Sep 2022 17:24:42 +0200
Subject: [PATCH] ext2: Add sanity checks for group and filesystem size
Add sanity check that filesystem size does not exceed the underlying
device size and that group size is big enough so that metadata can fit
into it. This avoid trying to mount some crafted filesystems with
extremely large group counts.
Reported-by: syzbot+0f2f7e65a3007d39539f(a)syzkaller.appspotmail.com
Reported-by: kernel test robot <oliver.sang(a)intel.com> # Test fixup
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 252c742379cf..afb31af9302d 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -1052,6 +1052,13 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
sbi->s_blocks_per_group);
goto failed_mount;
}
+ /* At least inode table, bitmaps, and sb have to fit in one group */
+ if (sbi->s_blocks_per_group <= sbi->s_itb_per_group + 3) {
+ ext2_msg(sb, KERN_ERR,
+ "error: #blocks per group smaller than metadata size: %lu <= %lu",
+ sbi->s_blocks_per_group, sbi->s_inodes_per_group + 3);
+ goto failed_mount;
+ }
if (sbi->s_frags_per_group > sb->s_blocksize * 8) {
ext2_msg(sb, KERN_ERR,
"error: #fragments per group too big: %lu",
@@ -1065,9 +1072,14 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
sbi->s_inodes_per_group);
goto failed_mount;
}
+ if (sb_bdev_nr_blocks(sb) < le32_to_cpu(es->s_blocks_count)) {
+ ext2_msg(sb, KERN_ERR,
+ "bad geometry: block count %u exceeds size of device (%u blocks)",
+ le32_to_cpu(es->s_blocks_count),
+ (unsigned)sb_bdev_nr_blocks(sb));
+ goto failed_mount;
+ }
- if (EXT2_BLOCKS_PER_GROUP(sb) == 0)
- goto cantfind_ext2;
sbi->s_groups_count = ((le32_to_cpu(es->s_blocks_count) -
le32_to_cpu(es->s_first_data_block) - 1)
/ EXT2_BLOCKS_PER_GROUP(sb)) + 1;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
d766f2d1e3e3 ("ext2: Add sanity checks for group and filesystem size")
d9e9866803f7 ("ext2: Adjust indentation in ext2_fill_super")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d766f2d1e3e3bd44024a7f971ffcf8b8fbb7c5d2 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Wed, 14 Sep 2022 17:24:42 +0200
Subject: [PATCH] ext2: Add sanity checks for group and filesystem size
Add sanity check that filesystem size does not exceed the underlying
device size and that group size is big enough so that metadata can fit
into it. This avoid trying to mount some crafted filesystems with
extremely large group counts.
Reported-by: syzbot+0f2f7e65a3007d39539f(a)syzkaller.appspotmail.com
Reported-by: kernel test robot <oliver.sang(a)intel.com> # Test fixup
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 252c742379cf..afb31af9302d 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -1052,6 +1052,13 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
sbi->s_blocks_per_group);
goto failed_mount;
}
+ /* At least inode table, bitmaps, and sb have to fit in one group */
+ if (sbi->s_blocks_per_group <= sbi->s_itb_per_group + 3) {
+ ext2_msg(sb, KERN_ERR,
+ "error: #blocks per group smaller than metadata size: %lu <= %lu",
+ sbi->s_blocks_per_group, sbi->s_inodes_per_group + 3);
+ goto failed_mount;
+ }
if (sbi->s_frags_per_group > sb->s_blocksize * 8) {
ext2_msg(sb, KERN_ERR,
"error: #fragments per group too big: %lu",
@@ -1065,9 +1072,14 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
sbi->s_inodes_per_group);
goto failed_mount;
}
+ if (sb_bdev_nr_blocks(sb) < le32_to_cpu(es->s_blocks_count)) {
+ ext2_msg(sb, KERN_ERR,
+ "bad geometry: block count %u exceeds size of device (%u blocks)",
+ le32_to_cpu(es->s_blocks_count),
+ (unsigned)sb_bdev_nr_blocks(sb));
+ goto failed_mount;
+ }
- if (EXT2_BLOCKS_PER_GROUP(sb) == 0)
- goto cantfind_ext2;
sbi->s_groups_count = ((le32_to_cpu(es->s_blocks_count) -
le32_to_cpu(es->s_first_data_block) - 1)
/ EXT2_BLOCKS_PER_GROUP(sb)) + 1;
Hi Greg and Sasha,
Please find attached backports for commit 607e57c6c62c ("hardening:
Remove Clang's enable flag for -ftrivial-auto-var-init=zero") along with
prerequisite changes for 5.10 and 5.15. This change is necessary to keep
CONFIG_INIT_STACK_ALL_ZERO working with clang-16. This change is already
queued up for 5.19 and newer and it is not relevant for 5.4 and older
(at least upstream, I'll be doing backports for older Android branches
shortly). If there are any questions or problems, please let me know!
Cheers,
Nathan
Hi Greg,
please apply the following two cc-stable patches to 4.14-stable.
Ryusuke Konishi (2):
nilfs2: fix lockdep warnings in page operations for btree nodes
nilfs2: fix lockdep warnings during disk space reclamation
During testing nilfs2 filesystem with stable trees, I encountered a
lockdep warning followed by a kernel panic (by panic_on_warn) only in
4.14-stable.
I found that the cause was the lack of these patches which are applied
to other newer stable trees. I guess they were dropped since the
first patch was not applicable as is due to a pagevec change.
After I manually applied them to 4.14, the panic and lockdep warning
have gone. So, I believe these should be backported to 4.14-stable as
well.
Thanks,
Ryusuke Konishi
fs/nilfs2/btnode.c | 23 ++++++-
fs/nilfs2/btnode.h | 1 +
fs/nilfs2/btree.c | 27 +++++---
fs/nilfs2/dat.c | 4 +-
fs/nilfs2/gcinode.c | 7 +-
fs/nilfs2/inode.c | 159 ++++++++++++++++++++++++++++++++++++++++----
fs/nilfs2/mdt.c | 43 ++++++++----
fs/nilfs2/mdt.h | 6 +-
fs/nilfs2/nilfs.h | 16 ++---
fs/nilfs2/page.c | 7 +-
fs/nilfs2/segment.c | 11 +--
fs/nilfs2/super.c | 5 +-
12 files changed, 242 insertions(+), 67 deletions(-)
--
2.31.1
From: Aurelien Jarno <aurelien(a)aurel32.net>
commit 6df2a016c0c8a3d0933ef33dd192ea6606b115e3 upstream.
From version 2.38, binutils default to ISA spec version 20191213. This
means that the csr read/write (csrr*/csrw*) instructions and fence.i
instruction has separated from the `I` extension, become two standalone
extensions: Zicsr and Zifencei. As the kernel uses those instruction,
this causes the following build failure:
CC arch/riscv/kernel/vdso/vgettimeofday.o
<<BUILDDIR>>/arch/riscv/include/asm/vdso/gettimeofday.h: Assembler messages:
<<BUILDDIR>>/arch/riscv/include/asm/vdso/gettimeofday.h:71: Error: unrecognized opcode `csrr a5,0xc01'
<<BUILDDIR>>/arch/riscv/include/asm/vdso/gettimeofday.h:71: Error: unrecognized opcode `csrr a5,0xc01'
<<BUILDDIR>>/arch/riscv/include/asm/vdso/gettimeofday.h:71: Error: unrecognized opcode `csrr a5,0xc01'
<<BUILDDIR>>/arch/riscv/include/asm/vdso/gettimeofday.h:71: Error: unrecognized opcode `csrr a5,0xc01'
The fix is to specify those extensions explicitly in -march. However as
older binutils version do not support this, we first need to detect
that.
Signed-off-by: Aurelien Jarno <aurelien(a)aurel32.net>
Tested-by: Alexandre Ghiti <alexandre.ghiti(a)canonical.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
[Conor: converted to the 4.19 style of march string generation]
Signed-off-by: Conor Dooley <conor.dooley(a)microchip.com>
---
CC: Palmer Dabbelt <palmer(a)dabbelt.com>
CC: linux-riscv(a)lists.infradead.org
CC: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
lkp has yet to complain, so here you go Greg..
arch/riscv/Makefile | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 110be14e6122..b024029a2247 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -49,9 +49,16 @@ ifeq ($(CONFIG_RISCV_ISA_C),y)
KBUILD_ARCH_C = c
endif
-KBUILD_AFLAGS += -march=$(KBUILD_MARCH)$(KBUILD_ARCH_A)fd$(KBUILD_ARCH_C)
+# Newer binutils versions default to ISA spec version 20191213 which moves some
+# instructions from the I extension to the Zicsr and Zifencei extensions.
+toolchain-need-zicsr-zifencei := $(call cc-option-yn, -march=$(riscv-march-y)_zicsr_zifencei)
+ifeq ($(toolchain-need-zicsr-zifencei),y)
+ KBUILD_ARCH_ZISCR_ZIFENCEI = _zicsr_zifencei
+endif
+
+KBUILD_AFLAGS += -march=$(KBUILD_MARCH)$(KBUILD_ARCH_A)fd$(KBUILD_ARCH_C)$(KBUILD_ARCH_ZISCR_ZIFENCEI)
-KBUILD_CFLAGS += -march=$(KBUILD_MARCH)$(KBUILD_ARCH_A)$(KBUILD_ARCH_C)
+KBUILD_CFLAGS += -march=$(KBUILD_MARCH)$(KBUILD_ARCH_A)$(KBUILD_ARCH_C)$(KBUILD_ARCH_ZISCR_ZIFENCEI)
KBUILD_CFLAGS += -mno-save-restore
KBUILD_CFLAGS += -DCONFIG_PAGE_OFFSET=$(CONFIG_PAGE_OFFSET)
--
2.37.3
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
cef7820d6abf ("btrfs: fix missed extent on fsync after dropping extent maps")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cef7820d6abf8d61f8e1db411eae3c712f6d72a2 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 19 Sep 2022 15:06:28 +0100
Subject: [PATCH] btrfs: fix missed extent on fsync after dropping extent maps
When dropping extent maps for a range, through btrfs_drop_extent_cache(),
if we find an extent map that starts before our target range and/or ends
before the target range, and we are not able to allocate extent maps for
splitting that extent map, then we don't fail and simply remove the entire
extent map from the inode's extent map tree.
This is generally fine, because in case anyone needs to access the extent
map, it can just load it again later from the respective file extent
item(s) in the subvolume btree. However, if that extent map is new and is
in the list of modified extents, then a fast fsync will miss the parts of
the extent that were outside our range (that needed to be split),
therefore not logging them. Fix that by marking the inode for a full
fsync. This issue was introduced after removing BUG_ON()s triggered when
the split extent map allocations failed, done by commit 7014cdb49305ed
("Btrfs: btrfs_drop_extent_cache should never fail"), back in 2012, and
the fast fsync path already existed but was very recent.
Also, in the case where we could allocate extent maps for the split
operations but then fail to add a split extent map to the tree, mark the
inode for a full fsync as well. This is not supposed to ever fail, and we
assert that, but in case assertions are disabled (CONFIG_BTRFS_ASSERT is
not set), it's the correct thing to do to make sure a fast fsync will not
miss a new extent.
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Anand Jain <anand.jain(a)oracle.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index df958fdb140c..8786b2b8f0a1 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -523,6 +523,7 @@ void btrfs_drop_extent_cache(struct btrfs_inode *inode, u64 start, u64 end,
testend = 0;
}
while (1) {
+ bool ends_after_range = false;
int no_splits = 0;
modified = false;
@@ -539,10 +540,12 @@ void btrfs_drop_extent_cache(struct btrfs_inode *inode, u64 start, u64 end,
write_unlock(&em_tree->lock);
break;
}
+ if (testend && em->start + em->len > start + len)
+ ends_after_range = true;
flags = em->flags;
gen = em->generation;
if (skip_pinned && test_bit(EXTENT_FLAG_PINNED, &em->flags)) {
- if (testend && em->start + em->len >= start + len) {
+ if (ends_after_range) {
free_extent_map(em);
write_unlock(&em_tree->lock);
break;
@@ -592,7 +595,7 @@ void btrfs_drop_extent_cache(struct btrfs_inode *inode, u64 start, u64 end,
split = split2;
split2 = NULL;
}
- if (testend && em->start + em->len > start + len) {
+ if (ends_after_range) {
u64 diff = start + len - em->start;
split->start = start + len;
@@ -630,14 +633,42 @@ void btrfs_drop_extent_cache(struct btrfs_inode *inode, u64 start, u64 end,
} else {
ret = add_extent_mapping(em_tree, split,
modified);
- ASSERT(ret == 0); /* Logic error */
+ /* Logic error, shouldn't happen. */
+ ASSERT(ret == 0);
+ if (WARN_ON(ret != 0) && modified)
+ btrfs_set_inode_full_sync(inode);
}
free_extent_map(split);
split = NULL;
}
next:
- if (extent_map_in_tree(em))
+ if (extent_map_in_tree(em)) {
+ /*
+ * If the extent map is still in the tree it means that
+ * either of the following is true:
+ *
+ * 1) It fits entirely in our range (doesn't end beyond
+ * it or starts before it);
+ *
+ * 2) It starts before our range and/or ends after our
+ * range, and we were not able to allocate the extent
+ * maps for split operations, @split and @split2.
+ *
+ * If we are at case 2) then we just remove the entire
+ * extent map - this is fine since if anyone needs it to
+ * access the subranges outside our range, will just
+ * load it again from the subvolume tree's file extent
+ * item. However if the extent map was in the list of
+ * modified extents, then we must mark the inode for a
+ * full fsync, otherwise a fast fsync will miss this
+ * extent if it's new and needs to be logged.
+ */
+ if ((em->start < start || ends_after_range) && modified) {
+ ASSERT(no_splits);
+ btrfs_set_inode_full_sync(inode);
+ }
remove_extent_mapping(em_tree, em);
+ }
write_unlock(&em_tree->lock);
/* once for us */
@@ -2260,14 +2291,6 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
atomic_inc(&root->log_batch);
- /*
- * Always check for the full sync flag while holding the inode's lock,
- * to avoid races with other tasks. The flag must be either set all the
- * time during logging or always off all the time while logging.
- */
- full_sync = test_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
- &BTRFS_I(inode)->runtime_flags);
-
/*
* Before we acquired the inode's lock and the mmap lock, someone may
* have dirtied more pages in the target range. We need to make sure
@@ -2292,6 +2315,17 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
goto out;
}
+ /*
+ * Always check for the full sync flag while holding the inode's lock,
+ * to avoid races with other tasks. The flag must be either set all the
+ * time during logging or always off all the time while logging.
+ * We check the flag here after starting delalloc above, because when
+ * running delalloc the full sync flag may be set if we need to drop
+ * extra extent map ranges due to temporary memory allocation failures.
+ */
+ full_sync = test_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
+ &BTRFS_I(inode)->runtime_flags);
+
/*
* We have to do this here to avoid the priority inversion of waiting on
* IO of a lower priority task while holding a transaction open.
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
cbddcc4fa344 ("btrfs: set generation before calling btrfs_clean_tree_block in btrfs_init_new_buffer")
b40130b23ca4 ("btrfs: fix lockdep splat with reloc root extent buffers")
0a27a0474d14 ("btrfs: move lockdep class helpers to locking.c")
b4be6aefa73c ("btrfs: do not start relocation until in progress drops are done")
fdfbf020664b ("btrfs: rework async transaction committing")
54230013d41f ("btrfs: get rid of root->orphan_cleanup_state")
ae5d29d4e70a ("btrfs: inline wait_current_trans_commit_start in its caller")
32cc4f8759e1 ("btrfs: sink wait_for_unblock parameter to async commit")
e9306ad4ef5c ("btrfs: more graceful errors/warnings on 32bit systems when reaching limits")
bc03f39ec3c1 ("btrfs: use a bit to track the existence of tree mod log users")
406808ab2f0b ("btrfs: use booleans where appropriate for the tree mod log functions")
f3a84ccd28d0 ("btrfs: move the tree mod log code into its own file")
dbcc7d57bffc ("btrfs: fix race when cloning extent buffer during rewind of an old root")
cac06d843f25 ("btrfs: introduce the skeleton of btrfs_subpage structure")
2f96e40212d4 ("btrfs: fix possible free space tree corruption with online conversion")
1aaac38c83a2 ("btrfs: don't allow tree block to cross page boundary for subpage support")
948462294577 ("btrfs: keep sb cache_generation consistent with space_cache")
8b228324a8ce ("btrfs: clear free space tree on ro->rw remount")
8cd2908846d1 ("btrfs: clear oneshot options on mount and remount")
5011139a4718 ("btrfs: create free space tree on ro->rw remount")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cbddcc4fa3443fe8cfb2ff8e210deb1f6a0eea38 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Date: Tue, 20 Sep 2022 22:43:51 +0900
Subject: [PATCH] btrfs: set generation before calling btrfs_clean_tree_block
in btrfs_init_new_buffer
syzbot is reporting uninit-value in btrfs_clean_tree_block() [1], for
commit bc877d285ca3dba2 ("btrfs: Deduplicate extent_buffer init code")
missed that btrfs_set_header_generation() in btrfs_init_new_buffer() must
not be moved to after clean_tree_block() because clean_tree_block() is
calling btrfs_header_generation() since commit 55c69072d6bd5be1 ("Btrfs:
Fix extent_buffer usage when nodesize != leafsize").
Since memzero_extent_buffer() will reset "struct btrfs_header" part, we
can't move btrfs_set_header_generation() to before memzero_extent_buffer().
Just re-add btrfs_set_header_generation() before btrfs_clean_tree_block().
Link: https://syzkaller.appspot.com/bug?extid=fba8e2116a12609b6c59 [1]
Reported-by: syzbot <syzbot+fba8e2116a12609b6c59(a)syzkaller.appspotmail.com>
Fixes: bc877d285ca3dba2 ("btrfs: Deduplicate extent_buffer init code")
CC: stable(a)vger.kernel.org # 4.19+
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ae94be318a0f..cd2d36580f1a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4890,6 +4890,9 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
!test_bit(BTRFS_ROOT_RESET_LOCKDEP_CLASS, &root->state))
lockdep_owner = BTRFS_FS_TREE_OBJECTID;
+ /* btrfs_clean_tree_block() accesses generation field. */
+ btrfs_set_header_generation(buf, trans->transid);
+
/*
* This needs to stay, because we could allocate a freed block from an
* old tree into a new tree, so we need to make sure this new block is
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
cbddcc4fa344 ("btrfs: set generation before calling btrfs_clean_tree_block in btrfs_init_new_buffer")
b40130b23ca4 ("btrfs: fix lockdep splat with reloc root extent buffers")
0a27a0474d14 ("btrfs: move lockdep class helpers to locking.c")
b4be6aefa73c ("btrfs: do not start relocation until in progress drops are done")
fdfbf020664b ("btrfs: rework async transaction committing")
54230013d41f ("btrfs: get rid of root->orphan_cleanup_state")
ae5d29d4e70a ("btrfs: inline wait_current_trans_commit_start in its caller")
32cc4f8759e1 ("btrfs: sink wait_for_unblock parameter to async commit")
e9306ad4ef5c ("btrfs: more graceful errors/warnings on 32bit systems when reaching limits")
bc03f39ec3c1 ("btrfs: use a bit to track the existence of tree mod log users")
406808ab2f0b ("btrfs: use booleans where appropriate for the tree mod log functions")
f3a84ccd28d0 ("btrfs: move the tree mod log code into its own file")
dbcc7d57bffc ("btrfs: fix race when cloning extent buffer during rewind of an old root")
cac06d843f25 ("btrfs: introduce the skeleton of btrfs_subpage structure")
2f96e40212d4 ("btrfs: fix possible free space tree corruption with online conversion")
1aaac38c83a2 ("btrfs: don't allow tree block to cross page boundary for subpage support")
948462294577 ("btrfs: keep sb cache_generation consistent with space_cache")
8b228324a8ce ("btrfs: clear free space tree on ro->rw remount")
8cd2908846d1 ("btrfs: clear oneshot options on mount and remount")
5011139a4718 ("btrfs: create free space tree on ro->rw remount")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cbddcc4fa3443fe8cfb2ff8e210deb1f6a0eea38 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Date: Tue, 20 Sep 2022 22:43:51 +0900
Subject: [PATCH] btrfs: set generation before calling btrfs_clean_tree_block
in btrfs_init_new_buffer
syzbot is reporting uninit-value in btrfs_clean_tree_block() [1], for
commit bc877d285ca3dba2 ("btrfs: Deduplicate extent_buffer init code")
missed that btrfs_set_header_generation() in btrfs_init_new_buffer() must
not be moved to after clean_tree_block() because clean_tree_block() is
calling btrfs_header_generation() since commit 55c69072d6bd5be1 ("Btrfs:
Fix extent_buffer usage when nodesize != leafsize").
Since memzero_extent_buffer() will reset "struct btrfs_header" part, we
can't move btrfs_set_header_generation() to before memzero_extent_buffer().
Just re-add btrfs_set_header_generation() before btrfs_clean_tree_block().
Link: https://syzkaller.appspot.com/bug?extid=fba8e2116a12609b6c59 [1]
Reported-by: syzbot <syzbot+fba8e2116a12609b6c59(a)syzkaller.appspotmail.com>
Fixes: bc877d285ca3dba2 ("btrfs: Deduplicate extent_buffer init code")
CC: stable(a)vger.kernel.org # 4.19+
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ae94be318a0f..cd2d36580f1a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4890,6 +4890,9 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
!test_bit(BTRFS_ROOT_RESET_LOCKDEP_CLASS, &root->state))
lockdep_owner = BTRFS_FS_TREE_OBJECTID;
+ /* btrfs_clean_tree_block() accesses generation field. */
+ btrfs_set_header_generation(buf, trans->transid);
+
/*
* This needs to stay, because we could allocate a freed block from an
* old tree into a new tree, so we need to make sure this new block is
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
cbddcc4fa344 ("btrfs: set generation before calling btrfs_clean_tree_block in btrfs_init_new_buffer")
b40130b23ca4 ("btrfs: fix lockdep splat with reloc root extent buffers")
0a27a0474d14 ("btrfs: move lockdep class helpers to locking.c")
b4be6aefa73c ("btrfs: do not start relocation until in progress drops are done")
fdfbf020664b ("btrfs: rework async transaction committing")
54230013d41f ("btrfs: get rid of root->orphan_cleanup_state")
ae5d29d4e70a ("btrfs: inline wait_current_trans_commit_start in its caller")
32cc4f8759e1 ("btrfs: sink wait_for_unblock parameter to async commit")
e9306ad4ef5c ("btrfs: more graceful errors/warnings on 32bit systems when reaching limits")
bc03f39ec3c1 ("btrfs: use a bit to track the existence of tree mod log users")
406808ab2f0b ("btrfs: use booleans where appropriate for the tree mod log functions")
f3a84ccd28d0 ("btrfs: move the tree mod log code into its own file")
dbcc7d57bffc ("btrfs: fix race when cloning extent buffer during rewind of an old root")
cac06d843f25 ("btrfs: introduce the skeleton of btrfs_subpage structure")
2f96e40212d4 ("btrfs: fix possible free space tree corruption with online conversion")
1aaac38c83a2 ("btrfs: don't allow tree block to cross page boundary for subpage support")
948462294577 ("btrfs: keep sb cache_generation consistent with space_cache")
8b228324a8ce ("btrfs: clear free space tree on ro->rw remount")
8cd2908846d1 ("btrfs: clear oneshot options on mount and remount")
5011139a4718 ("btrfs: create free space tree on ro->rw remount")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cbddcc4fa3443fe8cfb2ff8e210deb1f6a0eea38 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Date: Tue, 20 Sep 2022 22:43:51 +0900
Subject: [PATCH] btrfs: set generation before calling btrfs_clean_tree_block
in btrfs_init_new_buffer
syzbot is reporting uninit-value in btrfs_clean_tree_block() [1], for
commit bc877d285ca3dba2 ("btrfs: Deduplicate extent_buffer init code")
missed that btrfs_set_header_generation() in btrfs_init_new_buffer() must
not be moved to after clean_tree_block() because clean_tree_block() is
calling btrfs_header_generation() since commit 55c69072d6bd5be1 ("Btrfs:
Fix extent_buffer usage when nodesize != leafsize").
Since memzero_extent_buffer() will reset "struct btrfs_header" part, we
can't move btrfs_set_header_generation() to before memzero_extent_buffer().
Just re-add btrfs_set_header_generation() before btrfs_clean_tree_block().
Link: https://syzkaller.appspot.com/bug?extid=fba8e2116a12609b6c59 [1]
Reported-by: syzbot <syzbot+fba8e2116a12609b6c59(a)syzkaller.appspotmail.com>
Fixes: bc877d285ca3dba2 ("btrfs: Deduplicate extent_buffer init code")
CC: stable(a)vger.kernel.org # 4.19+
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ae94be318a0f..cd2d36580f1a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4890,6 +4890,9 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
!test_bit(BTRFS_ROOT_RESET_LOCKDEP_CLASS, &root->state))
lockdep_owner = BTRFS_FS_TREE_OBJECTID;
+ /* btrfs_clean_tree_block() accesses generation field. */
+ btrfs_set_header_generation(buf, trans->transid);
+
/*
* This needs to stay, because we could allocate a freed block from an
* old tree into a new tree, so we need to make sure this new block is
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
81d5d61454c3 ("btrfs: enhance unsupported compat RO flags handling")
dfe8aec4520b ("btrfs: add a btrfs_block_group_root() helper")
b6e9f16c5fda ("btrfs: replace open coded while loop with proper construct")
42437a6386ff ("btrfs: introduce mount option rescue=ignorebadroots")
68319c18cb21 ("btrfs: show rescue=usebackuproot in /proc/mounts")
ab0b4a3ebf14 ("btrfs: add a helper to print out rescue= options")
ceafe3cc3992 ("btrfs: sysfs: export supported rescue= mount options")
334c16d82cfe ("btrfs: push the NODATASUM check into btrfs_lookup_bio_sums")
d70bf7484f72 ("btrfs: unify the ro checking for mount options")
7573df5547c0 ("btrfs: sysfs: export supported send stream version")
3ef3959b29c4 ("btrfs: don't show full path of bind mounts in subvol=")
74ef00185eb8 ("btrfs: introduce "rescue=" mount option")
e3ba67a108ff ("btrfs: factor out reading of bg from find_frist_block_group")
89d7da9bc592 ("btrfs: get mapping tree directly from fsinfo in find_first_block_group")
c730ae0c6bb3 ("btrfs: convert comments to fallthrough annotations")
aeb935a45581 ("btrfs: don't set SHAREABLE flag for data reloc tree")
92a7cc425223 ("btrfs: rename BTRFS_ROOT_REF_COWS to BTRFS_ROOT_SHAREABLE")
3be4d8efe3cf ("btrfs: block-group: rename write_one_cache_group()")
97f4728af888 ("btrfs: block-group: refactor how we insert a block group item")
7357623a7f4b ("btrfs: block-group: refactor how we delete one block group item")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 81d5d61454c365718655cfc87d8200c84e25d596 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Tue, 9 Aug 2022 13:02:16 +0800
Subject: [PATCH] btrfs: enhance unsupported compat RO flags handling
Currently there are two corner cases not handling compat RO flags
correctly:
- Remount
We can still mount the fs RO with compat RO flags, then remount it RW.
We should not allow any write into a fs with unsupported RO flags.
- Still try to search block group items
In fact, behavior/on-disk format change to extent tree should not
need a full incompat flag.
And since we can ensure fs with unsupported RO flags never got any
writes (with above case fixed), then we can even skip block group
items search at mount time.
This patch will enhance the unsupported RO compat flags by:
- Reject read-write remount if there are unsupported RO compat flags
- Go dummy block group items directly for unsupported RO compat flags
In fact, only changes to chunk/subvolume/root/csum trees should go
incompat flags.
The latter part should allow future change to extent tree to be compat
RO flags.
Thus this patch also needs to be backported to all stable trees.
CC: stable(a)vger.kernel.org # 4.9+
Reviewed-by: Nikolay Borisov <nborisov(a)suse.com>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 53c44c52cb79..e7b5a54c8258 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2164,7 +2164,16 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
int need_clear = 0;
u64 cache_gen;
- if (!root)
+ /*
+ * Either no extent root (with ibadroots rescue option) or we have
+ * unsupported RO options. The fs can never be mounted read-write, so no
+ * need to waste time searching block group items.
+ *
+ * This also allows new extent tree related changes to be RO compat,
+ * no need for a full incompat flag.
+ */
+ if (!root || (btrfs_super_compat_ro_flags(info->super_copy) &
+ ~BTRFS_FEATURE_COMPAT_RO_SUPP))
return fill_dummy_bgs(info);
key.objectid = 0;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 7291e9d67e92..eb0ae7e396ef 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2117,6 +2117,15 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
ret = -EINVAL;
goto restore;
}
+ if (btrfs_super_compat_ro_flags(fs_info->super_copy) &
+ ~BTRFS_FEATURE_COMPAT_RO_SUPP) {
+ btrfs_err(fs_info,
+ "can not remount read-write due to unsupported optional flags 0x%llx",
+ btrfs_super_compat_ro_flags(fs_info->super_copy) &
+ ~BTRFS_FEATURE_COMPAT_RO_SUPP);
+ ret = -EINVAL;
+ goto restore;
+ }
if (fs_info->fs_devices->rw_devices == 0) {
ret = -EACCES;
goto restore;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
81d5d61454c3 ("btrfs: enhance unsupported compat RO flags handling")
dfe8aec4520b ("btrfs: add a btrfs_block_group_root() helper")
b6e9f16c5fda ("btrfs: replace open coded while loop with proper construct")
42437a6386ff ("btrfs: introduce mount option rescue=ignorebadroots")
68319c18cb21 ("btrfs: show rescue=usebackuproot in /proc/mounts")
ab0b4a3ebf14 ("btrfs: add a helper to print out rescue= options")
ceafe3cc3992 ("btrfs: sysfs: export supported rescue= mount options")
334c16d82cfe ("btrfs: push the NODATASUM check into btrfs_lookup_bio_sums")
d70bf7484f72 ("btrfs: unify the ro checking for mount options")
7573df5547c0 ("btrfs: sysfs: export supported send stream version")
3ef3959b29c4 ("btrfs: don't show full path of bind mounts in subvol=")
74ef00185eb8 ("btrfs: introduce "rescue=" mount option")
e3ba67a108ff ("btrfs: factor out reading of bg from find_frist_block_group")
89d7da9bc592 ("btrfs: get mapping tree directly from fsinfo in find_first_block_group")
c730ae0c6bb3 ("btrfs: convert comments to fallthrough annotations")
aeb935a45581 ("btrfs: don't set SHAREABLE flag for data reloc tree")
92a7cc425223 ("btrfs: rename BTRFS_ROOT_REF_COWS to BTRFS_ROOT_SHAREABLE")
3be4d8efe3cf ("btrfs: block-group: rename write_one_cache_group()")
97f4728af888 ("btrfs: block-group: refactor how we insert a block group item")
7357623a7f4b ("btrfs: block-group: refactor how we delete one block group item")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 81d5d61454c365718655cfc87d8200c84e25d596 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Tue, 9 Aug 2022 13:02:16 +0800
Subject: [PATCH] btrfs: enhance unsupported compat RO flags handling
Currently there are two corner cases not handling compat RO flags
correctly:
- Remount
We can still mount the fs RO with compat RO flags, then remount it RW.
We should not allow any write into a fs with unsupported RO flags.
- Still try to search block group items
In fact, behavior/on-disk format change to extent tree should not
need a full incompat flag.
And since we can ensure fs with unsupported RO flags never got any
writes (with above case fixed), then we can even skip block group
items search at mount time.
This patch will enhance the unsupported RO compat flags by:
- Reject read-write remount if there are unsupported RO compat flags
- Go dummy block group items directly for unsupported RO compat flags
In fact, only changes to chunk/subvolume/root/csum trees should go
incompat flags.
The latter part should allow future change to extent tree to be compat
RO flags.
Thus this patch also needs to be backported to all stable trees.
CC: stable(a)vger.kernel.org # 4.9+
Reviewed-by: Nikolay Borisov <nborisov(a)suse.com>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 53c44c52cb79..e7b5a54c8258 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2164,7 +2164,16 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
int need_clear = 0;
u64 cache_gen;
- if (!root)
+ /*
+ * Either no extent root (with ibadroots rescue option) or we have
+ * unsupported RO options. The fs can never be mounted read-write, so no
+ * need to waste time searching block group items.
+ *
+ * This also allows new extent tree related changes to be RO compat,
+ * no need for a full incompat flag.
+ */
+ if (!root || (btrfs_super_compat_ro_flags(info->super_copy) &
+ ~BTRFS_FEATURE_COMPAT_RO_SUPP))
return fill_dummy_bgs(info);
key.objectid = 0;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 7291e9d67e92..eb0ae7e396ef 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2117,6 +2117,15 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
ret = -EINVAL;
goto restore;
}
+ if (btrfs_super_compat_ro_flags(fs_info->super_copy) &
+ ~BTRFS_FEATURE_COMPAT_RO_SUPP) {
+ btrfs_err(fs_info,
+ "can not remount read-write due to unsupported optional flags 0x%llx",
+ btrfs_super_compat_ro_flags(fs_info->super_copy) &
+ ~BTRFS_FEATURE_COMPAT_RO_SUPP);
+ ret = -EINVAL;
+ goto restore;
+ }
if (fs_info->fs_devices->rw_devices == 0) {
ret = -EACCES;
goto restore;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
81d5d61454c3 ("btrfs: enhance unsupported compat RO flags handling")
dfe8aec4520b ("btrfs: add a btrfs_block_group_root() helper")
b6e9f16c5fda ("btrfs: replace open coded while loop with proper construct")
42437a6386ff ("btrfs: introduce mount option rescue=ignorebadroots")
68319c18cb21 ("btrfs: show rescue=usebackuproot in /proc/mounts")
ab0b4a3ebf14 ("btrfs: add a helper to print out rescue= options")
ceafe3cc3992 ("btrfs: sysfs: export supported rescue= mount options")
334c16d82cfe ("btrfs: push the NODATASUM check into btrfs_lookup_bio_sums")
d70bf7484f72 ("btrfs: unify the ro checking for mount options")
7573df5547c0 ("btrfs: sysfs: export supported send stream version")
3ef3959b29c4 ("btrfs: don't show full path of bind mounts in subvol=")
74ef00185eb8 ("btrfs: introduce "rescue=" mount option")
e3ba67a108ff ("btrfs: factor out reading of bg from find_frist_block_group")
89d7da9bc592 ("btrfs: get mapping tree directly from fsinfo in find_first_block_group")
c730ae0c6bb3 ("btrfs: convert comments to fallthrough annotations")
aeb935a45581 ("btrfs: don't set SHAREABLE flag for data reloc tree")
92a7cc425223 ("btrfs: rename BTRFS_ROOT_REF_COWS to BTRFS_ROOT_SHAREABLE")
3be4d8efe3cf ("btrfs: block-group: rename write_one_cache_group()")
97f4728af888 ("btrfs: block-group: refactor how we insert a block group item")
7357623a7f4b ("btrfs: block-group: refactor how we delete one block group item")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 81d5d61454c365718655cfc87d8200c84e25d596 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Tue, 9 Aug 2022 13:02:16 +0800
Subject: [PATCH] btrfs: enhance unsupported compat RO flags handling
Currently there are two corner cases not handling compat RO flags
correctly:
- Remount
We can still mount the fs RO with compat RO flags, then remount it RW.
We should not allow any write into a fs with unsupported RO flags.
- Still try to search block group items
In fact, behavior/on-disk format change to extent tree should not
need a full incompat flag.
And since we can ensure fs with unsupported RO flags never got any
writes (with above case fixed), then we can even skip block group
items search at mount time.
This patch will enhance the unsupported RO compat flags by:
- Reject read-write remount if there are unsupported RO compat flags
- Go dummy block group items directly for unsupported RO compat flags
In fact, only changes to chunk/subvolume/root/csum trees should go
incompat flags.
The latter part should allow future change to extent tree to be compat
RO flags.
Thus this patch also needs to be backported to all stable trees.
CC: stable(a)vger.kernel.org # 4.9+
Reviewed-by: Nikolay Borisov <nborisov(a)suse.com>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 53c44c52cb79..e7b5a54c8258 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2164,7 +2164,16 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
int need_clear = 0;
u64 cache_gen;
- if (!root)
+ /*
+ * Either no extent root (with ibadroots rescue option) or we have
+ * unsupported RO options. The fs can never be mounted read-write, so no
+ * need to waste time searching block group items.
+ *
+ * This also allows new extent tree related changes to be RO compat,
+ * no need for a full incompat flag.
+ */
+ if (!root || (btrfs_super_compat_ro_flags(info->super_copy) &
+ ~BTRFS_FEATURE_COMPAT_RO_SUPP))
return fill_dummy_bgs(info);
key.objectid = 0;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 7291e9d67e92..eb0ae7e396ef 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2117,6 +2117,15 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
ret = -EINVAL;
goto restore;
}
+ if (btrfs_super_compat_ro_flags(fs_info->super_copy) &
+ ~BTRFS_FEATURE_COMPAT_RO_SUPP) {
+ btrfs_err(fs_info,
+ "can not remount read-write due to unsupported optional flags 0x%llx",
+ btrfs_super_compat_ro_flags(fs_info->super_copy) &
+ ~BTRFS_FEATURE_COMPAT_RO_SUPP);
+ ret = -EINVAL;
+ goto restore;
+ }
if (fs_info->fs_devices->rw_devices == 0) {
ret = -EACCES;
goto restore;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
81d5d61454c3 ("btrfs: enhance unsupported compat RO flags handling")
dfe8aec4520b ("btrfs: add a btrfs_block_group_root() helper")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 81d5d61454c365718655cfc87d8200c84e25d596 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Tue, 9 Aug 2022 13:02:16 +0800
Subject: [PATCH] btrfs: enhance unsupported compat RO flags handling
Currently there are two corner cases not handling compat RO flags
correctly:
- Remount
We can still mount the fs RO with compat RO flags, then remount it RW.
We should not allow any write into a fs with unsupported RO flags.
- Still try to search block group items
In fact, behavior/on-disk format change to extent tree should not
need a full incompat flag.
And since we can ensure fs with unsupported RO flags never got any
writes (with above case fixed), then we can even skip block group
items search at mount time.
This patch will enhance the unsupported RO compat flags by:
- Reject read-write remount if there are unsupported RO compat flags
- Go dummy block group items directly for unsupported RO compat flags
In fact, only changes to chunk/subvolume/root/csum trees should go
incompat flags.
The latter part should allow future change to extent tree to be compat
RO flags.
Thus this patch also needs to be backported to all stable trees.
CC: stable(a)vger.kernel.org # 4.9+
Reviewed-by: Nikolay Borisov <nborisov(a)suse.com>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 53c44c52cb79..e7b5a54c8258 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2164,7 +2164,16 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
int need_clear = 0;
u64 cache_gen;
- if (!root)
+ /*
+ * Either no extent root (with ibadroots rescue option) or we have
+ * unsupported RO options. The fs can never be mounted read-write, so no
+ * need to waste time searching block group items.
+ *
+ * This also allows new extent tree related changes to be RO compat,
+ * no need for a full incompat flag.
+ */
+ if (!root || (btrfs_super_compat_ro_flags(info->super_copy) &
+ ~BTRFS_FEATURE_COMPAT_RO_SUPP))
return fill_dummy_bgs(info);
key.objectid = 0;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 7291e9d67e92..eb0ae7e396ef 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2117,6 +2117,15 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
ret = -EINVAL;
goto restore;
}
+ if (btrfs_super_compat_ro_flags(fs_info->super_copy) &
+ ~BTRFS_FEATURE_COMPAT_RO_SUPP) {
+ btrfs_err(fs_info,
+ "can not remount read-write due to unsupported optional flags 0x%llx",
+ btrfs_super_compat_ro_flags(fs_info->super_copy) &
+ ~BTRFS_FEATURE_COMPAT_RO_SUPP);
+ ret = -EINVAL;
+ goto restore;
+ }
if (fs_info->fs_devices->rw_devices == 0) {
ret = -EACCES;
goto restore;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
88541cb414b7 ("ksmbd: fix incorrect handling of iterate_dir")
65ca7a3ffff8 ("ksmbd: handle smb2 query dir request for OutputBufferLength that is too small")
04e260948a16 ("ksmbd: don't align last entry offset in smb2 query directory")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 88541cb414b7a2450c45fc9c131b37b5753b7679 Mon Sep 17 00:00:00 2001
From: Namjae Jeon <linkinjeon(a)kernel.org>
Date: Fri, 9 Sep 2022 17:43:53 +0900
Subject: [PATCH] ksmbd: fix incorrect handling of iterate_dir
if iterate_dir() returns non-negative value, caller has to treat it
as normal and check there is any error while populating dentry
information. ksmbd doesn't have to do anything because ksmbd already
checks too small OutputBufferLength to store one file information.
And because ctx->pos is set to file->f_pos when iterative_dir is called,
remove restart_ctx(). And if iterate_dir() return -EIO, which mean
directory entry is corrupted, return STATUS_FILE_CORRUPT_ERROR error
response.
This patch fixes some failure of SMB2_QUERY_DIRECTORY, which happens when
ntfs3 is local filesystem.
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Cc: stable(a)vger.kernel.org
Signed-off-by: Hyunchul Lee <hyc.lee(a)gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon(a)kernel.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/ksmbd/smb2pdu.c b/fs/ksmbd/smb2pdu.c
index ba74aba2f1d3..634e21bba770 100644
--- a/fs/ksmbd/smb2pdu.c
+++ b/fs/ksmbd/smb2pdu.c
@@ -3809,11 +3809,6 @@ static int __query_dir(struct dir_context *ctx, const char *name, int namlen,
return 0;
}
-static void restart_ctx(struct dir_context *ctx)
-{
- ctx->pos = 0;
-}
-
static int verify_info_level(int info_level)
{
switch (info_level) {
@@ -3921,7 +3916,6 @@ int smb2_query_dir(struct ksmbd_work *work)
if (srch_flag & SMB2_REOPEN || srch_flag & SMB2_RESTART_SCANS) {
ksmbd_debug(SMB, "Restart directory scan\n");
generic_file_llseek(dir_fp->filp, 0, SEEK_SET);
- restart_ctx(&dir_fp->readdir_data.ctx);
}
memset(&d_info, 0, sizeof(struct ksmbd_dir_info));
@@ -3968,11 +3962,9 @@ int smb2_query_dir(struct ksmbd_work *work)
*/
if (!d_info.out_buf_len && !d_info.num_entry)
goto no_buf_len;
- if (rc == 0)
- restart_ctx(&dir_fp->readdir_data.ctx);
- if (rc == -ENOSPC)
+ if (rc > 0 || rc == -ENOSPC)
rc = 0;
- if (rc)
+ else if (rc)
goto err_out;
d_info.wptr = d_info.rptr;
@@ -4029,6 +4021,8 @@ int smb2_query_dir(struct ksmbd_work *work)
rsp->hdr.Status = STATUS_NO_MEMORY;
else if (rc == -EFAULT)
rsp->hdr.Status = STATUS_INVALID_INFO_CLASS;
+ else if (rc == -EIO)
+ rsp->hdr.Status = STATUS_FILE_CORRUPT_ERROR;
if (!rsp->hdr.Status)
rsp->hdr.Status = STATUS_UNEXPECTED_IO_ERROR;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
592642e6b11e ("scsi: qedf: Populate sysfs attributes for vport")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 592642e6b11e620e4b43189f8072752429fc8dc3 Mon Sep 17 00:00:00 2001
From: Saurav Kashyap <skashyap(a)marvell.com>
Date: Mon, 19 Sep 2022 06:44:34 -0700
Subject: [PATCH] scsi: qedf: Populate sysfs attributes for vport
Few vport parameters were displayed by systool as 'Unknown' or 'NULL'.
Copy speed, supported_speed, frame_size and update port_type for NPIV port.
Link: https://lore.kernel.org/r/20220919134434.3513-1-njavali@marvell.com
Cc: stable(a)vger.kernel.org
Tested-by: Guangwu Zhang <guazhang(a)redhat.com>
Reviewed-by: John Meneghini <jmeneghi(a)redhat.com>
Signed-off-by: Saurav Kashyap <skashyap(a)marvell.com>
Signed-off-by: Nilesh Javali <njavali(a)marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/qedf/qedf_main.c b/drivers/scsi/qedf/qedf_main.c
index 3d6b137314f3..cc6d9decf62c 100644
--- a/drivers/scsi/qedf/qedf_main.c
+++ b/drivers/scsi/qedf/qedf_main.c
@@ -1921,6 +1921,27 @@ static int qedf_vport_create(struct fc_vport *vport, bool disabled)
fc_vport_setlink(vn_port);
}
+ /* Set symbolic node name */
+ if (base_qedf->pdev->device == QL45xxx)
+ snprintf(fc_host_symbolic_name(vn_port->host), 256,
+ "Marvell FastLinQ 45xxx FCoE v%s", QEDF_VERSION);
+
+ if (base_qedf->pdev->device == QL41xxx)
+ snprintf(fc_host_symbolic_name(vn_port->host), 256,
+ "Marvell FastLinQ 41xxx FCoE v%s", QEDF_VERSION);
+
+ /* Set supported speed */
+ fc_host_supported_speeds(vn_port->host) = n_port->link_supported_speeds;
+
+ /* Set speed */
+ vn_port->link_speed = n_port->link_speed;
+
+ /* Set port type */
+ fc_host_port_type(vn_port->host) = FC_PORTTYPE_NPIV;
+
+ /* Set maxframe size */
+ fc_host_maxframe_size(vn_port->host) = n_port->mfs;
+
QEDF_INFO(&(base_qedf->dbg_ctx), QEDF_LOG_NPIV, "vn_port=%p.\n",
vn_port);