Commit e8ae0e140c05 ("vfio: Require that devices support DMA cache
coherence") requires IOMMU drivers to advertise
IOMMU_CAP_CACHE_COHERENCY, in order to be used by VFIO. Since VFIO does
not provide to userspace the ability to maintain coherency through cache
invalidations, it requires hardware coherency. Advertise the capability
in order to restore VFIO support.
The meaning of IOMMU_CAP_CACHE_COHERENCY also changed from "IOMMU can
enforce cache coherent DMA transactions" to "IOMMU_CACHE is supported".
While virtio-iommu cannot enforce coherency (of PCIe no-snoop
transactions), it does support IOMMU_CACHE.
We can distinguish different cases of non-coherent DMA:
(1) When accesses from a hardware endpoint are not coherent. The host
would describe such a device using firmware methods ('dma-coherent'
in device-tree, '_CCA' in ACPI), since they are also needed without
a vIOMMU. In this case mappings are created without IOMMU_CACHE.
virtio-iommu doesn't need any additional support. It sends the same
requests as for coherent devices.
(2) When the physical IOMMU supports non-cacheable mappings. Supporting
those would require a new feature in virtio-iommu, new PROBE request
property and MAP flags. Device drivers would use a new API to
discover this since it depends on the architecture and the physical
IOMMU.
(3) When the hardware supports PCIe no-snoop. It is possible for
assigned PCIe devices to issue no-snoop transactions, and the
virtio-iommu specification is lacking any mention of this.
Arm platforms don't necessarily support no-snoop, and those that do
cannot enforce coherency of no-snoop transactions. Device drivers
must be careful about assuming that no-snoop transactions won't end
up cached; see commit e02f5c1bb228 ("drm: disable uncached DMA
optimization for ARM and arm64"). On x86 platforms, the host may or
may not enforce coherency of no-snoop transactions with the physical
IOMMU. But according to the above commit, on x86 a driver which
assumes that no-snoop DMA is compatible with uncached CPU mappings
will also work if the host enforces coherency.
Although these issues are not specific to virtio-iommu, it could be
used to facilitate discovery and configuration of no-snoop. This
would require a new feature bit, PROBE property and ATTACH/MAP
flags.
Cc: stable(a)vger.kernel.org
Fixes: e8ae0e140c05 ("vfio: Require that devices support DMA cache coherence")
Signed-off-by: Jean-Philippe Brucker <jean-philippe(a)linaro.org>
---
Since v2 [1], I tried to refine the commit message.
This fix is needed for v5.19 and v6.0.
I can improve the check once Robin's change [2] is merged:
capable(IOMMU_CAP_CACHE_COHERENCY) could return dev->dma_coherent for
case (1) above.
[1] https://lore.kernel.org/linux-iommu/20220818163801.1011548-1-jean-philippe@…
[2] https://lore.kernel.org/linux-iommu/d8bd8777d06929ad8f49df7fc80e1b9af32a41b…
---
drivers/iommu/virtio-iommu.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 08eeafc9529f..80151176ba12 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -1006,7 +1006,18 @@ static int viommu_of_xlate(struct device *dev, struct of_phandle_args *args)
return iommu_fwspec_add_ids(dev, args->args, 1);
}
+static bool viommu_capable(enum iommu_cap cap)
+{
+ switch (cap) {
+ case IOMMU_CAP_CACHE_COHERENCY:
+ return true;
+ default:
+ return false;
+ }
+}
+
static struct iommu_ops viommu_ops = {
+ .capable = viommu_capable,
.domain_alloc = viommu_domain_alloc,
.probe_device = viommu_probe_device,
.probe_finalize = viommu_probe_finalize,
--
2.37.1
HDP flush is used early in the init sequence as part of memory controller
block initialization. Hence remapping of HDP registers needed for flush
needs to happen earlier.
This also fixes the Unsupported Request error reported through AER during
driver load. The error happens as a write happens to the remap offset
before real remapping is done.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373
The error was unnoticed before and got visible because of the commit
referenced below. This doesn't fix anything in the commit below, rather
fixes the issue in amdgpu exposed by the commit. The reference is only
to associate this commit with below one so that both go together.
Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
Reported-by: Tom Seewald <tseewald(a)gmail.com>
Signed-off-by: Lijo Lazar <lijo.lazar(a)amd.com>
Cc: stable(a)vger.kernel.org
---
v2:
Take care of IP resume cases (Alex Deucher)
Add NULL check to nbio.funcs to cover older (GFXv8) ASICs (Felix Kuehling)
Add more details in commit message and associate with AER patch (Bjorn
Helgaas)
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++++++++++++++++++++++
drivers/gpu/drm/amd/amdgpu/nv.c | 6 ------
drivers/gpu/drm/amd/amdgpu/soc15.c | 6 ------
drivers/gpu/drm/amd/amdgpu/soc21.c | 6 ------
4 files changed, 24 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index ce7d117efdb5..e420118769a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2334,6 +2334,26 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
return 0;
}
+/**
+ * amdgpu_device_prepare_ip - prepare IPs for hardware initialization
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Any common hardware initialization sequence that needs to be done before
+ * hw init of individual IPs is performed here. This is different from the
+ * 'common block' which initializes a set of IPs.
+ */
+static void amdgpu_device_prepare_ip(struct amdgpu_device *adev)
+{
+ /* Remap HDP registers to a hole in mmio space, for the purpose
+ * of exposing those registers to process space. This needs to be
+ * done before hw init of ip blocks to take care of HDP flush
+ * operations through registers during hw_init.
+ */
+ if (adev->nbio.funcs && adev->nbio.funcs->remap_hdp_registers &&
+ !amdgpu_sriov_vf(adev))
+ adev->nbio.funcs->remap_hdp_registers(adev);
+}
/**
* amdgpu_device_ip_init - run init for hardware IPs
@@ -2376,6 +2396,8 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
DRM_ERROR("amdgpu_vram_scratch_init failed %d\n", r);
goto init_failed;
}
+
+ amdgpu_device_prepare_ip(adev);
r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
if (r) {
DRM_ERROR("hw_init %d failed %d\n", i, r);
@@ -3058,6 +3080,7 @@ static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
AMD_IP_BLOCK_TYPE_IH,
};
+ amdgpu_device_prepare_ip(adev);
for (i = 0; i < adev->num_ip_blocks; i++) {
int j;
struct amdgpu_ip_block *block;
@@ -3139,6 +3162,7 @@ static int amdgpu_device_ip_resume_phase1(struct amdgpu_device *adev)
{
int i, r;
+ amdgpu_device_prepare_ip(adev);
for (i = 0; i < adev->num_ip_blocks; i++) {
if (!adev->ip_blocks[i].status.valid || adev->ip_blocks[i].status.hw)
continue;
diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index b3fba8dea63c..3ac7fef74277 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -1032,12 +1032,6 @@ static int nv_common_hw_init(void *handle)
nv_program_aspm(adev);
/* setup nbio registers */
adev->nbio.funcs->init_registers(adev);
- /* remap HDP registers to a hole in mmio space,
- * for the purpose of expose those registers
- * to process space
- */
- if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
- adev->nbio.funcs->remap_hdp_registers(adev);
/* enable the doorbell aperture */
nv_enable_doorbell_aperture(adev, true);
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index fde6154f2009..a0481e37d7cf 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -1240,12 +1240,6 @@ static int soc15_common_hw_init(void *handle)
soc15_program_aspm(adev);
/* setup nbio registers */
adev->nbio.funcs->init_registers(adev);
- /* remap HDP registers to a hole in mmio space,
- * for the purpose of expose those registers
- * to process space
- */
- if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
- adev->nbio.funcs->remap_hdp_registers(adev);
/* enable the doorbell aperture */
soc15_enable_doorbell_aperture(adev, true);
diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
index 55284b24f113..16b447055102 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc21.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
@@ -660,12 +660,6 @@ static int soc21_common_hw_init(void *handle)
soc21_program_aspm(adev);
/* setup nbio registers */
adev->nbio.funcs->init_registers(adev);
- /* remap HDP registers to a hole in mmio space,
- * for the purpose of expose those registers
- * to process space
- */
- if (adev->nbio.funcs->remap_hdp_registers)
- adev->nbio.funcs->remap_hdp_registers(adev);
/* enable the doorbell aperture */
soc21_enable_doorbell_aperture(adev, true);
--
2.25.1
Currently there are two corner cases not handling compat RO flags
correctly:
- Remount
We can still mount the fs RO with compat RO flags, then remount it RW.
We should not allow any write into a fs with unsupported RO flags.
- Still try to search block group items
In fact, behavior/on-disk format change to extent tree should not
need a full incompat flag.
And since we can ensure fs with unsupported RO flags never got any
writes (with above case fixed), then we can even skip block group
items search at mount time.
This patch will enhance the unsupported RO compat flags by:
- Reject RW remount if there is unsupported RO compat flags
- Go dummy block group items directly for unsupported RO compat flags
In fact, only changes to chunk/subvolume/root/csum trees should go
incompat flags.
The latter part should allow future change to extent tree to be compat
RO flags.
Thus this patch also needs to be backported to all stable trees.
Cc: stable(a)vger.kernel.org
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
fs/btrfs/block-group.c | 11 ++++++++++-
fs/btrfs/super.c | 9 +++++++++
2 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 38a740d5297c..4a78136bcd17 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2186,7 +2186,16 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
int need_clear = 0;
u64 cache_gen;
- if (!root)
+ /*
+ * Either no extent root (with ibadroots rescue option) or we have
+ * unsupporter RO options. The fs can never be mounted RW, so no
+ * need to waste time search block group items.
+ *
+ * This also allows new extent tree related changes to be RO compat,
+ * no need for a full incompat flag.
+ */
+ if (!root || (btrfs_super_compat_ro_flags(info->super_copy) &
+ ~BTRFS_FEATURE_COMPAT_RO_SUPP))
return fill_dummy_bgs(info);
key.objectid = 0;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 4c7089b1681b..7d3213e67fb5 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2110,6 +2110,15 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
ret = -EINVAL;
goto restore;
}
+ if (btrfs_super_compat_ro_flags(fs_info->super_copy) &
+ ~BTRFS_FEATURE_COMPAT_RO_SUPP) {
+ btrfs_err(fs_info,
+ "can not remount read-write due to unsupported optional flags 0x%llx",
+ btrfs_super_compat_ro_flags(fs_info->super_copy) &
+ ~BTRFS_FEATURE_COMPAT_RO_SUPP);
+ ret = -EINVAL;
+ goto restore;
+ }
if (fs_info->fs_devices->rw_devices == 0) {
ret = -EACCES;
goto restore;
--
2.37.0
In sgx_init(), if misc_register() fails or misc_register() succeeds but
neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
prematurely stopped. This may leave some unsanitized pages, which does
not matter, because SGX will be disabled for the whole power cycle.
This triggers WARN_ON() because sgx_dirty_page_list ends up being
non-empty, and dumps the call stack:
[ 0.268103] sgx: EPC section 0x40200000-0x45f7ffff
[ 0.268591] ------------[ cut here ]------------
[ 0.268592] WARNING: CPU: 6 PID: 83 at
arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
[ 0.268598] Modules linked in:
[ 0.268600] CPU: 6 PID: 83 Comm: ksgxd Not tainted 6.0.0-rc2 #382
[ 0.268603] Hardware name: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0
07/06/2022
[ 0.268604] RIP: 0010:ksgxd+0x1b7/0x1d0
[ 0.268607] Code: ff e9 f2 fe ff ff 48 89 df e8 75 07 0e 00 84 c0 0f
84 c3 fe ff ff 31 ff e8 e6 07 0e 00 84 c0 0f 85 94 fe ff ff e9 af fe ff
ff <0f> 0b e9 7f fe ff ff e8 dd 9c 95 00 66 66 2e 0f 1f 84 00 00 00 00
[ 0.268608] RSP: 0000:ffffb6c7404f3ed8 EFLAGS: 00010287
[ 0.268610] RAX: ffffb6c740431a10 RBX: ffff8dcd8117b400 RCX:
0000000000000000
[ 0.268612] RDX: 0000000080000000 RSI: ffffb6c7404319d0 RDI:
00000000ffffffff
[ 0.268613] RBP: ffff8dcd820a4d80 R08: ffff8dcd820a4180 R09:
ffff8dcd820a4180
[ 0.268614] R10: 0000000000000000 R11: 0000000000000006 R12:
ffffb6c74006bce0
[ 0.268615] R13: ffff8dcd80e63880 R14: ffffffffa8a60f10 R15:
0000000000000000
[ 0.268616] FS: 0000000000000000(0000) GS:ffff8dcf25580000(0000)
knlGS:0000000000000000
[ 0.268617] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.268619] CR2: 0000000000000000 CR3: 0000000213410001 CR4:
00000000003706e0
[ 0.268620] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 0.268621] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 0.268622] Call Trace:
[ 0.268624] <TASK>
[ 0.268627] ? _raw_spin_lock_irqsave+0x24/0x60
[ 0.268632] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 0.268634] ? __kthread_parkme+0x36/0x90
[ 0.268637] kthread+0xe5/0x110
[ 0.268639] ? kthread_complete_and_exit+0x20/0x20
[ 0.268642] ret_from_fork+0x1f/0x30
[ 0.268647] </TASK>
[ 0.268648] ---[ end trace 0000000000000000 ]---
Ultimately this can crash the kernel, if the following is set:
/proc/sys/kernel/panic_on_warn
In premature stop, print nothing, as the number is by practical means a
random number. Otherwise, it is an indicator of a bug in the driver, and
therefore print the number of unsanitized pages with pr_err().
Link: https://lore.kernel.org/linux-sgx/20220825051827.246698-1-jarkko@kernel.org…
Fixes: 51ab30eb2ad4 ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list")
Cc: stable(a)vger.kernel.org # v5.13+
Reported-by: Paul Menzel <pmenzel(a)molgen.mpg.de>
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
v6:
- Address Reinette's feedback:
https://lore.kernel.org/linux-sgx/Yw6%2FiTzSdSw%2FY%2FVO@kernel.org/
v5:
- Add the klog dump and sysctl option to the commit message.
v4:
- Explain expectations for dirty_page_list in the function header, instead
of an inline comment.
- Improve commit message to explain the conditions better.
- Return the number of pages left dirty to ksgxd() and print warning after
the 2nd call, if there are any.
v3:
- Remove WARN_ON().
- Tuned comments and the commit message a bit.
v2:
- Replaced WARN_ON() with optional pr_info() inside
__sgx_sanitize_pages().
- Rewrote the commit message.
- Added the fixes tag.
---
arch/x86/kernel/cpu/sgx/main.c | 42 ++++++++++++++++++++++++++++------
1 file changed, 35 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 515e2a5f25bb..bcd6b64961bd 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -49,17 +49,20 @@ static LIST_HEAD(sgx_dirty_page_list);
* Reset post-kexec EPC pages to the uninitialized state. The pages are removed
* from the input list, and made available for the page allocator. SECS pages
* prepending their children in the input list are left intact.
+ *
+ * Contents of the @dirty_page_list must be thread-local, i.e.
+ * not shared by multiple threads.
*/
-static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
+static long __sgx_sanitize_pages(struct list_head *dirty_page_list)
{
struct sgx_epc_page *page;
+ long left_dirty = 0;
LIST_HEAD(dirty);
int ret;
- /* dirty_page_list is thread-local, no need for a lock: */
while (!list_empty(dirty_page_list)) {
if (kthread_should_stop())
- return;
+ return -ECANCELED;
page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
@@ -92,12 +95,14 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
} else {
/* The page is not yet clean - move to the dirty list. */
list_move_tail(&page->list, &dirty);
+ left_dirty++;
}
cond_resched();
}
list_splice(&dirty, dirty_page_list);
+ return left_dirty;
}
static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page)
@@ -388,17 +393,40 @@ void sgx_reclaim_direct(void)
static int ksgxd(void *p)
{
+ long ret;
+
set_freezable();
/*
* Sanitize pages in order to recover from kexec(). The 2nd pass is
* required for SECS pages, whose child pages blocked EREMOVE.
*/
- __sgx_sanitize_pages(&sgx_dirty_page_list);
- __sgx_sanitize_pages(&sgx_dirty_page_list);
+ ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
+ if (ret == -ECANCELED)
+ /* kthread stopped */
+ return 0;
- /* sanity check: */
- WARN_ON(!list_empty(&sgx_dirty_page_list));
+ ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
+ switch (ret) {
+ case 0:
+ /* success, no unsanitized pages */
+ break;
+
+ case -ECANCELED:
+ /* kthread stopped */
+ return 0;
+
+ default:
+ /*
+ * Never expected to happen in a working driver. If it happens
+ * the bug is expected to be in the sanitization process, but
+ * successfully sanitized pages are still valid and driver can
+ * be used and most importantly debugged without issues. To put
+ * short, the global state of kernel is not corrupted so no
+ * reason to do any more complicated rollback.
+ */
+ pr_err("%ld unsanitized pages\n", ret);
+ }
while (!kthread_should_stop()) {
if (try_to_freeze())
--
2.37.2
The advertisement of the persistent grants feature (writing
'feature-persistent' to xenbus) should mean not the decision for using
the feature but only the availability of the feature. However, commit
aac8a70db24b ("xen-blkback: add a parameter for disabling of persistent
grants") made a field of blkback, which was a place for saving only the
negotiation result, to be used for yet another purpose: caching of the
'feature_persistent' parameter value. As a result, the advertisement,
which should follow only the parameter value, becomes inconsistent.
This commit fixes the misuse of the semantic by making blkback saves the
parameter value in a separate place and advertises the support based on
only the saved value.
Fixes: aac8a70db24b ("xen-blkback: add a parameter for disabling of persistent grants")
Cc: <stable(a)vger.kernel.org> # 5.10.x
Suggested-by: Juergen Gross <jgross(a)suse.com>
Signed-off-by: SeongJae Park <sj(a)kernel.org>
---
drivers/block/xen-blkback/common.h | 3 +++
drivers/block/xen-blkback/xenbus.c | 6 ++++--
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index bda5c815e441..a28473470e66 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -226,6 +226,9 @@ struct xen_vbd {
sector_t size;
unsigned int flush_support:1;
unsigned int discard_secure:1;
+ /* Connect-time cached feature_persistent parameter value */
+ unsigned int feature_gnt_persistent_parm:1;
+ /* Persistent grants feature negotiation result */
unsigned int feature_gnt_persistent:1;
unsigned int overflow_max_grants:1;
};
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index ee7ad2fb432d..c0227dfa4688 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -907,7 +907,7 @@ static void connect(struct backend_info *be)
xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u",
- be->blkif->vbd.feature_gnt_persistent);
+ be->blkif->vbd.feature_gnt_persistent_parm);
if (err) {
xenbus_dev_fatal(dev, err, "writing %s/feature-persistent",
dev->nodename);
@@ -1085,7 +1085,9 @@ static int connect_ring(struct backend_info *be)
return -ENOSYS;
}
- blkif->vbd.feature_gnt_persistent = feature_persistent &&
+ blkif->vbd.feature_gnt_persistent_parm = feature_persistent;
+ blkif->vbd.feature_gnt_persistent =
+ blkif->vbd.feature_gnt_persistent_parm &&
xenbus_read_unsigned(dev->otherend, "feature-persistent", 0);
blkif->vbd.overflow_max_grants = 0;
--
2.25.1