On 21.06.24 at 00:02, Xu Yilun wrote:
> On Thu, Jan 16, 2025 at 04:13:13PM +0100, Christian König wrote:
>> On 15.01.25 at 18:09, Jason Gunthorpe wrote:
>>
>> On Wed, Jan 15, 2025 at 05:34:23PM +0100, Christian König wrote:
>>
>> Granted, let me try to improve this.
>> Here is a real world example of one of the issues we ran into and why
>> CPU mappings of importers are redirected to the exporter.
>> We have a good bunch of different exporters who track the CPU mappings
>> of their backing store using address_space objects in one way or
>> another and then use unmap_mapping_range() to invalidate those CPU
>> mappings.
>> But when importers get the PFNs of the backing store they can look
>> behind the curtain and directly insert this PFN into the CPU page
>> tables.
>> We had literally tons of cases like this where driver developers caused
>> use-after-free issues because the importer created CPU mappings on
>> its own without the exporter knowing about it.
>> This is just one example of what we ran into. In addition to that,
>> basically the whole synchronization between drivers was overhauled as
>> well because we found that we can't trust importers to always do the
>> right thing.
>>
>> But this, fundamentally, is importers creating attachments and then
>> *ignoring the lifetime rules of DMABUF*. If you created an attachment,
>> got a move and *ignored the move* because you put the PFN in your own
>> VMA, then you are not following the attachment lifetime rules!
>>
>> Move notify is solely for informing the importer that it needs to
>> refresh its DMA mappings and eventually block for ongoing DMA to end.
>>
>> These semantics don't work well for CPU mappings because you need to hold
>> the reservation lock to make sure that the information stays valid and you
>> can't hold a lock while returning from a page fault.
> Dealing with CPU mapping and resource invalidation is a little hard, but it is
> resolvable by using other types of locks. And I guess for now dma-buf
> exporters should always handle this CPU mapping vs. invalidation contention if
> they support mmap().
>
> Since it is resolvable, with some invalidation notification a decent importer
> could also handle the contention well.
That doesn't work like this.
Page table updates under DMA-buf work by using the same locking
approach for both the validation and invalidation side. In other words
we hold the same lock while inserting and removing entries into/from the
page tables.
That this is supposed to be an unlocked API means that you can only use it with
pre-allocated and hard pinned memory without any chance to invalidate it
while running. Otherwise you can never be sure of the validity of the
address information you got from the exporter.
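To illustrate the point about holding the same lock on both sides, here is a toy userspace sketch. It is not kernel code: the `toy_buffer` type and the `toy_*` helpers are hypothetical names made up for this example, and a pthread mutex merely stands in for the reservation lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; every name here is hypothetical. */
struct toy_buffer {
	pthread_mutex_t resv;   /* stands in for the reservation lock */
	unsigned long backing;  /* current location of the backing store */
	unsigned long mapped;   /* address installed in the "page table" */
	bool valid;             /* true while an entry is installed */
};

static void toy_init(struct toy_buffer *b, unsigned long backing)
{
	int rc = pthread_mutex_init(&b->resv, NULL);

	assert(rc == 0);
	(void)rc;
	b->backing = backing;
	b->mapped = 0;
	b->valid = false;
}

/* Validation side: read the address and install it under the lock. */
static void toy_validate(struct toy_buffer *b)
{
	pthread_mutex_lock(&b->resv);
	b->mapped = b->backing;
	b->valid = true;
	pthread_mutex_unlock(&b->resv);
}

/* Invalidation side: move the backing store and drop the entry
 * while holding the very same lock. */
static void toy_move(struct toy_buffer *b, unsigned long new_backing)
{
	pthread_mutex_lock(&b->resv);
	b->backing = new_backing;
	b->valid = false;
	pthread_mutex_unlock(&b->resv);
}

/* An installed entry must always match the current backing store. */
static bool toy_consistent(struct toy_buffer *b)
{
	bool ok;

	pthread_mutex_lock(&b->resv);
	ok = !b->valid || b->mapped == b->backing;
	pthread_mutex_unlock(&b->resv);
	return ok;
}
```

Because both toy_validate() and toy_move() take the same lock, an installed entry can never refer to a stale location. An importer that copied `backing` without taking the lock would have no such guarantee.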
> IIUC the only concern now is that importer device drivers are more likely to
> do something wrong, so CPU mapping is moved to the exporter. But most of the
> exporters are also device drivers, so why would they be smarter?
Exporters always use their invalidation code path no matter if they are
exporting their buffers for others to use or if they are standalone.
If you do the invalidation on the importer side you always need both
exporter and importer around to test it.
In addition to that we have many more importers than exporters. E.g. a
lot of simple drivers only import DMA-heap buffers and never export
anything.
> And there are increasing mapping needs, today exporters help handle CPU primary
> mapping, tomorrow should they also help on all other mappings? Clearly it is
> not feasible. So maybe conditionally give trust to some importers.
Why should that be necessary? Exporters *must* know what somebody does
with their buffers.
If you have a use case the exporter doesn't support in their mapping
operation then that use case most likely doesn't work in the first place.
For example direct I/O is enabled/disabled by exporters on their CPU
mappings based on whether that works correctly for them. An importer simply
doesn't know if it should use vm_insert_pfn() or vm_insert_page().
We could of course implement that logic in each importer to choose
between the different approaches, but then each importer gains logic it
only exercises with a specific exporter. And that doesn't seem to be a
good idea at all.
Regards,
Christian.
>
> Thanks,
> Yilun
Hi Jyothi,
kernel test robot noticed the following build warnings:
[auto build test WARNING on 55bcd2e0d04c1171d382badef1def1fd04ef66c5]
url: https://github.com/intel-lab-lkp/linux/commits/Jyothi-Kumar-Seerapu/dmaengi…
base: 55bcd2e0d04c1171d382badef1def1fd04ef66c5
patch link: https://lore.kernel.org/r/20250120095753.25539-3-quic_jseerapu%40quicinc.com
patch subject: [PATCH v5 2/2] i2c: i2c-qcom-geni: Add Block event interrupt support
config: arc-randconfig-001-20250120 (https://download.01.org/0day-ci/archive/20250120/202501202159.wLRVO16t-lkp@…)
compiler: arceb-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250120/202501202159.wLRVO16t-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501202159.wLRVO16t-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> drivers/i2c/busses/i2c-qcom-geni.c:599: warning: Excess function parameter 'dev' description in 'geni_i2c_gpi_multi_desc_unmap'
Kconfig warnings: (for reference only)
WARNING: unmet direct dependencies detected for OMAP2PLUS_MBOX
Depends on [n]: MAILBOX [=y] && (ARCH_OMAP2PLUS || ARCH_K3)
Selected by [m]:
- TI_K3_M4_REMOTEPROC [=m] && REMOTEPROC [=y] && (ARCH_K3 || COMPILE_TEST [=y])
vim +599 drivers/i2c/busses/i2c-qcom-geni.c
589
590 /**
591 * geni_i2c_gpi_multi_desc_unmap() - unmaps the buffers post multi message TX transfers
592 * @dev: pointer to the corresponding dev node
593 * @gi2c: i2c dev handle
594 * @msgs: i2c messages array
595 * @peripheral: pointer to the gpi_i2c_config
596 */
597 static void geni_i2c_gpi_multi_desc_unmap(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[],
598 struct gpi_i2c_config *peripheral)
> 599 {
600 u32 msg_xfer_cnt, wr_idx = 0;
601 struct geni_i2c_gpi_multi_desc_xfer *tx_multi_xfer = &gi2c->i2c_multi_desc_config;
602
603 /*
604 * In error case, need to unmap all messages based on the msg_idx_cnt.
605 * Non-error case unmap all the processed messages.
606 */
607 if (gi2c->err)
608 msg_xfer_cnt = tx_multi_xfer->msg_idx_cnt;
609 else
610 msg_xfer_cnt = tx_multi_xfer->irq_cnt * QCOM_I2C_GPI_NUM_MSGS_PER_IRQ;
611
612 /* Unmap the processed DMA buffers based on the received interrupt count */
613 for (; tx_multi_xfer->unmap_msg_cnt < msg_xfer_cnt; tx_multi_xfer->unmap_msg_cnt++) {
614 if (tx_multi_xfer->unmap_msg_cnt == gi2c->num_msgs)
615 break;
616 wr_idx = tx_multi_xfer->unmap_msg_cnt % QCOM_I2C_GPI_MAX_NUM_MSGS;
617 geni_i2c_gpi_unmap(gi2c, &msgs[tx_multi_xfer->unmap_msg_cnt],
618 tx_multi_xfer->dma_buf[wr_idx],
619 tx_multi_xfer->dma_addr[wr_idx],
620 NULL, (dma_addr_t)NULL);
621 tx_multi_xfer->freed_msg_cnt++;
622 }
623 }
624
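One possible way to address the warning above, assuming the `dev` parameter was dropped from the function on purpose, is to delete the stale `@dev:` line so that the kernel-doc comment matches the prototype shown in the excerpt. This is only a sketch of the comment and signature, not a tested patch:

```c
/**
 * geni_i2c_gpi_multi_desc_unmap() - unmaps the buffers post multi message TX transfers
 * @gi2c: i2c dev handle
 * @msgs: i2c messages array
 * @peripheral: pointer to the gpi_i2c_config
 */
static void geni_i2c_gpi_multi_desc_unmap(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[],
					  struct gpi_i2c_config *peripheral)
```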
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On Mon, Jun 24, 2024 at 03:59:53AM +0800, Xu Yilun wrote:
> > But it also seems to me that VFIO should be able to support putting
> > the device into the RUN state
>
> Firstly I think VFIO should support putting the device into the *LOCKED* state.
> From LOCKED to RUN, there is a lot of evidence fetching and attestation
> that only the guest cares about. I don't think VFIO needs to opt in.
VFIO is not just about running VMs. If someone wants to run DPDK on
VFIO they should be able to get the device into a RUN state and work
with secure memory without requiring a KVM. Yes there are many steps
to this, but we should imagine how it can work.
> > without involving KVM or cVMs.
>
> It may not be feasible for all vendors.
It must be. A CC guest with an in kernel driver can definitely get the
PCI device into RUN, so VFIO running in the guest should be able as
well.
> I believe AMD would have one firmware call that requires the cVM handle
> *AND* moves the device into the LOCKED state. It really depends on the firmware
> implementation.
IMHO, you would not use the secure firmware if you are not using VMs.
> Yes, the secure EPT is in the secure world and managed by TDX firmware.
> Now a SW Mirror Secure EPT is introduced in KVM and managed by KVM
> directly, and KVM will finally use firmware calls to propagate Mirror
> Secure EPT changes to secure EPT.
If the secure world manages it then the secure world can have rules
that work with the IOMMU as well.
Jason
On Fri, Jan 17, 2025 at 09:57:40AM +0800, Baolu Lu wrote:
> On 1/15/25 21:01, Jason Gunthorpe wrote:
> > On Wed, Jan 15, 2025 at 11:57:05PM +1100, Alexey Kardashevskiy wrote:
> > > On 15/1/25 00:35, Jason Gunthorpe wrote:
> > > > On Tue, Jun 18, 2024 at 07:28:43AM +0800, Xu Yilun wrote:
> > > >
> > > > > > is needed so the secure world can prepare anything it needs prior to
> > > > > > starting the VM.
> > > > > OK. From Dan's patchset there are some touch points for vendor tsm
> > > > > drivers to do secure world preparation. e.g. pci_tsm_ops::probe().
> > > > >
> > > > > Maybe we could move to Dan's thread for discussion.
> > > > >
> > > > > https://lore.kernel.org/linux-coco/173343739517.1074769.13134786548545925484.stgit@dwillia2-xfh.jf.intel.com/
> > > > I think Dan's series is different, any uapi from that series should
> > > > not be used in the VMM case. We need proper vfio APIs for the VMM to
> > > > use. I would expect VFIO to be calling some of that infrastructure.
> > > Something like this experiment?
> > >
> > > https://github.com/aik/linux/commit/ce052512fb8784e19745d4cb222e23cabc57792e
> > Yeah, maybe, though I don't know which of vfio/iommufd/kvm should be
> > hosting those APIs, the above does seem to be a reasonable direction.
> >
> > When the various fds are closed I would expect the kernel to unbind
> > and restore the device back.
>
> I am curious about the value of tsm binding against an iommufd_vdevice
> instead of the physical iommufd_device.
Interesting question
> It is likely that the kvm pointer should be passed to iommufd during the
> creation of a viommu object.
Yes, I fully expect this
> If my recollection is correct, the arm
> smmu-v3 needs it to obtain the vmid to setup the userspace event queue:
Right now it will use a VMID unrelated to KVM. BTM support on ARM will
require syncing the VMID with KVM.
AMD and Intel may require the KVM for some reason as well.
For CC I'm expecting the KVM fd to be the handle for the cVM, so any
RPCs that want to call into the secure world need the KVM FD to get
the cVM's identifier. I.e. a "bind to cVM" RPC will need the PCI
information and the cVM's handle.
From that perspective it does make sense that any cVM related APIs,
like "bind to cVM" would be against the VDEVICE where we have a link
to the VIOMMU which has the KVM. On the iommufd side the VIOMMU is
part of the object hierarchy, but does not necessarily have to force a
vIOMMU to appear in the cVM.
But it also seems to me that VFIO should be able to support putting
the device into the RUN state without involving KVM or cVMs.
> Intel TDX connect implementation also needs a reference to the kvm
> pointer to obtain the secure EPT information. This is crucial because
> the CPU's page table must be shared with the iommu.
I thought kvm folks were NAKing this sharing entirely? Or is the
secure EPT in the secure world and not directly managed by Linux?
AFAIK AMD is going to mirror the iommu page table like today.
ARM, I suspect, will not have an "EPT" under Linux control, so
whatever happens will be hidden in their secure world.
Jason
On 08.01.25 at 20:22, Xu Yilun wrote:
> On Wed, Jan 08, 2025 at 07:44:54PM +0100, Simona Vetter wrote:
>> On Wed, Jan 08, 2025 at 12:22:27PM -0400, Jason Gunthorpe wrote:
>>> On Wed, Jan 08, 2025 at 04:25:54PM +0100, Christian König wrote:
>>>> On 08.01.25 at 15:58, Jason Gunthorpe wrote:
>>>>> I have imagined a staged approach where DMABUF gets a new API that
>>>>> works with the new DMA API to do importer mapping with "P2P source
>>>>> information" and a gradual conversion.
>>>> To make it clear as maintainer of that subsystem I would reject such a step
>>>> with all I have.
>>> This is unexpected, so you want to just leave dmabuf broken? Do you
>>> have any plan to fix it, to fix the misuse of the DMA API, and all
>>> the problems I listed below? This is a big deal, it is causing real
>>> problems today.
>>>
>>> If it is going to be like this I think we will stop trying to use dmabuf
>>> and do something simpler for vfio/kvm/iommufd :(
>> As the gal who helped edit the og dma-buf spec 13 years ago, I think adding
>> pfn isn't a terrible idea. By design, dma-buf is the "everything is
>> optional" interface. And in the beginning, even consistent locking was
>> optional, but we've managed to fix that by now :-/
Well you were also the person who mangled the struct page pointers in
the scatterlist because people were abusing this and getting a bloody
nose :)
>> Where I do agree with Christian is that stuffing pfn support into the
>> dma_buf_attachment interfaces feels rather wrong.
> So it could be a dmabuf interface like mmap/vmap()? I was also wondering
> about that. But in the end I started to use the dma_buf_attachment interface
> to leverage the existing buffer pin and move_notify.
Exactly that's the point, sharing pfn doesn't work with the pin and
move_notify interfaces because of the MMU notifier approach Sima mentioned.
>>>> We have already gone down that road and it didn't work at all and
>>>> was a really big pain to pull people back from it.
>>> Nobody has really seriously tried to improve the DMA API before, so I
>>> don't think this is true at all.
>> Aside, I really hope this finally happens!
Sorry, my fault. I was not talking about the DMA API, but rather about
people trying to look behind the curtain of DMA-buf backing stores.
In other words all the fun we had with scatterlists and people trying
to modify the struct pages inside of them.
Improving the DMA API is something I really really hope for as well.
>>>>> 3) Importing devices need to know if they are working with PCI P2P
>>>>> addresses during mapping because they need to do things like turn on
>>>>> ATS on their DMA. As for multi-path we have the same hacks inside mlx5
>>>>> today that assume DMABUFs are always P2P because we cannot determine
>>>>> if things are P2P or not after being DMA mapped.
>>>> Why would you need ATS on PCI P2P and not for system memory accesses?
>>> ATS has a significant performance cost. It is mandatory for PCI P2P,
>>> but ideally should be avoided for CPU memory.
>> Huh, I didn't know that. And yeah kinda means we've butchered the pci p2p
>> stuff a bit I guess ...
Huh? Why should ATS be mandatory for PCI P2P?
We have tons of production systems using PCI P2P without ATS. And this is
the first time I've heard that.
>>>>> 5) iommufd and kvm are both using CPU addresses without DMA. No
>>>>> exporter mapping is possible
>>>> We have customers using both KVM and XEN with DMA-buf, so I can clearly
>>>> confirm that this isn't true.
>>> Today they are mmaping the dma-buf into a VMA and then using KVM's
>>> follow_pfn() flow to extract the CPU pfn from the PTE. Any mmapable
>>> dma-buf must have a CPU PFN.
>>>
>>> Here Xu implements basically the same path, except without the VMA
>>> indirection, and it is suddenly not OK? Illogical.
>> So the big difference is that for follow_pfn() you need mmu_notifier since
>> the mmap might move around, whereas with pfn smashed into
>> dma_buf_attachment you need dma_resv_lock rules, and the move_notify
>> callback if you go dynamic.
>>
>> So I guess my first question is, which locking rules do you want here for
>> pfn importers?
> follow_pfn() is unwanted for private MMIO, so dma_resv_lock.
As Sima explained you either have follow_pfn() and mmu_notifier or you
have DMA addresses and dma_resv lock / dma_fence.
Just giving out PFNs without some lifetime associated with them is one
of the major problems we faced before and really not something you can do.
>> If mmu notifiers are fine, then I think the current approach of follow_pfn
>> should be ok. But if you instead want dma_resv_lock rules (or the cpu mmap
>> somehow is an issue itself), then I think the clean design is to create a new
> cpu mmap() is an issue, this series aims to eliminate userspace
> mapping for private MMIO resources.
Why?
>> separate access mechanism just for that. It would be the 5th or so (kernel
>> vmap, userspace mmap, dma_buf_attach and driver private stuff like
>> virtio_dma_buf.c where you access your buffer with a uuid), so really not
>> a big deal.
> OK, will think more about that.
Please note that we have follow_pfn() + mmu_notifier working for KVM/XEN
with MMIO mappings and P2P. And that required exactly zero DMA-buf
changes :)
I don't fully understand your use case, but I think it's quite likely
that we already have that working.
Regards,
Christian.
>
> Thanks,
> Yilun
>
>> And for non-contrived exporters we might be able to implement the other
>> access methods in terms of the pfn method generically, so this wouldn't
>> even be a terrible maintenance burden going forward. And meanwhile all the
>> contrived exporters just keep working as-is.
>>
>> The other part is that cpu mmap is optional, and there are plenty of strange
>> exporters who don't implement it. But you can dma map the attachment into
>> plenty of devices. This tends to mostly be a thing on SoC devices with some
>> very funky memory. But I guess you don't care about these use-cases, so
>> should be ok.
>>
>> I couldn't come up with a good name for these pfn users, maybe
>> dma_buf_pfn_attachment? This does _not_ have a struct device, but maybe
>> some of these new p2p source specifiers (or a list of those which are
>> allowed, no idea how this would need to fit into the new dma api).
>>
>> Cheers, Sima
>> --
>> Simona Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch
On 16.01.25 at 02:46, Zhaoyang Huang wrote:
> On Wed, Jan 15, 2025 at 7:49 PM Christian König
> <christian.koenig(a)amd.com> wrote:
>> On 15.01.25 at 07:18, zhaoyang.huang wrote:
>>> From: Zhaoyang Huang<zhaoyang.huang(a)unisoc.com>
>>>
>>> When using dma-buf as a memory pool for a VMM, vmf_insert_pfn applies
>>> PTE_SPECIAL to the pte, which makes vm_normal_page report bad_pte and
>>> return NULL. This commit suggests replacing
>>> vmf_insert_pfn with vmf_insert_page.
>> Setting PTE_SPECIAL is completely intentional here to prevent
>> get_user_pages() from working on DMA-buf mappings.
> ok. May I ask the reason?
Drivers using this interface own the backing store for their specific
use cases. There are a couple of things get_user_pages(),
pin_user_pages(), direct I/O etc. do which usually clash with those use
cases. So that is intentionally completely disabled.
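As a rough illustration of why a special PTE blocks pinning, here is a toy userspace model. It is not kernel code: the `toy_*` types and helpers are hypothetical, and they only mimic the idea that an entry inserted as a raw PFN carries no page that a get_user_pages() style lookup could hand out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; every name here is hypothetical. */
struct toy_page {
	int pin_count;
};

struct toy_pte {
	unsigned long pfn;
	struct toy_page *page;  /* NULL for special entries */
	bool special;
};

/* Mimics inserting a raw PFN: the entry is marked special. */
static struct toy_pte toy_insert_pfn(unsigned long pfn)
{
	struct toy_pte pte = { .pfn = pfn, .page = NULL, .special = true };

	return pte;
}

/* Mimics inserting a normal page: the entry refers to a page object. */
static struct toy_pte toy_insert_page(unsigned long pfn, struct toy_page *page)
{
	struct toy_pte pte = { .pfn = pfn, .page = page, .special = false };

	return pte;
}

/* Mimics a pin attempt: special entries yield no page, so nothing is pinned. */
static struct toy_page *toy_pin_user_page(struct toy_pte *pte)
{
	if (pte->special)
		return NULL;
	pte->page->pin_count++;
	return pte->page;
}
```

In this model the exporter keeps full control over the special entry, since no outside pin can ever be taken on it, which is the property the PTE_SPECIAL choice is meant to preserve.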
We have the possibility to create a DMA-buf from a memfd object and you
can then do direct I/O to the memfd and still use the DMA-buf with GPUs
or V4L for example.
>> So absolutely clear NAK to this patch here.
>>
>> What exactly are you trying to do?
> I would like pkvm to be able to resolve the guest kernel's second
> stage page faults (ARM64's memory virtualization method) on a dma-buf,
> which uses pin_user_pages.
Yeah, exactly that's one of the use cases which we intentionally prevent
here.
The backing stores that drivers use don't care about the pin count of the
memory and happily give it back to memory pools and/or swap it with
device local memory if necessary.
When this happens the ARM VM wouldn't be informed of the change and
would potentially access the wrong address.
So sorry, but this approach won't work.
You could try with the memfd+DMA-buf approach I mentioned earlier, but
that won't give you all functionality on all DMA-buf supporting devices.
For example GPUs usually can't scan out to a monitor from such buffers
because of hardware limitations.
Regards,
Christian.
>> Regards,
>> Christian.
>>
>>> [ 103.402787] kvm [5276]: gfn(ipa)=0x80000 hva=0x7d4a400000 write_fault=0
>>> [ 103.403822] BUG: Bad page map in process crosvm_vcpu0 pte:168000140000f43 pmd:8000000c1ca0003
>>> [ 103.405144] addr:0000007d4a400000 vm_flags:040400fb anon_vma:0000000000000000 mapping:ffffff8085163df0 index:0
>>> [ 103.406536]file:dmabuf fault:cma_heap_vm_fault [cma_heap] mmap:dma_buf_mmap_internal read_folio:0x0
>>> [ 103.407877] CPU: 3 PID: 5276 Comm: crosvm_vcpu0 Tainted: G W OE 6.6.46-android15-8-g8bab72b63c20-dirty-4k #1 1e474a12dac4553a3ebba3a911f3b744176a5d2d
>>> [ 103.409818] Hardware name: Unisoc UMS9632-base Board (DT)
>>> [ 103.410613] Call trace:
>>> [ 103.411038] dump_backtrace+0xf4/0x140
>>> [ 103.411641] show_stack+0x20/0x30
>>> [ 103.412184] dump_stack_lvl+0x60/0x84
>>> [ 103.412766] dump_stack+0x18/0x24
>>> [ 103.413304] print_bad_pte+0x1b8/0x1cc
>>> [ 103.413909] vm_normal_page+0xc8/0xd0
>>> [ 103.414491] follow_page_pte+0xb0/0x304
>>> [ 103.415096] follow_page_mask+0x108/0x240
>>> [ 103.415721] __get_user_pages+0x168/0x4ac
>>> [ 103.416342] __gup_longterm_locked+0x15c/0x864
>>> [ 103.417023] pin_user_pages+0x70/0xcc
>>> [ 103.417609] pkvm_mem_abort+0xf8/0x5c0
>>> [ 103.418207] kvm_handle_guest_abort+0x3e0/0x3e4
>>> [ 103.418906] handle_exit+0xac/0x33c
>>> [ 103.419472] kvm_arch_vcpu_ioctl_run+0x48c/0x8d8
>>> [ 103.420176] kvm_vcpu_ioctl+0x504/0x5bc
>>> [ 103.420785] __arm64_sys_ioctl+0xb0/0xec
>>> [ 103.421401] invoke_syscall+0x60/0x11c
>>> [ 103.422000] el0_svc_common+0xb4/0xe8
>>> [ 103.422590] do_el0_svc+0x24/0x30
>>> [ 103.423131] el0_svc+0x3c/0x70
>>> [ 103.423640] el0t_64_sync_handler+0x68/0xbc
>>> [ 103.424288] el0t_64_sync+0x1a8/0x1ac
>>>
>>> Signed-off-by: Xiwei Wang<xiwei.wang1(a)unisoc.com>
>>> Signed-off-by: Aijun Sun<aijun.sun(a)unisoc.com>
>>> Signed-off-by: Zhaoyang Huang<zhaoyang.huang(a)unisoc.com>
>>> ---
>>> drivers/dma-buf/heaps/cma_heap.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
>>> index c384004b918e..b301fb63f16b 100644
>>> --- a/drivers/dma-buf/heaps/cma_heap.c
>>> +++ b/drivers/dma-buf/heaps/cma_heap.c
>>> @@ -168,7 +168,7 @@ static vm_fault_t cma_heap_vm_fault(struct vm_fault *vmf)
>>> if (vmf->pgoff > buffer->pagecount)
>>> return VM_FAULT_SIGBUS;
>>>
>>> - return vmf_insert_pfn(vma, vmf->address, page_to_pfn(buffer->pages[vmf->pgoff]));
>>> + return vmf_insert_page(vma, vmf->address, buffer->pages[vmf->pgoff]);
>>> }
>>>
>>> static const struct vm_operations_struct dma_heap_vm_ops = {
On Wed, Jan 15, 2025 at 09:55:29AM +0100, Simona Vetter wrote:
> I think for 90% of exporters pfn would fit, but there's some really funny
> ones where you cannot get a cpu pfn by design. So we need to keep the
> pfn-less interfaces around. But ideally for the pfn-capable exporters we'd
> have helpers/common code that just implements all the other interfaces.
There is no way to have a DMA address without a PFN in Linux right now.
How would you generate them? That implies you have an IOMMU that can
generate IOVAs for something that doesn't have a physical address at
all.
Or do you mean some that don't have pages associated with them, and
thus have pfn_valid fail on them? They still have a PFN, just not
one that is valid to use in most of the Linux MM.
On Wed, Jan 15, 2025 at 11:57:05PM +1100, Alexey Kardashevskiy wrote:
> On 15/1/25 00:35, Jason Gunthorpe wrote:
> > On Tue, Jun 18, 2024 at 07:28:43AM +0800, Xu Yilun wrote:
> >
> > > > is needed so the secure world can prepare anything it needs prior to
> > > > starting the VM.
> > >
> > > OK. From Dan's patchset there are some touch points for vendor tsm
> > > drivers to do secure world preparation. e.g. pci_tsm_ops::probe().
> > >
> > > Maybe we could move to Dan's thread for discussion.
> > >
> > > https://lore.kernel.org/linux-coco/173343739517.1074769.1313478654854592548…
> >
> > I think Dan's series is different, any uapi from that series should
> > not be used in the VMM case. We need proper vfio APIs for the VMM to
> > use. I would expect VFIO to be calling some of that infrastructure.
>
> Something like this experiment?
>
> https://github.com/aik/linux/commit/ce052512fb8784e19745d4cb222e23cabc57792e
Yeah, maybe, though I don't know which of vfio/iommufd/kvm should be
hosting those APIs, the above does seem to be a reasonable direction.
When the various fds are closed I would expect the kernel to unbind
and restore the device back.
Jason