The openvswitch selftests currently contain a few cases for managing the
datapath, which includes creating datapath instances, adding interfaces,
and doing some basic feature / upcall tests. This is useful to validate
the control path.
Add the ability to program some of the more common flows with actions. This
can be improved over time to include regression testing, etc.
Aaron Conole (4):
selftests: openvswitch: add an initial flow programming case
selftests: openvswitch: add a test for ipv4 forwarding
selftests: openvswitch: add basic ct test case parsing
selftests: openvswitch: add ct-nat test case with ipv4
.../selftests/net/openvswitch/openvswitch.sh | 223 ++++++++
.../selftests/net/openvswitch/ovs-dpctl.py | 507 ++++++++++++++++++
2 files changed, 730 insertions(+)
--
2.40.1
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large number of data transfers have device memory as the source and/or
destination. Accelerators have drastically increased the volume of such
transfers.
Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time; improving data transfer throughput & efficiency can help
improve GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs; much of this data does not require host processing.
Today, the majority of the Device-to-Device data transfers over the network are
implemented as the following low level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strain on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
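To make the header/payload split concrete, here is a minimal userspace
sketch. It is not code from this series: split_frame(), struct split_rx and
the buffer sizes are hypothetical names chosen for illustration, and "device
memory" is simulated with an ordinary array. A real NIC performs this split
in hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes, for the illustration only. */
#define HDR_BUF_SIZE 128
#define DEV_BUF_SIZE 2048

/* Stand-in for a receive descriptor after header split. */
struct split_rx {
	unsigned char hdr[HDR_BUF_SIZE]; /* host memory: seen by the stack */
	size_t hdr_len;
	size_t payload_off;              /* offset into the device buffer */
	size_t payload_len;
};

/*
 * Simulates header split: the first hdr_len bytes of the frame land in a
 * host buffer, the rest lands in "device memory" (a plain array here).
 * Returns 0 on success, -1 if something does not fit.
 */
static int split_frame(const unsigned char *frame, size_t frame_len,
		       size_t hdr_len, unsigned char *dev_mem,
		       size_t dev_off, struct split_rx *rx)
{
	size_t payload_len;

	if (hdr_len > frame_len || hdr_len > HDR_BUF_SIZE)
		return -1;
	payload_len = frame_len - hdr_len;
	if (dev_off > DEV_BUF_SIZE || payload_len > DEV_BUF_SIZE - dev_off)
		return -1;

	memcpy(rx->hdr, frame, hdr_len);
	memcpy(dev_mem + dev_off, frame + hdr_len, payload_len);
	rx->hdr_len = hdr_len;
	rx->payload_off = dev_off;
	rx->payload_len = payload_len;
	return 0;
}
```

In this sketch the host only ever reads rx->hdr; the payload is referred to
by offset and length, which mirrors the idea that the stack processes headers
normally while payload bytes stay out of host memory.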
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through the
root complex.
With this proposal we're able to reach ~96.6% line rate speeds with data sent
and received directly from/to device memory.
* Patch overview:
** Part 1: struct paged device memory
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, the networking stack (skbs, drivers,
and page pool) operates on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to understand a new memory type.
This proposal implements option #1. We implement a small framework to generate
struct pages for an sg_table returned from dma_buf_map_attachment(). The support
added here should be generic and easily extended to other use cases interested
in struct paged device memory. We use this framework to generate pages that can
be used in the networking stack.
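As a rough userspace analogy for option #1 (not the framework in this
series), the sketch below walks a list of DMA-contiguous segments and emits
one page-sized descriptor per full page. struct sg_entry, struct dev_page,
make_dev_pages() and SKETCH_PAGE_SIZE are hypothetical names, and skipping
trailing partial pages is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size for the sketch */

/* Stand-in for one scatterlist entry: a DMA-contiguous segment. */
struct sg_entry {
	uint64_t dma_addr;
	size_t len;
};

/* Stand-in for a page-sized handle that the stack could pass around. */
struct dev_page {
	uint64_t dma_addr;
};

/*
 * Fill pages[] with one descriptor per full page found in the segments.
 * Returns the number of descriptors written (at most max_pages).
 */
static size_t make_dev_pages(const struct sg_entry *sg, size_t nents,
			     struct dev_page *pages, size_t max_pages)
{
	size_t n = 0;

	for (size_t i = 0; i < nents; i++) {
		for (size_t off = 0; off + SKETCH_PAGE_SIZE <= sg[i].len;
		     off += SKETCH_PAGE_SIZE) {
			if (n == max_pages)
				return n;
			pages[n++].dma_addr = sg[i].dma_addr + off;
		}
	}
	return n;
}
```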
** Part 2: recvmsg() & sendmsg() APIs
We define user APIs for the user to send and receive these dmabuf pages.
** Part 3: support for unreadable skb frags
Dmabuf pages are not accessible by the host; we implement changes throughout the
networking stack to correctly handle skbs with unreadable frags.
** Part 4: page pool support
We piggy back on Jakub's page pool memory providers idea:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory TCP
changes from the driver.
This is not strictly necessary; the driver can choose to allocate dmabuf pages
and use them directly without going through the page pool (if acceptable to
their maintainers).
Not included with this RFC is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This RFC is built on top of v6.4-rc7 with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP requires the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using dmabuf pages
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
* Testing:
The series includes a udmabuf kselftest that shows a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
Not included in this series is our devmem TCP benchmark, which
transfers data to/from GPU dmabufs directly.
With this implementation & benchmark we're able to reach ~96.6% line rate
speeds with 4 GPU/NIC pairs running bidirectional traffic, with all the
packet payloads going straight to the GPU memory (no host buffer bounce).
** Test Setup
Kernel: v6.4-rc7, with this RFC and Jakub's memory provider API
cherry-picked locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Benchmark: custom devmem TCP benchmark not yet open sourced.
Mina Almasry (10):
dma-buf: add support for paged attachment mappings
dma-buf: add support for NET_RX pages
dma-buf: add support for NET_TX pages
net: add support for skbs with unreadable frags
tcp: implement recvmsg() RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX pages
tcp: implement sendmsg() TX path for for devmem tcp
selftests: add ncdevmem, netcat for devmem TCP
memory-provider: updates core provider API for devmem TCP
memory-provider: add dmabuf devmem provider
drivers/dma-buf/dma-buf.c | 444 ++++++++++++++++
include/linux/dma-buf.h | 142 +++++
include/linux/netdevice.h | 1 +
include/linux/skbuff.h | 34 +-
include/linux/socket.h | 1 +
include/net/page_pool.h | 21 +
include/net/sock.h | 4 +
include/net/tcp.h | 6 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/dma-buf.h | 12 +
include/uapi/linux/uio.h | 10 +
net/core/datagram.c | 3 +
net/core/page_pool.c | 111 +++-
net/core/skbuff.c | 81 ++-
net/core/sock.c | 47 ++
net/ipv4/tcp.c | 262 +++++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 8 +
net/ipv4/tcp_output.c | 5 +-
net/packet/af_packet.c | 4 +-
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/ncdevmem.c | 693 +++++++++++++++++++++++++
23 files changed, 1868 insertions(+), 42 deletions(-)
create mode 100644 tools/testing/selftests/net/ncdevmem.c
--
2.41.0.390.g38632f3daf-goog
The Events Tracing infrastructure contains a lot of files and directories
(internally represented as inodes and dentries), which end up consuming
megabytes of memory. There can be multiple instances of Events Tracing, each
of which requires more memory.
Instead of creating inodes/dentries up front, eventfs keeps only meta-data and
skips the creation of inodes/dentries. As and when required, eventfs
creates the inodes/dentries only for the files/directories that are needed.
eventfs also deletes the inodes/dentries once they are no longer required,
but preserves the meta-data.
Tracing events took ~9MB; with this approach they took ~4.5MB
for ~10K files/dirs.
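The general idea of keeping small meta-data and materializing the heavy
object only on demand can be sketched in plain userspace C. This is an
illustration, not the eventfs code: struct meta_entry, struct heavy_node,
meta_get() and meta_put() are hypothetical names, and a simple counter stands
in for the dentry reference count.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the expensive inode/dentry pair. */
struct heavy_node {
	char pad[512];
};

/* Stand-in for the small, always-present meta-data record. */
struct meta_entry {
	const char *name;
	struct heavy_node *node; /* NULL until someone looks the entry up */
	unsigned int refs;
};

/* Create the heavy object on first use and take a reference. */
static struct heavy_node *meta_get(struct meta_entry *m)
{
	if (!m->node) {
		m->node = calloc(1, sizeof(*m->node));
		if (!m->node)
			return NULL;
	}
	m->refs++;
	return m->node;
}

/* Drop a reference; free the heavy object on last put, keep meta-data. */
static void meta_put(struct meta_entry *m)
{
	if (m->refs && --m->refs == 0) {
		free(m->node);
		m->node = NULL;
	}
}
```

An entry that nobody opens costs only sizeof(struct meta_entry) in this
sketch, which is where the saving comes from when most of ~10K entries are
never touched.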
Diff from v4:
Patch 02: moved from v4 08/10
added fs/tracefs/internal.h
Patch 03: moved from v4 02/10
removed fs/tracefs/internal.h
Patch 04: moved from v4 03/10
moved out changes of fs/tracefs/internal.h
Patch 05: moved from v4 04/10
renamed eventfs_add_top_file() -> eventfs_add_events_file()
Patch 06: moved from v4 07/10
implemented create_dentry() helper function
added create_file(), create_dir() stub function
Patch 07: moved from v4 06/10
Patch 08: moved from v4 05/10
improved eventfs remove functionality
Patch 09: removed unwanted if conditions
Patch 10: added available_filter_functions check
Diff from v3:
Patch 3,4,5,7,9:
removed all the eventfs_rwsem code and replaced it with an srcu
lock for the readers, and a mutex to synchronize the writers of
the list.
Patch 2: moved 'tracefs_inode' and 'get_tracefs()' to v4 03/10
Patch 3: moved the struct eventfs_file and eventfs_inode into event_inode.c
as it really should not be exposed to all users.
Patch 5: added a recursion check to eventfs_remove_rec() as it is really
dangerous to have unchecked recursion in the kernel (we do have
a fixed size stack).
have the free use srcu callbacks. After the srcu grace periods
are done, it adds the eventfs_file onto a llist (lockless link
list) and wakes up a work queue. Then the work queue does the
freeing (this needs to be done in task/workqueue context, as
srcu callbacks are done in softirq context).
Patch 6: renamed:
eventfs_create_file() -> create_file()
eventfs_create_dir() -> create_dir()
Diff from v2:
Patch 01: new patch:'Require all trace events to have a TRACE_SYSTEM'
Patch 02: moved from v1 1/9
Patch 03: moved from v1 2/9
As suggested by Zheng Yejian, introduced eventfs_prepare_ef()
helper function to add files or directories to eventfs
fix WARNING reported by kernel test robot in v1 8/9
Patch 04: moved from v1 3/9
used eventfs_prepare_ef() to add files
fix WARNING reported by kernel test robot in v1 8/9
Patch 05: moved from v1 4/9
fix compiling warning reported by kernel test robot in v1 4/9
Patch 06: moved from v1 5/9
Patch 07: moved from v1 6/9
Patch 08: moved from v1 7/9
Patch 09: moved from v1 8/9
rebased because of v3 01/10
Patch 10: moved from v1 9/9
Diff from v1:
Patch 1: add header file
Patch 2: resolved kernel test robot issues
protecting eventfs lists using nested eventfs_rwsem
Patch 3: protecting eventfs lists using nested eventfs_rwsem
Patch 4: improve events cleanup code to fix crashes
Patch 5: resolved kernel test robot issues
removed d_instantiate_anon() calls
Patch 6: resolved kernel test robot issues
fix kprobe test in eventfs_root_lookup()
protecting eventfs lists using nested eventfs_rwsem
Patch 7: remove header file
Patch 8: pass eventfs_rwsem as argument to eventfs functions
called eventfs_remove_events_dir() instead of tracefs_remove()
from event_trace_del_tracer()
Patch 9: new patch to fix kprobe test case
fs/tracefs/Makefile | 1 +
fs/tracefs/event_inode.c | 795 ++++++++++++++++++
fs/tracefs/inode.c | 151 +++-
fs/tracefs/internal.h | 26 +
include/linux/trace_events.h | 1 +
include/linux/tracefs.h | 30 +
kernel/trace/trace.h | 2 +-
kernel/trace/trace_events.c | 76 +-
.../ftrace/test.d/kprobe/kprobe_args_char.tc | 9 +-
.../test.d/kprobe/kprobe_args_string.tc | 9 +-
10 files changed, 1048 insertions(+), 52 deletions(-)
create mode 100644 fs/tracefs/event_inode.c
create mode 100644 fs/tracefs/internal.h
--
2.39.0
Hello,
This is v4 of the patch series for TDX selftests.
It has been updated for Intel’s v14 of the TDX host patches which was
proposed here:
https://lore.kernel.org/lkml/cover.1685333727.git.isaku.yamahata@intel.com/
The tree can be found at:
https://github.com/googleprodkernel/linux-cc/tree/tdx-selftests-rfc-v4
Changes from RFC v3:
In v14, TDX can only run with UPM enabled so the necessary changes were
made to handle that.
td_vcpu_run() was added to handle TdVmCalls that are now handled in
userspace.
The comments under the patch "KVM: selftests: Require GCC to realign
stacks on function entry" were addressed with the following patch:
https://lore.kernel.org/lkml/Y%2FfHLdvKHlK6D%2F1v@google.com/T/
And other minor tweaks were made to integrate the selftest
infrastructure onto v14.
In RFCv4, TDX selftest code is organized into:
+ headers in tools/testing/selftests/kvm/include/x86_64/tdx/
+ common code in tools/testing/selftests/kvm/lib/x86_64/tdx/
+ selftests in tools/testing/selftests/kvm/x86_64/tdx_*
Dependencies
+ Peter’s patches, which provide functions for the host to allocate
and track protected memory in the
guest. https://lore.kernel.org/lkml/20221018205845.770121-1-pgonda@google.com/T/
Further work for this patch series/TODOs
+ Sean’s comments for the non-confidential UPM selftests patch series
at https://lore.kernel.org/lkml/Y8dC8WDwEmYixJqt@google.com/T/#u apply
here as well
+ Add ucall support for TDX selftests
I would also like to acknowledge the following people, who helped
review or test patches in RFCv1, RFCv2, and RFCv3:
+ Sean Christopherson <seanjc(a)google.com>
+ Zhenzhong Duan <zhenzhong.duan(a)intel.com>
+ Peter Gonda <pgonda(a)google.com>
+ Andrew Jones <drjones(a)redhat.com>
+ Maxim Levitsky <mlevitsk(a)redhat.com>
+ Xiaoyao Li <xiaoyao.li(a)intel.com>
+ David Matlack <dmatlack(a)google.com>
+ Marc Orr <marcorr(a)google.com>
+ Isaku Yamahata <isaku.yamahata(a)gmail.com>
+ Maciej S. Szmigiero <maciej.szmigiero(a)oracle.com>
Links to earlier patch series
+ RFC v1: https://lore.kernel.org/lkml/20210726183816.1343022-1-erdemaktas@google.com…
+ RFC v2: https://lore.kernel.org/lkml/20220830222000.709028-1-sagis@google.com/T/#u
+ RFC v3: https://lore.kernel.org/lkml/20230121001542.2472357-1-ackerleytng@google.co…
Ackerley Tng (12):
KVM: selftests: Add function to allow one-to-one GVA to GPA mappings
KVM: selftests: Expose function that sets up sregs based on VM's mode
KVM: selftests: Store initial stack address in struct kvm_vcpu
KVM: selftests: Refactor steps in vCPU descriptor table initialization
KVM: selftests: TDX: Use KVM_TDX_CAPABILITIES to validate TDs'
attribute configuration
KVM: selftests: TDX: Update load_td_memory_region for VM memory backed
by guest memfd
KVM: selftests: Add functions to allow mapping as shared
KVM: selftests: Expose _vm_vaddr_alloc
KVM: selftests: TDX: Add support for TDG.MEM.PAGE.ACCEPT
KVM: selftests: TDX: Add support for TDG.VP.VEINFO.GET
KVM: selftests: TDX: Add TDX UPM selftest
KVM: selftests: TDX: Add TDX UPM selftests for implicit conversion
Erdem Aktas (3):
KVM: selftests: Add helper functions to create TDX VMs
KVM: selftests: TDX: Add TDX lifecycle test
KVM: selftests: TDX: Adding test case for TDX port IO
Roger Wang (1):
KVM: selftests: TDX: Add TDG.VP.INFO test
Ryan Afranji (2):
KVM: selftests: TDX: Verify the behavior when host consumes a TD
private memory
KVM: selftests: TDX: Add shared memory test
Sagi Shahar (10):
KVM: selftests: TDX: Add report_fatal_error test
KVM: selftests: TDX: Add basic TDX CPUID test
KVM: selftests: TDX: Add basic get_td_vmcall_info test
KVM: selftests: TDX: Add TDX IO writes test
KVM: selftests: TDX: Add TDX IO reads test
KVM: selftests: TDX: Add TDX MSR read/write tests
KVM: selftests: TDX: Add TDX HLT exit test
KVM: selftests: TDX: Add TDX MMIO reads test
KVM: selftests: TDX: Add TDX MMIO writes test
KVM: selftests: TDX: Add TDX CPUID TDVMCALL test
tools/testing/selftests/kvm/Makefile | 8 +
.../selftests/kvm/include/kvm_util_base.h | 35 +
.../selftests/kvm/include/x86_64/processor.h | 4 +
.../kvm/include/x86_64/tdx/td_boot.h | 82 +
.../kvm/include/x86_64/tdx/td_boot_asm.h | 16 +
.../selftests/kvm/include/x86_64/tdx/tdcall.h | 59 +
.../selftests/kvm/include/x86_64/tdx/tdx.h | 65 +
.../kvm/include/x86_64/tdx/tdx_util.h | 19 +
.../kvm/include/x86_64/tdx/test_util.h | 164 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 115 +-
.../selftests/kvm/lib/x86_64/processor.c | 77 +-
.../selftests/kvm/lib/x86_64/tdx/td_boot.S | 101 ++
.../selftests/kvm/lib/x86_64/tdx/tdcall.S | 158 ++
.../selftests/kvm/lib/x86_64/tdx/tdx.c | 262 ++++
.../selftests/kvm/lib/x86_64/tdx/tdx_util.c | 565 +++++++
.../selftests/kvm/lib/x86_64/tdx/test_util.c | 101 ++
.../kvm/x86_64/tdx_shared_mem_test.c | 134 ++
.../selftests/kvm/x86_64/tdx_upm_test.c | 469 ++++++
.../selftests/kvm/x86_64/tdx_vm_tests.c | 1322 +++++++++++++++++
19 files changed, 3730 insertions(+), 26 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/td_boot.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/td_boot_asm.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/tdcall.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/tdx.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/tdx_util.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/test_util.h
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/td_boot.S
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/tdcall.S
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/tdx.c
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/tdx_util.c
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/test_util.c
create mode 100644 tools/testing/selftests/kvm/x86_64/tdx_shared_mem_test.c
create mode 100644 tools/testing/selftests/kvm/x86_64/tdx_upm_test.c
create mode 100644 tools/testing/selftests/kvm/x86_64/tdx_vm_tests.c
--
2.41.0.487.g6d72f3e995-goog
[ Resending because claws-mail is messing with the Cc again. It doesn't like quotes :-p ]
On Fri, 21 Jul 2023 08:48:39 -0400
Steven Rostedt <rostedt(a)goodmis.org> wrote:
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 4db048250cdb..2718de1533e6 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -36,16 +36,36 @@ struct eventfs_file {
> const struct file_operations *fop;
> const struct inode_operations *iop;
> union {
> + struct list_head del_list;
> struct rcu_head rcu;
> - struct llist_node llist; /* For freeing after RCU */
> + unsigned long is_freed; /* Freed if one of the above is set */
I changed the freeing around. The dentries are freed before returning from
eventfs_remove_dir().
I also added an "is_freed" field that is part of the union and is set if
list elements have content. Note, since the union was criticized before, I
will state that the entire purpose of doing this patch set is to save memory.
This structure will be used for every event file. What's the point of
getting rid of dentries if we are replacing them with something just as big?
Anyway, struct dentry does the exact same thing!
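[Illustration only, not part of the patch.] The union trick described above
can be tried in userspace. In the sketch below, struct list_node and struct
cb_head are simplified stand-ins for the kernel's list_head and rcu_head, and
only the list member is exercised. Reading is_freed yields the first word of
whichever member was written, so it becomes non-zero once the entry is linked
onto a list, without costing any extra space.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for struct list_head and struct rcu_head. */
struct list_node {
	struct list_node *next, *prev;
};

struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

struct entry {
	const char *name;
	union {
		struct list_node del_list;
		struct cb_head cb;
		unsigned long is_freed; /* non-zero once del_list is linked */
	};
};

static void list_init(struct list_node *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}
```

A zero-initialized entry reads is_freed == 0; after
list_add_tail(&e.del_list, &head) the same field reads non-zero because it
overlays del_list.next.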
> };
> void *data;
> umode_t mode;
> - bool created;
> + unsigned int flags;
Bah, I forgot to remove flags (one iteration replaced the created with
flags to set both created and freed). I removed the freed with the above
"is_freed" and noticed that created is set if and only if ef->dentry is
set. So instead of using the created boolean, just test ef->dentry.
The flags field isn't used and can be removed. I just forgot to do so.
> };
>
> static DEFINE_MUTEX(eventfs_mutex);
> DEFINE_STATIC_SRCU(eventfs_srcu);
> +
> +static struct dentry *eventfs_root_lookup(struct inode *dir,
> + struct dentry *dentry,
> + unsigned int flags);
> +static int dcache_dir_open_wrapper(struct inode *inode, struct file *file);
> +static int eventfs_release(struct inode *inode, struct file *file);
> +
> +static const struct inode_operations eventfs_root_dir_inode_operations = {
> + .lookup = eventfs_root_lookup,
> +};
> +
> +static const struct file_operations eventfs_file_operations = {
> + .open = dcache_dir_open_wrapper,
> + .read = generic_read_dir,
> + .iterate_shared = dcache_readdir,
> + .llseek = generic_file_llseek,
> + .release = eventfs_release,
> +};
> +
In preparing for getting rid of eventfs_file, I noticed that all
directories are set to the above ops. In create_dir() instead of passing in
ef->*ops, just use these directly. This does help with future work.
> /**
> * create_file - create a file in the tracefs filesystem
> * @name: the name of the file to create.
> @@ -123,17 +143,12 @@ static struct dentry *create_file(const char *name, umode_t mode,
> * If tracefs is not enabled in the kernel, the value -%ENODEV will be
> * returned.
> */
> -static struct dentry *create_dir(const char *name, umode_t mode,
> - struct dentry *parent, void *data,
> - const struct file_operations *fop,
> - const struct inode_operations *iop)
> +static struct dentry *create_dir(const char *name, struct dentry *parent, void *data)
> {
As stated, the directories always used the same *op values, so I just hard
coded it.
> struct tracefs_inode *ti;
> struct dentry *dentry;
> struct inode *inode;
>
> - WARN_ON(!S_ISDIR(mode));
> -
> dentry = eventfs_start_creating(name, parent);
> if (IS_ERR(dentry))
> return dentry;
> @@ -142,9 +157,9 @@ static struct dentry *create_dir(const char *name, umode_t mode,
> if (unlikely(!inode))
> return eventfs_failed_creating(dentry);
>
> - inode->i_mode = mode;
> - inode->i_op = iop;
> - inode->i_fop = fop;
> + inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
> + inode->i_op = &eventfs_root_dir_inode_operations;
> + inode->i_fop = &eventfs_file_operations;
> inode->i_private = data;
>
> ti = get_tracefs(inode);
> @@ -169,15 +184,27 @@ void eventfs_set_ef_status_free(struct dentry *dentry)
> struct tracefs_inode *ti_parent;
> struct eventfs_file *ef;
>
> + mutex_lock(&eventfs_mutex);
To synchronize with the removals, I needed to add locking here.
> ti_parent = get_tracefs(dentry->d_parent->d_inode);
> if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE))
> - return;
> + goto out;
>
> ef = dentry->d_fsdata;
> if (!ef)
> - return;
> - ef->created = false;
> + goto out;
> + /*
> + * If ef was freed, then the LSB bit is set for d_fsdata.
> + * But this should not happen, as it should still have a
> + * ref count that prevents it. Warn in case it does.
> + */
> + if (WARN_ON_ONCE((unsigned long)ef & 1))
> + goto out;
During the remove, a dget() is done to keep the dentry from freeing. To
make sure that it doesn't get freed, I added this test.
> +
> + dentry->d_fsdata = NULL;
> +
> ef->dentry = NULL;
> + out:
> + mutex_unlock(&eventfs_mutex);
> }
>
> /**
> @@ -202,6 +229,79 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
> ti->private = ef->ei;
> }
>
> +static struct dentry *
> +create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup)
> +{
Because both the lookup and the dir_open_wrapper did basically the same
thing, I created a helper function so that I didn't have to update both
locations.
> + bool invalidate = false;
> + struct dentry *dentry;
> +
> + mutex_lock(&eventfs_mutex);
> + if (ef->is_freed) {
> + mutex_unlock(&eventfs_mutex);
> + return NULL;
> + }
Ignore if the ef is on its way to be freed.
> + if (ef->dentry) {
> + dentry = ef->dentry;
If the ef already has a dentry (created) then use it.
> + /* On dir open, up the ref count */
> + if (!lookup)
> + dget(dentry);
> + mutex_unlock(&eventfs_mutex);
> + return dentry;
> + }
> + mutex_unlock(&eventfs_mutex);
> +
> + if (!lookup)
> + inode_lock(parent->d_inode);
> +
> + if (ef->ei)
> + dentry = create_dir(ef->name, parent, ef->data);
> + else
> + dentry = create_file(ef->name, ef->mode, parent,
> + ef->data, ef->fop);
> +
> + if (!lookup)
> + inode_unlock(parent->d_inode);
> +
> + mutex_lock(&eventfs_mutex);
> + if (IS_ERR_OR_NULL(dentry)) {
With the lock dropped, the dentry could have been created causing it to
fail. Check if the ef->dentry exists, and if so, use it instead.
Note, if the ef is freed, it should not have a dentry.
> + /* If the ef was already updated get it */
> + dentry = ef->dentry;
> + if (dentry && !lookup)
> + dget(dentry);
> + mutex_unlock(&eventfs_mutex);
> + return dentry;
> + }
> +
> + if (!ef->dentry && !ef->is_freed) {
With the lock dropped, the dentry could have been filled too. If so, drop
the created dentry and use the one owned by the ef->dentry.
> + ef->dentry = dentry;
> + if (ef->ei)
> + eventfs_post_create_dir(ef);
> + dentry->d_fsdata = ef;
> + } else {
> + /* A race here, should try again (unless freed) */
> + invalidate = true;
I had a WARN_ON() once here. Probably could add a:
WARN_ON_ONCE(!ef->is_freed);
> + }
> + mutex_unlock(&eventfs_mutex);
> + if (invalidate)
> + d_invalidate(dentry);
> +
> + if (lookup || invalidate)
> + dput(dentry);
> +
> + return invalidate ? NULL : dentry;
> +}
> +
> +static bool match_event_file(struct eventfs_file *ef, const char *name)
> +{
A bit of a paranoid helper function. I wanted to make sure to synchronize
with the removals.
> + bool ret;
> +
> + mutex_lock(&eventfs_mutex);
> + ret = !ef->is_freed && strcmp(ef->name, name) == 0;
> + mutex_unlock(&eventfs_mutex);
> +
> + return ret;
> +}
> +
> /**
> * eventfs_root_lookup - lookup routine to create file/dir
> * @dir: directory in which lookup to be done
> @@ -211,7 +311,6 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
> * Used to create dynamic file/dir with-in @dir, search with-in ei
> * list, if @dentry found go ahead and create the file/dir
> */
> -
> static struct dentry *eventfs_root_lookup(struct inode *dir,
> struct dentry *dentry,
> unsigned int flags)
> @@ -230,30 +329,10 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
> idx = srcu_read_lock(&eventfs_srcu);
> list_for_each_entry_srcu(ef, &ei->e_top_files, list,
> srcu_read_lock_held(&eventfs_srcu)) {
> - if (strcmp(ef->name, dentry->d_name.name))
> + if (!match_event_file(ef, dentry->d_name.name))
> continue;
> ret = simple_lookup(dir, dentry, flags);
> - if (ef->created)
> - continue;
> - mutex_lock(&eventfs_mutex);
> - ef->created = true;
> - if (ef->ei)
> - ef->dentry = create_dir(ef->name, ef->mode, ef->d_parent,
> - ef->data, ef->fop, ef->iop);
> - else
> - ef->dentry = create_file(ef->name, ef->mode, ef->d_parent,
> - ef->data, ef->fop);
> -
> - if (IS_ERR_OR_NULL(ef->dentry)) {
> - ef->created = false;
> - mutex_unlock(&eventfs_mutex);
> - } else {
> - if (ef->ei)
> - eventfs_post_create_dir(ef);
> - ef->dentry->d_fsdata = ef;
> - mutex_unlock(&eventfs_mutex);
> - dput(ef->dentry);
> - }
> + create_dentry(ef, ef->d_parent, true);
> break;
> }
> srcu_read_unlock(&eventfs_srcu, idx);
> @@ -270,6 +349,7 @@ static int eventfs_release(struct inode *inode, struct file *file)
> struct tracefs_inode *ti;
> struct eventfs_inode *ei;
> struct eventfs_file *ef;
> + struct dentry *dentry;
> int idx;
>
> ti = get_tracefs(inode);
> @@ -280,8 +360,11 @@ static int eventfs_release(struct inode *inode, struct file *file)
> idx = srcu_read_lock(&eventfs_srcu);
> list_for_each_entry_srcu(ef, &ei->e_top_files, list,
> srcu_read_lock_held(&eventfs_srcu)) {
> - if (ef->created)
> - dput(ef->dentry);
> + mutex_lock(&eventfs_mutex);
> + dentry = ef->dentry;
> + mutex_unlock(&eventfs_mutex);
> + if (dentry)
> + dput(dentry);
> }
> srcu_read_unlock(&eventfs_srcu, idx);
> return dcache_dir_close(inode, file);
> @@ -312,47 +395,12 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file)
> ei = ti->private;
> idx = srcu_read_lock(&eventfs_srcu);
> list_for_each_entry_rcu(ef, &ei->e_top_files, list) {
> - if (ef->created) {
> - dget(ef->dentry);
> - continue;
> - }
> - mutex_lock(&eventfs_mutex);
> - ef->created = true;
> -
> - inode_lock(dentry->d_inode);
> - if (ef->ei)
> - ef->dentry = create_dir(ef->name, ef->mode, dentry,
> - ef->data, ef->fop, ef->iop);
> - else
> - ef->dentry = create_file(ef->name, ef->mode, dentry,
> - ef->data, ef->fop);
> - inode_unlock(dentry->d_inode);
> -
> - if (IS_ERR_OR_NULL(ef->dentry)) {
> - ef->created = false;
> - } else {
> - if (ef->ei)
> - eventfs_post_create_dir(ef);
> - ef->dentry->d_fsdata = ef;
> - }
> - mutex_unlock(&eventfs_mutex);
> + create_dentry(ef, dentry, false);
> }
> srcu_read_unlock(&eventfs_srcu, idx);
> return dcache_dir_open(inode, file);
> }
>
> -static const struct file_operations eventfs_file_operations = {
> - .open = dcache_dir_open_wrapper,
> - .read = generic_read_dir,
> - .iterate_shared = dcache_readdir,
> - .llseek = generic_file_llseek,
> - .release = eventfs_release,
> -};
> -
> -static const struct inode_operations eventfs_root_dir_inode_operations = {
> - .lookup = eventfs_root_lookup,
> -};
> -
> /**
> * eventfs_prepare_ef - helper function to prepare eventfs_file
> * @name: the name of the file/directory to create.
> @@ -470,11 +518,7 @@ struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
> ti_parent = get_tracefs(parent->d_inode);
> ei_parent = ti_parent->private;
>
> - ef = eventfs_prepare_ef(name,
> - S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
> - &eventfs_file_operations,
> - &eventfs_root_dir_inode_operations, NULL);
> -
> + ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);
For directories, just use the hard coded values.
> if (IS_ERR(ef))
> return ef;
>
> @@ -502,11 +546,7 @@ struct eventfs_file *eventfs_add_dir(const char *name,
> if (!ef_parent)
> return ERR_PTR(-EINVAL);
>
> - ef = eventfs_prepare_ef(name,
> - S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
> - &eventfs_file_operations,
> - &eventfs_root_dir_inode_operations, NULL);
> -
> + ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);
ditto.
> if (IS_ERR(ef))
> return ef;
>
> @@ -601,37 +641,15 @@ int eventfs_add_file(const char *name, umode_t mode,
> return 0;
> }
>
> -static LLIST_HEAD(free_list);
> -
> -static void eventfs_workfn(struct work_struct *work)
> -{
> - struct eventfs_file *ef, *tmp;
> - struct llist_node *llnode;
> -
> - llnode = llist_del_all(&free_list);
> - llist_for_each_entry_safe(ef, tmp, llnode, llist) {
> - if (ef->created && ef->dentry)
> - dput(ef->dentry);
> - kfree(ef->name);
> - kfree(ef->ei);
> - kfree(ef);
> - }
> -}
> -
> -DECLARE_WORK(eventfs_work, eventfs_workfn);
> -
> static void free_ef(struct rcu_head *head)
> {
> struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu);
>
> - if (!llist_add(&ef->llist, &free_list))
> - return;
> -
> - queue_work(system_unbound_wq, &eventfs_work);
> + kfree(ef->name);
> + kfree(ef->ei);
> + kfree(ef);
Since I did not do the dput() or d_invalidate() here I don't need to call this
from task context. This simplifies the process.
> }
>
> -
> -
> /**
> * eventfs_remove_rec - remove eventfs dir or file from list
> * @ef: eventfs_file to be removed.
> @@ -639,7 +657,7 @@ static void free_ef(struct rcu_head *head)
> * This function recursively remove eventfs_file which
> * contains info of file or dir.
> */
> -static void eventfs_remove_rec(struct eventfs_file *ef, int level)
> +static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head, int level)
> {
> struct eventfs_file *ef_child;
>
> @@ -659,15 +677,12 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level)
> /* search for nested folders or files */
> list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list,
> lockdep_is_held(&eventfs_mutex)) {
> - eventfs_remove_rec(ef_child, level + 1);
> + eventfs_remove_rec(ef_child, head, level + 1);
> }
> }
>
> - if (ef->created && ef->dentry)
> - d_invalidate(ef->dentry);
> -
> list_del_rcu(&ef->list);
> - call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
> + list_add_tail(&ef->del_list, head);
Hold off on freeing the ef. Add it to a linked list to do so later.
> }
>
> /**
> @@ -678,12 +693,62 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level)
> */
> void eventfs_remove(struct eventfs_file *ef)
> {
> + struct eventfs_file *tmp;
> + LIST_HEAD(ef_del_list);
> + struct dentry *dentry_list = NULL;
> + struct dentry *dentry;
> +
> if (!ef)
> return;
>
> mutex_lock(&eventfs_mutex);
> - eventfs_remove_rec(ef, 0);
> + eventfs_remove_rec(ef, &ef_del_list, 0);
The above returns with ef_del_list holding all the ef's to be freed.
I probably could have just passed the dentry_list down instead, but I
wanted the complexity below done in a non-recursive function.
> +
> + list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) {
> + if (ef->dentry) {
> + unsigned long ptr = (unsigned long)dentry_list;
> +
> + /* Keep the dentry from being freed yet */
> + dget(ef->dentry);
> +
> + /*
> + * Paranoid: The dget() above should prevent the dentry
> + * from being freed and calling eventfs_set_ef_status_free().
> + * But just in case, set the link list LSB pointer to 1
> + * and have eventfs_set_ef_status_free() check that to
> + * make sure that if it does happen, it will not think
> + * the d_fsdata is an event_file.
> + *
> + * For this to work, no event_file should be allocated
> + * on a odd space, as the ef should always be allocated
> + * to be at least word aligned. Check for that too.
> + */
> + WARN_ON_ONCE(ptr & 1);
> +
> + ef->dentry->d_fsdata = (void *)(ptr | 1);
Set the d_fsdata to be a linked list. The comment above needs to say that
struct eventfs_file and struct dentry should be word aligned. Anyway, while
the eventfs_mutex is held, add all the dentries belonging to eventfs_files
to the dentry_list and clear the ef->dentry.
> + dentry_list = ef->dentry;
> + ef->dentry = NULL;
> + }
> + call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
> + }
> mutex_unlock(&eventfs_mutex);
> +
> + while (dentry_list) {
> + unsigned long ptr;
> +
> + dentry = dentry_list;
> + ptr = (unsigned long)dentry->d_fsdata & ~1UL;
> + dentry_list = (struct dentry *)ptr;
> + dentry->d_fsdata = NULL;
With the mutex released, it is safe to free the dentries here. This also
must be done before returning from this function: when I had it done in
the workqueue, it was failing some tests that would remove a dynamic event
and then see that the directory was still around!
> + d_invalidate(dentry);
> + mutex_lock(&eventfs_mutex);
> + /* dentry should now have at least a single reference */
> + WARN_ONCE((int)d_count(dentry) < 1,
> + "dentry %px less than one reference (%d) after invalidate\n",
I did update the above to:
WARN_ONCE((int)d_count(dentry) < 1,
"dentry %px (%s) less than one reference (%d) after invalidate\n",
dentry, dentry->d_name.name, d_count(dentry));
To include the name of the dentry (my current work is triggering this still).
> + dentry, d_count(dentry));
> + mutex_unlock(&eventfs_mutex);
> + dput(dentry);
> + }
> }
>
> /**
> diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
> index c443a0c32a8c..1b880b5cd29d 100644
> --- a/fs/tracefs/internal.h
> +++ b/fs/tracefs/internal.h
> @@ -22,4 +22,6 @@ struct dentry *tracefs_end_creating(struct dentry *dentry);
> struct dentry *tracefs_failed_creating(struct dentry *dentry);
> struct inode *tracefs_get_inode(struct super_block *sb);
>
> +void eventfs_set_ef_status_free(struct dentry *dentry);
> +
> #endif /* _TRACEFS_INTERNAL_H */
> diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
> index 4d30b0cafc5f..47c1b4d21735 100644
> --- a/include/linux/tracefs.h
> +++ b/include/linux/tracefs.h
> @@ -51,8 +51,6 @@ void eventfs_remove(struct eventfs_file *ef);
>
> void eventfs_remove_events_dir(struct dentry *dentry);
>
> -void eventfs_set_ef_status_free(struct dentry *dentry);
> -
Oh, and eventfs_set_ef_status_free() should not be exported to outside the
tracefs system.
-- Steve
> struct dentry *tracefs_create_file(const char *name, umode_t mode,
> struct dentry *parent, void *data,
> const struct file_operations *fops);
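The d_fsdata linking discussed in the review comments above (thread a singly
linked list through a pointer field and set bit 0 so the value cannot be
mistaken for ordinary private data) can be sketched in userspace C. This is
only an illustration under stated assumptions: struct node, tagged_push(),
is_tagged_link() and tagged_pop() are hypothetical names, not tracefs code,
and no locking or reference counting is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct dentry; only the field we need. */
struct node {
	void *fsdata;	/* normally private data; reused here as a tagged link */
	int id;
};

/* Push @n onto the list headed by *@head, tagging the link with bit 0. */
static void tagged_push(struct node **head, struct node *n)
{
	uintptr_t ptr = (uintptr_t)*head;

	/* Nodes must be at least 2-byte aligned so that bit 0 is free. */
	assert((ptr & 1) == 0);
	n->fsdata = (void *)(ptr | 1);
	*head = n;
}

/* True if fsdata currently holds a tagged list link, not private data. */
static int is_tagged_link(const struct node *n)
{
	return ((uintptr_t)n->fsdata & 1) != 0;
}

/* Pop the first node, strip the tag and clear fsdata; NULL when empty. */
static struct node *tagged_pop(struct node **head)
{
	struct node *n = *head;

	if (!n)
		return NULL;
	*head = (struct node *)((uintptr_t)n->fsdata & ~(uintptr_t)1);
	n->fsdata = NULL;
	return n;
}
```

A caller would push every node while holding its lock, drop the lock, and
then pop nodes one at a time to release them, which mirrors the two loops in
the quoted eventfs_remove().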
Hi, Willy, Thomas
Your suggestions on the v1 nolibc powerpc patchset [1] have been applied;
here is v2.
Testing results:
- run with tinyconfig
arch/board      | result
----------------|------------
ppc/g3beige     | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
ppc/ppce500     | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
ppc64le/pseries | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
ppc64le/powernv | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
ppc64/pseries   | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
ppc64/powernv   | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
- run-user
(Tested with -Os, -O0 and -O2)
// for 32-bit PowerPC
$ for arch in powerpc ppc; do make run-user ARCH=$arch CROSS_COMPILE=powerpc-linux-gnu- ; done | grep status
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
// for 64-bit big-endian PowerPC and 64-bit little-endian PowerPC
$ for arch in ppc64 ppc64le; do make run-user ARCH=$arch CROSS_COMPILE=powerpc64le-linux-gnu- ; done | grep status
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
Changes from v1 --> v2:
* tools/nolibc: add support for powerpc
Add missing arch-powerpc.h lines to arch.h
Align with the other arch-<ARCH>.h, naming the variables
with more meaningful words, such as _ret, _num, _arg1 ...
Clean up the syscall instructions
No line from musl now.
Suggestions from Thomas
* tools/nolibc: add support for powerpc64
No change
* selftests/nolibc: add extra configs customize support
To reduce complexity, merge the commands from the new extraconfig
target to defconfig target and drop the extconfig target completely.
Derived from Willy's suggestion of the tinyconfig patchset
* selftests/nolibc: add XARCH and ARCH mapping support
To reduce complexity, let's use XARCH internally and only reserve
ARCH as the input variable.
Derived from Willy's suggestion
* selftests/nolibc: add test support for powerpc
Add ppc as the default 32-bit variant for the powerpc target; allow passing
ARCH=ppc or ARCH=powerpc to test 32-bit powerpc
Derived from Willy's suggestion
* selftests/nolibc: add test support for ppc64le
Rename powerpc64le to ppc64le
Suggestion from Willy
* selftests/nolibc: add test support for ppc64
Rename powerpc64 to ppc64
Suggestion from Willy
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/lkml/cover.1689713175.git.falcon@tinylab.org/
Zhangjin Wu (7):
tools/nolibc: add support for powerpc
tools/nolibc: add support for powerpc64
selftests/nolibc: add extra configs customize support
selftests/nolibc: add XARCH and ARCH mapping support
selftests/nolibc: add test support for ppc
selftests/nolibc: add test support for ppc64le
selftests/nolibc: add test support for ppc64
tools/include/nolibc/arch-powerpc.h | 202 ++++++++++++++++++++++++
tools/include/nolibc/arch.h | 2 +
tools/testing/selftests/nolibc/Makefile | 48 +++++-
3 files changed, 244 insertions(+), 8 deletions(-)
create mode 100644 tools/include/nolibc/arch-powerpc.h
--
2.25.1
Make sv48 the default address space for mmap as some applications
currently depend on this assumption. Users can now select a
desired address space using a non-zero hint address to mmap. Previously,
requesting the default address space from mmap by passing zero as the hint
address would result in using the largest address space possible. Some
applications depend on empty bits in the virtual address space, like Go and
Java, so this patch provides more flexibility for application developers.
-Charlie
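For illustration only, a userspace sketch of passing a hint address to mmap()
and inspecting how many address bits the returned mapping uses. The helper
names map_with_hint() and addr_bits() are hypothetical and are not part of
this series; how the kernel interprets the hint on RISC-V is defined by the
patches themselves, not by this sketch.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Map @len bytes of anonymous memory, passing @hint as the address hint.
 * Without MAP_FIXED the kernel is free to place the mapping elsewhere.
 * Returns the mapping, or MAP_FAILED on error.
 */
static void *map_with_hint(uintptr_t hint, size_t len)
{
	assert(len > 0);
	return mmap((void *)hint, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* Number of significant bits in an address: highest set bit index + 1. */
static unsigned int addr_bits(const void *addr)
{
	uintptr_t v = (uintptr_t)addr;
	unsigned int bits = 0;

	while (v) {
		bits++;
		v >>= 1;
	}
	return bits;
}
```

An application that stores tags in the upper pointer bits could call
addr_bits() on the result of map_with_hint() to check that the mapping leaves
the bits it relies on unused.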
---
v6:
- Rebase onto the correct base
v5:
- Minor wording change in documentation
- Change some parentheses in arch_get_mmap_ macros
- Added case for addr==0 in arch_get_mmap_ because without this, programs would
crash if RLIMIT_STACK was modified before executing the program. This was
tested using the libhugetlbfs tests.
v4:
- Split testcases/document patch into test cases, in-code documentation, and
formal documentation patches
- Modified the mmap_base macro to be more legible and better represent memory
layout
- Fixed documentation to better reflect the implementation
- Renamed DEFAULT_VA_BITS to MMAP_VA_BITS
- Added additional test case for rlimit changes
---
Charlie Jenkins (4):
RISC-V: mm: Restrict address space for sv39,sv48,sv57
RISC-V: mm: Add tests for RISC-V mm
RISC-V: mm: Update pgtable comment documentation
RISC-V: mm: Document mmap changes
Documentation/riscv/vm-layout.rst | 22 +++
arch/riscv/include/asm/elf.h | 2 +-
arch/riscv/include/asm/pgtable.h | 20 ++-
arch/riscv/include/asm/processor.h | 46 +++++-
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/mm/.gitignore | 1 +
tools/testing/selftests/riscv/mm/Makefile | 21 +++
.../selftests/riscv/mm/testcases/mmap.c | 133 ++++++++++++++++++
8 files changed, 234 insertions(+), 13 deletions(-)
create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
create mode 100644 tools/testing/selftests/riscv/mm/Makefile
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap.c
--
2.41.0
This is the basic functionality for iommufd to support
iommufd_device_replace() and IOMMU_HWPT_ALLOC for physical devices.
iommufd_device_replace() allows changing the HWPT associated with the
device to a new IOAS or HWPT. Replace does this in a way that failure leaves
things unchanged, and utilizes the iommu_group_replace_domain() API
to allow the iommu driver to perform an optional non-disruptive change.
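The "failure leaves things unchanged" property can be illustrated with a
small, self-contained sketch. Everything here is hypothetical (struct table,
struct device, table_attach(), table_detach(), device_replace()); it is not
the iommufd implementation, and it ignores locking, groups and reserved
regions. The idea shown is simply to attach the new object first and release
the old one only after that has succeeded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the iommufd structures. */
struct table {
	int users;		/* devices currently attached */
	int fail_attach;	/* set to simulate an attach failure */
};

struct device {
	struct table *cur;	/* currently attached table, or NULL */
};

/* Register a new user on @t; returns 0 on success, -1 on failure. */
static int table_attach(struct table *t)
{
	if (t->fail_attach)
		return -1;
	t->users++;
	return 0;
}

/* Drop one user from @t. */
static void table_detach(struct table *t)
{
	assert(t->users > 0);
	t->users--;
}

/*
 * Switch @dev from its current table to @new_table.
 * The new table is attached first; the old one is released only once that
 * has succeeded, so a failure leaves @dev exactly as it was.
 */
static int device_replace(struct device *dev, struct table *new_table)
{
	struct table *old = dev->cur;
	int rc;

	if (!old)
		return -1;	/* nothing attached, so nothing to replace */
	if (old == new_table)
		return 0;

	rc = table_attach(new_table);
	if (rc)
		return rc;

	dev->cur = new_table;
	table_detach(old);
	return 0;
}
```

The early return for a device with no current table corresponds to the
"Reject replace on non-attached devices" item in the v3 changelog below.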
IOMMU_HWPT_ALLOC allows HWPTs to be explicitly allocated by the user and
used by attach or replace. At this point it isn't very useful since the
HWPT is the same as the automatically managed HWPT from the IOAS. However
a following series will allow userspace to customize the created HWPT.
The implementation is complicated because we have to introduce some
per-iommu_group memory in iommufd and redo how we think about multi-device
groups to be more explicit. This solves all the locking problems in the
prior attempts.
This series is infrastructure work for the following series which:
- Add replace for attach
- Expose replace through VFIO APIs
- Implement driver parameters for HWPT creation (nesting)
Once review of this is complete I will keep it on a side branch and
accumulate the following series when they are ready so we can have a
stable base and make more incremental progress. When we have all the parts
together to get a full implementation it can go to Linus.
This is on github: https://github.com/jgunthorpe/linux/commits/iommufd_hwpt
v8:
- Rebase to v6.5-rc2, update to new behavior of __iommu_group_set_domain()
v7: https://lore.kernel.org/r/0-v7-6c0fd698eda2+5e3-iommufd_alloc_jgg@nvidia.com
- Rebase to v6.4-rc2, update to new signature of iommufd_get_ioas()
v6: https://lore.kernel.org/r/0-v6-fdb604df649a+369-iommufd_alloc_jgg@nvidia.com
 - Go back to the v4 locking arrangement with now both the attach/detach
igroup->locks inside the functions, Kevin says he needs this for a
followup series. This still fixes the syzkaller bug
- Fix two more error unwind locking bugs where
iommufd_object_abort_and_destroy(hwpt) would deadlock or be mislocked.
Make sure fail_nth will catch these mistakes
 - Add a patch allowing objects to have an abort function that differs from
   their destroy function; it allows hwpt abort to require the caller to
   continue to hold the lock and enforces this with lockdep.
v5: https://lore.kernel.org/r/0-v5-6716da355392+c5-iommufd_alloc_jgg@nvidia.com
- Go back to the v3 version of the code, keep the comment changes from
v4. Syzkaller says the group lock change in v4 didn't work.
- Adjust the fail_nth test to cover the path syzkaller found. We need to
have an ioas with a mapped page installed to inject a failure during
domain attachment.
v4: https://lore.kernel.org/r/0-v4-9cd79ad52ee8+13f5-iommufd_alloc_jgg@nvidia.c…
- Refine comments and commit messages
- Move the group lock into iommufd_hw_pagetable_attach()
- Fix error unwind in iommufd_device_do_replace()
v3: https://lore.kernel.org/r/0-v3-61d41fd9e13e+1f5-iommufd_alloc_jgg@nvidia.com
- Refine comments and commit messages
- Adjust the flow in iommufd_device_auto_get_domain() so pt_id is only
set on success
- Reject replace on non-attached devices
- Add missing __reserved check for IOMMU_HWPT_ALLOC
v2: https://lore.kernel.org/r/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia.c…
- Use WARN_ON for the igroup->group test and move that logic to a
function iommufd_group_try_get()
- Change igroup->devices to igroup->device list
Replace will need to iterate over all attached idevs
- Rename to iommufd_group_setup_msi()
- New patch to export iommu_get_resv_regions()
- New patch to use per-device reserved regions instead of per-group
regions
- Split out the reorganizing of iommufd_device_change_pt() from the
replace patch
- Replace uses the per-dev reserved regions
- Use stdev_id in a few more places in the selftest
- Fix error handling in IOMMU_HWPT_ALLOC
- Clarify comments
- Rebase on v6.3-rc1
v1: https://lore.kernel.org/all/0-v1-7612f88c19f5+2f21-iommufd_alloc_jgg@nvidia…
Jason Gunthorpe (17):
iommufd: Move isolated msi enforcement to iommufd_device_bind()
iommufd: Add iommufd_group
iommufd: Replace the hwpt->devices list with iommufd_group
iommu: Export iommu_get_resv_regions()
iommufd: Keep track of each device's reserved regions instead of
groups
iommufd: Use the iommufd_group to avoid duplicate MSI setup
iommufd: Make sw_msi_start a group global
iommufd: Move putting a hwpt to a helper function
iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
iommufd: Allow a hwpt to be aborted after allocation
iommufd: Fix locking around hwpt allocation
iommufd: Reorganize iommufd_device_attach into
iommufd_device_change_pt
iommufd: Add iommufd_device_replace()
iommufd: Make destroy_rwsem use a lock class per object type
iommufd: Add IOMMU_HWPT_ALLOC
iommufd/selftest: Return the real idev id from selftest mock_domain
iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC
Nicolin Chen (2):
iommu: Introduce a new iommu_group_replace_domain() API
iommufd/selftest: Test iommufd_device_replace()
drivers/iommu/iommu-priv.h | 10 +
drivers/iommu/iommu.c | 38 +-
drivers/iommu/iommufd/device.c | 555 +++++++++++++-----
drivers/iommu/iommufd/hw_pagetable.c | 112 +++-
drivers/iommu/iommufd/io_pagetable.c | 32 +-
drivers/iommu/iommufd/iommufd_private.h | 52 +-
drivers/iommu/iommufd/iommufd_test.h | 6 +
drivers/iommu/iommufd/main.c | 24 +-
drivers/iommu/iommufd/selftest.c | 40 ++
include/linux/iommufd.h | 1 +
include/uapi/linux/iommufd.h | 26 +
tools/testing/selftests/iommu/iommufd.c | 67 ++-
.../selftests/iommu/iommufd_fail_nth.c | 67 ++-
tools/testing/selftests/iommu/iommufd_utils.h | 63 +-
14 files changed, 867 insertions(+), 226 deletions(-)
create mode 100644 drivers/iommu/iommu-priv.h
base-commit: fdf0eaf11452d72945af31804e2a1048ee1b574c
--
2.41.0
Apologies for sending the previous mail from the wrong app (not in text mode).
Resending to keep the mailing list thread consistent.
On Wed, Jul 26, 2023 at 3:10 AM Markus Elfring <Markus.Elfring(a)web.de>
wrote:
>
> > Tests BPF redirect at the lwt xmit hook to ensure error handling are
> > safe, i.e. won't panic the kernel.
>
> Are imperative change descriptions still preferred?
Hi Markus,
I think you linked this to me yesterday that it should be described
imperatively:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Doc…
>
> See also:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Doc…
>
I don’t follow the purpose of this reference. This points to user impact
but this is a selftest, so I don’t see any user impact here. Or is there
anything I missed?
>
> Can remaining wording weaknesses be adjusted accordingly?
I am not following this question. Can you be more specific or provide an
example?
Yan
>
> Regards,
> Markus
>
Good morning,
I have looked at your offering and am pleased to say that it attracts attention and invites further conversation.
I thought I might be able to contribute to your growth and help this offering reach a wider audience. I do search engine optimization for websites, which brings them excellent web traffic.
Could we talk sometime soon?
Best regards,
Adam Charachuta