This series consolidates the behavior of the 2 drivers that implement
the ethtool MAC Merge layer by making NXP ENETC commit its preemptible
traffic classes to hardware only when MM TX is active (same as Ocelot).
Then, after resolving an issue with the ENETC driver, it restricts user
space from entering 2 states which don't make sense:
- pmac-enabled off tx-enabled on verify-enabled *
- pmac-enabled * tx-enabled off verify-enabled on
Then, it introduces a selftest (ethtool_mm.sh) which puts everything
together and tests all valid configurations known to me.
This is simultaneously the v2 of "[PATCH net-next 0/2] ethtool mm API
improvements":
https://lore.kernel.org/netdev/20230415173454.3970647-1-vladimir.oltean@nxp…
which had caused some problems to openlldp. Those were solved in the
meantime, see:
https://github.com/intel/openlldp/commit/11171b474f6f3cbccac5d608b7f26b32ff…
and of "[RFC PATCH net-next] selftests: forwarding: add a test for MAC
Merge layer":
https://lore.kernel.org/netdev/20230210221243.228932-1-vladimir.oltean@nxp.…
Petr Machata (2):
selftests: forwarding: sch_tbf_*: Add a pre-run hook
selftests: forwarding: generalize bail_on_lldpad from mlxsw
Vladimir Oltean (7):
net: enetc: fix MAC Merge layer remaining enabled until a link down
event
net: enetc: report mm tx-active based on tx-enabled and verify-status
net: enetc: only commit preemptible TCs to hardware when MM TX is
active
net: enetc: include MAC Merge / FP registers in register dump
net: ethtool: mm: sanitize some UAPI configurations
selftests: forwarding: introduce helper for standard ethtool counters
selftests: forwarding: add a test for MAC Merge layer
drivers/net/ethernet/freescale/enetc/enetc.c | 23 +-
drivers/net/ethernet/freescale/enetc/enetc.h | 5 +-
.../ethernet/freescale/enetc/enetc_ethtool.c | 94 +++++-
.../net/ethernet/freescale/enetc/enetc_hw.h | 3 +
net/ethtool/mm.c | 10 +
.../drivers/net/mlxsw/qos_headroom.sh | 3 +-
.../selftests/drivers/net/mlxsw/qos_lib.sh | 28 --
.../selftests/drivers/net/mlxsw/qos_pfc.sh | 3 +-
.../selftests/drivers/net/mlxsw/sch_ets.sh | 3 +-
.../drivers/net/mlxsw/sch_red_core.sh | 1 -
.../drivers/net/mlxsw/sch_red_ets.sh | 2 +-
.../drivers/net/mlxsw/sch_red_root.sh | 2 +-
.../drivers/net/mlxsw/sch_tbf_ets.sh | 6 +-
.../drivers/net/mlxsw/sch_tbf_prio.sh | 6 +-
.../drivers/net/mlxsw/sch_tbf_root.sh | 6 +-
.../testing/selftests/net/forwarding/Makefile | 1 +
.../selftests/net/forwarding/ethtool_mm.sh | 288 ++++++++++++++++++
tools/testing/selftests/net/forwarding/lib.sh | 60 ++++
.../net/forwarding/sch_tbf_etsprio.sh | 4 +
.../selftests/net/forwarding/sch_tbf_root.sh | 4 +
20 files changed, 486 insertions(+), 66 deletions(-)
create mode 100755 tools/testing/selftests/net/forwarding/ethtool_mm.sh
--
2.34.1
This is the basic functionality for iommufd to support
iommufd_device_replace() and IOMMU_HWPT_ALLOC for physical devices.
iommufd_device_replace() allows changing the HWPT associated with the
device to a new IOAS or HWPT. Replace does this in way that failure leaves
things unchanged, and utilizes the iommu iommu_group_replace_domain() API
to allow the iommu driver to perform an optional non-disruptive change.
IOMMU_HWPT_ALLOC allows HWPTs to be explicitly allocated by the user and
used by attach or replace. At this point it isn't very useful since the
HWPT is the same as the automatically managed HWPT from the IOAS. However
a following series will allow userspace to customize the created HWPT.
The implementation is complicated because we have to introduce some
per-iommu_group memory in iommufd and redo how we think about multi-device
groups to be more explicit. This solves all the locking problems in the
prior attempts.
This series is infrastructure work for the following series which:
- Add replace for attach
- Expose replace through VFIO APIs
- Implement driver parameters for HWPT creation (nesting)
Once review of this is complete I will keep it on a side branch and
accumulate the following series when they are ready so we can have a
stable base and make more incremental progress. When we have all the parts
together to get a full implementation it can go to Linus.
This is on github: https://github.com/jgunthorpe/linux/commits/iommufd_hwpt
v6:
- Go back to the v4 locking arragnment with now both the attach/detach
igroup->locks inside the functions, Kevin says he needs this for a
followup series. This still fixes the syzkaller bug
- Fix two more error unwind locking bugs where
iommufd_object_abort_and_destroy(hwpt) would deadlock or be mislocked.
Make sure fail_nth will catch these mistakes
- Add a patch allowing objects to have different abort than destroy
function, it allows hwpt abort to require the caller to continue
to hold the lock and enforces this with lockdep.
v5: https://lore.kernel.org/r/0-v5-6716da355392+c5-iommufd_alloc_jgg@nvidia.com
- Go back to the v3 version of the code, keep the comment changes from
v4. Syzkaller says the group lock change in v4 didn't work.
- Adjust the fail_nth test to cover the path syzkaller found. We need to
have an ioas with a mapped page installed to inject a failure during
domain attachment.
v4: https://lore.kernel.org/r/0-v4-9cd79ad52ee8+13f5-iommufd_alloc_jgg@nvidia.c…
- Refine comments and commit messages
- Move the group lock into iommufd_hw_pagetable_attach()
- Fix error unwind in iommufd_device_do_replace()
v3: https://lore.kernel.org/r/0-v3-61d41fd9e13e+1f5-iommufd_alloc_jgg@nvidia.com
- Refine comments and commit messages
- Adjust the flow in iommufd_device_auto_get_domain() so pt_id is only
set on success
- Reject replace on non-attached devices
- Add missing __reserved check for IOMMU_HWPT_ALLOC
v2: https://lore.kernel.org/r/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia.c…
- Use WARN_ON for the igroup->group test and move that logic to a
function iommufd_group_try_get()
- Change igroup->devices to igroup->device list
Replace will need to iterate over all attached idevs
- Rename to iommufd_group_setup_msi()
- New patch to export iommu_get_resv_regions()
- New patch to use per-device reserved regions instead of per-group
regions
- Split out the reorganizing of iommufd_device_change_pt() from the
replace patch
- Replace uses the per-dev reserved regions
- Use stdev_id in a few more places in the selftest
- Fix error handling in IOMMU_HWPT_ALLOC
- Clarify comments
- Rebase on v6.3-rc1
v1: https://lore.kernel.org/all/0-v1-7612f88c19f5+2f21-iommufd_alloc_jgg@nvidia…
Compared to v3 the diff for the whole series looks like:
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 024ed8ee9939cd..2770087059ba73 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -341,14 +341,15 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev)
{
struct iommufd_hw_pagetable *hwpt = idev->igroup->hwpt;
- lockdep_assert_held(&idev->igroup->lock);
-
+ mutex_lock(&idev->igroup->lock);
list_del(&idev->group_item);
if (list_empty(&idev->igroup->device_list)) {
iommu_detach_group(hwpt->domain, idev->igroup->group);
idev->igroup->hwpt = NULL;
}
iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev);
+ mutex_unlock(&idev->igroup->lock);
+
/* Caller must destroy hwpt */
return hwpt;
}
@@ -515,8 +516,8 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev,
hwpt->auto_domain = true;
*pt_id = hwpt->obj.id;
- mutex_unlock(&ioas->mutex);
iommufd_object_finalize(idev->ictx, &hwpt->obj);
+ mutex_unlock(&ioas->mutex);
return destroy_hwpt;
out_abort:
@@ -610,7 +611,6 @@ EXPORT_SYMBOL_NS_GPL(iommufd_device_attach, IOMMUFD);
* This is the same as
* iommufd_device_detach();
* iommufd_device_attach();
- *
* If it fails then no change is made to the attachment. The iommu driver may
* implement this so there is no disruption in translation. This can only be
* called if iommufd_device_attach() has already succeeded.
@@ -633,10 +633,7 @@ void iommufd_device_detach(struct iommufd_device *idev)
{
struct iommufd_hw_pagetable *hwpt;
- mutex_lock(&idev->igroup->lock);
hwpt = iommufd_hw_pagetable_detach(idev);
- mutex_unlock(&idev->igroup->lock);
-
iommufd_hw_pagetable_put(idev->ictx, hwpt);
refcount_dec(&idev->obj.users);
}
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 8aa9ac130b5960..655ed32144f62e 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -26,6 +26,21 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
refcount_dec(&hwpt->ioas->obj.users);
}
+void iommufd_hw_pagetable_abort(struct iommufd_object *obj)
+{
+ struct iommufd_hw_pagetable *hwpt =
+ container_of(obj, struct iommufd_hw_pagetable, obj);
+
+ /* The ioas->mutex must be held until finalize is called. */
+ lockdep_assert_held(&hwpt->ioas->mutex);
+
+ if (!list_empty(&hwpt->hwpt_item)) {
+ list_del_init(&hwpt->hwpt_item);
+ iopt_table_remove_domain(&hwpt->ioas->iopt, hwpt->domain);
+ }
+ iommufd_hw_pagetable_destroy(obj);
+}
+
int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
{
if (hwpt->enforce_cache_coherency)
@@ -50,6 +65,10 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
* Allocate a new iommu_domain and return it as a hw_pagetable. The HWPT
* will be linked to the given ioas and upon return the underlying iommu_domain
* is fully popoulated.
+ *
+ * The caller must hold the ioas->mutex until after
+ * iommufd_object_abort_and_destroy() or iommufd_object_finalize() is called on
+ * the returned hwpt.
*/
struct iommufd_hw_pagetable *
iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
@@ -93,9 +112,6 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
* directly allocate a domain. These drivers do not finish creating the
* domain until attach is completed. Thus we must have this call
* sequence. Once those drivers are fixed this should be removed.
- *
- * Note we hold the igroup->lock here which prevents any other thread
- * from observing igroup->hwpt until we finish setting it up.
*/
if (immediate_attach) {
rc = iommufd_hw_pagetable_attach(hwpt, idev);
@@ -140,10 +156,9 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
mutex_lock(&ioas->mutex);
hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev, false);
- mutex_unlock(&ioas->mutex);
if (IS_ERR(hwpt)) {
rc = PTR_ERR(hwpt);
- goto out_put_ioas;
+ goto out_unlock;
}
cmd->out_hwpt_id = hwpt->obj.id;
@@ -151,11 +166,12 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
if (rc)
goto out_hwpt;
iommufd_object_finalize(ucmd->ictx, &hwpt->obj);
- goto out_put_ioas;
+ goto out_unlock;
out_hwpt:
iommufd_object_abort_and_destroy(ucmd->ictx, &hwpt->obj);
-out_put_ioas:
+out_unlock:
+ mutex_unlock(&ioas->mutex);
iommufd_put_object(&ioas->obj);
out_put_idev:
iommufd_put_object(&idev->obj);
diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index f842768b2e250b..21052f64f95649 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -1172,6 +1172,9 @@ int iopt_table_enforce_dev_resv_regions(struct io_pagetable *iopt,
unsigned int num_sw_msi = 0;
int rc;
+ if (iommufd_should_fail())
+ return -EINVAL;
+
down_write(&iopt->iova_rwsem);
/* FIXME: drivers allocate memory but there is no failure propogated */
iommu_get_resv_regions(dev, &resv_regions);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index cb693190bf51c5..ba50eb4661e217 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -261,6 +261,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
struct iommufd_hw_pagetable *
iommufd_hw_pagetable_detach(struct iommufd_device *idev);
void iommufd_hw_pagetable_destroy(struct iommufd_object *obj);
+void iommufd_hw_pagetable_abort(struct iommufd_object *obj);
int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd);
static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 694da191e4b155..73a91e96896252 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -24,6 +24,7 @@
struct iommufd_object_ops {
void (*destroy)(struct iommufd_object *obj);
+ void (*abort)(struct iommufd_object *obj);
};
static const struct iommufd_object_ops iommufd_object_ops[];
static struct miscdevice vfio_misc_dev;
@@ -104,7 +105,10 @@ void iommufd_object_abort(struct iommufd_ctx *ictx, struct iommufd_object *obj)
void iommufd_object_abort_and_destroy(struct iommufd_ctx *ictx,
struct iommufd_object *obj)
{
- iommufd_object_ops[obj->type].destroy(obj);
+ if (iommufd_object_ops[obj->type].abort)
+ iommufd_object_ops[obj->type].abort(obj);
+ else
+ iommufd_object_ops[obj->type].destroy(obj);
iommufd_object_abort(ictx, obj);
}
@@ -413,6 +417,7 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
},
[IOMMUFD_OBJ_HW_PAGETABLE] = {
.destroy = iommufd_hw_pagetable_destroy,
+ .abort = iommufd_hw_pagetable_abort,
},
#ifdef CONFIG_IOMMUFD_TEST
[IOMMUFD_OBJ_SELFTEST] = {
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index c07252dbf62d72..8b2c18ac6a2864 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -9,9 +9,6 @@
#include "iommufd_utils.h"
-static void *buffer;
-
-static unsigned long PAGE_SIZE;
static unsigned long HUGEPAGE_SIZE;
#define MOCK_PAGE_SIZE (PAGE_SIZE / 2)
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index 7e1afb6ff9bd8d..d4c552e5694812 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -41,6 +41,8 @@ static int writeat(int dfd, const char *fn, const char *val)
static __attribute__((constructor)) void setup_buffer(void)
{
+ PAGE_SIZE = sysconf(_SC_PAGE_SIZE);
+
BUFFER_SIZE = 2*1024*1024;
buffer = mmap(0, BUFFER_SIZE, PROT_READ | PROT_WRITE,
@@ -579,6 +581,7 @@ TEST_FAIL_NTH(basic_fail_nth, device)
uint32_t stdev_id;
uint32_t idev_id;
uint32_t hwpt_id;
+ __u64 iova;
self->fd = open("/dev/iommu", O_RDWR);
if (self->fd == -1)
@@ -590,6 +593,18 @@ TEST_FAIL_NTH(basic_fail_nth, device)
if (_test_ioctl_ioas_alloc(self->fd, &ioas_id2))
return -1;
+ iova = MOCK_APERTURE_START;
+ if (_test_ioctl_ioas_map(self->fd, ioas_id, buffer, PAGE_SIZE, &iova,
+ IOMMU_IOAS_MAP_FIXED_IOVA |
+ IOMMU_IOAS_MAP_WRITEABLE |
+ IOMMU_IOAS_MAP_READABLE))
+ return -1;
+ if (_test_ioctl_ioas_map(self->fd, ioas_id2, buffer, PAGE_SIZE, &iova,
+ IOMMU_IOAS_MAP_FIXED_IOVA |
+ IOMMU_IOAS_MAP_WRITEABLE |
+ IOMMU_IOAS_MAP_READABLE))
+ return -1;
+
fail_nth_enable();
if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, NULL,
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 9b6dcb921750b6..53b4d3f2d9fc6c 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -19,6 +19,8 @@
static void *buffer;
static unsigned long BUFFER_SIZE;
+static unsigned long PAGE_SIZE;
+
/*
* Have the kernel check the refcount on pages. I don't know why a freshly
* mmap'd anon non-compound page starts out with a ref of 3
Jason Gunthorpe (17):
iommufd: Move isolated msi enforcement to iommufd_device_bind()
iommufd: Add iommufd_group
iommufd: Replace the hwpt->devices list with iommufd_group
iommu: Export iommu_get_resv_regions()
iommufd: Keep track of each device's reserved regions instead of
groups
iommufd: Use the iommufd_group to avoid duplicate MSI setup
iommufd: Make sw_msi_start a group global
iommufd: Move putting a hwpt to a helper function
iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
iommufd: Allow a hwpt to be aborted after allocation
iommufd: Fix locking around hwpt allocation
iommufd: Reorganize iommufd_device_attach into
iommufd_device_change_pt
iommufd: Add iommufd_device_replace()
iommufd: Make destroy_rwsem use a lock class per object type
iommufd: Add IOMMU_HWPT_ALLOC
iommufd/selftest: Return the real idev id from selftest mock_domain
iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC
Nicolin Chen (2):
iommu: Introduce a new iommu_group_replace_domain() API
iommufd/selftest: Test iommufd_device_replace()
drivers/iommu/iommu-priv.h | 10 +
drivers/iommu/iommu.c | 41 +-
drivers/iommu/iommufd/device.c | 525 +++++++++++++-----
drivers/iommu/iommufd/hw_pagetable.c | 112 +++-
drivers/iommu/iommufd/io_pagetable.c | 30 +-
drivers/iommu/iommufd/iommufd_private.h | 52 +-
drivers/iommu/iommufd/iommufd_test.h | 6 +
drivers/iommu/iommufd/main.c | 24 +-
drivers/iommu/iommufd/selftest.c | 40 ++
include/linux/iommufd.h | 1 +
include/uapi/linux/iommufd.h | 26 +
tools/testing/selftests/iommu/iommufd.c | 67 ++-
.../selftests/iommu/iommufd_fail_nth.c | 67 ++-
tools/testing/selftests/iommu/iommufd_utils.h | 63 ++-
14 files changed, 853 insertions(+), 211 deletions(-)
create mode 100644 drivers/iommu/iommu-priv.h
base-commit: fd8c1a4aee973e87d890a5861e106625a33b2c4e
--
2.40.0
From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
Trace sched related functions, such as enqueue_task_fair, it is necessary to
specify a task instead of the current task which within a given cgroup to a map.
Feng Zhou (2):
bpf: Add bpf_task_under_cgroup helper
selftests/bpf: Add testcase for bpf_task_under_cgroup
include/uapi/linux/bpf.h | 13 +++++
kernel/bpf/verifier.c | 4 +-
kernel/trace/bpf_trace.c | 31 ++++++++++++
tools/include/uapi/linux/bpf.h | 13 +++++
.../bpf/prog_tests/task_under_cgroup.c | 49 +++++++++++++++++++
.../bpf/progs/test_task_under_cgroup.c | 31 ++++++++++++
6 files changed, 140 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/task_under_cgroup.c
create mode 100644 tools/testing/selftests/bpf/progs/test_task_under_cgroup.c
--
2.20.1
There's been a bunch of off-list discussions about this, including at
Plumbers. The original plan was to do something involving providing an
ISA string to userspace, but ISA strings just aren't sufficient for a
stable ABI any more: in order to parse an ISA string users need the
version of the specifications that the string is written to, the version
of each extension (sometimes at a finer granularity than the RISC-V
releases/versions encode), and the expected use case for the ISA string
(ie, is it a U-mode or M-mode string). That's a lot of complexity to
try and keep ABI compatible and it's probably going to continue to grow,
as even if there's no more complexity in the specifications we'll have
to deal with the various ISA string parsing oddities that end up all
over userspace.
Instead this patch set takes a very different approach and provides a set
of key/value pairs that encode various bits about the system. The big
advantage here is that we can clearly define what these mean so we can
ensure ABI stability, but it also allows us to encode information that's
unlikely to ever appear in an ISA string (see the misaligned access
performance, for example). The resulting interface looks a lot like
what arm64 and x86 do, and will hopefully fit well into something like
ACPI in the future.
The actual user interface is a syscall, with a vDSO function in front of
it. The vDSO function can answer some queries without a syscall at all,
and falls back to the syscall for cases it doesn't have answers to.
Currently we prepopulate it with an array of answers for all keys and
a CPU set of "all CPUs". This can be adjusted as necessary to provide
fast answers to the most common queries.
An example series in glibc exposing this syscall and using it in an
ifunc selector for memcpy can be found at [1].
I was asked about the performance delta between this and something like
sysfs. I created a small test program [2] and ran it on a Nezha D1
Allwinner board. Doing each operation 100000 times and dividing, these
operations take the following amount of time:
- open()+read()+close() of /sys/kernel/cpu_byteorder: 3.8us
- access("/sys/kernel/cpu_byteorder", R_OK): 1.3us
- riscv_hwprobe() vDSO and syscall: .0094us
- riscv_hwprobe() vDSO with no syscall: 0.0091us
These numbers get farther apart if we query multiple keys, as sysfs will
scale linearly with the number of keys, where the dedicated syscall
stays the same. To frame these numbers, I also did a tight
fork/exec/wait loop, which I measured as 4.8ms. So doing 4
open/read/close operations is a delta of about 0.3%, versus a single vDSO
call is a delta of essentially zero.
[1] https://patchwork.ozlabs.org/project/glibc/list/?series=343050
[2] https://pastebin.com/x84NEKaS
Changes in v6:
- Remove spurious blank line (Conorbot)
- Update copyrights (Paul)
- Update copyrights (Paul)
- Wrap init_hwprobe_vdso_data() in CONFIG_MMU to fix nommu build break
(Conorbot)
- Update copyrights (Paul)
Changes in v5:
- Added tags
- Fixed misuse of ISA_EXT_c as bitmap, changed to use
riscv_isa_extension_available() (Heiko, Conor)
- Document the alternatives approach in the commit message (Conor and
Heiko).
- Fix __init call warnings by making probe_vendor_features() and
thead_feature_probe_func() __init_or_module.
- Fixed compat vdso compilation failure (lkp).
Changes in v4:
- Used real types in syscall prototypes (Arnd)
- Fixed static line break in do_riscv_hwprobe() (Conor)
- Added newlines between documentation lists (Conor)
- Crispen up size types to size_t, and cpu indices to int (Joe)
- Fix copy_from_user() return logic bug (found via kselftests!)
- Add __user to SYSCALL_DEFINE() to fix warning
- More newlines in BASE_BEHAVIOR_IMA documentation (Conor)
- Add newlines to CPUPERF_0 documentation (Conor)
- Add UNSUPPORTED value (Conor)
- Switched from DT to alternatives-based probing (Rob)
- Crispen up cpu index type to always be int (Conor)
- Fixed selftests commit description, no more tiny libc (Mark Brown)
- Fixed selftest syscall prototype types to match v4.
- Added a prototype to fix -Wmissing-prototype warning (lkp(a)intel.com)
- Fixed rv32 build failure (lkp(a)intel.com)
- Make vdso prototype match syscall types update
Changes in v3:
- Updated copyright date in cpufeature.h
- Fixed typo in cpufeature.h comment (Conor)
- Refactored functions so that kernel mode can query too, in
preparation for the vDSO data population.
- Changed the vendor/arch/imp IDs to return a value of -1 on mismatch
rather than failing the whole call.
- Const cpumask pointer in hwprobe_mid()
- Embellished documentation WRT cpu_set and the returned values.
- Renamed hwprobe_mid() to hwprobe_arch_id() (Conor)
- Fixed machine ID doc warnings, changed elements to c:macro:.
- Completed dangling unistd.h comment (Conor)
- Fixed line breaks and minor logic optimization (Conor).
- Use riscv_cached_mxxxid() (Conor)
- Refactored base ISA behavior probe to allow kernel probing as well,
in prep for vDSO data initialization.
- Fixed doc warnings in IMA text list, use :c:macro:.
- Have hwprobe_misaligned return int instead of long.
- Constify cpumask pointer in hwprobe_misaligned()
- Fix warnings in _PERF_O list documentation, use :c:macro:.
- Move include cpufeature.h to misaligned patch.
- Fix documentation mismatch for RISCV_HWPROBE_KEY_CPUPERF_0 (Conor)
- Use for_each_possible_cpu() instead of NR_CPUS (Conor)
- Break early in misaligned access iteration (Conor)
- Increase MISALIGNED_MASK from 2 bits to 3 for possible UNSUPPORTED future
value (Conor)
- Introduced vDSO function
Changes in v2:
- Factored the move of struct riscv_cpuinfo to its own header
- Changed the interface to look more like poll(). Rather than supplying
key_offset and getting back an array of values with numerically
contiguous keys, have the user pre-fill the key members of the array,
and the kernel will fill in the corresponding values. For any key it
doesn't recognize, it will set the key of that element to -1. This
allows usermode to quickly ask for exactly the elements it cares
about, and not get bogged down in a back and forth about newer keys
that older kernels might not recognize. In other words, the kernel
can communicate that it doesn't recognize some of the keys while
still providing the data for the keys it does know.
- Added a shortcut to the cpuset parameters that if a size of 0 and
NULL is provided for the CPU set, the kernel will use a cpu mask of
all online CPUs. This is convenient because I suspect most callers
will only want to act on a feature if it's supported on all CPUs, and
it's a headache to dynamically allocate an array of all 1s, not to
mention a waste to have the kernel loop over all of the offline bits.
- Fixed logic error in if(of_property_read_string...) that caused crash
- Include cpufeature.h in cpufeature.h to avoid undeclared variable
warning.
- Added a _MASK define
- Fix random checkpatch complaints
- Updated the selftests to the new API and added some more.
- Fixed indentation, comments in .S, and general checkpatch complaints.
Evan Green (6):
RISC-V: Move struct riscv_cpuinfo to new header
RISC-V: Add a syscall for HW probing
RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
RISC-V: hwprobe: Support probing of misaligned access performance
selftests: Test the new RISC-V hwprobe interface
RISC-V: Add hwprobe vDSO function and data
Documentation/riscv/hwprobe.rst | 86 +++++++
Documentation/riscv/index.rst | 1 +
arch/riscv/Kconfig | 1 +
arch/riscv/errata/thead/errata.c | 10 +
arch/riscv/include/asm/alternative.h | 5 +
arch/riscv/include/asm/cpufeature.h | 23 ++
arch/riscv/include/asm/hwprobe.h | 13 +
arch/riscv/include/asm/syscall.h | 4 +
arch/riscv/include/asm/vdso/data.h | 17 ++
arch/riscv/include/asm/vdso/gettimeofday.h | 8 +
arch/riscv/include/uapi/asm/hwprobe.h | 37 +++
arch/riscv/include/uapi/asm/unistd.h | 9 +
arch/riscv/kernel/alternative.c | 19 ++
arch/riscv/kernel/compat_vdso/Makefile | 2 +-
arch/riscv/kernel/cpu.c | 8 +-
arch/riscv/kernel/cpufeature.c | 3 +
arch/riscv/kernel/smpboot.c | 1 +
arch/riscv/kernel/sys_riscv.c | 228 +++++++++++++++++-
arch/riscv/kernel/vdso.c | 6 -
arch/riscv/kernel/vdso/Makefile | 4 +
arch/riscv/kernel/vdso/hwprobe.c | 52 ++++
arch/riscv/kernel/vdso/sys_hwprobe.S | 15 ++
arch/riscv/kernel/vdso/vdso.lds.S | 3 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/riscv/Makefile | 58 +++++
.../testing/selftests/riscv/hwprobe/Makefile | 10 +
.../testing/selftests/riscv/hwprobe/hwprobe.c | 90 +++++++
.../selftests/riscv/hwprobe/sys_hwprobe.S | 12 +
28 files changed, 712 insertions(+), 14 deletions(-)
create mode 100644 Documentation/riscv/hwprobe.rst
create mode 100644 arch/riscv/include/asm/cpufeature.h
create mode 100644 arch/riscv/include/asm/hwprobe.h
create mode 100644 arch/riscv/include/asm/vdso/data.h
create mode 100644 arch/riscv/include/uapi/asm/hwprobe.h
create mode 100644 arch/riscv/kernel/vdso/hwprobe.c
create mode 100644 arch/riscv/kernel/vdso/sys_hwprobe.S
create mode 100644 tools/testing/selftests/riscv/Makefile
create mode 100644 tools/testing/selftests/riscv/hwprobe/Makefile
create mode 100644 tools/testing/selftests/riscv/hwprobe/hwprobe.c
create mode 100644 tools/testing/selftests/riscv/hwprobe/sys_hwprobe.S
--
2.25.1
This is the basic functionality for iommufd to support
iommufd_device_replace() and IOMMU_HWPT_ALLOC for physical devices.
iommufd_device_replace() allows changing the HWPT associated with the
device to a new IOAS or HWPT. Replace does this in way that failure leaves
things unchanged, and utilizes the iommu iommu_group_replace_domain() API
to allow the iommu driver to perform an optional non-disruptive change.
IOMMU_HWPT_ALLOC allows HWPTs to be explicitly allocated by the user and
used by attach or replace. At this point it isn't very useful since the
HWPT is the same as the automatically managed HWPT from the IOAS. However
a following series will allow userspace to customize the created HWPT.
The implementation is complicated because we have to introduce some
per-iommu_group memory in iommufd and redo how we think about multi-device
groups to be more explicit. This solves all the locking problems in the
prior attempts.
This series is infrastructure work for the following series which:
- Add replace for attach
- Expose replace through VFIO APIs
- Implement driver parameters for HWPT creation (nesting)
Once review of this is complete I will keep it on a side branch and
accumulate the following series when they are ready so we can have a
stable base and make more incremental progress. When we have all the parts
together to get a full implementation it can go to Linus.
I have this on github:
https://github.com/jgunthorpe/linux/commits/iommufd_hwpt
v3:
- Refine comments and commit messages
- Adjust the flow in iommufd_device_auto_get_domain() so pt_id is only
set on success
- Reject replace on non-attached devices
- Add missing __reserved check for IOMMU_HWPT_ALLOC
v2: https://lore.kernel.org/r/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia.c…
- Use WARN_ON for the igroup->group test and move that logic to a
function iommufd_group_try_get()
- Change igroup->devices to igroup->device list
Replace will need to iterate over all attached idevs
- Rename to iommufd_group_setup_msi()
- New patch to export iommu_get_resv_regions()
- New patch to use per-device reserved regions instead of per-group
regions
- Split out the reorganizing of iommufd_device_change_pt() from the
replace patch
- Replace uses the per-dev reserved regions
- Use stdev_id in a few more places in the selftest
- Fix error handling in IOMMU_HWPT_ALLOC
- Clarify comments
- Rebase on v6.3-rc1
v1: https://lore.kernel.org/all/0-v1-7612f88c19f5+2f21-iommufd_alloc_jgg@nvidia…
Jason Gunthorpe (15):
iommufd: Move isolated msi enforcement to iommufd_device_bind()
iommufd: Add iommufd_group
iommufd: Replace the hwpt->devices list with iommufd_group
iommu: Export iommu_get_resv_regions()
iommufd: Keep track of each device's reserved regions instead of
groups
iommufd: Use the iommufd_group to avoid duplicate MSI setup
iommufd: Make sw_msi_start a group global
iommufd: Move putting a hwpt to a helper function
iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
iommufd: Reorganize iommufd_device_attach into
iommufd_device_change_pt
iommufd: Add iommufd_device_replace()
iommufd: Make destroy_rwsem use a lock class per object type
iommufd: Add IOMMU_HWPT_ALLOC
iommufd/selftest: Return the real idev id from selftest mock_domain
iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC
Nicolin Chen (2):
iommu: Introduce a new iommu_group_replace_domain() API
iommufd/selftest: Test iommufd_device_replace()
drivers/iommu/iommu-priv.h | 10 +
drivers/iommu/iommu.c | 41 +-
drivers/iommu/iommufd/device.c | 512 +++++++++++++-----
drivers/iommu/iommufd/hw_pagetable.c | 96 +++-
drivers/iommu/iommufd/io_pagetable.c | 27 +-
drivers/iommu/iommufd/iommufd_private.h | 51 +-
drivers/iommu/iommufd/iommufd_test.h | 6 +
drivers/iommu/iommufd/main.c | 17 +-
drivers/iommu/iommufd/selftest.c | 40 ++
include/linux/iommufd.h | 1 +
include/uapi/linux/iommufd.h | 26 +
tools/testing/selftests/iommu/iommufd.c | 64 ++-
.../selftests/iommu/iommufd_fail_nth.c | 52 +-
tools/testing/selftests/iommu/iommufd_utils.h | 61 ++-
14 files changed, 804 insertions(+), 200 deletions(-)
create mode 100644 drivers/iommu/iommu-priv.h
base-commit: fd8c1a4aee973e87d890a5861e106625a33b2c4e
--
2.40.0
From: Chuck Lever <chuck.lever(a)oracle.com>
Circumvent the .gitignore wildcard to avoid warnings about ignored
.kunitconfig files. As far as I can tell, the warnings are harmless
and these files are not actually ignored.
Reported-by: kernel test robot <lkp(a)intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304142337.jc4oUrov-lkp@intel.com/
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
---
.gitignore | 1 +
1 file changed, 1 insertion(+)
Resending... It was not clear to me if this file has a specific
maintainer. I chose to send it to the most recent committer.
diff --git a/.gitignore b/.gitignore
index 70ec6037fa7a..51117ba29c88 100644
--- a/.gitignore
+++ b/.gitignore
@@ -105,6 +105,7 @@ modules.order
!.gitignore
!.mailmap
!.rustfmt.toml
+!.kunitconfig
#
# Generated include files
When calling socket lookup from L2 (tc, xdp), VRF boundaries aren't
respected. This patchset fixes this by regarding the incoming device's
VRF attachment when performing the socket lookups from tc/xdp.
The first two patches are coding changes which facilitate this fix by
factoring out the tc helper's logic which was shared with cg/sk_skb
(which operate correctly).
The third patch contains the actual bugfix.
The fourth patch adds bpf tests for these lookup functions.
Gilad Sever (4):
bpf: factor out socket lookup functions for the TC hookpoint.
bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC
hookpoint
bpf: fix bpf socket lookup from tc/xdp to respect socket VRF bindings
selftests/bpf: Add tc_socket_lookup tests
net/core/filter.c | 132 +++++--
.../bpf/prog_tests/tc_socket_lookup.c | 341 ++++++++++++++++++
.../selftests/bpf/progs/tc_socket_lookup.c | 73 ++++
3 files changed, 525 insertions(+), 21 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_socket_lookup.c
create mode 100644 tools/testing/selftests/bpf/progs/tc_socket_lookup.c
--
2.34.1
From: Anh Tuan Phan <tuananhlfc(a)gmail.com>
[ Upstream commit f1594bc676579133a3cd906d7d27733289edfb86 ]
When compiling selftests with target mount_setattr I encountered some errors with the below messages:
mount_setattr_test.c: In function ‘mount_setattr_thread’:
mount_setattr_test.c:343:16: error: variable ‘attr’ has initializer but incomplete type
343 | struct mount_attr attr = {
| ^~~~~~~~~~
These errors might be because of linux/mount.h is not included. This patch resolves that issue.
Signed-off-by: Anh Tuan Phan <tuananhlfc(a)gmail.com>
Acked-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index 8c5fea68ae677..969647228817b 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -18,6 +18,7 @@
#include <grp.h>
#include <stdbool.h>
#include <stdarg.h>
+#include <linux/mount.h>
#include "../kselftest_harness.h"
--
2.39.2
From: Anh Tuan Phan <tuananhlfc(a)gmail.com>
[ Upstream commit f1594bc676579133a3cd906d7d27733289edfb86 ]
When compiling selftests with target mount_setattr I encountered some errors with the below messages:
mount_setattr_test.c: In function ‘mount_setattr_thread’:
mount_setattr_test.c:343:16: error: variable ‘attr’ has initializer but incomplete type
343 | struct mount_attr attr = {
| ^~~~~~~~~~
These errors might be because of linux/mount.h is not included. This patch resolves that issue.
Signed-off-by: Anh Tuan Phan <tuananhlfc(a)gmail.com>
Acked-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index 8c5fea68ae677..969647228817b 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -18,6 +18,7 @@
#include <grp.h>
#include <stdbool.h>
#include <stdarg.h>
+#include <linux/mount.h>
#include "../kselftest_harness.h"
--
2.39.2
From: Anh Tuan Phan <tuananhlfc(a)gmail.com>
[ Upstream commit f1594bc676579133a3cd906d7d27733289edfb86 ]
When compiling selftests with target mount_setattr I encountered some errors with the below messages:
mount_setattr_test.c: In function ‘mount_setattr_thread’:
mount_setattr_test.c:343:16: error: variable ‘attr’ has initializer but incomplete type
343 | struct mount_attr attr = {
| ^~~~~~~~~~
These errors might be because of linux/mount.h is not included. This patch resolves that issue.
Signed-off-by: Anh Tuan Phan <tuananhlfc(a)gmail.com>
Acked-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index 8c5fea68ae677..969647228817b 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -18,6 +18,7 @@
#include <grp.h>
#include <stdbool.h>
#include <stdarg.h>
+#include <linux/mount.h>
#include "../kselftest_harness.h"
--
2.39.2