When calling socket lookup from L2 (tc, xdp), VRF boundaries aren't
respected. This patchset fixes this by regarding the incoming device's
VRF attachment when performing the socket lookups from tc/xdp.
The first two patches are coding changes which factor out the tc helper's
logic which was shared with cg/sk_skb (which operate correctly).
This refactoring is needed in order to avoid affecting the cgroup/sk_skb
flows as there does not seem to be a strict criteria for discerning which
flow the helper is called from based on the net device or packet
information.
The third patch contains the actual bugfix.
The fourth patch adds bpf tests for these lookup functions.
---
v5: Use reverse xmas tree indentation
v4: - Move dev_sdif() to include/linux/netdevice.h as suggested by Stanislav Fomichev
- Remove SYS and SYS_NOFAIL duplicate definitions
v3: - Rename bpf_l2_sdif() to dev_sdif() as suggested by Stanislav Fomichev
- Added xdp tests as suggested by Daniel Borkmann
- Use start_server() to avoid duplicate code as suggested by Stanislav Fomichev
v2: Fixed uninitialized var in test patch (4).
Gilad Sever (4):
bpf: factor out socket lookup functions for the TC hookpoint.
bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC
hookpoint
bpf: fix bpf socket lookup from tc/xdp to respect socket VRF bindings
selftests/bpf: Add vrf_socket_lookup tests
include/linux/netdevice.h | 9 +
net/core/filter.c | 123 +++++--
.../bpf/prog_tests/vrf_socket_lookup.c | 312 ++++++++++++++++++
.../selftests/bpf/progs/vrf_socket_lookup.c | 88 +++++
4 files changed, 511 insertions(+), 21 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/vrf_socket_lookup.c
create mode 100644 tools/testing/selftests/bpf/progs/vrf_socket_lookup.c
--
2.34.1
PTP_SYS_OFFSET_EXTENDED was added in November 2018 in
361800876f80 (" ptp: add PTP_SYS_OFFSET_EXTENDED ioctl")
and PTP_SYS_OFFSET_PRECISE was added in February 2016 in
719f1aa4a671 ("ptp: Add PTP_SYS_OFFSET_PRECISE for driver crosstimestamping")
The PTP selftest code is lacking support for these two IOCTLS.
This short series of patches adds support for them.
Alex Maftei (2):
selftests/ptp: Add -x option for testing PTP_SYS_OFFSET_EXTENDED
selftests/ptp: Add -X option for testing PTP_SYS_OFFSET_PRECISE
tools/testing/selftests/ptp/testptp.c | 71 ++++++++++++++++++++++++++-
1 file changed, 69 insertions(+), 2 deletions(-)
--
2.28.0
Now the writing operation return the count of writes whether events are
enabled or disabled. Fix this by just return -ENOENT when events are disabled.
v1 -> v2:
- Change the returh vale from -EFAULT to -ENOENT
sunliming (3):
tracing/user_events: Fix incorrect return value for writing operation
when events are disabled
selftests/user_events: Enable the event before write_fault test in
ftrace self-test
selftests/user_events: Add test cases when event is disabled
kernel/trace/trace_events_user.c | 3 ++-
tools/testing/selftests/user_events/ftrace_test.c | 8 ++++++++
2 files changed, 10 insertions(+), 1 deletion(-)
--
2.25.1
This patch-set implements 2 small extensions to the current F_OFD_GETLK,
allowing it to gather more information than it currently returns.
First extension allows to use F_UNLCK on query, which currently returns
EINVAL. Instead it can be used to query the locks on a particular fd -
something that is not currently possible. The basic idea is that on
F_OFD_GETLK, F_UNLCK would "conflict" with (or query) any types of the
lock on the same fd, and ignore any locks on other fds.
Use-cases:
1. CRIU-alike scenario when you want to read the locking info from an
fd for the later reconstruction. This can now be done by setting
l_start and l_len to 0 to cover entire file range, and do F_OFD_GETLK.
In the loop you need to advance l_start past the returned lock ranges,
to eventually collect all locked ranges.
2. Implementing the lock checking/enforcing policy.
Say you want to implement an "auditor" module in your program,
that checks that the I/O is done only after the proper locking is
applied on a file region. In this case you need to know if the
particular region is locked on that fd, and if so - with what type
of the lock. If you would do that currently (without this extension)
then you can only check for the write locks, and for that you need to
probe the lock on your fd and then open the same file via nother fd and
probe there. That way you can identify the write lock on a particular
fd, but such trick is non-atomic and complex. As for finding out the
read lock on a particular fd - impossible.
This extension allows to do such queries without any extra efforts.
3. Implementing the mandatory locking policy.
Suppose you want to make a policy where the write lock inhibits any
unlocked readers and writers. Currently you need to check if the
write lock is present on some other fd, and if it is not there - allow
the I/O operation. But because the write lock can appear at any moment,
you need to do that under some global lock, which can be released only
when the I/O operation is finished.
With the proposed extension you can instead just check the write lock
on your own fd first, and if it is there - allow the I/O operation on
that fd without using any global lock. Only if there is no write lock
on this fd, then you need to take global lock and check for a write
lock on other fds.
The second patch implements another extension.
Currently F_OFD_GETLK returns -1 in the l_pid member.
This patch removes the code that writes -1 there, so that the proper
pid is returned. I am not sure why it was decided to deliberately hide
the owner's pid. It may be needed in case you want to send some
message to the offending locker, like eg SIGKILL.
The third patch adds a test-case for OFD locks.
It tests both the generic things and the proposed extensions.
Stas Sergeev (3):
fs/locks: F_UNLCK extension for F_OFD_GETLK
fd/locks: allow get the lock owner by F_OFD_GETLK
selftests: add OFD lock tests
fs/locks.c | 25 +++-
tools/testing/selftests/locking/Makefile | 2 +
tools/testing/selftests/locking/ofdlocks.c | 135 +++++++++++++++++++++
3 files changed, 157 insertions(+), 5 deletions(-)
create mode 100644 tools/testing/selftests/locking/ofdlocks.c
CC: Jeff Layton <jlayton(a)kernel.org>
CC: Chuck Lever <chuck.lever(a)oracle.com>
CC: Alexander Viro <viro(a)zeniv.linux.org.uk>
CC: Christian Brauner <brauner(a)kernel.org>
CC: linux-fsdevel(a)vger.kernel.org
CC: linux-kernel(a)vger.kernel.org
CC: Shuah Khan <shuah(a)kernel.org>
CC: linux-kselftest(a)vger.kernel.org
--
2.39.2
This is to add Intel VT-d nested translation based on IOMMUFD nesting
infrastructure. As the iommufd nesting infrastructure series[1], iommu
core supports new ops to report iommu hardware information, allocate
domains with user data and sync stage-1 IOTLB. The data required in
the three paths are vendor-specific, so
1) IOMMU_HW_INFO_TYPE_INTEL_VTD and struct iommu_device_info_vtd are
defined to report iommu hardware information for Intel VT-d .
2) IOMMU_HWPT_DATA_VTD_S1 is defined for the Intel VT-d stage-1 page
table, it will be used in the stage-1 domain allocation and IOTLB
syncing path. struct iommu_hwpt_intel_vtd is defined to pass user_data
for the Intel VT-d stage-1 domain allocation.
struct iommu_hwpt_invalidate_intel_vtd is defined to pass the data for
the Intel VT-d stage-1 IOTLB invalidation.
With above IOMMUFD extensions, the intel iommu driver implements the three
paths to support nested translation.
The first Intel platform supporting nested translation is Sapphire
Rapids which, unfortunately, has a hardware errata [2] requiring special
treatment. This errata happens when a stage-1 page table page (either
level) is located in a stage-2 read-only region. In that case the IOMMU
hardware may ignore the stage-2 RO permission and still set the A/D bit
in stage-1 page table entries during page table walking.
A flag IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 is introduced to report
this errata to userspace. With that restriction the user should either
disable nested translation to favor RO stage-2 mappings or ensure no
RO stage-2 mapping to enable nested translation.
Intel-iommu driver is armed with necessary checks to prevent such mix
in patch10 of this series.
Qemu currently does add RO mappings though. The vfio agent in Qemu
simply maps all valid regions in the GPA address space which certainly
includes RO regions e.g. vbios.
In reality we don't know a usage relying on DMA reads from the BIOS
region. Hence finding a way to allow user opt-out RO mappings in
Qemu might be an acceptable tradeoff. But how to achieve it cleanly
needs more discussion in Qemu community. For now we just hacked Qemu
to test.
Complete code can be found in [3], QEMU could can be found in [4].
base-commit: ce9b593b1f74ccd090edc5d2ad397da84baa9946
[1] https://lore.kernel.org/linux-iommu/20230511143844.22693-1-yi.l.liu@intel.c…
[2] https://www.intel.com/content/www/us/en/content-details/772415/content-deta…
[3] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[4] https://github.com/yiliu1765/qemu/tree/wip/iommufd_rfcv4.mig.reset.v4_var3%…
Change log:
v3:
- Further split the patches into an order of adding helpers for nested
domain, iotlb flush, nested domain attachment and nested domain allocation
callback, then report the hw_info to userspace.
- Add batch support in cache invalidation from userspace
- Disallow nested translation usage if RO mappings exists in stage-2 domain
due to errata on readonly mappings on Sapphire Rapids platform.
v2: https://lore.kernel.org/linux-iommu/20230309082207.612346-1-yi.l.liu@intel.…
- The iommufd infrastructure is split to be separate series.
v1: https://lore.kernel.org/linux-iommu/20230209043153.14964-1-yi.l.liu@intel.c…
Regards,
Yi Liu
Lu Baolu (5):
iommu/vt-d: Extend dmar_domain to support nested domain
iommu/vt-d: Add helper for nested domain allocation
iommu/vt-d: Add helper to setup pasid nested translation
iommu/vt-d: Add nested domain allocation
iommu/vt-d: Disallow nesting on domains with read-only mappings
Yi Liu (5):
iommufd: Add data structure for Intel VT-d stage-1 domain allocation
iommu/vt-d: Make domain attach helpers to be extern
iommu/vt-d: Set the nested domain to a device
iommu/vt-d: Add iotlb flush for nested domain
iommu/vt-d: Implement hw_info for iommu capability query
drivers/iommu/intel/Makefile | 2 +-
drivers/iommu/intel/iommu.c | 78 ++++++++++++---
drivers/iommu/intel/iommu.h | 55 +++++++++--
drivers/iommu/intel/nested.c | 181 +++++++++++++++++++++++++++++++++++
drivers/iommu/intel/pasid.c | 151 +++++++++++++++++++++++++++++
drivers/iommu/intel/pasid.h | 2 +
drivers/iommu/iommufd/main.c | 6 ++
include/linux/iommu.h | 1 +
include/uapi/linux/iommufd.h | 149 ++++++++++++++++++++++++++++
9 files changed, 603 insertions(+), 22 deletions(-)
create mode 100644 drivers/iommu/intel/nested.c
--
2.34.1
In order to cover this case, setting 'maxlen = 0', with the following
explanation:
EVIOCGKEY is executed from evdev_do_ioctl(), which is called from
evdev_ioctl_handler().
evdev_ioctl_handler() is called from 2 functions, where by code coverage,
only the first one is in use.
‘compat’ is given the value ‘0’ [1].
Thus, the condition [2] is always false.
This means ‘len’ always equals a positive number [3]
‘maxlen’ in evdev_handle_get_val [4] is defined locally in
evdev_do_ioctl() [5], and is sent in the variable 'size' [6]
[1] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L1281
[2] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L705
[3] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L707
[4] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L886
[5] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L1155
[6] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L1141
Signed-off-by: Dana Elfassy <dangel101(a)gmail.com>
---
tools/testing/selftests/input/evioc-test.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/tools/testing/selftests/input/evioc-test.c b/tools/testing/selftests/input/evioc-test.c
index ad7b93fe39cf..b94de2ee5596 100644
--- a/tools/testing/selftests/input/evioc-test.c
+++ b/tools/testing/selftests/input/evioc-test.c
@@ -234,4 +234,23 @@ TEST(eviocsrep_set_repeat_settings)
selftest_uinput_destroy(uidev);
}
+TEST(eviocgkey_get_global_key_state)
+{
+ struct selftest_uinput *uidev;
+ int rep_values[2];
+ int rc;
+
+ memset(rep_values, 0, sizeof(rep_values));
+
+ rc = selftest_uinput_create_device(&uidev);
+ ASSERT_EQ(0, rc);
+ ASSERT_NE(NULL, uidev);
+
+ /* ioctl to create the scenario where len > maxlen in bits_to_user() */
+ rc = ioctl(uidev->evdev_fd, EVIOCGKEY(0), rep_values);
+ ASSERT_EQ(0, rc);
+
+ selftest_uinput_destroy(uidev);
+}
+
TEST_HARNESS_MAIN
--
2.41.0