Allow SCSI hosts to request avoiding trailing allocation length with VPD
inquiries, and use the mechanism to work around an issue with at least
some BusLogic MultiMaster host bus adapters and observed with the BT-958
model specifically where issuing commands that return less data than
provided for causes fatal failures:
scsi host0: BusLogic BT-958
scsi 0:0:0:0: Direct-Access IBM DDYS-T18350M SA5A PQ: 0 ANSI: 3
scsi 0:0:1:0: Direct-Access SEAGATE ST336607LW 0006 PQ: 0 ANSI: 3
scsi 0:0:5:0: Direct-Access IOMEGA ZIP 100 E.08 PQ: 0 ANSI: 2
sd 0:0:1:0: [sdb] 71687372 512-byte logical blocks: (36.7 GB/34.2 GiB)
sd 0:0:0:0: [sda] 35843670 512-byte logical blocks: (18.4 GB/17.1 GiB)
sd 0:0:1:0: [sdb] Write Protect is off
sd 0:0:5:0: [sdc] Attached SCSI removable disk
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
scsi0: *** BusLogic BT-958 Initialized Successfully ***
scsi0: *** BusLogic BT-958 Initialized Successfully ***
scsi0: *** BusLogic BT-958 Initialized Successfully ***
scsi0: *** BusLogic BT-958 Initialized Successfully ***
sd 0:0:0:0: Device offlined - not ready after error recovery
sd 0:0:1:0: Device offlined - not ready after error recovery
sd 0:0:0:0: scsi_vpd_inquiry(0): buf[64] => -5
sd 0:0:1:0: scsi_vpd_inquiry(0): buf[64] => -5
sd 0:0:0:0: [sda] Attached SCSI disk
sd 0:0:1:0: [sdb] Attached SCSI disk
VFS: Cannot open root device "802" or unknown-block(8,2): error -6
(here and elsewhere reported with some instrumentation added so as to
show the causing requests and with irrelevant messages filtered out).
As already observed back in 2003 and worked around in smartmontools at
least some versions of BusLogic firmware such as 5.07B are unable to
handle such commands, but it is possible to request enough data first
for the length of the data response to be determined and then reissue
the same command with the allocation length matching the response
expected.
It is what this change does on a host-by-host basis, by providing a flag
for individual HBA drivers to enable this workaround, currently set by
the BusLogic driver, and then issuing these double calls as requested,
which then produce results as expected:
sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:1:0: scsi_vpd_inquiry(0): buf[64] => 13
sd 0:0:0:0: scsi_vpd_inquiry(0): buf[64] => 7
sdb: sdb1 sdb2
sd 0:0:1:0: scsi_vpd_inquiry(0): buf[64] => 13
sd 0:0:1:0: [sdb] Attached SCSI disk
sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 >
sd 0:0:0:0: scsi_vpd_inquiry(0): buf[64] => 7
sd 0:0:0:0: [sda] Attached SCSI disk
EXT4-fs (sda2): mounting ext2 file system using the ext4 subsystem
EXT4-fs (sda2): mounted filesystem without journal. Opts: (null). Quota mode: disabled.
VFS: Mounted root (ext2 filesystem) readonly on device 8:2.
The minimum request size of 4 for the repeated call has been chosen to
match one required for a successful return from `scsi_vpd_inquiry'.
Interestingly enough it has only started triggering with not so recent
commit af73623f5f10 ("[SCSI] sd: Reduce buffer size for vpd request")
that decreased the allocation length for the originating request from
512 down to 64. Previously the request was rejected outright by the
respective targets as invalid and therefore did not trigger the issue
with MultiMaster firmware as that would only happen for a command that
succeeded but produced less data than provided for:
scsi0: CCB #36 Target 0: Result 2 Host Adapter Status 00 Target Status 02
scsi0: CDB 12 01 00 02 00 00
scsi0: Sense 70 00 05 00 00 00 00 18 00 00 00 00 24 00 00 C0 00 03 00 [...]
sd 0:0:0:0: scsi_vpd_inquiry(0): buf[512] => -5
scsi0: CCB #37 Target 1: Result 2 Host Adapter Status 00 Target Status 02
scsi0: CDB 12 01 00 02 00 00
scsi0: Sense 70 00 05 00 00 00 00 0A 00 00 00 00 24 00 01 C9 00 03 00 [...]
sd 0:0:1:0: scsi_vpd_inquiry(0): buf[512] => -5
(here with the buffer size set back to 512, the `BusLogic=TraceErrors'
parameter and trailing sense data zeros trimmed for brevity). Note the
sense key of 0x5 returned denoting an illegal request even for page 0.
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Fixes: 881a256d84e6 ("[SCSI] Add VPD helper")
Cc: stable(a)vger.kernel.org # v2.6.30+
---
No changes from v1.
---
drivers/scsi/BusLogic.c | 1 +
drivers/scsi/scsi.c | 24 +++++++++++++++++++++---
include/scsi/scsi_host.h | 3 +++
3 files changed, 25 insertions(+), 3 deletions(-)
linux-buslogic-get-vpd-page-buffer.diff
Index: linux-macro-ide/drivers/scsi/BusLogic.c
===================================================================
--- linux-macro-ide.orig/drivers/scsi/BusLogic.c
+++ linux-macro-ide/drivers/scsi/BusLogic.c
@@ -2301,6 +2301,7 @@ static void __init blogic_inithoststruct
host->sg_tablesize = adapter->drvr_sglimit;
host->unchecked_isa_dma = adapter->need_bouncebuf;
host->cmd_per_lun = adapter->untag_qdepth;
+ host->no_trailing_allocation_length = true;
}
/*
Index: linux-macro-ide/drivers/scsi/scsi.c
===================================================================
--- linux-macro-ide.orig/drivers/scsi/scsi.c
+++ linux-macro-ide/drivers/scsi/scsi.c
@@ -346,8 +346,19 @@ int scsi_get_vpd_page(struct scsi_device
if (sdev->skip_vpd_pages)
goto fail;
- /* Ask for all the pages supported by this device */
- result = scsi_vpd_inquiry(sdev, buf, 0, buf_len);
+ /*
+ * Ask for all the pages supported by this device. Determine the
+ * actual data length first if so required by the host, e.g.
+ * BusLogic BT-958.
+ */
+ if (sdev->host->no_trailing_allocation_length) {
+ result = scsi_vpd_inquiry(sdev, buf, 0, min(4, buf_len));
+ if (result < 4)
+ goto fail;
+ } else {
+ result = buf_len;
+ }
+ result = scsi_vpd_inquiry(sdev, buf, 0, min(result, buf_len));
if (result < 4)
goto fail;
@@ -366,7 +377,14 @@ int scsi_get_vpd_page(struct scsi_device
goto fail;
found:
- result = scsi_vpd_inquiry(sdev, buf, page, buf_len);
+ if (sdev->host->no_trailing_allocation_length) {
+ result = scsi_vpd_inquiry(sdev, buf, page, min(4, buf_len));
+ if (result < 4)
+ goto fail;
+ } else {
+ result = buf_len;
+ }
+ result = scsi_vpd_inquiry(sdev, buf, page, min(result, buf_len));
if (result < 0)
goto fail;
Index: linux-macro-ide/include/scsi/scsi_host.h
===================================================================
--- linux-macro-ide.orig/include/scsi/scsi_host.h
+++ linux-macro-ide/include/scsi/scsi_host.h
@@ -653,6 +653,9 @@ struct Scsi_Host {
/* The transport requires the LUN bits NOT to be stored in CDB[1] */
unsigned no_scsi2_lun_in_cdb:1;
+ /* Allocation length must not exceed actual data length. */
+ unsigned no_trailing_allocation_length:1;
+
/*
* Optional work queue to be utilized by the transport
*/
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 5849cdf8c120e3979c57d34be55b92d90a77a47e
Gitweb: https://git.kernel.org/tip/5849cdf8c120e3979c57d34be55b92d90a77a47e
Author: Mike Galbraith <efault(a)gmx.de>
AuthorDate: Fri, 16 Apr 2021 14:02:07 +02:00
Committer: Borislav Petkov <bp(a)suse.de>
CommitterDate: Tue, 20 Apr 2021 17:32:46 +02:00
x86/crash: Fix crash_setup_memmap_entries() out-of-bounds access
Commit in Fixes: added support for kexec-ing a kernel on panic using a
new system call. As part of it, it does prepare a memory map for the new
kernel.
However, while doing so, it wrongly accesses memory it has not
allocated: it accesses the first element of the cmem->ranges[] array in
memmap_exclude_ranges() but it has not allocated the memory for it in
crash_setup_memmap_entries(). As KASAN reports:
BUG: KASAN: vmalloc-out-of-bounds in crash_setup_memmap_entries+0x17e/0x3a0
Write of size 8 at addr ffffc90000426008 by task kexec/1187
(gdb) list *crash_setup_memmap_entries+0x17e
0xffffffff8107cafe is in crash_setup_memmap_entries (arch/x86/kernel/crash.c:322).
317 unsigned long long mend)
318 {
319 unsigned long start, end;
320
321 cmem->ranges[0].start = mstart;
322 cmem->ranges[0].end = mend;
323 cmem->nr_ranges = 1;
324
325 /* Exclude elf header region */
326 start = image->arch.elf_load_addr;
(gdb)
Make sure the ranges array becomes a single element allocated.
[ bp: Write a proper commit message. ]
Fixes: dd5f726076cc ("kexec: support for kexec on panic using new system call")
Signed-off-by: Mike Galbraith <efault(a)gmx.de>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Dave Young <dyoung(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Link: https://lkml.kernel.org/r/725fa3dc1da2737f0f6188a1a9701bead257ea9d.camel@gm…
---
arch/x86/kernel/crash.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index a8f3af2..b1deacb 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -337,7 +337,7 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
struct crash_memmap_data cmd;
struct crash_mem *cmem;
- cmem = vzalloc(sizeof(struct crash_mem));
+ cmem = vzalloc(struct_size(cmem, ranges, 1));
if (!cmem)
return -ENOMEM;
The main thread could start to send SIG_IPI at any time, even before signal
blocked on vcpu thread. Therefore, start the vcpu thread with the signal
blocked.
Without this patch, on very busy cores the dirty_log_test could fail directly
on receiving a SIGUSR1 without a handler (when vcpu runs far slower than main).
Reported-by: Peter Xu <peterx(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
tools/testing/selftests/kvm/dirty_log_test.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index ffa4e2791926..81edbd23d371 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -527,9 +527,8 @@ static void *vcpu_worker(void *data)
*/
sigmask->len = 8;
pthread_sigmask(0, NULL, sigset);
+ sigdelset(sigset, SIG_IPI);
vcpu_ioctl(vm, VCPU_ID, KVM_SET_SIGNAL_MASK, sigmask);
- sigaddset(sigset, SIG_IPI);
- pthread_sigmask(SIG_BLOCK, sigset, NULL);
sigemptyset(sigset);
sigaddset(sigset, SIG_IPI);
@@ -858,6 +857,7 @@ int main(int argc, char *argv[])
.interval = TEST_HOST_LOOP_INTERVAL,
};
int opt, i;
+ sigset_t sigset;
sem_init(&sem_vcpu_stop, 0, 0);
sem_init(&sem_vcpu_cont, 0, 0);
@@ -916,6 +916,11 @@ int main(int argc, char *argv[])
srandom(time(0));
+ /* Ensure that vCPU threads start with SIG_IPI blocked. */
+ sigemptyset(&sigset);
+ sigaddset(&sigset, SIG_IPI);
+ pthread_sigmask(SIG_BLOCK, &sigset, NULL);
+
if (host_log_mode_option == LOG_MODE_ALL) {
/* Run each log mode */
for (i = 0; i < LOG_MODE_NUM; i++) {
--
2.26.2
+Paolo
On Sun, Apr 18, 2021, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> KVM: VMX: Convert vcpu_vmx.exit_reason to a union
>
> to the 5.10-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> kvm-vmx-convert-vcpu_vmx.exit_reason-to-a-union.patch
> and it can be found in the queue-5.10 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
I'm not sure we want this going into stable kernels, even for 5.10 and 5.11.
I assume it got pulled in to resolve a conflict with commit 04c4f2ee3f68 ("KVM:
VMX: Don't use vcpu->run->internal.ndata as an array index"), but that's should
be trivial to resolve since it's just a collision with surrounding code.
Maybe we'll end up with a more painful conflict in the future that would be best
solved by grabbing this refactoring, but I don't think we're there yet.
> commit 1499b54db7d9e7e5f5014307e8391e3ad7986f1f
> Author: Sean Christopherson <seanjc(a)google.com>
> Date: Fri Nov 6 17:03:12 2020 +0800
>
> KVM: VMX: Convert vcpu_vmx.exit_reason to a union
>
> [ Upstream commit 8e53324021645f820a01bf8aa745711c802c8542 ]
>
> Convert vcpu_vmx.exit_reason from a u32 to a union (of size u32). The
> full VM_EXIT_REASON field is comprised of a 16-bit basic exit reason in
> bits 15:0, and single-bit modifiers in bits 31:16.
>
> Historically, KVM has only had to worry about handling the "failed
> VM-Entry" modifier, which could only be set in very specific flows and
> required dedicated handling. I.e. manually stripping the FAILED_VMENTRY
> bit was a somewhat viable approach. But even with only a single bit to
> worry about, KVM has had several bugs related to comparing a basic exit
> reason against the full exit reason store in vcpu_vmx.
>
> Upcoming Intel features, e.g. SGX, will add new modifier bits that can
> be set on more or less any VM-Exit, as opposed to the significantly more
> restricted FAILED_VMENTRY, i.e. correctly handling everything in one-off
> flows isn't scalable. Tracking exit reason in a union forces code to
> explicitly choose between consuming the full exit reason and the basic
> exit, and is a convenient way to document and access the modifiers.
>
> No functional change intended.
>
> Cc: Xiaoyao Li <xiaoyao.li(a)intel.com>
> Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
> Signed-off-by: Chenyi Qiang <chenyi.qiang(a)intel.com>
> Message-Id: <20201106090315.18606-2-chenyi.qiang(a)intel.com>
> Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
On Mon, Apr 19, 2021 at 2:37 PM Nathan Chancellor <nathan(a)kernel.org> wrote:
>
> This should not have been merged into mainline by itself. It was a fix
> for "gcov: use kvmalloc()", which is still in -mm/-next. Merging it
> alone has now broken the build:
>
> https://github.com/ClangBuiltLinux/continuous-integration2/runs/2384465683?…
>
> Could it please be reverted in mainline [..]
Now reverted in my tree.
Sasha and stable cc'd too, since it was apparently auto-selected there..
Linus
The START_TRANSFER command needs to be executed while in ON/U0 link
state (with an exception during register initialization). Don't use
dwc->link_state to check this since the driver only tracks the link
state when the link state change interrupt is enabled. Check the link
state from DSTS register instead.
Note that often the host already brings the device out of low power
before it sends/requests the next transfer. So, the user won't see any
issue when the device starts transfer then. This issue is more
noticeable in cases when the device delays starting transfer, which can
happen during delayed control status after the host put the device in
low power.
Cc: <stable(a)vger.kernel.org>
Fixes: 799e9dc82968 ("usb: dwc3: gadget: conditionally disable Link State change events")
Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
---
drivers/usb/dwc3/gadget.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 6227641f2d31..1a632a3faf7f 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -308,13 +308,12 @@ int dwc3_send_gadget_ep_cmd(struct dwc3_ep *dep, unsigned int cmd,
}
if (DWC3_DEPCMD_CMD(cmd) == DWC3_DEPCMD_STARTTRANSFER) {
- int needs_wakeup;
+ int link_state;
- needs_wakeup = (dwc->link_state == DWC3_LINK_STATE_U1 ||
- dwc->link_state == DWC3_LINK_STATE_U2 ||
- dwc->link_state == DWC3_LINK_STATE_U3);
-
- if (unlikely(needs_wakeup)) {
+ link_state = dwc3_gadget_get_link_state(dwc);
+ if (link_state == DWC3_LINK_STATE_U1 ||
+ link_state == DWC3_LINK_STATE_U2 ||
+ link_state == DWC3_LINK_STATE_U3) {
ret = __dwc3_gadget_wakeup(dwc);
dev_WARN_ONCE(dwc->dev, ret, "wakeup failed --> %d\n",
ret);
@@ -1989,6 +1988,8 @@ static int __dwc3_gadget_wakeup(struct dwc3 *dwc)
case DWC3_LINK_STATE_RESET:
case DWC3_LINK_STATE_RX_DET: /* in HS, means Early Suspend */
case DWC3_LINK_STATE_U3: /* in HS, means SUSPEND */
+ case DWC3_LINK_STATE_U2: /* in HS, means Sleep (L1) */
+ case DWC3_LINK_STATE_U1:
case DWC3_LINK_STATE_RESUME:
break;
default:
base-commit: 4b853c236c7b5161a2e444bd8b3c76fe5aa5ddcb
--
2.28.0
Commit a6dcfe08487e ("scsi: qla2xxx: Limit interrupt vectors to number
of CPUs") lowers the number of allocated MSI-X vectors to the number of
CPUs.
That breaks vector allocation assumptions in qla83xx_iospace_config(),
qla24xx_enable_msix() and qla2x00_iospace_config(). Either of the
functions computes maximum number of qpairs as:
ha->max_qpairs = ha->msix_count - 1 (MB interrupt) - 1 (default
response queue) - 1 (ATIO, in dual or pure target mode)
max_qpairs is set to zero in case of two CPUs and initiator mode. The
number is then used to allocate ha->queue_pair_map inside
qla2x00_alloc_queues(). No allocation happens and ha->queue_pair_map is
left NULL but the driver thinks there are queue pairs available.
qla2xxx_queuecommand() tries to find a qpair in the map and crashes:
if (ha->mqenable) {
uint32_t tag;
uint16_t hwq;
struct qla_qpair *qpair = NULL;
tag = blk_mq_unique_tag(cmd->request);
hwq = blk_mq_unique_tag_to_hwq(tag);
qpair = ha->queue_pair_map[hwq]; # <- HERE
if (qpair)
return qla2xxx_mqueuecommand(host, cmd, qpair);
}
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 72 Comm: kworker/u4:3 Tainted: G W 5.10.0-rc1+ #25
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Workqueue: scsi_wq_7 fc_scsi_scan_rport [scsi_transport_fc]
RIP: 0010:qla2xxx_queuecommand+0x16b/0x3f0 [qla2xxx]
Call Trace:
scsi_queue_rq+0x58c/0xa60
blk_mq_dispatch_rq_list+0x2b7/0x6f0
? __sbitmap_get_word+0x2a/0x80
__blk_mq_sched_dispatch_requests+0xb8/0x170
blk_mq_sched_dispatch_requests+0x2b/0x50
__blk_mq_run_hw_queue+0x49/0xb0
__blk_mq_delay_run_hw_queue+0xfb/0x150
blk_mq_sched_insert_request+0xbe/0x110
blk_execute_rq+0x45/0x70
__scsi_execute+0x10e/0x250
scsi_probe_and_add_lun+0x228/0xda0
__scsi_scan_target+0xf4/0x620
? __pm_runtime_resume+0x4f/0x70
scsi_scan_target+0x100/0x110
fc_scsi_scan_rport+0xa1/0xb0 [scsi_transport_fc]
process_one_work+0x1ea/0x3b0
worker_thread+0x28/0x3b0
? process_one_work+0x3b0/0x3b0
kthread+0x112/0x130
? kthread_park+0x80/0x80
ret_from_fork+0x22/0x30
The driver should allocate enough vectors to provide every CPU it's own HW
queue and still handle reserved (MB, RSP, ATIO) interrupts.
The change fixes the crash on dual core VM and prevents unbalanced QP
allocation where nr_hw_queues is two less than the number of CPUs.
Cc: Daniel Wagner <daniel.wagner(a)suse.com>
Cc: Himanshu Madhani <himanshu.madhani(a)oracle.com>
Cc: Quinn Tran <qutran(a)marvell.com>
Cc: Nilesh Javali <njavali(a)marvell.com>
Cc: Martin K. Petersen <martin.petersen(a)oracle.com>
Cc: stable(a)vger.kernel.org # 5.11+
Fixes: a6dcfe08487e ("scsi: qla2xxx: Limit interrupt vectors to number of CPUs")
Reported-by: Aleksandr Volkov <a.y.volkov(a)yadro.com>
Reported-by: Aleksandr Miloserdov <a.miloserdov(a)yadro.com>
Signed-off-by: Roman Bolshakov <r.bolshakov(a)yadro.com>
---
drivers/scsi/qla2xxx/qla_isr.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
index 11d6e0db07fe..6e8f737a4af3 100644
--- a/drivers/scsi/qla2xxx/qla_isr.c
+++ b/drivers/scsi/qla2xxx/qla_isr.c
@@ -3998,11 +3998,11 @@ qla24xx_enable_msix(struct qla_hw_data *ha, struct rsp_que *rsp)
if (USER_CTRL_IRQ(ha) || !ha->mqiobase) {
/* user wants to control IRQ setting for target mode */
ret = pci_alloc_irq_vectors(ha->pdev, min_vecs,
- min((u16)ha->msix_count, (u16)num_online_cpus()),
+ min((u16)ha->msix_count, (u16)(num_online_cpus() + min_vecs)),
PCI_IRQ_MSIX);
} else
ret = pci_alloc_irq_vectors_affinity(ha->pdev, min_vecs,
- min((u16)ha->msix_count, (u16)num_online_cpus()),
+ min((u16)ha->msix_count, (u16)(num_online_cpus() + min_vecs)),
PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
&desc);
--
2.30.1
kernel test robot reports build errors in 3 Xilinx ethernet drivers.
They all use ioremap functions that are only available when HAS_IOMEM
is set/enabled. If it is not enabled, they all have build errors,
so make these 3 drivers depend on HAS_IOMEM.
ld: drivers/net/ethernet/xilinx/xilinx_emaclite.o: in function `xemaclite_of_probe':
xilinx_emaclite.c:(.text+0x9fc): undefined reference to `devm_ioremap_resource'
ld: drivers/net/ethernet/xilinx/xilinx_axienet_main.o: in function `axienet_probe':
xilinx_axienet_main.c:(.text+0x942): undefined reference to `devm_ioremap_resource'
ld: drivers/net/ethernet/xilinx/ll_temac_main.o: in function `temac_probe':
ll_temac_main.c:(.text+0x1283): undefined reference to `devm_platform_ioremap_resource_byname'
ld: ll_temac_main.c:(.text+0x13ad): undefined reference to `devm_of_iomap'
ld: ll_temac_main.c:(.text+0x162e): undefined reference to `devm_platform_ioremap_resource'
Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
Signed-off-by: Randy Dunlap <rdunlap(a)infradead.org>
Reported-by: kernel test robot <lkp(a)intel.com>
Cc: Radhey Shyam Pandey <radhey.shyam.pandey(a)xilinx.com>
Cc: Gary Guo <gary(a)garyguo.net>
Cc: Zhang Changzhong <zhangchangzhong(a)huawei.com>
Cc: Andre Przywara <andre.przywara(a)arm.com>
Cc: stable(a)vger.kernel.org
Cc: Daniel Borkmann <daniel(a)iogearbox.net>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: netdev(a)vger.kernel.org
---
drivers/net/ethernet/xilinx/Kconfig | 3 +++
1 file changed, 3 insertions(+)
--- linux-next-20210416.orig/drivers/net/ethernet/xilinx/Kconfig
+++ linux-next-20210416/drivers/net/ethernet/xilinx/Kconfig
@@ -18,12 +18,14 @@ if NET_VENDOR_XILINX
config XILINX_EMACLITE
tristate "Xilinx 10/100 Ethernet Lite support"
+ depends on HAS_IOMEM
select PHYLIB
help
This driver supports the 10/100 Ethernet Lite from Xilinx.
config XILINX_AXI_EMAC
tristate "Xilinx 10/100/1000 AXI Ethernet support"
+ depends on HAS_IOMEM
select PHYLINK
help
This driver supports the 10/100/1000 Ethernet from Xilinx for the
@@ -31,6 +33,7 @@ config XILINX_AXI_EMAC
config XILINX_LL_TEMAC
tristate "Xilinx LL TEMAC (LocalLink Tri-mode Ethernet MAC) driver"
+ depends on HAS_IOMEM
select PHYLIB
help
This driver supports the Xilinx 10/100/1000 LocalLink TEMAC
Good Day Sir/Ms,
I hope this email finds you well.
We are pleased to invite you or your company to quote below
listed item:
Product/Model No: A702TH FYNE PRESSURE REGULATOR
Model Number: A702TH
Qty. 30 units
Important note: Kindly send your quotation to:
quotation(a)pfizerbvsupply.com
for immediate approval.
Kind Regards,
Albert Bourla
PFIZER B.V Supply Chain Manager
Tel: +31(0)208080 880
ADDRESS: Rivium Westlaan 142, 2909 LD
Capelle aan den IJssel, Netherlands
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 1fe976d308acb6374c899a4ee8025a0a016e453e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pali=20Roh=C3=A1r?= <pali(a)kernel.org>
Date: Mon, 12 Apr 2021 18:57:39 +0200
Subject: [PATCH] net: phy: marvell: fix detection of PHY on Topaz switches
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Since commit fee2d546414d ("net: phy: marvell: mv88e6390 temperature
sensor reading"), Linux reports the temperature of Topaz hwmon as
constant -75°C.
This is because switches from the Topaz family (88E6141 / 88E6341) have
the address of the temperature sensor register different from Peridot.
This address is instead compatible with 88E1510 PHYs, as was used for
Topaz before the above mentioned commit.
Create a new mapping table between switch family and PHY ID for families
which don't have a model number. And define PHY IDs for Topaz and Peridot
families.
Create a new PHY ID and a new PHY driver for Topaz's internal PHY.
The only difference from Peridot's PHY driver is the HWMON probing
method.
Prior this change Topaz's internal PHY is detected by kernel as:
PHY [...] driver [Marvell 88E6390] (irq=63)
And afterwards as:
PHY [...] driver [Marvell 88E6341 Family] (irq=63)
Signed-off-by: Pali Rohár <pali(a)kernel.org>
BugLink: https://github.com/globalscaletechnologies/linux/issues/1
Fixes: fee2d546414d ("net: phy: marvell: mv88e6390 temperature sensor reading")
Reviewed-by: Marek Behún <kabel(a)kernel.org>
Reviewed-by: Andrew Lunn <andrew(a)lunn.ch>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 903d619e08ed..e08bf9377140 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3026,10 +3026,17 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
return err;
}
+/* prod_id for switch families which do not have a PHY model number */
+static const u16 family_prod_id_table[] = {
+ [MV88E6XXX_FAMILY_6341] = MV88E6XXX_PORT_SWITCH_ID_PROD_6341,
+ [MV88E6XXX_FAMILY_6390] = MV88E6XXX_PORT_SWITCH_ID_PROD_6390,
+};
+
static int mv88e6xxx_mdio_read(struct mii_bus *bus, int phy, int reg)
{
struct mv88e6xxx_mdio_bus *mdio_bus = bus->priv;
struct mv88e6xxx_chip *chip = mdio_bus->chip;
+ u16 prod_id;
u16 val;
int err;
@@ -3040,23 +3047,12 @@ static int mv88e6xxx_mdio_read(struct mii_bus *bus, int phy, int reg)
err = chip->info->ops->phy_read(chip, bus, phy, reg, &val);
mv88e6xxx_reg_unlock(chip);
- if (reg == MII_PHYSID2) {
- /* Some internal PHYs don't have a model number. */
- if (chip->info->family != MV88E6XXX_FAMILY_6165)
- /* Then there is the 6165 family. It gets is
- * PHYs correct. But it can also have two
- * SERDES interfaces in the PHY address
- * space. And these don't have a model
- * number. But they are not PHYs, so we don't
- * want to give them something a PHY driver
- * will recognise.
- *
- * Use the mv88e6390 family model number
- * instead, for anything which really could be
- * a PHY,
- */
- if (!(val & 0x3f0))
- val |= MV88E6XXX_PORT_SWITCH_ID_PROD_6390 >> 4;
+ /* Some internal PHYs don't have a model number. */
+ if (reg == MII_PHYSID2 && !(val & 0x3f0) &&
+ chip->info->family < ARRAY_SIZE(family_prod_id_table)) {
+ prod_id = family_prod_id_table[chip->info->family];
+ if (prod_id)
+ val |= prod_id >> 4;
}
return err ? err : val;
diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index e26a5d663f8a..8018ddf7f316 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -3021,9 +3021,34 @@ static struct phy_driver marvell_drivers[] = {
.get_stats = marvell_get_stats,
},
{
- .phy_id = MARVELL_PHY_ID_88E6390,
+ .phy_id = MARVELL_PHY_ID_88E6341_FAMILY,
.phy_id_mask = MARVELL_PHY_ID_MASK,
- .name = "Marvell 88E6390",
+ .name = "Marvell 88E6341 Family",
+ /* PHY_GBIT_FEATURES */
+ .flags = PHY_POLL_CABLE_TEST,
+ .probe = m88e1510_probe,
+ .config_init = marvell_config_init,
+ .config_aneg = m88e6390_config_aneg,
+ .read_status = marvell_read_status,
+ .config_intr = marvell_config_intr,
+ .handle_interrupt = marvell_handle_interrupt,
+ .resume = genphy_resume,
+ .suspend = genphy_suspend,
+ .read_page = marvell_read_page,
+ .write_page = marvell_write_page,
+ .get_sset_count = marvell_get_sset_count,
+ .get_strings = marvell_get_strings,
+ .get_stats = marvell_get_stats,
+ .get_tunable = m88e1540_get_tunable,
+ .set_tunable = m88e1540_set_tunable,
+ .cable_test_start = marvell_vct7_cable_test_start,
+ .cable_test_tdr_start = marvell_vct5_cable_test_tdr_start,
+ .cable_test_get_status = marvell_vct7_cable_test_get_status,
+ },
+ {
+ .phy_id = MARVELL_PHY_ID_88E6390_FAMILY,
+ .phy_id_mask = MARVELL_PHY_ID_MASK,
+ .name = "Marvell 88E6390 Family",
/* PHY_GBIT_FEATURES */
.flags = PHY_POLL_CABLE_TEST,
.probe = m88e6390_probe,
@@ -3107,7 +3132,8 @@ static struct mdio_device_id __maybe_unused marvell_tbl[] = {
{ MARVELL_PHY_ID_88E1540, MARVELL_PHY_ID_MASK },
{ MARVELL_PHY_ID_88E1545, MARVELL_PHY_ID_MASK },
{ MARVELL_PHY_ID_88E3016, MARVELL_PHY_ID_MASK },
- { MARVELL_PHY_ID_88E6390, MARVELL_PHY_ID_MASK },
+ { MARVELL_PHY_ID_88E6341_FAMILY, MARVELL_PHY_ID_MASK },
+ { MARVELL_PHY_ID_88E6390_FAMILY, MARVELL_PHY_ID_MASK },
{ MARVELL_PHY_ID_88E1340S, MARVELL_PHY_ID_MASK },
{ MARVELL_PHY_ID_88E1548P, MARVELL_PHY_ID_MASK },
{ }
diff --git a/include/linux/marvell_phy.h b/include/linux/marvell_phy.h
index 52b1610eae68..c544b70dfbd2 100644
--- a/include/linux/marvell_phy.h
+++ b/include/linux/marvell_phy.h
@@ -28,11 +28,12 @@
/* Marvel 88E1111 in Finisar SFP module with modified PHY ID */
#define MARVELL_PHY_ID_88E1111_FINISAR 0x01ff0cc0
-/* The MV88e6390 Ethernet switch contains embedded PHYs. These PHYs do
+/* These Ethernet switch families contain embedded PHYs, but they do
* not have a model ID. So the switch driver traps reads to the ID2
* register and returns the switch family ID
*/
-#define MARVELL_PHY_ID_88E6390 0x01410f90
+#define MARVELL_PHY_ID_88E6341_FAMILY 0x01410f41
+#define MARVELL_PHY_ID_88E6390_FAMILY 0x01410f90
#define MARVELL_PHY_FAMILY_ID(id) ((id) >> 4)
commit 1fe976d308acb6374c899a4ee8025a0a016e453e upstream.
Since commit fee2d546414d ("net: phy: marvell: mv88e6390 temperature
sensor reading"), Linux reports the temperature of Topaz hwmon as
constant -75°C.
This is because switches from the Topaz family (88E6141 / 88E6341) have
the address of the temperature sensor register different from Peridot.
This address is instead compatible with 88E1510 PHYs, as was used for
Topaz before the above mentioned commit.
Create a new mapping table between switch family and PHY ID for families
which don't have a model number. And define PHY IDs for Topaz and Peridot
families.
Create a new PHY ID and a new PHY driver for Topaz's internal PHY.
The only difference from Peridot's PHY driver is the HWMON probing
method.
Prior this change Topaz's internal PHY is detected by kernel as:
PHY [...] driver [Marvell 88E6390] (irq=63)
And afterwards as:
PHY [...] driver [Marvell 88E6341 Family] (irq=63)
Signed-off-by: Pali Rohár <pali(a)kernel.org>
BugLink: https://github.com/globalscaletechnologies/linux/issues/1
Fixes: fee2d546414d ("net: phy: marvell: mv88e6390 temperature sensor reading")
Reviewed-by: Marek Behún <kabel(a)kernel.org>
Reviewed-by: Andrew Lunn <andrew(a)lunn.ch>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
[pali: Backported to 5.10 version]
---
This patch is backported to 5.10 version. Tested on Turris Mox with Topaz switch.
---
drivers/net/dsa/mv88e6xxx/chip.c | 30 +++++++++++++----------------
drivers/net/phy/marvell.c | 33 +++++++++++++++++++++++++++++---
include/linux/marvell_phy.h | 5 +++--
3 files changed, 46 insertions(+), 22 deletions(-)
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 87160e723dfc..70ec17f3c300 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2994,10 +2994,17 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
return err;
}
+/* prod_id for switch families which do not have a PHY model number */
+static const u16 family_prod_id_table[] = {
+ [MV88E6XXX_FAMILY_6341] = MV88E6XXX_PORT_SWITCH_ID_PROD_6341,
+ [MV88E6XXX_FAMILY_6390] = MV88E6XXX_PORT_SWITCH_ID_PROD_6390,
+};
+
static int mv88e6xxx_mdio_read(struct mii_bus *bus, int phy, int reg)
{
struct mv88e6xxx_mdio_bus *mdio_bus = bus->priv;
struct mv88e6xxx_chip *chip = mdio_bus->chip;
+ u16 prod_id;
u16 val;
int err;
@@ -3008,23 +3015,12 @@ static int mv88e6xxx_mdio_read(struct mii_bus *bus, int phy, int reg)
err = chip->info->ops->phy_read(chip, bus, phy, reg, &val);
mv88e6xxx_reg_unlock(chip);
- if (reg == MII_PHYSID2) {
- /* Some internal PHYs don't have a model number. */
- if (chip->info->family != MV88E6XXX_FAMILY_6165)
- /* Then there is the 6165 family. It gets is
- * PHYs correct. But it can also have two
- * SERDES interfaces in the PHY address
- * space. And these don't have a model
- * number. But they are not PHYs, so we don't
- * want to give them something a PHY driver
- * will recognise.
- *
- * Use the mv88e6390 family model number
- * instead, for anything which really could be
- * a PHY,
- */
- if (!(val & 0x3f0))
- val |= MV88E6XXX_PORT_SWITCH_ID_PROD_6390 >> 4;
+ /* Some internal PHYs don't have a model number. */
+ if (reg == MII_PHYSID2 && !(val & 0x3f0) &&
+ chip->info->family < ARRAY_SIZE(family_prod_id_table)) {
+ prod_id = family_prod_id_table[chip->info->family];
+ if (prod_id)
+ val |= prod_id >> 4;
}
return err ? err : val;
diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index 5dbdaf0f5f09..823a89354466 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -2913,9 +2913,35 @@ static struct phy_driver marvell_drivers[] = {
.get_stats = marvell_get_stats,
},
{
- .phy_id = MARVELL_PHY_ID_88E6390,
+ .phy_id = MARVELL_PHY_ID_88E6341_FAMILY,
.phy_id_mask = MARVELL_PHY_ID_MASK,
- .name = "Marvell 88E6390",
+ .name = "Marvell 88E6341 Family",
+ /* PHY_GBIT_FEATURES */
+ .flags = PHY_POLL_CABLE_TEST,
+ .probe = m88e1510_probe,
+ .config_init = marvell_config_init,
+ .config_aneg = m88e6390_config_aneg,
+ .read_status = marvell_read_status,
+ .ack_interrupt = marvell_ack_interrupt,
+ .config_intr = marvell_config_intr,
+ .did_interrupt = m88e1121_did_interrupt,
+ .resume = genphy_resume,
+ .suspend = genphy_suspend,
+ .read_page = marvell_read_page,
+ .write_page = marvell_write_page,
+ .get_sset_count = marvell_get_sset_count,
+ .get_strings = marvell_get_strings,
+ .get_stats = marvell_get_stats,
+ .get_tunable = m88e1540_get_tunable,
+ .set_tunable = m88e1540_set_tunable,
+ .cable_test_start = marvell_vct7_cable_test_start,
+ .cable_test_tdr_start = marvell_vct5_cable_test_tdr_start,
+ .cable_test_get_status = marvell_vct7_cable_test_get_status,
+ },
+ {
+ .phy_id = MARVELL_PHY_ID_88E6390_FAMILY,
+ .phy_id_mask = MARVELL_PHY_ID_MASK,
+ .name = "Marvell 88E6390 Family",
/* PHY_GBIT_FEATURES */
.flags = PHY_POLL_CABLE_TEST,
.probe = m88e6390_probe,
@@ -3001,7 +3027,8 @@ static struct mdio_device_id __maybe_unused marvell_tbl[] = {
{ MARVELL_PHY_ID_88E1540, MARVELL_PHY_ID_MASK },
{ MARVELL_PHY_ID_88E1545, MARVELL_PHY_ID_MASK },
{ MARVELL_PHY_ID_88E3016, MARVELL_PHY_ID_MASK },
- { MARVELL_PHY_ID_88E6390, MARVELL_PHY_ID_MASK },
+ { MARVELL_PHY_ID_88E6341_FAMILY, MARVELL_PHY_ID_MASK },
+ { MARVELL_PHY_ID_88E6390_FAMILY, MARVELL_PHY_ID_MASK },
{ MARVELL_PHY_ID_88E1340S, MARVELL_PHY_ID_MASK },
{ MARVELL_PHY_ID_88E1548P, MARVELL_PHY_ID_MASK },
{ }
diff --git a/include/linux/marvell_phy.h b/include/linux/marvell_phy.h
index ff7b7607c8cf..f5cf19d19776 100644
--- a/include/linux/marvell_phy.h
+++ b/include/linux/marvell_phy.h
@@ -25,11 +25,12 @@
#define MARVELL_PHY_ID_88X3310 0x002b09a0
#define MARVELL_PHY_ID_88E2110 0x002b09b0
-/* The MV88e6390 Ethernet switch contains embedded PHYs. These PHYs do
+/* These Ethernet switch families contain embedded PHYs, but they do
* not have a model ID. So the switch driver traps reads to the ID2
* register and returns the switch family ID
*/
-#define MARVELL_PHY_ID_88E6390 0x01410f90
+#define MARVELL_PHY_ID_88E6341_FAMILY 0x01410f41
+#define MARVELL_PHY_ID_88E6390_FAMILY 0x01410f90
#define MARVELL_PHY_FAMILY_ID(id) ((id) >> 4)
--
2.20.1
The patch below does not apply to the 5.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7fedb63a8307dda0ec3b8969a3b233a1dd7ea8e0 Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <daniel(a)iogearbox.net>
Date: Wed, 24 Mar 2021 10:38:26 +0100
Subject: [PATCH] bpf: Tighten speculative pointer arithmetic mask
This work tightens the offset mask we use for unprivileged pointer arithmetic
in order to mitigate a corner case reported by Piotr and Benedict where in
the speculative domain it is possible to advance, for example, the map value
pointer by up to value_size-1 out-of-bounds in order to leak kernel memory
via side-channel to user space.
Before this change, the computed ptr_limit for retrieve_ptr_limit() helper
represents largest valid distance when moving pointer to the right or left
which is then fed as aux->alu_limit to generate masking instructions against
the offset register. After the change, the derived aux->alu_limit represents
the largest potential value of the offset register which we mask against which
is just a narrower subset of the former limit.
For minimal complexity, we call sanitize_ptr_alu() from 2 observation points
in adjust_ptr_min_max_vals(), that is, before and after the simulated alu
operation. In the first step, we retieve the alu_state and alu_limit before
the operation as well as we branch-off a verifier path and push it to the
verification stack as we did before which checks the dst_reg under truncation,
in other words, when the speculative domain would attempt to move the pointer
out-of-bounds.
In the second step, we retrieve the new alu_limit and calculate the absolute
distance between both. Moreover, we commit the alu_state and final alu_limit
via update_alu_sanitation_state() to the env's instruction aux data, and bail
out from there if there is a mismatch due to coming from different verification
paths with different states.
Reported-by: Piotr Krysiuk <piotras(a)gmail.com>
Reported-by: Benedict Schlueter <benedict.schlueter(a)rub.de>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend(a)gmail.com>
Acked-by: Alexei Starovoitov <ast(a)kernel.org>
Tested-by: Benedict Schlueter <benedict.schlueter(a)rub.de>
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e41b6326e3e6..0399ac092b36 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5871,7 +5871,7 @@ static int retrieve_ptr_limit(const struct bpf_reg_state *ptr_reg,
bool off_is_neg = off_reg->smin_value < 0;
bool mask_to_left = (opcode == BPF_ADD && off_is_neg) ||
(opcode == BPF_SUB && !off_is_neg);
- u32 off, max = 0, ptr_limit = 0;
+ u32 max = 0, ptr_limit = 0;
if (!tnum_is_const(off_reg->var_off) &&
(off_reg->smin_value < 0) != (off_reg->smax_value < 0))
@@ -5880,26 +5880,18 @@ static int retrieve_ptr_limit(const struct bpf_reg_state *ptr_reg,
switch (ptr_reg->type) {
case PTR_TO_STACK:
/* Offset 0 is out-of-bounds, but acceptable start for the
- * left direction, see BPF_REG_FP.
+ * left direction, see BPF_REG_FP. Also, unknown scalar
+ * offset where we would need to deal with min/max bounds is
+ * currently prohibited for unprivileged.
*/
max = MAX_BPF_STACK + mask_to_left;
- /* Indirect variable offset stack access is prohibited in
- * unprivileged mode so it's not handled here.
- */
- off = ptr_reg->off + ptr_reg->var_off.value;
- if (mask_to_left)
- ptr_limit = MAX_BPF_STACK + off;
- else
- ptr_limit = -off - 1;
+ ptr_limit = -(ptr_reg->var_off.value + ptr_reg->off);
break;
case PTR_TO_MAP_VALUE:
max = ptr_reg->map_ptr->value_size;
- if (mask_to_left) {
- ptr_limit = ptr_reg->umax_value + ptr_reg->off;
- } else {
- off = ptr_reg->smin_value + ptr_reg->off;
- ptr_limit = ptr_reg->map_ptr->value_size - off - 1;
- }
+ ptr_limit = (mask_to_left ?
+ ptr_reg->smin_value :
+ ptr_reg->umax_value) + ptr_reg->off;
break;
default:
return REASON_TYPE;
@@ -5954,10 +5946,12 @@ static int sanitize_ptr_alu(struct bpf_verifier_env *env,
struct bpf_insn *insn,
const struct bpf_reg_state *ptr_reg,
const struct bpf_reg_state *off_reg,
- struct bpf_reg_state *dst_reg)
+ struct bpf_reg_state *dst_reg,
+ struct bpf_insn_aux_data *tmp_aux,
+ const bool commit_window)
{
+ struct bpf_insn_aux_data *aux = commit_window ? cur_aux(env) : tmp_aux;
struct bpf_verifier_state *vstate = env->cur_state;
- struct bpf_insn_aux_data *aux = cur_aux(env);
bool off_is_neg = off_reg->smin_value < 0;
bool ptr_is_dst_reg = ptr_reg == dst_reg;
u8 opcode = BPF_OP(insn->code);
@@ -5976,18 +5970,33 @@ static int sanitize_ptr_alu(struct bpf_verifier_env *env,
if (vstate->speculative)
goto do_sim;
- alu_state = off_is_neg ? BPF_ALU_NEG_VALUE : 0;
- alu_state |= ptr_is_dst_reg ?
- BPF_ALU_SANITIZE_SRC : BPF_ALU_SANITIZE_DST;
-
err = retrieve_ptr_limit(ptr_reg, off_reg, &alu_limit, opcode);
if (err < 0)
return err;
+ if (commit_window) {
+ /* In commit phase we narrow the masking window based on
+ * the observed pointer move after the simulated operation.
+ */
+ alu_state = tmp_aux->alu_state;
+ alu_limit = abs(tmp_aux->alu_limit - alu_limit);
+ } else {
+ alu_state = off_is_neg ? BPF_ALU_NEG_VALUE : 0;
+ alu_state |= ptr_is_dst_reg ?
+ BPF_ALU_SANITIZE_SRC : BPF_ALU_SANITIZE_DST;
+ }
+
err = update_alu_sanitation_state(aux, alu_state, alu_limit);
if (err < 0)
return err;
do_sim:
+ /* If we're in commit phase, we're done here given we already
+ * pushed the truncated dst_reg into the speculative verification
+ * stack.
+ */
+ if (commit_window)
+ return 0;
+
/* Simulate and find potential out-of-bounds access under
* speculative execution from truncation as a result of
* masking when off was not within expected range. If off
@@ -6130,6 +6139,7 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
smin_ptr = ptr_reg->smin_value, smax_ptr = ptr_reg->smax_value;
u64 umin_val = off_reg->umin_value, umax_val = off_reg->umax_value,
umin_ptr = ptr_reg->umin_value, umax_ptr = ptr_reg->umax_value;
+ struct bpf_insn_aux_data tmp_aux = {};
u8 opcode = BPF_OP(insn->code);
u32 dst = insn->dst_reg;
int ret;
@@ -6196,12 +6206,15 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
/* pointer types do not carry 32-bit bounds at the moment. */
__mark_reg32_unbounded(dst_reg);
- switch (opcode) {
- case BPF_ADD:
- ret = sanitize_ptr_alu(env, insn, ptr_reg, off_reg, dst_reg);
+ if (sanitize_needed(opcode)) {
+ ret = sanitize_ptr_alu(env, insn, ptr_reg, off_reg, dst_reg,
+ &tmp_aux, false);
if (ret < 0)
return sanitize_err(env, insn, ret, off_reg, dst_reg);
+ }
+ switch (opcode) {
+ case BPF_ADD:
/* We can take a fixed offset as long as it doesn't overflow
* the s32 'off' field
*/
@@ -6252,10 +6265,6 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
}
break;
case BPF_SUB:
- ret = sanitize_ptr_alu(env, insn, ptr_reg, off_reg, dst_reg);
- if (ret < 0)
- return sanitize_err(env, insn, ret, off_reg, dst_reg);
-
if (dst_reg == off_reg) {
/* scalar -= pointer. Creates an unknown scalar */
verbose(env, "R%d tried to subtract pointer from scalar\n",
@@ -6338,6 +6347,12 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
if (sanitize_check_bounds(env, insn, dst_reg) < 0)
return -EACCES;
+ if (sanitize_needed(opcode)) {
+ ret = sanitize_ptr_alu(env, insn, dst_reg, off_reg, dst_reg,
+ &tmp_aux, true);
+ if (ret < 0)
+ return sanitize_err(env, insn, ret, off_reg, dst_reg);
+ }
return 0;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7fedb63a8307dda0ec3b8969a3b233a1dd7ea8e0 Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <daniel(a)iogearbox.net>
Date: Wed, 24 Mar 2021 10:38:26 +0100
Subject: [PATCH] bpf: Tighten speculative pointer arithmetic mask
This work tightens the offset mask we use for unprivileged pointer arithmetic
in order to mitigate a corner case reported by Piotr and Benedict where in
the speculative domain it is possible to advance, for example, the map value
pointer by up to value_size-1 out-of-bounds in order to leak kernel memory
via side-channel to user space.
Before this change, the computed ptr_limit for retrieve_ptr_limit() helper
represents largest valid distance when moving pointer to the right or left
which is then fed as aux->alu_limit to generate masking instructions against
the offset register. After the change, the derived aux->alu_limit represents
the largest potential value of the offset register which we mask against which
is just a narrower subset of the former limit.
For minimal complexity, we call sanitize_ptr_alu() from 2 observation points
in adjust_ptr_min_max_vals(), that is, before and after the simulated alu
operation. In the first step, we retieve the alu_state and alu_limit before
the operation as well as we branch-off a verifier path and push it to the
verification stack as we did before which checks the dst_reg under truncation,
in other words, when the speculative domain would attempt to move the pointer
out-of-bounds.
In the second step, we retrieve the new alu_limit and calculate the absolute
distance between both. Moreover, we commit the alu_state and final alu_limit
via update_alu_sanitation_state() to the env's instruction aux data, and bail
out from there if there is a mismatch due to coming from different verification
paths with different states.
Reported-by: Piotr Krysiuk <piotras(a)gmail.com>
Reported-by: Benedict Schlueter <benedict.schlueter(a)rub.de>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend(a)gmail.com>
Acked-by: Alexei Starovoitov <ast(a)kernel.org>
Tested-by: Benedict Schlueter <benedict.schlueter(a)rub.de>
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e41b6326e3e6..0399ac092b36 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5871,7 +5871,7 @@ static int retrieve_ptr_limit(const struct bpf_reg_state *ptr_reg,
bool off_is_neg = off_reg->smin_value < 0;
bool mask_to_left = (opcode == BPF_ADD && off_is_neg) ||
(opcode == BPF_SUB && !off_is_neg);
- u32 off, max = 0, ptr_limit = 0;
+ u32 max = 0, ptr_limit = 0;
if (!tnum_is_const(off_reg->var_off) &&
(off_reg->smin_value < 0) != (off_reg->smax_value < 0))
@@ -5880,26 +5880,18 @@ static int retrieve_ptr_limit(const struct bpf_reg_state *ptr_reg,
switch (ptr_reg->type) {
case PTR_TO_STACK:
/* Offset 0 is out-of-bounds, but acceptable start for the
- * left direction, see BPF_REG_FP.
+ * left direction, see BPF_REG_FP. Also, unknown scalar
+ * offset where we would need to deal with min/max bounds is
+ * currently prohibited for unprivileged.
*/
max = MAX_BPF_STACK + mask_to_left;
- /* Indirect variable offset stack access is prohibited in
- * unprivileged mode so it's not handled here.
- */
- off = ptr_reg->off + ptr_reg->var_off.value;
- if (mask_to_left)
- ptr_limit = MAX_BPF_STACK + off;
- else
- ptr_limit = -off - 1;
+ ptr_limit = -(ptr_reg->var_off.value + ptr_reg->off);
break;
case PTR_TO_MAP_VALUE:
max = ptr_reg->map_ptr->value_size;
- if (mask_to_left) {
- ptr_limit = ptr_reg->umax_value + ptr_reg->off;
- } else {
- off = ptr_reg->smin_value + ptr_reg->off;
- ptr_limit = ptr_reg->map_ptr->value_size - off - 1;
- }
+ ptr_limit = (mask_to_left ?
+ ptr_reg->smin_value :
+ ptr_reg->umax_value) + ptr_reg->off;
break;
default:
return REASON_TYPE;
@@ -5954,10 +5946,12 @@ static int sanitize_ptr_alu(struct bpf_verifier_env *env,
struct bpf_insn *insn,
const struct bpf_reg_state *ptr_reg,
const struct bpf_reg_state *off_reg,
- struct bpf_reg_state *dst_reg)
+ struct bpf_reg_state *dst_reg,
+ struct bpf_insn_aux_data *tmp_aux,
+ const bool commit_window)
{
+ struct bpf_insn_aux_data *aux = commit_window ? cur_aux(env) : tmp_aux;
struct bpf_verifier_state *vstate = env->cur_state;
- struct bpf_insn_aux_data *aux = cur_aux(env);
bool off_is_neg = off_reg->smin_value < 0;
bool ptr_is_dst_reg = ptr_reg == dst_reg;
u8 opcode = BPF_OP(insn->code);
@@ -5976,18 +5970,33 @@ static int sanitize_ptr_alu(struct bpf_verifier_env *env,
if (vstate->speculative)
goto do_sim;
- alu_state = off_is_neg ? BPF_ALU_NEG_VALUE : 0;
- alu_state |= ptr_is_dst_reg ?
- BPF_ALU_SANITIZE_SRC : BPF_ALU_SANITIZE_DST;
-
err = retrieve_ptr_limit(ptr_reg, off_reg, &alu_limit, opcode);
if (err < 0)
return err;
+ if (commit_window) {
+ /* In commit phase we narrow the masking window based on
+ * the observed pointer move after the simulated operation.
+ */
+ alu_state = tmp_aux->alu_state;
+ alu_limit = abs(tmp_aux->alu_limit - alu_limit);
+ } else {
+ alu_state = off_is_neg ? BPF_ALU_NEG_VALUE : 0;
+ alu_state |= ptr_is_dst_reg ?
+ BPF_ALU_SANITIZE_SRC : BPF_ALU_SANITIZE_DST;
+ }
+
err = update_alu_sanitation_state(aux, alu_state, alu_limit);
if (err < 0)
return err;
do_sim:
+ /* If we're in commit phase, we're done here given we already
+ * pushed the truncated dst_reg into the speculative verification
+ * stack.
+ */
+ if (commit_window)
+ return 0;
+
/* Simulate and find potential out-of-bounds access under
* speculative execution from truncation as a result of
* masking when off was not within expected range. If off
@@ -6130,6 +6139,7 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
smin_ptr = ptr_reg->smin_value, smax_ptr = ptr_reg->smax_value;
u64 umin_val = off_reg->umin_value, umax_val = off_reg->umax_value,
umin_ptr = ptr_reg->umin_value, umax_ptr = ptr_reg->umax_value;
+ struct bpf_insn_aux_data tmp_aux = {};
u8 opcode = BPF_OP(insn->code);
u32 dst = insn->dst_reg;
int ret;
@@ -6196,12 +6206,15 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
/* pointer types do not carry 32-bit bounds at the moment. */
__mark_reg32_unbounded(dst_reg);
- switch (opcode) {
- case BPF_ADD:
- ret = sanitize_ptr_alu(env, insn, ptr_reg, off_reg, dst_reg);
+ if (sanitize_needed(opcode)) {
+ ret = sanitize_ptr_alu(env, insn, ptr_reg, off_reg, dst_reg,
+ &tmp_aux, false);
if (ret < 0)
return sanitize_err(env, insn, ret, off_reg, dst_reg);
+ }
+ switch (opcode) {
+ case BPF_ADD:
/* We can take a fixed offset as long as it doesn't overflow
* the s32 'off' field
*/
@@ -6252,10 +6265,6 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
}
break;
case BPF_SUB:
- ret = sanitize_ptr_alu(env, insn, ptr_reg, off_reg, dst_reg);
- if (ret < 0)
- return sanitize_err(env, insn, ret, off_reg, dst_reg);
-
if (dst_reg == off_reg) {
/* scalar -= pointer. Creates an unknown scalar */
verbose(env, "R%d tried to subtract pointer from scalar\n",
@@ -6338,6 +6347,12 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
if (sanitize_check_bounds(env, insn, dst_reg) < 0)
return -EACCES;
+ if (sanitize_needed(opcode)) {
+ ret = sanitize_ptr_alu(env, insn, dst_reg, off_reg, dst_reg,
+ &tmp_aux, true);
+ if (ret < 0)
+ return sanitize_err(env, insn, ret, off_reg, dst_reg);
+ }
return 0;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 073815b756c51ba9d8384d924c5d1c03ca3d1ae4 Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <daniel(a)iogearbox.net>
Date: Tue, 23 Mar 2021 15:05:48 +0100
Subject: [PATCH] bpf: Refactor and streamline bounds check into helper
Move the bounds check in adjust_ptr_min_max_vals() into a small helper named
sanitize_check_bounds() in order to simplify the former a bit.
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend(a)gmail.com>
Acked-by: Alexei Starovoitov <ast(a)kernel.org>
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f378d4ae405f..db77e2c670b9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6075,6 +6075,37 @@ static int check_stack_access_for_ptr_arithmetic(
return 0;
}
+static int sanitize_check_bounds(struct bpf_verifier_env *env,
+ const struct bpf_insn *insn,
+ const struct bpf_reg_state *dst_reg)
+{
+ u32 dst = insn->dst_reg;
+
+ /* For unprivileged we require that resulting offset must be in bounds
+ * in order to be able to sanitize access later on.
+ */
+ if (env->bypass_spec_v1)
+ return 0;
+
+ switch (dst_reg->type) {
+ case PTR_TO_STACK:
+ if (check_stack_access_for_ptr_arithmetic(env, dst, dst_reg,
+ dst_reg->off + dst_reg->var_off.value))
+ return -EACCES;
+ break;
+ case PTR_TO_MAP_VALUE:
+ if (check_map_access(env, dst, dst_reg->off, 1, false)) {
+ verbose(env, "R%d pointer arithmetic of map value goes out of range, "
+ "prohibited for !root\n", dst);
+ return -EACCES;
+ }
+ break;
+ default:
+ break;
+ }
+
+ return 0;
+}
/* Handles arithmetic on a pointer and a scalar: computes new min/max and var_off.
* Caller should also handle BPF_MOV case separately.
@@ -6300,22 +6331,8 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
__reg_deduce_bounds(dst_reg);
__reg_bound_offset(dst_reg);
- /* For unprivileged we require that resulting offset must be in bounds
- * in order to be able to sanitize access later on.
- */
- if (!env->bypass_spec_v1) {
- if (dst_reg->type == PTR_TO_MAP_VALUE &&
- check_map_access(env, dst, dst_reg->off, 1, false)) {
- verbose(env, "R%d pointer arithmetic of map value goes out of range, "
- "prohibited for !root\n", dst);
- return -EACCES;
- } else if (dst_reg->type == PTR_TO_STACK &&
- check_stack_access_for_ptr_arithmetic(
- env, dst, dst_reg, dst_reg->off +
- dst_reg->var_off.value)) {
- return -EACCES;
- }
- }
+ if (sanitize_check_bounds(env, insn, dst_reg) < 0)
+ return -EACCES;
return 0;
}
The patch below does not apply to the 5.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 073815b756c51ba9d8384d924c5d1c03ca3d1ae4 Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <daniel(a)iogearbox.net>
Date: Tue, 23 Mar 2021 15:05:48 +0100
Subject: [PATCH] bpf: Refactor and streamline bounds check into helper
Move the bounds check in adjust_ptr_min_max_vals() into a small helper named
sanitize_check_bounds() in order to simplify the former a bit.
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend(a)gmail.com>
Acked-by: Alexei Starovoitov <ast(a)kernel.org>
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f378d4ae405f..db77e2c670b9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6075,6 +6075,37 @@ static int check_stack_access_for_ptr_arithmetic(
return 0;
}
+static int sanitize_check_bounds(struct bpf_verifier_env *env,
+ const struct bpf_insn *insn,
+ const struct bpf_reg_state *dst_reg)
+{
+ u32 dst = insn->dst_reg;
+
+ /* For unprivileged we require that resulting offset must be in bounds
+ * in order to be able to sanitize access later on.
+ */
+ if (env->bypass_spec_v1)
+ return 0;
+
+ switch (dst_reg->type) {
+ case PTR_TO_STACK:
+ if (check_stack_access_for_ptr_arithmetic(env, dst, dst_reg,
+ dst_reg->off + dst_reg->var_off.value))
+ return -EACCES;
+ break;
+ case PTR_TO_MAP_VALUE:
+ if (check_map_access(env, dst, dst_reg->off, 1, false)) {
+ verbose(env, "R%d pointer arithmetic of map value goes out of range, "
+ "prohibited for !root\n", dst);
+ return -EACCES;
+ }
+ break;
+ default:
+ break;
+ }
+
+ return 0;
+}
/* Handles arithmetic on a pointer and a scalar: computes new min/max and var_off.
* Caller should also handle BPF_MOV case separately.
@@ -6300,22 +6331,8 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
__reg_deduce_bounds(dst_reg);
__reg_bound_offset(dst_reg);
- /* For unprivileged we require that resulting offset must be in bounds
- * in order to be able to sanitize access later on.
- */
- if (!env->bypass_spec_v1) {
- if (dst_reg->type == PTR_TO_MAP_VALUE &&
- check_map_access(env, dst, dst_reg->off, 1, false)) {
- verbose(env, "R%d pointer arithmetic of map value goes out of range, "
- "prohibited for !root\n", dst);
- return -EACCES;
- } else if (dst_reg->type == PTR_TO_STACK &&
- check_stack_access_for_ptr_arithmetic(
- env, dst, dst_reg, dst_reg->off +
- dst_reg->var_off.value)) {
- return -EACCES;
- }
- }
+ if (sanitize_check_bounds(env, insn, dst_reg) < 0)
+ return -EACCES;
return 0;
}
The following commit has been merged into the x86/build branch of tip:
Commit-ID: 0ef3439cd80ba7770723edb0470d15815914bb62
Gitweb: https://git.kernel.org/tip/0ef3439cd80ba7770723edb0470d15815914bb62
Author: Maciej W. Rozycki <macro(a)orcam.me.uk>
AuthorDate: Wed, 14 Apr 2021 12:38:28 +02:00
Committer: Borislav Petkov <bp(a)suse.de>
CommitterDate: Mon, 19 Apr 2021 14:02:12 +02:00
x86/build: Disable HIGHMEM64G selection for M486SX
Fix a regression caused by making the 486SX separately selectable in
Kconfig, for which the HIGHMEM64G setting has not been updated and
therefore has become exposed as a user-selectable option for the M486SX
configuration setting unlike with original M486 and all the other
settings that choose non-PAE-enabled processors:
High Memory Support
> 1. off (NOHIGHMEM)
2. 4GB (HIGHMEM4G)
3. 64GB (HIGHMEM64G)
choice[1-3?]:
With the fix in place the setting is now correctly removed:
High Memory Support
> 1. off (NOHIGHMEM)
2. 4GB (HIGHMEM4G)
choice[1-2?]:
[ bp: Massage commit message. ]
Fixes: 87d6021b8143 ("x86/math-emu: Limit MATH_EMULATION to 486SX compatibles")
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Cc: stable(a)vger.kernel.org # v5.5+
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.2104141221340.44318@angie.orcam.m…
---
arch/x86/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2792879..268b7d5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1406,7 +1406,7 @@ config HIGHMEM4G
config HIGHMEM64G
bool "64GB"
- depends on !M486 && !M586 && !M586TSC && !M586MMX && !MGEODE_LX && !MGEODEGX1 && !MCYRIXIII && !MELAN && !MWINCHIPC6 && !WINCHIP3D && !MK6
+ depends on !M486SX && !M486 && !M586 && !M586TSC && !M586MMX && !MGEODE_LX && !MGEODEGX1 && !MCYRIXIII && !MELAN && !MWINCHIPC6 && !WINCHIP3D && !MK6
select X86_PAE
help
Select this if you have a 32-bit processor and more than 4
commit 2decad92f4731fac9755a083fcfefa66edb7d67d upstream.
The entry from EL0 code checks the TFSRE0_EL1 register for any
asynchronous tag check faults in user space and sets the
TIF_MTE_ASYNC_FAULT flag. This is not done atomically, potentially
racing with another CPU calling set_tsk_thread_flag().
Replace the non-atomic ORR+STR with an STSET instruction. While STSET
requires ARMv8.1 and an assembler that understands LSE atomics, the MTE
feature is part of ARMv8.5 and already requires an updated assembler.
Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com>
Fixes: 637ec831ea4f ("arm64: mte: Handle synchronous and asynchronous tag check faults")
Cc: <stable(a)vger.kernel.org> # 5.10.x
Reported-by: Will Deacon <will(a)kernel.org>
Cc: Will Deacon <will(a)kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Link: https://lore.kernel.org/r/20210409173710.18582-1-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will(a)kernel.org>
---
arch/arm64/Kconfig | 6 +++++-
arch/arm64/kernel/entry.S | 10 ++++++----
2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c1be64228327..5e5cf3af6351 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1390,10 +1390,13 @@ config ARM64_PAN
The feature is detected at runtime, and will remain as a 'nop'
instruction if the cpu does not implement the feature.
+config AS_HAS_LSE_ATOMICS
+ def_bool $(as-instr,.arch_extension lse)
+
config ARM64_LSE_ATOMICS
bool
default ARM64_USE_LSE_ATOMICS
- depends on $(as-instr,.arch_extension lse)
+ depends on AS_HAS_LSE_ATOMICS
config ARM64_USE_LSE_ATOMICS
bool "Atomic instructions"
@@ -1667,6 +1670,7 @@ config ARM64_MTE
bool "Memory Tagging Extension support"
default y
depends on ARM64_AS_HAS_MTE && ARM64_TAGGED_ADDR_ABI
+ depends on AS_HAS_LSE_ATOMICS
select ARCH_USES_HIGH_VMA_FLAGS
help
Memory Tagging (part of the ARMv8.5 Extensions) provides
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index d72c818b019c..2da82c139e1c 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -148,16 +148,18 @@ alternative_cb_end
.endm
/* Check for MTE asynchronous tag check faults */
- .macro check_mte_async_tcf, flgs, tmp
+ .macro check_mte_async_tcf, tmp, ti_flags
#ifdef CONFIG_ARM64_MTE
+ .arch_extension lse
alternative_if_not ARM64_MTE
b 1f
alternative_else_nop_endif
mrs_s \tmp, SYS_TFSRE0_EL1
tbz \tmp, #SYS_TFSR_EL1_TF0_SHIFT, 1f
/* Asynchronous TCF occurred for TTBR0 access, set the TI flag */
- orr \flgs, \flgs, #_TIF_MTE_ASYNC_FAULT
- str \flgs, [tsk, #TSK_TI_FLAGS]
+ mov \tmp, #_TIF_MTE_ASYNC_FAULT
+ add \ti_flags, tsk, #TSK_TI_FLAGS
+ stset \tmp, [\ti_flags]
msr_s SYS_TFSRE0_EL1, xzr
1:
#endif
@@ -207,7 +209,7 @@ alternative_else_nop_endif
disable_step_tsk x19, x20
/* Check for asynchronous tag check faults in user space */
- check_mte_async_tcf x19, x22
+ check_mte_async_tcf x22, x23
apply_ssbd 1, x22, x23
ptrauth_keys_install_kernel tsk, x20, x22, x23
A few archs like powerpc have different errno.h values for macros
EDEADLOCK and EDEADLK. In code including both libc and linux versions of
errno.h, this can result in multiple definitions of EDEADLOCK in the
include chain. Definitions to the same value (e.g. seen with mips) do
not raise warnings, but on powerpc there are redefinitions changing the
value, which raise warnings and errors (if using "-Werror").
Guard against these redefinitions to avoid build errors like the following,
first seen cross-compiling libbpf v5.8.9 for powerpc using GCC 8.4.0 with
musl 1.1.24:
In file included from ../../arch/powerpc/include/uapi/asm/errno.h:5,
from ../../include/linux/err.h:8,
from libbpf.c:29:
../../include/uapi/asm-generic/errno.h:40: error: "EDEADLOCK" redefined [-Werror]
#define EDEADLOCK EDEADLK
In file included from toolchain-powerpc_8540_gcc-8.4.0_musl/include/errno.h:10,
from libbpf.c:26:
toolchain-powerpc_8540_gcc-8.4.0_musl/include/bits/errno.h:58: note: this is the location of the previous definition
#define EDEADLOCK 58
cc1: all warnings being treated as errors
CC: Stable <stable(a)vger.kernel.org>
Reported-by: Rosen Penev <rosenp(a)gmail.com>
Signed-off-by: Tony Ambardar <Tony.Ambardar(a)gmail.com>
---
v1 -> v2:
* clean up commit description formatting
v2 -> v3: (per Michael Ellerman)
* drop indeterminate 'Fixes' tags, request stable backports instead
---
arch/powerpc/include/uapi/asm/errno.h | 1 +
tools/arch/powerpc/include/uapi/asm/errno.h | 1 +
2 files changed, 2 insertions(+)
diff --git a/arch/powerpc/include/uapi/asm/errno.h b/arch/powerpc/include/uapi/asm/errno.h
index cc79856896a1..4ba87de32be0 100644
--- a/arch/powerpc/include/uapi/asm/errno.h
+++ b/arch/powerpc/include/uapi/asm/errno.h
@@ -2,6 +2,7 @@
#ifndef _ASM_POWERPC_ERRNO_H
#define _ASM_POWERPC_ERRNO_H
+#undef EDEADLOCK
#include <asm-generic/errno.h>
#undef EDEADLOCK
diff --git a/tools/arch/powerpc/include/uapi/asm/errno.h b/tools/arch/powerpc/include/uapi/asm/errno.h
index cc79856896a1..4ba87de32be0 100644
--- a/tools/arch/powerpc/include/uapi/asm/errno.h
+++ b/tools/arch/powerpc/include/uapi/asm/errno.h
@@ -2,6 +2,7 @@
#ifndef _ASM_POWERPC_ERRNO_H
#define _ASM_POWERPC_ERRNO_H
+#undef EDEADLOCK
#include <asm-generic/errno.h>
#undef EDEADLOCK
--
2.25.1
During the EEH MMIO error checking, the current implementation fails to map
the (virtual) MMIO address back to the pci device on radix with hugepage
mappings for I/O. This results into failure to dispatch EEH event with no
recovery even when EEH capability has been enabled on the device.
eeh_check_failure(token) # token = virtual MMIO address
addr = eeh_token_to_phys(token);
edev = eeh_addr_cache_get_dev(addr);
if (!edev)
return 0;
eeh_dev_check_failure(edev); <= Dispatch the EEH event
In case of hugepage mappings, eeh_token_to_phys() has a bug in virt -> phys
translation that results in wrong physical address, which is then passed to
eeh_addr_cache_get_dev() to match it against cached pci I/O address ranges
to get to a PCI device. Hence, it fails to find a match and the EEH event
never gets dispatched leaving the device in failed state.
The commit 33439620680be ("powerpc/eeh: Handle hugepages in ioremap space")
introduced following logic to translate virt to phys for hugepage mappings:
eeh_token_to_phys():
+ pa = pte_pfn(*ptep);
+
+ /* On radix we can do hugepage mappings for io, so handle that */
+ if (hugepage_shift) {
+ pa <<= hugepage_shift; <= This is wrong
+ pa |= token & ((1ul << hugepage_shift) - 1);
+ }
This patch fixes the virt -> phys translation in eeh_token_to_phys()
function.
$ cat /sys/kernel/debug/powerpc/eeh_address_cache
mem addr range [0x0000040080000000-0x00000400807fffff]: 0030:01:00.1
mem addr range [0x0000040080800000-0x0000040080ffffff]: 0030:01:00.1
mem addr range [0x0000040081000000-0x00000400817fffff]: 0030:01:00.0
mem addr range [0x0000040081800000-0x0000040081ffffff]: 0030:01:00.0
mem addr range [0x0000040082000000-0x000004008207ffff]: 0030:01:00.1
mem addr range [0x0000040082080000-0x00000400820fffff]: 0030:01:00.0
mem addr range [0x0000040082100000-0x000004008210ffff]: 0030:01:00.1
mem addr range [0x0000040082110000-0x000004008211ffff]: 0030:01:00.0
Above is the list of cached io address ranges of pci 0030:01:00.<fn>.
Before this patch:
Tracing 'arg1' of function eeh_addr_cache_get_dev() during error injection
clearly shows that 'addr=' contains wrong physical address:
kworker/u16:0-7 [001] .... 108.883775: eeh_addr_cache_get_dev:
(eeh_addr_cache_get_dev+0xc/0xf0) addr=0x80103000a510
dmesg shows no EEH recovery messages:
[ 108.563768] bnx2x: [bnx2x_timer:5801(eth2)]MFW seems hanged: drv_pulse (0x9ae) != mcp_pulse (0x7fff)
[ 108.563788] bnx2x: [bnx2x_hw_stats_update:870(eth2)]NIG timer max (4294967295)
[ 108.883788] bnx2x: [bnx2x_acquire_hw_lock:2013(eth1)]lock_status 0xffffffff resource_bit 0x1
[ 108.884407] bnx2x 0030:01:00.0 eth1: MDC/MDIO access timeout
[ 108.884976] bnx2x 0030:01:00.0 eth1: MDC/MDIO access timeout
<..>
After this patch:
eeh_addr_cache_get_dev() trace shows correct physical address:
<idle>-0 [001] ..s. 1043.123828: eeh_addr_cache_get_dev:
(eeh_addr_cache_get_dev+0xc/0xf0) addr=0x40080bc7cd8
dmesg logs shows EEH recovery getting triggerred:
[ 964.323980] bnx2x: [bnx2x_timer:5801(eth2)]MFW seems hanged: drv_pulse (0x746f) != mcp_pulse (0x7fff)
[ 964.323991] EEH: Recovering PHB#30-PE#10000
[ 964.324002] EEH: PE location: N/A, PHB location: N/A
[ 964.324006] EEH: Frozen PHB#30-PE#10000 detected
<..>
Cc: stable(a)vger.kernel.org
Fixes: 33439620680be ("powerpc/eeh: Handle hugepages in ioremap space")
Signed-off-by: Mahesh Salgaonkar <mahesh(a)linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Reported-by: Dominic DeMarco <ddemarc(a)us.ibm.com>
---
Change in v2:
- Fixed error reported by checkpatch.pl.
---
arch/powerpc/kernel/eeh.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index cd60bc1c87011..7040e430a1249 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -362,14 +362,11 @@ static inline unsigned long eeh_token_to_phys(unsigned long token)
pa = pte_pfn(*ptep);
/* On radix we can do hugepage mappings for io, so handle that */
- if (hugepage_shift) {
- pa <<= hugepage_shift;
- pa |= token & ((1ul << hugepage_shift) - 1);
- } else {
- pa <<= PAGE_SHIFT;
- pa |= token & (PAGE_SIZE - 1);
- }
+ if (!hugepage_shift)
+ hugepage_shift = PAGE_SHIFT;
+ pa <<= PAGE_SHIFT;
+ pa |= token & ((1ul << hugepage_shift) - 1);
return pa;
}
Since commit 511157ab641e ("powerpc/vdso: Move vdso datapage up front")
VVAR page is in front of the VDSO area. In result it breaks CRIU
(Checkpoint Restore In Userspace) [1], where CRIU expects that "[vdso]"
from /proc/../maps points at ELF/vdso image, rather than at VVAR data page.
Laurent made a patch to keep CRIU working (by reading aux vector).
But I think it still makes sence to separate two mappings into different
VMAs. It will also make ppc64 less "special" for userspace and as
a side-bonus will make VVAR page un-writable by debugger (which previously
would COW page and can be unexpected).
I opportunistically Cc stable on it: I understand that usually such
stuff isn't a stable material, but that will allow us in CRIU have
one workaround less that is needed just for one release (v5.11) on
one platform (ppc64), which we otherwise have to maintain.
I wouldn't go as far as to say that the commit 511157ab641e is ABI
regression as no other userspace got broken, but I'd really appreciate
if it gets backported to v5.11 after v5.12 is released, so as not
to complicate already non-simple CRIU-vdso code. Thanks!
Cc: Andrei Vagin <avagin(a)gmail.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: Laurent Dufour <ldufour(a)linux.ibm.com>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Paul Mackerras <paulus(a)samba.org>
Cc: linuxppc-dev(a)lists.ozlabs.org
Cc: stable(a)vger.kernel.org # v5.11
[1]: https://github.com/checkpoint-restore/criu/issues/1417
Signed-off-by: Dmitry Safonov <dima(a)arista.com>
Tested-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
---
arch/powerpc/include/asm/mmu_context.h | 2 +-
arch/powerpc/kernel/vdso.c | 54 +++++++++++++++++++-------
2 files changed, 40 insertions(+), 16 deletions(-)
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 652ce85f9410..4bc45d3ed8b0 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -263,7 +263,7 @@ extern void arch_exit_mmap(struct mm_struct *mm);
static inline void arch_unmap(struct mm_struct *mm,
unsigned long start, unsigned long end)
{
- unsigned long vdso_base = (unsigned long)mm->context.vdso - PAGE_SIZE;
+ unsigned long vdso_base = (unsigned long)mm->context.vdso;
if (start <= vdso_base && vdso_base < end)
mm->context.vdso = NULL;
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index e839a906fdf2..b14907209822 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -55,10 +55,10 @@ static int vdso_mremap(const struct vm_special_mapping *sm, struct vm_area_struc
{
unsigned long new_size = new_vma->vm_end - new_vma->vm_start;
- if (new_size != text_size + PAGE_SIZE)
+ if (new_size != text_size)
return -EINVAL;
- current->mm->context.vdso = (void __user *)new_vma->vm_start + PAGE_SIZE;
+ current->mm->context.vdso = (void __user *)new_vma->vm_start;
return 0;
}
@@ -73,6 +73,10 @@ static int vdso64_mremap(const struct vm_special_mapping *sm, struct vm_area_str
return vdso_mremap(sm, new_vma, &vdso64_end - &vdso64_start);
}
+static struct vm_special_mapping vvar_spec __ro_after_init = {
+ .name = "[vvar]",
+};
+
static struct vm_special_mapping vdso32_spec __ro_after_init = {
.name = "[vdso]",
.mremap = vdso32_mremap,
@@ -89,11 +93,11 @@ static struct vm_special_mapping vdso64_spec __ro_after_init = {
*/
static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
{
- struct mm_struct *mm = current->mm;
+ unsigned long vdso_size, vdso_base, mappings_size;
struct vm_special_mapping *vdso_spec;
+ unsigned long vvar_size = PAGE_SIZE;
+ struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
- unsigned long vdso_size;
- unsigned long vdso_base;
if (is_32bit_task()) {
vdso_spec = &vdso32_spec;
@@ -110,8 +114,8 @@ static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_int
vdso_base = 0;
}
- /* Add a page to the vdso size for the data page */
- vdso_size += PAGE_SIZE;
+ mappings_size = vdso_size + vvar_size;
+ mappings_size += (VDSO_ALIGNMENT - 1) & PAGE_MASK;
/*
* pick a base address for the vDSO in process space. We try to put it
@@ -119,9 +123,7 @@ static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_int
* and end up putting it elsewhere.
* Add enough to the size so that the result can be aligned.
*/
- vdso_base = get_unmapped_area(NULL, vdso_base,
- vdso_size + ((VDSO_ALIGNMENT - 1) & PAGE_MASK),
- 0, 0);
+ vdso_base = get_unmapped_area(NULL, vdso_base, mappings_size, 0, 0);
if (IS_ERR_VALUE(vdso_base))
return vdso_base;
@@ -133,7 +135,13 @@ static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_int
* install_special_mapping or the perf counter mmap tracking code
* will fail to recognise it as a vDSO.
*/
- mm->context.vdso = (void __user *)vdso_base + PAGE_SIZE;
+ mm->context.vdso = (void __user *)vdso_base + vvar_size;
+
+ vma = _install_special_mapping(mm, vdso_base, vvar_size,
+ VM_READ | VM_MAYREAD | VM_IO |
+ VM_DONTDUMP | VM_PFNMAP, &vvar_spec);
+ if (IS_ERR(vma))
+ return PTR_ERR(vma);
/*
* our vma flags don't have VM_WRITE so by default, the process isn't
@@ -145,9 +153,12 @@ static int __arch_setup_additional_pages(struct linux_binprm *bprm, int uses_int
* It's fine to use that for setting breakpoints in the vDSO code
* pages though.
*/
- vma = _install_special_mapping(mm, vdso_base, vdso_size,
+ vma = _install_special_mapping(mm, vdso_base + vvar_size, vdso_size,
VM_READ | VM_EXEC | VM_MAYREAD |
VM_MAYWRITE | VM_MAYEXEC, vdso_spec);
+ if (IS_ERR(vma))
+ do_munmap(mm, vdso_base, vvar_size, NULL);
+
return PTR_ERR_OR_ZERO(vma);
}
@@ -249,11 +260,22 @@ static struct page ** __init vdso_setup_pages(void *start, void *end)
if (!pagelist)
panic("%s: Cannot allocate page list for VDSO", __func__);
- pagelist[0] = virt_to_page(vdso_data);
-
for (i = 0; i < pages; i++)
- pagelist[i + 1] = virt_to_page(start + i * PAGE_SIZE);
+ pagelist[i] = virt_to_page(start + i * PAGE_SIZE);
+
+ return pagelist;
+}
+
+static struct page ** __init vvar_setup_pages(void)
+{
+ struct page **pagelist;
+ /* .pages is NULL-terminated */
+ pagelist = kcalloc(2, sizeof(struct page *), GFP_KERNEL);
+ if (!pagelist)
+ panic("%s: Cannot allocate page list for VVAR", __func__);
+
+ pagelist[0] = virt_to_page(vdso_data);
return pagelist;
}
@@ -295,6 +317,8 @@ static int __init vdso_init(void)
if (IS_ENABLED(CONFIG_PPC64))
vdso64_spec.pages = vdso_setup_pages(&vdso64_start, &vdso64_end);
+ vvar_spec.pages = vvar_setup_pages();
+
smp_wmb();
return 0;
--
2.31.0
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 453a77894efa4d9b6ef9644d74b9419c47ac427c Mon Sep 17 00:00:00 2001
From: Heiner Kallweit <hkallweit1(a)gmail.com>
Date: Wed, 14 Apr 2021 10:47:10 +0200
Subject: [PATCH] r8169: don't advertise pause in jumbo mode
It has been reported [0] that using pause frames in jumbo mode impacts
performance. There's no available chip documentation, but vendor
drivers r8168 and r8125 don't advertise pause in jumbo mode. So let's
do the same, according to Roman it fixes the issue.
[0] https://bugzilla.kernel.org/show_bug.cgi?id=212617
Fixes: 9cf9b84cc701 ("r8169: make use of phy_set_asym_pause")
Reported-by: Roman Mamedov <rm+bko(a)romanrm.net>
Tested-by: Roman Mamedov <rm+bko(a)romanrm.net>
Signed-off-by: Heiner Kallweit <hkallweit1(a)gmail.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 581a92fc3292..1df2c002c9f6 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -2350,6 +2350,13 @@ static void rtl_jumbo_config(struct rtl8169_private *tp)
if (pci_is_pcie(tp->pci_dev) && tp->supports_gmii)
pcie_set_readrq(tp->pci_dev, readrq);
+
+ /* Chip doesn't support pause in jumbo mode */
+ linkmode_mod_bit(ETHTOOL_LINK_MODE_Pause_BIT,
+ tp->phydev->advertising, !jumbo);
+ linkmode_mod_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
+ tp->phydev->advertising, !jumbo);
+ phy_start_aneg(tp->phydev);
}
DECLARE_RTL_COND(rtl_chipcmd_cond)
@@ -4630,8 +4637,6 @@ static int r8169_phy_connect(struct rtl8169_private *tp)
if (!tp->supports_gmii)
phy_set_max_speed(phydev, SPEED_100);
- phy_support_asym_pause(phydev);
-
phy_attached_info(phydev);
return 0;
The patch below does not apply to the 5.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 04c4f2ee3f68c9a4bf1653d15f1a9a435ae33f7a Mon Sep 17 00:00:00 2001
From: Reiji Watanabe <reijiw(a)google.com>
Date: Tue, 13 Apr 2021 15:47:40 +0000
Subject: [PATCH] KVM: VMX: Don't use vcpu->run->internal.ndata as an array
index
__vmx_handle_exit() uses vcpu->run->internal.ndata as an index for
an array access. Since vcpu->run is (can be) mapped to a user address
space with a writer permission, the 'ndata' could be updated by the
user process at anytime (the user process can set it to outside the
bounds of the array).
So, it is not safe that __vmx_handle_exit() uses the 'ndata' that way.
Fixes: 1aa561b1a4c0 ("kvm: x86: Add "last CPU" to some KVM_EXIT information")
Signed-off-by: Reiji Watanabe <reijiw(a)google.com>
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Message-Id: <20210413154739.490299-1-reijiw(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 32cf8287d4a7..29b40e092d13 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6027,19 +6027,19 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
exit_reason.basic != EXIT_REASON_PML_FULL &&
exit_reason.basic != EXIT_REASON_APIC_ACCESS &&
exit_reason.basic != EXIT_REASON_TASK_SWITCH)) {
+ int ndata = 3;
+
vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_DELIVERY_EV;
- vcpu->run->internal.ndata = 3;
vcpu->run->internal.data[0] = vectoring_info;
vcpu->run->internal.data[1] = exit_reason.full;
vcpu->run->internal.data[2] = vcpu->arch.exit_qualification;
if (exit_reason.basic == EXIT_REASON_EPT_MISCONFIG) {
- vcpu->run->internal.ndata++;
- vcpu->run->internal.data[3] =
+ vcpu->run->internal.data[ndata++] =
vmcs_read64(GUEST_PHYSICAL_ADDRESS);
}
- vcpu->run->internal.data[vcpu->run->internal.ndata++] =
- vcpu->arch.last_vmentry_cpu;
+ vcpu->run->internal.data[ndata++] = vcpu->arch.last_vmentry_cpu;
+ vcpu->run->internal.ndata = ndata;
return 0;
}
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 505cfd43c441 - riscv: Fix spelling mistake "SPARSEMEM" to "SPARSMEM"
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
All kernel binaries, config files, and logs are available for download here:
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefi…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ✅ xfstests - btrfs
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
🚧 ✅ Storage block - filesystem fio test
🚧 ✅ Storage block - queue scheduler test
🚧 ✅ Storage nvme - tcp
🚧 ✅ Storage: lvm device-mapper test
🚧 ✅ stress: stress-ng
Host 2:
✅ Boot test
✅ ACPI table test
✅ ACPI enabled test
✅ LTP
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ i2c: i2cdetect sanity
🚧 ✅ Firmware test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
ppc64le:
Host 1:
✅ Boot test
✅ LTP
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ trace: ftrace/tracer
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 2:
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ✅ xfstests - btrfs
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
🚧 ✅ Storage block - filesystem fio test
🚧 ✅ Storage block - queue scheduler test
🚧 ✅ Storage nvme - tcp
🚧 ✅ Storage: lvm device-mapper test
s390x:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ LTP
⚡⚡⚡ CIFS Connectathon
⚡⚡⚡ POSIX pjd-fstest suites
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ audit: audit testsuite test
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ Storage: swraid mdadm raid_module test
🚧 ⚡⚡⚡ Storage blktests
🚧 ⚡⚡⚡ Storage nvme - tcp
🚧 ⚡⚡⚡ stress: stress-ng
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ LTP
⚡⚡⚡ CIFS Connectathon
⚡⚡⚡ POSIX pjd-fstest suites
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ audit: audit testsuite test
Host 4:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ Storage: swraid mdadm raid_module test
🚧 ⚡⚡⚡ Storage blktests
🚧 ⚡⚡⚡ Storage nvme - tcp
🚧 ⚡⚡⚡ stress: stress-ng
x86_64:
Host 1:
✅ Boot test
✅ ACPI table test
✅ LTP
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: sanity smoke test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ i2c: i2cdetect sanity
🚧 ✅ Firmware test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ xfstests - nfsv4.2
✅ selinux-policy: serge-testsuite
✅ power-management: cpupower/sanity test
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ✅ CPU: Idle Test
🚧 ❌ xfstests - btrfs
🚧 ✅ xfstests - cifsv3.11
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
🚧 ✅ Storage block - filesystem fio test
🚧 ✅ Storage block - queue scheduler test
🚧 ✅ Storage nvme - tcp
🚧 ✅ Storage: lvm device-mapper test
🚧 ⚡⚡⚡ stress: stress-ng
Test sources: https://gitlab.com/cki-project/kernel-tests
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 0d3f85d5cbb1 - gro: ensure frag0 meets IP header alignment
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
All kernel binaries, config files, and logs are available for download here:
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefi…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ✅ xfstests - btrfs
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
🚧 ✅ Storage block - filesystem fio test
🚧 ✅ Storage block - queue scheduler test
🚧 ✅ Storage nvme - tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test
🚧 ⚡⚡⚡ stress: stress-ng
Host 2:
✅ Boot test
✅ ACPI table test
✅ ACPI enabled test
✅ LTP
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ i2c: i2cdetect sanity
🚧 ✅ Firmware test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
ppc64le:
Host 1:
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ✅ xfstests - btrfs
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
🚧 ✅ Storage block - filesystem fio test
🚧 ✅ Storage block - queue scheduler test
🚧 ✅ Storage nvme - tcp
🚧 ❌ Storage: lvm device-mapper test
Host 2:
✅ Boot test
✅ LTP
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ trace: ftrace/tracer
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
s390x:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ LTP
⚡⚡⚡ CIFS Connectathon
⚡⚡⚡ POSIX pjd-fstest suites
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ audit: audit testsuite test
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ Storage: swraid mdadm raid_module test
🚧 ⚡⚡⚡ Storage blktests
🚧 ⚡⚡⚡ Storage nvme - tcp
🚧 ⚡⚡⚡ stress: stress-ng
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ LTP
⚡⚡⚡ CIFS Connectathon
⚡⚡⚡ POSIX pjd-fstest suites
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ audit: audit testsuite test
Host 4:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ Storage: swraid mdadm raid_module test
🚧 ⚡⚡⚡ Storage blktests
🚧 ⚡⚡⚡ Storage nvme - tcp
🚧 ⚡⚡⚡ stress: stress-ng
x86_64:
Host 1:
✅ Boot test
✅ ACPI table test
✅ LTP
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: sanity smoke test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ i2c: i2cdetect sanity
🚧 ✅ Firmware test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 2:
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ xfstests - nfsv4.2
✅ selinux-policy: serge-testsuite
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ❌ xfstests - btrfs
🚧 ✅ xfstests - cifsv3.11
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
🚧 ✅ Storage block - filesystem fio test
🚧 ✅ Storage block - queue scheduler test
🚧 ✅ Storage nvme - tcp
🚧 ✅ Storage: lvm device-mapper test
🚧 ✅ stress: stress-ng
Test sources: https://gitlab.com/cki-project/kernel-tests
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 453a77894efa4d9b6ef9644d74b9419c47ac427c Mon Sep 17 00:00:00 2001
From: Heiner Kallweit <hkallweit1(a)gmail.com>
Date: Wed, 14 Apr 2021 10:47:10 +0200
Subject: [PATCH] r8169: don't advertise pause in jumbo mode
It has been reported [0] that using pause frames in jumbo mode impacts
performance. There's no available chip documentation, but vendor
drivers r8168 and r8125 don't advertise pause in jumbo mode. So let's
do the same, according to Roman it fixes the issue.
[0] https://bugzilla.kernel.org/show_bug.cgi?id=212617
Fixes: 9cf9b84cc701 ("r8169: make use of phy_set_asym_pause")
Reported-by: Roman Mamedov <rm+bko(a)romanrm.net>
Tested-by: Roman Mamedov <rm+bko(a)romanrm.net>
Signed-off-by: Heiner Kallweit <hkallweit1(a)gmail.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 581a92fc3292..1df2c002c9f6 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -2350,6 +2350,13 @@ static void rtl_jumbo_config(struct rtl8169_private *tp)
if (pci_is_pcie(tp->pci_dev) && tp->supports_gmii)
pcie_set_readrq(tp->pci_dev, readrq);
+
+ /* Chip doesn't support pause in jumbo mode */
+ linkmode_mod_bit(ETHTOOL_LINK_MODE_Pause_BIT,
+ tp->phydev->advertising, !jumbo);
+ linkmode_mod_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
+ tp->phydev->advertising, !jumbo);
+ phy_start_aneg(tp->phydev);
}
DECLARE_RTL_COND(rtl_chipcmd_cond)
@@ -4630,8 +4637,6 @@ static int r8169_phy_connect(struct rtl8169_private *tp)
if (!tp->supports_gmii)
phy_set_max_speed(phydev, SPEED_100);
- phy_support_asym_pause(phydev);
-
phy_attached_info(phydev);
return 0;
The patch below does not apply to the 5.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 453a77894efa4d9b6ef9644d74b9419c47ac427c Mon Sep 17 00:00:00 2001
From: Heiner Kallweit <hkallweit1(a)gmail.com>
Date: Wed, 14 Apr 2021 10:47:10 +0200
Subject: [PATCH] r8169: don't advertise pause in jumbo mode
It has been reported [0] that using pause frames in jumbo mode impacts
performance. There's no available chip documentation, but vendor
drivers r8168 and r8125 don't advertise pause in jumbo mode. So let's
do the same, according to Roman it fixes the issue.
[0] https://bugzilla.kernel.org/show_bug.cgi?id=212617
Fixes: 9cf9b84cc701 ("r8169: make use of phy_set_asym_pause")
Reported-by: Roman Mamedov <rm+bko(a)romanrm.net>
Tested-by: Roman Mamedov <rm+bko(a)romanrm.net>
Signed-off-by: Heiner Kallweit <hkallweit1(a)gmail.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 581a92fc3292..1df2c002c9f6 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -2350,6 +2350,13 @@ static void rtl_jumbo_config(struct rtl8169_private *tp)
if (pci_is_pcie(tp->pci_dev) && tp->supports_gmii)
pcie_set_readrq(tp->pci_dev, readrq);
+
+ /* Chip doesn't support pause in jumbo mode */
+ linkmode_mod_bit(ETHTOOL_LINK_MODE_Pause_BIT,
+ tp->phydev->advertising, !jumbo);
+ linkmode_mod_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
+ tp->phydev->advertising, !jumbo);
+ phy_start_aneg(tp->phydev);
}
DECLARE_RTL_COND(rtl_chipcmd_cond)
@@ -4630,8 +4637,6 @@ static int r8169_phy_connect(struct rtl8169_private *tp)
if (!tp->supports_gmii)
phy_set_max_speed(phydev, SPEED_100);
- phy_support_asym_pause(phydev);
-
phy_attached_info(phydev);
return 0;
SQPOLL task won't submit requests for a context that is currently dying,
so no need to remove ctx from sqd_list prior the main loop of
io_ring_exit_work(). Kill it, will be removed by io_sq_thread_finish()
and only brings confusion and lockups.
Cc: stable(a)vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
---
fs/io_uring.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index e02a18cd287f..fb41725204f0 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6744,6 +6744,10 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
if (!list_empty(&ctx->iopoll_list))
io_do_iopoll(ctx, &nr_events, 0);
+ /*
+ * Don't submit if refs are dying, good for io_uring_register(),
+ * but also it is relied upon by io_ring_exit_work()
+ */
if (to_submit && likely(!percpu_ref_is_dying(&ctx->refs)) &&
!(ctx->flags & IORING_SETUP_R_DISABLED))
ret = io_submit_sqes(ctx, to_submit);
@@ -8537,14 +8541,6 @@ static void io_ring_exit_work(struct work_struct *work)
struct io_tctx_node *node;
int ret;
- /* prevent SQPOLL from submitting new requests */
- if (ctx->sq_data) {
- io_sq_thread_park(ctx->sq_data);
- list_del_init(&ctx->sqd_list);
- io_sqd_update_thread_idle(ctx->sq_data);
- io_sq_thread_unpark(ctx->sq_data);
- }
-
/*
* If we're doing polled IO and end up having requests being
* submitted async (out-of-line), then completions can come in while
--
2.31.1
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 1fe976d308acb6374c899a4ee8025a0a016e453e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pali=20Roh=C3=A1r?= <pali(a)kernel.org>
Date: Mon, 12 Apr 2021 18:57:39 +0200
Subject: [PATCH] net: phy: marvell: fix detection of PHY on Topaz switches
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Since commit fee2d546414d ("net: phy: marvell: mv88e6390 temperature
sensor reading"), Linux reports the temperature of Topaz hwmon as
constant -75°C.
This is because switches from the Topaz family (88E6141 / 88E6341) have
the address of the temperature sensor register different from Peridot.
This address is instead compatible with 88E1510 PHYs, as was used for
Topaz before the above mentioned commit.
Create a new mapping table between switch family and PHY ID for families
which don't have a model number. And define PHY IDs for Topaz and Peridot
families.
Create a new PHY ID and a new PHY driver for Topaz's internal PHY.
The only difference from Peridot's PHY driver is the HWMON probing
method.
Prior this change Topaz's internal PHY is detected by kernel as:
PHY [...] driver [Marvell 88E6390] (irq=63)
And afterwards as:
PHY [...] driver [Marvell 88E6341 Family] (irq=63)
Signed-off-by: Pali Rohár <pali(a)kernel.org>
BugLink: https://github.com/globalscaletechnologies/linux/issues/1
Fixes: fee2d546414d ("net: phy: marvell: mv88e6390 temperature sensor reading")
Reviewed-by: Marek Behún <kabel(a)kernel.org>
Reviewed-by: Andrew Lunn <andrew(a)lunn.ch>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 903d619e08ed..e08bf9377140 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3026,10 +3026,17 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
return err;
}
+/* prod_id for switch families which do not have a PHY model number */
+static const u16 family_prod_id_table[] = {
+ [MV88E6XXX_FAMILY_6341] = MV88E6XXX_PORT_SWITCH_ID_PROD_6341,
+ [MV88E6XXX_FAMILY_6390] = MV88E6XXX_PORT_SWITCH_ID_PROD_6390,
+};
+
static int mv88e6xxx_mdio_read(struct mii_bus *bus, int phy, int reg)
{
struct mv88e6xxx_mdio_bus *mdio_bus = bus->priv;
struct mv88e6xxx_chip *chip = mdio_bus->chip;
+ u16 prod_id;
u16 val;
int err;
@@ -3040,23 +3047,12 @@ static int mv88e6xxx_mdio_read(struct mii_bus *bus, int phy, int reg)
err = chip->info->ops->phy_read(chip, bus, phy, reg, &val);
mv88e6xxx_reg_unlock(chip);
- if (reg == MII_PHYSID2) {
- /* Some internal PHYs don't have a model number. */
- if (chip->info->family != MV88E6XXX_FAMILY_6165)
- /* Then there is the 6165 family. It gets is
- * PHYs correct. But it can also have two
- * SERDES interfaces in the PHY address
- * space. And these don't have a model
- * number. But they are not PHYs, so we don't
- * want to give them something a PHY driver
- * will recognise.
- *
- * Use the mv88e6390 family model number
- * instead, for anything which really could be
- * a PHY,
- */
- if (!(val & 0x3f0))
- val |= MV88E6XXX_PORT_SWITCH_ID_PROD_6390 >> 4;
+ /* Some internal PHYs don't have a model number. */
+ if (reg == MII_PHYSID2 && !(val & 0x3f0) &&
+ chip->info->family < ARRAY_SIZE(family_prod_id_table)) {
+ prod_id = family_prod_id_table[chip->info->family];
+ if (prod_id)
+ val |= prod_id >> 4;
}
return err ? err : val;
diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index e26a5d663f8a..8018ddf7f316 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -3021,9 +3021,34 @@ static struct phy_driver marvell_drivers[] = {
.get_stats = marvell_get_stats,
},
{
- .phy_id = MARVELL_PHY_ID_88E6390,
+ .phy_id = MARVELL_PHY_ID_88E6341_FAMILY,
.phy_id_mask = MARVELL_PHY_ID_MASK,
- .name = "Marvell 88E6390",
+ .name = "Marvell 88E6341 Family",
+ /* PHY_GBIT_FEATURES */
+ .flags = PHY_POLL_CABLE_TEST,
+ .probe = m88e1510_probe,
+ .config_init = marvell_config_init,
+ .config_aneg = m88e6390_config_aneg,
+ .read_status = marvell_read_status,
+ .config_intr = marvell_config_intr,
+ .handle_interrupt = marvell_handle_interrupt,
+ .resume = genphy_resume,
+ .suspend = genphy_suspend,
+ .read_page = marvell_read_page,
+ .write_page = marvell_write_page,
+ .get_sset_count = marvell_get_sset_count,
+ .get_strings = marvell_get_strings,
+ .get_stats = marvell_get_stats,
+ .get_tunable = m88e1540_get_tunable,
+ .set_tunable = m88e1540_set_tunable,
+ .cable_test_start = marvell_vct7_cable_test_start,
+ .cable_test_tdr_start = marvell_vct5_cable_test_tdr_start,
+ .cable_test_get_status = marvell_vct7_cable_test_get_status,
+ },
+ {
+ .phy_id = MARVELL_PHY_ID_88E6390_FAMILY,
+ .phy_id_mask = MARVELL_PHY_ID_MASK,
+ .name = "Marvell 88E6390 Family",
/* PHY_GBIT_FEATURES */
.flags = PHY_POLL_CABLE_TEST,
.probe = m88e6390_probe,
@@ -3107,7 +3132,8 @@ static struct mdio_device_id __maybe_unused marvell_tbl[] = {
{ MARVELL_PHY_ID_88E1540, MARVELL_PHY_ID_MASK },
{ MARVELL_PHY_ID_88E1545, MARVELL_PHY_ID_MASK },
{ MARVELL_PHY_ID_88E3016, MARVELL_PHY_ID_MASK },
- { MARVELL_PHY_ID_88E6390, MARVELL_PHY_ID_MASK },
+ { MARVELL_PHY_ID_88E6341_FAMILY, MARVELL_PHY_ID_MASK },
+ { MARVELL_PHY_ID_88E6390_FAMILY, MARVELL_PHY_ID_MASK },
{ MARVELL_PHY_ID_88E1340S, MARVELL_PHY_ID_MASK },
{ MARVELL_PHY_ID_88E1548P, MARVELL_PHY_ID_MASK },
{ }
diff --git a/include/linux/marvell_phy.h b/include/linux/marvell_phy.h
index 52b1610eae68..c544b70dfbd2 100644
--- a/include/linux/marvell_phy.h
+++ b/include/linux/marvell_phy.h
@@ -28,11 +28,12 @@
/* Marvel 88E1111 in Finisar SFP module with modified PHY ID */
#define MARVELL_PHY_ID_88E1111_FINISAR 0x01ff0cc0
-/* The MV88e6390 Ethernet switch contains embedded PHYs. These PHYs do
+/* These Ethernet switch families contain embedded PHYs, but they do
* not have a model ID. So the switch driver traps reads to the ID2
* register and returns the switch family ID
*/
-#define MARVELL_PHY_ID_88E6390 0x01410f90
+#define MARVELL_PHY_ID_88E6341_FAMILY 0x01410f41
+#define MARVELL_PHY_ID_88E6390_FAMILY 0x01410f90
#define MARVELL_PHY_FAMILY_ID(id) ((id) >> 4)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 38ec4944b593fd90c5ef42aaaa53e66ae5769d04 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet(a)google.com>
Date: Tue, 13 Apr 2021 05:41:35 -0700
Subject: [PATCH] gro: ensure frag0 meets IP header alignment
After commit 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head")
Guenter Roeck reported one failure in his tests using sh architecture.
After much debugging, we have been able to spot silent unaligned accesses
in inet_gro_receive()
The issue at hand is that upper networking stacks assume their header
is word-aligned. Low level drivers are supposed to reserve NET_IP_ALIGN
bytes before the Ethernet header to make that happen.
This patch hardens skb_gro_reset_offset() to not allow frag0 fast-path
if the fragment is not properly aligned.
Some arches like x86, arm64 and powerpc do not care and define NET_IP_ALIGN
as 0, this extra check will be a NOP for them.
Note that if frag0 is not used, GRO will call pskb_may_pull()
as many times as needed to pull network and transport headers.
Fixes: 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head")
Fixes: 78a478d0efd9 ("gro: Inline skb_gro_header and cache frag0 virtual address")
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Reported-by: Guenter Roeck <linux(a)roeck-us.net>
Cc: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst(a)redhat.com>
Cc: Jason Wang <jasowang(a)redhat.com>
Acked-by: Michael S. Tsirkin <mst(a)redhat.com>
Tested-by: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/core/dev.c b/net/core/dev.c
index af8c1ea040b9..1f79b9aa9a3f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5924,7 +5924,8 @@ static void skb_gro_reset_offset(struct sk_buff *skb)
NAPI_GRO_CB(skb)->frag0_len = 0;
if (!skb_headlen(skb) && pinfo->nr_frags &&
- !PageHighMem(skb_frag_page(frag0))) {
+ !PageHighMem(skb_frag_page(frag0)) &&
+ (!NET_IP_ALIGN || !(skb_frag_off(frag0) & 3))) {
NAPI_GRO_CB(skb)->frag0 = skb_frag_address(frag0);
NAPI_GRO_CB(skb)->frag0_len = min_t(unsigned int,
skb_frag_size(frag0),
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 38ec4944b593fd90c5ef42aaaa53e66ae5769d04 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet(a)google.com>
Date: Tue, 13 Apr 2021 05:41:35 -0700
Subject: [PATCH] gro: ensure frag0 meets IP header alignment
After commit 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head")
Guenter Roeck reported one failure in his tests using sh architecture.
After much debugging, we have been able to spot silent unaligned accesses
in inet_gro_receive()
The issue at hand is that upper networking stacks assume their header
is word-aligned. Low level drivers are supposed to reserve NET_IP_ALIGN
bytes before the Ethernet header to make that happen.
This patch hardens skb_gro_reset_offset() to not allow frag0 fast-path
if the fragment is not properly aligned.
Some arches like x86, arm64 and powerpc do not care and define NET_IP_ALIGN
as 0, this extra check will be a NOP for them.
Note that if frag0 is not used, GRO will call pskb_may_pull()
as many times as needed to pull network and transport headers.
Fixes: 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head")
Fixes: 78a478d0efd9 ("gro: Inline skb_gro_header and cache frag0 virtual address")
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Reported-by: Guenter Roeck <linux(a)roeck-us.net>
Cc: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst(a)redhat.com>
Cc: Jason Wang <jasowang(a)redhat.com>
Acked-by: Michael S. Tsirkin <mst(a)redhat.com>
Tested-by: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/core/dev.c b/net/core/dev.c
index af8c1ea040b9..1f79b9aa9a3f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5924,7 +5924,8 @@ static void skb_gro_reset_offset(struct sk_buff *skb)
NAPI_GRO_CB(skb)->frag0_len = 0;
if (!skb_headlen(skb) && pinfo->nr_frags &&
- !PageHighMem(skb_frag_page(frag0))) {
+ !PageHighMem(skb_frag_page(frag0)) &&
+ (!NET_IP_ALIGN || !(skb_frag_off(frag0) & 3))) {
NAPI_GRO_CB(skb)->frag0 = skb_frag_address(frag0);
NAPI_GRO_CB(skb)->frag0_len = min_t(unsigned int,
skb_frag_size(frag0),
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 38ec4944b593fd90c5ef42aaaa53e66ae5769d04 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet(a)google.com>
Date: Tue, 13 Apr 2021 05:41:35 -0700
Subject: [PATCH] gro: ensure frag0 meets IP header alignment
After commit 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head")
Guenter Roeck reported one failure in his tests using sh architecture.
After much debugging, we have been able to spot silent unaligned accesses
in inet_gro_receive()
The issue at hand is that upper networking stacks assume their header
is word-aligned. Low level drivers are supposed to reserve NET_IP_ALIGN
bytes before the Ethernet header to make that happen.
This patch hardens skb_gro_reset_offset() to not allow frag0 fast-path
if the fragment is not properly aligned.
Some arches like x86, arm64 and powerpc do not care and define NET_IP_ALIGN
as 0, this extra check will be a NOP for them.
Note that if frag0 is not used, GRO will call pskb_may_pull()
as many times as needed to pull network and transport headers.
Fixes: 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head")
Fixes: 78a478d0efd9 ("gro: Inline skb_gro_header and cache frag0 virtual address")
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Reported-by: Guenter Roeck <linux(a)roeck-us.net>
Cc: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst(a)redhat.com>
Cc: Jason Wang <jasowang(a)redhat.com>
Acked-by: Michael S. Tsirkin <mst(a)redhat.com>
Tested-by: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/core/dev.c b/net/core/dev.c
index af8c1ea040b9..1f79b9aa9a3f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5924,7 +5924,8 @@ static void skb_gro_reset_offset(struct sk_buff *skb)
NAPI_GRO_CB(skb)->frag0_len = 0;
if (!skb_headlen(skb) && pinfo->nr_frags &&
- !PageHighMem(skb_frag_page(frag0))) {
+ !PageHighMem(skb_frag_page(frag0)) &&
+ (!NET_IP_ALIGN || !(skb_frag_off(frag0) & 3))) {
NAPI_GRO_CB(skb)->frag0 = skb_frag_address(frag0);
NAPI_GRO_CB(skb)->frag0_len = min_t(unsigned int,
skb_frag_size(frag0),
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 38ec4944b593fd90c5ef42aaaa53e66ae5769d04 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet(a)google.com>
Date: Tue, 13 Apr 2021 05:41:35 -0700
Subject: [PATCH] gro: ensure frag0 meets IP header alignment
After commit 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head")
Guenter Roeck reported one failure in his tests using sh architecture.
After much debugging, we have been able to spot silent unaligned accesses
in inet_gro_receive()
The issue at hand is that upper networking stacks assume their header
is word-aligned. Low level drivers are supposed to reserve NET_IP_ALIGN
bytes before the Ethernet header to make that happen.
This patch hardens skb_gro_reset_offset() to not allow frag0 fast-path
if the fragment is not properly aligned.
Some arches like x86, arm64 and powerpc do not care and define NET_IP_ALIGN
as 0, this extra check will be a NOP for them.
Note that if frag0 is not used, GRO will call pskb_may_pull()
as many times as needed to pull network and transport headers.
Fixes: 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head")
Fixes: 78a478d0efd9 ("gro: Inline skb_gro_header and cache frag0 virtual address")
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Reported-by: Guenter Roeck <linux(a)roeck-us.net>
Cc: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst(a)redhat.com>
Cc: Jason Wang <jasowang(a)redhat.com>
Acked-by: Michael S. Tsirkin <mst(a)redhat.com>
Tested-by: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/core/dev.c b/net/core/dev.c
index af8c1ea040b9..1f79b9aa9a3f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5924,7 +5924,8 @@ static void skb_gro_reset_offset(struct sk_buff *skb)
NAPI_GRO_CB(skb)->frag0_len = 0;
if (!skb_headlen(skb) && pinfo->nr_frags &&
- !PageHighMem(skb_frag_page(frag0))) {
+ !PageHighMem(skb_frag_page(frag0)) &&
+ (!NET_IP_ALIGN || !(skb_frag_off(frag0) & 3))) {
NAPI_GRO_CB(skb)->frag0 = skb_frag_address(frag0);
NAPI_GRO_CB(skb)->frag0_len = min_t(unsigned int,
skb_frag_size(frag0),
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 38ec4944b593fd90c5ef42aaaa53e66ae5769d04 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet(a)google.com>
Date: Tue, 13 Apr 2021 05:41:35 -0700
Subject: [PATCH] gro: ensure frag0 meets IP header alignment
After commit 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head")
Guenter Roeck reported one failure in his tests using sh architecture.
After much debugging, we have been able to spot silent unaligned accesses
in inet_gro_receive()
The issue at hand is that upper networking stacks assume their header
is word-aligned. Low level drivers are supposed to reserve NET_IP_ALIGN
bytes before the Ethernet header to make that happen.
This patch hardens skb_gro_reset_offset() to not allow frag0 fast-path
if the fragment is not properly aligned.
Some arches like x86, arm64 and powerpc do not care and define NET_IP_ALIGN
as 0, this extra check will be a NOP for them.
Note that if frag0 is not used, GRO will call pskb_may_pull()
as many times as needed to pull network and transport headers.
Fixes: 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head")
Fixes: 78a478d0efd9 ("gro: Inline skb_gro_header and cache frag0 virtual address")
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Reported-by: Guenter Roeck <linux(a)roeck-us.net>
Cc: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst(a)redhat.com>
Cc: Jason Wang <jasowang(a)redhat.com>
Acked-by: Michael S. Tsirkin <mst(a)redhat.com>
Tested-by: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/core/dev.c b/net/core/dev.c
index af8c1ea040b9..1f79b9aa9a3f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5924,7 +5924,8 @@ static void skb_gro_reset_offset(struct sk_buff *skb)
NAPI_GRO_CB(skb)->frag0_len = 0;
if (!skb_headlen(skb) && pinfo->nr_frags &&
- !PageHighMem(skb_frag_page(frag0))) {
+ !PageHighMem(skb_frag_page(frag0)) &&
+ (!NET_IP_ALIGN || !(skb_frag_off(frag0) & 3))) {
NAPI_GRO_CB(skb)->frag0 = skb_frag_address(frag0);
NAPI_GRO_CB(skb)->frag0_len = min_t(unsigned int,
skb_frag_size(frag0),
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 1fe976d308acb6374c899a4ee8025a0a016e453e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pali=20Roh=C3=A1r?= <pali(a)kernel.org>
Date: Mon, 12 Apr 2021 18:57:39 +0200
Subject: [PATCH] net: phy: marvell: fix detection of PHY on Topaz switches
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Since commit fee2d546414d ("net: phy: marvell: mv88e6390 temperature
sensor reading"), Linux reports the temperature of Topaz hwmon as
constant -75°C.
This is because switches from the Topaz family (88E6141 / 88E6341) have
the address of the temperature sensor register different from Peridot.
This address is instead compatible with 88E1510 PHYs, as was used for
Topaz before the above mentioned commit.
Create a new mapping table between switch family and PHY ID for families
which don't have a model number. And define PHY IDs for Topaz and Peridot
families.
Create a new PHY ID and a new PHY driver for Topaz's internal PHY.
The only difference from Peridot's PHY driver is the HWMON probing
method.
Prior this change Topaz's internal PHY is detected by kernel as:
PHY [...] driver [Marvell 88E6390] (irq=63)
And afterwards as:
PHY [...] driver [Marvell 88E6341 Family] (irq=63)
Signed-off-by: Pali Rohár <pali(a)kernel.org>
BugLink: https://github.com/globalscaletechnologies/linux/issues/1
Fixes: fee2d546414d ("net: phy: marvell: mv88e6390 temperature sensor reading")
Reviewed-by: Marek Behún <kabel(a)kernel.org>
Reviewed-by: Andrew Lunn <andrew(a)lunn.ch>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 903d619e08ed..e08bf9377140 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3026,10 +3026,17 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
return err;
}
+/* prod_id for switch families which do not have a PHY model number */
+static const u16 family_prod_id_table[] = {
+ [MV88E6XXX_FAMILY_6341] = MV88E6XXX_PORT_SWITCH_ID_PROD_6341,
+ [MV88E6XXX_FAMILY_6390] = MV88E6XXX_PORT_SWITCH_ID_PROD_6390,
+};
+
static int mv88e6xxx_mdio_read(struct mii_bus *bus, int phy, int reg)
{
struct mv88e6xxx_mdio_bus *mdio_bus = bus->priv;
struct mv88e6xxx_chip *chip = mdio_bus->chip;
+ u16 prod_id;
u16 val;
int err;
@@ -3040,23 +3047,12 @@ static int mv88e6xxx_mdio_read(struct mii_bus *bus, int phy, int reg)
err = chip->info->ops->phy_read(chip, bus, phy, reg, &val);
mv88e6xxx_reg_unlock(chip);
- if (reg == MII_PHYSID2) {
- /* Some internal PHYs don't have a model number. */
- if (chip->info->family != MV88E6XXX_FAMILY_6165)
- /* Then there is the 6165 family. It gets is
- * PHYs correct. But it can also have two
- * SERDES interfaces in the PHY address
- * space. And these don't have a model
- * number. But they are not PHYs, so we don't
- * want to give them something a PHY driver
- * will recognise.
- *
- * Use the mv88e6390 family model number
- * instead, for anything which really could be
- * a PHY,
- */
- if (!(val & 0x3f0))
- val |= MV88E6XXX_PORT_SWITCH_ID_PROD_6390 >> 4;
+ /* Some internal PHYs don't have a model number. */
+ if (reg == MII_PHYSID2 && !(val & 0x3f0) &&
+ chip->info->family < ARRAY_SIZE(family_prod_id_table)) {
+ prod_id = family_prod_id_table[chip->info->family];
+ if (prod_id)
+ val |= prod_id >> 4;
}
return err ? err : val;
diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index e26a5d663f8a..8018ddf7f316 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -3021,9 +3021,34 @@ static struct phy_driver marvell_drivers[] = {
.get_stats = marvell_get_stats,
},
{
- .phy_id = MARVELL_PHY_ID_88E6390,
+ .phy_id = MARVELL_PHY_ID_88E6341_FAMILY,
.phy_id_mask = MARVELL_PHY_ID_MASK,
- .name = "Marvell 88E6390",
+ .name = "Marvell 88E6341 Family",
+ /* PHY_GBIT_FEATURES */
+ .flags = PHY_POLL_CABLE_TEST,
+ .probe = m88e1510_probe,
+ .config_init = marvell_config_init,
+ .config_aneg = m88e6390_config_aneg,
+ .read_status = marvell_read_status,
+ .config_intr = marvell_config_intr,
+ .handle_interrupt = marvell_handle_interrupt,
+ .resume = genphy_resume,
+ .suspend = genphy_suspend,
+ .read_page = marvell_read_page,
+ .write_page = marvell_write_page,
+ .get_sset_count = marvell_get_sset_count,
+ .get_strings = marvell_get_strings,
+ .get_stats = marvell_get_stats,
+ .get_tunable = m88e1540_get_tunable,
+ .set_tunable = m88e1540_set_tunable,
+ .cable_test_start = marvell_vct7_cable_test_start,
+ .cable_test_tdr_start = marvell_vct5_cable_test_tdr_start,
+ .cable_test_get_status = marvell_vct7_cable_test_get_status,
+ },
+ {
+ .phy_id = MARVELL_PHY_ID_88E6390_FAMILY,
+ .phy_id_mask = MARVELL_PHY_ID_MASK,
+ .name = "Marvell 88E6390 Family",
/* PHY_GBIT_FEATURES */
.flags = PHY_POLL_CABLE_TEST,
.probe = m88e6390_probe,
@@ -3107,7 +3132,8 @@ static struct mdio_device_id __maybe_unused marvell_tbl[] = {
{ MARVELL_PHY_ID_88E1540, MARVELL_PHY_ID_MASK },
{ MARVELL_PHY_ID_88E1545, MARVELL_PHY_ID_MASK },
{ MARVELL_PHY_ID_88E3016, MARVELL_PHY_ID_MASK },
- { MARVELL_PHY_ID_88E6390, MARVELL_PHY_ID_MASK },
+ { MARVELL_PHY_ID_88E6341_FAMILY, MARVELL_PHY_ID_MASK },
+ { MARVELL_PHY_ID_88E6390_FAMILY, MARVELL_PHY_ID_MASK },
{ MARVELL_PHY_ID_88E1340S, MARVELL_PHY_ID_MASK },
{ MARVELL_PHY_ID_88E1548P, MARVELL_PHY_ID_MASK },
{ }
diff --git a/include/linux/marvell_phy.h b/include/linux/marvell_phy.h
index 52b1610eae68..c544b70dfbd2 100644
--- a/include/linux/marvell_phy.h
+++ b/include/linux/marvell_phy.h
@@ -28,11 +28,12 @@
/* Marvel 88E1111 in Finisar SFP module with modified PHY ID */
#define MARVELL_PHY_ID_88E1111_FINISAR 0x01ff0cc0
-/* The MV88e6390 Ethernet switch contains embedded PHYs. These PHYs do
+/* These Ethernet switch families contain embedded PHYs, but they do
* not have a model ID. So the switch driver traps reads to the ID2
* register and returns the switch family ID
*/
-#define MARVELL_PHY_ID_88E6390 0x01410f90
+#define MARVELL_PHY_ID_88E6341_FAMILY 0x01410f41
+#define MARVELL_PHY_ID_88E6390_FAMILY 0x01410f90
#define MARVELL_PHY_FAMILY_ID(id) ((id) >> 4)
The patch below does not apply to the 5.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 9601148392520e2e134936e76788fc2a6371e7be Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <daniel(a)iogearbox.net>
Date: Tue, 23 Mar 2021 08:32:59 +0100
Subject: [PATCH] bpf: Use correct permission flag for mixed signed bounds
arithmetic
We forbid adding unknown scalars with mixed signed bounds due to the
spectre v1 masking mitigation. Hence this also needs bypass_spec_v1
flag instead of allow_ptr_leaks.
Fixes: 2c78ee898d8f ("bpf: Implement CAP_BPF")
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend(a)gmail.com>
Acked-by: Alexei Starovoitov <ast(a)kernel.org>
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3a738724a380..2ede4b850230 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6085,7 +6085,7 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
dst, reg_type_str[ptr_reg->type]);
return -EACCES;
case PTR_TO_MAP_VALUE:
- if (!env->allow_ptr_leaks && !known && (smin_val < 0) != (smax_val < 0)) {
+ if (!env->env->bypass_spec_v1 && !known && (smin_val < 0) != (smax_val < 0)) {
verbose(env, "R%d has unknown scalar with mixed signed bounds, pointer arithmetic with it prohibited for !root\n",
off_reg == dst_reg ? dst : src);
return -EACCES;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 9601148392520e2e134936e76788fc2a6371e7be Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <daniel(a)iogearbox.net>
Date: Tue, 23 Mar 2021 08:32:59 +0100
Subject: [PATCH] bpf: Use correct permission flag for mixed signed bounds
arithmetic
We forbid adding unknown scalars with mixed signed bounds due to the
spectre v1 masking mitigation. Hence this also needs bypass_spec_v1
flag instead of allow_ptr_leaks.
Fixes: 2c78ee898d8f ("bpf: Implement CAP_BPF")
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend(a)gmail.com>
Acked-by: Alexei Starovoitov <ast(a)kernel.org>
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3a738724a380..2ede4b850230 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6085,7 +6085,7 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
dst, reg_type_str[ptr_reg->type]);
return -EACCES;
case PTR_TO_MAP_VALUE:
- if (!env->allow_ptr_leaks && !known && (smin_val < 0) != (smax_val < 0)) {
+ if (!env->env->bypass_spec_v1 && !known && (smin_val < 0) != (smax_val < 0)) {
verbose(env, "R%d has unknown scalar with mixed signed bounds, pointer arithmetic with it prohibited for !root\n",
off_reg == dst_reg ? dst : src);
return -EACCES;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 941ea91e87a6e879ed82dad4949f6234f2702bec Mon Sep 17 00:00:00 2001
From: Hristo Venev <hristo(a)venev.name>
Date: Mon, 12 Apr 2021 20:41:17 +0300
Subject: [PATCH] net: ip6_tunnel: Unregister catch-all devices
Similarly to the sit case, we need to remove the tunnels with no
addresses that have been moved to another network namespace.
Fixes: 0bd8762824e73 ("ip6tnl: add x-netns support")
Signed-off-by: Hristo Venev <hristo(a)venev.name>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 3fa0eca5a06f..42fe7db6bbb3 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -2244,6 +2244,16 @@ static void __net_exit ip6_tnl_destroy_tunnels(struct net *net, struct list_head
t = rtnl_dereference(t->next);
}
}
+
+ t = rtnl_dereference(ip6n->tnls_wc[0]);
+ while (t) {
+ /* If dev is in the same netns, it has already
+ * been added to the list by the previous loop.
+ */
+ if (!net_eq(dev_net(t->dev), net))
+ unregister_netdevice_queue(t->dev, list);
+ t = rtnl_dereference(t->next);
+ }
}
static int __net_init ip6_tnl_init_net(struct net *net)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 941ea91e87a6e879ed82dad4949f6234f2702bec Mon Sep 17 00:00:00 2001
From: Hristo Venev <hristo(a)venev.name>
Date: Mon, 12 Apr 2021 20:41:17 +0300
Subject: [PATCH] net: ip6_tunnel: Unregister catch-all devices
Similarly to the sit case, we need to remove the tunnels with no
addresses that have been moved to another network namespace.
Fixes: 0bd8762824e73 ("ip6tnl: add x-netns support")
Signed-off-by: Hristo Venev <hristo(a)venev.name>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 3fa0eca5a06f..42fe7db6bbb3 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -2244,6 +2244,16 @@ static void __net_exit ip6_tnl_destroy_tunnels(struct net *net, struct list_head
t = rtnl_dereference(t->next);
}
}
+
+ t = rtnl_dereference(ip6n->tnls_wc[0]);
+ while (t) {
+ /* If dev is in the same netns, it has already
+ * been added to the list by the previous loop.
+ */
+ if (!net_eq(dev_net(t->dev), net))
+ unregister_netdevice_queue(t->dev, list);
+ t = rtnl_dereference(t->next);
+ }
}
static int __net_init ip6_tnl_init_net(struct net *net)
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 941ea91e87a6e879ed82dad4949f6234f2702bec Mon Sep 17 00:00:00 2001
From: Hristo Venev <hristo(a)venev.name>
Date: Mon, 12 Apr 2021 20:41:17 +0300
Subject: [PATCH] net: ip6_tunnel: Unregister catch-all devices
Similarly to the sit case, we need to remove the tunnels with no
addresses that have been moved to another network namespace.
Fixes: 0bd8762824e73 ("ip6tnl: add x-netns support")
Signed-off-by: Hristo Venev <hristo(a)venev.name>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 3fa0eca5a06f..42fe7db6bbb3 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -2244,6 +2244,16 @@ static void __net_exit ip6_tnl_destroy_tunnels(struct net *net, struct list_head
t = rtnl_dereference(t->next);
}
}
+
+ t = rtnl_dereference(ip6n->tnls_wc[0]);
+ while (t) {
+ /* If dev is in the same netns, it has already
+ * been added to the list by the previous loop.
+ */
+ if (!net_eq(dev_net(t->dev), net))
+ unregister_netdevice_queue(t->dev, list);
+ t = rtnl_dereference(t->next);
+ }
}
static int __net_init ip6_tnl_init_net(struct net *net)
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 610f8c0fc8d46e0933955ce13af3d64484a4630a Mon Sep 17 00:00:00 2001
From: Hristo Venev <hristo(a)venev.name>
Date: Mon, 12 Apr 2021 20:41:16 +0300
Subject: [PATCH] net: sit: Unregister catch-all devices
A sit interface created without a local or a remote address is linked
into the `sit_net::tunnels_wc` list of its original namespace. When
deleting a network namespace, delete the devices that have been moved.
The following script triggers a null pointer dereference if devices
linked in a deleted `sit_net` remain:
for i in `seq 1 30`; do
ip netns add ns-test
ip netns exec ns-test ip link add dev veth0 type veth peer veth1
ip netns exec ns-test ip link add dev sit$i type sit dev veth0
ip netns exec ns-test ip link set dev sit$i netns $$
ip netns del ns-test
done
for i in `seq 1 30`; do
ip link del dev sit$i
done
Fixes: 5e6700b3bf98f ("sit: add support of x-netns")
Signed-off-by: Hristo Venev <hristo(a)venev.name>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 63ccd9f2dccc..9fdccf0718b5 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1867,9 +1867,9 @@ static void __net_exit sit_destroy_tunnels(struct net *net,
if (dev->rtnl_link_ops == &sit_link_ops)
unregister_netdevice_queue(dev, head);
- for (prio = 1; prio < 4; prio++) {
+ for (prio = 0; prio < 4; prio++) {
int h;
- for (h = 0; h < IP6_SIT_HASH_SIZE; h++) {
+ for (h = 0; h < (prio ? IP6_SIT_HASH_SIZE : 1); h++) {
struct ip_tunnel *t;
t = rtnl_dereference(sitn->tunnels[prio][h]);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d163a925ebbc6eb5b562b0f1d72c7e817aa75c40 Mon Sep 17 00:00:00 2001
From: Florian Westphal <fw(a)strlen.de>
Date: Wed, 7 Apr 2021 21:43:40 +0200
Subject: [PATCH] netfilter: arp_tables: add pre_exit hook for table unregister
Same problem that also existed in iptables/ip(6)tables, when
arptable_filter is removed there is no longer a wait period before the
table/ruleset is free'd.
Unregister the hook in pre_exit, then remove the table in the exit
function.
This used to work correctly because the old nf_hook_unregister API
did unconditional synchronize_net.
The per-net hook unregister function uses call_rcu instead.
Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default")
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
diff --git a/include/linux/netfilter_arp/arp_tables.h b/include/linux/netfilter_arp/arp_tables.h
index 7d3537c40ec9..26a13294318c 100644
--- a/include/linux/netfilter_arp/arp_tables.h
+++ b/include/linux/netfilter_arp/arp_tables.h
@@ -52,8 +52,9 @@ extern void *arpt_alloc_initial_table(const struct xt_table *);
int arpt_register_table(struct net *net, const struct xt_table *table,
const struct arpt_replace *repl,
const struct nf_hook_ops *ops, struct xt_table **res);
-void arpt_unregister_table(struct net *net, struct xt_table *table,
- const struct nf_hook_ops *ops);
+void arpt_unregister_table(struct net *net, struct xt_table *table);
+void arpt_unregister_table_pre_exit(struct net *net, struct xt_table *table,
+ const struct nf_hook_ops *ops);
extern unsigned int arpt_do_table(struct sk_buff *skb,
const struct nf_hook_state *state,
struct xt_table *table);
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index d1e04d2b5170..6c26533480dd 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -1539,10 +1539,15 @@ int arpt_register_table(struct net *net,
return ret;
}
-void arpt_unregister_table(struct net *net, struct xt_table *table,
- const struct nf_hook_ops *ops)
+void arpt_unregister_table_pre_exit(struct net *net, struct xt_table *table,
+ const struct nf_hook_ops *ops)
{
nf_unregister_net_hooks(net, ops, hweight32(table->valid_hooks));
+}
+EXPORT_SYMBOL(arpt_unregister_table_pre_exit);
+
+void arpt_unregister_table(struct net *net, struct xt_table *table)
+{
__arpt_unregister_table(net, table);
}
diff --git a/net/ipv4/netfilter/arptable_filter.c b/net/ipv4/netfilter/arptable_filter.c
index c216b9ad3bb2..6c300ba5634e 100644
--- a/net/ipv4/netfilter/arptable_filter.c
+++ b/net/ipv4/netfilter/arptable_filter.c
@@ -56,16 +56,24 @@ static int __net_init arptable_filter_table_init(struct net *net)
return err;
}
+static void __net_exit arptable_filter_net_pre_exit(struct net *net)
+{
+ if (net->ipv4.arptable_filter)
+ arpt_unregister_table_pre_exit(net, net->ipv4.arptable_filter,
+ arpfilter_ops);
+}
+
static void __net_exit arptable_filter_net_exit(struct net *net)
{
if (!net->ipv4.arptable_filter)
return;
- arpt_unregister_table(net, net->ipv4.arptable_filter, arpfilter_ops);
+ arpt_unregister_table(net, net->ipv4.arptable_filter);
net->ipv4.arptable_filter = NULL;
}
static struct pernet_operations arptable_filter_net_ops = {
.exit = arptable_filter_net_exit,
+ .pre_exit = arptable_filter_net_pre_exit,
};
static int __init arptable_filter_init(void)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d163a925ebbc6eb5b562b0f1d72c7e817aa75c40 Mon Sep 17 00:00:00 2001
From: Florian Westphal <fw(a)strlen.de>
Date: Wed, 7 Apr 2021 21:43:40 +0200
Subject: [PATCH] netfilter: arp_tables: add pre_exit hook for table unregister
Same problem that also existed in iptables/ip(6)tables, when
arptable_filter is removed there is no longer a wait period before the
table/ruleset is free'd.
Unregister the hook in pre_exit, then remove the table in the exit
function.
This used to work correctly because the old nf_hook_unregister API
did unconditional synchronize_net.
The per-net hook unregister function uses call_rcu instead.
Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default")
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
diff --git a/include/linux/netfilter_arp/arp_tables.h b/include/linux/netfilter_arp/arp_tables.h
index 7d3537c40ec9..26a13294318c 100644
--- a/include/linux/netfilter_arp/arp_tables.h
+++ b/include/linux/netfilter_arp/arp_tables.h
@@ -52,8 +52,9 @@ extern void *arpt_alloc_initial_table(const struct xt_table *);
int arpt_register_table(struct net *net, const struct xt_table *table,
const struct arpt_replace *repl,
const struct nf_hook_ops *ops, struct xt_table **res);
-void arpt_unregister_table(struct net *net, struct xt_table *table,
- const struct nf_hook_ops *ops);
+void arpt_unregister_table(struct net *net, struct xt_table *table);
+void arpt_unregister_table_pre_exit(struct net *net, struct xt_table *table,
+ const struct nf_hook_ops *ops);
extern unsigned int arpt_do_table(struct sk_buff *skb,
const struct nf_hook_state *state,
struct xt_table *table);
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index d1e04d2b5170..6c26533480dd 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -1539,10 +1539,15 @@ int arpt_register_table(struct net *net,
return ret;
}
-void arpt_unregister_table(struct net *net, struct xt_table *table,
- const struct nf_hook_ops *ops)
+void arpt_unregister_table_pre_exit(struct net *net, struct xt_table *table,
+ const struct nf_hook_ops *ops)
{
nf_unregister_net_hooks(net, ops, hweight32(table->valid_hooks));
+}
+EXPORT_SYMBOL(arpt_unregister_table_pre_exit);
+
+void arpt_unregister_table(struct net *net, struct xt_table *table)
+{
__arpt_unregister_table(net, table);
}
diff --git a/net/ipv4/netfilter/arptable_filter.c b/net/ipv4/netfilter/arptable_filter.c
index c216b9ad3bb2..6c300ba5634e 100644
--- a/net/ipv4/netfilter/arptable_filter.c
+++ b/net/ipv4/netfilter/arptable_filter.c
@@ -56,16 +56,24 @@ static int __net_init arptable_filter_table_init(struct net *net)
return err;
}
+static void __net_exit arptable_filter_net_pre_exit(struct net *net)
+{
+ if (net->ipv4.arptable_filter)
+ arpt_unregister_table_pre_exit(net, net->ipv4.arptable_filter,
+ arpfilter_ops);
+}
+
static void __net_exit arptable_filter_net_exit(struct net *net)
{
if (!net->ipv4.arptable_filter)
return;
- arpt_unregister_table(net, net->ipv4.arptable_filter, arpfilter_ops);
+ arpt_unregister_table(net, net->ipv4.arptable_filter);
net->ipv4.arptable_filter = NULL;
}
static struct pernet_operations arptable_filter_net_ops = {
.exit = arptable_filter_net_exit,
+ .pre_exit = arptable_filter_net_pre_exit,
};
static int __init arptable_filter_init(void)
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d163a925ebbc6eb5b562b0f1d72c7e817aa75c40 Mon Sep 17 00:00:00 2001
From: Florian Westphal <fw(a)strlen.de>
Date: Wed, 7 Apr 2021 21:43:40 +0200
Subject: [PATCH] netfilter: arp_tables: add pre_exit hook for table unregister
Same problem that also existed in iptables/ip(6)tables, when
arptable_filter is removed there is no longer a wait period before the
table/ruleset is free'd.
Unregister the hook in pre_exit, then remove the table in the exit
function.
This used to work correctly because the old nf_hook_unregister API
did unconditional synchronize_net.
The per-net hook unregister function uses call_rcu instead.
Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default")
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
diff --git a/include/linux/netfilter_arp/arp_tables.h b/include/linux/netfilter_arp/arp_tables.h
index 7d3537c40ec9..26a13294318c 100644
--- a/include/linux/netfilter_arp/arp_tables.h
+++ b/include/linux/netfilter_arp/arp_tables.h
@@ -52,8 +52,9 @@ extern void *arpt_alloc_initial_table(const struct xt_table *);
int arpt_register_table(struct net *net, const struct xt_table *table,
const struct arpt_replace *repl,
const struct nf_hook_ops *ops, struct xt_table **res);
-void arpt_unregister_table(struct net *net, struct xt_table *table,
- const struct nf_hook_ops *ops);
+void arpt_unregister_table(struct net *net, struct xt_table *table);
+void arpt_unregister_table_pre_exit(struct net *net, struct xt_table *table,
+ const struct nf_hook_ops *ops);
extern unsigned int arpt_do_table(struct sk_buff *skb,
const struct nf_hook_state *state,
struct xt_table *table);
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index d1e04d2b5170..6c26533480dd 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -1539,10 +1539,15 @@ int arpt_register_table(struct net *net,
return ret;
}
-void arpt_unregister_table(struct net *net, struct xt_table *table,
- const struct nf_hook_ops *ops)
+void arpt_unregister_table_pre_exit(struct net *net, struct xt_table *table,
+ const struct nf_hook_ops *ops)
{
nf_unregister_net_hooks(net, ops, hweight32(table->valid_hooks));
+}
+EXPORT_SYMBOL(arpt_unregister_table_pre_exit);
+
+void arpt_unregister_table(struct net *net, struct xt_table *table)
+{
__arpt_unregister_table(net, table);
}
diff --git a/net/ipv4/netfilter/arptable_filter.c b/net/ipv4/netfilter/arptable_filter.c
index c216b9ad3bb2..6c300ba5634e 100644
--- a/net/ipv4/netfilter/arptable_filter.c
+++ b/net/ipv4/netfilter/arptable_filter.c
@@ -56,16 +56,24 @@ static int __net_init arptable_filter_table_init(struct net *net)
return err;
}
+static void __net_exit arptable_filter_net_pre_exit(struct net *net)
+{
+ if (net->ipv4.arptable_filter)
+ arpt_unregister_table_pre_exit(net, net->ipv4.arptable_filter,
+ arpfilter_ops);
+}
+
static void __net_exit arptable_filter_net_exit(struct net *net)
{
if (!net->ipv4.arptable_filter)
return;
- arpt_unregister_table(net, net->ipv4.arptable_filter, arpfilter_ops);
+ arpt_unregister_table(net, net->ipv4.arptable_filter);
net->ipv4.arptable_filter = NULL;
}
static struct pernet_operations arptable_filter_net_ops = {
.exit = arptable_filter_net_exit,
+ .pre_exit = arptable_filter_net_pre_exit,
};
static int __init arptable_filter_init(void)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7ee3c61dcd28bf6e290e06ad382f13511dc790e9 Mon Sep 17 00:00:00 2001
From: Florian Westphal <fw(a)strlen.de>
Date: Wed, 7 Apr 2021 21:43:39 +0200
Subject: [PATCH] netfilter: bridge: add pre_exit hooks for ebtable
unregistration
Just like ip/ip6/arptables, the hooks have to be removed, then
synchronize_rcu() has to be called to make sure no more packets are being
processed before the ruleset data is released.
Place the hook unregistration in the pre_exit hook, then call the new
ebtables pre_exit function from there.
Years ago, when first netns support got added for netfilter+ebtables,
this used an older (now removed) netfilter hook unregister API, that did
a unconditional synchronize_rcu().
Now that all is done with call_rcu, ebtable_{filter,nat,broute} pernet exit
handlers may free the ebtable ruleset while packets are still in flight.
This can only happens on module removal, not during netns exit.
The new function expects the table name, not the table struct.
This is because upcoming patch set (targeting -next) will remove all
net->xt.{nat,filter,broute}_table instances, this makes it necessary
to avoid external references to those member variables.
The existing APIs will be converted, so follow the upcoming scheme of
passing name + hook type instead.
Fixes: aee12a0a3727e ("ebtables: remove nf_hook_register usage")
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
diff --git a/include/linux/netfilter_bridge/ebtables.h b/include/linux/netfilter_bridge/ebtables.h
index 2f5c4e6ecd8a..3a956145a25c 100644
--- a/include/linux/netfilter_bridge/ebtables.h
+++ b/include/linux/netfilter_bridge/ebtables.h
@@ -110,8 +110,9 @@ extern int ebt_register_table(struct net *net,
const struct ebt_table *table,
const struct nf_hook_ops *ops,
struct ebt_table **res);
-extern void ebt_unregister_table(struct net *net, struct ebt_table *table,
- const struct nf_hook_ops *);
+extern void ebt_unregister_table(struct net *net, struct ebt_table *table);
+void ebt_unregister_table_pre_exit(struct net *net, const char *tablename,
+ const struct nf_hook_ops *ops);
extern unsigned int ebt_do_table(struct sk_buff *skb,
const struct nf_hook_state *state,
struct ebt_table *table);
diff --git a/net/bridge/netfilter/ebtable_broute.c b/net/bridge/netfilter/ebtable_broute.c
index 66e7af165494..32bc2821027f 100644
--- a/net/bridge/netfilter/ebtable_broute.c
+++ b/net/bridge/netfilter/ebtable_broute.c
@@ -105,14 +105,20 @@ static int __net_init broute_net_init(struct net *net)
&net->xt.broute_table);
}
+static void __net_exit broute_net_pre_exit(struct net *net)
+{
+ ebt_unregister_table_pre_exit(net, "broute", &ebt_ops_broute);
+}
+
static void __net_exit broute_net_exit(struct net *net)
{
- ebt_unregister_table(net, net->xt.broute_table, &ebt_ops_broute);
+ ebt_unregister_table(net, net->xt.broute_table);
}
static struct pernet_operations broute_net_ops = {
.init = broute_net_init,
.exit = broute_net_exit,
+ .pre_exit = broute_net_pre_exit,
};
static int __init ebtable_broute_init(void)
diff --git a/net/bridge/netfilter/ebtable_filter.c b/net/bridge/netfilter/ebtable_filter.c
index 78cb9b21022d..bcf982e12f16 100644
--- a/net/bridge/netfilter/ebtable_filter.c
+++ b/net/bridge/netfilter/ebtable_filter.c
@@ -99,14 +99,20 @@ static int __net_init frame_filter_net_init(struct net *net)
&net->xt.frame_filter);
}
+static void __net_exit frame_filter_net_pre_exit(struct net *net)
+{
+ ebt_unregister_table_pre_exit(net, "filter", ebt_ops_filter);
+}
+
static void __net_exit frame_filter_net_exit(struct net *net)
{
- ebt_unregister_table(net, net->xt.frame_filter, ebt_ops_filter);
+ ebt_unregister_table(net, net->xt.frame_filter);
}
static struct pernet_operations frame_filter_net_ops = {
.init = frame_filter_net_init,
.exit = frame_filter_net_exit,
+ .pre_exit = frame_filter_net_pre_exit,
};
static int __init ebtable_filter_init(void)
diff --git a/net/bridge/netfilter/ebtable_nat.c b/net/bridge/netfilter/ebtable_nat.c
index 0888936ef853..0d092773f816 100644
--- a/net/bridge/netfilter/ebtable_nat.c
+++ b/net/bridge/netfilter/ebtable_nat.c
@@ -99,14 +99,20 @@ static int __net_init frame_nat_net_init(struct net *net)
&net->xt.frame_nat);
}
+static void __net_exit frame_nat_net_pre_exit(struct net *net)
+{
+ ebt_unregister_table_pre_exit(net, "nat", ebt_ops_nat);
+}
+
static void __net_exit frame_nat_net_exit(struct net *net)
{
- ebt_unregister_table(net, net->xt.frame_nat, ebt_ops_nat);
+ ebt_unregister_table(net, net->xt.frame_nat);
}
static struct pernet_operations frame_nat_net_ops = {
.init = frame_nat_net_init,
.exit = frame_nat_net_exit,
+ .pre_exit = frame_nat_net_pre_exit,
};
static int __init ebtable_nat_init(void)
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index ebe33b60efd6..d481ff24a150 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1232,10 +1232,34 @@ int ebt_register_table(struct net *net, const struct ebt_table *input_table,
return ret;
}
-void ebt_unregister_table(struct net *net, struct ebt_table *table,
- const struct nf_hook_ops *ops)
+static struct ebt_table *__ebt_find_table(struct net *net, const char *name)
+{
+ struct ebt_table *t;
+
+ mutex_lock(&ebt_mutex);
+
+ list_for_each_entry(t, &net->xt.tables[NFPROTO_BRIDGE], list) {
+ if (strcmp(t->name, name) == 0) {
+ mutex_unlock(&ebt_mutex);
+ return t;
+ }
+ }
+
+ mutex_unlock(&ebt_mutex);
+ return NULL;
+}
+
+void ebt_unregister_table_pre_exit(struct net *net, const char *name, const struct nf_hook_ops *ops)
+{
+ struct ebt_table *table = __ebt_find_table(net, name);
+
+ if (table)
+ nf_unregister_net_hooks(net, ops, hweight32(table->valid_hooks));
+}
+EXPORT_SYMBOL(ebt_unregister_table_pre_exit);
+
+void ebt_unregister_table(struct net *net, struct ebt_table *table)
{
- nf_unregister_net_hooks(net, ops, hweight32(table->valid_hooks));
__ebt_unregister_table(net, table);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7ee3c61dcd28bf6e290e06ad382f13511dc790e9 Mon Sep 17 00:00:00 2001
From: Florian Westphal <fw(a)strlen.de>
Date: Wed, 7 Apr 2021 21:43:39 +0200
Subject: [PATCH] netfilter: bridge: add pre_exit hooks for ebtable
unregistration
Just like ip/ip6/arptables, the hooks have to be removed, then
synchronize_rcu() has to be called to make sure no more packets are being
processed before the ruleset data is released.
Place the hook unregistration in the pre_exit hook, then call the new
ebtables pre_exit function from there.
Years ago, when first netns support got added for netfilter+ebtables,
this used an older (now removed) netfilter hook unregister API, that did
a unconditional synchronize_rcu().
Now that all is done with call_rcu, ebtable_{filter,nat,broute} pernet exit
handlers may free the ebtable ruleset while packets are still in flight.
This can only happens on module removal, not during netns exit.
The new function expects the table name, not the table struct.
This is because upcoming patch set (targeting -next) will remove all
net->xt.{nat,filter,broute}_table instances, this makes it necessary
to avoid external references to those member variables.
The existing APIs will be converted, so follow the upcoming scheme of
passing name + hook type instead.
Fixes: aee12a0a3727e ("ebtables: remove nf_hook_register usage")
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
diff --git a/include/linux/netfilter_bridge/ebtables.h b/include/linux/netfilter_bridge/ebtables.h
index 2f5c4e6ecd8a..3a956145a25c 100644
--- a/include/linux/netfilter_bridge/ebtables.h
+++ b/include/linux/netfilter_bridge/ebtables.h
@@ -110,8 +110,9 @@ extern int ebt_register_table(struct net *net,
const struct ebt_table *table,
const struct nf_hook_ops *ops,
struct ebt_table **res);
-extern void ebt_unregister_table(struct net *net, struct ebt_table *table,
- const struct nf_hook_ops *);
+extern void ebt_unregister_table(struct net *net, struct ebt_table *table);
+void ebt_unregister_table_pre_exit(struct net *net, const char *tablename,
+ const struct nf_hook_ops *ops);
extern unsigned int ebt_do_table(struct sk_buff *skb,
const struct nf_hook_state *state,
struct ebt_table *table);
diff --git a/net/bridge/netfilter/ebtable_broute.c b/net/bridge/netfilter/ebtable_broute.c
index 66e7af165494..32bc2821027f 100644
--- a/net/bridge/netfilter/ebtable_broute.c
+++ b/net/bridge/netfilter/ebtable_broute.c
@@ -105,14 +105,20 @@ static int __net_init broute_net_init(struct net *net)
&net->xt.broute_table);
}
+static void __net_exit broute_net_pre_exit(struct net *net)
+{
+ ebt_unregister_table_pre_exit(net, "broute", &ebt_ops_broute);
+}
+
static void __net_exit broute_net_exit(struct net *net)
{
- ebt_unregister_table(net, net->xt.broute_table, &ebt_ops_broute);
+ ebt_unregister_table(net, net->xt.broute_table);
}
static struct pernet_operations broute_net_ops = {
.init = broute_net_init,
.exit = broute_net_exit,
+ .pre_exit = broute_net_pre_exit,
};
static int __init ebtable_broute_init(void)
diff --git a/net/bridge/netfilter/ebtable_filter.c b/net/bridge/netfilter/ebtable_filter.c
index 78cb9b21022d..bcf982e12f16 100644
--- a/net/bridge/netfilter/ebtable_filter.c
+++ b/net/bridge/netfilter/ebtable_filter.c
@@ -99,14 +99,20 @@ static int __net_init frame_filter_net_init(struct net *net)
&net->xt.frame_filter);
}
+static void __net_exit frame_filter_net_pre_exit(struct net *net)
+{
+ ebt_unregister_table_pre_exit(net, "filter", ebt_ops_filter);
+}
+
static void __net_exit frame_filter_net_exit(struct net *net)
{
- ebt_unregister_table(net, net->xt.frame_filter, ebt_ops_filter);
+ ebt_unregister_table(net, net->xt.frame_filter);
}
static struct pernet_operations frame_filter_net_ops = {
.init = frame_filter_net_init,
.exit = frame_filter_net_exit,
+ .pre_exit = frame_filter_net_pre_exit,
};
static int __init ebtable_filter_init(void)
diff --git a/net/bridge/netfilter/ebtable_nat.c b/net/bridge/netfilter/ebtable_nat.c
index 0888936ef853..0d092773f816 100644
--- a/net/bridge/netfilter/ebtable_nat.c
+++ b/net/bridge/netfilter/ebtable_nat.c
@@ -99,14 +99,20 @@ static int __net_init frame_nat_net_init(struct net *net)
&net->xt.frame_nat);
}
+static void __net_exit frame_nat_net_pre_exit(struct net *net)
+{
+ ebt_unregister_table_pre_exit(net, "nat", ebt_ops_nat);
+}
+
static void __net_exit frame_nat_net_exit(struct net *net)
{
- ebt_unregister_table(net, net->xt.frame_nat, ebt_ops_nat);
+ ebt_unregister_table(net, net->xt.frame_nat);
}
static struct pernet_operations frame_nat_net_ops = {
.init = frame_nat_net_init,
.exit = frame_nat_net_exit,
+ .pre_exit = frame_nat_net_pre_exit,
};
static int __init ebtable_nat_init(void)
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index ebe33b60efd6..d481ff24a150 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1232,10 +1232,34 @@ int ebt_register_table(struct net *net, const struct ebt_table *input_table,
return ret;
}
-void ebt_unregister_table(struct net *net, struct ebt_table *table,
- const struct nf_hook_ops *ops)
+static struct ebt_table *__ebt_find_table(struct net *net, const char *name)
+{
+ struct ebt_table *t;
+
+ mutex_lock(&ebt_mutex);
+
+ list_for_each_entry(t, &net->xt.tables[NFPROTO_BRIDGE], list) {
+ if (strcmp(t->name, name) == 0) {
+ mutex_unlock(&ebt_mutex);
+ return t;
+ }
+ }
+
+ mutex_unlock(&ebt_mutex);
+ return NULL;
+}
+
+void ebt_unregister_table_pre_exit(struct net *net, const char *name, const struct nf_hook_ops *ops)
+{
+ struct ebt_table *table = __ebt_find_table(net, name);
+
+ if (table)
+ nf_unregister_net_hooks(net, ops, hweight32(table->valid_hooks));
+}
+EXPORT_SYMBOL(ebt_unregister_table_pre_exit);
+
+void ebt_unregister_table(struct net *net, struct ebt_table *table)
{
- nf_unregister_net_hooks(net, ops, hweight32(table->valid_hooks));
__ebt_unregister_table(net, table);
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 2decad92f4731fac9755a083fcfefa66edb7d67d Mon Sep 17 00:00:00 2001
From: Catalin Marinas <catalin.marinas(a)arm.com>
Date: Fri, 9 Apr 2021 18:37:10 +0100
Subject: [PATCH] arm64: mte: Ensure TIF_MTE_ASYNC_FAULT is set atomically
The entry from EL0 code checks the TFSRE0_EL1 register for any
asynchronous tag check faults in user space and sets the
TIF_MTE_ASYNC_FAULT flag. This is not done atomically, potentially
racing with another CPU calling set_tsk_thread_flag().
Replace the non-atomic ORR+STR with an STSET instruction. While STSET
requires ARMv8.1 and an assembler that understands LSE atomics, the MTE
feature is part of ARMv8.5 and already requires an updated assembler.
Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com>
Fixes: 637ec831ea4f ("arm64: mte: Handle synchronous and asynchronous tag check faults")
Cc: <stable(a)vger.kernel.org> # 5.10.x
Reported-by: Will Deacon <will(a)kernel.org>
Cc: Will Deacon <will(a)kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Link: https://lore.kernel.org/r/20210409173710.18582-1-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will(a)kernel.org>
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e4e1b6550115..dfdc3e0af5e1 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1406,10 +1406,13 @@ config ARM64_PAN
config AS_HAS_LDAPR
def_bool $(as-instr,.arch_extension rcpc)
+config AS_HAS_LSE_ATOMICS
+ def_bool $(as-instr,.arch_extension lse)
+
config ARM64_LSE_ATOMICS
bool
default ARM64_USE_LSE_ATOMICS
- depends on $(as-instr,.arch_extension lse)
+ depends on AS_HAS_LSE_ATOMICS
config ARM64_USE_LSE_ATOMICS
bool "Atomic instructions"
@@ -1666,6 +1669,7 @@ config ARM64_MTE
default y
depends on ARM64_AS_HAS_MTE && ARM64_TAGGED_ADDR_ABI
depends on AS_HAS_ARMV8_5
+ depends on AS_HAS_LSE_ATOMICS
# Required for tag checking in the uaccess routines
depends on ARM64_PAN
select ARCH_USES_HIGH_VMA_FLAGS
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index a31a0a713c85..6acfc5e6b5e0 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -148,16 +148,18 @@ alternative_cb_end
.endm
/* Check for MTE asynchronous tag check faults */
- .macro check_mte_async_tcf, flgs, tmp
+ .macro check_mte_async_tcf, tmp, ti_flags
#ifdef CONFIG_ARM64_MTE
+ .arch_extension lse
alternative_if_not ARM64_MTE
b 1f
alternative_else_nop_endif
mrs_s \tmp, SYS_TFSRE0_EL1
tbz \tmp, #SYS_TFSR_EL1_TF0_SHIFT, 1f
/* Asynchronous TCF occurred for TTBR0 access, set the TI flag */
- orr \flgs, \flgs, #_TIF_MTE_ASYNC_FAULT
- str \flgs, [tsk, #TSK_TI_FLAGS]
+ mov \tmp, #_TIF_MTE_ASYNC_FAULT
+ add \ti_flags, tsk, #TSK_TI_FLAGS
+ stset \tmp, [\ti_flags]
msr_s SYS_TFSRE0_EL1, xzr
1:
#endif
@@ -244,7 +246,7 @@ alternative_else_nop_endif
disable_step_tsk x19, x20
/* Check for asynchronous tag check faults in user space */
- check_mte_async_tcf x19, x22
+ check_mte_async_tcf x22, x23
apply_ssbd 1, x22, x23
ptrauth_keys_install_kernel tsk, x20, x22, x23
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 04c4f2ee3f68c9a4bf1653d15f1a9a435ae33f7a Mon Sep 17 00:00:00 2001
From: Reiji Watanabe <reijiw(a)google.com>
Date: Tue, 13 Apr 2021 15:47:40 +0000
Subject: [PATCH] KVM: VMX: Don't use vcpu->run->internal.ndata as an array
index
__vmx_handle_exit() uses vcpu->run->internal.ndata as an index for
an array access. Since vcpu->run is (can be) mapped to a user address
space with a writer permission, the 'ndata' could be updated by the
user process at anytime (the user process can set it to outside the
bounds of the array).
So, it is not safe that __vmx_handle_exit() uses the 'ndata' that way.
Fixes: 1aa561b1a4c0 ("kvm: x86: Add "last CPU" to some KVM_EXIT information")
Signed-off-by: Reiji Watanabe <reijiw(a)google.com>
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Message-Id: <20210413154739.490299-1-reijiw(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 32cf8287d4a7..29b40e092d13 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6027,19 +6027,19 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
exit_reason.basic != EXIT_REASON_PML_FULL &&
exit_reason.basic != EXIT_REASON_APIC_ACCESS &&
exit_reason.basic != EXIT_REASON_TASK_SWITCH)) {
+ int ndata = 3;
+
vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_DELIVERY_EV;
- vcpu->run->internal.ndata = 3;
vcpu->run->internal.data[0] = vectoring_info;
vcpu->run->internal.data[1] = exit_reason.full;
vcpu->run->internal.data[2] = vcpu->arch.exit_qualification;
if (exit_reason.basic == EXIT_REASON_EPT_MISCONFIG) {
- vcpu->run->internal.ndata++;
- vcpu->run->internal.data[3] =
+ vcpu->run->internal.data[ndata++] =
vmcs_read64(GUEST_PHYSICAL_ADDRESS);
}
- vcpu->run->internal.data[vcpu->run->internal.ndata++] =
- vcpu->arch.last_vmentry_cpu;
+ vcpu->run->internal.data[ndata++] = vcpu->arch.last_vmentry_cpu;
+ vcpu->run->internal.ndata = ndata;
return 0;
}
From: Ping-Ke Shih <pkshih(a)realtek.com>
Using a kernel with the Undefined Behaviour Sanity Checker (UBSAN) enabled, the
following array overrun is logged:
================================================================================
UBSAN: array-index-out-of-bounds in /home/finger/wireless-drivers-next/drivers/net/wireless/realtek/rtw88/phy.c:1789:34
index 5 is out of range for type 'u8 [5]'
CPU: 2 PID: 84 Comm: kworker/u16:3 Tainted: G O 5.12.0-rc5-00086-gd88bba47038e-dirty #651
Hardware name: TOSHIBA TECRA A50-A/TECRA A50-A, BIOS Version 4.50 09/29/2014
Workqueue: phy0 ieee80211_scan_work [mac80211]
Call Trace:
dump_stack+0x64/0x7c
ubsan_epilogue+0x5/0x40
__ubsan_handle_out_of_bounds.cold+0x43/0x48
rtw_get_tx_power_params+0x83a/drivers/net/wireless/realtek/rtw88/0xad0 [rtw_core]
? rtw_pci_read16+0x20/0x20 [rtw_pci]
? check_hw_ready+0x50/0x90 [rtw_core]
rtw_phy_get_tx_power_index+0x4d/0xd0 [rtw_core]
rtw_phy_set_tx_power_level+0xee/0x1b0 [rtw_core]
rtw_set_channel+0xab/0x110 [rtw_core]
rtw_ops_config+0x87/0xc0 [rtw_core]
ieee80211_hw_config+0x9d/0x130 [mac80211]
ieee80211_scan_state_set_channel+0x81/0x170 [mac80211]
ieee80211_scan_work+0x19f/0x2a0 [mac80211]
process_one_work+0x1dd/0x3a0
worker_thread+0x49/0x330
? rescuer_thread+0x3a0/0x3a0
kthread+0x134/0x150
? kthread_create_worker_on_cpu+0x70/0x70
ret_from_fork+0x22/0x30
================================================================================
The statement where an array is being overrun is shown in the following snippet:
if (rate <= DESC_RATE11M)
tx_power = pwr_idx_2g->cck_base[group];
else
====> tx_power = pwr_idx_2g->bw40_base[group];
The associated arrays are defined in main.h as follows:
struct rtw_2g_txpwr_idx {
u8 cck_base[6];
u8 bw40_base[5];
struct rtw_2g_1s_pwr_idx_diff ht_1s_diff;
struct rtw_2g_ns_pwr_idx_diff ht_2s_diff;
struct rtw_2g_ns_pwr_idx_diff ht_3s_diff;
struct rtw_2g_ns_pwr_idx_diff ht_4s_diff;
};
The problem arises because the value of group is 5 for channel 14. The trivial
increase in the dimension of bw40_base fails as this struct must match the layout of
efuse. The fix is to add the rate as an argument to rtw_get_channel_group() and set
the group for channel 14 to 4 if rate <= DESC_RATE11M.
This patch fixes commit fa6dfe6bff24 ("rtw88: resolve order of tx power setting routines")
Fixes: fa6dfe6bff24 ("rtw88: resolve order of tx power setting routines")
Reported-by: Богдан Пилипенко <bogdan.pylypenko107(a)gmail.com>
Signed-off-by: Larry Finger <Larry.Finger(a)lwfinger.net>
Signed-off-by: Ping-Ke Shih <pkshih(a)realtek.com>
Cc: Stable <stable(a)vger.kernel.org>
---
drivers/net/wireless/realtek/rtw88/phy.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/realtek/rtw88/phy.c b/drivers/net/wireless/realtek/rtw88/phy.c
index e114ddecac09..0b3da5bef703 100644
--- a/drivers/net/wireless/realtek/rtw88/phy.c
+++ b/drivers/net/wireless/realtek/rtw88/phy.c
@@ -1584,7 +1584,7 @@ void rtw_phy_load_tables(struct rtw_dev *rtwdev)
}
EXPORT_SYMBOL(rtw_phy_load_tables);
-static u8 rtw_get_channel_group(u8 channel)
+static u8 rtw_get_channel_group(u8 channel, u8 rate)
{
switch (channel) {
default:
@@ -1628,6 +1628,7 @@ static u8 rtw_get_channel_group(u8 channel)
case 106:
return 4;
case 14:
+ return rate <= DESC_RATE11M ? 5 : 4;
case 108:
case 110:
case 112:
@@ -1879,7 +1880,7 @@ void rtw_get_tx_power_params(struct rtw_dev *rtwdev, u8 path, u8 rate, u8 bw,
s8 *remnant = &pwr_param->pwr_remnant;
pwr_idx = &rtwdev->efuse.txpwr_idx_table[path];
- group = rtw_get_channel_group(ch);
+ group = rtw_get_channel_group(ch, rate);
/* base power index for 2.4G/5G */
if (IS_CH_2G_BAND(ch)) {
--
2.30.2
The rsi_resume() does access the bus to enable interrupts on the RSI
SDIO WiFi card, however when calling sdio_claim_host() in the resume
path, it is possible the bus is already claimed and sdio_claim_host()
spins indefinitelly. Enable the SDIO card interrupts in resume_noirq
instead to prevent anything else from claiming the SDIO bus first.
Fixes: 20db07332736 ("rsi: sdio suspend and resume support")
Signed-off-by: Marek Vasut <marex(a)denx.de>
Cc: Amitkumar Karwar <amit.karwar(a)redpinesignals.com>
Cc: Angus Ainslie <angus(a)akkea.ca>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: Kalle Valo <kvalo(a)codeaurora.org>
Cc: Karun Eagalapati <karun256(a)gmail.com>
Cc: Martin Kepplinger <martink(a)posteo.de>
Cc: Sebastian Krzyszkowiak <sebastian.krzyszkowiak(a)puri.sm>
Cc: Siva Rebbagondla <siva8118(a)gmail.com>
Cc: netdev(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
---
drivers/net/wireless/rsi/rsi_91x_sdio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/rsi/rsi_91x_sdio.c b/drivers/net/wireless/rsi/rsi_91x_sdio.c
index 122174fca672..8465a4ee9b61 100644
--- a/drivers/net/wireless/rsi/rsi_91x_sdio.c
+++ b/drivers/net/wireless/rsi/rsi_91x_sdio.c
@@ -1513,7 +1513,7 @@ static int rsi_restore(struct device *dev)
}
static const struct dev_pm_ops rsi_pm_ops = {
.suspend = rsi_suspend,
- .resume = rsi_resume,
+ .resume_noirq = rsi_resume,
.freeze = rsi_freeze,
.thaw = rsi_thaw,
.restore = rsi_restore,
--
2.30.2
Hi Greg, hi Sasha
Please consider to apply commit 7c03e2cda4a5 ("vfs: move
cap_convert_nscap() call into vfs_setxattr()") to stable series at
least back to 4.19.y. It applies to there (but have not tested older
series) and could test a build on top of 5.10.y with the commit.
The commit was applied in 5.11-rc1 and from the commit message:
vfs: move cap_convert_nscap() call into vfs_setxattr()
cap_convert_nscap() does permission checking as well as conversion of the
xattr value conditionally based on fs's user-ns.
This is needed by overlayfs and probably other layered fs (ecryptfs) and is
what vfs_foo() is supposed to do anyway.
Additionally, in fact additionally for distribtuions kernels which do
allow unprivileged overlayfs mounts this as as well broader
consequences, as explained in
https://www.openwall.com/lists/oss-security/2021/04/16/1 .
Thanks for considering!
Regards,
Salvatore
The following commit has been merged into the timers/core branch of tip:
Commit-ID: 2d036dfa5f10df9782f5278fc591d79d283c1fad
Gitweb: https://git.kernel.org/tip/2d036dfa5f10df9782f5278fc591d79d283c1fad
Author: Chen Jun <chenjun102(a)huawei.com>
AuthorDate: Wed, 14 Apr 2021 03:04:49
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Sat, 17 Apr 2021 14:55:06 +02:00
posix-timers: Preserve return value in clock_adjtime32()
The return value on success (>= 0) is overwritten by the return value of
put_old_timex32(). That works correct in the fault case, but is wrong for
the success case where put_old_timex32() returns 0.
Just check the return value of put_old_timex32() and return -EFAULT in case
it is not zero.
[ tglx: Massage changelog ]
Fixes: 3a4d44b61625 ("ntp: Move adjtimex related compat syscalls to native counterparts")
Signed-off-by: Chen Jun <chenjun102(a)huawei.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Richard Cochran <richardcochran(a)gmail.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20210414030449.90692-1-chenjun102@huawei.com
---
kernel/time/posix-timers.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index bf540f5..dd5697d 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -1191,8 +1191,8 @@ SYSCALL_DEFINE2(clock_adjtime32, clockid_t, which_clock,
err = do_clock_adjtime(which_clock, &ktx);
- if (err >= 0)
- err = put_old_timex32(utp, &ktx);
+ if (err >= 0 && put_old_timex32(utp, &ktx))
+ return -EFAULT;
return err;
}
This is the start of the stable review cycle for the 5.10.31 release.
There are 25 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 17 Apr 2021 14:44:01 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.31-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.10.31-rc1
Juergen Gross <jgross(a)suse.com>
xen/events: fix setting irq affinity
Russell King <rmk+kernel(a)armlinux.org.uk>
net: sfp: cope with SFPs that set both LOS normal and LOS inverted
Russell King <rmk+kernel(a)armlinux.org.uk>
net: sfp: relax bitrate-derived mode check
Arnaldo Carvalho de Melo <acme(a)redhat.com>
perf map: Tighten snprintf() string precision to pass gcc check on some 32-bit arches
Florian Westphal <fw(a)strlen.de>
netfilter: x_tables: fix compat match/target pad out-of-bound write
Pavel Begunkov <asml.silence(a)gmail.com>
block: don't ignore REQ_NOWAIT for direct IO
Zihao Yu <yuzihao(a)ict.ac.cn>
riscv,entry: fix misaligned base for excp_vect_table
Jens Axboe <axboe(a)kernel.dk>
io_uring: don't mark S_ISBLK async work as unbounded
Damien Le Moal <damien.lemoal(a)wdc.com>
null_blk: fix command timeout completion handling
Matthew Wilcox (Oracle) <willy(a)infradead.org>
idr test suite: Create anchor before launching throbber
Matthew Wilcox (Oracle) <willy(a)infradead.org>
idr test suite: Take RCU read lock in idr_find_test_1
Matthew Wilcox (Oracle) <willy(a)infradead.org>
radix tree test suite: Register the main thread with the RCU library
Yufen Yu <yuyufen(a)huawei.com>
block: only update parent bi_status when bio fail
Matthew Wilcox (Oracle) <willy(a)infradead.org>
radix tree test suite: Fix compilation
Matthew Wilcox (Oracle) <willy(a)infradead.org>
XArray: Fix splitting to non-zero orders
Mikko Perttunen <mperttunen(a)nvidia.com>
gpu: host1x: Use different lock classes for each client
Dmitry Osipenko <digetx(a)gmail.com>
drm/tegra: dc: Don't set PLL clock to 0Hz
Stefan Raspl <raspl(a)linux.ibm.com>
tools/kvm_stat: Add restart delay
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
ftrace: Check if pages were allocated before calling free_pages()
Bob Peterson <rpeterso(a)redhat.com>
gfs2: report "already frozen/thawed" errors
Arnd Bergmann <arnd(a)arndb.de>
drm/imx: imx-ldb: fix out of bounds array access warning
Suzuki K Poulose <suzuki.poulose(a)arm.com>
KVM: arm64: Disable guest access to trace filter controls
Suzuki K Poulose <suzuki.poulose(a)arm.com>
KVM: arm64: Hide system instruction access to Trace registers
Andrew Price <anprice(a)redhat.com>
gfs2: Flag a withdraw if init_threads() fails
Jia-Ju Bai <baijiaju1990(a)gmail.com>
interconnect: core: fix error return code of icc_link_destroy()
-------------
Diffstat:
Makefile | 4 +--
arch/arm64/include/asm/kvm_arm.h | 1 +
arch/arm64/kernel/cpufeature.c | 1 -
arch/arm64/kvm/debug.c | 2 ++
arch/riscv/kernel/entry.S | 1 +
block/bio.c | 2 +-
drivers/block/null_blk.h | 1 +
drivers/block/null_blk_main.c | 26 ++++++++++++++----
drivers/gpu/drm/imx/imx-ldb.c | 10 +++++++
drivers/gpu/drm/tegra/dc.c | 10 +++----
drivers/gpu/host1x/bus.c | 10 ++++---
drivers/interconnect/core.c | 2 ++
drivers/net/phy/sfp-bus.c | 11 ++++----
drivers/net/phy/sfp.c | 36 +++++++++++++++----------
drivers/xen/events/events_base.c | 4 +--
fs/block_dev.c | 4 +++
fs/gfs2/super.c | 14 ++++++----
fs/io_uring.c | 2 +-
include/linux/host1x.h | 9 ++++++-
kernel/trace/ftrace.c | 9 ++++---
lib/test_xarray.c | 26 +++++++++---------
lib/xarray.c | 4 +--
net/ipv4/netfilter/arp_tables.c | 2 ++
net/ipv4/netfilter/ip_tables.c | 2 ++
net/ipv6/netfilter/ip6_tables.c | 2 ++
net/netfilter/x_tables.c | 10 ++-----
tools/kvm/kvm_stat/kvm_stat.service | 1 +
tools/perf/util/map.c | 7 +++--
tools/testing/radix-tree/idr-test.c | 10 +++++--
tools/testing/radix-tree/linux/compiler_types.h | 0
tools/testing/radix-tree/multiorder.c | 2 ++
tools/testing/radix-tree/xarray.c | 2 ++
32 files changed, 149 insertions(+), 78 deletions(-)
This is the start of the stable review cycle for the 5.4.113 release.
There are 18 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 17 Apr 2021 14:44:01 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.113-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.4.113-rc1
Juergen Gross <jgross(a)suse.com>
xen/events: fix setting irq affinity
Arnaldo Carvalho de Melo <acme(a)redhat.com>
perf map: Tighten snprintf() string precision to pass gcc check on some 32-bit arches
Chris Wilson <chris(a)chris-wilson.co.uk>
perf tools: Use %zd for size_t printf formats on 32-bit
Jiri Olsa <jolsa(a)redhat.com>
perf tools: Use %define api.pure full instead of %pure-parser
Saravana Kannan <saravanak(a)google.com>
driver core: Fix locking bug in deferred_probe_timeout_work_func()
Florian Westphal <fw(a)strlen.de>
netfilter: x_tables: fix compat match/target pad out-of-bound write
Pavel Begunkov <asml.silence(a)gmail.com>
block: don't ignore REQ_NOWAIT for direct IO
Zihao Yu <yuzihao(a)ict.ac.cn>
riscv,entry: fix misaligned base for excp_vect_table
Matthew Wilcox (Oracle) <willy(a)infradead.org>
idr test suite: Create anchor before launching throbber
Matthew Wilcox (Oracle) <willy(a)infradead.org>
idr test suite: Take RCU read lock in idr_find_test_1
Matthew Wilcox (Oracle) <willy(a)infradead.org>
radix tree test suite: Register the main thread with the RCU library
Yufen Yu <yuyufen(a)huawei.com>
block: only update parent bi_status when bio fail
Dmitry Osipenko <digetx(a)gmail.com>
drm/tegra: dc: Don't set PLL clock to 0Hz
Bob Peterson <rpeterso(a)redhat.com>
gfs2: report "already frozen/thawed" errors
Arnd Bergmann <arnd(a)arndb.de>
drm/imx: imx-ldb: fix out of bounds array access warning
Suzuki K Poulose <suzuki.poulose(a)arm.com>
KVM: arm64: Disable guest access to trace filter controls
Suzuki K Poulose <suzuki.poulose(a)arm.com>
KVM: arm64: Hide system instruction access to Trace registers
Jia-Ju Bai <baijiaju1990(a)gmail.com>
interconnect: core: fix error return code of icc_link_destroy()
-------------
Diffstat:
Makefile | 4 ++--
arch/arm64/include/asm/kvm_arm.h | 1 +
arch/arm64/kernel/cpufeature.c | 1 -
arch/arm64/kvm/debug.c | 2 ++
arch/riscv/kernel/entry.S | 1 +
block/bio.c | 2 +-
drivers/base/dd.c | 8 +++++---
drivers/gpu/drm/imx/imx-ldb.c | 10 ++++++++++
drivers/gpu/drm/tegra/dc.c | 10 +++++-----
drivers/interconnect/core.c | 2 ++
drivers/xen/events/events_base.c | 4 ++--
fs/block_dev.c | 4 ++++
fs/gfs2/super.c | 10 ++++++----
net/ipv4/netfilter/arp_tables.c | 2 ++
net/ipv4/netfilter/ip_tables.c | 2 ++
net/ipv6/netfilter/ip6_tables.c | 2 ++
net/netfilter/x_tables.c | 10 ++--------
tools/perf/util/expr.y | 3 ++-
tools/perf/util/map.c | 7 +++----
tools/perf/util/parse-events.y | 2 +-
tools/perf/util/session.c | 2 +-
tools/perf/util/zstd.c | 2 +-
tools/testing/radix-tree/idr-test.c | 10 ++++++++--
tools/testing/radix-tree/multiorder.c | 2 ++
tools/testing/radix-tree/xarray.c | 2 ++
25 files changed, 69 insertions(+), 36 deletions(-)
This is the start of the stable review cycle for the 4.19.188 release.
There are 13 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 17 Apr 2021 14:44:01 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.188-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.188-rc1
Juergen Gross <jgross(a)suse.com>
xen/events: fix setting irq affinity
Arnaldo Carvalho de Melo <acme(a)redhat.com>
perf map: Tighten snprintf() string precision to pass gcc check on some 32-bit arches
Saravana Kannan <saravanak(a)google.com>
driver core: Fix locking bug in deferred_probe_timeout_work_func()
Florian Westphal <fw(a)strlen.de>
netfilter: x_tables: fix compat match/target pad out-of-bound write
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
staging: m57621-mmc: delete driver from the tree.
Florian Fainelli <f.fainelli(a)gmail.com>
net: phy: broadcom: Only advertise EEE for supported modes
Zihao Yu <yuzihao(a)ict.ac.cn>
riscv,entry: fix misaligned base for excp_vect_table
Yufen Yu <yuyufen(a)huawei.com>
block: only update parent bi_status when bio fail
Dmitry Osipenko <digetx(a)gmail.com>
drm/tegra: dc: Don't set PLL clock to 0Hz
Bob Peterson <rpeterso(a)redhat.com>
gfs2: report "already frozen/thawed" errors
Arnd Bergmann <arnd(a)arndb.de>
drm/imx: imx-ldb: fix out of bounds array access warning
Suzuki K Poulose <suzuki.poulose(a)arm.com>
KVM: arm64: Disable guest access to trace filter controls
Suzuki K Poulose <suzuki.poulose(a)arm.com>
KVM: arm64: Hide system instruction access to Trace registers
-------------
Diffstat:
Makefile | 4 +-
arch/arm64/include/asm/kvm_arm.h | 1 +
arch/arm64/kernel/cpufeature.c | 1 -
arch/arm64/kvm/debug.c | 2 +
arch/riscv/kernel/entry.S | 1 +
block/bio.c | 2 +-
drivers/base/dd.c | 8 +-
drivers/gpu/drm/imx/imx-ldb.c | 10 +
drivers/gpu/drm/tegra/dc.c | 10 +-
drivers/net/phy/bcm-phy-lib.c | 11 +-
drivers/staging/Kconfig | 2 -
drivers/staging/Makefile | 1 -
drivers/staging/mt7621-mmc/Kconfig | 16 -
drivers/staging/mt7621-mmc/Makefile | 42 -
drivers/staging/mt7621-mmc/TODO | 8 -
drivers/staging/mt7621-mmc/board.h | 63 -
drivers/staging/mt7621-mmc/dbg.c | 307 ----
drivers/staging/mt7621-mmc/dbg.h | 149 --
drivers/staging/mt7621-mmc/mt6575_sd.h | 488 -------
drivers/staging/mt7621-mmc/sd.c | 2392 --------------------------------
drivers/xen/events/events_base.c | 4 +-
fs/gfs2/super.c | 10 +-
net/ipv4/netfilter/arp_tables.c | 2 +
net/ipv4/netfilter/ip_tables.c | 2 +
net/ipv6/netfilter/ip6_tables.c | 2 +
net/netfilter/x_tables.c | 10 +-
tools/perf/util/map.c | 7 +-
27 files changed, 54 insertions(+), 3501 deletions(-)
While this code is executed with the wait_lock held, a reader can
acquire the lock without holding wait_lock. The writer side loops
checking the value with the atomic_cond_read_acquire(), but only truly
acquires the lock when the compare-and-exchange is completed
successfully which isn’t ordered. This exposes the window between the
acquire and the cmpxchg to an A-B-A problem which allows reads following
the lock acquisition to observe values speculatively before the write
lock is truly acquired.
We've seen a problem in epoll where the reader does a xchg while
holding the read lock, but the writer can see a value change out from under it.
Writer | Reader 2
--------------------------------------------------------------------------------
ep_scan_ready_list() |
|- write_lock_irq() |
|- queued_write_lock_slowpath() |
|- atomic_cond_read_acquire() |
| read_lock_irqsave(&ep->lock, flags);
--> (observes value before unlock)| chain_epi_lockless()
| | epi->next = xchg(&ep->ovflist, epi);
| | read_unlock_irqrestore(&ep->lock, flags);
| |
| atomic_cmpxchg_relaxed() |
|-- READ_ONCE(ep->ovflist); |
A core can order the read of the ovflist ahead of the
atomic_cmpxchg_relaxed(). Switching the cmpxchg to use acquire semantics
addresses this issue at which point the atomic_cond_read can be switched
to use relaxed semantics.
Fixes: b519b56e378ee ("locking/qrwlock: Use atomic_cond_read_acquire() when spinning in qrwlock")
Signed-off-by: Ali Saidi <alisaidi(a)amazon.com>
Cc: stable(a)vger.kernel.org
Acked-by: Will Deacon <will(a)kernel.org>
Tested-by: Steve Capper <steve.capper(a)arm.com>
Reviewed-by: Steve Capper <steve.capper(a)arm.com>
---
kernel/locking/qrwlock.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 4786dd271b45..10770f6ac4d9 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -73,8 +73,8 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
/* When no more readers or writers, set the locked flag */
do {
- atomic_cond_read_acquire(&lock->cnts, VAL == _QW_WAITING);
- } while (atomic_cmpxchg_relaxed(&lock->cnts, _QW_WAITING,
+ atomic_cond_read_relaxed(&lock->cnts, VAL == _QW_WAITING);
+ } while (atomic_cmpxchg_acquire(&lock->cnts, _QW_WAITING,
_QW_LOCKED) != _QW_WAITING);
unlock:
arch_spin_unlock(&lock->wait_lock);
--
2.24.4.AMZN