This is the start of the stable review cycle for the 4.9.115 release. There are 28 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Jul 25 12:24:13 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.115-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.9.115-rc1
Alan Jenkins alan.christopher.jenkins@gmail.com block: do not use interruptible wait anywhere
Chuck Lever chuck.lever@oracle.com xprtrdma: Return -ENOBUFS when no pages are available
Mathias Nyman mathias.nyman@linux.intel.com xhci: Fix perceived dead host due to runtime suspend race with event handler
Stefano Brivio sbrivio@redhat.com skbuff: Unconditionally copy pfmemalloc in __skb_clone()
Stefano Brivio sbrivio@redhat.com net: Don't copy pfmemalloc flag in __copy_skb_header()
Alexander Couzens lynxis@fe80.eu net: usb: asix: replace mii_nway_restart in resume path
Sanjeev Bansal sanjeevb.bansal@broadcom.com tg3: Add higher cpu clock for 5762.
Matevz Vucnik vucnikm@gmail.com qmi_wwan: add support for Quectel EG91
Gustavo A. R. Silva gustavo@embeddedor.com ptp: fix missing break in switch
Heiner Kallweit hkallweit1@gmail.com net: phy: fix flag masking in __set_phy_supported
David Ahern dsahern@gmail.com net/ipv4: Set oif in fib_compute_spec_dst
Lorenzo Colitti lorenzo@google.com net: diag: Don't double-free TCP_NEW_SYN_RECV sockets in tcp_abort
Davidlohr Bueso dave@stgolabs.net lib/rhashtable: consider param->min_size when setting initial table size
Colin Ian King colin.king@canonical.com ipv6: fix useless rol32 call on hash
Tyler Hicks tyhicks@canonical.com ipv4: Return EINVAL when ping_group_range sysctl doesn't map to user ns
Toke Høiland-Jørgensen toke@toke.dk gen_stats: Fix netlink stats dumping in the presence of padding
Ville Syrjälä ville.syrjala@linux.intel.com drm/i915: Fix hotplug irq ack on i965/g4x
Gustavo A. R. Silva gustavo@embeddedor.com vfio/pci: Fix potential Spectre v1
Hugh Dickins hughd@google.com mm/huge_memory.c: fix data loss when splitting a file pmd
Jing Xia jing.xia.mail@gmail.com mm: memcg: fix use after free in mem_cgroup_iter()
Alexey Brodkin Alexey.Brodkin@synopsys.com ARC: configs: Remove CONFIG_INITRAMFS_SOURCE from defconfigs
Vineet Gupta vgupta@synopsys.com ARC: mm: allow mprotect to make stack mappings executable
Alexey Brodkin abrodkin@synopsys.com ARC: Fix CONFIG_SWAP
Takashi Iwai tiwai@suse.de ALSA: rawmidi: Change resized buffers atomically
OGAWA Hirofumi hirofumi@mail.parknet.co.jp fat: fix memory allocation failure handling of match_strdup()
Dewet Thibaut thibaut.dewet@nokia.com x86/MCE: Remove min interval polling limitation
Ville Syrjälä ville.syrjala@linux.intel.com x86/apm: Don't access __preempt_count with zeroed fs
Lan Tianyu tianyu.lan@intel.com KVM/Eventfd: Avoid crash when assign and deassign specific eventfd in parallel.
-------------
Diffstat:
Makefile | 4 +-- arch/arc/configs/axs101_defconfig | 1 - arch/arc/configs/axs103_defconfig | 1 - arch/arc/configs/axs103_smp_defconfig | 1 - arch/arc/configs/nsim_700_defconfig | 1 - arch/arc/configs/nsim_hs_defconfig | 1 - arch/arc/configs/nsim_hs_smp_defconfig | 1 - arch/arc/configs/nsimosci_defconfig | 1 - arch/arc/configs/nsimosci_hs_defconfig | 1 - arch/arc/configs/nsimosci_hs_smp_defconfig | 1 - arch/arc/include/asm/page.h | 2 +- arch/arc/include/asm/pgtable.h | 2 +- arch/x86/include/asm/apm.h | 6 ----- arch/x86/kernel/apm_32.c | 5 ++++ arch/x86/kernel/cpu/mcheck/mce.c | 3 --- block/blk-core.c | 9 +++---- drivers/gpu/drm/i915/i915_irq.c | 32 ++++++++++++++++++++++-- drivers/net/ethernet/broadcom/tg3.c | 9 +++++++ drivers/net/phy/phy_device.c | 7 ++---- drivers/net/usb/asix_devices.c | 4 ++- drivers/net/usb/qmi_wwan.c | 1 + drivers/ptp/ptp_chardev.c | 1 + drivers/usb/host/xhci.c | 40 +++++++++++++++++++++++++++--- drivers/usb/host/xhci.h | 4 +++ drivers/vfio/pci/vfio_pci.c | 4 +++ fs/fat/inode.c | 20 +++++++++------ include/linux/skbuff.h | 10 ++++---- include/net/ipv6.h | 2 +- lib/rhashtable.c | 17 ++++++++----- mm/huge_memory.c | 2 ++ mm/memcontrol.c | 2 +- net/core/gen_stats.c | 16 ++++++++++-- net/core/skbuff.c | 1 + net/ipv4/fib_frontend.c | 1 + net/ipv4/sysctl_net_ipv4.c | 5 ++-- net/ipv4/tcp.c | 3 +-- net/sunrpc/xprtrdma/rpc_rdma.c | 2 +- sound/core/rawmidi.c | 20 ++++++++++----- virt/kvm/eventfd.c | 6 ++++- 39 files changed, 176 insertions(+), 73 deletions(-)
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dewet Thibaut thibaut.dewet@nokia.com
commit fbdb328c6bae0a7c78d75734a738b66b86dffc96 upstream.
commit b3b7c4795c ("x86/MCE: Serialize sysfs changes") introduced a min interval limitation when setting the check interval for polled MCEs. However, the logic is that 0 disables polling for corrected MCEs, see Documentation/x86/x86_64/machinecheck. The limitation prevents disabling.
Remove this limitation and allow the value 0 to disable polling again.
Fixes: b3b7c4795c ("x86/MCE: Serialize sysfs changes") Signed-off-by: Dewet Thibaut thibaut.dewet@nokia.com Signed-off-by: Alexander Sverdlin alexander.sverdlin@nokia.com [ Massage commit message. ] Signed-off-by: Borislav Petkov bp@suse.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Tony Luck tony.luck@intel.com Cc: linux-edac linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20180716084927.24869-1-alexander.sverdlin@nokia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/cpu/mcheck/mce.c | 3 --- 1 file changed, 3 deletions(-)
--- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -2397,9 +2397,6 @@ static ssize_t store_int_with_restart(st if (check_interval == old_check_interval) return ret;
- if (check_interval < 1) - check_interval = 1; - mutex_lock(&mce_sysfs_mutex); mce_restart(); mutex_unlock(&mce_sysfs_mutex);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: OGAWA Hirofumi hirofumi@mail.parknet.co.jp
commit 35033ab988c396ad7bce3b6d24060c16a9066db8 upstream.
In parse_options(), if match_strdup() failed, parse_options() leaves opts->iocharset in unexpected state (i.e. still pointing the freed string). And this can be the cause of double free.
To fix, this initialize opts->iocharset always when freeing.
Link: http://lkml.kernel.org/r/8736wp9dzc.fsf@mail.parknet.co.jp Signed-off-by: OGAWA Hirofumi hirofumi@mail.parknet.co.jp Reported-by: syzbot+90b8e10515ae88228a92@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/fat/inode.c | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-)
--- a/fs/fat/inode.c +++ b/fs/fat/inode.c @@ -696,13 +696,21 @@ static void fat_set_state(struct super_b brelse(bh); }
+static void fat_reset_iocharset(struct fat_mount_options *opts) +{ + if (opts->iocharset != fat_default_iocharset) { + /* Note: opts->iocharset can be NULL here */ + kfree(opts->iocharset); + opts->iocharset = fat_default_iocharset; + } +} + static void delayed_free(struct rcu_head *p) { struct msdos_sb_info *sbi = container_of(p, struct msdos_sb_info, rcu); unload_nls(sbi->nls_disk); unload_nls(sbi->nls_io); - if (sbi->options.iocharset != fat_default_iocharset) - kfree(sbi->options.iocharset); + fat_reset_iocharset(&sbi->options); kfree(sbi); }
@@ -1117,7 +1125,7 @@ static int parse_options(struct super_bl opts->fs_fmask = opts->fs_dmask = current_umask(); opts->allow_utime = -1; opts->codepage = fat_default_codepage; - opts->iocharset = fat_default_iocharset; + fat_reset_iocharset(opts); if (is_vfat) { opts->shortname = VFAT_SFN_DISPLAY_WINNT|VFAT_SFN_CREATE_WIN95; opts->rodir = 0; @@ -1274,8 +1282,7 @@ static int parse_options(struct super_bl
/* vfat specific */ case Opt_charset: - if (opts->iocharset != fat_default_iocharset) - kfree(opts->iocharset); + fat_reset_iocharset(opts); iocharset = match_strdup(&args[0]); if (!iocharset) return -ENOMEM; @@ -1866,8 +1873,7 @@ out_fail: iput(fat_inode); unload_nls(sbi->nls_io); unload_nls(sbi->nls_disk); - if (sbi->options.iocharset != fat_default_iocharset) - kfree(sbi->options.iocharset); + fat_reset_iocharset(&sbi->options); sb->s_fs_info = NULL; kfree(sbi); return error;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit 39675f7a7c7e7702f7d5341f1e0d01db746543a0 upstream.
The SNDRV_RAWMIDI_IOCTL_PARAMS ioctl may resize the buffers and the current code is racy. For example, the sequencer client may write to buffer while it being resized.
As a simple workaround, let's switch to the resized buffer inside the stream runtime lock.
Reported-by: syzbot+52f83f0ea8df16932f7f@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/core/rawmidi.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-)
--- a/sound/core/rawmidi.c +++ b/sound/core/rawmidi.c @@ -635,7 +635,7 @@ static int snd_rawmidi_info_select_user( int snd_rawmidi_output_params(struct snd_rawmidi_substream *substream, struct snd_rawmidi_params * params) { - char *newbuf; + char *newbuf, *oldbuf; struct snd_rawmidi_runtime *runtime = substream->runtime; if (substream->append && substream->use_count > 1) @@ -648,13 +648,17 @@ int snd_rawmidi_output_params(struct snd return -EINVAL; } if (params->buffer_size != runtime->buffer_size) { - newbuf = krealloc(runtime->buffer, params->buffer_size, - GFP_KERNEL); + newbuf = kmalloc(params->buffer_size, GFP_KERNEL); if (!newbuf) return -ENOMEM; + spin_lock_irq(&runtime->lock); + oldbuf = runtime->buffer; runtime->buffer = newbuf; runtime->buffer_size = params->buffer_size; runtime->avail = runtime->buffer_size; + runtime->appl_ptr = runtime->hw_ptr = 0; + spin_unlock_irq(&runtime->lock); + kfree(oldbuf); } runtime->avail_min = params->avail_min; substream->active_sensing = !params->no_active_sensing; @@ -665,7 +669,7 @@ EXPORT_SYMBOL(snd_rawmidi_output_params) int snd_rawmidi_input_params(struct snd_rawmidi_substream *substream, struct snd_rawmidi_params * params) { - char *newbuf; + char *newbuf, *oldbuf; struct snd_rawmidi_runtime *runtime = substream->runtime;
snd_rawmidi_drain_input(substream); @@ -676,12 +680,16 @@ int snd_rawmidi_input_params(struct snd_ return -EINVAL; } if (params->buffer_size != runtime->buffer_size) { - newbuf = krealloc(runtime->buffer, params->buffer_size, - GFP_KERNEL); + newbuf = kmalloc(params->buffer_size, GFP_KERNEL); if (!newbuf) return -ENOMEM; + spin_lock_irq(&runtime->lock); + oldbuf = runtime->buffer; runtime->buffer = newbuf; runtime->buffer_size = params->buffer_size; + runtime->appl_ptr = runtime->hw_ptr = 0; + spin_unlock_irq(&runtime->lock); + kfree(oldbuf); } runtime->avail_min = params->avail_min; return 0;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexey Brodkin abrodkin@synopsys.com
commit 6e3761145a9ba3ce267c330b6bff51cf6a057b06 upstream.
swap was broken on ARC due to silly copy-paste issue.
We encode offset from swapcache page in __swp_entry() as (off << 13) but were not decoding back in __swp_offset() as (off >> 13) - it was still (off << 13).
This finally fixes swap usage on ARC.
| # mkswap /dev/sda2 | | # swapon -a -e /dev/sda2 | Adding 500728k swap on /dev/sda2. Priority:-2 extents:1 across:500728k | | # free | total used free shared buffers cached | Mem: 765104 13456 751648 4736 8 4736 | -/+ buffers/cache: 8712 756392 | Swap: 500728 0 500728
Cc: stable@vger.kernel.org Signed-off-by: Alexey Brodkin abrodkin@synopsys.com Signed-off-by: Vineet Gupta vgupta@synopsys.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arc/include/asm/pgtable.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arc/include/asm/pgtable.h +++ b/arch/arc/include/asm/pgtable.h @@ -378,7 +378,7 @@ void update_mmu_cache(struct vm_area_str
/* Decode a PTE containing swap "identifier "into constituents */ #define __swp_type(pte_lookalike) (((pte_lookalike).val) & 0x1f) -#define __swp_offset(pte_lookalike) ((pte_lookalike).val << 13) +#define __swp_offset(pte_lookalike) ((pte_lookalike).val >> 13)
/* NOPs, to keep generic kernel happy */ #define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vineet Gupta vgupta@synopsys.com
commit 93312b6da4df31e4102ce5420e6217135a16c7ea upstream.
mprotect(EXEC) was failing for stack mappings as default vm flags was missing MAYEXEC.
This was triggered by glibc test suite nptl/tst-execstack testcase
What is surprising is that despite running LTP for years on, we didn't catch this issue as it lacks a directed test case.
gcc dejagnu tests with nested functions also requiring exec stack work fine though because they rely on the GNU_STACK segment spit out by compiler and handled in kernel elf loader.
This glibc case is different as the stack is non exec to begin with and a dlopen of shared lib with GNU_STACK segment triggers the exec stack proceedings using a mprotect(PROT_EXEC) which was broken.
CC: stable@vger.kernel.org Signed-off-by: Vineet Gupta vgupta@synopsys.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arc/include/asm/page.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arc/include/asm/page.h +++ b/arch/arc/include/asm/page.h @@ -105,7 +105,7 @@ typedef pte_t * pgtable_t; #define virt_addr_valid(kaddr) pfn_valid(virt_to_pfn(kaddr))
/* Default Permissions for stack/heaps pages (Non Executable) */ -#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE) +#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
#define WANT_PAGE_VIRTUAL 1
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexey Brodkin Alexey.Brodkin@synopsys.com
commit 64234961c145606b36eaa82c47b11be842b21049 upstream.
We used to have pre-set CONFIG_INITRAMFS_SOURCE with local path to intramfs in ARC defconfigs. This was quite convenient for in-house development but not that convenient for newcomers who obviusly don't have folders like "arc_initramfs" next to the Linux source tree. Which leads to quite surprising failure of defconfig building: ------------------------------->8----------------------------- ../scripts/gen_initramfs_list.sh: Cannot open '../../arc_initramfs_hs/' ../usr/Makefile:57: recipe for target 'usr/initramfs_data.cpio.gz' failed make[2]: *** [usr/initramfs_data.cpio.gz] Error 1 ------------------------------->8-----------------------------
So now when more and more people start to deal with our defconfigs let's make their life easier with removal of CONFIG_INITRAMFS_SOURCE.
Signed-off-by: Alexey Brodkin abrodkin@synopsys.com Cc: Kevin Hilman khilman@baylibre.com Cc: stable@vger.kernel.org Signed-off-by: Alexey Brodkin abrodkin@synopsys.com Signed-off-by: Vineet Gupta vgupta@synopsys.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arc/configs/axs101_defconfig | 1 - arch/arc/configs/axs103_defconfig | 1 - arch/arc/configs/axs103_smp_defconfig | 1 - arch/arc/configs/nsim_700_defconfig | 1 - arch/arc/configs/nsim_hs_defconfig | 1 - arch/arc/configs/nsim_hs_smp_defconfig | 1 - arch/arc/configs/nsimosci_defconfig | 1 - arch/arc/configs/nsimosci_hs_defconfig | 1 - arch/arc/configs/nsimosci_hs_smp_defconfig | 1 - 9 files changed, 9 deletions(-)
--- a/arch/arc/configs/axs101_defconfig +++ b/arch/arc/configs/axs101_defconfig @@ -11,7 +11,6 @@ CONFIG_NAMESPACES=y # CONFIG_UTS_NS is not set # CONFIG_PID_NS is not set CONFIG_BLK_DEV_INITRD=y -CONFIG_INITRAMFS_SOURCE="../arc_initramfs/" CONFIG_EMBEDDED=y CONFIG_PERF_EVENTS=y # CONFIG_VM_EVENT_COUNTERS is not set --- a/arch/arc/configs/axs103_defconfig +++ b/arch/arc/configs/axs103_defconfig @@ -11,7 +11,6 @@ CONFIG_NAMESPACES=y # CONFIG_UTS_NS is not set # CONFIG_PID_NS is not set CONFIG_BLK_DEV_INITRD=y -CONFIG_INITRAMFS_SOURCE="../../arc_initramfs_hs/" CONFIG_EMBEDDED=y CONFIG_PERF_EVENTS=y # CONFIG_VM_EVENT_COUNTERS is not set --- a/arch/arc/configs/axs103_smp_defconfig +++ b/arch/arc/configs/axs103_smp_defconfig @@ -11,7 +11,6 @@ CONFIG_NAMESPACES=y # CONFIG_UTS_NS is not set # CONFIG_PID_NS is not set CONFIG_BLK_DEV_INITRD=y -CONFIG_INITRAMFS_SOURCE="../../arc_initramfs_hs/" CONFIG_EMBEDDED=y CONFIG_PERF_EVENTS=y # CONFIG_VM_EVENT_COUNTERS is not set --- a/arch/arc/configs/nsim_700_defconfig +++ b/arch/arc/configs/nsim_700_defconfig @@ -11,7 +11,6 @@ CONFIG_NAMESPACES=y # CONFIG_UTS_NS is not set # CONFIG_PID_NS is not set CONFIG_BLK_DEV_INITRD=y -CONFIG_INITRAMFS_SOURCE="../arc_initramfs/" CONFIG_KALLSYMS_ALL=y CONFIG_EMBEDDED=y CONFIG_PERF_EVENTS=y --- a/arch/arc/configs/nsim_hs_defconfig +++ b/arch/arc/configs/nsim_hs_defconfig @@ -11,7 +11,6 @@ CONFIG_NAMESPACES=y # CONFIG_UTS_NS is not set # CONFIG_PID_NS is not set CONFIG_BLK_DEV_INITRD=y -CONFIG_INITRAMFS_SOURCE="../../arc_initramfs_hs/" CONFIG_KALLSYMS_ALL=y CONFIG_EMBEDDED=y CONFIG_PERF_EVENTS=y --- a/arch/arc/configs/nsim_hs_smp_defconfig +++ b/arch/arc/configs/nsim_hs_smp_defconfig @@ -9,7 +9,6 @@ CONFIG_NAMESPACES=y # CONFIG_UTS_NS is not set # CONFIG_PID_NS is not set CONFIG_BLK_DEV_INITRD=y -CONFIG_INITRAMFS_SOURCE="../arc_initramfs_hs/" CONFIG_KALLSYMS_ALL=y CONFIG_EMBEDDED=y CONFIG_PERF_EVENTS=y --- a/arch/arc/configs/nsimosci_defconfig +++ b/arch/arc/configs/nsimosci_defconfig @@ -11,7 +11,6 @@ CONFIG_NAMESPACES=y # CONFIG_UTS_NS is not set # CONFIG_PID_NS is not set CONFIG_BLK_DEV_INITRD=y -CONFIG_INITRAMFS_SOURCE="../arc_initramfs/" CONFIG_KALLSYMS_ALL=y CONFIG_EMBEDDED=y CONFIG_PERF_EVENTS=y --- a/arch/arc/configs/nsimosci_hs_defconfig +++ b/arch/arc/configs/nsimosci_hs_defconfig @@ -11,7 +11,6 @@ CONFIG_NAMESPACES=y # CONFIG_UTS_NS is not set # CONFIG_PID_NS is not set CONFIG_BLK_DEV_INITRD=y -CONFIG_INITRAMFS_SOURCE="../arc_initramfs_hs/" CONFIG_KALLSYMS_ALL=y CONFIG_EMBEDDED=y CONFIG_PERF_EVENTS=y --- a/arch/arc/configs/nsimosci_hs_smp_defconfig +++ b/arch/arc/configs/nsimosci_hs_smp_defconfig @@ -9,7 +9,6 @@ CONFIG_IKCONFIG_PROC=y # CONFIG_UTS_NS is not set # CONFIG_PID_NS is not set CONFIG_BLK_DEV_INITRD=y -CONFIG_INITRAMFS_SOURCE="../arc_initramfs_hs/" CONFIG_PERF_EVENTS=y # CONFIG_COMPAT_BRK is not set CONFIG_KPROBES=y
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jing Xia jing.xia.mail@gmail.com
commit 9f15bde671355c351cf20d9f879004b234353100 upstream.
It was reported that a kernel crash happened in mem_cgroup_iter(), which can be triggered if the legacy cgroup-v1 non-hierarchical mode is used.
Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b8f ...... Call trace: mem_cgroup_iter+0x2e0/0x6d4 shrink_zone+0x8c/0x324 balance_pgdat+0x450/0x640 kswapd+0x130/0x4b8 kthread+0xe8/0xfc ret_from_fork+0x10/0x20
mem_cgroup_iter(): ...... if (css_tryget(css)) <-- crash here break; ......
The crashing reason is that mem_cgroup_iter() uses the memcg object whose pointer is stored in iter->position, which has been freed before and filled with POISON_FREE(0x6b).
And the root cause of the use-after-free issue is that invalidate_reclaim_iterators() fails to reset the value of iter->position to NULL when the css of the memcg is released in non- hierarchical mode.
Link: http://lkml.kernel.org/r/1531994807-25639-1-git-send-email-jing.xia@unisoc.c... Fixes: 6df38689e0e9 ("mm: memcontrol: fix possible memcg leak due to interrupted reclaim") Signed-off-by: Jing Xia jing.xia.mail@gmail.com Acked-by: Michal Hocko mhocko@suse.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Vladimir Davydov vdavydov.dev@gmail.com Cc: chunyan.zhang@unisoc.com Cc: Shakeel Butt shakeelb@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/memcontrol.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -895,7 +895,7 @@ static void invalidate_reclaim_iterators int nid; int i;
- while ((memcg = parent_mem_cgroup(memcg))) { + for (; memcg; memcg = parent_mem_cgroup(memcg)) { for_each_node(nid) { mz = mem_cgroup_nodeinfo(memcg, nid); for (i = 0; i <= DEF_PRIORITY; i++) {
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugh Dickins hughd@google.com
commit e1f1b1572e8db87a56609fd05bef76f98f0e456a upstream.
__split_huge_pmd_locked() must check if the cleared huge pmd was dirty, and propagate that to PageDirty: otherwise, data may be lost when a huge tmpfs page is modified then split then reclaimed.
How has this taken so long to be noticed? Because there was no problem when the huge page is written by a write system call (shmem_write_end() calls set_page_dirty()), nor when the page is allocated for a write fault (fault_dirty_shared_page() calls set_page_dirty()); but when allocated for a read fault (which MAP_POPULATE simulates), no set_page_dirty().
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1807111741430.1106@eggly.anvils Fixes: d21b9e57c74c ("thp: handle file pages in split_huge_pmd()") Signed-off-by: Hugh Dickins hughd@google.com Reported-by: Ashwin Chaugule ashwinch@google.com Reviewed-by: Yang Shi yang.shi@linux.alibaba.com Reviewed-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: "Huang, Ying" ying.huang@intel.com Cc: stable@vger.kernel.org [4.8+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/huge_memory.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1642,6 +1642,8 @@ static void __split_huge_pmd_locked(stru if (vma_is_dax(vma)) return; page = pmd_page(_pmd); + if (!PageDirty(page) && pmd_dirty(_pmd)) + set_page_dirty(page); if (!PageReferenced(page) && pmd_young(_pmd)) SetPageReferenced(page); page_remove_rmap(page, true);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gustavo A. R. Silva gustavo@embeddedor.com
commit 0e714d27786ce1fb3efa9aac58abc096e68b1c2a upstream.
info.index can be indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/vfio/pci/vfio_pci.c:734 vfio_pci_ioctl() warn: potential spectre issue 'vdev->region'
Fix this by sanitizing info.index before indirectly using it to index vdev->region
Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva gustavo@embeddedor.com Signed-off-by: Alex Williamson alex.williamson@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/vfio/pci/vfio_pci.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -28,6 +28,7 @@ #include <linux/uaccess.h> #include <linux/vfio.h> #include <linux/vgaarb.h> +#include <linux/nospec.h>
#include "vfio_pci_private.h"
@@ -755,6 +756,9 @@ static long vfio_pci_ioctl(void *device_ if (info.index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions) return -EINVAL; + info.index = array_index_nospec(info.index, + VFIO_PCI_NUM_REGIONS + + vdev->num_regions);
i = info.index - VFIO_PCI_NUM_REGIONS;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tyler Hicks tyhicks@canonical.com
[ Upstream commit 70ba5b6db96ff7324b8cfc87e0d0383cf59c9677 ]
The low and high values of the net.ipv4.ping_group_range sysctl were being silently forced to the default disabled state when a write to the sysctl contained GIDs that didn't map to the associated user namespace. Confusingly, the sysctl's write operation would return success and then a subsequent read of the sysctl would indicate that the low and high values are the overflowgid.
This patch changes the behavior by clearly returning an error when the sysctl write operation receives a GID range that doesn't map to the associated user namespace. In such a situation, the previous value of the sysctl is preserved and that range will be returned in a subsequent read of the sysctl.
Signed-off-by: Tyler Hicks tyhicks@canonical.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/sysctl_net_ipv4.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -140,8 +140,9 @@ static int ipv4_ping_group_range(struct if (write && ret == 0) { low = make_kgid(user_ns, urange[0]); high = make_kgid(user_ns, urange[1]); - if (!gid_valid(low) || !gid_valid(high) || - (urange[1] < urange[0]) || gid_lt(high, low)) { + if (!gid_valid(low) || !gid_valid(high)) + return -EINVAL; + if (urange[1] < urange[0] || gid_lt(high, low)) { low = make_kgid(&init_user_ns, 1); high = make_kgid(&init_user_ns, 0); }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Colin Ian King colin.king@canonical.com
[ Upstream commit 169dc027fb02492ea37a0575db6a658cf922b854 ]
The rol32 call is currently rotating hash but the rol'd value is being discarded. I believe the current code is incorrect and hash should be assigned the rotated value returned from rol32.
Thanks to David Lebrun for spotting this.
Signed-off-by: Colin Ian King colin.king@canonical.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/ipv6.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -794,7 +794,7 @@ static inline __be32 ip6_make_flowlabel( * to minimize possbility that any useful information to an * attacker is leaked. Only lower 20 bits are relevant. */ - rol32(hash, 16); + hash = rol32(hash, 16);
flowlabel = (__force __be32)hash & IPV6_FLOWLABEL_MASK;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Davidlohr Bueso dave@stgolabs.net
[ Upstream commit 107d01f5ba10f4162c38109496607eb197059064 ]
rhashtable_init() currently does not take into account the user-passed min_size parameter unless param->nelem_hint is set as well. As such, the default size (number of buckets) will always be HASH_DEFAULT_SIZE even if the smallest allowed size is larger than that. Remediate this by unconditionally calling into rounded_hashtable_size() and handling things accordingly.
Signed-off-by: Davidlohr Bueso dbueso@suse.de Acked-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- lib/rhashtable.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
--- a/lib/rhashtable.c +++ b/lib/rhashtable.c @@ -783,8 +783,16 @@ EXPORT_SYMBOL_GPL(rhashtable_walk_stop);
static size_t rounded_hashtable_size(const struct rhashtable_params *params) { - return max(roundup_pow_of_two(params->nelem_hint * 4 / 3), - (unsigned long)params->min_size); + size_t retsize; + + if (params->nelem_hint) + retsize = max(roundup_pow_of_two(params->nelem_hint * 4 / 3), + (unsigned long)params->min_size); + else + retsize = max(HASH_DEFAULT_SIZE, + (unsigned long)params->min_size); + + return retsize; }
static u32 rhashtable_jhash2(const void *key, u32 length, u32 seed) @@ -841,8 +849,6 @@ int rhashtable_init(struct rhashtable *h struct bucket_table *tbl; size_t size;
- size = HASH_DEFAULT_SIZE; - if ((!params->key_len && !params->obj_hashfn) || (params->obj_hashfn && !params->obj_cmpfn)) return -EINVAL; @@ -869,8 +875,7 @@ int rhashtable_init(struct rhashtable *h
ht->p.min_size = max(ht->p.min_size, HASH_MIN_SIZE);
- if (params->nelem_hint) - size = rounded_hashtable_size(&ht->p); + size = rounded_hashtable_size(&ht->p);
/* The maximum (not average) chain length grows with the * size of the hash table, at a rate of (log N)/(log log N).
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lorenzo Colitti lorenzo@google.com
[ Upstream commit acc2cf4e37174646a24cba42fa53c668b2338d4e ]
When tcp_diag_destroy closes a TCP_NEW_SYN_RECV socket, it first frees it by calling inet_csk_reqsk_queue_drop_and_and_put in tcp_abort, and then frees it again by calling sock_gen_put.
Since tcp_abort only has one caller, and all the other codepaths in tcp_abort don't free the socket, just remove the free in that function.
Cc: David Ahern dsa@cumulusnetworks.com Tested: passes Android sock_diag_test.py, which exercises this codepath Fixes: d7226c7a4dd1 ("net: diag: Fix refcnt leak in error path destroying socket") Signed-off-by: Lorenzo Colitti lorenzo@google.com Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: David Ahern dsa@cumulusnetworks.com Tested-by: David Ahern dsa@cumulusnetworks.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3238,8 +3238,7 @@ int tcp_abort(struct sock *sk, int err) struct request_sock *req = inet_reqsk(sk);
local_bh_disable(); - inet_csk_reqsk_queue_drop_and_put(req->rsk_listener, - req); + inet_csk_reqsk_queue_drop(req->rsk_listener, req); local_bh_enable(); return 0; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Ahern dsahern@gmail.com
[ Upstream commit e7372197e15856ec4ee66b668020a662994db103 ]
Xin reported that icmp replies may not use the address on the device the echo request is received if the destination address is broadcast. Instead a route lookup is done without considering VRF context. Fix by setting oif in flow struct to the master device if it is enslaved. That directs the lookup to the VRF table. If the device is not enslaved, oif is still 0 so no affect.
Fixes: cd2fbe1b6b51 ("net: Use VRF device index for lookups on RX") Reported-by: Xin Long lucien.xin@gmail.com Signed-off-by: David Ahern dsahern@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/fib_frontend.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -290,6 +290,7 @@ __be32 fib_compute_spec_dst(struct sk_bu if (!ipv4_is_zeronet(ip_hdr(skb)->saddr)) { struct flowi4 fl4 = { .flowi4_iif = LOOPBACK_IFINDEX, + .flowi4_oif = l3mdev_master_ifindex_rcu(dev), .daddr = ip_hdr(skb)->saddr, .flowi4_tos = RT_TOS(ip_hdr(skb)->tos), .flowi4_scope = scope,
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Heiner Kallweit hkallweit1@gmail.com
[ Upstream commit df8ed346d4a806a6eef2db5924285e839604b3f9 ]
Currently also the pause flags are removed from phydev->supported because they're not included in PHY_DEFAULT_FEATURES. I don't think this is intended, especially when considering that this function can be called via phy_set_max_speed() anywhere in a driver. Change the masking to mask out only the values we're going to change. In addition remove the misleading comment, job of this small function is just to adjust the supported and advertised speeds.
Fixes: f3a6bd393c2c ("phylib: Add phy_set_max_speed helper") Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Reviewed-by: Andrew Lunn andrew@lunn.ch Reviewed-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/phy/phy_device.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-)
--- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -1579,11 +1579,8 @@ static int gen10g_resume(struct phy_devi
static int __set_phy_supported(struct phy_device *phydev, u32 max_speed) { - /* The default values for phydev->supported are provided by the PHY - * driver "features" member, we want to reset to sane defaults first - * before supporting higher speeds. - */ - phydev->supported &= PHY_DEFAULT_FEATURES; + phydev->supported &= ~(PHY_1000BT_FEATURES | PHY_100BT_FEATURES | + PHY_10BT_FEATURES);
switch (max_speed) { default:
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Gustavo A. R. Silva" gustavo@embeddedor.com
[ Upstream commit 9ba8376ce1e2cbf4ce44f7e4bee1d0648e10d594 ]
It seems that a *break* is missing in order to avoid falling through to the default case. Otherwise, checking *chan* makes no sense.
Fixes: 72df7a7244c0 ("ptp: Allow reassigning calibration pin function") Signed-off-by: Gustavo A. R. Silva gustavo@embeddedor.com Acked-by: Richard Cochran richardcochran@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ptp/ptp_chardev.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/ptp/ptp_chardev.c +++ b/drivers/ptp/ptp_chardev.c @@ -89,6 +89,7 @@ int ptp_set_pinfunc(struct ptp_clock *pt case PTP_PF_PHYSYNC: if (chan != 0) return -EINVAL; + break; default: return -EINVAL; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sanjeev Bansal sanjeevb.bansal@broadcom.com
[ Upstream commit 3a498606bb04af603a46ebde8296040b2de350d1 ]
This patch has fix for TX timeout while running bi-directional traffic with 100 Mbps using 5762.
Signed-off-by: Sanjeev Bansal sanjeevb.bansal@broadcom.com Signed-off-by: Siva Reddy Kallam siva.kallam@broadcom.com Reviewed-by: Michael Chan michael.chan@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/broadcom/tg3.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/drivers/net/ethernet/broadcom/tg3.c +++ b/drivers/net/ethernet/broadcom/tg3.c @@ -9276,6 +9276,15 @@ static int tg3_chip_reset(struct tg3 *tp
tg3_restore_clk(tp);
+ /* Increase the core clock speed to fix tx timeout issue for 5762 + * with 100Mbps link speed. + */ + if (tg3_asic_rev(tp) == ASIC_REV_5762) { + val = tr32(TG3_CPMU_CLCK_ORIDE_ENABLE); + tw32(TG3_CPMU_CLCK_ORIDE_ENABLE, val | + TG3_CPMU_MAC_ORIDE_ENABLE); + } + /* Reprobe ASF enable state. */ tg3_flag_clear(tp, ENABLE_ASF); tp->phy_flags &= ~(TG3_PHYFLG_1G_ON_VAUX_OK |
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Couzens lynxis@fe80.eu
[ Upstream commit 5c968f48021a9b3faa61ac2543cfab32461c0e05 ]
mii_nway_restart is not pm aware which results in a rtnl deadlock. Implement mii_nway_restart manual by setting BMCR_ANRESTART if BMCR_ANENABLE is set.
To reproduce: * plug an asix based usb network interface * wait until the device enters PM (~5 sec) * `ip link set eth1 up` will never return
Fixes: d9fe64e51114 ("net: asix: Add in_pm parameter") Signed-off-by: Alexander Couzens lynxis@fe80.eu Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/asix_devices.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/net/usb/asix_devices.c +++ b/drivers/net/usb/asix_devices.c @@ -640,10 +640,12 @@ static void ax88772_restore_phy(struct u priv->presvd_phy_advertise);
/* Restore BMCR */ + if (priv->presvd_phy_bmcr & BMCR_ANENABLE) + priv->presvd_phy_bmcr |= BMCR_ANRESTART; + asix_mdio_write_nopm(dev->net, dev->mii.phy_id, MII_BMCR, priv->presvd_phy_bmcr);
- mii_nway_restart(&dev->mii); priv->presvd_phy_advertise = 0; priv->presvd_phy_bmcr = 0; }
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefano Brivio sbrivio@redhat.com
[ Upstream commit 8b7008620b8452728cadead460a36f64ed78c460 ]
The pfmemalloc flag indicates that the skb was allocated from the PFMEMALLOC reserves, and the flag is currently copied on skb copy and clone.
However, an skb copied from an skb flagged with pfmemalloc wasn't necessarily allocated from PFMEMALLOC reserves, and on the other hand an skb allocated that way might be copied from an skb that wasn't.
So we should not copy the flag on skb copy, and rather decide whether to allow an skb to be associated with sockets unrelated to page reclaim depending only on how it was allocated.
Move the pfmemalloc flag before headers_start[0] using an existing 1-bit hole, so that __copy_skb_header() doesn't copy it.
When cloning, we'll now take care of this flag explicitly, contravening to the warning comment of __skb_clone().
While at it, restore the newline usage introduced by commit b19372273164 ("net: reorganize sk_buff for faster __copy_skb_header()") to visually separate bytes used in bitfields after headers_start[0], that was gone after commit a9e419dc7be6 ("netfilter: merge ctinfo into nfct pointer storage area"), and describe the pfmemalloc flag in the kernel-doc structure comment.
This doesn't change the size of sk_buff or cacheline boundaries, but consolidates the 15 bits hole before tc_index into a 2 bytes hole before csum, that could now be filled more easily.
Reported-by: Patrick Talbert ptalbert@redhat.com Fixes: c93bdd0e03e8 ("netvm: allow skb allocation to use PFMEMALLOC reserves") Signed-off-by: Stefano Brivio sbrivio@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/skbuff.h | 10 +++++----- net/core/skbuff.c | 2 ++ 2 files changed, 7 insertions(+), 5 deletions(-)
--- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -602,6 +602,7 @@ static inline bool skb_mstamp_after(cons * @hash: the packet hash * @queue_mapping: Queue mapping for multiqueue devices * @xmit_more: More SKBs are pending for this queue + * @pfmemalloc: skbuff was allocated from PFMEMALLOC reserves * @ndisc_nodetype: router type (from link layer) * @ooo_okay: allow the mapping of a socket to a queue to be changed * @l4_hash: indicate hash is a canonical 4-tuple hash over transport @@ -692,7 +693,7 @@ struct sk_buff { peeked:1, head_frag:1, xmit_more:1, - __unused:1; /* one bit hole */ + pfmemalloc:1; kmemcheck_bitfield_end(flags1);
/* fields enclosed in headers_start/headers_end are copied @@ -712,19 +713,18 @@ struct sk_buff {
__u8 __pkt_type_offset[0]; __u8 pkt_type:3; - __u8 pfmemalloc:1; __u8 ignore_df:1; __u8 nfctinfo:3; - __u8 nf_trace:1; + __u8 ip_summed:2; __u8 ooo_okay:1; __u8 l4_hash:1; __u8 sw_hash:1; __u8 wifi_acked_valid:1; __u8 wifi_acked:1; - __u8 no_fcs:1; + /* Indicates the inner headers are valid in the skbuff. */ __u8 encapsulation:1; __u8 encap_hdr_csum:1; @@ -732,11 +732,11 @@ struct sk_buff { __u8 csum_complete_sw:1; __u8 csum_level:2; __u8 csum_bad:1; - #ifdef CONFIG_IPV6_NDISC_NODETYPE __u8 ndisc_nodetype:2; #endif __u8 ipvs_property:1; + __u8 inner_protocol_type:1; __u8 remcsum_offload:1; #ifdef CONFIG_NET_SWITCHDEV --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -904,6 +904,8 @@ static struct sk_buff *__skb_clone(struc n->cloned = 1; n->nohdr = 0; n->peeked = 0; + if (skb->pfmemalloc) + n->pfmemalloc = 1; n->destructor = NULL; C(tail); C(end);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefano Brivio sbrivio@redhat.com
[ Upstream commit e78bfb0751d4e312699106ba7efbed2bab1a53ca ]
Commit 8b7008620b84 ("net: Don't copy pfmemalloc flag in __copy_skb_header()") introduced a different handling for the pfmemalloc flag in copy and clone paths.
In __skb_clone(), now, the flag is set only if it was set in the original skb, but not cleared if it wasn't. This is wrong and might lead to socket buffers being flagged with pfmemalloc even if the skb data wasn't allocated from pfmemalloc reserves. Copy the flag instead of ORing it.
Reported-by: Sabrina Dubroca sd@queasysnail.net Fixes: 8b7008620b84 ("net: Don't copy pfmemalloc flag in __copy_skb_header()") Signed-off-by: Stefano Brivio sbrivio@redhat.com Tested-by: Sabrina Dubroca sd@queasysnail.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/skbuff.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -904,8 +904,7 @@ static struct sk_buff *__skb_clone(struc n->cloned = 1; n->nohdr = 0; n->peeked = 0; - if (skb->pfmemalloc) - n->pfmemalloc = 1; + C(pfmemalloc); n->destructor = NULL; C(tail); C(end);
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mathias Nyman mathias.nyman@linux.intel.com
commit 229bc19fd7aca4f37964af06e3583c1c8f36b5d6 upstream.
Don't rely on event interrupt (EINT) bit alone to detect pending port change in resume. If no change event is detected the host may be suspended again, oterwise roothubs are resumed.
There is a lag in xHC setting EINT. If we don't notice the pending change in resume, and the controller is runtime suspeded again, it causes the event handler to assume host is dead as it will fail to read xHC registers once PCI puts the controller to D3 state.
[ 268.520969] xhci_hcd: xhci_resume: starting port polling. [ 268.520985] xhci_hcd: xhci_hub_status_data: stopping port polling. [ 268.521030] xhci_hcd: xhci_suspend: stopping port polling. [ 268.521040] xhci_hcd: // Setting command ring address to 0x349bd001 [ 268.521139] xhci_hcd: Port Status Change Event for port 3 [ 268.521149] xhci_hcd: resume root hub [ 268.521163] xhci_hcd: port resume event for port 3 [ 268.521168] xhci_hcd: xHC is not running. [ 268.521174] xhci_hcd: handle_port_status: starting port polling. [ 268.596322] xhci_hcd: xhci_hc_died: xHCI host controller not responding, assume dead
The EINT lag is described in a additional note in xhci specs 4.19.2:
"Due to internal xHC scheduling and system delays, there will be a lag between a change bit being set and the Port Status Change Event that it generated being written to the Event Ring. If SW reads the PORTSC and sees a change bit set, there is no guarantee that the corresponding Port Status Change Event has already been written into the Event Ring."
Cc: stable@vger.kernel.org Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Signed-off-by: Kai-Heng Feng kai.heng.feng@canonical.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/host/xhci.c | 40 +++++++++++++++++++++++++++++++++++++--- drivers/usb/host/xhci.h | 4 ++++ 2 files changed, 41 insertions(+), 3 deletions(-)
--- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -891,6 +891,41 @@ static void xhci_disable_port_wake_on_bi spin_unlock_irqrestore(&xhci->lock, flags); }
+static bool xhci_pending_portevent(struct xhci_hcd *xhci) +{ + __le32 __iomem **port_array; + int port_index; + u32 status; + u32 portsc; + + status = readl(&xhci->op_regs->status); + if (status & STS_EINT) + return true; + /* + * Checking STS_EINT is not enough as there is a lag between a change + * bit being set and the Port Status Change Event that it generated + * being written to the Event Ring. See note in xhci 1.1 section 4.19.2. + */ + + port_index = xhci->num_usb2_ports; + port_array = xhci->usb2_ports; + while (port_index--) { + portsc = readl(port_array[port_index]); + if (portsc & PORT_CHANGE_MASK || + (portsc & PORT_PLS_MASK) == XDEV_RESUME) + return true; + } + port_index = xhci->num_usb3_ports; + port_array = xhci->usb3_ports; + while (port_index--) { + portsc = readl(port_array[port_index]); + if (portsc & PORT_CHANGE_MASK || + (portsc & PORT_PLS_MASK) == XDEV_RESUME) + return true; + } + return false; +} + /* * Stop HC (not bus-specific) * @@ -987,7 +1022,7 @@ EXPORT_SYMBOL_GPL(xhci_suspend); */ int xhci_resume(struct xhci_hcd *xhci, bool hibernated) { - u32 command, temp = 0, status; + u32 command, temp = 0; struct usb_hcd *hcd = xhci_to_hcd(xhci); struct usb_hcd *secondary_hcd; int retval = 0; @@ -1109,8 +1144,7 @@ int xhci_resume(struct xhci_hcd *xhci, b done: if (retval == 0) { /* Resume root hubs only when have pending events. */ - status = readl(&xhci->op_regs->status); - if (status & STS_EINT) { + if (xhci_pending_portevent(xhci)) { usb_hcd_resume_root_hub(xhci->shared_hcd); usb_hcd_resume_root_hub(hcd); } --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -385,6 +385,10 @@ struct xhci_op_regs { #define PORT_PLC (1 << 22) /* port configure error change - port failed to configure its link partner */ #define PORT_CEC (1 << 23) +#define PORT_CHANGE_MASK (PORT_CSC | PORT_PEC | PORT_WRC | PORT_OCC | \ + PORT_RC | PORT_PLC | PORT_CEC) + + /* Cold Attach Status - xHC can set this bit to report device attached during * Sx state. Warm port reset should be perfomed to clear this bit and move port * to connected state.
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chuck Lever chuck.lever@oracle.com
commit a8f688ec437dc2045cc8f0c89fe877d5803850da upstream.
The use of -EAGAIN in rpcrdma_convert_iovs() is a latent bug: the transport never calls xprt_write_space() when more pages become available. -ENOBUFS will trigger the correct "delay briefly and call again" logic.
Fixes: 7a89f9c626e3 ("xprtrdma: Honor ->send_request API contract") Signed-off-by: Chuck Lever chuck.lever@oracle.com Cc: stable@vger.kernel.org # 4.8+ Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sunrpc/xprtrdma/rpc_rdma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/sunrpc/xprtrdma/rpc_rdma.c +++ b/net/sunrpc/xprtrdma/rpc_rdma.c @@ -229,7 +229,7 @@ rpcrdma_convert_iovs(struct rpcrdma_xprt /* alloc the pagelist for receiving buffer */ ppages[p] = alloc_page(GFP_ATOMIC); if (!ppages[p]) - return -EAGAIN; + return -ENOBUFS; } seg[n].mr_page = ppages[p]; seg[n].mr_offset = (void *)(unsigned long) page_base;
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alan Jenkins alan.christopher.jenkins@gmail.com
commit 1dc3039bc87ae7d19a990c3ee71cfd8a9068f428 upstream.
When blk_queue_enter() waits for a queue to unfreeze, or unset the PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.
The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec ("block, scsi: Make SCSI quiesce and resume work reliably"). Note the SCSI device is resumed asynchronously, i.e. after un-freezing userspace tasks.
So that commit exposed the bug as a regression in v4.15. A mysterious SIGBUS (or -EIO) sometimes happened during the time the device was being resumed. Most frequently, there was no kernel log message, and we saw Xorg or Xwayland killed by SIGBUS.[1]
[1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979
Without this fix, I get an IO error in this test:
# dd if=/dev/sda of=/dev/null iflag=direct & \ while killall -SIGUSR1 dd; do sleep 0.1; done & \ echo mem > /sys/power/state ; \ sleep 5; killall dd # stop after 5 seconds
The interruptible wait was added to blk_queue_enter in commit 3ef28e83ab15 ("block: generic request_queue reference counting"). Before then, the interruptible wait was only in blk-mq, but I don't think it could ever have been correct.
Reviewed-by: Bart Van Assche bart.vanassche@wdc.com Cc: stable@vger.kernel.org Signed-off-by: Alan Jenkins alan.christopher.jenkins@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- block/blk-core.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-)
--- a/block/blk-core.c +++ b/block/blk-core.c @@ -652,7 +652,6 @@ EXPORT_SYMBOL(blk_alloc_queue); int blk_queue_enter(struct request_queue *q, bool nowait) { while (true) { - int ret;
if (percpu_ref_tryget_live(&q->q_usage_counter)) return 0; @@ -660,13 +659,11 @@ int blk_queue_enter(struct request_queue if (nowait) return -EBUSY;
- ret = wait_event_interruptible(q->mq_freeze_wq, - !atomic_read(&q->mq_freeze_depth) || - blk_queue_dying(q)); + wait_event(q->mq_freeze_wq, + !atomic_read(&q->mq_freeze_depth) || + blk_queue_dying(q)); if (blk_queue_dying(q)) return -ENODEV; - if (ret) - return ret; } }
On Mon, Jul 23, 2018 at 02:25:00PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.115 release. There are 28 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Jul 25 12:24:13 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.115-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
Merged, compiled with -Werror, and installed onto my OnePlus 6.
No initial issues noticed in dmesg or general usage.
Thanks! Nathan
On Mon, Jul 23, 2018 at 08:12:09AM -0700, Nathan Chancellor wrote:
On Mon, Jul 23, 2018 at 02:25:00PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.115 release. There are 28 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Jul 25 12:24:13 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.115-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
Merged, compiled with -Werror, and installed onto my OnePlus 6.
No initial issues noticed in dmesg or general usage.
Thanks for testing 2 of these and letting me know.
greg k-h
On 23 July 2018 at 17:55, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.9.115 release. There are 28 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Jul 25 12:24:13 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.115-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm and x86_64.
Summary ------------------------------------------------------------------------
kernel: 4.9.115-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.9.y git commit: e318b12e62e640f5d4bf996b28097812d10ec02c git describe: v4.9.114-29-ge318b12e62e6 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.114-29-...
No regressions (compared to build v4.9.113-67-g97637db053bf)
Ran 16395 total tests in the following environments and test suites.
Environments -------------- - dragonboard-410c - arm64 - hi6220-hikey - arm64 - juno-r2 - arm64 - qemu_arm - qemu_arm64 - qemu_x86_64 - x15 - arm - x86_64
Test Suites ----------- * boot * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-containers-tests * ltp-cve-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * ltp-open-posix-tests * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none
On Mon, Jul 23, 2018 at 02:25:00PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.115 release. There are 28 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed Jul 25 12:24:13 UTC 2018. Anything received after that time might be too late.
Build results: total: 148 pass: 148 fail: 0 Qemu test results: total: 166 pass: 166 fail: 0
Details are available at http://kerneltests.org/builders/.
Guenter
linux-stable-mirror@lists.linaro.org