This is the start of the stable review cycle for the 3.18.117 release. There are 27 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Jul 29 10:26:38 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.18.117-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-3.18.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 3.18.117-rc1
Arnd Bergmann arnd@arndb.de turn off -Wattribute-alias
Arnd Bergmann arnd@arndb.de ARM: fix put_user() for gcc-8
Anssi Hannula anssi.hannula@bitwise.fi can: xilinx_can: fix RX overflow interrupt not being enabled
Anssi Hannula anssi.hannula@bitwise.fi can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting
Anssi Hannula anssi.hannula@bitwise.fi can: xilinx_can: fix device dropping off bus on RX overrun
Anssi Hannula anssi.hannula@bitwise.fi can: xilinx_can: fix RX loop if RXNEMP is asserted without RXOK
Jerry Zhang zhangjerry@google.com usb: gadget: f_fs: Only return delayed status when len is 0
Bin Liu b-liu@ti.com usb: core: handle hub C_PORT_OVER_CURRENT condition
Lubomir Rintel lkundrak@v3.sk usb: cdc_acm: Add quirk for Castles VEGA3000
Eric Dumazet edumazet@google.com tcp: detect malicious patterns in tcp_collapse_ofo_queue()
Eric Dumazet edumazet@google.com tcp: avoid collapses in tcp_prune_queue() if possible
Yuchung Cheng ycheng@google.com tcp: do not delay ACK in DCTCP upon CE status change
Yuchung Cheng ycheng@google.com tcp: do not cancel delay-AcK on DCTCP special ACK
Yuchung Cheng ycheng@google.com tcp: helpers to send special DCTCP ack
Yuchung Cheng ycheng@google.com tcp: fix dctcp delayed ACK schedule
Roopa Prabhu roopa@cumulusnetworks.com rtnetlink: add rtnl_link_state check in rtnl_configure_link
Jack Morgenstein jackm@dev.mellanox.co.il net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper
Paolo Abeni pabeni@redhat.com ip: hash fragments consistently
Stefano Brivio sbrivio@redhat.com skbuff: Unconditionally copy pfmemalloc in __skb_clone()
Stefano Brivio sbrivio@redhat.com net: Don't copy pfmemalloc flag in __copy_skb_header()
Gustavo A. R. Silva gustavo@embeddedor.com ptp: fix missing break in switch
Tyler Hicks tyhicks@canonical.com ipv4: Return EINVAL when ping_group_range sysctl doesn't map to user ns
Vineet Gupta vgupta@synopsys.com ARC: mm: allow mprotect to make stack mappings executable
Alexey Brodkin abrodkin@synopsys.com ARC: Fix CONFIG_SWAP
Takashi Iwai tiwai@suse.de ALSA: rawmidi: Change resized buffers atomically
OGAWA Hirofumi hirofumi@mail.parknet.co.jp fat: fix memory allocation failure handling of match_strdup()
Dewet Thibaut thibaut.dewet@nokia.com x86/MCE: Remove min interval polling limitation
-------------
Diffstat:
Makefile | 5 +- arch/arc/include/asm/page.h | 2 +- arch/arc/include/asm/pgtable.h | 2 +- arch/arm/include/asm/uaccess.h | 2 +- arch/x86/kernel/cpu/mcheck/mce.c | 3 - drivers/net/can/xilinx_can.c | 98 ++++++++++++++++------ .../net/ethernet/mellanox/mlx4/resource_tracker.c | 2 +- drivers/ptp/ptp_chardev.c | 1 + drivers/usb/class/cdc-acm.c | 3 + drivers/usb/core/hub.c | 8 +- drivers/usb/gadget/function/f_fs.c | 2 +- fs/fat/inode.c | 20 +++-- include/linux/skbuff.h | 12 +-- include/net/tcp.h | 2 + net/core/rtnetlink.c | 9 +- net/core/skbuff.c | 1 + net/ipv4/ip_output.c | 2 + net/ipv4/sysctl_net_ipv4.c | 5 +- net/ipv4/tcp_dctcp.c | 50 ++++------- net/ipv4/tcp_input.c | 21 ++++- net/ipv4/tcp_output.c | 33 ++++++-- net/ipv6/ip6_output.c | 2 + sound/core/rawmidi.c | 20 +++-- 23 files changed, 198 insertions(+), 107 deletions(-)
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dewet Thibaut thibaut.dewet@nokia.com
commit fbdb328c6bae0a7c78d75734a738b66b86dffc96 upstream.
commit b3b7c4795c ("x86/MCE: Serialize sysfs changes") introduced a min interval limitation when setting the check interval for polled MCEs. However, the logic is that 0 disables polling for corrected MCEs, see Documentation/x86/x86_64/machinecheck. The limitation prevents disabling.
Remove this limitation and allow the value 0 to disable polling again.
Fixes: b3b7c4795c ("x86/MCE: Serialize sysfs changes") Signed-off-by: Dewet Thibaut thibaut.dewet@nokia.com Signed-off-by: Alexander Sverdlin alexander.sverdlin@nokia.com [ Massage commit message. ] Signed-off-by: Borislav Petkov bp@suse.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Tony Luck tony.luck@intel.com Cc: linux-edac linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20180716084927.24869-1-alexander.sverdlin@nokia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/cpu/mcheck/mce.c | 3 --- 1 file changed, 3 deletions(-)
--- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -2240,9 +2240,6 @@ static ssize_t store_int_with_restart(st if (check_interval == old_check_interval) return ret;
- if (check_interval < 1) - check_interval = 1; - mutex_lock(&mce_sysfs_mutex); mce_restart(); mutex_unlock(&mce_sysfs_mutex);
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: OGAWA Hirofumi hirofumi@mail.parknet.co.jp
commit 35033ab988c396ad7bce3b6d24060c16a9066db8 upstream.
In parse_options(), if match_strdup() failed, parse_options() leaves opts->iocharset in unexpected state (i.e. still pointing the freed string). And this can be the cause of double free.
To fix, this initialize opts->iocharset always when freeing.
Link: http://lkml.kernel.org/r/8736wp9dzc.fsf@mail.parknet.co.jp Signed-off-by: OGAWA Hirofumi hirofumi@mail.parknet.co.jp Reported-by: syzbot+90b8e10515ae88228a92@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/fat/inode.c | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-)
--- a/fs/fat/inode.c +++ b/fs/fat/inode.c @@ -610,13 +610,21 @@ static void fat_set_state(struct super_b brelse(bh); }
+static void fat_reset_iocharset(struct fat_mount_options *opts) +{ + if (opts->iocharset != fat_default_iocharset) { + /* Note: opts->iocharset can be NULL here */ + kfree(opts->iocharset); + opts->iocharset = fat_default_iocharset; + } +} + static void delayed_free(struct rcu_head *p) { struct msdos_sb_info *sbi = container_of(p, struct msdos_sb_info, rcu); unload_nls(sbi->nls_disk); unload_nls(sbi->nls_io); - if (sbi->options.iocharset != fat_default_iocharset) - kfree(sbi->options.iocharset); + fat_reset_iocharset(&sbi->options); kfree(sbi); }
@@ -1031,7 +1039,7 @@ static int parse_options(struct super_bl opts->fs_fmask = opts->fs_dmask = current_umask(); opts->allow_utime = -1; opts->codepage = fat_default_codepage; - opts->iocharset = fat_default_iocharset; + fat_reset_iocharset(opts); if (is_vfat) { opts->shortname = VFAT_SFN_DISPLAY_WINNT|VFAT_SFN_CREATE_WIN95; opts->rodir = 0; @@ -1181,8 +1189,7 @@ static int parse_options(struct super_bl
/* vfat specific */ case Opt_charset: - if (opts->iocharset != fat_default_iocharset) - kfree(opts->iocharset); + fat_reset_iocharset(opts); iocharset = match_strdup(&args[0]); if (!iocharset) return -ENOMEM; @@ -1774,8 +1781,7 @@ out_fail: iput(fat_inode); unload_nls(sbi->nls_io); unload_nls(sbi->nls_disk); - if (sbi->options.iocharset != fat_default_iocharset) - kfree(sbi->options.iocharset); + fat_reset_iocharset(&sbi->options); sb->s_fs_info = NULL; kfree(sbi); return error;
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit 39675f7a7c7e7702f7d5341f1e0d01db746543a0 upstream.
The SNDRV_RAWMIDI_IOCTL_PARAMS ioctl may resize the buffers and the current code is racy. For example, the sequencer client may write to buffer while it being resized.
As a simple workaround, let's switch to the resized buffer inside the stream runtime lock.
Reported-by: syzbot+52f83f0ea8df16932f7f@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/core/rawmidi.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-)
--- a/sound/core/rawmidi.c +++ b/sound/core/rawmidi.c @@ -645,7 +645,7 @@ static int snd_rawmidi_info_select_user( int snd_rawmidi_output_params(struct snd_rawmidi_substream *substream, struct snd_rawmidi_params * params) { - char *newbuf; + char *newbuf, *oldbuf; struct snd_rawmidi_runtime *runtime = substream->runtime; if (substream->append && substream->use_count > 1) @@ -658,13 +658,17 @@ int snd_rawmidi_output_params(struct snd return -EINVAL; } if (params->buffer_size != runtime->buffer_size) { - newbuf = krealloc(runtime->buffer, params->buffer_size, - GFP_KERNEL); + newbuf = kmalloc(params->buffer_size, GFP_KERNEL); if (!newbuf) return -ENOMEM; + spin_lock_irq(&runtime->lock); + oldbuf = runtime->buffer; runtime->buffer = newbuf; runtime->buffer_size = params->buffer_size; runtime->avail = runtime->buffer_size; + runtime->appl_ptr = runtime->hw_ptr = 0; + spin_unlock_irq(&runtime->lock); + kfree(oldbuf); } runtime->avail_min = params->avail_min; substream->active_sensing = !params->no_active_sensing; @@ -675,7 +679,7 @@ EXPORT_SYMBOL(snd_rawmidi_output_params) int snd_rawmidi_input_params(struct snd_rawmidi_substream *substream, struct snd_rawmidi_params * params) { - char *newbuf; + char *newbuf, *oldbuf; struct snd_rawmidi_runtime *runtime = substream->runtime;
snd_rawmidi_drain_input(substream); @@ -686,12 +690,16 @@ int snd_rawmidi_input_params(struct snd_ return -EINVAL; } if (params->buffer_size != runtime->buffer_size) { - newbuf = krealloc(runtime->buffer, params->buffer_size, - GFP_KERNEL); + newbuf = kmalloc(params->buffer_size, GFP_KERNEL); if (!newbuf) return -ENOMEM; + spin_lock_irq(&runtime->lock); + oldbuf = runtime->buffer; runtime->buffer = newbuf; runtime->buffer_size = params->buffer_size; + runtime->appl_ptr = runtime->hw_ptr = 0; + spin_unlock_irq(&runtime->lock); + kfree(oldbuf); } runtime->avail_min = params->avail_min; return 0;
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexey Brodkin abrodkin@synopsys.com
commit 6e3761145a9ba3ce267c330b6bff51cf6a057b06 upstream.
swap was broken on ARC due to silly copy-paste issue.
We encode offset from swapcache page in __swp_entry() as (off << 13) but were not decoding back in __swp_offset() as (off >> 13) - it was still (off << 13).
This finally fixes swap usage on ARC.
| # mkswap /dev/sda2 | | # swapon -a -e /dev/sda2 | Adding 500728k swap on /dev/sda2. Priority:-2 extents:1 across:500728k | | # free | total used free shared buffers cached | Mem: 765104 13456 751648 4736 8 4736 | -/+ buffers/cache: 8712 756392 | Swap: 500728 0 500728
Cc: stable@vger.kernel.org Signed-off-by: Alexey Brodkin abrodkin@synopsys.com Signed-off-by: Vineet Gupta vgupta@synopsys.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arc/include/asm/pgtable.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arc/include/asm/pgtable.h +++ b/arch/arc/include/asm/pgtable.h @@ -372,7 +372,7 @@ void update_mmu_cache(struct vm_area_str
/* Decode a PTE containing swap "identifier "into constituents */ #define __swp_type(pte_lookalike) (((pte_lookalike).val) & 0x1f) -#define __swp_offset(pte_lookalike) ((pte_lookalike).val << 13) +#define __swp_offset(pte_lookalike) ((pte_lookalike).val >> 13)
/* NOPs, to keep generic kernel happy */ #define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vineet Gupta vgupta@synopsys.com
commit 93312b6da4df31e4102ce5420e6217135a16c7ea upstream.
mprotect(EXEC) was failing for stack mappings as default vm flags was missing MAYEXEC.
This was triggered by glibc test suite nptl/tst-execstack testcase
What is surprising is that despite running LTP for years on, we didn't catch this issue as it lacks a directed test case.
gcc dejagnu tests with nested functions also requiring exec stack work fine though because they rely on the GNU_STACK segment spit out by compiler and handled in kernel elf loader.
This glibc case is different as the stack is non exec to begin with and a dlopen of shared lib with GNU_STACK segment triggers the exec stack proceedings using a mprotect(PROT_EXEC) which was broken.
CC: stable@vger.kernel.org Signed-off-by: Vineet Gupta vgupta@synopsys.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arc/include/asm/page.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arc/include/asm/page.h +++ b/arch/arc/include/asm/page.h @@ -97,7 +97,7 @@ typedef unsigned long pgtable_t; #define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT)
/* Default Permissions for stack/heaps pages (Non Executable) */ -#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE) +#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
#define WANT_PAGE_VIRTUAL 1
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tyler Hicks tyhicks@canonical.com
[ Upstream commit 70ba5b6db96ff7324b8cfc87e0d0383cf59c9677 ]
The low and high values of the net.ipv4.ping_group_range sysctl were being silently forced to the default disabled state when a write to the sysctl contained GIDs that didn't map to the associated user namespace. Confusingly, the sysctl's write operation would return success and then a subsequent read of the sysctl would indicate that the low and high values are the overflowgid.
This patch changes the behavior by clearly returning an error when the sysctl write operation receives a GID range that doesn't map to the associated user namespace. In such a situation, the previous value of the sysctl is preserved and that range will be returned in a subsequent read of the sysctl.
Signed-off-by: Tyler Hicks tyhicks@canonical.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/sysctl_net_ipv4.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -134,8 +134,9 @@ static int ipv4_ping_group_range(struct if (write && ret == 0) { low = make_kgid(user_ns, urange[0]); high = make_kgid(user_ns, urange[1]); - if (!gid_valid(low) || !gid_valid(high) || - (urange[1] < urange[0]) || gid_lt(high, low)) { + if (!gid_valid(low) || !gid_valid(high)) + return -EINVAL; + if (urange[1] < urange[0] || gid_lt(high, low)) { low = make_kgid(&init_user_ns, 1); high = make_kgid(&init_user_ns, 0); }
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: "Gustavo A. R. Silva" gustavo@embeddedor.com
[ Upstream commit 9ba8376ce1e2cbf4ce44f7e4bee1d0648e10d594 ]
It seems that a *break* is missing in order to avoid falling through to the default case. Otherwise, checking *chan* makes no sense.
Fixes: 72df7a7244c0 ("ptp: Allow reassigning calibration pin function") Signed-off-by: Gustavo A. R. Silva gustavo@embeddedor.com Acked-by: Richard Cochran richardcochran@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ptp/ptp_chardev.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/ptp/ptp_chardev.c +++ b/drivers/ptp/ptp_chardev.c @@ -88,6 +88,7 @@ int ptp_set_pinfunc(struct ptp_clock *pt case PTP_PF_PHYSYNC: if (chan != 0) return -EINVAL; + break; default: return -EINVAL; }
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefano Brivio sbrivio@redhat.com
[ Upstream commit 8b7008620b8452728cadead460a36f64ed78c460 ]
The pfmemalloc flag indicates that the skb was allocated from the PFMEMALLOC reserves, and the flag is currently copied on skb copy and clone.
However, an skb copied from an skb flagged with pfmemalloc wasn't necessarily allocated from PFMEMALLOC reserves, and on the other hand an skb allocated that way might be copied from an skb that wasn't.
So we should not copy the flag on skb copy, and rather decide whether to allow an skb to be associated with sockets unrelated to page reclaim depending only on how it was allocated.
Move the pfmemalloc flag before headers_start[0] using an existing 1-bit hole, so that __copy_skb_header() doesn't copy it.
When cloning, we'll now take care of this flag explicitly, contravening to the warning comment of __skb_clone().
While at it, restore the newline usage introduced by commit b19372273164 ("net: reorganize sk_buff for faster __copy_skb_header()") to visually separate bytes used in bitfields after headers_start[0], that was gone after commit a9e419dc7be6 ("netfilter: merge ctinfo into nfct pointer storage area"), and describe the pfmemalloc flag in the kernel-doc structure comment.
This doesn't change the size of sk_buff or cacheline boundaries, but consolidates the 15 bits hole before tc_index into a 2 bytes hole before csum, that could now be filled more easily.
Reported-by: Patrick Talbert ptalbert@redhat.com Fixes: c93bdd0e03e8 ("netvm: allow skb allocation to use PFMEMALLOC reserves") Signed-off-by: Stefano Brivio sbrivio@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/skbuff.h | 12 ++++++------ net/core/skbuff.c | 2 ++ 2 files changed, 8 insertions(+), 6 deletions(-)
--- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -475,6 +475,7 @@ static inline u32 skb_mstamp_us_delta(co * @hash: the packet hash * @queue_mapping: Queue mapping for multiqueue devices * @xmit_more: More SKBs are pending for this queue + * @pfmemalloc: skbuff was allocated from PFMEMALLOC reserves * @ndisc_nodetype: router type (from link layer) * @ooo_okay: allow the mapping of a socket to a queue to be changed * @l4_hash: indicate hash is a canonical 4-tuple hash over transport @@ -551,8 +552,8 @@ struct sk_buff { fclone:2, peeked:1, head_frag:1, - xmit_more:1; - /* one bit hole */ + xmit_more:1, + pfmemalloc:1; kmemcheck_bitfield_end(flags1);
/* fields enclosed in headers_start/headers_end are copied @@ -572,19 +573,18 @@ struct sk_buff {
__u8 __pkt_type_offset[0]; __u8 pkt_type:3; - __u8 pfmemalloc:1; __u8 ignore_df:1; __u8 nfctinfo:3; - __u8 nf_trace:1; + __u8 ip_summed:2; __u8 ooo_okay:1; __u8 l4_hash:1; __u8 sw_hash:1; __u8 wifi_acked_valid:1; __u8 wifi_acked:1; - __u8 no_fcs:1; + /* Indicates the inner headers are valid in the skbuff. */ __u8 encapsulation:1; __u8 encap_hdr_csum:1; @@ -592,11 +592,11 @@ struct sk_buff { __u8 csum_complete_sw:1; __u8 csum_level:2; __u8 csum_bad:1; - #ifdef CONFIG_IPV6_NDISC_NODETYPE __u8 ndisc_nodetype:2; #endif __u8 ipvs_property:1; + __u8 inner_protocol_type:1; /* 4 or 6 bit hole */
--- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -780,6 +780,8 @@ static struct sk_buff *__skb_clone(struc n->cloned = 1; n->nohdr = 0; n->peeked = 0; + if (skb->pfmemalloc) + n->pfmemalloc = 1; n->destructor = NULL; C(tail); C(end);
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefano Brivio sbrivio@redhat.com
[ Upstream commit e78bfb0751d4e312699106ba7efbed2bab1a53ca ]
Commit 8b7008620b84 ("net: Don't copy pfmemalloc flag in __copy_skb_header()") introduced a different handling for the pfmemalloc flag in copy and clone paths.
In __skb_clone(), now, the flag is set only if it was set in the original skb, but not cleared if it wasn't. This is wrong and might lead to socket buffers being flagged with pfmemalloc even if the skb data wasn't allocated from pfmemalloc reserves. Copy the flag instead of ORing it.
Reported-by: Sabrina Dubroca sd@queasysnail.net Fixes: 8b7008620b84 ("net: Don't copy pfmemalloc flag in __copy_skb_header()") Signed-off-by: Stefano Brivio sbrivio@redhat.com Tested-by: Sabrina Dubroca sd@queasysnail.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/skbuff.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -780,8 +780,7 @@ static struct sk_buff *__skb_clone(struc n->cloned = 1; n->nohdr = 0; n->peeked = 0; - if (skb->pfmemalloc) - n->pfmemalloc = 1; + C(pfmemalloc); n->destructor = NULL; C(tail); C(end);
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 3dd1c9a1270736029ffca670e9bd0265f4120600 ]
The skb hash for locally generated ip[v6] fragments belonging to the same datagram can vary in several circumstances: * for connected UDP[v6] sockets, the first fragment get its hash via set_owner_w()/skb_set_hash_from_sk() * for unconnected IPv6 UDPv6 sockets, the first fragment can get its hash via ip6_make_flowlabel()/skb_get_hash_flowi6(), if auto_flowlabel is enabled
For the following frags the hash is usually computed via skb_get_hash(). The above can cause OoO for unconnected IPv6 UDPv6 socket: in that scenario the egress tx queue can be selected on a per packet basis via the skb hash. It may also fool flow-oriented schedulers to place fragments belonging to the same datagram in different flows.
Fix the issue by copying the skb hash from the head frag into the others at fragmentation time.
Before this commit: perf probe -a "dev_queue_xmit skb skb->hash skb->l4_hash:b1@0/8 skb->sw_hash:b1@1/8" netperf -H $IPV4 -t UDP_STREAM -l 5 -- -m 2000 -n & perf record -e probe:dev_queue_xmit -e probe:skb_set_owner_w -a sleep 0.1 perf script probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=3713014309 l4_hash=1 sw_hash=0 probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=0 l4_hash=0 sw_hash=0
After this commit: probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0 probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0
Fixes: b73c3d0e4f0e ("net: Save TX flow hash in sock and set in skbuf on xmit") Fixes: 67800f9b1f4e ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel") Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/ip_output.c | 2 ++ net/ipv6/ip6_output.c | 2 ++ 2 files changed, 4 insertions(+)
--- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -459,6 +459,8 @@ static void ip_copy_metadata(struct sk_b to->dev = from->dev; to->mark = from->mark;
+ skb_copy_hash(to, from); + /* Copy the flags to each fragment. */ IPCB(to)->flags = IPCB(from)->flags;
--- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -533,6 +533,8 @@ static void ip6_copy_metadata(struct sk_ to->dev = from->dev; to->mark = from->mark;
+ skb_copy_hash(to, from); + #ifdef CONFIG_NET_SCHED to->tc_index = from->tc_index; #endif
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jack Morgenstein jackm@dev.mellanox.co.il
[ Upstream commit 958c696f5a7274d9447a458ad7aa70719b29a50a ]
Function mlx4_RST2INIT_QP_wrapper saved the qp number passed in the qp context, rather than the one passed in the input modifier.
However, the qp number in the qp context is not defined as a required parameter by the FW. Therefore, drivers may choose to not specify the qp number in the qp context for the reset-to-init transition.
Thus, we must save the qp number passed in the command input modifier -- which is always present. (This saved qp number is used as the input modifier for command 2RST_QP when a slave's qp's are destroyed).
Fixes: c82e9aa0a8bc ("mlx4_core: resource tracking for HCA resources used by guests") Signed-off-by: Jack Morgenstein jackm@dev.mellanox.co.il Signed-off-by: Tariq Toukan tariqt@mellanox.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/mellanox/mlx4/resource_tracker.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c +++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c @@ -2709,7 +2709,7 @@ int mlx4_RST2INIT_QP_wrapper(struct mlx4 u32 srqn = qp_get_srqn(qpc) & 0xffffff; int use_srq = (qp_get_srqn(qpc) >> 24) & 1; struct res_srq *srq; - int local_qpn = be32_to_cpu(qpc->local_qpn) & 0xffffff; + int local_qpn = vhcr->in_modifier & 0xffffff;
err = qp_res_start_move_to(dev, slave, qpn, RES_QP_HW, &qp, 0); if (err)
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Roopa Prabhu roopa@cumulusnetworks.com
[ Upstream commit 5025f7f7d506fba9b39e7fe8ca10f6f34cb9bc2d ]
rtnl_configure_link sets dev->rtnl_link_state to RTNL_LINK_INITIALIZED and unconditionally calls __dev_notify_flags to notify user-space of dev flags.
current call sequence for rtnl_configure_link rtnetlink_newlink rtnl_link_ops->newlink rtnl_configure_link (unconditionally notifies userspace of default and new dev flags)
If a newlink handler wants to call rtnl_configure_link early, we will end up with duplicate notifications to user-space.
This patch fixes rtnl_configure_link to check rtnl_link_state and call __dev_notify_flags with gchanges = 0 if already RTNL_LINK_INITIALIZED.
Later in the series, this patch will help the following sequence where a driver implementing newlink can call rtnl_configure_link to initialize the link early.
makes the following call sequence work: rtnetlink_newlink rtnl_link_ops->newlink (vxlan) -> rtnl_configure_link (initializes link and notifies user-space of default dev flags) rtnl_configure_link (updates dev flags if requested by user ifm and notifies user-space of new dev flags)
Signed-off-by: Roopa Prabhu roopa@cumulusnetworks.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/rtnetlink.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1842,9 +1842,12 @@ int rtnl_configure_link(struct net_devic return err; }
- dev->rtnl_link_state = RTNL_LINK_INITIALIZED; - - __dev_notify_flags(dev, old_flags, ~0U); + if (dev->rtnl_link_state == RTNL_LINK_INITIALIZED) { + __dev_notify_flags(dev, old_flags, 0U); + } else { + dev->rtnl_link_state = RTNL_LINK_INITIALIZED; + __dev_notify_flags(dev, old_flags, ~0U); + } return 0; } EXPORT_SYMBOL(rtnl_configure_link);
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yuchung Cheng ycheng@google.com
[ Upstream commit b0c05d0e99d98d7f0cd41efc1eeec94efdc3325d ]
Previously, when a data segment was sent an ACK was piggybacked on the data segment without generating a CA_EVENT_NON_DELAYED_ACK event to notify congestion control modules. So the DCTCP ca->delayed_ack_reserved flag could incorrectly stay set when in fact there were no delayed ACKs being reserved. This could result in sending a special ECN notification ACK that carries an older ACK sequence, when in fact there was no need for such an ACK. DCTCP keeps track of the delayed ACK status with its own separate state ca->delayed_ack_reserved. Previously it may accidentally cancel the delayed ACK without updating this field upon sending a special ACK that carries a older ACK sequence. This inconsistency would lead to DCTCP receiver never acknowledging the latest data until the sender times out and retry in some cases.
Packetdrill script (provided by Larry Brakmo)
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0 0.000 bind(3, ..., ...) = 0 0.000 listen(3, 1) = 0
0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> 0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8> 0.110 < [ect0] . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4
0.200 < [ect0] . 1:1001(1000) ack 1 win 257 0.200 > [ect01] . 1:1(0) ack 1001
0.200 write(4, ..., 1) = 1 0.200 > [ect01] P. 1:2(1) ack 1001
0.200 < [ect0] . 1001:2001(1000) ack 2 win 257 0.200 write(4, ..., 1) = 1 0.200 > [ect01] P. 2:3(1) ack 2001
0.200 < [ect0] . 2001:3001(1000) ack 3 win 257 0.200 < [ect0] . 3001:4001(1000) ack 3 win 257 0.200 > [ect01] . 3:3(0) ack 4001
0.210 < [ce] P. 4001:4501(500) ack 3 win 257
+0.001 read(4, ..., 4500) = 4500 +0 write(4, ..., 1) = 1 +0 > [ect01] PE. 3:4(1) ack 4501
+0.010 < [ect0] W. 4501:5501(1000) ack 4 win 257 // Previously the ACK sequence below would be 4501, causing a long RTO +0.040~+0.045 > [ect01] . 4:4(0) ack 5501 // delayed ack
+0.311 < [ect0] . 5501:6501(1000) ack 4 win 257 // More data +0 > [ect01] . 4:4(0) ack 6501 // now acks everything
+0.500 < F. 9501:9501(0) ack 4 win 257
Reported-by: Larry Brakmo brakmo@fb.com Signed-off-by: Yuchung Cheng ycheng@google.com Signed-off-by: Eric Dumazet edumazet@google.com Acked-by: Neal Cardwell ncardwell@google.com Acked-by: Lawrence Brakmo brakmo@fb.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_dctcp.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/net/ipv4/tcp_dctcp.c +++ b/net/ipv4/tcp_dctcp.c @@ -131,7 +131,8 @@ static void dctcp_ce_state_0_to_1(struct /* State has changed from CE=0 to CE=1 and delayed * ACK has not sent yet. */ - if (!ca->ce_state && ca->delayed_ack_reserved) { + if (!ca->ce_state && + inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) { u32 tmp_rcv_nxt;
/* Save current rcv_nxt. */ @@ -161,7 +162,8 @@ static void dctcp_ce_state_1_to_0(struct /* State has changed from CE=1 to CE=0 and delayed * ACK has not sent yet. */ - if (ca->ce_state && ca->delayed_ack_reserved) { + if (ca->ce_state && + inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) { u32 tmp_rcv_nxt;
/* Save current rcv_nxt. */
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yuchung Cheng ycheng@google.com
[ Upstream commit 2987babb6982306509380fc11b450227a844493b ]
Refactor and create helpers to send the special ACK in DCTCP.
Signed-off-by: Yuchung Cheng ycheng@google.com Acked-by: Neal Cardwell ncardwell@google.com Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_output.c | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-)
--- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -884,8 +884,8 @@ out: * We are working here with either a clone of the original * SKB, or a fresh unique copy made by the retransmit engine. */ -static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it, - gfp_t gfp_mask) +static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, + int clone_it, gfp_t gfp_mask, u32 rcv_nxt) { const struct inet_connection_sock *icsk = inet_csk(sk); struct inet_sock *inet; @@ -948,7 +948,7 @@ static int tcp_transmit_skb(struct sock th->source = inet->inet_sport; th->dest = inet->inet_dport; th->seq = htonl(tcb->seq); - th->ack_seq = htonl(tp->rcv_nxt); + th->ack_seq = htonl(rcv_nxt); *(((__be16 *)th) + 6) = htons(((tcp_header_size >> 2) << 12) | tcb->tcp_flags);
@@ -1019,6 +1019,13 @@ static int tcp_transmit_skb(struct sock return net_xmit_eval(err); }
+static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it, + gfp_t gfp_mask) +{ + return __tcp_transmit_skb(sk, skb, clone_it, gfp_mask, + tcp_sk(sk)->rcv_nxt); +} + /* This routine just queues the buffer for sending. * * NOTE: probe0 timer is not checked, do not forget tcp_push_pending_frames, @@ -3220,7 +3227,7 @@ void tcp_send_delayed_ack(struct sock *s }
/* This routine sends an ack and also updates the window. */ -void tcp_send_ack(struct sock *sk) +void __tcp_send_ack(struct sock *sk, u32 rcv_nxt) { struct sk_buff *buff;
@@ -3249,7 +3256,12 @@ void tcp_send_ack(struct sock *sk)
/* Send it off, this clears delayed acks for us. */ skb_mstamp_get(&buff->skb_mstamp); - tcp_transmit_skb(sk, buff, 0, sk_gfp_atomic(sk, GFP_ATOMIC)); + __tcp_transmit_skb(sk, buff, 0, sk_gfp_atomic(sk, GFP_ATOMIC), rcv_nxt); +} + +void tcp_send_ack(struct sock *sk) +{ + __tcp_send_ack(sk, tcp_sk(sk)->rcv_nxt); } EXPORT_SYMBOL_GPL(tcp_send_ack);
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yuchung Cheng ycheng@google.com
[ Upstream commit 27cde44a259c380a3c09066fc4b42de7dde9b1ad ]
Currently when a DCTCP receiver delays an ACK and receive a data packet with a different CE mark from the previous one's, it sends two immediate ACKs acking previous and latest sequences respectly (for ECN accounting).
Previously sending the first ACK may mark off the delayed ACK timer (tcp_event_ack_sent). This may subsequently prevent sending the second ACK to acknowledge the latest sequence (tcp_ack_snd_check). The culprit is that tcp_send_ack() assumes it always acknowleges the latest sequence, which is not true for the first special ACK.
The fix is to not make the assumption in tcp_send_ack and check the actual ack sequence before cancelling the delayed ACK. Further it's safer to pass the ack sequence number as a local variable into tcp_send_ack routine, instead of intercepting tp->rcv_nxt to avoid future bugs like this.
Reported-by: Neal Cardwell ncardwell@google.com Signed-off-by: Yuchung Cheng ycheng@google.com Acked-by: Neal Cardwell ncardwell@google.com Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/tcp.h | 1 + net/ipv4/tcp_dctcp.c | 34 ++++------------------------------ net/ipv4/tcp_output.c | 11 ++++++++--- 3 files changed, 13 insertions(+), 33 deletions(-)
--- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -530,6 +530,7 @@ int tcp_send_synack(struct sock *); bool tcp_syn_flood_action(struct sock *sk, const struct sk_buff *skb, const char *proto); void tcp_push_one(struct sock *, unsigned int mss_now); +void __tcp_send_ack(struct sock *sk, u32 rcv_nxt); void tcp_send_ack(struct sock *sk); void tcp_send_delayed_ack(struct sock *sk); void tcp_send_loss_probe(struct sock *sk); --- a/net/ipv4/tcp_dctcp.c +++ b/net/ipv4/tcp_dctcp.c @@ -132,21 +132,8 @@ static void dctcp_ce_state_0_to_1(struct * ACK has not sent yet. */ if (!ca->ce_state && - inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) { - u32 tmp_rcv_nxt; - - /* Save current rcv_nxt. */ - tmp_rcv_nxt = tp->rcv_nxt; - - /* Generate previous ack with CE=0. */ - tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR; - tp->rcv_nxt = ca->prior_rcv_nxt; - - tcp_send_ack(sk); - - /* Recover current rcv_nxt. */ - tp->rcv_nxt = tmp_rcv_nxt; - } + inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) + __tcp_send_ack(sk, ca->prior_rcv_nxt);
ca->prior_rcv_nxt = tp->rcv_nxt; ca->ce_state = 1; @@ -163,21 +150,8 @@ static void dctcp_ce_state_1_to_0(struct * ACK has not sent yet. */ if (ca->ce_state && - inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) { - u32 tmp_rcv_nxt; - - /* Save current rcv_nxt. */ - tmp_rcv_nxt = tp->rcv_nxt; - - /* Generate previous ack with CE=1. */ - tp->ecn_flags |= TCP_ECN_DEMAND_CWR; - tp->rcv_nxt = ca->prior_rcv_nxt; - - tcp_send_ack(sk); - - /* Recover current rcv_nxt. */ - tp->rcv_nxt = tmp_rcv_nxt; - } + inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) + __tcp_send_ack(sk, ca->prior_rcv_nxt);
ca->prior_rcv_nxt = tp->rcv_nxt; ca->ce_state = 0; --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -183,8 +183,13 @@ static void tcp_event_data_sent(struct t }
/* Account for an ACK we sent. */ -static inline void tcp_event_ack_sent(struct sock *sk, unsigned int pkts) +static inline void tcp_event_ack_sent(struct sock *sk, unsigned int pkts, + u32 rcv_nxt) { + struct tcp_sock *tp = tcp_sk(sk); + + if (unlikely(rcv_nxt != tp->rcv_nxt)) + return; /* Special ACK sent by DCTCP to reflect ECN */ tcp_dec_quickack_mode(sk, pkts); inet_csk_clear_xmit_timer(sk, ICSK_TIME_DACK); } @@ -990,7 +995,7 @@ static int __tcp_transmit_skb(struct soc icsk->icsk_af_ops->send_check(sk, skb);
if (likely(tcb->tcp_flags & TCPHDR_ACK)) - tcp_event_ack_sent(sk, tcp_skb_pcount(skb)); + tcp_event_ack_sent(sk, tcp_skb_pcount(skb), rcv_nxt);
if (skb->len != tcp_header_size) tcp_event_data_sent(tp, sk); @@ -3258,12 +3263,12 @@ void __tcp_send_ack(struct sock *sk, u32 skb_mstamp_get(&buff->skb_mstamp); __tcp_transmit_skb(sk, buff, 0, sk_gfp_atomic(sk, GFP_ATOMIC), rcv_nxt); } +EXPORT_SYMBOL_GPL(__tcp_send_ack);
void tcp_send_ack(struct sock *sk) { __tcp_send_ack(sk, tcp_sk(sk)->rcv_nxt); } -EXPORT_SYMBOL_GPL(tcp_send_ack);
/* This routine sends a packet with an out of date sequence * number. It assumes the other end will try to ack it.
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yuchung Cheng ycheng@google.com
[ Upstream commit a0496ef2c23b3b180902dd185d0d63ccbc624cf8 ]
Per DCTCP RFC8257 (Section 3.2) the ACK reflecting the CE status change has to be sent immediately so the sender can respond quickly:
""" When receiving packets, the CE codepoint MUST be processed as follows:
1. If the CE codepoint is set and DCTCP.CE is false, set DCTCP.CE to true and send an immediate ACK.
2. If the CE codepoint is not set and DCTCP.CE is true, set DCTCP.CE to false and send an immediate ACK. """
Previously DCTCP implementation may continue to delay the ACK. This patch fixes that to implement the RFC by forcing an immediate ACK.
Tested with this packetdrill script provided by Larry Brakmo
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0 0.000 bind(3, ..., ...) = 0 0.000 listen(3, 1) = 0
0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> 0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8> 0.110 < [ect0] . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_SOCKET, SO_DEBUG, [1], 4) = 0
0.200 < [ect0] . 1:1001(1000) ack 1 win 257 0.200 > [ect01] . 1:1(0) ack 1001
0.200 write(4, ..., 1) = 1 0.200 > [ect01] P. 1:2(1) ack 1001
0.200 < [ect0] . 1001:2001(1000) ack 2 win 257 +0.005 < [ce] . 2001:3001(1000) ack 2 win 257
+0.000 > [ect01] . 2:2(0) ack 2001 // Previously the ACK below would be delayed by 40ms +0.000 > [ect01] E. 2:2(0) ack 3001
+0.500 < F. 9501:9501(0) ack 4 win 257
Signed-off-by: Yuchung Cheng ycheng@google.com Acked-by: Neal Cardwell ncardwell@google.com Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/tcp.h | 1 + net/ipv4/tcp_dctcp.c | 30 ++++++++++++++++++------------ net/ipv4/tcp_input.c | 3 ++- 3 files changed, 21 insertions(+), 13 deletions(-)
--- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -372,6 +372,7 @@ ssize_t tcp_splice_read(struct socket *s struct pipe_inode_info *pipe, size_t len, unsigned int flags);
+void tcp_enter_quickack_mode(struct sock *sk); static inline void tcp_dec_quickack_mode(struct sock *sk, const unsigned int pkts) { --- a/net/ipv4/tcp_dctcp.c +++ b/net/ipv4/tcp_dctcp.c @@ -128,12 +128,15 @@ static void dctcp_ce_state_0_to_1(struct struct dctcp *ca = inet_csk_ca(sk); struct tcp_sock *tp = tcp_sk(sk);
- /* State has changed from CE=0 to CE=1 and delayed - * ACK has not sent yet. - */ - if (!ca->ce_state && - inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) - __tcp_send_ack(sk, ca->prior_rcv_nxt); + if (!ca->ce_state) { + /* State has changed from CE=0 to CE=1, force an immediate + * ACK to reflect the new CE state. If an ACK was delayed, + * send that first to reflect the prior CE state. + */ + if (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) + __tcp_send_ack(sk, ca->prior_rcv_nxt); + tcp_enter_quickack_mode(sk); + }
ca->prior_rcv_nxt = tp->rcv_nxt; ca->ce_state = 1; @@ -146,12 +149,15 @@ static void dctcp_ce_state_1_to_0(struct struct dctcp *ca = inet_csk_ca(sk); struct tcp_sock *tp = tcp_sk(sk);
- /* State has changed from CE=1 to CE=0 and delayed - * ACK has not sent yet. - */ - if (ca->ce_state && - inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) - __tcp_send_ack(sk, ca->prior_rcv_nxt); + if (ca->ce_state) { + /* State has changed from CE=1 to CE=0, force an immediate + * ACK to reflect the new CE state. If an ACK was delayed, + * send that first to reflect the prior CE state. + */ + if (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) + __tcp_send_ack(sk, ca->prior_rcv_nxt); + tcp_enter_quickack_mode(sk); + }
ca->prior_rcv_nxt = tp->rcv_nxt; ca->ce_state = 0; --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -182,13 +182,14 @@ static void tcp_incr_quickack(struct soc icsk->icsk_ack.quick = min(quickacks, TCP_MAX_QUICKACKS); }
-static void tcp_enter_quickack_mode(struct sock *sk) +void tcp_enter_quickack_mode(struct sock *sk) { struct inet_connection_sock *icsk = inet_csk(sk); tcp_incr_quickack(sk); icsk->icsk_ack.pingpong = 0; icsk->icsk_ack.ato = TCP_ATO_MIN; } +EXPORT_SYMBOL(tcp_enter_quickack_mode);
/* Send ACKs quickly, if "quick" count is not exhausted * and the session is not interactive.
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit f4a3313d8e2ca9fd8d8f45e40a2903ba782607e7 ]
Right after a TCP flow is created, receiving tiny out of order packets allways hit the condition :
if (atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf) tcp_clamp_window(sk);
tcp_clamp_window() increases sk_rcvbuf to match sk_rmem_alloc (guarded by tcp_rmem[2])
Calling tcp_collapse_ofo_queue() in this case is not useful, and offers a O(N^2) surface attack to malicious peers.
Better not attempt anything before full queue capacity is reached, forcing attacker to spend lots of resource and allow us to more easily detect the abuse.
Signed-off-by: Eric Dumazet edumazet@google.com Acked-by: Soheil Hassas Yeganeh soheil@google.com Acked-by: Yuchung Cheng ycheng@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_input.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4738,6 +4738,9 @@ static int tcp_prune_queue(struct sock * else if (sk_under_memory_pressure(sk)) tp->rcv_ssthresh = min(tp->rcv_ssthresh, 4U * tp->advmss);
+ if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf) + return 0; + tcp_collapse_ofo_queue(sk); if (!skb_queue_empty(&sk->sk_receive_queue)) tcp_collapse(sk, &sk->sk_receive_queue,
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 3d4bf93ac12003f9b8e1e2de37fe27983deebdcf ]
In case an attacker feeds tiny packets completely out of order, tcp_collapse_ofo_queue() might scan the whole rb-tree, performing expensive copies, but not changing socket memory usage at all.
1) Do not attempt to collapse tiny skbs. 2) Add logic to exit early when too many tiny skbs are detected.
We prefer not doing aggressive collapsing (which copies packets) for pathological flows, and revert to tcp_prune_ofo_queue() which will be less expensive.
In the future, we might add the possibility of terminating flows that are proven to be malicious.
Signed-off-by: Eric Dumazet edumazet@google.com Acked-by: Soheil Hassas Yeganeh soheil@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/tcp_input.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)
--- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4652,6 +4652,7 @@ restart: static void tcp_collapse_ofo_queue(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); + u32 range_truesize, sum_tiny = 0; struct sk_buff *skb = skb_peek(&tp->out_of_order_queue); struct sk_buff *head; u32 start, end; @@ -4661,6 +4662,7 @@ static void tcp_collapse_ofo_queue(struc
start = TCP_SKB_CB(skb)->seq; end = TCP_SKB_CB(skb)->end_seq; + range_truesize = skb->truesize; head = skb;
for (;;) { @@ -4675,8 +4677,17 @@ static void tcp_collapse_ofo_queue(struc if (!skb || after(TCP_SKB_CB(skb)->seq, end) || before(TCP_SKB_CB(skb)->end_seq, start)) { - tcp_collapse(sk, &tp->out_of_order_queue, - head, skb, start, end); + /* Do not attempt collapsing tiny skbs */ + if (range_truesize != head->truesize || + end - start >= SKB_WITH_OVERHEAD(SK_MEM_QUANTUM)) { + tcp_collapse(sk, &tp->out_of_order_queue, + head, skb, start, end); + } else { + sum_tiny += range_truesize; + if (sum_tiny > sk->sk_rcvbuf >> 3) + return; + } + head = skb; if (!skb) break;
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lubomir Rintel lkundrak@v3.sk
commit 1445cbe476fc3dd09c0b380b206526a49403c071 upstream.
The device (a POS terminal) implements CDC ACM, but has not union descriptor.
Signed-off-by: Lubomir Rintel lkundrak@v3.sk Acked-by: Oliver Neukum oneukum@suse.com Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/class/cdc-acm.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/usb/class/cdc-acm.c +++ b/drivers/usb/class/cdc-acm.c @@ -1785,6 +1785,9 @@ static const struct usb_device_id acm_id { USB_DEVICE(0x09d8, 0x0320), /* Elatec GmbH TWN3 */ .driver_info = NO_UNION_NORMAL, /* has misplaced union descriptor */ }, + { USB_DEVICE(0x0ca6, 0xa050), /* Castles VEGA3000 */ + .driver_info = NO_UNION_NORMAL, /* reports zero length descriptor */ + },
{ USB_DEVICE(0x2912, 0x0001), /* ATOL FPrint */ .driver_info = CLEAR_HALT_CONDITIONS,
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bin Liu b-liu@ti.com
commit 249a32b7eeb3edb6897dd38f89651a62163ac4ed upstream.
Based on USB2.0 Spec Section 11.12.5,
"If a hub has per-port power switching and per-port current limiting, an over-current on one port may still cause the power on another port to fall below specific minimums. In this case, the affected port is placed in the Power-Off state and C_PORT_OVER_CURRENT is set for the port, but PORT_OVER_CURRENT is not set."
so let's check C_PORT_OVER_CURRENT too for over current condition.
Fixes: 08d1dec6f405 ("usb:hub set hub->change_bits when over-current happens") Cc: stable@vger.kernel.org Tested-by: Alessandro Antenucci antenucci@korg.it Signed-off-by: Bin Liu b-liu@ti.com Acked-by: Alan Stern stern@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/core/hub.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -1151,10 +1151,14 @@ static void hub_activate(struct usb_hub
if (!udev || udev->state == USB_STATE_NOTATTACHED) { /* Tell hub_wq to disconnect the device or - * check for a new connection + * check for a new connection or over current condition. + * Based on USB2.0 Spec Section 11.12.5, + * C_PORT_OVER_CURRENT could be set while + * PORT_OVER_CURRENT is not. So check for any of them. */ if (udev || (portstatus & USB_PORT_STAT_CONNECTION) || - (portstatus & USB_PORT_STAT_OVERCURRENT)) + (portstatus & USB_PORT_STAT_OVERCURRENT) || + (portchange & USB_PORT_STAT_C_OVERCURRENT)) set_bit(port1, hub->change_bits);
} else if (portstatus & USB_PORT_STAT_ENABLE) {
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jerry Zhang zhangjerry@google.com
commit 4d644abf25698362bd33d17c9ddc8f7122c30f17 upstream.
Commit 1b9ba000 ("Allow function drivers to pause control transfers") states that USB_GADGET_DELAYED_STATUS is only supported if data phase is 0 bytes.
It seems that when the length is not 0 bytes, there is no need to explicitly delay the data stage since the transfer is not completed until the user responds. However, when the length is 0, there is no data stage and the transfer is finished once setup() returns, hence there is a need to explicitly delay completion.
This manifests as the following bugs:
Prior to 946ef68ad4e4 ('Let setup() return USB_GADGET_DELAYED_STATUS'), when setup is 0 bytes, ffs would require user to queue a 0 byte request in order to clear setup state. However, that 0 byte request was actually not needed and would hang and cause errors in other setup requests.
After the above commit, 0 byte setups work since the gadget now accepts empty queues to ep0 to clear the delay, but all other setups hang.
Fixes: 946ef68ad4e4 ("Let setup() return USB_GADGET_DELAYED_STATUS") Signed-off-by: Jerry Zhang zhangjerry@google.com Cc: stable stable@vger.kernel.org Acked-by: Felipe Balbi felipe.balbi@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/usb/gadget/function/f_fs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/usb/gadget/function/f_fs.c +++ b/drivers/usb/gadget/function/f_fs.c @@ -2987,7 +2987,7 @@ static int ffs_func_setup(struct usb_fun __ffs_event_add(ffs, FUNCTIONFS_SETUP); spin_unlock_irqrestore(&ffs->ev.waitq.lock, flags);
- return USB_GADGET_DELAYED_STATUS; + return creq->wLength == 0 ? USB_GADGET_DELAYED_STATUS : 0; }
static void ffs_func_suspend(struct usb_function *f)
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anssi Hannula anssi.hannula@bitwise.fi
commit 32852c561bffd613d4ed7ec464b1e03e1b7b6c5c upstream.
If the device gets into a state where RXNEMP (RX FIFO not empty) interrupt is asserted without RXOK (new frame received successfully) interrupt being asserted, xcan_rx_poll() will continue to try to clear RXNEMP without actually reading frames from RX FIFO. If the RX FIFO is not empty, the interrupt will not be cleared and napi_schedule() will just be called again.
This situation can occur when:
(a) xcan_rx() returns without reading RX FIFO due to an error condition. The code tries to clear both RXOK and RXNEMP but RXNEMP will not clear due to a frame still being in the FIFO. The frame will never be read from the FIFO as RXOK is no longer set.
(b) A frame is received between xcan_rx_poll() reading interrupt status and clearing RXOK. RXOK will be cleared, but RXNEMP will again remain set as the new message is still in the FIFO.
I'm able to trigger case (b) by flooding the bus with frames under load.
There does not seem to be any benefit in using both RXNEMP and RXOK in the way the driver does, and the polling example in the reference manual (UG585 v1.10 18.3.7 Read Messages from RxFIFO) also says that either RXOK or RXNEMP can be used for detecting incoming messages.
Fix the issue and simplify the RX processing by only using RXNEMP without RXOK.
Tested with the integrated CAN on Zynq-7000 SoC.
Fixes: b1201e44f50b ("can: xilinx CAN controller support") Signed-off-by: Anssi Hannula anssi.hannula@bitwise.fi Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/can/xilinx_can.c | 18 +++++------------- 1 file changed, 5 insertions(+), 13 deletions(-)
--- a/drivers/net/can/xilinx_can.c +++ b/drivers/net/can/xilinx_can.c @@ -100,7 +100,7 @@ enum xcan_reg { #define XCAN_INTR_ALL (XCAN_IXR_TXOK_MASK | XCAN_IXR_BSOFF_MASK |\ XCAN_IXR_WKUP_MASK | XCAN_IXR_SLP_MASK | \ XCAN_IXR_RXNEMP_MASK | XCAN_IXR_ERROR_MASK | \ - XCAN_IXR_ARBLST_MASK | XCAN_IXR_RXOK_MASK) + XCAN_IXR_ARBLST_MASK)
/* CAN register bit shift - XCAN_<REG>_<BIT>_SHIFT */ #define XCAN_BTR_SJW_SHIFT 7 /* Synchronous jump width */ @@ -710,15 +710,7 @@ static int xcan_rx_poll(struct napi_stru
isr = priv->read_reg(priv, XCAN_ISR_OFFSET); while ((isr & XCAN_IXR_RXNEMP_MASK) && (work_done < quota)) { - if (isr & XCAN_IXR_RXOK_MASK) { - priv->write_reg(priv, XCAN_ICR_OFFSET, - XCAN_IXR_RXOK_MASK); - work_done += xcan_rx(ndev); - } else { - priv->write_reg(priv, XCAN_ICR_OFFSET, - XCAN_IXR_RXNEMP_MASK); - break; - } + work_done += xcan_rx(ndev); priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_RXNEMP_MASK); isr = priv->read_reg(priv, XCAN_ISR_OFFSET); } @@ -729,7 +721,7 @@ static int xcan_rx_poll(struct napi_stru if (work_done < quota) { napi_complete(napi); ier = priv->read_reg(priv, XCAN_IER_OFFSET); - ier |= (XCAN_IXR_RXOK_MASK | XCAN_IXR_RXNEMP_MASK); + ier |= XCAN_IXR_RXNEMP_MASK; priv->write_reg(priv, XCAN_IER_OFFSET, ier); } return work_done; @@ -801,9 +793,9 @@ static irqreturn_t xcan_interrupt(int ir }
/* Check for the type of receive interrupt and Processing it */ - if (isr & (XCAN_IXR_RXNEMP_MASK | XCAN_IXR_RXOK_MASK)) { + if (isr & XCAN_IXR_RXNEMP_MASK) { ier = priv->read_reg(priv, XCAN_IER_OFFSET); - ier &= ~(XCAN_IXR_RXNEMP_MASK | XCAN_IXR_RXOK_MASK); + ier &= ~XCAN_IXR_RXNEMP_MASK; priv->write_reg(priv, XCAN_IER_OFFSET, ier); napi_schedule(&priv->napi); }
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anssi Hannula anssi.hannula@bitwise.fi
commit 2574fe54515ed3487405de329e4e9f13d7098c10 upstream.
The xilinx_can driver performs a software reset when an RX overrun is detected. This causes the device to enter Configuration mode where no messages are received or transmitted.
The documentation does not mention any need to perform a reset on an RX overrun, and testing by inducing an RX overflow also indicated that the device continues to work just fine without a reset.
Remove the software reset.
Tested with the integrated CAN on Zynq-7000 SoC.
Fixes: b1201e44f50b ("can: xilinx CAN controller support") Signed-off-by: Anssi Hannula anssi.hannula@bitwise.fi Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/can/xilinx_can.c | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/net/can/xilinx_can.c +++ b/drivers/net/can/xilinx_can.c @@ -598,7 +598,6 @@ static void xcan_err_interrupt(struct ne if (isr & XCAN_IXR_RXOFLW_MASK) { stats->rx_over_errors++; stats->rx_errors++; - priv->write_reg(priv, XCAN_SRR_OFFSET, XCAN_SRR_RESET_MASK); if (skb) { cf->can_id |= CAN_ERR_CRTL; cf->data[1] |= CAN_ERR_CRTL_RX_OVERFLOW;
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anssi Hannula anssi.hannula@bitwise.fi
commit 620050d9c2be15c47017ba95efe59e0832e99a56 upstream.
The xilinx_can driver assumes that the TXOK interrupt only clears after it has been acknowledged as many times as there have been successfully sent frames.
However, the documentation does not mention such behavior, instead saying just that the interrupt is cleared when the clear bit is set.
Similarly, testing seems to also suggest that it is immediately cleared regardless of the amount of frames having been sent. Performing some heavy TX load and then going back to idle has the tx_head drifting further away from tx_tail over time, steadily reducing the amount of frames the driver keeps in the TX FIFO (but not to zero, as the TXOK interrupt always frees up space for 1 frame from the driver's perspective, so frames continue to be sent) and delaying the local echo frames.
The TX FIFO tracking is also otherwise buggy as it does not account for TX FIFO being cleared after software resets, causing BUG!, TX FIFO full when queue awake! messages to be output.
There does not seem to be any way to accurately track the state of the TX FIFO for local echo support while using the full TX FIFO.
The Zynq version of the HW (but not the soft-AXI version) has watermark programming support and with it an additional TX-FIFO-empty interrupt bit.
Modify the driver to only put 1 frame into TX FIFO at a time on soft-AXI and 2 frames at a time on Zynq. On Zynq the TXFEMP interrupt bit is used to detect whether 1 or 2 frames have been sent at interrupt processing time.
Tested with the integrated CAN on Zynq-7000 SoC. The 1-frame-FIFO mode was also tested.
An alternative way to solve this would be to drop local echo support but keep using the full TX FIFO.
v2: Add FIFO space check before TX queue wake with locking to synchronize with queue stop. This avoids waking the queue when xmit() had just filled it.
v3: Keep local echo support and reduce the amount of frames in FIFO instead as suggested by Marc Kleine-Budde.
Fixes: b1201e44f50b ("can: xilinx CAN controller support") Signed-off-by: Anssi Hannula anssi.hannula@bitwise.fi Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/can/xilinx_can.c | 79 ++++++++++++++++++++++++++++++++++++------- 1 file changed, 67 insertions(+), 12 deletions(-)
--- a/drivers/net/can/xilinx_can.c +++ b/drivers/net/can/xilinx_can.c @@ -25,8 +25,10 @@ #include <linux/module.h> #include <linux/netdevice.h> #include <linux/of.h> +#include <linux/of_device.h> #include <linux/platform_device.h> #include <linux/skbuff.h> +#include <linux/spinlock.h> #include <linux/string.h> #include <linux/types.h> #include <linux/can/dev.h> @@ -117,6 +119,7 @@ enum xcan_reg { /** * struct xcan_priv - This definition define CAN driver instance * @can: CAN private data structure. + * @tx_lock: Lock for synchronizing TX interrupt handling * @tx_head: Tx CAN packets ready to send on the queue * @tx_tail: Tx CAN packets successfully sended on the queue * @tx_max: Maximum number packets the driver can send @@ -131,6 +134,7 @@ enum xcan_reg { */ struct xcan_priv { struct can_priv can; + spinlock_t tx_lock; unsigned int tx_head; unsigned int tx_tail; unsigned int tx_max; @@ -158,6 +162,11 @@ static const struct can_bittiming_const .brp_inc = 1, };
+#define XCAN_CAP_WATERMARK 0x0001 +struct xcan_devtype_data { + unsigned int caps; +}; + /** * xcan_write_reg_le - Write a value to the device register little endian * @priv: Driver private data structure @@ -237,6 +246,10 @@ static int set_reset_mode(struct net_dev usleep_range(500, 10000); }
+ /* reset clears FIFOs */ + priv->tx_head = 0; + priv->tx_tail = 0; + return 0; }
@@ -391,6 +404,7 @@ static int xcan_start_xmit(struct sk_buf struct net_device_stats *stats = &ndev->stats; struct can_frame *cf = (struct can_frame *)skb->data; u32 id, dlc, data[2] = {0, 0}; + unsigned long flags;
if (can_dropped_invalid_skb(ndev, skb)) return NETDEV_TX_OK; @@ -438,6 +452,9 @@ static int xcan_start_xmit(struct sk_buf data[1] = be32_to_cpup((__be32 *)(cf->data + 4));
can_put_echo_skb(skb, ndev, priv->tx_head % priv->tx_max); + + spin_lock_irqsave(&priv->tx_lock, flags); + priv->tx_head++;
/* Write the Frame to Xilinx CAN TX FIFO */ @@ -453,10 +470,16 @@ static int xcan_start_xmit(struct sk_buf stats->tx_bytes += cf->can_dlc; }
+ /* Clear TX-FIFO-empty interrupt for xcan_tx_interrupt() */ + if (priv->tx_max > 1) + priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_TXFEMP_MASK); + /* Check if the TX buffer is full */ if ((priv->tx_head - priv->tx_tail) == priv->tx_max) netif_stop_queue(ndev);
+ spin_unlock_irqrestore(&priv->tx_lock, flags); + return NETDEV_TX_OK; }
@@ -1023,6 +1046,18 @@ static int __maybe_unused xcan_resume(st
static SIMPLE_DEV_PM_OPS(xcan_dev_pm_ops, xcan_suspend, xcan_resume);
+static const struct xcan_devtype_data xcan_zynq_data = { + .caps = XCAN_CAP_WATERMARK, +}; + +/* Match table for OF platform binding */ +static const struct of_device_id xcan_of_match[] = { + { .compatible = "xlnx,zynq-can-1.0", .data = &xcan_zynq_data }, + { .compatible = "xlnx,axi-can-1.00.a", }, + { /* end of list */ }, +}; +MODULE_DEVICE_TABLE(of, xcan_of_match); + /** * xcan_probe - Platform registration call * @pdev: Handle to the platform device structure @@ -1037,8 +1072,10 @@ static int xcan_probe(struct platform_de struct resource *res; /* IO mem resources */ struct net_device *ndev; struct xcan_priv *priv; + const struct of_device_id *of_id; + int caps = 0; void __iomem *addr; - int ret, rx_max, tx_max; + int ret, rx_max, tx_max, tx_fifo_depth;
/* Get the virtual base address for the device */ res = platform_get_resource(pdev, IORESOURCE_MEM, 0); @@ -1048,7 +1085,8 @@ static int xcan_probe(struct platform_de goto err; }
- ret = of_property_read_u32(pdev->dev.of_node, "tx-fifo-depth", &tx_max); + ret = of_property_read_u32(pdev->dev.of_node, "tx-fifo-depth", + &tx_fifo_depth); if (ret < 0) goto err;
@@ -1056,6 +1094,30 @@ static int xcan_probe(struct platform_de if (ret < 0) goto err;
+ of_id = of_match_device(xcan_of_match, &pdev->dev); + if (of_id) { + const struct xcan_devtype_data *devtype_data = of_id->data; + + if (devtype_data) + caps = devtype_data->caps; + } + + /* There is no way to directly figure out how many frames have been + * sent when the TXOK interrupt is processed. If watermark programming + * is supported, we can have 2 frames in the FIFO and use TXFEMP + * to determine if 1 or 2 frames have been sent. + * Theoretically we should be able to use TXFWMEMP to determine up + * to 3 frames, but it seems that after putting a second frame in the + * FIFO, with watermark at 2 frames, it can happen that TXFWMEMP (less + * than 2 frames in FIFO) is set anyway with no TXOK (a frame was + * sent), which is not a sensible state - possibly TXFWMEMP is not + * completely synchronized with the rest of the bits? + */ + if (caps & XCAN_CAP_WATERMARK) + tx_max = min(tx_fifo_depth, 2); + else + tx_max = 1; + /* Create a CAN device instance */ ndev = alloc_candev(sizeof(struct xcan_priv), tx_max); if (!ndev) @@ -1070,6 +1132,7 @@ static int xcan_probe(struct platform_de CAN_CTRLMODE_BERR_REPORTING; priv->reg_base = addr; priv->tx_max = tx_max; + spin_lock_init(&priv->tx_lock);
/* Get IRQ for the device */ ndev->irq = platform_get_irq(pdev, 0); @@ -1137,9 +1200,9 @@ static int xcan_probe(struct platform_de devm_can_led_init(ndev); clk_disable_unprepare(priv->bus_clk); clk_disable_unprepare(priv->can_clk); - netdev_dbg(ndev, "reg_base=0x%p irq=%d clock=%d, tx fifo depth:%d\n", + netdev_dbg(ndev, "reg_base=0x%p irq=%d clock=%d, tx fifo depth: actual %d, using %d\n", priv->reg_base, ndev->irq, priv->can.clock.freq, - priv->tx_max); + tx_fifo_depth, priv->tx_max);
return 0;
@@ -1175,14 +1238,6 @@ static int xcan_remove(struct platform_d return 0; }
-/* Match table for OF platform binding */ -static struct of_device_id xcan_of_match[] = { - { .compatible = "xlnx,zynq-can-1.0", }, - { .compatible = "xlnx,axi-can-1.00.a", }, - { /* end of list */ }, -}; -MODULE_DEVICE_TABLE(of, xcan_of_match); - static struct platform_driver xcan_driver = { .probe = xcan_probe, .remove = xcan_remove,
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anssi Hannula anssi.hannula@bitwise.fi
commit 83997997252f5d3fc7f04abc24a89600c2b504ab upstream.
RX overflow interrupt (RXOFLW) is disabled even though xcan_interrupt() processes it. This means that an RX overflow interrupt will only be processed when another interrupt gets asserted (e.g. for RX/TX).
Fix that by enabling the RXOFLW interrupt.
Fixes: b1201e44f50b ("can: xilinx CAN controller support") Signed-off-by: Anssi Hannula anssi.hannula@bitwise.fi Cc: Michal Simek michal.simek@xilinx.com Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/net/can/xilinx_can.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/can/xilinx_can.c +++ b/drivers/net/can/xilinx_can.c @@ -102,7 +102,7 @@ enum xcan_reg { #define XCAN_INTR_ALL (XCAN_IXR_TXOK_MASK | XCAN_IXR_BSOFF_MASK |\ XCAN_IXR_WKUP_MASK | XCAN_IXR_SLP_MASK | \ XCAN_IXR_RXNEMP_MASK | XCAN_IXR_ERROR_MASK | \ - XCAN_IXR_ARBLST_MASK) + XCAN_IXR_RXOFLW_MASK | XCAN_IXR_ARBLST_MASK)
/* CAN register bit shift - XCAN_<REG>_<BIT>_SHIFT */ #define XCAN_BTR_SJW_SHIFT 7 /* Synchronous jump width */
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
Building kernels before linux-4.7 with gcc-8 results in many build failures when gcc triggers a check that was meant to catch broken compilers:
/tmp/ccCGMQmS.s:648: Error: .err encountered
According to the discussion in the gcc bugzilla, a local "register asm()" variable is still supposed to be the correct way to force an inline assembly to use a particular register, but marking it 'const' lets the compiler do optimizations that break that, i.e the compiler is free to treat the variable as either 'const' or 'register' in that case.
Upstream commit 9f73bd8bb445 ("ARM: uaccess: remove put_user() code duplication") fixed this problem in linux-4.8 as part of a larger change, but seems a little too big to be backported to 4.4.
Let's take the simplest fix and change only the one broken line in the same way as newer kernels.
Suggested-by: Bernd Edlinger bernd.edlinger@hotmail.de Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85745 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86673 Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/include/asm/uaccess.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm/include/asm/uaccess.h +++ b/arch/arm/include/asm/uaccess.h @@ -220,7 +220,7 @@ extern int __put_user_8(void *, unsigned ({ \ unsigned long __limit = current_thread_info()->addr_limit - 1; \ const typeof(*(p)) __user *__tmp_p = (p); \ - register const typeof(*(p)) __r2 asm("r2") = (x); \ + register typeof(*(p)) __r2 asm("r2") = (x); \ register const typeof(*(p)) __user *__p asm("r0") = __tmp_p; \ register unsigned long __l asm("r1") = __limit; \ register int __e asm("r0"); \
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
Starting with gcc-8.1, we get a warning about all system call definitions, which use an alias between functions with incompatible prototypes, e.g.:
In file included from ../mm/process_vm_access.c:19: ../include/linux/syscalls.h:211:18: warning: 'sys_process_vm_readv' alias between functions of incompatible types 'long int(pid_t, const struct iovec *, long unsigned int, const struct iovec *, long unsigned int, long unsigned int)' {aka 'long int(int, const struct iovec *, long unsigned int, const struct iovec *, long unsigned int, long unsigned int)'} and 'long int(long int, long int, long int, long int, long int, long int)' [-Wattribute-alias] asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \ ^~~ ../include/linux/syscalls.h:207:2: note: in expansion of macro '__SYSCALL_DEFINEx' __SYSCALL_DEFINEx(x, sname, __VA_ARGS__) ^~~~~~~~~~~~~~~~~ ../include/linux/syscalls.h:201:36: note: in expansion of macro 'SYSCALL_DEFINEx' #define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__) ^~~~~~~~~~~~~~~ ../mm/process_vm_access.c:300:1: note: in expansion of macro 'SYSCALL_DEFINE6' SYSCALL_DEFINE6(process_vm_readv, pid_t, pid, const struct iovec __user *, lvec, ^~~~~~~~~~~~~~~ ../include/linux/syscalls.h:215:18: note: aliased declaration here asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \ ^~~ ../include/linux/syscalls.h:207:2: note: in expansion of macro '__SYSCALL_DEFINEx' __SYSCALL_DEFINEx(x, sname, __VA_ARGS__) ^~~~~~~~~~~~~~~~~ ../include/linux/syscalls.h:201:36: note: in expansion of macro 'SYSCALL_DEFINEx' #define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__) ^~~~~~~~~~~~~~~ ../mm/process_vm_access.c:300:1: note: in expansion of macro 'SYSCALL_DEFINE6' SYSCALL_DEFINE6(process_vm_readv, pid_t, pid, const struct iovec __user *, lvec,
This is really noisy and does not indicate a real problem. In the latest mainline kernel, this was addressed by commit bee20031772a ("disable -Wattribute-alias warning for SYSCALL_DEFINEx()"), which seems too invasive to backport.
This takes a much simpler approach and just disables the warning across the kernel.
Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Makefile | 1 + 1 file changed, 1 insertion(+)
--- a/Makefile +++ b/Makefile @@ -613,6 +613,7 @@ KBUILD_CFLAGS += $(call cc-disable-warni KBUILD_CFLAGS += $(call cc-disable-warning, format-truncation) KBUILD_CFLAGS += $(call cc-disable-warning, format-overflow) KBUILD_CFLAGS += $(call cc-disable-warning, int-in-bool-context) +KBUILD_CFLAGS += $(call cc-disable-warning, attribute-alias) KBUILD_CFLAGS += $(call cc-option,-fno-PIE) KBUILD_AFLAGS += $(call cc-option,-fno-PIE)
On Fri, Jul 27, 2018 at 12:26:37PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 3.18.117 release. There are 27 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Jul 29 10:26:38 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.18.117-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-3.18.y and the diffstat can be found below.
thanks,
greg k-h
Merged, compiled with -Werror, and installed onto my Pixel XL.
No initial issues noticed in dmesg or general usage.
Thanks! Nathan
On Fri, Jul 27, 2018 at 05:21:23AM -0700, Nathan Chancellor wrote:
On Fri, Jul 27, 2018 at 12:26:37PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 3.18.117 release. There are 27 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Jul 29 10:26:38 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.18.117-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-3.18.y and the diffstat can be found below.
thanks,
greg k-h
Merged, compiled with -Werror, and installed onto my Pixel XL.
No initial issues noticed in dmesg or general usage.
Thanks for testing 3 of these and letting me know.
greg k-h
On Fri, Jul 27, 2018 at 12:26:37PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 3.18.117 release. There are 27 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Jul 29 10:26:38 UTC 2018. Anything received after that time might be too late.
For v3.18.116-28-geb91dc3:
Build results: total: 138 pass: 138 fail: 0 Qemu test results: total: 132 pass: 132 fail: 0
Details are available at http://kerneltests.org/builders/.
Guenter
On 07/27/2018 04:26 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 3.18.117 release. There are 27 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun Jul 29 10:26:38 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.18.117-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-3.18.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
linux-stable-mirror@lists.linaro.org