This is a note to let you know that I've just added the patch titled
netfilter: nft_set_hash: disable fast_ops for 2-len keys
to the 4.13-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
netfilter-nft_set_hash-disable-fast_ops-for-2-len-keys.patch
and it can be found in the queue-4.13 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From 0414c78f14861cb704d6e6888efd53dd36e3bdde Mon Sep 17 00:00:00 2001
From: Anatole Denis <anatole@rezel.net>
Date: Wed, 4 Oct 2017 01:17:14 +0100
Subject: netfilter: nft_set_hash: disable fast_ops for 2-len keys
From: Anatole Denis <anatole@rezel.net>
commit 0414c78f14861cb704d6e6888efd53dd36e3bdde upstream.
jhash_1word() of a u16 yields a different value than jhash() of the same u16 with
length 2.
Since elements are always inserted into sets using jhash over the actual
klen, this leads to incorrect lookups on fixed-size sets with a key
length of 2: elements are inserted with hash value jhash(key, 2) but
looked up with hash value jhash_1word(key), which is different.
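A minimal sketch of the mismatch (illustrative only; assumes <linux/jhash.h>,
the function and port value are hypothetical):

#include <linux/jhash.h>
#include <linux/printk.h>

/* Hypothetical demo: for a 2-byte key, insertion hashes the two key
 * bytes, while nft_hash_fast_ops looked the key up via jhash_1word(),
 * which mixes a full 32-bit word. The results differ, so lookups miss.
 */
static void jhash_mismatch_demo(u32 seed)
{
	u16 port = 10007;
	u32 h_insert = jhash(&port, sizeof(port), seed); /* length 2 */
	u32 h_lookup = jhash_1word(port, seed);          /* one u32 word */

	pr_info("insert=%08x lookup=%08x\n", h_insert, h_lookup); /* differ */
}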
Example reproducer (v4.13+), using anonymous sets, which always have a
fixed size:
table inet t {
chain c {
type filter hook output priority 0; policy accept;
tcp dport { 10001, 10003, 10005, 10007, 10009 } counter packets 4 bytes 240 reject
tcp dport 10001 counter packets 4 bytes 240 reject
tcp dport 10003 counter packets 4 bytes 240 reject
tcp dport 10005 counter packets 4 bytes 240 reject
tcp dport 10007 counter packets 0 bytes 0 reject
tcp dport 10009 counter packets 4 bytes 240 reject
}
}
Then use nc -z localhost <port> to probe; incorrectly hashed ports will
pass through the set lookup and increment the counter of an individual
rule.
Since jhash is seeded with a random value, it is not deterministic which
ports will hash incorrectly, but in testing with 5 ports in the set,
4 or 5 of them always had an incorrect hash value.
Signed-off-by: Anatole Denis <anatole@rezel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/netfilter/nft_set_hash.c | 1 -
1 file changed, 1 deletion(-)
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -643,7 +643,6 @@ nft_hash_select_ops(const struct nft_ctx
{
if (desc->size) {
switch (desc->klen) {
- case 2:
case 4:
return &nft_hash_fast_ops;
default:
Patches currently in stable-queue which might be from anatole@rezel.net are
queue-4.13/netfilter-nft_set_hash-disable-fast_ops-for-2-len-keys.patch
This is a note to let you know that I've just added the patch titled
netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to rhashtable"
to the 4.13-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
netfilter-nat-revert-netfilter-nat-convert-nat-bysrc-hash-to-rhashtable.patch
and it can be found in the queue-4.13 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From e1bf1687740ce1a3598a1c5e452b852ff2190682 Mon Sep 17 00:00:00 2001
From: Florian Westphal <fw@strlen.de>
Date: Wed, 6 Sep 2017 14:39:51 +0200
Subject: netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to rhashtable"
From: Florian Westphal <fw@strlen.de>
commit e1bf1687740ce1a3598a1c5e452b852ff2190682 upstream.
This reverts commit 870190a9ec9075205c0fa795a09fa931694a3ff1.
It was not a good idea. The custom hash table was a much better
fit for this purpose.
A fast lookup is not essential; in fact, in most cases there is no lookup
at all, because the original tuple is not taken and can be used as-is.
What needs to be fast is insertion and deletion.
rhlist removal, however, requires an rhlist walk.
We can have thousands of entries in such a list if source ports/addresses
are reused for multiple flows; when this happens, removal requests become
so expensive that deleting a few thousand flows can take several
seconds(!).
The advantages that we got from rhashtable are:
1) table auto-sizing
2) multiple locks
1) would be nice to have, but it is not essential as we have at
most one lookup per new flow, so even a million flows in the bysource
table are not a problem compared to current deletion cost.
2) is easy to add to a custom hash table.
I tried to add hlist_node to rhlist to speed up rhltable_remove but this
isn't doable without changing semantics. rhltable_remove_fast will
check that the to-be-deleted object is part of the table and that
requires a list walk that we want to avoid.
Furthermore, using hlist_node increases the size of struct rhlist_head,
which in turn increases the nf_conn size.
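To illustrate the removal-cost difference (a sketch, not code from this
patch):

#include <linux/rculist.h>

/* With a doubly linked hlist, each node carries a pprev back-pointer,
 * so unlinking never walks the chain: */
static void bysource_remove(struct hlist_node *n)
{
	hlist_del_rcu(n);	/* O(1), regardless of chain length */
}
/* rhltable_remove(), by contrast, must walk the singly linked bucket
 * chain to find the object (and rhltable_remove_fast() additionally
 * verifies membership), which is O(chain length) - thousands of
 * entries when many flows share a source address/port. */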
Link: https://bugzilla.kernel.org/show_bug.cgi?id=196821
Reported-by: Ivan Babrou <ibobrik@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/net/netfilter/nf_conntrack.h | 3
include/net/netfilter/nf_nat.h | 1
net/netfilter/nf_nat_core.c | 128 ++++++++++++++---------------------
3 files changed, 53 insertions(+), 79 deletions(-)
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -17,7 +17,6 @@
#include <linux/bitops.h>
#include <linux/compiler.h>
#include <linux/atomic.h>
-#include <linux/rhashtable.h>
#include <linux/netfilter/nf_conntrack_tcp.h>
#include <linux/netfilter/nf_conntrack_dccp.h>
@@ -83,7 +82,7 @@ struct nf_conn {
possible_net_t ct_net;
#if IS_ENABLED(CONFIG_NF_NAT)
- struct rhlist_head nat_bysource;
+ struct hlist_node nat_bysource;
#endif
/* all members below initialized via memset */
u8 __nfct_init_offset[0];
--- a/include/net/netfilter/nf_nat.h
+++ b/include/net/netfilter/nf_nat.h
@@ -1,6 +1,5 @@
#ifndef _NF_NAT_H
#define _NF_NAT_H
-#include <linux/rhashtable.h>
#include <linux/netfilter_ipv4.h>
#include <linux/netfilter/nf_nat.h>
#include <net/netfilter/nf_conntrack_tuple.h>
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -30,19 +30,17 @@
#include <net/netfilter/nf_conntrack_zones.h>
#include <linux/netfilter/nf_nat.h>
+static DEFINE_SPINLOCK(nf_nat_lock);
+
static DEFINE_MUTEX(nf_nat_proto_mutex);
static const struct nf_nat_l3proto __rcu *nf_nat_l3protos[NFPROTO_NUMPROTO]
__read_mostly;
static const struct nf_nat_l4proto __rcu **nf_nat_l4protos[NFPROTO_NUMPROTO]
__read_mostly;
-struct nf_nat_conn_key {
- const struct net *net;
- const struct nf_conntrack_tuple *tuple;
- const struct nf_conntrack_zone *zone;
-};
-
-static struct rhltable nf_nat_bysource_table;
+static struct hlist_head *nf_nat_bysource __read_mostly;
+static unsigned int nf_nat_htable_size __read_mostly;
+static unsigned int nf_nat_hash_rnd __read_mostly;
inline const struct nf_nat_l3proto *
__nf_nat_l3proto_find(u8 family)
@@ -118,17 +116,19 @@ int nf_xfrm_me_harder(struct net *net, s
EXPORT_SYMBOL(nf_xfrm_me_harder);
#endif /* CONFIG_XFRM */
-static u32 nf_nat_bysource_hash(const void *data, u32 len, u32 seed)
+/* We keep an extra hash for each conntrack, for fast searching. */
+static unsigned int
+hash_by_src(const struct net *n, const struct nf_conntrack_tuple *tuple)
{
- const struct nf_conntrack_tuple *t;
- const struct nf_conn *ct = data;
+ unsigned int hash;
+
+ get_random_once(&nf_nat_hash_rnd, sizeof(nf_nat_hash_rnd));
- t = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
/* Original src, to ensure we map it consistently if poss. */
+ hash = jhash2((u32 *)&tuple->src, sizeof(tuple->src) / sizeof(u32),
+ tuple->dst.protonum ^ nf_nat_hash_rnd ^ net_hash_mix(n));
- seed ^= net_hash_mix(nf_ct_net(ct));
- return jhash2((const u32 *)&t->src, sizeof(t->src) / sizeof(u32),
- t->dst.protonum ^ seed);
+ return reciprocal_scale(hash, nf_nat_htable_size);
}
/* Is this tuple already taken? (not by us) */
@@ -184,28 +184,6 @@ same_src(const struct nf_conn *ct,
t->src.u.all == tuple->src.u.all);
}
-static int nf_nat_bysource_cmp(struct rhashtable_compare_arg *arg,
- const void *obj)
-{
- const struct nf_nat_conn_key *key = arg->key;
- const struct nf_conn *ct = obj;
-
- if (!same_src(ct, key->tuple) ||
- !net_eq(nf_ct_net(ct), key->net) ||
- !nf_ct_zone_equal(ct, key->zone, IP_CT_DIR_ORIGINAL))
- return 1;
-
- return 0;
-}
-
-static struct rhashtable_params nf_nat_bysource_params = {
- .head_offset = offsetof(struct nf_conn, nat_bysource),
- .obj_hashfn = nf_nat_bysource_hash,
- .obj_cmpfn = nf_nat_bysource_cmp,
- .nelem_hint = 256,
- .min_size = 1024,
-};
-
/* Only called for SRC manip */
static int
find_appropriate_src(struct net *net,
@@ -216,26 +194,22 @@ find_appropriate_src(struct net *net,
struct nf_conntrack_tuple *result,
const struct nf_nat_range *range)
{
+ unsigned int h = hash_by_src(net, tuple);
const struct nf_conn *ct;
- struct nf_nat_conn_key key = {
- .net = net,
- .tuple = tuple,
- .zone = zone
- };
- struct rhlist_head *hl, *h;
-
- hl = rhltable_lookup(&nf_nat_bysource_table, &key,
- nf_nat_bysource_params);
- rhl_for_each_entry_rcu(ct, h, hl, nat_bysource) {
- nf_ct_invert_tuplepr(result,
- &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
- result->dst = tuple->dst;
+ hlist_for_each_entry_rcu(ct, &nf_nat_bysource[h], nat_bysource) {
+ if (same_src(ct, tuple) &&
+ net_eq(net, nf_ct_net(ct)) &&
+ nf_ct_zone_equal(ct, zone, IP_CT_DIR_ORIGINAL)) {
+ /* Copy source part from reply tuple. */
+ nf_ct_invert_tuplepr(result,
+ &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
+ result->dst = tuple->dst;
- if (in_range(l3proto, l4proto, result, range))
- return 1;
+ if (in_range(l3proto, l4proto, result, range))
+ return 1;
+ }
}
-
return 0;
}
@@ -408,6 +382,7 @@ nf_nat_setup_info(struct nf_conn *ct,
const struct nf_nat_range *range,
enum nf_nat_manip_type maniptype)
{
+ struct net *net = nf_ct_net(ct);
struct nf_conntrack_tuple curr_tuple, new_tuple;
/* Can't setup nat info for confirmed ct. */
@@ -447,19 +422,14 @@ nf_nat_setup_info(struct nf_conn *ct,
}
if (maniptype == NF_NAT_MANIP_SRC) {
- struct nf_nat_conn_key key = {
- .net = nf_ct_net(ct),
- .tuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
- .zone = nf_ct_zone(ct),
- };
- int err;
-
- err = rhltable_insert_key(&nf_nat_bysource_table,
- &key,
- &ct->nat_bysource,
- nf_nat_bysource_params);
- if (err)
- return NF_DROP;
+ unsigned int srchash;
+
+ srchash = hash_by_src(net,
+ &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
+ spin_lock_bh(&nf_nat_lock);
+ hlist_add_head_rcu(&ct->nat_bysource,
+ &nf_nat_bysource[srchash]);
+ spin_unlock_bh(&nf_nat_lock);
}
/* It's done. */
@@ -568,8 +538,9 @@ static int nf_nat_proto_clean(struct nf_
* will delete entry from already-freed table.
*/
clear_bit(IPS_SRC_NAT_DONE_BIT, &ct->status);
- rhltable_remove(&nf_nat_bysource_table, &ct->nat_bysource,
- nf_nat_bysource_params);
+ spin_lock_bh(&nf_nat_lock);
+ hlist_del_rcu(&ct->nat_bysource);
+ spin_unlock_bh(&nf_nat_lock);
/* don't delete conntrack. Although that would make things a lot
* simpler, we'd end up flushing all conntracks on nat rmmod.
@@ -697,9 +668,11 @@ EXPORT_SYMBOL_GPL(nf_nat_l3proto_unregis
/* No one using conntrack by the time this called. */
static void nf_nat_cleanup_conntrack(struct nf_conn *ct)
{
- if (ct->status & IPS_SRC_NAT_DONE)
- rhltable_remove(&nf_nat_bysource_table, &ct->nat_bysource,
- nf_nat_bysource_params);
+ if (ct->status & IPS_SRC_NAT_DONE) {
+ spin_lock_bh(&nf_nat_lock);
+ hlist_del_rcu(&ct->nat_bysource);
+ spin_unlock_bh(&nf_nat_lock);
+ }
}
static struct nf_ct_ext_type nat_extend __read_mostly = {
@@ -823,13 +796,16 @@ static int __init nf_nat_init(void)
{
int ret;
- ret = rhltable_init(&nf_nat_bysource_table, &nf_nat_bysource_params);
- if (ret)
- return ret;
+ /* Leave them the same for the moment. */
+ nf_nat_htable_size = nf_conntrack_htable_size;
+
+ nf_nat_bysource = nf_ct_alloc_hashtable(&nf_nat_htable_size, 0);
+ if (!nf_nat_bysource)
+ return -ENOMEM;
ret = nf_ct_extend_register(&nat_extend);
if (ret < 0) {
- rhltable_destroy(&nf_nat_bysource_table);
+ nf_ct_free_hashtable(nf_nat_bysource, nf_nat_htable_size);
printk(KERN_ERR "nf_nat_core: Unable to register extension\n");
return ret;
}
@@ -863,8 +839,8 @@ static void __exit nf_nat_cleanup(void)
for (i = 0; i < NFPROTO_NUMPROTO; i++)
kfree(nf_nat_l4protos[i]);
-
- rhltable_destroy(&nf_nat_bysource_table);
+ synchronize_net();
+ nf_ct_free_hashtable(nf_nat_bysource, nf_nat_htable_size);
}
MODULE_LICENSE("GPL");
Patches currently in stable-queue which might be from fw@strlen.de are
queue-4.13/netfilter-nat-revert-netfilter-nat-convert-nat-bysrc-hash-to-rhashtable.patch
This is a note to let you know that I've just added the patch titled
KEYS: trusted: sanitize all key material
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
keys-trusted-sanitize-all-key-material.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From ee618b4619b72527aaed765f0f0b74072b281159 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers@google.com>
Date: Thu, 8 Jun 2017 14:49:18 +0100
Subject: KEYS: trusted: sanitize all key material
From: Eric Biggers <ebiggers@google.com>
commit ee618b4619b72527aaed765f0f0b74072b281159 upstream.
As the previous patch did for encrypted-keys, zero any potentially
sensitive data related to the "trusted" key type before it is freed.
Notably, we were not zeroing the tpm_buf structures in which the actual
key is stored for TPM seal and unseal, nor were we zeroing the
trusted_key_payload in certain error paths.
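The fix switches these frees to kzfree(), which clears the allocation
before releasing it; a simplified sketch of the semantics (not the
kernel's exact implementation):

#include <linux/slab.h>
#include <linux/string.h>

/* Simplified sketch of kzfree(): zero the whole allocation so key
 * material does not linger in freed slab memory. */
static void kzfree_sketch(const void *p)
{
	if (!p)
		return;
	memset((void *)p, 0, ksize(p)); /* ksize() = true allocation size */
	kfree(p);
}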
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: David Safford <safford@us.ibm.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
security/keys/trusted.c | 35 +++++++++++++++++------------------
1 file changed, 17 insertions(+), 18 deletions(-)
--- a/security/keys/trusted.c
+++ b/security/keys/trusted.c
@@ -69,7 +69,7 @@ static int TSS_sha1(const unsigned char
}
ret = crypto_shash_digest(&sdesc->shash, data, datalen, digest);
- kfree(sdesc);
+ kzfree(sdesc);
return ret;
}
@@ -113,7 +113,7 @@ static int TSS_rawhmac(unsigned char *di
if (!ret)
ret = crypto_shash_final(&sdesc->shash, digest);
out:
- kfree(sdesc);
+ kzfree(sdesc);
return ret;
}
@@ -164,7 +164,7 @@ static int TSS_authhmac(unsigned char *d
paramdigest, TPM_NONCE_SIZE, h1,
TPM_NONCE_SIZE, h2, 1, &c, 0, 0);
out:
- kfree(sdesc);
+ kzfree(sdesc);
return ret;
}
@@ -245,7 +245,7 @@ static int TSS_checkhmac1(unsigned char
if (memcmp(testhmac, authdata, SHA1_DIGEST_SIZE))
ret = -EINVAL;
out:
- kfree(sdesc);
+ kzfree(sdesc);
return ret;
}
@@ -346,7 +346,7 @@ static int TSS_checkhmac2(unsigned char
if (memcmp(testhmac2, authdata2, SHA1_DIGEST_SIZE))
ret = -EINVAL;
out:
- kfree(sdesc);
+ kzfree(sdesc);
return ret;
}
@@ -563,7 +563,7 @@ static int tpm_seal(struct tpm_buf *tb,
*bloblen = storedsize;
}
out:
- kfree(td);
+ kzfree(td);
return ret;
}
@@ -677,7 +677,7 @@ static int key_seal(struct trusted_key_p
if (ret < 0)
pr_info("trusted_key: srkseal failed (%d)\n", ret);
- kfree(tb);
+ kzfree(tb);
return ret;
}
@@ -702,7 +702,7 @@ static int key_unseal(struct trusted_key
/* pull migratable flag out of sealed key */
p->migratable = p->key[--p->key_len];
- kfree(tb);
+ kzfree(tb);
return ret;
}
@@ -961,12 +961,12 @@ static int trusted_instantiate(struct ke
if (!ret && options->pcrlock)
ret = pcrlock(options->pcrlock);
out:
- kfree(datablob);
- kfree(options);
+ kzfree(datablob);
+ kzfree(options);
if (!ret)
rcu_assign_keypointer(key, payload);
else
- kfree(payload);
+ kzfree(payload);
return ret;
}
@@ -975,8 +975,7 @@ static void trusted_rcu_free(struct rcu_
struct trusted_key_payload *p;
p = container_of(rcu, struct trusted_key_payload, rcu);
- memset(p->key, 0, p->key_len);
- kfree(p);
+ kzfree(p);
}
/*
@@ -1018,7 +1017,7 @@ static int trusted_update(struct key *ke
ret = datablob_parse(datablob, new_p, new_o);
if (ret != Opt_update) {
ret = -EINVAL;
- kfree(new_p);
+ kzfree(new_p);
goto out;
}
/* copy old key values, and reseal with new pcrs */
@@ -1031,22 +1030,22 @@ static int trusted_update(struct key *ke
ret = key_seal(new_p, new_o);
if (ret < 0) {
pr_info("trusted_key: key_seal failed (%d)\n", ret);
- kfree(new_p);
+ kzfree(new_p);
goto out;
}
if (new_o->pcrlock) {
ret = pcrlock(new_o->pcrlock);
if (ret < 0) {
pr_info("trusted_key: pcrlock failed (%d)\n", ret);
- kfree(new_p);
+ kzfree(new_p);
goto out;
}
}
rcu_assign_keypointer(key, new_p);
call_rcu(&p->rcu, trusted_rcu_free);
out:
- kfree(datablob);
- kfree(new_o);
+ kzfree(datablob);
+ kzfree(new_o);
return ret;
}
Patches currently in stable-queue which might be from ebiggers@google.com are
queue-3.18/keys-trusted-sanitize-all-key-material.patch
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a3c812f7cfd80cf51e8f5b7034f7418f6beb56c1 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers@google.com>
Date: Thu, 2 Nov 2017 00:47:12 +0000
Subject: [PATCH] KEYS: trusted: fix writing past end of buffer in
trusted_read()
When calling keyctl_read() on a key of type "trusted", if the
user-supplied buffer was too small, the kernel ignored the buffer length
and just wrote past the end of the buffer, potentially corrupting
userspace memory. Fix it by instead returning the size required, as per
the documentation for keyctl_read().
We also don't fill the buffer at all in this case, as that is
slightly easier to implement than doing a short read, and either
behavior appears to be permitted. It also matches the behavior
of the "encrypted" key type.
Fixes: d00a1c72f7f4 ("keys: add new trusted key-type")
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Cc: <stable@vger.kernel.org> # v2.6.38+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
diff --git a/security/keys/trusted.c b/security/keys/trusted.c
index bd85315cbfeb..98aa89ff7bfd 100644
--- a/security/keys/trusted.c
+++ b/security/keys/trusted.c
@@ -1147,20 +1147,21 @@ static long trusted_read(const struct key *key, char __user *buffer,
p = dereference_key_locked(key);
if (!p)
return -EINVAL;
- if (!buffer || buflen <= 0)
- return 2 * p->blob_len;
- ascii_buf = kmalloc(2 * p->blob_len, GFP_KERNEL);
- if (!ascii_buf)
- return -ENOMEM;
- bufp = ascii_buf;
- for (i = 0; i < p->blob_len; i++)
- bufp = hex_byte_pack(bufp, p->blob[i]);
- if ((copy_to_user(buffer, ascii_buf, 2 * p->blob_len)) != 0) {
+ if (buffer && buflen >= 2 * p->blob_len) {
+ ascii_buf = kmalloc(2 * p->blob_len, GFP_KERNEL);
+ if (!ascii_buf)
+ return -ENOMEM;
+
+ bufp = ascii_buf;
+ for (i = 0; i < p->blob_len; i++)
+ bufp = hex_byte_pack(bufp, p->blob[i]);
+ if (copy_to_user(buffer, ascii_buf, 2 * p->blob_len) != 0) {
+ kzfree(ascii_buf);
+ return -EFAULT;
+ }
kzfree(ascii_buf);
- return -EFAULT;
}
- kzfree(ascii_buf);
return 2 * p->blob_len;
}
Hello,
please integrate/backport the following commit into stable kernel release 4.4:
Commit-ID: 2b02c20ce0c28974b44e69a2e2f5ddc6a470ad6f (see https://github.com/torvalds/linux/commit/2b02c20ce0c28974b44e69a2e2f5ddc6a4…)
Subject: "cdc_ncm: Set NTB format again after altsetting switch for Huawei devices"
(The patch has been submitted and merged into the mainline.)
We regard this commit as a fix/workaround for a change in firmware behavior observed in some recent Huawei devices (e.g. the Huawei MS2131 LTE stick).
As these devices are unusable with the 4.4.x LTS kernel, we suggest including this workaround, since more and more such LTE sticks are to be expected until EOS in 2022...
See also the mail below from the originator of the commit.
(It may also make sense to integrate it into the other stable kernel releases.)
Thank you very much!
Kind regards,
Matthias
----
Sender: Enrico Mioso <mrkiko.rs@gmail.com>
Date: 08.11.2017 23:21
Subject: Re: Question regarding commit: cdc_ncm: Set NTB format again after altsetting switch for Huawei devices
Hello Matthias.
First of all, thank you for your interest in the huawei_cdc_ncm driver and for contacting me.
This commit could be backported to the 4.4 kernel; I didn't send it to stable@vger.kernel.org.
Let me know if there are any difficulties doing so, or I might do the submission myself in a few days.
I think that commit could be considered related to a bug, even if that's arguable. That bug, in fact, is not in the Linux kernel itself but in the Huawei firmware.
So I don't think there will be a bug tracker entry.
Thanks for all,
Enrico
On Mon, Nov 06, 2017 at 04:57:21PM -0800, Dan Williams wrote:
> Until there is a solution to the dma-to-dax vs truncate problem it is
> not safe to allow RDMA to create long standing memory registrations
> against filesytem-dax vmas.
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
> +long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
> + unsigned int gup_flags, struct page **pages,
> + struct vm_area_struct **vmas)
> +{
> + struct vm_area_struct **__vmas = vmas;
How about calling the vma argument vma_arg, and the one used vma, to
make things a little more readable?
> + struct vm_area_struct *vma_prev = NULL;
> + long rc, i;
> +
> + if (!pages)
> + return -EINVAL;
> +
> + if (!vmas && IS_ENABLED(CONFIG_FS_DAX)) {
> + __vmas = kzalloc(sizeof(struct vm_area_struct *) * nr_pages,
> + GFP_KERNEL);
> + if (!__vmas)
> + return -ENOMEM;
> + }
> +
> + rc = get_user_pages(start, nr_pages, gup_flags, pages, __vmas);
> +
> + /* skip scan for fs-dax vmas if they are compile time disabled */
> + if (!IS_ENABLED(CONFIG_FS_DAX))
> + goto out;
Instead of all this IS_ENABLED magic I'd recommend just conditionally
compiling this function and defining it to get_user_pages in the header
if FS_DAX is disabled.
Else this looks fine to me.
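The suggested alternative might look roughly like this (hypothetical
header sketch, not merged code):

#ifdef CONFIG_FS_DAX
long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
			     unsigned int gup_flags, struct page **pages,
			     struct vm_area_struct **vmas);
#else
static inline long get_user_pages_longterm(unsigned long start,
		unsigned long nr_pages, unsigned int gup_flags,
		struct page **pages, struct vm_area_struct **vmas)
{
	/* fs-dax disabled: no long-standing-pin hazard, plain gup is fine */
	return get_user_pages(start, nr_pages, gup_flags, pages, vmas);
}
#endif /* CONFIG_FS_DAX */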
Currently only get_user_pages_fast() can safely handle the writable gup
case due to its use of pud_access_permitted() to check whether the pud
entry is writable. In the gup slow path pud_write() is used instead of
pud_access_permitted(); to date it has been unimplemented and just calls
BUG().
kernel BUG at ./include/linux/hugetlb.h:244!
[..]
RIP: 0010:follow_devmap_pud+0x482/0x490
[..]
Call Trace:
follow_page_mask+0x28c/0x6e0
__get_user_pages+0xe4/0x6c0
get_user_pages_unlocked+0x130/0x1b0
get_user_pages_fast+0x89/0xb0
iov_iter_get_pages_alloc+0x114/0x4a0
nfs_direct_read_schedule_iovec+0xd2/0x350
? nfs_start_io_direct+0x63/0x70
nfs_file_direct_read+0x1e0/0x250
nfs_file_read+0x90/0xc0
Use pud_access_permitted() to implement pud_write(); a later cleanup can
remove {pte,pmd,pud}_write and replace them with
{pte,pmd,pud}_access_permitted() directly, so that we have only one set
of helpers for these kinds of checks. For now, implementing pud_write()
simplifies -stable backports.
Cc: <stable@vger.kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
Sending this as RFC for opinion on whether this should just be a
pud_flags() & _PAGE_RW check, like pmd_write, or pud_access_permitted()
that also takes protection keys into account.
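Roughly, the two options differ as follows (simplified x86-style sketch,
not the kernel's exact code):

/* Option 1: bare writability test, like pmd_write(): */
static inline int pud_write_rw_only(pud_t pud)
{
	return pud_flags(pud) & _PAGE_RW;
}

/* Option 2: pud_access_permitted(), which also requires the entry to
 * be present and user-accessible, and consults protection keys: */
static inline int pud_write_permitted(pud_t pud)
{
	return pud_access_permitted(pud, WRITE);
}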
include/linux/hugetlb.h | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index fbf5b31d47ee..6a142b240ef7 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -242,8 +242,7 @@ static inline int pgd_write(pgd_t pgd)
#ifndef pud_write
static inline int pud_write(pud_t pud)
{
- BUG();
- return 0;
+ return pud_access_permitted(pud, WRITE);
}
#endif