Hi,
Please apply upstream commit: 64ba4b15e5c0 ("exfat: check if cluster num is valid")
to stable 5.18.y and 5.17.y
Backports for 5.15.y and 5.10.y will follow soon.
--
Thanks,
Tadeusz
Hi Greg and Shasha!
It has been a while since you heard from xfs team.
We are trying to change things and get xfs fixes flowing to stable
again. Crossing my fingers that we will make this last this time :)
Please see this message from Darrick [4] about xfs stable plans.
My team will be focusing on 5.10.y and Ted and Leah's team will be
focusing on 5.15.y at this time.
This v2 is being sent to stable after testing and after v1 was sent
for review of the xfs list [5].
v2 includes an extra patch that Christoph has backported and tested
and was going to send to stable.
Please see my cover letter to xfs with more details about my plans
for 5.10.y below:
Hi all!
During LSFMM 2022, I have had an opportunity to speak with developers
from several different companies that showed interest in collaborating
on the effort of improving the state of xfs code in LTS kernels.
I would like to kick-off this effort for the 5.10 LTS kernel, in the
hope that others will join me in the future to produce a better common
baseline for everyone to build on.
This is the first of 6 series of stable patch candidates that
I collected from xfs releases v5.11..v5.18 [1].
My intention is to post the parts for review on the xfs list on
a ~weekly basis and forward them to stable only after xfs developers
have had the chance to review the selection.
I used a gadget that I developed "b4 rn" that produces high level
"release notes" with references to the posted patch series and also
looks for mentions of fstest names in the discussions on lore.
I then used an elimination process to select the stable tree candidate
patches. The selection process is documented in the git log of [1].
After I had candidates, Luis has helped me to set up a kdevops testing
environment on a server that Samsung has contributed to the effort.
Luis and I have spent a considerable amount of time to establish the
expunge lists that produce stable baseline results for v5.10.y [2].
Eventually, we ran the auto group test over 100 times to sanitize the
baseline, on the following configurations:
reflink_normapbt (default), reflink, reflink_1024, nocrc, nocrc_512.
The patches in this part are from circa v5.11 release.
They have been through 36 auto group runs with the configs listed above
and no regressions from baseline were observed.
At least two of the fixes have regression tests in fstests that were used
to verify the fix. I also annotated [3] the fix commits in the tests.
I would like to thank Luis for his huge part in this still ongoing effort
and I would like to thank Samsung for contributing the hardware resources
to drive this effort.
Your inputs on the selection in this part and in upcoming parts [1]
are most welcome!
Thanks,
Amir.
[1] https://github.com/amir73il/b4/blob/xfs-5.10.y/xfs-5.10..5.17-fixes.rst
[2] https://github.com/linux-kdevops/kdevops/tree/master/workflows/fstests/expu…
[3] https://lore.kernel.org/fstests/20220520143249.2103631-1-amir73il@gmail.com/
[4] https://lore.kernel.org/linux-xfs/Yo6ePjvpC7nhgek+@magnolia/
[5] https://lore.kernel.org/linux-xfs/20220525111715.2769700-1-amir73il@gmail.c…
Changes since v1:
- Send to stable
- Add patch from Christoph
Darrick J. Wong (3):
xfs: detect overflows in bmbt records
xfs: fix the forward progress assertion in xfs_iwalk_run_callbacks
xfs: fix an ABBA deadlock in xfs_rename
Dave Chinner (1):
xfs: Fix CIL throttle hang when CIL space used going backwards
Kaixu Xia (1):
xfs: show the proper user quota options
fs/xfs/libxfs/xfs_bmap.c | 5 +++++
fs/xfs/libxfs/xfs_dir2.h | 2 --
fs/xfs/libxfs/xfs_dir2_sf.c | 2 +-
fs/xfs/xfs_buf_item.c | 37 ++++++++++++++++----------------
fs/xfs/xfs_inode.c | 42 ++++++++++++++++++++++---------------
fs/xfs/xfs_inode_item.c | 14 +++++++++++++
fs/xfs/xfs_iwalk.c | 2 +-
fs/xfs/xfs_log_cil.c | 22 ++++++++++++++-----
fs/xfs/xfs_super.c | 10 +++++----
9 files changed, 87 insertions(+), 49 deletions(-)
--
2.25.1
We recently started building with Poky Kirkstone (quite a leap
from our ancient and venerable branch of Sumo) which includes
a newer set of tools in the toolchain:
binutils 2.30 -> 2.38
gcc 7.3.3 -> 11.2.0
glibc 2.27 -> 2.35
This uncovered some issues while cross-compiling on the 4.x
kernels. The following patches help in building the 4.19
branch again.
These backports are already applied all the way down to 5.4.
Arnaldo Carvalho de Melo (2):
perf bench: Share some global variables to fix build with gcc 10
perf tests bp_account: Make global variable static
Ben Hutchings (1):
libtraceevent: Fix build with binutils 2.35
tools/lib/traceevent/Makefile | 2 +-
tools/perf/bench/bench.h | 4 ++++
tools/perf/bench/futex-hash.c | 12 ++++++------
tools/perf/bench/futex-lock-pi.c | 11 +++++------
tools/perf/tests/bp_account.c | 2 +-
5 files changed, 17 insertions(+), 14 deletions(-)
--
2.32.0
From: Miri Korenblit <miriam.rachel.korenblit(a)intel.com>
commit 1b7b3ac8ff3317cdcf07a1c413de9bdb68019c2b upstream.
We used to set regulatory info before the registration of
the device and then the regulatory info didn't get set, because
the device isn't registered so there isn't a device to set the
regulatory info for. So set the regulatory info after the device
registration.
Call reg_process_self_managed_hints() once again after the device
registration because it does nothing before it.
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit(a)intel.com>
Signed-off-by: Luca Coelho <luciano.coelho(a)intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20210618133832.c96eadcffe80.I86799c2c866b…
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
---
net/wireless/core.c | 7 ++++---
net/wireless/reg.c | 1 +
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/net/wireless/core.c b/net/wireless/core.c
index 68660781aa51..7c66f99046ac 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -4,6 +4,7 @@
* Copyright 2006-2010 Johannes Berg <johannes(a)sipsolutions.net>
* Copyright 2013-2014 Intel Mobile Communications GmbH
* Copyright 2015-2017 Intel Deutschland GmbH
+ * Copyright (C) 2018-2021 Intel Corporation
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -835,9 +836,6 @@ int wiphy_register(struct wiphy *wiphy)
return res;
}
- /* set up regulatory info */
- wiphy_regulatory_register(wiphy);
-
list_add_rcu(&rdev->list, &cfg80211_rdev_list);
cfg80211_rdev_list_generation++;
@@ -851,6 +849,9 @@ int wiphy_register(struct wiphy *wiphy)
cfg80211_debugfs_rdev_add(rdev);
nl80211_notify_wiphy(rdev, NL80211_CMD_NEW_WIPHY);
+ /* set up regulatory info */
+ wiphy_regulatory_register(wiphy);
+
if (wiphy->regulatory_flags & REGULATORY_CUSTOM_REG) {
struct regulatory_request request;
diff --git a/net/wireless/reg.c b/net/wireless/reg.c
index c7825b951f72..dd8503a3ef1e 100644
--- a/net/wireless/reg.c
+++ b/net/wireless/reg.c
@@ -3756,6 +3756,7 @@ void wiphy_regulatory_register(struct wiphy *wiphy)
wiphy_update_regulatory(wiphy, lr->initiator);
wiphy_all_share_dfs_chan_state(wiphy);
+ reg_process_self_managed_hints();
}
void wiphy_regulatory_deregister(struct wiphy *wiphy)
--
2.36.1
This is the start of the stable review cycle for the 4.19.237 release.
There are 20 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 27 Mar 2022 15:04:08 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.237-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.237-rc1
Arnd Bergmann <arnd(a)arndb.de>
nds32: fix access_ok() checks in get/put_user
Linus Lüssing <ll(a)simonwunderlich.de>
mac80211: fix potential double free on mesh join
Giovanni Cabiddu <giovanni.cabiddu(a)intel.com>
crypto: qat - disable registration of algorithms
Werner Sembach <wse(a)tuxedocomputers.com>
ACPI: video: Force backlight native for Clevo NL5xRU and NL5xNU
Maximilian Luz <luzmaximilian(a)gmail.com>
ACPI: battery: Add device HID and quirk for Microsoft Surface Go 3
Mark Cilissen <mark(a)yotsuba.nl>
ACPI / x86: Work around broken XSDT on Advantech DAC-BJ01 board
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nf_tables: initialize registers in nft_do_chain()
Stephane Graber <stgraber(a)ubuntu.com>
drivers: net: xgene: Fix regression in CRC stripping
Giacomo Guiduzzi <guiduzzi.giacomo(a)gmail.com>
ALSA: pci: fix reading of swapped values from pcmreg in AC97 codec
Jonathan Teh <jonathan.teh(a)outlook.com>
ALSA: cmipci: Restore aux vol on suspend/resume
Lars-Peter Clausen <lars(a)metafoo.de>
ALSA: usb-audio: Add mute TLV for playback volumes on RODE NT-USB
Takashi Iwai <tiwai(a)suse.de>
ALSA: pcm: Add stream lock during PCM reset ioctl operations
Takashi Iwai <tiwai(a)suse.de>
ALSA: oss: Fix PCM OSS buffer allocation overflow
Takashi Iwai <tiwai(a)suse.de>
ASoC: sti: Fix deadlock via snd_pcm_stop_xrun() call
Eric Dumazet <edumazet(a)google.com>
llc: fix netdevice reference leaks in llc_ui_bind()
Chuansheng Liu <chuansheng.liu(a)intel.com>
thermal: int340x: fix memory leak in int3400_notify()
Oliver Graute <oliver.graute(a)kococonnector.com>
staging: fbtft: fb_st7789v: reset display before initialization
Steffen Klassert <steffen.klassert(a)secunet.com>
esp: Fix possible buffer overflow in ESP transformation
Tadeusz Struk <tadeusz.struk(a)linaro.org>
net: ipv6: fix skb_over_panic in __ip6_append_data
Jordy Zomer <jordy(a)pwning.systems>
nfc: st21nfca: Fix potential buffer overflows in EVT_TRANSACTION
-------------
Diffstat:
Makefile | 4 +-
arch/nds32/include/asm/uaccess.h | 22 +++++--
arch/x86/kernel/acpi/boot.c | 24 ++++++++
drivers/acpi/battery.c | 12 ++++
drivers/acpi/video_detect.c | 75 +++++++++++++++++++++++
drivers/crypto/qat/qat_common/qat_crypto.c | 8 +++
drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 12 ++--
drivers/nfc/st21nfca/se.c | 10 +++
drivers/staging/fbtft/fb_st7789v.c | 2 +
drivers/thermal/int340x_thermal/int3400_thermal.c | 4 ++
include/net/esp.h | 2 +
include/net/sock.h | 3 +
net/core/sock.c | 3 -
net/ipv4/esp4.c | 5 ++
net/ipv6/esp6.c | 5 ++
net/ipv6/ip6_output.c | 4 +-
net/llc/af_llc.c | 8 +++
net/mac80211/cfg.c | 3 -
net/netfilter/nf_tables_core.c | 2 +-
sound/core/oss/pcm_oss.c | 12 ++--
sound/core/oss/pcm_plugin.c | 5 +-
sound/core/pcm_native.c | 4 ++
sound/pci/ac97/ac97_codec.c | 4 +-
sound/pci/cmipci.c | 3 +-
sound/soc/sti/uniperif_player.c | 6 +-
sound/soc/sti/uniperif_reader.c | 2 +-
sound/usb/mixer_quirks.c | 7 ++-
27 files changed, 214 insertions(+), 37 deletions(-)
When the system runs out of enclave memory, SGX can reclaim EPC pages
by swapping to normal RAM. These backing pages are allocated via a
per-enclave shared memory area. Since SGX allows unlimited over
commit on EPC memory, the reclaimer thread can allocate a large
number of backing RAM pages in response to EPC memory pressure.
When the shared memory backing RAM allocation occurs during
the reclaimer thread context, the shared memory is charged to
the root memory control group, and the shmem usage of the enclave
is not properly accounted for, making cgroups ineffective at
limiting the amount of RAM an enclave can consume.
For example, when using a cgroup to launch a set of test
enclaves, the kernel does not properly account for 50% - 75% of
shmem page allocations on average. In the worst case, when
nearly all allocations occur during the reclaimer thread, the
kernel accounts less than a percent of the amount of shmem used
by the enclave's cgroup to the correct cgroup.
SGX stores a list of mm_structs that are associated with
an enclave. Pick one of them during reclaim and charge that
mm's memcg with the shmem allocation. The one that gets picked
is arbitrary, but this list almost always only has one mm. The
cases where there is more than one mm with different memcg's
are not worth considering.
Create a new function - sgx_encl_alloc_backing(). This function
is used whenever a new backing storage page needs to be
allocated. Previously the same function was used for page
allocation as well as retrieving a previously allocated page.
Prior to backing page allocation, if there is a mm_struct associated
with the enclave that is requesting the allocation, it is set
as the active memory control group.
Signed-off-by: Kristen Carlson Accardi <kristen(a)linux.intel.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: stable(a)vger.kernel.org
---
V2 -> V3:
Changed memcg variable names in sgx_encl_alloc_backing()
and removed some whitespace.
V1 -> V2:
Changed sgx_encl_set_active_memcg() to simply return the correct
memcg for the enclave and renamed to sgx_encl_get_mem_cgroup().
Created helper function current_is_ksgxd() to improve readability.
Use mmget_not_zero()/mmput_async() when searching mm_list.
Move call to set_active_memcg() to sgx_encl_alloc_backing() and
use mem_cgroup_put() to avoid leaking a memcg reference.
Address review feedback regarding comments and commit log.
---
arch/x86/kernel/cpu/sgx/encl.c | 105 ++++++++++++++++++++++++++++++++-
arch/x86/kernel/cpu/sgx/encl.h | 11 +++-
arch/x86/kernel/cpu/sgx/main.c | 4 +-
3 files changed, 114 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 001808e3901c..6f05e3d919f7 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -32,7 +32,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
else
page_index = PFN_DOWN(encl->size);
- ret = sgx_encl_get_backing(encl, page_index, &b);
+ ret = sgx_encl_lookup_backing(encl, page_index, &b);
if (ret)
return ret;
@@ -574,7 +574,7 @@ static struct page *sgx_encl_get_backing_page(struct sgx_encl *encl,
* 0 on success,
* -errno otherwise.
*/
-int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
+static int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
struct sgx_backing *backing)
{
pgoff_t pcmd_index = PFN_DOWN(encl->size) + 1 + (page_index >> 5);
@@ -601,6 +601,107 @@ int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
return 0;
}
+/*
+ * When called from ksgxd, returns the mem_cgroup of a struct mm stored
+ * in the enclave's mm_list. When not called from ksgxd, just returns
+ * the mem_cgroup of the current task.
+ */
+static struct mem_cgroup *sgx_encl_get_mem_cgroup(struct sgx_encl *encl)
+{
+ struct mem_cgroup *memcg = NULL;
+ struct sgx_encl_mm *encl_mm;
+ int idx;
+
+ /*
+ * If called from normal task context, return the mem_cgroup
+ * of the current task's mm. The remainder of the handling is for
+ * ksgxd.
+ */
+ if (!current_is_ksgxd())
+ return get_mem_cgroup_from_mm(current->mm);
+
+ /*
+ * Search the enclave's mm_list to find an mm associated with
+ * this enclave to charge the allocation to.
+ */
+ idx = srcu_read_lock(&encl->srcu);
+
+ list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
+ if (!mmget_not_zero(encl_mm->mm))
+ continue;
+
+ memcg = get_mem_cgroup_from_mm(encl_mm->mm);
+
+ mmput_async(encl_mm->mm);
+
+ break;
+ }
+
+ srcu_read_unlock(&encl->srcu, idx);
+
+ /*
+ * In the rare case that there isn't an mm associated with
+ * the enclave, set memcg to the current active mem_cgroup.
+ * This will be the root mem_cgroup if there is no active
+ * mem_cgroup.
+ */
+ if (!memcg)
+ return get_mem_cgroup_from_mm(NULL);
+
+ return memcg;
+}
+
+/**
+ * sgx_encl_alloc_backing() - allocate a new backing storage page
+ * @encl: an enclave pointer
+ * @page_index: enclave page index
+ * @backing: data for accessing backing storage for the page
+ *
+ * When called from ksgxd, sets the active memcg from one of the
+ * mms in the enclave's mm_list prior to any backing page allocation,
+ * in order to ensure that shmem page allocations are charged to the
+ * enclave.
+ *
+ * Return:
+ * 0 on success,
+ * -errno otherwise.
+ */
+int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long page_index,
+ struct sgx_backing *backing)
+{
+ struct mem_cgroup *encl_memcg = sgx_encl_get_mem_cgroup(encl);
+ struct mem_cgroup *memcg = set_active_memcg(encl_memcg);
+ int ret;
+
+ ret = sgx_encl_get_backing(encl, page_index, backing);
+
+ set_active_memcg(memcg);
+ mem_cgroup_put(encl_memcg);
+
+ return ret;
+}
+
+/**
+ * sgx_encl_lookup_backing() - retrieve an existing backing storage page
+ * @encl: an enclave pointer
+ * @page_index: enclave page index
+ * @backing: data for accessing backing storage for the page
+ *
+ * Retrieve a backing page for loading data back into an EPC page with ELDU.
+ * It is the caller's responsibility to ensure that it is appropriate to use
+ * sgx_encl_lookup_backing() rather than sgx_encl_alloc_backing(). If lookup is
+ * not used correctly, this will cause an allocation which is not accounted for.
+ *
+ * Return:
+ * 0 on success,
+ * -errno otherwise.
+ */
+int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index,
+ struct sgx_backing *backing)
+{
+ return sgx_encl_get_backing(encl, page_index, backing);
+}
+
/**
* sgx_encl_put_backing() - Unpin the backing storage
* @backing: data for accessing backing storage for the page
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index fec43ca65065..2de3b150ab00 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -100,13 +100,20 @@ static inline int sgx_encl_find(struct mm_struct *mm, unsigned long addr,
return 0;
}
+static inline bool current_is_ksgxd(void)
+{
+ return current->mm ? false : true;
+}
+
int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
unsigned long end, unsigned long vm_flags);
void sgx_encl_release(struct kref *ref);
int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
-int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
- struct sgx_backing *backing);
+int sgx_encl_lookup_backing(struct sgx_encl *encl, unsigned long page_index,
+ struct sgx_backing *backing);
+int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long page_index,
+ struct sgx_backing *backing);
void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
int sgx_encl_test_and_clear_young(struct mm_struct *mm,
struct sgx_encl_page *page);
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 4b41efc9e367..7d41c8538795 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -310,7 +310,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
encl->secs_child_cnt--;
if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) {
- ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size),
+ ret = sgx_encl_alloc_backing(encl, PFN_DOWN(encl->size),
&secs_backing);
if (ret)
goto out;
@@ -381,7 +381,7 @@ static void sgx_reclaim_pages(void)
goto skip;
page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
- ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]);
+ ret = sgx_encl_alloc_backing(encl_page->encl, page_index, &backing[i]);
if (ret)
goto skip;
--
2.20.1
From: Stephen Brennan <stephen.s.brennan(a)oracle.com>
A rare BUG_ON triggered in assoc_array_gc:
[3430308.818153] kernel BUG at lib/assoc_array.c:1609!
Which corresponded to the statement currently at line 1593 upstream:
BUG_ON(assoc_array_ptr_is_meta(p));
Using the data from the core dump, I was able to generate a userspace
reproducer[1] and determine the cause of the bug.
[1]: https://github.com/brenns10/kernel_stuff/tree/master/assoc_array_gc
After running the iterator on the entire branch, an internal tree node
looked like the following:
NODE (nr_leaves_on_branch: 3)
SLOT [0] NODE (2 leaves)
SLOT [1] NODE (1 leaf)
SLOT [2..f] NODE (empty)
In the userspace reproducer, the pr_devel output when compressing this
node was:
-- compress node 0x5607cc089380 --
free=0, leaves=0
[0] retain node 2/1 [nx 0]
[1] fold node 1/1 [nx 0]
[2] fold node 0/1 [nx 2]
[3] fold node 0/2 [nx 2]
[4] fold node 0/3 [nx 2]
[5] fold node 0/4 [nx 2]
[6] fold node 0/5 [nx 2]
[7] fold node 0/6 [nx 2]
[8] fold node 0/7 [nx 2]
[9] fold node 0/8 [nx 2]
[10] fold node 0/9 [nx 2]
[11] fold node 0/10 [nx 2]
[12] fold node 0/11 [nx 2]
[13] fold node 0/12 [nx 2]
[14] fold node 0/13 [nx 2]
[15] fold node 0/14 [nx 2]
after: 3
At slot 0, an internal node with 2 leaves could not be folded into the
node, because there was only one available slot (slot 0). Thus, the
internal node was retained. At slot 1, the node had one leaf, and was
able to be folded in successfully. The remaining nodes had no leaves,
and so were removed. By the end of the compression stage, there were 14
free slots, and only 3 leaf nodes. The tree was ascended and then its
parent node was compressed. When this node was seen, it could not be
folded, due to the internal node it contained.
The invariant for compression in this function is: whenever
nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT, the node should contain all
leaf nodes. The compression step currently cannot guarantee this, given
the corner case shown above.
To fix this issue, retry compression whenever we have retained a node,
and yet nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT. This second
compression will then allow the node in slot 1 to be folded in,
satisfying the invariant. Below is the output of the reproducer once the
fix is applied:
-- compress node 0x560e9c562380 --
free=0, leaves=0
[0] retain node 2/1 [nx 0]
[1] fold node 1/1 [nx 0]
[2] fold node 0/1 [nx 2]
[3] fold node 0/2 [nx 2]
[4] fold node 0/3 [nx 2]
[5] fold node 0/4 [nx 2]
[6] fold node 0/5 [nx 2]
[7] fold node 0/6 [nx 2]
[8] fold node 0/7 [nx 2]
[9] fold node 0/8 [nx 2]
[10] fold node 0/9 [nx 2]
[11] fold node 0/10 [nx 2]
[12] fold node 0/11 [nx 2]
[13] fold node 0/12 [nx 2]
[14] fold node 0/13 [nx 2]
[15] fold node 0/14 [nx 2]
internal nodes remain despite enough space, retrying
-- compress node 0x560e9c562380 --
free=14, leaves=1
[0] fold node 2/15 [nx 0]
after: 3
Changes
=======
DH:
- Use false instead of 0.
- Reorder the inserted lines in a couple of places to put retained before
next_slot.
ver #2)
- Fix typo in pr_devel, correct comparison to "<="
Fixes: 3cb989501c26 ("Add a generic associative array implementation.")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Stephen Brennan <stephen.s.brennan(a)oracle.com>
Signed-off-by: David Howells <dhowells(a)redhat.com>
cc: Jarkko Sakkinen <jarkko(a)kernel.org>
cc: Andrew Morton <akpm(a)linux-foundation.org>
cc: keyrings(a)vger.kernel.org
Link: https://lore.kernel.org/r/20220511225517.407935-1-stephen.s.brennan@oracle.… # v1
Link: https://lore.kernel.org/r/20220512215045.489140-1-stephen.s.brennan@oracle.… # v2
---
lib/assoc_array.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/lib/assoc_array.c b/lib/assoc_array.c
index 079c72e26493..ca0b4f360c1a 100644
--- a/lib/assoc_array.c
+++ b/lib/assoc_array.c
@@ -1461,6 +1461,7 @@ int assoc_array_gc(struct assoc_array *array,
struct assoc_array_ptr *cursor, *ptr;
struct assoc_array_ptr *new_root, *new_parent, **new_ptr_pp;
unsigned long nr_leaves_on_tree;
+ bool retained;
int keylen, slot, nr_free, next_slot, i;
pr_devel("-->%s()\n", __func__);
@@ -1536,6 +1537,7 @@ int assoc_array_gc(struct assoc_array *array,
goto descend;
}
+retry_compress:
pr_devel("-- compress node %p --\n", new_n);
/* Count up the number of empty slots in this node and work out the
@@ -1553,6 +1555,7 @@ int assoc_array_gc(struct assoc_array *array,
pr_devel("free=%d, leaves=%lu\n", nr_free, new_n->nr_leaves_on_branch);
/* See what we can fold in */
+ retained = false;
next_slot = 0;
for (slot = 0; slot < ASSOC_ARRAY_FAN_OUT; slot++) {
struct assoc_array_shortcut *s;
@@ -1602,9 +1605,14 @@ int assoc_array_gc(struct assoc_array *array,
pr_devel("[%d] retain node %lu/%d [nx %d]\n",
slot, child->nr_leaves_on_branch, nr_free + 1,
next_slot);
+ retained = true;
}
}
+ if (retained && new_n->nr_leaves_on_branch <= ASSOC_ARRAY_FAN_OUT) {
+ pr_devel("internal nodes remain despite enough space, retrying\n");
+ goto retry_compress;
+ }
pr_devel("after: %lu\n", new_n->nr_leaves_on_branch);
nr_leaves_on_tree = new_n->nr_leaves_on_branch;
Syzbot found a Use After Free bug in compute_effective_progs().
The reproducer creates a number of BPF links, and causes a fault
injected alloc to fail, while calling bpf_link_detach on them.
Link detach triggers the link to be freed by bpf_link_free(),
which calls __cgroup_bpf_detach() and update_effective_progs().
If the memory allocation in this function fails, the function restores
the pointer to the bpf_cgroup_link on the cgroup list, but the memory
gets freed just after it returns. After this, every subsequent call to
update_effective_progs() causes this already deallocated pointer to be
dereferenced in prog_list_length(), and triggers KASAN UAF error.
To fix this don't preserve the pointer to the link on the cgroup list
in __cgroup_bpf_detach(), but proceed with the cleanup and retry calling
update_effective_progs() again afterwards.
Cc: "Alexei Starovoitov" <ast(a)kernel.org>
Cc: "Daniel Borkmann" <daniel(a)iogearbox.net>
Cc: "Andrii Nakryiko" <andrii(a)kernel.org>
Cc: "Martin KaFai Lau" <kafai(a)fb.com>
Cc: "Song Liu" <songliubraving(a)fb.com>
Cc: "Yonghong Song" <yhs(a)fb.com>
Cc: "John Fastabend" <john.fastabend(a)gmail.com>
Cc: "KP Singh" <kpsingh(a)kernel.org>
Cc: <netdev(a)vger.kernel.org>
Cc: <bpf(a)vger.kernel.org>
Cc: <stable(a)vger.kernel.org>
Cc: <linux-kernel(a)vger.kernel.org>
Link: https://syzkaller.appspot.com/bug?id=8ebf179a95c2a2670f7cf1ba62429ec044369d…
Fixes: af6eea57437a ("bpf: Implement bpf_link-based cgroup BPF program attachment")
Reported-by: <syzbot+f264bffdfbd5614f3bb2(a)syzkaller.appspotmail.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk(a)linaro.org>
---
kernel/bpf/cgroup.c | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 128028efda64..b6307337a3c7 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -723,10 +723,11 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
pl->link = NULL;
err = update_effective_progs(cgrp, atype);
- if (err)
- goto cleanup;
-
- /* now can actually delete it from this cgroup list */
+ /*
+ * Proceed regardless of error. The link and/or prog will be freed
+ * just after this function returns so just delete it from this
+ * cgroup list and retry calling update_effective_progs again later.
+ */
list_del(&pl->node);
kfree(pl);
if (list_empty(progs))
@@ -735,12 +736,11 @@ static int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
if (old_prog)
bpf_prog_put(old_prog);
static_branch_dec(&cgroup_bpf_enabled_key[atype]);
- return 0;
-cleanup:
- /* restore back prog or link */
- pl->prog = old_prog;
- pl->link = link;
+ /* In case of error call update_effective_progs again */
+ if (err)
+ err = update_effective_progs(cgrp, atype);
+
return err;
}
@@ -881,6 +881,7 @@ static void bpf_cgroup_link_release(struct bpf_link *link)
struct bpf_cgroup_link *cg_link =
container_of(link, struct bpf_cgroup_link, link);
struct cgroup *cg;
+ int err;
/* link might have been auto-detached by dying cgroup already,
* in that case our work is done here
@@ -896,8 +897,10 @@ static void bpf_cgroup_link_release(struct bpf_link *link)
return;
}
- WARN_ON(__cgroup_bpf_detach(cg_link->cgroup, NULL, cg_link,
- cg_link->type));
+ err = __cgroup_bpf_detach(cg_link->cgroup, NULL, cg_link,
+ cg_link->type);
+ if (err)
+ pr_warn("cgroup_bpf_detach() failed, err %d\n", err);
cg = cg_link->cgroup;
cg_link->cgroup = NULL;
--
2.35.1
unmap_grant_pages() currently waits for the pages to no longer be used.
In https://github.com/QubesOS/qubes-issues/issues/7481, this lead to a
deadlock against i915: i915 was waiting for gntdev's MMU notifier to
finish, while gntdev was waiting for i915 to free its pages. I also
believe this is responsible for various deadlocks I have experienced in
the past.
Avoid these problems by making unmap_grant_pages async. This requires
making it return void, as any errors will not be available when the
function returns. Fortunately, the only use of the return value is a
WARN_ON(), which can be replaced by a WARN_ON when the error is
detected. Additionally, a failed call will not prevent further calls
from being made, but this is harmless.
Because unmap_grant_pages is now async, the grant handle will be sent to
INVALID_GRANT_HANDLE too late to prevent multiple unmaps of the same
handle. Instead, a separate bool array is allocated for this purpose.
This wastes memory, but stuffing this information in padding bytes is
too fragile. Furthermore, it is necessary to grab a reference to the
map before making the asynchronous call, and release the reference when
the call returns.
Fixes: 745282256c75 ("xen/gntdev: safely unmap grants in case they are still in use")
Cc: stable(a)vger.kernel.org
Signed-off-by: Demi Marie Obenour <demi(a)invisiblethingslab.com>
---
drivers/xen/gntdev-common.h | 5 ++
drivers/xen/gntdev.c | 100 +++++++++++++++++++-----------------
2 files changed, 59 insertions(+), 46 deletions(-)
diff --git a/drivers/xen/gntdev-common.h b/drivers/xen/gntdev-common.h
index 20d7d059dadb..a268cdb1f7bf 100644
--- a/drivers/xen/gntdev-common.h
+++ b/drivers/xen/gntdev-common.h
@@ -16,6 +16,7 @@
#include <linux/mmu_notifier.h>
#include <linux/types.h>
#include <xen/interface/event_channel.h>
+#include <xen/grant_table.h>
struct gntdev_dmabuf_priv;
@@ -56,6 +57,7 @@ struct gntdev_grant_map {
struct gnttab_unmap_grant_ref *unmap_ops;
struct gnttab_map_grant_ref *kmap_ops;
struct gnttab_unmap_grant_ref *kunmap_ops;
+ bool *being_removed;
struct page **pages;
unsigned long pages_vm_start;
@@ -73,6 +75,9 @@ struct gntdev_grant_map {
/* Needed to avoid allocation in gnttab_dma_free_pages(). */
xen_pfn_t *frames;
#endif
+
+ /* Needed to avoid allocation in __unmap_grant_pages */
+ struct gntab_unmap_queue_data unmap_data;
};
struct gntdev_grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count,
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 59ffea800079..90bd2b5ef7dd 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -35,6 +35,7 @@
#include <linux/slab.h>
#include <linux/highmem.h>
#include <linux/refcount.h>
+#include <linux/workqueue.h>
#include <xen/xen.h>
#include <xen/grant_table.h>
@@ -62,8 +63,8 @@ MODULE_PARM_DESC(limit,
static int use_ptemod;
-static int unmap_grant_pages(struct gntdev_grant_map *map,
- int offset, int pages);
+static void unmap_grant_pages(struct gntdev_grant_map *map,
+ int offset, int pages);
static struct miscdevice gntdev_miscdev;
@@ -120,6 +121,7 @@ static void gntdev_free_map(struct gntdev_grant_map *map)
kvfree(map->unmap_ops);
kvfree(map->kmap_ops);
kvfree(map->kunmap_ops);
+ kvfree(map->being_removed);
kfree(map);
}
@@ -140,10 +142,13 @@ struct gntdev_grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count,
add->unmap_ops = kvmalloc_array(count, sizeof(add->unmap_ops[0]),
GFP_KERNEL);
add->pages = kvcalloc(count, sizeof(add->pages[0]), GFP_KERNEL);
+ add->being_removed =
+ kvcalloc(count, sizeof(add->being_removed[0]), GFP_KERNEL);
if (NULL == add->grants ||
NULL == add->map_ops ||
NULL == add->unmap_ops ||
- NULL == add->pages)
+ NULL == add->pages ||
+ NULL == add->being_removed)
goto err;
if (use_ptemod) {
add->kmap_ops = kvmalloc_array(count, sizeof(add->kmap_ops[0]),
@@ -349,79 +354,84 @@ int gntdev_map_grant_pages(struct gntdev_grant_map *map)
return err;
}
-static int __unmap_grant_pages(struct gntdev_grant_map *map, int offset,
- int pages)
+static void __unmap_grant_pages_done(int result,
+ struct gntab_unmap_queue_data *data)
{
- int i, err = 0;
- struct gntab_unmap_queue_data unmap_data;
-
- if (map->notify.flags & UNMAP_NOTIFY_CLEAR_BYTE) {
- int pgno = (map->notify.addr >> PAGE_SHIFT);
- if (pgno >= offset && pgno < offset + pages) {
- /* No need for kmap, pages are in lowmem */
- uint8_t *tmp = pfn_to_kaddr(page_to_pfn(map->pages[pgno]));
- tmp[map->notify.addr & (PAGE_SIZE-1)] = 0;
- map->notify.flags &= ~UNMAP_NOTIFY_CLEAR_BYTE;
- }
- }
-
- unmap_data.unmap_ops = map->unmap_ops + offset;
- unmap_data.kunmap_ops = use_ptemod ? map->kunmap_ops + offset : NULL;
- unmap_data.pages = map->pages + offset;
- unmap_data.count = pages;
-
- err = gnttab_unmap_refs_sync(&unmap_data);
- if (err)
- return err;
+ unsigned int i;
+ struct gntdev_grant_map *map = data->data;
+ unsigned int offset = data->unmap_ops - map->unmap_ops;
- for (i = 0; i < pages; i++) {
- if (map->unmap_ops[offset+i].status)
- err = -EINVAL;
+ for (i = 0; i < data->count; i++) {
+ WARN_ON(map->unmap_ops[offset+i].status);
pr_debug("unmap handle=%d st=%d\n",
map->unmap_ops[offset+i].handle,
map->unmap_ops[offset+i].status);
map->unmap_ops[offset+i].handle = INVALID_GRANT_HANDLE;
if (use_ptemod) {
- if (map->kunmap_ops[offset+i].status)
- err = -EINVAL;
+ WARN_ON(map->kunmap_ops[offset+i].status);
pr_debug("kunmap handle=%u st=%d\n",
map->kunmap_ops[offset+i].handle,
map->kunmap_ops[offset+i].status);
map->kunmap_ops[offset+i].handle = INVALID_GRANT_HANDLE;
}
}
- return err;
+
+ /* Release reference taken by __unmap_grant_pages */
+ gntdev_put_map(NULL, map);
}
-static int unmap_grant_pages(struct gntdev_grant_map *map, int offset,
- int pages)
+static void __unmap_grant_pages(struct gntdev_grant_map *map, int offset,
+ int pages)
+{
+ if (map->notify.flags & UNMAP_NOTIFY_CLEAR_BYTE) {
+ int pgno = (map->notify.addr >> PAGE_SHIFT);
+
+ if (pgno >= offset && pgno < offset + pages) {
+ /* No need for kmap, pages are in lowmem */
+ uint8_t *tmp = pfn_to_kaddr(page_to_pfn(map->pages[pgno]));
+
+ tmp[map->notify.addr & (PAGE_SIZE-1)] = 0;
+ map->notify.flags &= ~UNMAP_NOTIFY_CLEAR_BYTE;
+ }
+ }
+
+ map->unmap_data.unmap_ops = map->unmap_ops + offset;
+ map->unmap_data.kunmap_ops = use_ptemod ? map->kunmap_ops + offset : NULL;
+ map->unmap_data.pages = map->pages + offset;
+ map->unmap_data.count = pages;
+ map->unmap_data.done = __unmap_grant_pages_done;
+ map->unmap_data.data = map;
+ refcount_inc(&map->users); /* to keep map alive during async call below */
+
+ gnttab_unmap_refs_async(&map->unmap_data);
+}
+
+static void unmap_grant_pages(struct gntdev_grant_map *map, int offset,
+ int pages)
{
- int range, err = 0;
+ int range;
pr_debug("unmap %d+%d [%d+%d]\n", map->index, map->count, offset, pages);
/* It is possible the requested range will have a "hole" where we
* already unmapped some of the grants. Only unmap valid ranges.
*/
- while (pages && !err) {
- while (pages &&
- map->unmap_ops[offset].handle == INVALID_GRANT_HANDLE) {
+ while (pages) {
+ while (pages && map->being_removed[offset]) {
offset++;
pages--;
}
range = 0;
while (range < pages) {
- if (map->unmap_ops[offset + range].handle ==
- INVALID_GRANT_HANDLE)
+ if (map->being_removed[offset + range])
break;
+ map->being_removed[offset + range] = true;
range++;
}
- err = __unmap_grant_pages(map, offset, range);
+ __unmap_grant_pages(map, offset, range);
offset += range;
pages -= range;
}
-
- return err;
}
/* ------------------------------------------------------------------ */
@@ -473,7 +483,6 @@ static bool gntdev_invalidate(struct mmu_interval_notifier *mn,
struct gntdev_grant_map *map =
container_of(mn, struct gntdev_grant_map, notifier);
unsigned long mstart, mend;
- int err;
if (!mmu_notifier_range_blockable(range))
return false;
@@ -494,10 +503,9 @@ static bool gntdev_invalidate(struct mmu_interval_notifier *mn,
map->index, map->count,
map->vma->vm_start, map->vma->vm_end,
range->start, range->end, mstart, mend);
- err = unmap_grant_pages(map,
+ unmap_grant_pages(map,
(mstart - map->vma->vm_start) >> PAGE_SHIFT,
(mend - mstart) >> PAGE_SHIFT);
- WARN_ON(err);
return true;
}
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
Running kernel-doc script on drivers/hid/hid-uclogic-params.c, it found
6 warnings for hid_dbg() wrapper functions below:
drivers/hid/hid-uclogic-params.c:48: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Dump tablet interface pen parameters with hid_dbg(), indented with one tab.
drivers/hid/hid-uclogic-params.c:48: warning: missing initial short description on line:
* Dump tablet interface pen parameters with hid_dbg(), indented with one tab.
drivers/hid/hid-uclogic-params.c:48: info: Scanning doc for function Dump
drivers/hid/hid-uclogic-params.c:80: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Dump tablet interface frame parameters with hid_dbg(), indented with two
drivers/hid/hid-uclogic-params.c:80: warning: missing initial short description on line:
* Dump tablet interface frame parameters with hid_dbg(), indented with two
drivers/hid/hid-uclogic-params.c:80: info: Scanning doc for function Dump
drivers/hid/hid-uclogic-params.c:105: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Dump tablet interface parameters with hid_dbg().
drivers/hid/hid-uclogic-params.c:105: warning: missing initial short description on line:
* Dump tablet interface parameters with hid_dbg().
One of them is reported by kernel test robot.
Fix these warnings by properly format kernel-doc comment for these
functions.
Link: https://lore.kernel.org/linux-doc/202205272033.XFYlYj8k-lkp@intel.com/
Fixes: a228809fa6f39c ("HID: uclogic: Move param printing to a function")
Reported-by: kernel test robot <lkp(a)intel.com>
Cc: Nikolai Kondrashov <spbnick(a)gmail.com>
Cc: Jiri Kosina <jikos(a)kernel.org>
Cc: Benjamin Tissoires <benjamin.tissoires(a)redhat.com>
Cc: "José Expósito" <jose.exposito89(a)gmail.com>
Cc: llvm(a)lists.linux.dev
Cc: stable(a)vger.kernel.org # v5.18
Cc: linux-input(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Bagas Sanjaya <bagasdotme(a)gmail.com>
---
Changes since v1 [1]:
- Approach the warning by fixing kernel-doc comments formatting
(suggested by Jonathan Corbet)
[1]: https://lore.kernel.org/linux-doc/20220528091403.160169-1-bagasdotme@gmail.…
drivers/hid/hid-uclogic-params.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/drivers/hid/hid-uclogic-params.c b/drivers/hid/hid-uclogic-params.c
index db838f16282d64..647bbd3e000e2f 100644
--- a/drivers/hid/hid-uclogic-params.c
+++ b/drivers/hid/hid-uclogic-params.c
@@ -23,11 +23,11 @@
/**
* uclogic_params_pen_inrange_to_str() - Convert a pen in-range reporting type
* to a string.
- *
* @inrange: The in-range reporting type to convert.
*
- * Returns:
- * The string representing the type, or NULL if the type is unknown.
+ * Return:
+ * * The string representing the type, or
+ * * NULL if the type is unknown.
*/
static const char *uclogic_params_pen_inrange_to_str(
enum uclogic_params_pen_inrange inrange)
@@ -45,10 +45,12 @@ static const char *uclogic_params_pen_inrange_to_str(
}
/**
- * Dump tablet interface pen parameters with hid_dbg(), indented with one tab.
- *
+ * uclogic_params_pen_hid_dbg() - Dump tablet interface pen parameters
* @hdev: The HID device the pen parameters describe.
* @pen: The pen parameters to dump.
+ *
+ * Dump tablet interface pen parameters with hid_dbg(). The dump is indented
+ * with a tab.
*/
static void uclogic_params_pen_hid_dbg(const struct hid_device *hdev,
const struct uclogic_params_pen *pen)
@@ -77,11 +79,12 @@ static void uclogic_params_pen_hid_dbg(const struct hid_device *hdev,
}
/**
- * Dump tablet interface frame parameters with hid_dbg(), indented with two
- * tabs.
- *
+ * uclogic_params_frame_hid_dbg() - Dump tablet interface frame parameters
* @hdev: The HID device the pen parameters describe.
* @frame: The frame parameters to dump.
+ *
+ * Dump tablet interface frame parameters with hid_dbg(). The dump is
+ * indented with two tabs.
*/
static void uclogic_params_frame_hid_dbg(
const struct hid_device *hdev,
@@ -102,10 +105,11 @@ static void uclogic_params_frame_hid_dbg(
}
/**
- * Dump tablet interface parameters with hid_dbg().
- *
+ * uclogic_params_hid_dbg() - Dump tablet interface parameters
* @hdev: The HID device the parameters describe.
* @params: The parameters to dump.
+ *
+ * Dump tablet interface parameters with hid_dbg().
*/
void uclogic_params_hid_dbg(const struct hid_device *hdev,
const struct uclogic_params *params)
base-commit: 8ab2afa23bd197df47819a87f0265c0ac95c5b6a
--
An old man doll... just what I always wanted! - Clara