This is the start of the stable review cycle for the 5.17.13 release. There are 75 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 05 Jun 2022 17:38:05 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.17.13-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.17.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.17.13-rc1
Kumar Kartikeya Dwivedi memxor@gmail.com bpf: Check PTR_TO_MEM | MEM_RDONLY in check_helper_mem_access
Kumar Kartikeya Dwivedi memxor@gmail.com bpf: Reject writes for PTR_TO_MAP_KEY in check_helper_mem_access
Yuntao Wang ytcoode@gmail.com bpf: Fix excessive memory allocation in stack_map_alloc()
KP Singh kpsingh@kernel.org bpf: Fix usage of trace RCU in local storage.
Liu Jian liujian56@huawei.com bpf: Enlarge offset check value to INT_MAX in bpf_skb_{load,store}_bytes
Alexei Starovoitov ast@kernel.org bpf: Fix combination of jit blinding and pointers to bpf subprogs.
Yuntao Wang ytcoode@gmail.com bpf: Fix potential array overflow in bpf_trampoline_get_progs()
Chuck Lever chuck.lever@oracle.com NFSD: Fix possible sleep during nfsd4_release_lockowner()
Trond Myklebust trond.myklebust@hammerspace.com NFS: Memory allocation failures are not server fatal errors
Akira Yokosawa akiyks@gmail.com docs: submitting-patches: Fix crossref to 'The canonical patch format'
Xiu Jianfeng xiujianfeng@huawei.com tpm: ibmvtpm: Correct the return value in tpm_ibmvtpm_probe()
Stefan Mahnke-Hartmann stefan.mahnke-hartmann@infineon.com tpm: Fix buffer access in tpm2_get_tpm_pt()
Bryan O'Donoghue bryan.odonoghue@linaro.org media: i2c: imx412: Fix power_off ordering
Bryan O'Donoghue bryan.odonoghue@linaro.org media: i2c: imx412: Fix reset GPIO polarity
Reinette Chatre reinette.chatre@intel.com x86/sgx: Ensure no data in PCMD page after truncate
Reinette Chatre reinette.chatre@intel.com x86/sgx: Fix race between reclaimer and page fault handler
Reinette Chatre reinette.chatre@intel.com x86/sgx: Obtain backing storage page with enclave mutex held
Reinette Chatre reinette.chatre@intel.com x86/sgx: Mark PCMD page as dirty when modifying contents
Reinette Chatre reinette.chatre@intel.com x86/sgx: Disconnect backing page references from dirty status
Tao Jin tao-j@outlook.com HID: multitouch: add quirks to enable Lenovo X12 trackpoint
Marek Maślanka mm@semihalf.com HID: multitouch: Add support for Google Whiskers Touchpad
Randy Dunlap rdunlap@infradead.org fs/ntfs3: validate BOOT sectors_per_clusters
Mariusz Tkaczyk mariusz.tkaczyk@linux.intel.com raid5: introduce MD_BROKEN
Sarthak Kukreti sarthakkukreti@google.com dm verity: set DM_TARGET_IMMUTABLE feature flag
Mikulas Patocka mpatocka@redhat.com dm stats: add cond_resched when looping over entries
Mikulas Patocka mpatocka@redhat.com dm crypt: make printing of the key constant-time
Dan Carpenter dan.carpenter@oracle.com dm integrity: fix error code in dm_integrity_ctr()
Jonathan Bakker xc-racer2@live.ca ARM: dts: s5pv210: Correct interrupt name for bluetooth in Aries
Steven Rostedt rostedt@goodmis.org Bluetooth: hci_qca: Use del_timer_sync() before freeing
Craig McLure craig@mclure.net ALSA: usb-audio: Configure sync endpoints before data
Takashi Iwai tiwai@suse.de ALSA: usb-audio: Add missing ep_idx in fixed EP quirks
Takashi Iwai tiwai@suse.de ALSA: usb-audio: Workaround for clock setup on TEAC devices
Akira Yokosawa akiyks@gmail.com tools/memory-model/README: Update klitmus7 compat table
Sultan Alsawaf sultan@kerneltoast.com zsmalloc: fix races between asynchronous zspage free and page migration
Vitaly Chikunov vt@altlinux.org crypto: ecrdsa - Fix incorrect use of vli_cmp
Fabio Estevam festevam@denx.de crypto: caam - fix i.MX6SX entropy delay value
Ashish Kalra ashish.kalra@amd.com KVM: SVM: Use kzalloc for sev ioctl interfaces to prevent kernel data leak
Hou Wenlong houwenlong.hwl@antgroup.com KVM: x86/mmu: Don't rebuild page when the page is synced and no tlb flushing is required
Sean Christopherson seanjc@google.com KVM: x86: Drop WARNs that assert a triple fault never "escapes" from L2
Yanfei Xu yanfei.xu@intel.com KVM: x86: Fix the intel_pt PMI handling wrongly considered from guest
Maxim Levitsky mlevitsk@redhat.com KVM: x86: avoid loading a vCPU after .vm_destroy was called
Sean Christopherson seanjc@google.com KVM: x86: avoid calling x86 emulator without a decoded instruction
Maxim Levitsky mlevitsk@redhat.com KVM: x86: fix typo in __try_cmpxchg_user causing non-atomicness
Sean Christopherson seanjc@google.com KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses
Sean Christopherson seanjc@google.com KVM: x86: Use __try_cmpxchg_user() to update guest PTE A/D bits
Peter Zijlstra peterz@infradead.org x86/uaccess: Implement macros for CMPXCHG on user addresses
Paolo Bonzini pbonzini@redhat.com x86, kvm: use correct GFP flags for preemption disabled
Sean Christopherson seanjc@google.com x86/kvm: Alloc dummy async #PF token outside of raw spinlock
Sean Christopherson seanjc@google.com x86/fpu: KVM: Set the base guest FPU uABI size to sizeof(struct kvm_xsave)
Xiaomeng Tong xiam0nd.tong@gmail.com KVM: PPC: Book3S HV: fix incorrect NULL check on list iterator
Florian Westphal fw@strlen.de netfilter: conntrack: re-fetch conntrack after insertion
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: double hook unregistration in netns path
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: hold mutex on netns pre_exit path
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: sanitize nft_set_desc_concat_parse()
Phil Sutter phil@nwl.cc netfilter: nft_limit: Clone packet limits' cost value
Yuezhang Mo Yuezhang.Mo@sony.com exfat: fix referencing wrong parent directory information after renaming
Tadeusz Struk tadeusz.struk@linaro.org exfat: check if cluster num is valid
Gustavo A. R. Silva gustavoars@kernel.org drm/i915: Fix -Wstringop-overflow warning in call to intel_read_wm_latency()
Alex Elder elder@linaro.org net: ipa: compute proper aggregation limit
David Howells dhowells@redhat.com pipe: Fix missing lock in pipe_resize_ring()
Kuniyuki Iwashima kuniyu@amazon.co.jp pipe: make poll_usage boolean and annotate its access
Stephen Brennan stephen.s.brennan@oracle.com assoc_array: Fix BUG_ON during garbage collect
Dan Carpenter dan.carpenter@oracle.com i2c: ismt: prevent memory corruption in ismt_access()
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: disallow non-stateful expression in sets earlier
Piyush Malgujar pmalgujar@marvell.com drivers: i2c: thunderx: Allow driver to work with ACPI defined TWSI controllers
Mika Westerberg mika.westerberg@linux.intel.com i2c: ismt: Provide a DMA buffer for Interrupt Cause Logging
Joel Stanley joel@jms.id.au net: ftgmac100: Disable hardware checksum on AST2600
Lin Ma linma@zju.edu.cn nfc: pn533: Fix buggy cleanup order
Thomas Bartschies thomas.bartschies@cvk.de net: af_key: check encryption module availability consistency
Al Viro viro@zeniv.linux.org.uk percpu_ref_init(): clean ->percpu_count_ref on failure
Quentin Perret qperret@google.com KVM: arm64: Don't hypercall before EL2 init
IotaHydrae writeforever@foxmail.com pinctrl: sunxi: fix f1c100s uart2 function
Dustin L. Howett dustin@howett.net ALSA: hda/realtek: Add quirk for the Framework Laptop
Gabriele Mazzotta gabriele.mzt@gmail.com ALSA: hda/realtek: Add quirk for Dell Latitude 7520
Forest Crossman cyrozap@gmail.com ALSA: usb-audio: Don't get sample rate for MCT Trigger 5 USB-to-HDMI
-------------
Diffstat:
Documentation/process/submitting-patches.rst | 2 +- Makefile | 4 +- arch/arm/boot/dts/s5pv210-aries.dtsi | 2 +- arch/arm64/kvm/arm.c | 3 +- arch/powerpc/kvm/book3s_hv_uvmem.c | 8 +- arch/x86/include/asm/uaccess.h | 142 ++++++++++++++++++++++++++ arch/x86/kernel/cpu/sgx/encl.c | 113 ++++++++++++++++++-- arch/x86/kernel/cpu/sgx/encl.h | 2 +- arch/x86/kernel/cpu/sgx/main.c | 13 ++- arch/x86/kernel/fpu/core.c | 17 ++- arch/x86/kernel/kvm.c | 41 +++++--- arch/x86/kvm/mmu/mmu.c | 18 ++-- arch/x86/kvm/mmu/paging_tmpl.h | 38 +------ arch/x86/kvm/svm/nested.c | 3 - arch/x86/kvm/svm/sev.c | 12 +-- arch/x86/kvm/vmx/nested.c | 3 - arch/x86/kvm/vmx/vmx.c | 2 +- arch/x86/kvm/x86.c | 76 +++++++------- crypto/ecrdsa.c | 8 +- drivers/bluetooth/hci_qca.c | 4 +- drivers/char/tpm/tpm2-cmd.c | 11 +- drivers/char/tpm/tpm_ibmvtpm.c | 1 + drivers/crypto/caam/ctrl.c | 18 ++++ drivers/gpu/drm/i915/intel_pm.c | 2 +- drivers/hid/hid-ids.h | 1 + drivers/hid/hid-multitouch.c | 9 ++ drivers/i2c/busses/i2c-ismt.c | 17 +++ drivers/i2c/busses/i2c-thunderx-pcidrv.c | 1 + drivers/md/dm-crypt.c | 14 ++- drivers/md/dm-integrity.c | 2 - drivers/md/dm-stats.c | 8 ++ drivers/md/dm-verity-target.c | 1 + drivers/md/raid5.c | 47 ++++----- drivers/media/i2c/imx412.c | 8 +- drivers/net/ethernet/faraday/ftgmac100.c | 5 + drivers/net/ipa/ipa_endpoint.c | 4 +- drivers/nfc/pn533/pn533.c | 5 +- drivers/pinctrl/sunxi/pinctrl-suniv-f1c100s.c | 2 +- fs/exfat/balloc.c | 8 +- fs/exfat/exfat_fs.h | 6 ++ fs/exfat/fatent.c | 6 -- fs/exfat/namei.c | 27 +---- fs/nfs/internal.h | 1 + fs/nfsd/nfs4state.c | 12 +-- fs/ntfs3/super.c | 10 +- fs/pipe.c | 33 +++--- include/linux/bpf_local_storage.h | 4 +- include/linux/pipe_fs_i.h | 2 +- include/net/netfilter/nf_conntrack_core.h | 7 +- kernel/bpf/bpf_inode_storage.c | 4 +- kernel/bpf/bpf_local_storage.c | 29 ++++-- kernel/bpf/bpf_task_storage.c | 4 +- kernel/bpf/core.c | 10 ++ kernel/bpf/stackmap.c | 1 - kernel/bpf/trampoline.c | 18 ++-- kernel/bpf/verifier.c | 17 ++- lib/assoc_array.c | 8 ++ lib/percpu-refcount.c | 1 + mm/zsmalloc.c | 37 ++++++- net/core/bpf_sk_storage.c | 6 +- net/core/filter.c | 4 +- net/key/af_key.c | 6 +- net/netfilter/nf_tables_api.c | 94 ++++++++++++----- net/netfilter/nft_limit.c | 2 + sound/pci/hda/patch_realtek.c | 54 ++++++++++ sound/usb/clock.c | 7 ++ sound/usb/pcm.c | 17 +-- sound/usb/quirks-table.h | 3 + sound/usb/quirks.c | 2 + tools/memory-model/README | 3 +- 70 files changed, 798 insertions(+), 312 deletions(-)
From: Forest Crossman cyrozap@gmail.com
[ Upstream commit d7be213849232a2accb219d537edf056d29186b4 ]
This device doesn't support reading the sample rate, so we need to apply this quirk to avoid a 15-second delay waiting for three timeouts.
Signed-off-by: Forest Crossman cyrozap@gmail.com Link: https://lore.kernel.org/r/20220504002444.114011-2-cyrozap@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/usb/quirks.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/sound/usb/quirks.c b/sound/usb/quirks.c index ab9f3da49941..fbbe59054c3f 100644 --- a/sound/usb/quirks.c +++ b/sound/usb/quirks.c @@ -1822,6 +1822,8 @@ static const struct usb_audio_quirk_flags_table quirk_flags_table[] = { QUIRK_FLAG_IGNORE_CTL_ERROR), DEVICE_FLG(0x06f8, 0xd002, /* Hercules DJ Console (Macintosh Edition) */ QUIRK_FLAG_IGNORE_CTL_ERROR), + DEVICE_FLG(0x0711, 0x5800, /* MCT Trigger 5 USB-to-HDMI */ + QUIRK_FLAG_GET_SAMPLE_RATE), DEVICE_FLG(0x074d, 0x3553, /* Outlaw RR2150 (Micronas UAC3553B) */ QUIRK_FLAG_GET_SAMPLE_RATE), DEVICE_FLG(0x08bb, 0x2702, /* LineX FM Transmitter */
From: Gabriele Mazzotta gabriele.mzt@gmail.com
[ Upstream commit 1efcdd9c1f34f5a6590bc9ac5471e562fb011386 ]
The driver is currently using ALC269_FIXUP_DELL4_MIC_NO_PRESENCE for the Latitude 7520, but this fixup chain has some issues:
- The internal mic is really loud and the recorded audio is distorted at "standard" audio levels.
- There are pop noises at system startup and when plugging/unplugging headphone jacks.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215885 Signed-off-by: Gabriele Mazzotta gabriele.mzt@gmail.com Link: https://lore.kernel.org/r/20220501124237.4667-1-gabriele.mzt@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/pci/hda/patch_realtek.c | 43 +++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+)
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index e38acdbe1a3b..cc3cf65ad5b9 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -6773,6 +6773,41 @@ static void alc256_fixup_mic_no_presence_and_resume(struct hda_codec *codec, } }
+static void alc_fixup_dell4_mic_no_presence_quiet(struct hda_codec *codec, + const struct hda_fixup *fix, + int action) +{ + struct alc_spec *spec = codec->spec; + struct hda_input_mux *imux = &spec->gen.input_mux; + int i; + + alc269_fixup_limit_int_mic_boost(codec, fix, action); + + switch (action) { + case HDA_FIXUP_ACT_PRE_PROBE: + /** + * Set the vref of pin 0x19 (Headset Mic) and pin 0x1b (Headphone Mic) + * to Hi-Z to avoid pop noises at startup and when plugging and + * unplugging headphones. + */ + snd_hda_codec_set_pin_target(codec, 0x19, PIN_VREFHIZ); + snd_hda_codec_set_pin_target(codec, 0x1b, PIN_VREFHIZ); + break; + case HDA_FIXUP_ACT_PROBE: + /** + * Make the internal mic (0x12) the default input source to + * prevent pop noises on cold boot. + */ + for (i = 0; i < imux->num_items; i++) { + if (spec->gen.imux_pins[i] == 0x12) { + spec->gen.cur_mux[0] = i; + break; + } + } + break; + } +} + enum { ALC269_FIXUP_GPIO2, ALC269_FIXUP_SONY_VAIO, @@ -6814,6 +6849,7 @@ enum { ALC269_FIXUP_DELL2_MIC_NO_PRESENCE, ALC269_FIXUP_DELL3_MIC_NO_PRESENCE, ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, + ALC269_FIXUP_DELL4_MIC_NO_PRESENCE_QUIET, ALC269_FIXUP_HEADSET_MODE, ALC269_FIXUP_HEADSET_MODE_NO_HP_MIC, ALC269_FIXUP_ASPIRE_HEADSET_MIC, @@ -8770,6 +8806,12 @@ static const struct hda_fixup alc269_fixups[] = { .chained = true, .chain_id = ALC285_FIXUP_HP_MUTE_LED, }, + [ALC269_FIXUP_DELL4_MIC_NO_PRESENCE_QUIET] = { + .type = HDA_FIXUP_FUNC, + .v.func = alc_fixup_dell4_mic_no_presence_quiet, + .chained = true, + .chain_id = ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, + }, };
static const struct snd_pci_quirk alc269_fixup_tbl[] = { @@ -8860,6 +8902,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1028, 0x09bf, "Dell Precision", ALC233_FIXUP_ASUS_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1028, 0x0a2e, "Dell", ALC236_FIXUP_DELL_AIO_HEADSET_MIC), SND_PCI_QUIRK(0x1028, 0x0a30, "Dell", ALC236_FIXUP_DELL_AIO_HEADSET_MIC), + SND_PCI_QUIRK(0x1028, 0x0a38, "Dell Latitude 7520", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE_QUIET), SND_PCI_QUIRK(0x1028, 0x0a58, "Dell", ALC255_FIXUP_DELL_HEADSET_MIC), SND_PCI_QUIRK(0x1028, 0x0a61, "Dell XPS 15 9510", ALC289_FIXUP_DUAL_SPK), SND_PCI_QUIRK(0x1028, 0x0a62, "Dell Precision 5560", ALC289_FIXUP_DUAL_SPK),
From: Dustin L. Howett dustin@howett.net
[ Upstream commit 309d7363ca3d9fcdb92ff2d958be14d7e8707f68 ]
Some board revisions of the Framework Laptop have an ALC295 with a disconnected or faulty headset mic presence detect.
The "dell-headset-multi" fixup addresses this issue, but also enables an inoperative "Headphone Mic" input device whenever a headset is connected.
Adding a new quirk chain specific to the Framework Laptop resolves this issue. The one introduced here is based on the System76 "no headphone mic" quirk chain.
The VID:PID f111:0001 have been allocated to Framework Computer for this board revision.
Revision history: - v2: Moved to a custom quirk chain to suppress the "Headphone Mic" pincfg.
Signed-off-by: Dustin L. Howett dustin@howett.net Link: https://lore.kernel.org/r/20220511010759.3554-1-dustin@howett.net Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/pci/hda/patch_realtek.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index cc3cf65ad5b9..53d1586b71ec 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -7036,6 +7036,7 @@ enum { ALC287_FIXUP_LEGION_16ACHG6, ALC287_FIXUP_CS35L41_I2C_2, ALC285_FIXUP_HP_SPEAKERS_MICMUTE_LED, + ALC295_FIXUP_FRAMEWORK_LAPTOP_MIC_NO_PRESENCE, };
static const struct hda_fixup alc269_fixups[] = { @@ -8812,6 +8813,15 @@ static const struct hda_fixup alc269_fixups[] = { .chained = true, .chain_id = ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, }, + [ALC295_FIXUP_FRAMEWORK_LAPTOP_MIC_NO_PRESENCE] = { + .type = HDA_FIXUP_PINS, + .v.pins = (const struct hda_pintbl[]) { + { 0x19, 0x02a1112c }, /* use as headset mic, without its own jack detect */ + { } + }, + .chained = true, + .chain_id = ALC269_FIXUP_HEADSET_MODE_NO_HP_MIC + }, };
static const struct snd_pci_quirk alc269_fixup_tbl[] = { @@ -9292,6 +9302,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x8086, 0x2074, "Intel NUC 8", ALC233_FIXUP_INTEL_NUC8_DMIC), SND_PCI_QUIRK(0x8086, 0x2080, "Intel NUC 8 Rugged", ALC256_FIXUP_INTEL_NUC8_RUGGED), SND_PCI_QUIRK(0x8086, 0x2081, "Intel NUC 10", ALC256_FIXUP_INTEL_NUC10), + SND_PCI_QUIRK(0xf111, 0x0001, "Framework Laptop", ALC295_FIXUP_FRAMEWORK_LAPTOP_MIC_NO_PRESENCE),
#if 0 /* Below is a quirk table taken from the old code.
From: IotaHydrae writeforever@foxmail.com
[ Upstream commit fa8785e5931367e2b43f2c507f26bcf3e281c0ca ]
Change suniv f1c100s pinctrl,PD14 multiplexing function lvds1 to uart2
When the pin PD13 and PD14 is setting up to uart2 function in dts, there's an error occurred: 1c20800.pinctrl: unsupported function uart2 on pin PD14
Because 'uart2' is not any one multiplexing option of PD14, and pinctrl don't know how to configure it.
So change the pin PD14 lvds1 function to uart2.
Signed-off-by: IotaHydrae writeforever@foxmail.com Reviewed-by: Andre Przywara andre.przywara@arm.com Link: https://lore.kernel.org/r/tencent_70C1308DDA794C81CAEF389049055BACEC09@qq.co... Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pinctrl/sunxi/pinctrl-suniv-f1c100s.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pinctrl/sunxi/pinctrl-suniv-f1c100s.c b/drivers/pinctrl/sunxi/pinctrl-suniv-f1c100s.c index 2801ca706273..68a5b627fb9b 100644 --- a/drivers/pinctrl/sunxi/pinctrl-suniv-f1c100s.c +++ b/drivers/pinctrl/sunxi/pinctrl-suniv-f1c100s.c @@ -204,7 +204,7 @@ static const struct sunxi_desc_pin suniv_f1c100s_pins[] = { SUNXI_FUNCTION(0x0, "gpio_in"), SUNXI_FUNCTION(0x1, "gpio_out"), SUNXI_FUNCTION(0x2, "lcd"), /* D20 */ - SUNXI_FUNCTION(0x3, "lvds1"), /* RX */ + SUNXI_FUNCTION(0x3, "uart2"), /* RX */ SUNXI_FUNCTION_IRQ_BANK(0x6, 0, 14)), SUNXI_PIN(SUNXI_PINCTRL_PIN(D, 15), SUNXI_FUNCTION(0x0, "gpio_in"),
From: Quentin Perret qperret@google.com
[ Upstream commit 2e40316753ee552fb598e8da8ca0d20a04e67453 ]
Will reported the following splat when running with Protected KVM enabled:
[ 2.427181] ------------[ cut here ]------------ [ 2.427668] WARNING: CPU: 3 PID: 1 at arch/arm64/kvm/mmu.c:489 __create_hyp_private_mapping+0x118/0x1ac [ 2.428424] Modules linked in: [ 2.429040] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.18.0-rc2-00084-g8635adc4efc7 #1 [ 2.429589] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 [ 2.430286] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 2.430734] pc : __create_hyp_private_mapping+0x118/0x1ac [ 2.431091] lr : create_hyp_exec_mappings+0x40/0x80 [ 2.431377] sp : ffff80000803baf0 [ 2.431597] x29: ffff80000803bb00 x28: 0000000000000000 x27: 0000000000000000 [ 2.432156] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 [ 2.432561] x23: ffffcd96c343b000 x22: 0000000000000000 x21: ffff80000803bb40 [ 2.433004] x20: 0000000000000004 x19: 0000000000001800 x18: 0000000000000000 [ 2.433343] x17: 0003e68cf7efdd70 x16: 0000000000000004 x15: fffffc81f602a2c8 [ 2.434053] x14: ffffdf8380000000 x13: ffffcd9573200000 x12: ffffcd96c343b000 [ 2.434401] x11: 0000000000000004 x10: ffffcd96c1738000 x9 : 0000000000000004 [ 2.434812] x8 : ffff80000803bb40 x7 : 7f7f7f7f7f7f7f7f x6 : 544f422effff306b [ 2.435136] x5 : 000000008020001e x4 : ffff207d80a88c00 x3 : 0000000000000005 [ 2.435480] x2 : 0000000000001800 x1 : 000000014f4ab800 x0 : 000000000badca11 [ 2.436149] Call trace: [ 2.436600] __create_hyp_private_mapping+0x118/0x1ac [ 2.437576] create_hyp_exec_mappings+0x40/0x80 [ 2.438180] kvm_init_vector_slots+0x180/0x194 [ 2.458941] kvm_arch_init+0x80/0x274 [ 2.459220] kvm_init+0x48/0x354 [ 2.459416] arm_init+0x20/0x2c [ 2.459601] do_one_initcall+0xbc/0x238 [ 2.459809] do_initcall_level+0x94/0xb4 [ 2.460043] do_initcalls+0x54/0x94 [ 2.460228] do_basic_setup+0x1c/0x28 [ 2.460407] kernel_init_freeable+0x110/0x178 [ 2.460610] kernel_init+0x20/0x1a0 [ 2.460817] ret_from_fork+0x10/0x20 [ 2.461274] ---[ end trace 0000000000000000 ]---
Indeed, the Protected KVM mode promotes __create_hyp_private_mapping() to a hypercall as EL1 no longer has access to the hypervisor's stage-1 page-table. However, the call from kvm_init_vector_slots() happens after pKVM has been initialized on the primary CPU, but before it has been initialized on secondaries. As such, if the KVM initcall procedure is migrated from one CPU to another in this window, the hypercall may end up running on a CPU for which EL2 has not been initialized.
Fortunately, the pKVM hypervisor doesn't rely on the host to re-map the vectors in the private range, so the hypercall in question is in fact superfluous. Skip it when pKVM is enabled.
Reported-by: Will Deacon will@kernel.org Signed-off-by: Quentin Perret qperret@google.com [maz: simplified the checks slightly] Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20220513092607.35233-1-qperret@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/kvm/arm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 25d8aff273a1..4dd5f3b08aaa 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -1496,7 +1496,8 @@ static int kvm_init_vector_slots(void) base = kern_hyp_va(kvm_ksym_ref(__bp_harden_hyp_vecs)); kvm_init_vector_slot(base, HYP_VECTOR_SPECTRE_DIRECT);
- if (kvm_system_needs_idmapped_vectors() && !has_vhe()) { + if (kvm_system_needs_idmapped_vectors() && + !is_protected_kvm_enabled()) { err = create_hyp_exec_mappings(__pa_symbol(__bp_harden_hyp_vecs), __BP_HARDEN_HYP_VECS_SZ, &base); if (err)
From: Al Viro viro@zeniv.linux.org.uk
[ Upstream commit a91714312eb16f9ecd1f7f8b3efe1380075f28d4 ]
That way percpu_ref_exit() is safe after failing percpu_ref_init(). At least one user (cgroup_create()) had a double-free that way; there might be other similar bugs. Easier to fix in percpu_ref_init(), rather than playing whack-a-mole in sloppy users...
Usual symptoms look like a messed refcounting in one of subsystems that use percpu allocations (might be percpu-refcount, might be something else). Having refcounts for two different objects share memory is Not Nice(tm)...
Reported-by: syzbot+5b1e53987f858500ec00@syzkaller.appspotmail.com Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Sasha Levin sashal@kernel.org --- lib/percpu-refcount.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c index af9302141bcf..e5c5315da274 100644 --- a/lib/percpu-refcount.c +++ b/lib/percpu-refcount.c @@ -76,6 +76,7 @@ int percpu_ref_init(struct percpu_ref *ref, percpu_ref_func_t *release, data = kzalloc(sizeof(*ref->data), gfp); if (!data) { free_percpu((void __percpu *)ref->percpu_count_ptr); + ref->percpu_count_ptr = 0; return -ENOMEM; }
From: Thomas Bartschies thomas.bartschies@cvk.de
[ Upstream commit 015c44d7bff3f44d569716117becd570c179ca32 ]
Since the recent introduction supporting the SM3 and SM4 hash algos for IPsec, the kernel produces invalid pfkey acquire messages, when these encryption modules are disabled. This happens because the availability of the algos wasn't checked in all necessary functions. This patch adds these checks.
Signed-off-by: Thomas Bartschies thomas.bartschies@cvk.de Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/key/af_key.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/key/af_key.c b/net/key/af_key.c index 92e9d75dba2f..339d95df19d3 100644 --- a/net/key/af_key.c +++ b/net/key/af_key.c @@ -2900,7 +2900,7 @@ static int count_ah_combs(const struct xfrm_tmpl *t) break; if (!aalg->pfkey_supported) continue; - if (aalg_tmpl_set(t, aalg)) + if (aalg_tmpl_set(t, aalg) && aalg->available) sz += sizeof(struct sadb_comb); } return sz + sizeof(struct sadb_prop); @@ -2918,7 +2918,7 @@ static int count_esp_combs(const struct xfrm_tmpl *t) if (!ealg->pfkey_supported) continue;
- if (!(ealg_tmpl_set(t, ealg))) + if (!(ealg_tmpl_set(t, ealg) && ealg->available)) continue;
for (k = 1; ; k++) { @@ -2929,7 +2929,7 @@ static int count_esp_combs(const struct xfrm_tmpl *t) if (!aalg->pfkey_supported) continue;
- if (aalg_tmpl_set(t, aalg)) + if (aalg_tmpl_set(t, aalg) && aalg->available) sz += sizeof(struct sadb_comb); } }
From: Lin Ma linma@zju.edu.cn
[ Upstream commit b8cedb7093b2d1394cae9b86494cba4b62d3a30a ]
When removing the pn533 device (i2c or USB), there is a logic error. The original code first cancels the worker (flush_delayed_work) and then destroys the workqueue (destroy_workqueue), leaving the timer the last one to be deleted (del_timer). This result in a possible race condition in a multi-core preempt-able kernel. That is, if the cleanup (pn53x_common_clean) is concurrently run with the timer handler (pn533_listen_mode_timer), the timer can queue the poll_work to the already destroyed workqueue, causing use-after-free.
This patch reorder the cleanup: it uses the del_timer_sync to make sure the handler is finished before the routine will destroy the workqueue. Note that the timer cannot be activated by the worker again.
static void pn533_wq_poll(struct work_struct *work) ... rc = pn533_send_poll_frame(dev); if (rc) return;
if (cur_mod->len == 0 && dev->poll_mod_count > 1) mod_timer(&dev->listen_timer, ...);
That is, the mod_timer can be called only when pn533_send_poll_frame() returns no error, which is impossible because the device is detaching and the lower driver should return ENODEV code.
Signed-off-by: Lin Ma linma@zju.edu.cn Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nfc/pn533/pn533.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/nfc/pn533/pn533.c b/drivers/nfc/pn533/pn533.c index a491db46e3bd..d9f6367b9993 100644 --- a/drivers/nfc/pn533/pn533.c +++ b/drivers/nfc/pn533/pn533.c @@ -2787,13 +2787,14 @@ void pn53x_common_clean(struct pn533 *priv) { struct pn533_cmd *cmd, *n;
+ /* delete the timer before cleanup the worker */ + del_timer_sync(&priv->listen_timer); + flush_delayed_work(&priv->poll_work); destroy_workqueue(priv->wq);
skb_queue_purge(&priv->resp_q);
- del_timer(&priv->listen_timer); - list_for_each_entry_safe(cmd, n, &priv->cmd_queue, queue) { list_del(&cmd->queue); kfree(cmd);
From: Joel Stanley joel@jms.id.au
[ Upstream commit 6fd45e79e8b93b8d22fb8fe22c32fbad7e9190bd ]
The AST2600 when using the i210 NIC over NC-SI has been observed to produce incorrect checksum results with specific MTU values. This was first observed when sending data across a long distance set of networks.
On a local network, the following test was performed using a 1MB file of random data.
On the receiver run this script:
#!/bin/bash while [ 1 ]; do # Zero the stats nstat -r > /dev/null nc -l 9899 > test-file # Check for checksum errors TcpInCsumErrors=$(nstat | grep TcpInCsumErrors) if [ -z "$TcpInCsumErrors" ]; then echo No TcpInCsumErrors else echo TcpInCsumErrors = $TcpInCsumErrors fi done
On an AST2600 system:
# nc <IP of receiver host> 9899 < test-file
The test was repeated with various MTU values:
# ip link set mtu 1410 dev eth0
The observed results:
1500 - good 1434 - bad 1400 - good 1410 - bad 1420 - good
The test was repeated after disabling tx checksumming:
# ethtool -K eth0 tx-checksumming off
And all MTU values tested resulted in transfers without error.
An issue with the driver cannot be ruled out, however there has been no bug discovered so far.
David has done the work to take the original bug report of slow data transfer between long distance connections and triaged it down to this test case.
The vendor suspects this this is a hardware issue when using NC-SI. The fixes line refers to the patch that introduced AST2600 support.
Reported-by: David Wilder wilder@us.ibm.com Reviewed-by: Dylan Hung dylan_hung@aspeedtech.com Signed-off-by: Joel Stanley joel@jms.id.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c index caf48023f8ea..5231818943c6 100644 --- a/drivers/net/ethernet/faraday/ftgmac100.c +++ b/drivers/net/ethernet/faraday/ftgmac100.c @@ -1928,6 +1928,11 @@ static int ftgmac100_probe(struct platform_device *pdev) /* AST2400 doesn't have working HW checksum generation */ if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac"))) netdev->hw_features &= ~NETIF_F_HW_CSUM; + + /* AST2600 tx checksum with NCSI is broken */ + if (priv->use_ncsi && of_device_is_compatible(np, "aspeed,ast2600-mac")) + netdev->hw_features &= ~NETIF_F_HW_CSUM; + if (np && of_get_property(np, "no-hw-checksum", NULL)) netdev->hw_features &= ~(NETIF_F_HW_CSUM | NETIF_F_RXCSUM); netdev->features |= netdev->hw_features;
From: Mika Westerberg mika.westerberg@linux.intel.com
[ Upstream commit 17a0f3acdc6ec8b89ad40f6e22165a4beee25663 ]
Before sending a MSI the hardware writes information pertinent to the interrupt cause to a memory location pointed by SMTICL register. This memory holds three double words where the least significant bit tells whether the interrupt cause of master/target/error is valid. The driver does not use this but we need to set it up because otherwise it will perform DMA write to the default address (0) and this will cause an IOMMU fault such as below:
DMAR: DRHD: handling fault status reg 2 DMAR: [DMA Write] Request device [00:12.0] PASID ffffffff fault addr 0 [fault reason 05] PTE Write access is not set
To prevent this from happening, provide a proper DMA buffer for this that then gets mapped by the IOMMU accordingly.
Signed-off-by: Mika Westerberg mika.westerberg@linux.intel.com Reviewed-by: From: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-ismt.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
diff --git a/drivers/i2c/busses/i2c-ismt.c b/drivers/i2c/busses/i2c-ismt.c index f4820fd3dc13..c01430ce103a 100644 --- a/drivers/i2c/busses/i2c-ismt.c +++ b/drivers/i2c/busses/i2c-ismt.c @@ -82,6 +82,7 @@
#define ISMT_DESC_ENTRIES 2 /* number of descriptor entries */ #define ISMT_MAX_RETRIES 3 /* number of SMBus retries to attempt */ +#define ISMT_LOG_ENTRIES 3 /* number of interrupt cause log entries */
/* Hardware Descriptor Constants - Control Field */ #define ISMT_DESC_CWRL 0x01 /* Command/Write Length */ @@ -175,6 +176,8 @@ struct ismt_priv { u8 head; /* ring buffer head pointer */ struct completion cmp; /* interrupt completion */ u8 buffer[I2C_SMBUS_BLOCK_MAX + 16]; /* temp R/W data buffer */ + dma_addr_t log_dma; + u32 *log; };
static const struct pci_device_id ismt_ids[] = { @@ -411,6 +414,9 @@ static int ismt_access(struct i2c_adapter *adap, u16 addr, memset(desc, 0, sizeof(struct ismt_desc)); desc->tgtaddr_rw = ISMT_DESC_ADDR_RW(addr, read_write);
+ /* Always clear the log entries */ + memset(priv->log, 0, ISMT_LOG_ENTRIES * sizeof(u32)); + /* Initialize common control bits */ if (likely(pci_dev_msi_enabled(priv->pci_dev))) desc->control = ISMT_DESC_INT | ISMT_DESC_FAIR; @@ -708,6 +714,8 @@ static void ismt_hw_init(struct ismt_priv *priv) /* initialize the Master Descriptor Base Address (MDBA) */ writeq(priv->io_rng_dma, priv->smba + ISMT_MSTR_MDBA);
+ writeq(priv->log_dma, priv->smba + ISMT_GR_SMTICL); + /* initialize the Master Control Register (MCTRL) */ writel(ISMT_MCTRL_MEIE, priv->smba + ISMT_MSTR_MCTRL);
@@ -795,6 +803,12 @@ static int ismt_dev_init(struct ismt_priv *priv) priv->head = 0; init_completion(&priv->cmp);
+ priv->log = dmam_alloc_coherent(&priv->pci_dev->dev, + ISMT_LOG_ENTRIES * sizeof(u32), + &priv->log_dma, GFP_KERNEL); + if (!priv->log) + return -ENOMEM; + return 0; }
From: Piyush Malgujar pmalgujar@marvell.com
[ Upstream commit 03a35bc856ddc09f2cc1f4701adecfbf3b464cb3 ]
Due to i2c->adap.dev.fwnode not being set, ACPI_COMPANION() wasn't properly found for TWSI controllers.
Signed-off-by: Szymon Balcerak sbalcerak@marvell.com Signed-off-by: Piyush Malgujar pmalgujar@marvell.com Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-thunderx-pcidrv.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/i2c/busses/i2c-thunderx-pcidrv.c b/drivers/i2c/busses/i2c-thunderx-pcidrv.c index 12c90aa0900e..a77cd86fe75e 100644 --- a/drivers/i2c/busses/i2c-thunderx-pcidrv.c +++ b/drivers/i2c/busses/i2c-thunderx-pcidrv.c @@ -213,6 +213,7 @@ static int thunder_i2c_probe_pci(struct pci_dev *pdev, i2c->adap.bus_recovery_info = &octeon_i2c_recovery_info; i2c->adap.dev.parent = dev; i2c->adap.dev.of_node = pdev->dev.of_node; + i2c->adap.dev.fwnode = dev->fwnode; snprintf(i2c->adap.name, sizeof(i2c->adap.name), "Cavium ThunderX i2c adapter at %s", dev_name(dev)); i2c_set_adapdata(&i2c->adap, i2c);
From: Pablo Neira Ayuso pablo@netfilter.org
commit 520778042ccca019f3ffa136dd0ca565c486cedd upstream.
Since 3e135cd499bf ("netfilter: nft_dynset: dynamic stateful expression instantiation"), it is possible to attach stateful expressions to set elements.
cd5125d8f518 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase") introduces conditional destruction on the object to accomodate transaction semantics.
nft_expr_init() calls expr->ops->init() first, then check for NFT_STATEFUL_EXPR, this stills allows to initialize a non-stateful lookup expressions which points to a set, which might lead to UAF since the set is not properly detached from the set->binding for this case. Anyway, this combination is non-sense from nf_tables perspective.
This patch fixes this problem by checking for NFT_STATEFUL_EXPR before expr->ops->init() is called.
The reporter provides a KASAN splat and a poc reproducer (similar to those autogenerated by syzbot to report use-after-free errors). It is unknown to me if they are using syzbot or if they use similar automated tool to locate the bug that they are reporting.
For the record, this is the KASAN splat.
[ 85.431824] ================================================================== [ 85.432901] BUG: KASAN: use-after-free in nf_tables_bind_set+0x81b/0xa20 [ 85.433825] Write of size 8 at addr ffff8880286f0e98 by task poc/776 [ 85.434756] [ 85.434999] CPU: 1 PID: 776 Comm: poc Tainted: G W 5.18.0+ #2 [ 85.436023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Fixes: 0b2d8a7b638b ("netfilter: nf_tables: add helper functions for expression handling") Reported-and-tested-by: Aaron Adams edg-e@nccgroup.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -2794,27 +2794,31 @@ static struct nft_expr *nft_expr_init(co
err = nf_tables_expr_parse(ctx, nla, &expr_info); if (err < 0) - goto err1; + goto err_expr_parse; + + err = -EOPNOTSUPP; + if (!(expr_info.ops->type->flags & NFT_EXPR_STATEFUL)) + goto err_expr_stateful;
err = -ENOMEM; expr = kzalloc(expr_info.ops->size, GFP_KERNEL); if (expr == NULL) - goto err2; + goto err_expr_stateful;
err = nf_tables_newexpr(ctx, &expr_info, expr); if (err < 0) - goto err3; + goto err_expr_new;
return expr; -err3: +err_expr_new: kfree(expr); -err2: +err_expr_stateful: owner = expr_info.ops->type->owner; if (expr_info.ops->type->release_ops) expr_info.ops->type->release_ops(expr_info.ops);
module_put(owner); -err1: +err_expr_parse: return ERR_PTR(err); }
@@ -5334,9 +5338,6 @@ struct nft_expr *nft_set_elem_expr_alloc return expr;
err = -EOPNOTSUPP; - if (!(expr->ops->type->flags & NFT_EXPR_STATEFUL)) - goto err_set_elem_expr; - if (expr->ops->type->flags & NFT_EXPR_GC) { if (set->flags & NFT_SET_TIMEOUT) goto err_set_elem_expr;
From: Dan Carpenter dan.carpenter@oracle.com
commit 690b2549b19563ec5ad53e5c82f6a944d910086e upstream.
The "data->block[0]" variable comes from the user and is a number between 0-255. It needs to be capped to prevent writing beyond the end of dma_buffer[].
Fixes: 5e9a97b1f449 ("i2c: ismt: Adding support for I2C_SMBUS_BLOCK_PROC_CALL") Reported-and-tested-by: Zheyu Ma zheyuma97@gmail.com Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Reviewed-by: Mika Westerberg mika.westerberg@linux.intel.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/i2c/busses/i2c-ismt.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/i2c/busses/i2c-ismt.c +++ b/drivers/i2c/busses/i2c-ismt.c @@ -528,6 +528,9 @@ static int ismt_access(struct i2c_adapte
case I2C_SMBUS_BLOCK_PROC_CALL: dev_dbg(dev, "I2C_SMBUS_BLOCK_PROC_CALL\n"); + if (data->block[0] > I2C_SMBUS_BLOCK_MAX) + return -EINVAL; + dma_size = I2C_SMBUS_BLOCK_MAX; desc->tgtaddr_rw = ISMT_DESC_ADDR_RW(addr, 1); desc->wr_len_cmd = data->block[0] + 1;
From: Stephen Brennan stephen.s.brennan@oracle.com
commit d1dc87763f406d4e67caf16dbe438a5647692395 upstream.
A rare BUG_ON triggered in assoc_array_gc:
[3430308.818153] kernel BUG at lib/assoc_array.c:1609!
Which corresponded to the statement currently at line 1593 upstream:
BUG_ON(assoc_array_ptr_is_meta(p));
Using the data from the core dump, I was able to generate a userspace reproducer[1] and determine the cause of the bug.
[1]: https://github.com/brenns10/kernel_stuff/tree/master/assoc_array_gc
After running the iterator on the entire branch, an internal tree node looked like the following:
NODE (nr_leaves_on_branch: 3) SLOT [0] NODE (2 leaves) SLOT [1] NODE (1 leaf) SLOT [2..f] NODE (empty)
In the userspace reproducer, the pr_devel output when compressing this node was:
-- compress node 0x5607cc089380 -- free=0, leaves=0 [0] retain node 2/1 [nx 0] [1] fold node 1/1 [nx 0] [2] fold node 0/1 [nx 2] [3] fold node 0/2 [nx 2] [4] fold node 0/3 [nx 2] [5] fold node 0/4 [nx 2] [6] fold node 0/5 [nx 2] [7] fold node 0/6 [nx 2] [8] fold node 0/7 [nx 2] [9] fold node 0/8 [nx 2] [10] fold node 0/9 [nx 2] [11] fold node 0/10 [nx 2] [12] fold node 0/11 [nx 2] [13] fold node 0/12 [nx 2] [14] fold node 0/13 [nx 2] [15] fold node 0/14 [nx 2] after: 3
At slot 0, an internal node with 2 leaves could not be folded into the node, because there was only one available slot (slot 0). Thus, the internal node was retained. At slot 1, the node had one leaf, and was able to be folded in successfully. The remaining nodes had no leaves, and so were removed. By the end of the compression stage, there were 14 free slots, and only 3 leaf nodes. The tree was ascended and then its parent node was compressed. When this node was seen, it could not be folded, due to the internal node it contained.
The invariant for compression in this function is: whenever nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT, the node should contain all leaf nodes. The compression step currently cannot guarantee this, given the corner case shown above.
To fix this issue, retry compression whenever we have retained a node, and yet nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT. This second compression will then allow the node in slot 1 to be folded in, satisfying the invariant. Below is the output of the reproducer once the fix is applied:
-- compress node 0x560e9c562380 -- free=0, leaves=0 [0] retain node 2/1 [nx 0] [1] fold node 1/1 [nx 0] [2] fold node 0/1 [nx 2] [3] fold node 0/2 [nx 2] [4] fold node 0/3 [nx 2] [5] fold node 0/4 [nx 2] [6] fold node 0/5 [nx 2] [7] fold node 0/6 [nx 2] [8] fold node 0/7 [nx 2] [9] fold node 0/8 [nx 2] [10] fold node 0/9 [nx 2] [11] fold node 0/10 [nx 2] [12] fold node 0/11 [nx 2] [13] fold node 0/12 [nx 2] [14] fold node 0/13 [nx 2] [15] fold node 0/14 [nx 2] internal nodes remain despite enough space, retrying -- compress node 0x560e9c562380 -- free=14, leaves=1 [0] fold node 2/15 [nx 0] after: 3
Changes ======= DH: - Use false instead of 0. - Reorder the inserted lines in a couple of places to put retained before next_slot.
ver #2) - Fix typo in pr_devel, correct comparison to "<="
Fixes: 3cb989501c26 ("Add a generic associative array implementation.") Cc: stable@vger.kernel.org Signed-off-by: Stephen Brennan stephen.s.brennan@oracle.com Signed-off-by: David Howells dhowells@redhat.com cc: Andrew Morton akpm@linux-foundation.org cc: keyrings@vger.kernel.org Link: https://lore.kernel.org/r/20220511225517.407935-1-stephen.s.brennan@oracle.c... # v1 Link: https://lore.kernel.org/r/20220512215045.489140-1-stephen.s.brennan@oracle.c... # v2 Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- lib/assoc_array.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/lib/assoc_array.c +++ b/lib/assoc_array.c @@ -1461,6 +1461,7 @@ int assoc_array_gc(struct assoc_array *a struct assoc_array_ptr *cursor, *ptr; struct assoc_array_ptr *new_root, *new_parent, **new_ptr_pp; unsigned long nr_leaves_on_tree; + bool retained; int keylen, slot, nr_free, next_slot, i;
pr_devel("-->%s()\n", __func__); @@ -1536,6 +1537,7 @@ continue_node: goto descend; }
+retry_compress: pr_devel("-- compress node %p --\n", new_n);
/* Count up the number of empty slots in this node and work out the @@ -1553,6 +1555,7 @@ continue_node: pr_devel("free=%d, leaves=%lu\n", nr_free, new_n->nr_leaves_on_branch);
/* See what we can fold in */ + retained = false; next_slot = 0; for (slot = 0; slot < ASSOC_ARRAY_FAN_OUT; slot++) { struct assoc_array_shortcut *s; @@ -1602,9 +1605,14 @@ continue_node: pr_devel("[%d] retain node %lu/%d [nx %d]\n", slot, child->nr_leaves_on_branch, nr_free + 1, next_slot); + retained = true; } }
+ if (retained && new_n->nr_leaves_on_branch <= ASSOC_ARRAY_FAN_OUT) { + pr_devel("internal nodes remain despite enough space, retrying\n"); + goto retry_compress; + } pr_devel("after: %lu\n", new_n->nr_leaves_on_branch);
nr_leaves_on_tree = new_n->nr_leaves_on_branch;
From: Kuniyuki Iwashima kuniyu@amazon.co.jp
commit f485922d8fe4e44f6d52a5bb95a603b7c65554bb upstream.
Patch series "Fix data-races around epoll reported by KCSAN."
This series suppresses a false positive KCSAN's message and fixes a real data-race.
This patch (of 2):
pipe_poll() runs locklessly and assigns 1 to poll_usage. Once poll_usage is set to 1, it never changes in other places. However, concurrent writes of a value trigger KCSAN, so let's make KCSAN happy.
BUG: KCSAN: data-race in pipe_poll / pipe_poll
write to 0xffff8880042f6678 of 4 bytes by task 174 on cpu 3: pipe_poll (fs/pipe.c:656) ep_item_poll.isra.0 (./include/linux/poll.h:88 fs/eventpoll.c:853) do_epoll_wait (fs/eventpoll.c:1692 fs/eventpoll.c:1806 fs/eventpoll.c:2234) __x64_sys_epoll_wait (fs/eventpoll.c:2246 fs/eventpoll.c:2241 fs/eventpoll.c:2241) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
write to 0xffff8880042f6678 of 4 bytes by task 177 on cpu 1: pipe_poll (fs/pipe.c:656) ep_item_poll.isra.0 (./include/linux/poll.h:88 fs/eventpoll.c:853) do_epoll_wait (fs/eventpoll.c:1692 fs/eventpoll.c:1806 fs/eventpoll.c:2234) __x64_sys_epoll_wait (fs/eventpoll.c:2246 fs/eventpoll.c:2241 fs/eventpoll.c:2241) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 177 Comm: epoll_race Not tainted 5.17.0-58927-gf443e374ae13 #6 Hardware name: Red Hat KVM, BIOS 1.11.0-2.amzn2 04/01/2014
Link: https://lkml.kernel.org/r/20220322002653.33865-1-kuniyu@amazon.co.jp Link: https://lkml.kernel.org/r/20220322002653.33865-2-kuniyu@amazon.co.jp Fixes: 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.co.jp Cc: Alexander Duyck alexander.h.duyck@intel.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: Davidlohr Bueso dave@stgolabs.net Cc: Kuniyuki Iwashima kuni1840@gmail.com Cc: "Soheil Hassas Yeganeh" soheil@google.com Cc: "Sridhar Samudrala" sridhar.samudrala@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/pipe.c | 2 +- include/linux/pipe_fs_i.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
--- a/fs/pipe.c +++ b/fs/pipe.c @@ -653,7 +653,7 @@ pipe_poll(struct file *filp, poll_table unsigned int head, tail;
/* Epoll has some historical nasty semantics, this enables them */ - pipe->poll_usage = 1; + WRITE_ONCE(pipe->poll_usage, true);
/* * Reading pipe state only -- no need for acquiring the semaphore. --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -71,7 +71,7 @@ struct pipe_inode_info { unsigned int files; unsigned int r_counter; unsigned int w_counter; - unsigned int poll_usage; + bool poll_usage; struct page *tmp_page; struct fasync_struct *fasync_readers; struct fasync_struct *fasync_writers;
From: David Howells dhowells@redhat.com
commit 189b0ddc245139af81198d1a3637cac74f96e13a upstream.
pipe_resize_ring() needs to take the pipe->rd_wait.lock spinlock to prevent post_one_notification() from trying to insert into the ring whilst the ring is being replaced.
The occupancy check must be done after the lock is taken, and the lock must be taken after the new ring is allocated.
The bug can lead to an oops looking something like:
BUG: KASAN: use-after-free in post_one_notification.isra.0+0x62e/0x840 Read of size 4 at addr ffff88801cc72a70 by task poc/27196 ... Call Trace: post_one_notification.isra.0+0x62e/0x840 __post_watch_notification+0x3b7/0x650 key_create_or_update+0xb8b/0xd20 __do_sys_add_key+0x175/0x340 __x64_sys_add_key+0xbe/0x140 do_syscall_64+0x5c/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae
Reported by Selim Enes Karaduman @Enesdex working with Trend Micro Zero Day Initiative.
Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17291 Signed-off-by: David Howells dhowells@redhat.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/pipe.c | 31 ++++++++++++++++++------------- 1 file changed, 18 insertions(+), 13 deletions(-)
--- a/fs/pipe.c +++ b/fs/pipe.c @@ -1245,30 +1245,33 @@ unsigned int round_pipe_size(unsigned lo
/* * Resize the pipe ring to a number of slots. + * + * Note the pipe can be reduced in capacity, but only if the current + * occupancy doesn't exceed nr_slots; if it does, EBUSY will be + * returned instead. */ int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots) { struct pipe_buffer *bufs; unsigned int head, tail, mask, n;
- /* - * We can shrink the pipe, if arg is greater than the ring occupancy. - * Since we don't expect a lot of shrink+grow operations, just free and - * allocate again like we would do for growing. If the pipe currently - * contains more buffers than arg, then return busy. - */ - mask = pipe->ring_size - 1; - head = pipe->head; - tail = pipe->tail; - n = pipe_occupancy(pipe->head, pipe->tail); - if (nr_slots < n) - return -EBUSY; - bufs = kcalloc(nr_slots, sizeof(*bufs), GFP_KERNEL_ACCOUNT | __GFP_NOWARN); if (unlikely(!bufs)) return -ENOMEM;
+ spin_lock_irq(&pipe->rd_wait.lock); + mask = pipe->ring_size - 1; + head = pipe->head; + tail = pipe->tail; + + n = pipe_occupancy(head, tail); + if (nr_slots < n) { + spin_unlock_irq(&pipe->rd_wait.lock); + kfree(bufs); + return -EBUSY; + } + /* * The pipe array wraps around, so just start the new one at zero * and adjust the indices. @@ -1300,6 +1303,8 @@ int pipe_resize_ring(struct pipe_inode_i pipe->tail = tail; pipe->head = head;
+ spin_unlock_irq(&pipe->rd_wait.lock); + /* This might have made more room for writers */ wake_up_interruptible(&pipe->wr_wait); return 0;
From: Alex Elder elder@linaro.org
commit c5794097b269f15961ed78f7f27b50e51766dec9 upstream.
The aggregation byte limit for an endpoint is currently computed based on the endpoint's receive buffer size.
However, some bytes at the front of each receive buffer are reserved on the assumption that--as with SKBs--it might be useful to insert data (such as headers) before what lands in the buffer.
The aggregation byte limit currently doesn't take into account that reserved space, and as a result, aggregation could require space past that which is available in the buffer.
Fix this by reducing the size used to compute the aggregation byte limit by the NET_SKB_PAD offset reserved for each receive buffer.
Signed-off-by: Alex Elder elder@linaro.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ipa/ipa_endpoint.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/net/ipa/ipa_endpoint.c +++ b/drivers/net/ipa/ipa_endpoint.c @@ -723,13 +723,15 @@ static void ipa_endpoint_init_aggr(struc
if (endpoint->data->aggregation) { if (!endpoint->toward_ipa) { + u32 buffer_size; bool close_eof; u32 limit;
val |= u32_encode_bits(IPA_ENABLE_AGGR, AGGR_EN_FMASK); val |= u32_encode_bits(IPA_GENERIC, AGGR_TYPE_FMASK);
- limit = ipa_aggr_size_kb(IPA_RX_BUFFER_SIZE); + buffer_size = IPA_RX_BUFFER_SIZE - NET_SKB_PAD; + limit = ipa_aggr_size_kb(buffer_size); val |= aggr_byte_limit_encoded(version, limit);
limit = IPA_AGGR_TIME_LIMIT;
From: Gustavo A. R. Silva gustavoars@kernel.org
commit 336feb502a715909a8136eb6a62a83d7268a353b upstream.
Fix the following -Wstringop-overflow warnings when building with GCC-11:
drivers/gpu/drm/i915/intel_pm.c:3106:9: warning: ‘intel_read_wm_latency’ accessing 16 bytes in a region of size 10 [-Wstringop-overflow=] 3106 | intel_read_wm_latency(dev_priv, dev_priv->wm.pri_latency); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/i915/intel_pm.c:3106:9: note: referencing argument 2 of type ‘u16 *’ {aka ‘short unsigned int *’} drivers/gpu/drm/i915/intel_pm.c:2861:13: note: in a call to function ‘intel_read_wm_latency’ 2861 | static void intel_read_wm_latency(struct drm_i915_private *dev_priv, | ^~~~~~~~~~~~~~~~~~~~~
by removing the over-specified array size from the argument declarations.
It seems that this code is actually safe because the size of the array depends on the hardware generation, and the function checks for that.
Notice that wm can be an array of 5 elements: drivers/gpu/drm/i915/intel_pm.c:3109: intel_read_wm_latency(dev_priv, dev_priv->wm.pri_latency);
or an array of 8 elements: drivers/gpu/drm/i915/intel_pm.c:3131: intel_read_wm_latency(dev_priv, dev_priv->wm.skl_latency);
and the compiler legitimately complains about that.
This helps with the ongoing efforts to globally enable -Wstringop-overflow.
Link: https://github.com/KSPP/linux/issues/181 Signed-off-by: Gustavo A. R. Silva gustavoars@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/intel_pm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/i915/intel_pm.c +++ b/drivers/gpu/drm/i915/intel_pm.c @@ -2876,7 +2876,7 @@ static void ilk_compute_wm_level(const s }
static void intel_read_wm_latency(struct drm_i915_private *dev_priv, - u16 wm[8]) + u16 wm[]) { struct intel_uncore *uncore = &dev_priv->uncore;
From: Tadeusz Struk tadeusz.struk@linaro.org
commit 64ba4b15e5c045f8b746c6da5fc9be9a6b00b61d upstream.
Syzbot reported slab-out-of-bounds read in exfat_clear_bitmap. This was triggered by reproducer calling truncute with size 0, which causes the following trace:
BUG: KASAN: slab-out-of-bounds in exfat_clear_bitmap+0x147/0x490 fs/exfat/balloc.c:174 Read of size 8 at addr ffff888115aa9508 by task syz-executor251/365
Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack_lvl+0x1e2/0x24b lib/dump_stack.c:118 print_address_description+0x81/0x3c0 mm/kasan/report.c:233 __kasan_report mm/kasan/report.c:419 [inline] kasan_report+0x1a4/0x1f0 mm/kasan/report.c:436 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report_generic.c:309 exfat_clear_bitmap+0x147/0x490 fs/exfat/balloc.c:174 exfat_free_cluster+0x25a/0x4a0 fs/exfat/fatent.c:181 __exfat_truncate+0x99e/0xe00 fs/exfat/file.c:217 exfat_truncate+0x11b/0x4f0 fs/exfat/file.c:243 exfat_setattr+0xa03/0xd40 fs/exfat/file.c:339 notify_change+0xb76/0xe10 fs/attr.c:336 do_truncate+0x1ea/0x2d0 fs/open.c:65
Move the is_valid_cluster() helper from fatent.c to a common header to make it reusable in other *.c files. And add is_valid_cluster() to validate if cluster number is within valid range in exfat_clear_bitmap() and exfat_set_bitmap().
Link: https://syzkaller.appspot.com/bug?id=50381fc73821ecae743b8cf24b4c9a04776f767... Reported-by: syzbot+a4087e40b9c13aad7892@syzkaller.appspotmail.com Fixes: 1e49a94cf707 ("exfat: add bitmap operations") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Tadeusz Struk tadeusz.struk@linaro.org Reviewed-by: Sungjong Seo sj1557.seo@samsung.com Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/exfat/balloc.c | 8 ++++++-- fs/exfat/exfat_fs.h | 6 ++++++ fs/exfat/fatent.c | 6 ------ 3 files changed, 12 insertions(+), 8 deletions(-)
--- a/fs/exfat/balloc.c +++ b/fs/exfat/balloc.c @@ -148,7 +148,9 @@ int exfat_set_bitmap(struct inode *inode struct super_block *sb = inode->i_sb; struct exfat_sb_info *sbi = EXFAT_SB(sb);
- WARN_ON(clu < EXFAT_FIRST_CLUSTER); + if (!is_valid_cluster(sbi, clu)) + return -EINVAL; + ent_idx = CLUSTER_TO_BITMAP_ENT(clu); i = BITMAP_OFFSET_SECTOR_INDEX(sb, ent_idx); b = BITMAP_OFFSET_BIT_IN_SECTOR(sb, ent_idx); @@ -166,7 +168,9 @@ void exfat_clear_bitmap(struct inode *in struct exfat_sb_info *sbi = EXFAT_SB(sb); struct exfat_mount_options *opts = &sbi->options;
- WARN_ON(clu < EXFAT_FIRST_CLUSTER); + if (!is_valid_cluster(sbi, clu)) + return; + ent_idx = CLUSTER_TO_BITMAP_ENT(clu); i = BITMAP_OFFSET_SECTOR_INDEX(sb, ent_idx); b = BITMAP_OFFSET_BIT_IN_SECTOR(sb, ent_idx); --- a/fs/exfat/exfat_fs.h +++ b/fs/exfat/exfat_fs.h @@ -380,6 +380,12 @@ static inline int exfat_sector_to_cluste EXFAT_RESERVED_CLUSTERS; }
+static inline bool is_valid_cluster(struct exfat_sb_info *sbi, + unsigned int clus) +{ + return clus >= EXFAT_FIRST_CLUSTER && clus < sbi->num_clusters; +} + /* super.c */ int exfat_set_volume_dirty(struct super_block *sb); int exfat_clear_volume_dirty(struct super_block *sb); --- a/fs/exfat/fatent.c +++ b/fs/exfat/fatent.c @@ -81,12 +81,6 @@ int exfat_ent_set(struct super_block *sb return 0; }
-static inline bool is_valid_cluster(struct exfat_sb_info *sbi, - unsigned int clus) -{ - return clus >= EXFAT_FIRST_CLUSTER && clus < sbi->num_clusters; -} - int exfat_ent_get(struct super_block *sb, unsigned int loc, unsigned int *content) {
From: Yuezhang Mo Yuezhang.Mo@sony.com
commit d8dad2588addd1d861ce19e7df3b702330f0c7e3 upstream.
During renaming, the parent directory information maybe updated. But the file/directory still references to the old parent directory information.
This bug will cause 2 problems.
(1) The renamed file can not be written.
[10768.175172] exFAT-fs (sda1): error, failed to bmap (inode : 7afd50e4 iblock : 0, err : -5) [10768.184285] exFAT-fs (sda1): Filesystem has been set read-only ash: write error: Input/output error
(2) Some dentries of the renamed file/directory are not set to deleted after removing the file/directory.
exfat_update_parent_info() is a workaround for the wrong parent directory information being used after renaming. Now that bug is fixed, this is no longer needed, so remove it.
Fixes: 5f2aa075070c ("exfat: add inode operations") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Yuezhang Mo Yuezhang.Mo@sony.com Reviewed-by: Andy Wu Andy.Wu@sony.com Reviewed-by: Aoyama Wataru wataru.aoyama@sony.com Reviewed-by: Daniel Palmer daniel.palmer@sony.com Reviewed-by: Sungjong Seo sj1557.seo@samsung.com Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/exfat/namei.c | 27 +-------------------------- 1 file changed, 1 insertion(+), 26 deletions(-)
--- a/fs/exfat/namei.c +++ b/fs/exfat/namei.c @@ -1062,6 +1062,7 @@ static int exfat_rename_file(struct inod
exfat_remove_entries(inode, p_dir, oldentry, 0, num_old_entries); + ei->dir = *p_dir; ei->entry = newentry; } else { if (exfat_get_entry_type(epold) == TYPE_FILE) { @@ -1149,28 +1150,6 @@ static int exfat_move_file(struct inode return 0; }
-static void exfat_update_parent_info(struct exfat_inode_info *ei, - struct inode *parent_inode) -{ - struct exfat_sb_info *sbi = EXFAT_SB(parent_inode->i_sb); - struct exfat_inode_info *parent_ei = EXFAT_I(parent_inode); - loff_t parent_isize = i_size_read(parent_inode); - - /* - * the problem that struct exfat_inode_info caches wrong parent info. - * - * because of flag-mismatch of ei->dir, - * there is abnormal traversing cluster chain. - */ - if (unlikely(parent_ei->flags != ei->dir.flags || - parent_isize != EXFAT_CLU_TO_B(ei->dir.size, sbi) || - parent_ei->start_clu != ei->dir.dir)) { - exfat_chain_set(&ei->dir, parent_ei->start_clu, - EXFAT_B_TO_CLU_ROUND_UP(parent_isize, sbi), - parent_ei->flags); - } -} - /* rename or move a old file into a new file */ static int __exfat_rename(struct inode *old_parent_inode, struct exfat_inode_info *ei, struct inode *new_parent_inode, @@ -1201,8 +1180,6 @@ static int __exfat_rename(struct inode * return -ENOENT; }
- exfat_update_parent_info(ei, old_parent_inode); - exfat_chain_dup(&olddir, &ei->dir); dentry = ei->entry;
@@ -1223,8 +1200,6 @@ static int __exfat_rename(struct inode * goto out; }
- exfat_update_parent_info(new_ei, new_parent_inode); - p_dir = &(new_ei->dir); new_entry = new_ei->entry; ep = exfat_get_dentry(sb, p_dir, new_entry, &new_bh);
From: Phil Sutter phil@nwl.cc
commit 558254b0b602b8605d7246a10cfeb584b1fcabfc upstream.
When cloning a packet-based limit expression, copy the cost value as well. Otherwise the new limit is not functional anymore.
Fixes: 3b9e2ea6c11bf ("netfilter: nft_limit: move stateful fields out of expression data") Signed-off-by: Phil Sutter phil@nwl.cc Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nft_limit.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/net/netfilter/nft_limit.c +++ b/net/netfilter/nft_limit.c @@ -213,6 +213,8 @@ static int nft_limit_pkts_clone(struct n struct nft_limit_priv_pkts *priv_dst = nft_expr_priv(dst); struct nft_limit_priv_pkts *priv_src = nft_expr_priv(src);
+ priv_dst->cost = priv_src->cost; + return nft_limit_clone(&priv_dst->limit, &priv_src->limit); }
From: Pablo Neira Ayuso pablo@netfilter.org
commit fecf31ee395b0295f2d7260aa29946b7605f7c85 upstream.
Add several sanity checks for nft_set_desc_concat_parse():
- validate desc->field_count not larger than desc->field_len array. - field length cannot be larger than desc->field_len (ie. U8_MAX) - total length of the concatenation cannot be larger than register array.
Joint work with Florian Westphal.
Fixes: f3a2181e16f1 ("netfilter: nf_tables: Support for sets with multiple ranged fields") Reported-by: zhangziming.zzm@antgroup.com Reviewed-by: Stefano Brivio sbrivio@redhat.com Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -4167,6 +4167,9 @@ static int nft_set_desc_concat_parse(con u32 len; int err;
+ if (desc->field_count >= ARRAY_SIZE(desc->field_len)) + return -E2BIG; + err = nla_parse_nested_deprecated(tb, NFTA_SET_FIELD_MAX, attr, nft_concat_policy, NULL); if (err < 0) @@ -4176,9 +4179,8 @@ static int nft_set_desc_concat_parse(con return -EINVAL;
len = ntohl(nla_get_be32(tb[NFTA_SET_FIELD_LEN])); - - if (len * BITS_PER_BYTE / 32 > NFT_REG32_COUNT) - return -E2BIG; + if (!len || len > U8_MAX) + return -EINVAL;
desc->field_len[desc->field_count++] = len;
@@ -4189,7 +4191,8 @@ static int nft_set_desc_concat(struct nf const struct nlattr *nla) { struct nlattr *attr; - int rem, err; + u32 num_regs = 0; + int rem, err, i;
nla_for_each_nested(attr, nla, rem) { if (nla_type(attr) != NFTA_LIST_ELEM) @@ -4200,6 +4203,12 @@ static int nft_set_desc_concat(struct nf return err; }
+ for (i = 0; i < desc->field_count; i++) + num_regs += DIV_ROUND_UP(desc->field_len[i], sizeof(u32)); + + if (num_regs > NFT_REG32_COUNT) + return -E2BIG; + return 0; }
From: Pablo Neira Ayuso pablo@netfilter.org
commit 3923b1e4406680d57da7e873da77b1683035d83f upstream.
clean_net() runs in workqueue while walking over the lists, grab mutex.
Fixes: 767d1216bff8 ("netfilter: nftables: fix possible UAF over chains from packet path in netns") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -9813,7 +9813,11 @@ static int __net_init nf_tables_init_net
static void __net_exit nf_tables_pre_exit_net(struct net *net) { + struct nftables_pernet *nft_net = nft_pernet(net); + + mutex_lock(&nft_net->commit_mutex); __nft_release_hooks(net); + mutex_unlock(&nft_net->commit_mutex); }
static void __net_exit nf_tables_exit_net(struct net *net)
From: Pablo Neira Ayuso pablo@netfilter.org
commit f9a43007d3f7ba76d5e7f9421094f00f2ef202f8 upstream.
__nft_release_hooks() is called from pre_netns exit path which unregisters the hooks, then the NETDEV_UNREGISTER event is triggered which unregisters the hooks again.
[ 565.221461] WARNING: CPU: 18 PID: 193 at net/netfilter/core.c:495 __nf_unregister_net_hook+0x247/0x270 [...] [ 565.246890] CPU: 18 PID: 193 Comm: kworker/u64:1 Tainted: G E 5.18.0-rc7+ #27 [ 565.253682] Workqueue: netns cleanup_net [ 565.257059] RIP: 0010:__nf_unregister_net_hook+0x247/0x270 [...] [ 565.297120] Call Trace: [ 565.300900] <TASK> [ 565.304683] nf_tables_flowtable_event+0x16a/0x220 [nf_tables] [ 565.308518] raw_notifier_call_chain+0x63/0x80 [ 565.312386] unregister_netdevice_many+0x54f/0xb50
Unregister and destroy netdev hook from netns pre_exit via kfree_rcu so the NETDEV_UNREGISTER path see unregistered hooks.
Fixes: 767d1216bff8 ("netfilter: nftables: fix possible UAF over chains from packet path in netns") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 54 +++++++++++++++++++++++++++++++----------- 1 file changed, 41 insertions(+), 13 deletions(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -222,12 +222,18 @@ err_register: }
static void nft_netdev_unregister_hooks(struct net *net, - struct list_head *hook_list) + struct list_head *hook_list, + bool release_netdev) { - struct nft_hook *hook; + struct nft_hook *hook, *next;
- list_for_each_entry(hook, hook_list, list) + list_for_each_entry_safe(hook, next, hook_list, list) { nf_unregister_net_hook(net, &hook->ops); + if (release_netdev) { + list_del(&hook->list); + kfree_rcu(hook, rcu); + } + } }
static int nf_tables_register_hook(struct net *net, @@ -253,9 +259,10 @@ static int nf_tables_register_hook(struc return nf_register_net_hook(net, &basechain->ops); }
-static void nf_tables_unregister_hook(struct net *net, - const struct nft_table *table, - struct nft_chain *chain) +static void __nf_tables_unregister_hook(struct net *net, + const struct nft_table *table, + struct nft_chain *chain, + bool release_netdev) { struct nft_base_chain *basechain; const struct nf_hook_ops *ops; @@ -270,11 +277,19 @@ static void nf_tables_unregister_hook(st return basechain->type->ops_unregister(net, ops);
if (nft_base_chain_netdev(table->family, basechain->ops.hooknum)) - nft_netdev_unregister_hooks(net, &basechain->hook_list); + nft_netdev_unregister_hooks(net, &basechain->hook_list, + release_netdev); else nf_unregister_net_hook(net, &basechain->ops); }
+static void nf_tables_unregister_hook(struct net *net, + const struct nft_table *table, + struct nft_chain *chain) +{ + return __nf_tables_unregister_hook(net, table, chain, false); +} + static void nft_trans_commit_list_add_tail(struct net *net, struct nft_trans *trans) { struct nftables_pernet *nft_net = nft_pernet(net); @@ -7222,13 +7237,25 @@ static void nft_unregister_flowtable_hoo FLOW_BLOCK_UNBIND); }
-static void nft_unregister_flowtable_net_hooks(struct net *net, - struct list_head *hook_list) +static void __nft_unregister_flowtable_net_hooks(struct net *net, + struct list_head *hook_list, + bool release_netdev) { - struct nft_hook *hook; + struct nft_hook *hook, *next;
- list_for_each_entry(hook, hook_list, list) + list_for_each_entry_safe(hook, next, hook_list, list) { nf_unregister_net_hook(net, &hook->ops); + if (release_netdev) { + list_del(&hook->list); + kfree_rcu(hook); + } + } +} + +static void nft_unregister_flowtable_net_hooks(struct net *net, + struct list_head *hook_list) +{ + __nft_unregister_flowtable_net_hooks(net, hook_list, false); }
static int nft_register_flowtable_net_hooks(struct net *net, @@ -9672,9 +9699,10 @@ static void __nft_release_hook(struct ne struct nft_chain *chain;
list_for_each_entry(chain, &table->chains, list) - nf_tables_unregister_hook(net, table, chain); + __nf_tables_unregister_hook(net, table, chain, true); list_for_each_entry(flowtable, &table->flowtables, list) - nft_unregister_flowtable_net_hooks(net, &flowtable->hook_list); + __nft_unregister_flowtable_net_hooks(net, &flowtable->hook_list, + true); }
static void __nft_release_hooks(struct net *net)
From: Florian Westphal fw@strlen.de
commit 56b14ecec97f39118bf85c9ac2438c5a949509ed upstream.
In case the conntrack is clashing, insertion can free skb->_nfct and set skb->_nfct to the already-confirmed entry.
This wasn't found before because the conntrack entry and the extension space used to free'd after an rcu grace period, plus the race needs events enabled to trigger.
Reported-by: syzbot+793a590957d9c1b96620@syzkaller.appspotmail.com Fixes: 71d8c47fc653 ("netfilter: conntrack: introduce clash resolution on insertion race") Fixes: 2ad9d7747c10 ("netfilter: conntrack: free extension area immediately") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/netfilter/nf_conntrack_core.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/include/net/netfilter/nf_conntrack_core.h +++ b/include/net/netfilter/nf_conntrack_core.h @@ -58,8 +58,13 @@ static inline int nf_conntrack_confirm(s int ret = NF_ACCEPT;
if (ct) { - if (!nf_ct_is_confirmed(ct)) + if (!nf_ct_is_confirmed(ct)) { ret = __nf_conntrack_confirm(skb); + + if (ret == NF_ACCEPT) + ct = (struct nf_conn *)skb_nfct(skb); + } + if (likely(ret == NF_ACCEPT)) nf_ct_deliver_cached_events(ct); }
From: Xiaomeng Tong xiam0nd.tong@gmail.com
commit 300981abddcb13f8f06ad58f52358b53a8096775 upstream.
The bug is here: if (!p) return ret;
The list iterator value 'p' will *always* be set and non-NULL by list_for_each_entry(), so it is incorrect to assume that the iterator value will be NULL if the list is empty or no element is found.
To fix the bug, Use a new value 'iter' as the list iterator, while use the old value 'p' as a dedicated variable to point to the found element.
Fixes: dfaa973ae960 ("KVM: PPC: Book3S HV: In H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs") Cc: stable@vger.kernel.org # v5.9+ Signed-off-by: Xiaomeng Tong xiam0nd.tong@gmail.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20220414062103.8153-1-xiam0nd.tong@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/kvm/book3s_hv_uvmem.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c @@ -360,13 +360,15 @@ static bool kvmppc_gfn_is_uvmem_pfn(unsi static bool kvmppc_next_nontransitioned_gfn(const struct kvm_memory_slot *memslot, struct kvm *kvm, unsigned long *gfn) { - struct kvmppc_uvmem_slot *p; + struct kvmppc_uvmem_slot *p = NULL, *iter; bool ret = false; unsigned long i;
- list_for_each_entry(p, &kvm->arch.uvmem_pfns, list) - if (*gfn >= p->base_pfn && *gfn < p->base_pfn + p->nr_pfns) + list_for_each_entry(iter, &kvm->arch.uvmem_pfns, list) + if (*gfn >= iter->base_pfn && *gfn < iter->base_pfn + iter->nr_pfns) { + p = iter; break; + } if (!p) return ret; /*
From: Sean Christopherson seanjc@google.com
commit d187ba5312307d51818beafaad87d28a7d939adf upstream.
Set the starting uABI size of KVM's guest FPU to 'struct kvm_xsave', i.e. to KVM's historical uABI size. When saving FPU state for usersapce, KVM (well, now the FPU) sets the FP+SSE bits in the XSAVE header even if the host doesn't support XSAVE. Setting the XSAVE header allows the VM to be migrated to a host that does support XSAVE without the new host having to handle FPU state that may or may not be compatible with XSAVE.
Setting the uABI size to the host's default size results in out-of-bounds writes (setting the FP+SSE bits) and data corruption (that is thankfully caught by KASAN) when running on hosts without XSAVE, e.g. on Core2 CPUs.
WARN if the default size is larger than KVM's historical uABI size; all features that can push the FPU size beyond the historical size must be opt-in.
================================================================== BUG: KASAN: slab-out-of-bounds in fpu_copy_uabi_to_guest_fpstate+0x86/0x130 Read of size 8 at addr ffff888011e33a00 by task qemu-build/681 CPU: 1 PID: 681 Comm: qemu-build Not tainted 5.18.0-rc5-KASAN-amd64 #1 Hardware name: /DG35EC, BIOS ECG3510M.86A.0118.2010.0113.1426 01/13/2010 Call Trace: <TASK> dump_stack_lvl+0x34/0x45 print_report.cold+0x45/0x575 kasan_report+0x9b/0xd0 fpu_copy_uabi_to_guest_fpstate+0x86/0x130 kvm_arch_vcpu_ioctl+0x72a/0x1c50 [kvm] kvm_vcpu_ioctl+0x47f/0x7b0 [kvm] __x64_sys_ioctl+0x5de/0xc90 do_syscall_64+0x31/0x50 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> Allocated by task 0: (stack is not available) The buggy address belongs to the object at ffff888011e33800 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 0 bytes to the right of 512-byte region [ffff888011e33800, ffff888011e33a00) The buggy address belongs to the physical page: page:0000000089cd4adb refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11e30 head:0000000089cd4adb order:2 compound_mapcount:0 compound_pincount:0 flags: 0x4000000000010200(slab|head|zone=1) raw: 4000000000010200 dead000000000100 dead000000000122 ffff888001041c80 raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888011e33900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888011e33980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888011e33a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^ ffff888011e33a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888011e33b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Disabling lock debugging due to kernel taint
Fixes: be50b2065dfa ("kvm: x86: Add support for getting/setting expanded xstate buffer") Fixes: c60427dd50ba ("x86/fpu: Add uabi_size to guest_fpu") Reported-by: Zdenek Kaspar zkaspar82@gmail.com Cc: Maciej S. Szmigiero mail@maciej.szmigiero.name Cc: Paolo Bonzini pbonzini@redhat.com Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson seanjc@google.com Tested-by: Zdenek Kaspar zkaspar82@gmail.com Message-Id: 20220504001219.983513-1-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/fpu/core.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-)
--- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -14,6 +14,8 @@ #include <asm/traps.h> #include <asm/irq_regs.h>
+#include <uapi/asm/kvm.h> + #include <linux/hardirq.h> #include <linux/pkeys.h> #include <linux/vmalloc.h> @@ -232,7 +234,20 @@ bool fpu_alloc_guest_fpstate(struct fpu_ gfpu->fpstate = fpstate; gfpu->xfeatures = fpu_user_cfg.default_features; gfpu->perm = fpu_user_cfg.default_features; - gfpu->uabi_size = fpu_user_cfg.default_size; + + /* + * KVM sets the FP+SSE bits in the XSAVE header when copying FPU state + * to userspace, even when XSAVE is unsupported, so that restoring FPU + * state on a different CPU that does support XSAVE can cleanly load + * the incoming state using its natural XSAVE. In other words, KVM's + * uABI size may be larger than this host's default size. Conversely, + * the default size should never be larger than KVM's base uABI size; + * all features that can expand the uABI size must be opt-in. + */ + gfpu->uabi_size = sizeof(struct kvm_xsave); + if (WARN_ON_ONCE(fpu_user_cfg.default_size > gfpu->uabi_size)) + gfpu->uabi_size = fpu_user_cfg.default_size; + fpu_init_guest_permissions(gfpu);
return true;
From: Sean Christopherson seanjc@google.com
commit 0547758a6de3cc71a0cfdd031a3621a30db6a68b upstream.
Drop the raw spinlock in kvm_async_pf_task_wake() before allocating the the dummy async #PF token, the allocator is preemptible on PREEMPT_RT kernels and must not be called from truly atomic contexts.
Opportunistically document why it's ok to loop on allocation failure, i.e. why the function won't get stuck in an infinite loop.
Reported-by: Yajun Deng yajun.deng@linux.dev Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/kvm.c | 41 +++++++++++++++++++++++++++-------------- 1 file changed, 27 insertions(+), 14 deletions(-)
--- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -191,7 +191,7 @@ void kvm_async_pf_task_wake(u32 token) { u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS); struct kvm_task_sleep_head *b = &async_pf_sleepers[key]; - struct kvm_task_sleep_node *n; + struct kvm_task_sleep_node *n, *dummy = NULL;
if (token == ~0) { apf_task_wake_all(); @@ -203,28 +203,41 @@ again: n = _find_apf_task(b, token); if (!n) { /* - * async PF was not yet handled. - * Add dummy entry for the token. + * Async #PF not yet handled, add a dummy entry for the token. + * Allocating the token must be down outside of the raw lock + * as the allocator is preemptible on PREEMPT_RT kernels. */ - n = kzalloc(sizeof(*n), GFP_ATOMIC); - if (!n) { + if (!dummy) { + raw_spin_unlock(&b->lock); + dummy = kzalloc(sizeof(*dummy), GFP_KERNEL); + /* - * Allocation failed! Busy wait while other cpu - * handles async PF. + * Continue looping on allocation failure, eventually + * the async #PF will be handled and allocating a new + * node will be unnecessary. + */ + if (!dummy) + cpu_relax(); + + /* + * Recheck for async #PF completion before enqueueing + * the dummy token to avoid duplicate list entries. */ - raw_spin_unlock(&b->lock); - cpu_relax(); goto again; } - n->token = token; - n->cpu = smp_processor_id(); - init_swait_queue_head(&n->wq); - hlist_add_head(&n->link, &b->list); + dummy->token = token; + dummy->cpu = smp_processor_id(); + init_swait_queue_head(&dummy->wq); + hlist_add_head(&dummy->link, &b->list); + dummy = NULL; } else { apf_task_wake_one(n); } raw_spin_unlock(&b->lock); - return; + + /* A dummy token might be allocated and ultimately not used. */ + if (dummy) + kfree(dummy); } EXPORT_SYMBOL_GPL(kvm_async_pf_task_wake);
From: Paolo Bonzini pbonzini@redhat.com
commit baec4f5a018fe2d708fc1022330dba04b38b5fe3 upstream.
Commit ddd7ed842627 ("x86/kvm: Alloc dummy async #PF token outside of raw spinlock") leads to the following Smatch static checker warning:
arch/x86/kernel/kvm.c:212 kvm_async_pf_task_wake() warn: sleeping in atomic context
arch/x86/kernel/kvm.c 202 raw_spin_lock(&b->lock); 203 n = _find_apf_task(b, token); 204 if (!n) { 205 /* 206 * Async #PF not yet handled, add a dummy entry for the token. 207 * Allocating the token must be down outside of the raw lock 208 * as the allocator is preemptible on PREEMPT_RT kernels. 209 */ 210 if (!dummy) { 211 raw_spin_unlock(&b->lock); --> 212 dummy = kzalloc(sizeof(*dummy), GFP_KERNEL); ^^^^^^^^^^ Smatch thinks the caller has preempt disabled. The `smdb.py preempt kvm_async_pf_task_wake` output call tree is:
sysvec_kvm_asyncpf_interrupt() <- disables preempt -> __sysvec_kvm_asyncpf_interrupt() -> kvm_async_pf_task_wake()
The caller is this:
arch/x86/kernel/kvm.c 290 DEFINE_IDTENTRY_SYSVEC(sysvec_kvm_asyncpf_interrupt) 291 { 292 struct pt_regs *old_regs = set_irq_regs(regs); 293 u32 token; 294 295 ack_APIC_irq(); 296 297 inc_irq_stat(irq_hv_callback_count); 298 299 if (__this_cpu_read(apf_reason.enabled)) { 300 token = __this_cpu_read(apf_reason.token); 301 kvm_async_pf_task_wake(token); 302 __this_cpu_write(apf_reason.token, 0); 303 wrmsrl(MSR_KVM_ASYNC_PF_ACK, 1); 304 } 305 306 set_irq_regs(old_regs); 307 }
The DEFINE_IDTENTRY_SYSVEC() is a wrapper that calls this function from the call_on_irqstack_cond(). It's inside the call_on_irqstack_cond() where preempt is disabled (unless it's already disabled). The irq_enter/exit_rcu() functions disable/enable preempt.
Reported-by: Dan Carpenter dan.carpenter@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/kvm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -209,7 +209,7 @@ again: */ if (!dummy) { raw_spin_unlock(&b->lock); - dummy = kzalloc(sizeof(*dummy), GFP_KERNEL); + dummy = kzalloc(sizeof(*dummy), GFP_ATOMIC);
/* * Continue looping on allocation failure, eventually
From: Peter Zijlstra peterz@infradead.org
commit 989b5db215a2f22f89d730b607b071d964780f10 upstream.
Add support for CMPXCHG loops on userspace addresses. Provide both an "unsafe" version for tight loops that do their own uaccess begin/end, as well as a "safe" version for use cases where the CMPXCHG is not buried in a loop, e.g. KVM will resume the guest instead of looping when emulation of a guest atomic accesses fails the CMPXCHG.
Provide 8-byte versions for 32-bit kernels so that KVM can do CMPXCHG on guest PAE PTEs, which are accessed via userspace addresses.
Guard the asm_volatile_goto() variation with CC_HAS_ASM_GOTO_TIED_OUTPUT, the "+m" constraint fails on some compilers that otherwise support CC_HAS_ASM_GOTO_OUTPUT.
Cc: stable@vger.kernel.org Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Co-developed-by: Sean Christopherson seanjc@google.com Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20220202004945.2540433-3-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/uaccess.h | 142 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 142 insertions(+)
--- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -409,6 +409,103 @@ do { \
#endif // CONFIG_CC_HAS_ASM_GOTO_OUTPUT
+#ifdef CONFIG_CC_HAS_ASM_GOTO_TIED_OUTPUT +#define __try_cmpxchg_user_asm(itype, ltype, _ptr, _pold, _new, label) ({ \ + bool success; \ + __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \ + __typeof__(*(_ptr)) __old = *_old; \ + __typeof__(*(_ptr)) __new = (_new); \ + asm_volatile_goto("\n" \ + "1: " LOCK_PREFIX "cmpxchg"itype" %[new], %[ptr]\n"\ + _ASM_EXTABLE_UA(1b, %l[label]) \ + : CC_OUT(z) (success), \ + [ptr] "+m" (*_ptr), \ + [old] "+a" (__old) \ + : [new] ltype (__new) \ + : "memory" \ + : label); \ + if (unlikely(!success)) \ + *_old = __old; \ + likely(success); }) + +#ifdef CONFIG_X86_32 +#define __try_cmpxchg64_user_asm(_ptr, _pold, _new, label) ({ \ + bool success; \ + __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \ + __typeof__(*(_ptr)) __old = *_old; \ + __typeof__(*(_ptr)) __new = (_new); \ + asm_volatile_goto("\n" \ + "1: " LOCK_PREFIX "cmpxchg8b %[ptr]\n" \ + _ASM_EXTABLE_UA(1b, %l[label]) \ + : CC_OUT(z) (success), \ + "+A" (__old), \ + [ptr] "+m" (*_ptr) \ + : "b" ((u32)__new), \ + "c" ((u32)((u64)__new >> 32)) \ + : "memory" \ + : label); \ + if (unlikely(!success)) \ + *_old = __old; \ + likely(success); }) +#endif // CONFIG_X86_32 +#else // !CONFIG_CC_HAS_ASM_GOTO_TIED_OUTPUT +#define __try_cmpxchg_user_asm(itype, ltype, _ptr, _pold, _new, label) ({ \ + int __err = 0; \ + bool success; \ + __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \ + __typeof__(*(_ptr)) __old = *_old; \ + __typeof__(*(_ptr)) __new = (_new); \ + asm volatile("\n" \ + "1: " LOCK_PREFIX "cmpxchg"itype" %[new], %[ptr]\n"\ + CC_SET(z) \ + "2:\n" \ + _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_EFAULT_REG, \ + %[errout]) \ + : CC_OUT(z) (success), \ + [errout] "+r" (__err), \ + [ptr] "+m" (*_ptr), \ + [old] "+a" (__old) \ + : [new] ltype (__new) \ + : "memory", "cc"); \ + if (unlikely(__err)) \ + goto label; \ + if (unlikely(!success)) \ + *_old = __old; \ + likely(success); }) + +#ifdef CONFIG_X86_32 +/* + * Unlike the normal CMPXCHG, hardcode ECX for both success/fail and error. + * There are only six GPRs available and four (EAX, EBX, ECX, and EDX) are + * hardcoded by CMPXCHG8B, leaving only ESI and EDI. If the compiler uses + * both ESI and EDI for the memory operand, compilation will fail if the error + * is an input+output as there will be no register available for input. + */ +#define __try_cmpxchg64_user_asm(_ptr, _pold, _new, label) ({ \ + int __result; \ + __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \ + __typeof__(*(_ptr)) __old = *_old; \ + __typeof__(*(_ptr)) __new = (_new); \ + asm volatile("\n" \ + "1: " LOCK_PREFIX "cmpxchg8b %[ptr]\n" \ + "mov $0, %%ecx\n\t" \ + "setz %%cl\n" \ + "2:\n" \ + _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_EFAULT_REG, %%ecx) \ + : [result]"=c" (__result), \ + "+A" (__old), \ + [ptr] "+m" (*_ptr) \ + : "b" ((u32)__new), \ + "c" ((u32)((u64)__new >> 32)) \ + : "memory", "cc"); \ + if (unlikely(__result < 0)) \ + goto label; \ + if (unlikely(!__result)) \ + *_old = __old; \ + likely(__result); }) +#endif // CONFIG_X86_32 +#endif // CONFIG_CC_HAS_ASM_GOTO_TIED_OUTPUT + /* FIXME: this hack is definitely wrong -AK */ struct __large_struct { unsigned long buf[100]; }; #define __m(x) (*(struct __large_struct __user *)(x)) @@ -501,6 +598,51 @@ do { \ } while (0) #endif // CONFIG_CC_HAS_ASM_GOTO_OUTPUT
+extern void __try_cmpxchg_user_wrong_size(void); + +#ifndef CONFIG_X86_32 +#define __try_cmpxchg64_user_asm(_ptr, _oldp, _nval, _label) \ + __try_cmpxchg_user_asm("q", "r", (_ptr), (_oldp), (_nval), _label) +#endif + +/* + * Force the pointer to u<size> to match the size expected by the asm helper. + * clang/LLVM compiles all cases and only discards the unused paths after + * processing errors, which breaks i386 if the pointer is an 8-byte value. + */ +#define unsafe_try_cmpxchg_user(_ptr, _oldp, _nval, _label) ({ \ + bool __ret; \ + __chk_user_ptr(_ptr); \ + switch (sizeof(*(_ptr))) { \ + case 1: __ret = __try_cmpxchg_user_asm("b", "q", \ + (__force u8 *)(_ptr), (_oldp), \ + (_nval), _label); \ + break; \ + case 2: __ret = __try_cmpxchg_user_asm("w", "r", \ + (__force u16 *)(_ptr), (_oldp), \ + (_nval), _label); \ + break; \ + case 4: __ret = __try_cmpxchg_user_asm("l", "r", \ + (__force u32 *)(_ptr), (_oldp), \ + (_nval), _label); \ + break; \ + case 8: __ret = __try_cmpxchg64_user_asm((__force u64 *)(_ptr), (_oldp),\ + (_nval), _label); \ + break; \ + default: __try_cmpxchg_user_wrong_size(); \ + } \ + __ret; }) + +/* "Returns" 0 on success, 1 on failure, -EFAULT if the access faults. */ +#define __try_cmpxchg_user(_ptr, _oldp, _nval, _label) ({ \ + int __ret = -EFAULT; \ + __uaccess_begin_nospec(); \ + __ret = !unsafe_try_cmpxchg_user(_ptr, _oldp, _nval, _label); \ +_label: \ + __uaccess_end(); \ + __ret; \ + }) + /* * We want the unsafe accessors to always be inlined and use * the error labels - thus the macro games.
From: Sean Christopherson seanjc@google.com
commit f122dfe4476890d60b8c679128cd2259ec96a24c upstream.
Use the recently introduced __try_cmpxchg_user() to update guest PTE A/D bits instead of mapping the PTE into kernel address space. The VM_PFNMAP path is broken as it assumes that vm_pgoff is the base pfn of the mapped VMA range, which is conceptually wrong as vm_pgoff is the offset relative to the file and has nothing to do with the pfn. The horrific hack worked for the original use case (backing guest memory with /dev/mem), but leads to accessing "random" pfns for pretty much any other VM_PFNMAP case.
Fixes: bd53cb35a3e9 ("X86/KVM: Handle PFNs outside of kernel reach when touching GPTEs") Debugged-by: Tadeusz Struk tadeusz.struk@linaro.org Tested-by: Tadeusz Struk tadeusz.struk@linaro.org Reported-by: syzbot+6cde2282daa792c49ab8@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20220202004945.2540433-4-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/mmu/paging_tmpl.h | 38 +------------------------------------- 1 file changed, 1 insertion(+), 37 deletions(-)
--- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -144,42 +144,6 @@ static bool FNAME(is_rsvd_bits_set)(stru FNAME(is_bad_mt_xwr)(&mmu->guest_rsvd_check, gpte); }
-static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, - pt_element_t __user *ptep_user, unsigned index, - pt_element_t orig_pte, pt_element_t new_pte) -{ - signed char r; - - if (!user_access_begin(ptep_user, sizeof(pt_element_t))) - return -EFAULT; - -#ifdef CMPXCHG - asm volatile("1:" LOCK_PREFIX CMPXCHG " %[new], %[ptr]\n" - "setnz %b[r]\n" - "2:" - _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_EFAULT_REG, %k[r]) - : [ptr] "+m" (*ptep_user), - [old] "+a" (orig_pte), - [r] "=q" (r) - : [new] "r" (new_pte) - : "memory"); -#else - asm volatile("1:" LOCK_PREFIX "cmpxchg8b %[ptr]\n" - "setnz %b[r]\n" - "2:" - _ASM_EXTABLE_TYPE_REG(1b, 2b, EX_TYPE_EFAULT_REG, %k[r]) - : [ptr] "+m" (*ptep_user), - [old] "+A" (orig_pte), - [r] "=q" (r) - : [new_lo] "b" ((u32)new_pte), - [new_hi] "c" ((u32)(new_pte >> 32)) - : "memory"); -#endif - - user_access_end(); - return r; -} - static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, u64 *spte, u64 gpte) @@ -278,7 +242,7 @@ static int FNAME(update_accessed_dirty_b if (unlikely(!walker->pte_writable[level - 1])) continue;
- ret = FNAME(cmpxchg_gpte)(vcpu, mmu, ptep_user, index, orig_pte, pte); + ret = __try_cmpxchg_user(ptep_user, &orig_pte, pte, fault); if (ret) return ret;
From: Sean Christopherson seanjc@google.com
commit 1c2361f667f3648855ceae25f1332c18413fdb9f upstream.
Use the recently introduce __try_cmpxchg_user() to emulate atomic guest accesses via the associated userspace address instead of mapping the backing pfn into kernel address space. Using kvm_vcpu_map() is unsafe as it does not coordinate with KVM's mmu_notifier to ensure the hva=>pfn translation isn't changed/unmapped in the memremap() path, i.e. when there's no struct page and thus no elevated refcount.
Fixes: 42e35f8072c3 ("KVM/X86: Use kvm_vcpu_map in emulator_cmpxchg_emulated") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20220202004945.2540433-5-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/x86.c | 35 ++++++++++++++--------------------- 1 file changed, 14 insertions(+), 21 deletions(-)
--- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -7168,15 +7168,8 @@ static int emulator_write_emulated(struc exception, &write_emultor); }
-#define CMPXCHG_TYPE(t, ptr, old, new) \ - (cmpxchg((t *)(ptr), *(t *)(old), *(t *)(new)) == *(t *)(old)) - -#ifdef CONFIG_X86_64 -# define CMPXCHG64(ptr, old, new) CMPXCHG_TYPE(u64, ptr, old, new) -#else -# define CMPXCHG64(ptr, old, new) \ - (cmpxchg64((u64 *)(ptr), *(u64 *)(old), *(u64 *)(new)) == *(u64 *)(old)) -#endif +#define emulator_try_cmpxchg_user(t, ptr, old, new) \ + (__try_cmpxchg_user((t __user *)(ptr), (t *)(old), *(t *)(new), efault ## t))
static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt, unsigned long addr, @@ -7185,12 +7178,11 @@ static int emulator_cmpxchg_emulated(str unsigned int bytes, struct x86_exception *exception) { - struct kvm_host_map map; struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt); u64 page_line_mask; + unsigned long hva; gpa_t gpa; - char *kaddr; - bool exchanged; + int r;
/* guests cmpxchg8b have to be emulated atomically */ if (bytes > 8 || (bytes & (bytes - 1))) @@ -7214,31 +7206,32 @@ static int emulator_cmpxchg_emulated(str if (((gpa + bytes - 1) & page_line_mask) != (gpa & page_line_mask)) goto emul_write;
- if (kvm_vcpu_map(vcpu, gpa_to_gfn(gpa), &map)) + hva = kvm_vcpu_gfn_to_hva(vcpu, gpa_to_gfn(gpa)); + if (kvm_is_error_hva(addr)) goto emul_write;
- kaddr = map.hva + offset_in_page(gpa); + hva += offset_in_page(gpa);
switch (bytes) { case 1: - exchanged = CMPXCHG_TYPE(u8, kaddr, old, new); + r = emulator_try_cmpxchg_user(u8, hva, old, new); break; case 2: - exchanged = CMPXCHG_TYPE(u16, kaddr, old, new); + r = emulator_try_cmpxchg_user(u16, hva, old, new); break; case 4: - exchanged = CMPXCHG_TYPE(u32, kaddr, old, new); + r = emulator_try_cmpxchg_user(u32, hva, old, new); break; case 8: - exchanged = CMPXCHG64(kaddr, old, new); + r = emulator_try_cmpxchg_user(u64, hva, old, new); break; default: BUG(); }
- kvm_vcpu_unmap(vcpu, &map, true); - - if (!exchanged) + if (r < 0) + goto emul_write; + if (r) return X86EMUL_CMPXCHG_FAILED;
kvm_page_track_write(vcpu, gpa, new, bytes);
From: Maxim Levitsky mlevitsk@redhat.com
commit 33fbe6befa622c082f7d417896832856814bdde0 upstream.
This shows up as a TDP MMU leak when running nested. Non-working cmpxchg on L0 relies makes L1 install two different shadow pages under same spte, and one of them is leaked.
Fixes: 1c2361f667f36 ("KVM: x86: Use __try_cmpxchg_user() to emulate atomic accesses") Signed-off-by: Maxim Levitsky mlevitsk@redhat.com Message-Id: 20220512101420.306759-1-mlevitsk@redhat.com Reviewed-by: Sean Christopherson seanjc@google.com Reviewed-by: Vitaly Kuznetsov vkuznets@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/x86.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -7207,7 +7207,7 @@ static int emulator_cmpxchg_emulated(str goto emul_write;
hva = kvm_vcpu_gfn_to_hva(vcpu, gpa_to_gfn(gpa)); - if (kvm_is_error_hva(addr)) + if (kvm_is_error_hva(hva)) goto emul_write;
hva += offset_in_page(gpa);
From: Sean Christopherson seanjc@google.com
commit fee060cd52d69c114b62d1a2948ea9648b5131f9 upstream.
Whenever x86_decode_emulated_instruction() detects a breakpoint, it returns the value that kvm_vcpu_check_breakpoint() writes into its pass-by-reference second argument. Unfortunately this is completely bogus because the expected outcome of x86_decode_emulated_instruction is an EMULATION_* value.
Then, if kvm_vcpu_check_breakpoint() does "*r = 0" (corresponding to a KVM_EXIT_DEBUG userspace exit), it is misunderstood as EMULATION_OK and x86_emulate_instruction() is called without having decoded the instruction. This causes various havoc from running with a stale emulation context.
The fix is to move the call to kvm_vcpu_check_breakpoint() where it was before commit 4aa2691dcbd3 ("KVM: x86: Factor out x86 instruction emulation with decoding") introduced x86_decode_emulated_instruction(). The other caller of the function does not need breakpoint checks, because it is invoked as part of a vmexit and the processor has already checked those before executing the instruction that #GP'd.
This fixes CVE-2022-1852.
Reported-by: Qiuhao Li qiuhao@sysec.org Reported-by: Gaoning Pan pgn@zju.edu.cn Reported-by: Yongkang Jia kangel@zju.edu.cn Fixes: 4aa2691dcbd3 ("KVM: x86: Factor out x86 instruction emulation with decoding") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20220311032801.3467418-2-seanjc@google.com [Rewrote commit message according to Qiuhao's report, since a patch already existed to fix the bug. - Paolo] Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/x86.c | 31 +++++++++++++++++++------------ 1 file changed, 19 insertions(+), 12 deletions(-)
--- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8169,7 +8169,7 @@ int kvm_skip_emulated_instruction(struct } EXPORT_SYMBOL_GPL(kvm_skip_emulated_instruction);
-static bool kvm_vcpu_check_breakpoint(struct kvm_vcpu *vcpu, int *r) +static bool kvm_vcpu_check_code_breakpoint(struct kvm_vcpu *vcpu, int *r) { if (unlikely(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) && (vcpu->arch.guest_debug_dr7 & DR7_BP_EN_MASK)) { @@ -8238,25 +8238,23 @@ static bool is_vmware_backdoor_opcode(st }
/* - * Decode to be emulated instruction. Return EMULATION_OK if success. + * Decode an instruction for emulation. The caller is responsible for handling + * code breakpoints. Note, manually detecting code breakpoints is unnecessary + * (and wrong) when emulating on an intercepted fault-like exception[*], as + * code breakpoints have higher priority and thus have already been done by + * hardware. + * + * [*] Except #MC, which is higher priority, but KVM should never emulate in + * response to a machine check. */ int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type, void *insn, int insn_len) { - int r = EMULATION_OK; struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt; + int r;
init_emulate_ctxt(vcpu);
- /* - * We will reenter on the same instruction since we do not set - * complete_userspace_io. This does not handle watchpoints yet, - * those would be handled in the emulate_ops. - */ - if (!(emulation_type & EMULTYPE_SKIP) && - kvm_vcpu_check_breakpoint(vcpu, &r)) - return r; - r = x86_decode_insn(ctxt, insn, insn_len, emulation_type);
trace_kvm_emulate_insn_start(vcpu); @@ -8289,6 +8287,15 @@ int x86_emulate_instruction(struct kvm_v if (!(emulation_type & EMULTYPE_NO_DECODE)) { kvm_clear_exception_queue(vcpu);
+ /* + * Return immediately if RIP hits a code breakpoint, such #DBs + * are fault-like and are higher priority than any faults on + * the code fetch itself. + */ + if (!(emulation_type & EMULTYPE_SKIP) && + kvm_vcpu_check_code_breakpoint(vcpu, &r)) + return r; + r = x86_decode_emulated_instruction(vcpu, emulation_type, insn, insn_len); if (r != EMULATION_OK) {
From: Maxim Levitsky mlevitsk@redhat.com
commit 6fcee03df6a1a3101a77344be37bb85c6142d56c upstream.
This can cause various unexpected issues, since VM is partially destroyed at that point.
For example when AVIC is enabled, this causes avic_vcpu_load to access physical id page entry which is already freed by .vm_destroy.
Fixes: 8221c1370056 ("svm: Manage vcpu load/unload when enable AVIC") Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky mlevitsk@redhat.com Message-Id: 20220322172449.235575-2-mlevitsk@redhat.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/x86.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-)
--- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -11655,20 +11655,15 @@ static void kvm_unload_vcpu_mmu(struct k vcpu_put(vcpu); }
-static void kvm_free_vcpus(struct kvm *kvm) +static void kvm_unload_vcpu_mmus(struct kvm *kvm) { unsigned long i; struct kvm_vcpu *vcpu;
- /* - * Unpin any mmu pages first. - */ kvm_for_each_vcpu(i, vcpu, kvm) { kvm_clear_async_pf_completion_queue(vcpu); kvm_unload_vcpu_mmu(vcpu); } - - kvm_destroy_vcpus(kvm); }
void kvm_arch_sync_events(struct kvm *kvm) @@ -11774,11 +11769,12 @@ void kvm_arch_destroy_vm(struct kvm *kvm __x86_set_memory_region(kvm, TSS_PRIVATE_MEMSLOT, 0, 0); mutex_unlock(&kvm->slots_lock); } + kvm_unload_vcpu_mmus(kvm); static_call_cond(kvm_x86_vm_destroy)(kvm); kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1)); kvm_pic_destroy(kvm); kvm_ioapic_destroy(kvm); - kvm_free_vcpus(kvm); + kvm_destroy_vcpus(kvm); kvfree(rcu_dereference_check(kvm->arch.apic_map, 1)); kfree(srcu_dereference_check(kvm->arch.pmu_event_filter, &kvm->srcu, 1)); kvm_mmu_uninit_vm(kvm);
From: Yanfei Xu yanfei.xu@intel.com
commit ffd1925a596ce68bed7d81c61cb64bc35f788a9d upstream.
When kernel handles the vm-exit caused by external interrupts and NMI, it always sets kvm_intr_type to tell if it's dealing an IRQ or NMI. For the PMI scenario, it could be IRQ or NMI.
However, intel_pt PMIs are only generated for HARDWARE perf events, and HARDWARE events are always configured to generate NMIs. Use kvm_handling_nmi_from_guest() to precisely identify if the intel_pt PMI came from the guest; this avoids false positives if an intel_pt PMI/NMI arrives while the host is handling an unrelated IRQ VM-Exit.
Fixes: db215756ae59 ("KVM: x86: More precisely identify NMI from guest when handling PMI") Signed-off-by: Yanfei Xu yanfei.xu@intel.com Message-Id: 20220523140821.1345605-1-yanfei.xu@intel.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/vmx/vmx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7858,7 +7858,7 @@ static unsigned int vmx_handle_intel_pt_ struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
/* '0' on failure so that the !PT case can use a RET0 static call. */ - if (!kvm_arch_pmi_in_guest(vcpu)) + if (!vcpu || !kvm_handling_nmi_from_guest(vcpu)) return 0;
kvm_make_request(KVM_REQ_PMI, vcpu);
From: Sean Christopherson seanjc@google.com
commit 45846661d10422ce9e22da21f8277540b29eca22 upstream.
Remove WARNs that sanity check that KVM never lets a triple fault for L2 escape and incorrectly end up in L1. In normal operation, the sanity check is perfectly valid, but it incorrectly assumes that it's impossible for userspace to induce KVM_REQ_TRIPLE_FAULT without bouncing through KVM_RUN (which guarantees kvm_check_nested_state() will see and handle the triple fault).
The WARN can currently be triggered if userspace injects a machine check while L2 is active and CR4.MCE=0. And a future fix to allow save/restore of KVM_REQ_TRIPLE_FAULT, e.g. so that a synthesized triple fault isn't lost on migration, will make it trivially easy for userspace to trigger the WARN.
Clearing KVM_REQ_TRIPLE_FAULT when forcibly leaving guest mode is tempting, but wrong, especially if/when the request is saved/restored, e.g. if userspace restores events (including a triple fault) and then restores nested state (which may forcibly leave guest mode). Ignoring the fact that KVM doesn't currently provide the necessary APIs, it's userspace's responsibility to manage pending events during save/restore.
------------[ cut here ]------------ WARNING: CPU: 7 PID: 1399 at arch/x86/kvm/vmx/nested.c:4522 nested_vmx_vmexit+0x7fe/0xd90 [kvm_intel] Modules linked in: kvm_intel kvm irqbypass CPU: 7 PID: 1399 Comm: state_test Not tainted 5.17.0-rc3+ #808 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:nested_vmx_vmexit+0x7fe/0xd90 [kvm_intel] Call Trace: <TASK> vmx_leave_nested+0x30/0x40 [kvm_intel] vmx_set_nested_state+0xca/0x3e0 [kvm_intel] kvm_arch_vcpu_ioctl+0xf49/0x13e0 [kvm] kvm_vcpu_ioctl+0x4b9/0x660 [kvm] __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> ---[ end trace 0000000000000000 ]---
Fixes: cb6a32c2b877 ("KVM: x86: Handle triple fault in L2 without killing L1") Cc: stable@vger.kernel.org Cc: Chenyi Qiang chenyi.qiang@intel.com Signed-off-by: Sean Christopherson seanjc@google.com Message-Id: 20220407002315.78092-2-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm/nested.c | 3 --- arch/x86/kvm/vmx/nested.c | 3 --- 2 files changed, 6 deletions(-)
--- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -790,9 +790,6 @@ int nested_svm_vmexit(struct vcpu_svm *s struct kvm_host_map map; int rc;
- /* Triple faults in L2 should never escape. */ - WARN_ON_ONCE(kvm_check_request(KVM_REQ_TRIPLE_FAULT, vcpu)); - rc = kvm_vcpu_map(vcpu, gpa_to_gfn(svm->nested.vmcb12_gpa), &map); if (rc) { if (rc == -EINVAL) --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -4518,9 +4518,6 @@ void nested_vmx_vmexit(struct kvm_vcpu * /* trying to cancel vmlaunch/vmresume is a bug */ WARN_ON_ONCE(vmx->nested.nested_run_pending);
- /* Similarly, triple faults in L2 should never escape. */ - WARN_ON_ONCE(kvm_check_request(KVM_REQ_TRIPLE_FAULT, vcpu)); - if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) { /* * KVM_REQ_GET_NESTED_STATE_PAGES is also used to map
From: Hou Wenlong houwenlong.hwl@antgroup.com
commit 8d5678a76689acbf91245a3791fe853ab773090f upstream.
Before Commit c3e5e415bc1e6 ("KVM: X86: Change kvm_sync_page() to return true when remote flush is needed"), the return value of kvm_sync_page() indicates whether the page is synced, and kvm_mmu_get_page() would rebuild page when the sync fails. But now, kvm_sync_page() returns false when the page is synced and no tlb flushing is required, which leads to rebuild page in kvm_mmu_get_page(). So return the return value of mmu->sync_page() directly and check it in kvm_mmu_get_page(). If the sync fails, the page will be zapped and the invalid_list is not empty, so set flush as true is accepted in mmu_sync_children().
Cc: stable@vger.kernel.org Fixes: c3e5e415bc1e6 ("KVM: X86: Change kvm_sync_page() to return true when remote flush is needed") Signed-off-by: Hou Wenlong houwenlong.hwl@antgroup.com Acked-by: Lai Jiangshan jiangshanlai@gmail.com Message-Id: 0dabeeb789f57b0d793f85d073893063e692032d.1647336064.git.houwenlong.hwl@antgroup.com [mmu_sync_children should not flush if the page is zapped. - Paolo] Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/mmu/mmu.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-)
--- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1894,17 +1894,14 @@ static void kvm_mmu_commit_zap_page(stru &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)]) \ if ((_sp)->gfn != (_gfn) || (_sp)->role.direct) {} else
-static bool kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, +static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, struct list_head *invalid_list) { int ret = vcpu->arch.mmu->sync_page(vcpu, sp);
- if (ret < 0) { + if (ret < 0) kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list); - return false; - } - - return !!ret; + return ret; }
static bool kvm_mmu_remote_flush_or_zap(struct kvm *kvm, @@ -2033,7 +2030,7 @@ static int mmu_sync_children(struct kvm_
for_each_sp(pages, sp, parents, i) { kvm_unlink_unsync_page(vcpu->kvm, sp); - flush |= kvm_sync_page(vcpu, sp, &invalid_list); + flush |= kvm_sync_page(vcpu, sp, &invalid_list) > 0; mmu_pages_clear_parents(&parents); } if (need_resched() || rwlock_needbreak(&vcpu->kvm->mmu_lock)) { @@ -2074,6 +2071,7 @@ static struct kvm_mmu_page *kvm_mmu_get_ struct hlist_head *sp_list; unsigned quadrant; struct kvm_mmu_page *sp; + int ret; int collisions = 0; LIST_HEAD(invalid_list);
@@ -2126,11 +2124,13 @@ static struct kvm_mmu_page *kvm_mmu_get_ * If the sync fails, the page is zapped. If so, break * in order to rebuild it. */ - if (!kvm_sync_page(vcpu, sp, &invalid_list)) + ret = kvm_sync_page(vcpu, sp, &invalid_list); + if (ret < 0) break;
WARN_ON(!list_empty(&invalid_list)); - kvm_flush_remote_tlbs(vcpu->kvm); + if (ret > 0) + kvm_flush_remote_tlbs(vcpu->kvm); }
__clear_sp_write_flooding_count(sp);
From: Ashish Kalra ashish.kalra@amd.com
commit d22d2474e3953996f03528b84b7f52cc26a39403 upstream.
For some sev ioctl interfaces, the length parameter that is passed maybe less than or equal to SEV_FW_BLOB_MAX_SIZE, but larger than the data that PSP firmware returns. In this case, kmalloc will allocate memory that is the size of the input rather than the size of the data. Since PSP firmware doesn't fully overwrite the allocated buffer, these sev ioctl interface may return uninitialized kernel slab memory.
Reported-by: Andy Nguyen theflow@google.com Suggested-by: David Rientjes rientjes@google.com Suggested-by: Peter Gonda pgonda@google.com Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Cc: linux-kernel@vger.kernel.org Fixes: eaf78265a4ab3 ("KVM: SVM: Move SEV code to separate file") Fixes: 2c07ded06427d ("KVM: SVM: add support for SEV attestation command") Fixes: 4cfdd47d6d95a ("KVM: SVM: Add KVM_SEV SEND_START command") Fixes: d3d1af85e2c75 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command") Fixes: eba04b20e4861 ("KVM: x86: Account a variety of miscellaneous allocations") Signed-off-by: Ashish Kalra ashish.kalra@amd.com Reviewed-by: Peter Gonda pgonda@google.com Message-Id: 20220516154310.3685678-1-Ashish.Kalra@amd.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/svm/sev.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
--- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -684,7 +684,7 @@ static int sev_launch_measure(struct kvm if (params.len > SEV_FW_BLOB_MAX_SIZE) return -EINVAL;
- blob = kmalloc(params.len, GFP_KERNEL_ACCOUNT); + blob = kzalloc(params.len, GFP_KERNEL_ACCOUNT); if (!blob) return -ENOMEM;
@@ -804,7 +804,7 @@ static int __sev_dbg_decrypt_user(struct if (!IS_ALIGNED(dst_paddr, 16) || !IS_ALIGNED(paddr, 16) || !IS_ALIGNED(size, 16)) { - tpage = (void *)alloc_page(GFP_KERNEL); + tpage = (void *)alloc_page(GFP_KERNEL | __GFP_ZERO); if (!tpage) return -ENOMEM;
@@ -1090,7 +1090,7 @@ static int sev_get_attestation_report(st if (params.len > SEV_FW_BLOB_MAX_SIZE) return -EINVAL;
- blob = kmalloc(params.len, GFP_KERNEL_ACCOUNT); + blob = kzalloc(params.len, GFP_KERNEL_ACCOUNT); if (!blob) return -ENOMEM;
@@ -1172,7 +1172,7 @@ static int sev_send_start(struct kvm *kv return -EINVAL;
/* allocate the memory to hold the session data blob */ - session_data = kmalloc(params.session_len, GFP_KERNEL_ACCOUNT); + session_data = kzalloc(params.session_len, GFP_KERNEL_ACCOUNT); if (!session_data) return -ENOMEM;
@@ -1296,11 +1296,11 @@ static int sev_send_update_data(struct k
/* allocate memory for header and transport buffer */ ret = -ENOMEM; - hdr = kmalloc(params.hdr_len, GFP_KERNEL_ACCOUNT); + hdr = kzalloc(params.hdr_len, GFP_KERNEL_ACCOUNT); if (!hdr) goto e_unpin;
- trans_data = kmalloc(params.trans_len, GFP_KERNEL_ACCOUNT); + trans_data = kzalloc(params.trans_len, GFP_KERNEL_ACCOUNT); if (!trans_data) goto e_free_hdr;
From: Fabio Estevam festevam@denx.de
commit 4ee4cdad368a26de3967f2975806a9ee2fa245df upstream.
Since commit 358ba762d9f1 ("crypto: caam - enable prediction resistance in HRWNG") the following CAAM errors can be seen on i.MX6SX:
caam_jr 2101000.jr: 20003c5b: CCB: desc idx 60: RNG: Hardware error hwrng: no data available
This error is due to an incorrect entropy delay for i.MX6SX.
Fix it by increasing the minimum entropy delay for i.MX6SX as done in U-Boot: https://patchwork.ozlabs.org/project/uboot/patch/20220415111049.2565744-1-ga...
As explained in the U-Boot patch:
"RNG self tests are run to determine the correct entropy delay. Such tests are executed with different voltages and temperatures to identify the worst case value for the entropy delay. For i.MX6SX, it was determined that after adding a margin value of 1000 the minimum entropy delay should be at least 12000."
Cc: stable@vger.kernel.org Fixes: 358ba762d9f1 ("crypto: caam - enable prediction resistance in HRWNG") Signed-off-by: Fabio Estevam festevam@denx.de Reviewed-by: Horia Geantă horia.geanta@nxp.com Reviewed-by: Vabhav Sharma vabhav.sharma@nxp.com Reviewed-by: Gaurav Jain gaurav.jain@nxp.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/crypto/caam/ctrl.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)
--- a/drivers/crypto/caam/ctrl.c +++ b/drivers/crypto/caam/ctrl.c @@ -609,6 +609,13 @@ static bool check_version(struct fsl_mc_ } #endif
+static bool needs_entropy_delay_adjustment(void) +{ + if (of_machine_is_compatible("fsl,imx6sx")) + return true; + return false; +} + /* Probe routine for CAAM top (controller) level */ static int caam_probe(struct platform_device *pdev) { @@ -855,6 +862,8 @@ static int caam_probe(struct platform_de * Also, if a handle was instantiated, do not change * the TRNG parameters. */ + if (needs_entropy_delay_adjustment()) + ent_delay = 12000; if (!(ctrlpriv->rng4_sh_init || inst_handles)) { dev_info(dev, "Entropy delay = %u\n", @@ -871,6 +880,15 @@ static int caam_probe(struct platform_de */ ret = instantiate_rng(dev, inst_handles, gen_sk); + /* + * Entropy delay is determined via TRNG characterization. + * TRNG characterization is run across different voltages + * and temperatures. + * If worst case value for ent_dly is identified, + * the loop can be skipped for that platform. + */ + if (needs_entropy_delay_adjustment()) + break; if (ret == -EAGAIN) /* * if here, the loop will rerun,
From: Vitaly Chikunov vt@altlinux.org
commit 7cc7ab73f83ee6d50dc9536bc3355495d8600fad upstream.
Correctly compare values that shall be greater-or-equal and not just greater.
Fixes: 0d7a78643f69 ("crypto: ecrdsa - add EC-RDSA (GOST 34.10) algorithm") Cc: stable@vger.kernel.org Signed-off-by: Vitaly Chikunov vt@altlinux.org Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- crypto/ecrdsa.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/crypto/ecrdsa.c +++ b/crypto/ecrdsa.c @@ -113,15 +113,15 @@ static int ecrdsa_verify(struct akcipher
/* Step 1: verify that 0 < r < q, 0 < s < q */ if (vli_is_zero(r, ndigits) || - vli_cmp(r, ctx->curve->n, ndigits) == 1 || + vli_cmp(r, ctx->curve->n, ndigits) >= 0 || vli_is_zero(s, ndigits) || - vli_cmp(s, ctx->curve->n, ndigits) == 1) + vli_cmp(s, ctx->curve->n, ndigits) >= 0) return -EKEYREJECTED;
/* Step 2: calculate hash (h) of the message (passed as input) */ /* Step 3: calculate e = h \mod q */ vli_from_le64(e, digest, ndigits); - if (vli_cmp(e, ctx->curve->n, ndigits) == 1) + if (vli_cmp(e, ctx->curve->n, ndigits) >= 0) vli_sub(e, e, ctx->curve->n, ndigits); if (vli_is_zero(e, ndigits)) e[0] = 1; @@ -137,7 +137,7 @@ static int ecrdsa_verify(struct akcipher /* Step 6: calculate point C = z_1P + z_2Q, and R = x_c \mod q */ ecc_point_mult_shamir(&cc, z1, &ctx->curve->g, z2, &ctx->pub_key, ctx->curve); - if (vli_cmp(cc.x, ctx->curve->n, ndigits) == 1) + if (vli_cmp(cc.x, ctx->curve->n, ndigits) >= 0) vli_sub(cc.x, cc.x, ctx->curve->n, ndigits);
/* Step 7: if R == r signature is valid */
From: Sultan Alsawaf sultan@kerneltoast.com
commit 2505a981114dcb715f8977b8433f7540854851d8 upstream.
The asynchronous zspage free worker tries to lock a zspage's entire page list without defending against page migration. Since pages which haven't yet been locked can concurrently migrate off the zspage page list while lock_zspage() churns away, lock_zspage() can suffer from a few different lethal races.
It can lock a page which no longer belongs to the zspage and unsafely dereference page_private(), it can unsafely dereference a torn pointer to the next page (since there's a data race), and it can observe a spurious NULL pointer to the next page and thus not lock all of the zspage's pages (since a single page migration will reconstruct the entire page list, and create_page_chain() unconditionally zeroes out each list pointer in the process).
Fix the races by using migrate_read_lock() in lock_zspage() to synchronize with page migration.
Link: https://lkml.kernel.org/r/20220509024703.243847-1-sultan@kerneltoast.com Fixes: 77ff465799c602 ("zsmalloc: zs_page_migrate: skip unnecessary loops but not return -EBUSY if zspage is not inuse") Signed-off-by: Sultan Alsawaf sultan@kerneltoast.com Acked-by: Minchan Kim minchan@kernel.org Cc: Nitin Gupta ngupta@vflare.org Cc: Sergey Senozhatsky senozhatsky@chromium.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/zsmalloc.c | 37 +++++++++++++++++++++++++++++++++---- 1 file changed, 33 insertions(+), 4 deletions(-)
--- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -1718,11 +1718,40 @@ static enum fullness_group putback_zspag */ static void lock_zspage(struct zspage *zspage) { - struct page *page = get_first_page(zspage); + struct page *curr_page, *page;
- do { - lock_page(page); - } while ((page = get_next_page(page)) != NULL); + /* + * Pages we haven't locked yet can be migrated off the list while we're + * trying to lock them, so we need to be careful and only attempt to + * lock each page under migrate_read_lock(). Otherwise, the page we lock + * may no longer belong to the zspage. This means that we may wait for + * the wrong page to unlock, so we must take a reference to the page + * prior to waiting for it to unlock outside migrate_read_lock(). + */ + while (1) { + migrate_read_lock(zspage); + page = get_first_page(zspage); + if (trylock_page(page)) + break; + get_page(page); + migrate_read_unlock(zspage); + wait_on_page_locked(page); + put_page(page); + } + + curr_page = page; + while ((page = get_next_page(curr_page))) { + if (trylock_page(page)) { + curr_page = page; + } else { + get_page(page); + migrate_read_unlock(zspage); + wait_on_page_locked(page); + put_page(page); + migrate_read_lock(zspage); + } + } + migrate_read_unlock(zspage); }
static int zs_init_fs_context(struct fs_context *fc)
From: Akira Yokosawa akiyks@gmail.com
commit 5b759db44195bb779828a188bad6b745c18dcd55 upstream.
EXPORT_SYMBOL of do_exec() was removed in v5.17. Unfortunately, kernel modules from klitmus7 7.56 have do_exec() at the end of each kthread.
herdtools7 7.56.1 has addressed the issue.
Update the compatibility table accordingly.
Signed-off-by: Akira Yokosawa akiyks@gmail.com Cc: Luc Maranget luc.maranget@inria.fr Cc: Jade Alglave j.alglave@ucl.ac.uk Cc: stable@vger.kernel.org # v5.17+ Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/memory-model/README | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/memory-model/README b/tools/memory-model/README index 9edd402704c4..dab38904206a 100644 --- a/tools/memory-model/README +++ b/tools/memory-model/README @@ -54,7 +54,8 @@ klitmus7 Compatibility Table -- 4.14 7.48 -- 4.15 -- 4.19 7.49 -- 4.20 -- 5.5 7.54 -- - 5.6 -- 7.56 -- + 5.6 -- 5.16 7.56 -- + 5.17 -- 7.56.1 -- ============ ==========
From: Takashi Iwai tiwai@suse.de
commit 5ce0b06ae5e69e23142e73c5c3c0260e9f2ccb4b upstream.
Maris reported that TEAC UD-501 (0644:8043) doesn't work with the typical "clock source 41 is not valid, cannot use" errors on the recent kernels. The currently known workaround so far is to restore (partially) what we've done unconditionally at the clock setup; namely, re-setup the USB interface immediately after the clock is changed. This patch re-introduces the behavior conditionally for TEAC devices.
Further notes: - The USB interface shall be set later in snd_usb_endpoint_configure(), but this seems to be too late. - Even calling usb_set_interface() right after sne_usb_init_sample_rate() doesn't help; so this must be related with the clock validation, too. - The device may still spew the "clock source 41 is not valid" error at the first clock setup. This seems happening at the very first try of clock setup, but it disappears at later attempts. The error is likely harmless because the driver retries the clock setup (such an error is more or less expected on some devices).
Fixes: bf6313a0ff76 ("ALSA: usb-audio: Refactor endpoint management") Reported-and-tested-by: Maris Abele maris7abele@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220521064627.29292-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/clock.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/sound/usb/clock.c +++ b/sound/usb/clock.c @@ -572,6 +572,13 @@ static int set_sample_rate_v2v3(struct s /* continue processing */ }
+ /* FIXME - TEAC devices require the immediate interface setup */ + if (rate != prev_rate && USB_ID_VENDOR(chip->usb_id) == 0x0644) { + usb_set_interface(chip->dev, fmt->iface, fmt->altsetting); + if (chip->quirk_flags & QUIRK_FLAG_IFACE_DELAY) + msleep(50); + } + validation: /* validate clock after rate change */ if (!uac_clock_source_is_valid(chip, fmt, clock))
From: Takashi Iwai tiwai@suse.de
commit 7b0efea4baf02f5e2f89e5f9b75ef891571b45f1 upstream.
The quirk entry for Focusrite Saffire 6 had no proper ep_idx for the capture endpoint, and this confused the driver, resulting in the broken sound. This patch adds the missing ep_idx in the entry.
While we are at it, a couple of other entries (for Digidesign MBox and MOTU MicroBook II) seem to have the same problem, and those are covered as well.
Fixes: bf6313a0ff76 ("ALSA: usb-audio: Refactor endpoint management") Reported-by: André Kapelrud a.kapelrud@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220521065325.426-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/quirks-table.h | 3 +++ 1 file changed, 3 insertions(+)
--- a/sound/usb/quirks-table.h +++ b/sound/usb/quirks-table.h @@ -2672,6 +2672,7 @@ YAMAHA_DEVICE(0x7010, "UB99"), .altset_idx = 1, .attributes = 0, .endpoint = 0x82, + .ep_idx = 1, .ep_attr = USB_ENDPOINT_XFER_ISOC, .datainterval = 1, .maxpacksize = 0x0126, @@ -2875,6 +2876,7 @@ YAMAHA_DEVICE(0x7010, "UB99"), .altset_idx = 1, .attributes = 0x4, .endpoint = 0x81, + .ep_idx = 1, .ep_attr = USB_ENDPOINT_XFER_ISOC | USB_ENDPOINT_SYNC_ASYNC, .maxpacksize = 0x130, @@ -3391,6 +3393,7 @@ YAMAHA_DEVICE(0x7010, "UB99"), .altset_idx = 1, .attributes = 0, .endpoint = 0x03, + .ep_idx = 1, .rates = SNDRV_PCM_RATE_96000, .ep_attr = USB_ENDPOINT_XFER_ISOC | USB_ENDPOINT_SYNC_ASYNC,
From: Craig McLure craig@mclure.net
commit 0e85a22d01dfe9ad9a9d9e87cd4a88acce1aad65 upstream.
Devices such as the TC-Helicon GoXLR require the sync endpoint to be configured in advance of the data endpoint in order for sound output to work.
This patch simply changes the ordering of EP configuration to resolve this.
Fixes: bf6313a0ff76 ("ALSA: usb-audio: Refactor endpoint management") BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215079 Signed-off-by: Craig McLure craig@mclure.net Reviewed-by: Jaroslav Kysela perex@perex.cz Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220524062115.25968-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/pcm.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
--- a/sound/usb/pcm.c +++ b/sound/usb/pcm.c @@ -439,16 +439,21 @@ static int configure_endpoints(struct sn /* stop any running stream beforehand */ if (stop_endpoints(subs, false)) sync_pending_stops(subs); + if (subs->sync_endpoint) { + err = snd_usb_endpoint_configure(chip, subs->sync_endpoint); + if (err < 0) + return err; + } err = snd_usb_endpoint_configure(chip, subs->data_endpoint); if (err < 0) return err; snd_usb_set_format_quirk(subs, subs->cur_audiofmt); - } - - if (subs->sync_endpoint) { - err = snd_usb_endpoint_configure(chip, subs->sync_endpoint); - if (err < 0) - return err; + } else { + if (subs->sync_endpoint) { + err = snd_usb_endpoint_configure(chip, subs->sync_endpoint); + if (err < 0) + return err; + } }
return 0;
From: Steven Rostedt rostedt@goodmis.org
commit 72ef98445aca568a81c2da050532500a8345ad3a upstream.
While looking at a crash report on a timer list being corrupted, which usually happens when a timer is freed while still active. This is commonly triggered by code calling del_timer() instead of del_timer_sync() just before freeing.
One possible culprit is the hci_qca driver, which does exactly that.
Eric mentioned that wake_retrans_timer could be rearmed via the work queue, so also move the destruction of the work queue before del_timer_sync().
Cc: Eric Dumazet eric.dumazet@gmail.com Cc: stable@vger.kernel.org Fixes: 0ff252c1976da ("Bluetooth: hciuart: Add support QCA chipset for UART") Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/bluetooth/hci_qca.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/bluetooth/hci_qca.c +++ b/drivers/bluetooth/hci_qca.c @@ -696,9 +696,9 @@ static int qca_close(struct hci_uart *hu skb_queue_purge(&qca->tx_wait_q); skb_queue_purge(&qca->txq); skb_queue_purge(&qca->rx_memdump_q); - del_timer(&qca->tx_idle_timer); - del_timer(&qca->wake_retrans_timer); destroy_workqueue(qca->workqueue); + del_timer_sync(&qca->tx_idle_timer); + del_timer_sync(&qca->wake_retrans_timer); qca->hu = NULL;
kfree_skb(qca->rx_skb);
From: Jonathan Bakker xc-racer2@live.ca
commit 3f5e3d3a8b895c8a11da8b0063ba2022dd9e2045 upstream.
Correct the name of the bluetooth interrupt from host-wake to host-wakeup.
Fixes: 1c65b6184441b ("ARM: dts: s5pv210: Correct BCM4329 bluetooth node") Cc: stable@vger.kernel.org Signed-off-by: Jonathan Bakker xc-racer2@live.ca Link: https://lore.kernel.org/r/CY4PR04MB0567495CFCBDC8D408D44199CB1C9@CY4PR04MB05... Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/boot/dts/s5pv210-aries.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm/boot/dts/s5pv210-aries.dtsi +++ b/arch/arm/boot/dts/s5pv210-aries.dtsi @@ -895,7 +895,7 @@ device-wakeup-gpios = <&gpg3 4 GPIO_ACTIVE_HIGH>; interrupt-parent = <&gph2>; interrupts = <5 IRQ_TYPE_LEVEL_HIGH>; - interrupt-names = "host-wake"; + interrupt-names = "host-wakeup"; }; };
From: Dan Carpenter dan.carpenter@oracle.com
commit d3f2a14b8906df913cb04a706367b012db94a6e8 upstream.
The "r" variable shadows an earlier "r" that has function scope. It means that we accidentally return success instead of an error code. Smatch has a warning for this:
drivers/md/dm-integrity.c:4503 dm_integrity_ctr() warn: missing error code 'r'
Fixes: 7eada909bfd7 ("dm: add integrity target") Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Reviewed-by: Mikulas Patocka mpatocka@redhat.com Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-integrity.c | 2 -- 1 file changed, 2 deletions(-)
--- a/drivers/md/dm-integrity.c +++ b/drivers/md/dm-integrity.c @@ -4495,8 +4495,6 @@ try_smaller_buffer: }
if (should_write_sb) { - int r; - init_journal(ic, 0, ic->journal_sections, 0); r = dm_integrity_failed(ic); if (unlikely(r)) {
From: Mikulas Patocka mpatocka@redhat.com
commit 567dd8f34560fa221a6343729474536aa7ede4fd upstream.
The device mapper dm-crypt target is using scnprintf("%02x", cc->key[i]) to report the current key to userspace. However, this is not a constant-time operation and it may leak information about the key via timing, via cache access patterns or via the branch predictor.
Change dm-crypt's key printing to use "%c" instead of "%02x". Also introduce hex2asc() that carefully avoids any branching or memory accesses when converting a number in the range 0 ... 15 to an ascii character.
Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka mpatocka@redhat.com Tested-by: Milan Broz gmazyland@gmail.com Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-crypt.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
--- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c @@ -3449,6 +3449,11 @@ static int crypt_map(struct dm_target *t return DM_MAPIO_SUBMITTED; }
+static char hex2asc(unsigned char c) +{ + return c + '0' + ((unsigned)(9 - c) >> 4 & 0x27); +} + static void crypt_status(struct dm_target *ti, status_type_t type, unsigned status_flags, char *result, unsigned maxlen) { @@ -3467,9 +3472,12 @@ static void crypt_status(struct dm_targe if (cc->key_size > 0) { if (cc->key_string) DMEMIT(":%u:%s", cc->key_size, cc->key_string); - else - for (i = 0; i < cc->key_size; i++) - DMEMIT("%02x", cc->key[i]); + else { + for (i = 0; i < cc->key_size; i++) { + DMEMIT("%c%c", hex2asc(cc->key[i] >> 4), + hex2asc(cc->key[i] & 0xf)); + } + } } else DMEMIT("-");
From: Mikulas Patocka mpatocka@redhat.com
commit bfe2b0146c4d0230b68f5c71a64380ff8d361f8b upstream.
dm-stats can be used with a very large number of entries (it is only limited by 1/4 of total system memory), so add rescheduling points to the loops that iterate over the entries.
Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka mpatocka@redhat.com Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-stats.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/drivers/md/dm-stats.c +++ b/drivers/md/dm-stats.c @@ -225,6 +225,7 @@ void dm_stats_cleanup(struct dm_stats *s atomic_read(&shared->in_flight[READ]), atomic_read(&shared->in_flight[WRITE])); } + cond_resched(); } dm_stat_free(&s->rcu_head); } @@ -330,6 +331,7 @@ static int dm_stats_create(struct dm_sta for (ni = 0; ni < n_entries; ni++) { atomic_set(&s->stat_shared[ni].in_flight[READ], 0); atomic_set(&s->stat_shared[ni].in_flight[WRITE], 0); + cond_resched(); }
if (s->n_histogram_entries) { @@ -342,6 +344,7 @@ static int dm_stats_create(struct dm_sta for (ni = 0; ni < n_entries; ni++) { s->stat_shared[ni].tmp.histogram = hi; hi += s->n_histogram_entries + 1; + cond_resched(); } }
@@ -362,6 +365,7 @@ static int dm_stats_create(struct dm_sta for (ni = 0; ni < n_entries; ni++) { p[ni].histogram = hi; hi += s->n_histogram_entries + 1; + cond_resched(); } } } @@ -497,6 +501,7 @@ static int dm_stats_list(struct dm_stats } DMEMIT("\n"); } + cond_resched(); } mutex_unlock(&stats->mutex);
@@ -774,6 +779,7 @@ static void __dm_stat_clear(struct dm_st local_irq_enable(); } } + cond_resched(); } }
@@ -889,6 +895,8 @@ static int dm_stats_print(struct dm_stat
if (unlikely(sz + 1 >= maxlen)) goto buffer_overflow; + + cond_resched(); }
if (clear)
From: Sarthak Kukreti sarthakkukreti@google.com
commit 4caae58406f8ceb741603eee460d79bacca9b1b5 upstream.
The device-mapper framework provides a mechanism to mark targets as immutable (and hence fail table reloads that try to change the target type). Add the DM_TARGET_IMMUTABLE flag to the dm-verity target's feature flags to prevent switching the verity target with a different target type.
Fixes: a4ffc152198e ("dm: add verity target") Cc: stable@vger.kernel.org Signed-off-by: Sarthak Kukreti sarthakkukreti@google.com Reviewed-by: Kees Cook keescook@chromium.org Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-verity-target.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/md/dm-verity-target.c +++ b/drivers/md/dm-verity-target.c @@ -1312,6 +1312,7 @@ bad:
static struct target_type verity_target = { .name = "verity", + .features = DM_TARGET_IMMUTABLE, .version = {1, 8, 0}, .module = THIS_MODULE, .ctr = verity_ctr,
From: Mariusz Tkaczyk mariusz.tkaczyk@linux.intel.com
commit 57668f0a4cc4083a120cc8c517ca0055c4543b59 upstream.
Raid456 module had allowed to achieve failed state. It was fixed by fb73b357fb9 ("raid5: block failing device if raid will be failed"). This fix introduces a bug, now if raid5 fails during IO, it may result with a hung task without completion. Faulty flag on the device is necessary to process all requests and is checked many times, mainly in analyze_stripe(). Allow to set faulty on drive again and set MD_BROKEN if raid is failed.
As a result, this level is allowed to achieve failed state again, but communication with userspace (via -EBUSY status) will be preserved.
This restores possibility to fail array via #mdadm --set-faulty command and will be fixed by additional verification on mdadm side.
Reproduction steps: mdadm -CR imsm -e imsm -n 3 /dev/nvme[0-2]n1 mdadm -CR r5 -e imsm -l5 -n3 /dev/nvme[0-2]n1 --assume-clean mkfs.xfs /dev/md126 -f mount /dev/md126 /mnt/root/
fio --filename=/mnt/root/file --size=5GB --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=240 --numjobs=4 --time_based --group_reporting --name=throughput-test-job --eta-newline=1 &
echo 1 > /sys/block/nvme2n1/device/device/remove echo 1 > /sys/block/nvme1n1/device/device/remove
[ 1475.787779] Call Trace: [ 1475.793111] __schedule+0x2a6/0x700 [ 1475.799460] schedule+0x38/0xa0 [ 1475.805454] raid5_get_active_stripe+0x469/0x5f0 [raid456] [ 1475.813856] ? finish_wait+0x80/0x80 [ 1475.820332] raid5_make_request+0x180/0xb40 [raid456] [ 1475.828281] ? finish_wait+0x80/0x80 [ 1475.834727] ? finish_wait+0x80/0x80 [ 1475.841127] ? finish_wait+0x80/0x80 [ 1475.847480] md_handle_request+0x119/0x190 [ 1475.854390] md_make_request+0x8a/0x190 [ 1475.861041] generic_make_request+0xcf/0x310 [ 1475.868145] submit_bio+0x3c/0x160 [ 1475.874355] iomap_dio_submit_bio.isra.20+0x51/0x60 [ 1475.882070] iomap_dio_bio_actor+0x175/0x390 [ 1475.889149] iomap_apply+0xff/0x310 [ 1475.895447] ? iomap_dio_bio_actor+0x390/0x390 [ 1475.902736] ? iomap_dio_bio_actor+0x390/0x390 [ 1475.909974] iomap_dio_rw+0x2f2/0x490 [ 1475.916415] ? iomap_dio_bio_actor+0x390/0x390 [ 1475.923680] ? atime_needs_update+0x77/0xe0 [ 1475.930674] ? xfs_file_dio_aio_read+0x6b/0xe0 [xfs] [ 1475.938455] xfs_file_dio_aio_read+0x6b/0xe0 [xfs] [ 1475.946084] xfs_file_read_iter+0xba/0xd0 [xfs] [ 1475.953403] aio_read+0xd5/0x180 [ 1475.959395] ? _cond_resched+0x15/0x30 [ 1475.965907] io_submit_one+0x20b/0x3c0 [ 1475.972398] __x64_sys_io_submit+0xa2/0x180 [ 1475.979335] ? do_io_getevents+0x7c/0xc0 [ 1475.986009] do_syscall_64+0x5b/0x1a0 [ 1475.992419] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 1476.000255] RIP: 0033:0x7f11fc27978d [ 1476.006631] Code: Bad RIP value. [ 1476.073251] INFO: task fio:3877 blocked for more than 120 seconds.
Cc: stable@vger.kernel.org Fixes: fb73b357fb9 ("raid5: block failing device if raid will be failed") Reviewd-by: Xiao Ni xni@redhat.com Signed-off-by: Mariusz Tkaczyk mariusz.tkaczyk@linux.intel.com Signed-off-by: Song Liu song@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/raid5.c | 47 ++++++++++++++++++++++------------------------- 1 file changed, 22 insertions(+), 25 deletions(-)
--- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -686,17 +686,17 @@ int raid5_calc_degraded(struct r5conf *c return degraded; }
-static int has_failed(struct r5conf *conf) +static bool has_failed(struct r5conf *conf) { - int degraded; + int degraded = conf->mddev->degraded;
- if (conf->mddev->reshape_position == MaxSector) - return conf->mddev->degraded > conf->max_degraded; + if (test_bit(MD_BROKEN, &conf->mddev->flags)) + return true;
- degraded = raid5_calc_degraded(conf); - if (degraded > conf->max_degraded) - return 1; - return 0; + if (conf->mddev->reshape_position != MaxSector) + degraded = raid5_calc_degraded(conf); + + return degraded > conf->max_degraded; }
struct stripe_head * @@ -2877,34 +2877,31 @@ static void raid5_error(struct mddev *md unsigned long flags; pr_debug("raid456: error called\n");
+ pr_crit("md/raid:%s: Disk failure on %s, disabling device.\n", + mdname(mddev), bdevname(rdev->bdev, b)); + spin_lock_irqsave(&conf->device_lock, flags); + set_bit(Faulty, &rdev->flags); + clear_bit(In_sync, &rdev->flags); + mddev->degraded = raid5_calc_degraded(conf);
- if (test_bit(In_sync, &rdev->flags) && - mddev->degraded == conf->max_degraded) { - /* - * Don't allow to achieve failed state - * Don't try to recover this device - */ + if (has_failed(conf)) { + set_bit(MD_BROKEN, &conf->mddev->flags); conf->recovery_disabled = mddev->recovery_disabled; - spin_unlock_irqrestore(&conf->device_lock, flags); - return; + + pr_crit("md/raid:%s: Cannot continue operation (%d/%d failed).\n", + mdname(mddev), mddev->degraded, conf->raid_disks); + } else { + pr_crit("md/raid:%s: Operation continuing on %d devices.\n", + mdname(mddev), conf->raid_disks - mddev->degraded); }
- set_bit(Faulty, &rdev->flags); - clear_bit(In_sync, &rdev->flags); - mddev->degraded = raid5_calc_degraded(conf); spin_unlock_irqrestore(&conf->device_lock, flags); set_bit(MD_RECOVERY_INTR, &mddev->recovery);
set_bit(Blocked, &rdev->flags); set_mask_bits(&mddev->sb_flags, 0, BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING)); - pr_crit("md/raid:%s: Disk failure on %s, disabling device.\n" - "md/raid:%s: Operation continuing on %d devices.\n", - mdname(mddev), - bdevname(rdev->bdev, b), - mdname(mddev), - conf->raid_disks - mddev->degraded); r5c_update_on_rdev_error(mddev, rdev); }
From: Randy Dunlap rdunlap@infradead.org
commit a3b774342fa752a5290c0de36375289dfcf4a260 upstream.
When the NTFS BOOT sectors_per_clusters field is > 0x80, it represents a shift value. Make sure that the shift value is not too large before using it (NTFS max cluster size is 2MB). Return -EVINVAL if it too large.
This prevents negative shift values and shift values that are larger than the field size.
Prevents this UBSAN error:
UBSAN: shift-out-of-bounds in ../fs/ntfs3/super.c:673:16 shift exponent -192 is negative
Link: https://lkml.kernel.org/r/20220502175342.20296-1-rdunlap@infradead.org Fixes: 82cae269cfa9 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Randy Dunlap rdunlap@infradead.org Reported-by: syzbot+1631f09646bc214d2e76@syzkaller.appspotmail.com Reviewed-by: Namjae Jeon linkinjeon@kernel.org Cc: Konstantin Komarov almaz.alexandrovich@paragon-software.com Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: Kari Argillander kari.argillander@stargateuniverse.net Cc: Namjae Jeon linkinjeon@kernel.org Cc: Matthew Wilcox willy@infradead.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ntfs3/super.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
--- a/fs/ntfs3/super.c +++ b/fs/ntfs3/super.c @@ -668,9 +668,11 @@ static u32 format_size_gb(const u64 byte
static u32 true_sectors_per_clst(const struct NTFS_BOOT *boot) { - return boot->sectors_per_clusters <= 0x80 - ? boot->sectors_per_clusters - : (1u << (0 - boot->sectors_per_clusters)); + if (boot->sectors_per_clusters <= 0x80) + return boot->sectors_per_clusters; + if (boot->sectors_per_clusters >= 0xf4) /* limit shift to 2MB max */ + return 1U << (0 - boot->sectors_per_clusters); + return -EINVAL; }
/* @@ -713,6 +715,8 @@ static int ntfs_init_from_boot(struct su
/* cluster size: 512, 1K, 2K, 4K, ... 2M */ sct_per_clst = true_sectors_per_clst(boot); + if ((int)sct_per_clst < 0) + goto out; if (!is_power_of_2(sct_per_clst)) goto out;
From: Marek Maślanka mm@semihalf.com
commit 1d07cef7fd7599450b3d03e1915efc2a96e1f03f upstream.
The Google Whiskers touchpad does not work properly with the default multitouch configuration. Instead, use the same configuration as Google Rose.
Signed-off-by: Marek Maslanka mm@semihalf.com Acked-by: Benjamin Tissoires benjamin.tissoires@redhat.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/hid-multitouch.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/hid/hid-multitouch.c +++ b/drivers/hid/hid-multitouch.c @@ -2178,6 +2178,9 @@ static const struct hid_device_id mt_dev { .driver_data = MT_CLS_GOOGLE, HID_DEVICE(HID_BUS_ANY, HID_GROUP_ANY, USB_VENDOR_ID_GOOGLE, USB_DEVICE_ID_GOOGLE_TOUCH_ROSE) }, + { .driver_data = MT_CLS_GOOGLE, + HID_DEVICE(BUS_USB, HID_GROUP_MULTITOUCH_WIN_8, USB_VENDOR_ID_GOOGLE, + USB_DEVICE_ID_GOOGLE_WHISKERS) },
/* Generic MT device */ { HID_DEVICE(HID_BUS_ANY, HID_GROUP_MULTITOUCH, HID_ANY_ID, HID_ANY_ID) },
From: Tao Jin tao-j@outlook.com
commit 95cd2cdc88c755dcd0a58b951faeb77742c733a4 upstream.
This applies the similar quirks used by previous generation devices such as X1 tablet for X12 tablet, so that the trackpoint and buttons can work.
This patch was applied and tested working on 5.17.1 .
Cc: stable@vger.kernel.org # 5.8+ given that it relies on 40d5bb87377a Signed-off-by: Tao Jin tao-j@outlook.com Signed-off-by: Benjamin Tissoires benjamin.tissoires@redhat.com Link: https://lore.kernel.org/r/CO6PR03MB6241CB276FCDC7F4CEDC34F6E1E29@CO6PR03MB62... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/hid-ids.h | 1 + drivers/hid/hid-multitouch.c | 6 ++++++ 2 files changed, 7 insertions(+)
--- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -760,6 +760,7 @@ #define USB_DEVICE_ID_LENOVO_X1_COVER 0x6085 #define USB_DEVICE_ID_LENOVO_X1_TAB 0x60a3 #define USB_DEVICE_ID_LENOVO_X1_TAB3 0x60b5 +#define USB_DEVICE_ID_LENOVO_X12_TAB 0x60fe #define USB_DEVICE_ID_LENOVO_OPTICAL_USB_MOUSE_600E 0x600e #define USB_DEVICE_ID_LENOVO_PIXART_USB_MOUSE_608D 0x608d #define USB_DEVICE_ID_LENOVO_PIXART_USB_MOUSE_6019 0x6019 --- a/drivers/hid/hid-multitouch.c +++ b/drivers/hid/hid-multitouch.c @@ -2034,6 +2034,12 @@ static const struct hid_device_id mt_dev USB_VENDOR_ID_LENOVO, USB_DEVICE_ID_LENOVO_X1_TAB3) },
+ /* Lenovo X12 TAB Gen 1 */ + { .driver_data = MT_CLS_WIN_8_FORCE_MULTI_INPUT, + HID_DEVICE(BUS_USB, HID_GROUP_MULTITOUCH_WIN_8, + USB_VENDOR_ID_LENOVO, + USB_DEVICE_ID_LENOVO_X12_TAB) }, + /* MosArt panels */ { .driver_data = MT_CLS_CONFIDENCE_MINUS_ONE, MT_USB_DEVICE(USB_VENDOR_ID_ASUS,
From: Reinette Chatre reinette.chatre@intel.com
commit 6bd429643cc265e94a9d19839c771bcc5d008fa8 upstream.
SGX uses shmem backing storage to store encrypted enclave pages and their crypto metadata when enclave pages are moved out of enclave memory. Two shmem backing storage pages are associated with each enclave page - one backing page to contain the encrypted enclave page data and one backing page (shared by a few enclave pages) to contain the crypto metadata used by the processor to verify the enclave page when it is loaded back into the enclave.
sgx_encl_put_backing() is used to release references to the backing storage and, optionally, mark both backing store pages as dirty.
Managing references and dirty status together in this way results in both backing store pages marked as dirty, even if only one of the backing store pages are changed.
Additionally, waiting until the page reference is dropped to set the page dirty risks a race with the page fault handler that may load outdated data into the enclave when a page is faulted right after it is reclaimed.
Consider what happens if the reclaimer writes a page to the backing store and the page is immediately faulted back, before the reclaimer is able to set the dirty bit of the page:
sgx_reclaim_pages() { sgx_vma_fault() { ... sgx_encl_get_backing(); ... ... sgx_reclaimer_write() { mutex_lock(&encl->lock); /* Write data to backing store */ mutex_unlock(&encl->lock); } mutex_lock(&encl->lock); __sgx_encl_eldu() { ... /* * Enclave backing store * page not released * nor marked dirty - * contents may not be * up to date. */ sgx_encl_get_backing(); ... /* * Enclave data restored * from backing store * and PCMD pages that * are not up to date. * ENCLS[ELDU] faults * because of MAC or PCMD * checking failure. */ sgx_encl_put_backing(); } ... /* set page dirty */ sgx_encl_put_backing(); ... mutex_unlock(&encl->lock); } }
Remove the option to sgx_encl_put_backing() to set the backing pages as dirty and set the needed pages as dirty right after receiving important data while enclave mutex is held. This ensures that the page fault handler can get up to date data from a page and prepares the code for a following change where only one of the backing pages need to be marked as dirty.
Cc: stable@vger.kernel.org Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer") Suggested-by: Dave Hansen dave.hansen@linux.intel.com Signed-off-by: Reinette Chatre reinette.chatre@intel.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Tested-by: Haitao Huang haitao.huang@intel.com Link: https://lore.kernel.org/linux-sgx/8922e48f-6646-c7cc-6393-7c78dcf23d23@intel... Link: https://lkml.kernel.org/r/fa9f98986923f43e72ef4c6702a50b2a0b3c42e3.165238982... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/sgx/encl.c | 10 ++-------- arch/x86/kernel/cpu/sgx/encl.h | 2 +- arch/x86/kernel/cpu/sgx/main.c | 6 ++++-- 3 files changed, 7 insertions(+), 11 deletions(-)
--- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -94,7 +94,7 @@ static int __sgx_encl_eldu(struct sgx_en kunmap_atomic(pcmd_page); kunmap_atomic((void *)(unsigned long)pginfo.contents);
- sgx_encl_put_backing(&b, false); + sgx_encl_put_backing(&b);
sgx_encl_truncate_backing_page(encl, page_index);
@@ -645,15 +645,9 @@ int sgx_encl_get_backing(struct sgx_encl /** * sgx_encl_put_backing() - Unpin the backing storage * @backing: data for accessing backing storage for the page - * @do_write: mark pages dirty */ -void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write) +void sgx_encl_put_backing(struct sgx_backing *backing) { - if (do_write) { - set_page_dirty(backing->pcmd); - set_page_dirty(backing->contents); - } - put_page(backing->pcmd); put_page(backing->contents); } --- a/arch/x86/kernel/cpu/sgx/encl.h +++ b/arch/x86/kernel/cpu/sgx/encl.h @@ -107,7 +107,7 @@ void sgx_encl_release(struct kref *ref); int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm); int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index, struct sgx_backing *backing); -void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write); +void sgx_encl_put_backing(struct sgx_backing *backing); int sgx_encl_test_and_clear_young(struct mm_struct *mm, struct sgx_encl_page *page);
--- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -191,6 +191,8 @@ static int __sgx_encl_ewb(struct sgx_epc backing->pcmd_offset;
ret = __ewb(&pginfo, sgx_get_epc_virt_addr(epc_page), va_slot); + set_page_dirty(backing->pcmd); + set_page_dirty(backing->contents);
kunmap_atomic((void *)(unsigned long)(pginfo.metadata - backing->pcmd_offset)); @@ -320,7 +322,7 @@ static void sgx_reclaimer_write(struct s sgx_encl_free_epc_page(encl->secs.epc_page); encl->secs.epc_page = NULL;
- sgx_encl_put_backing(&secs_backing, true); + sgx_encl_put_backing(&secs_backing); }
out: @@ -411,7 +413,7 @@ skip:
encl_page = epc_page->owner; sgx_reclaimer_write(epc_page, &backing[i]); - sgx_encl_put_backing(&backing[i], true); + sgx_encl_put_backing(&backing[i]);
kref_put(&encl_page->encl->refcount, sgx_encl_release); epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
From: Reinette Chatre reinette.chatre@intel.com
commit 2154e1c11b7080aa19f47160bd26b6f39bbd7824 upstream.
Recent commit 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page") expanded __sgx_encl_eldu() to clear an enclave page's PCMD (Paging Crypto MetaData) from the PCMD page in the backing store after the enclave page is restored to the enclave.
Since the PCMD page in the backing store is modified the page should be marked as dirty to ensure the modified data is retained.
Cc: stable@vger.kernel.org Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page") Signed-off-by: Reinette Chatre reinette.chatre@intel.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Tested-by: Haitao Huang haitao.huang@intel.com Link: https://lkml.kernel.org/r/00cd2ac480db01058d112e347b32599c1a806bc4.165238982... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/sgx/encl.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -84,6 +84,7 @@ static int __sgx_encl_eldu(struct sgx_en }
memset(pcmd_page + b.pcmd_offset, 0, sizeof(struct sgx_pcmd)); + set_page_dirty(b.pcmd);
/* * The area for the PCMD in the page was zeroed above. Check if the
From: Reinette Chatre reinette.chatre@intel.com
commit 0e4e729a830c1e7f31d3b3fbf8feb355a402b117 upstream.
Haitao reported encountering a WARN triggered by the ENCLS[ELDU] instruction faulting with a #GP.
The WARN is encountered when the reclaimer evicts a range of pages from the enclave when the same pages are faulted back right away.
The SGX backing storage is accessed on two paths: when there are insufficient free pages in the EPC the reclaimer works to move enclave pages to the backing storage and as enclaves access pages that have been moved to the backing storage they are retrieved from there as part of page fault handling.
An oversubscribed SGX system will often run the reclaimer and page fault handler concurrently and needs to ensure that the backing store is accessed safely between the reclaimer and the page fault handler. This is not the case because the reclaimer accesses the backing store without the enclave mutex while the page fault handler accesses the backing store with the enclave mutex.
Consider the scenario where a page is faulted while a page sharing a PCMD page with the faulted page is being reclaimed. The consequence is a race between the reclaimer and page fault handler, the reclaimer attempting to access a PCMD at the same time it is truncated by the page fault handler. This could result in lost PCMD data. Data may still be lost if the reclaimer wins the race, this is addressed in the following patch.
The reclaimer accesses pages from the backing storage without holding the enclave mutex and runs the risk of concurrently accessing the backing storage with the page fault handler that does access the backing storage with the enclave mutex held.
In the scenario below a PCMD page is truncated from the backing store after all its pages have been loaded in to the enclave at the same time the PCMD page is loaded from the backing store when one of its pages are reclaimed:
sgx_reclaim_pages() { sgx_vma_fault() { ... mutex_lock(&encl->lock); ... __sgx_encl_eldu() { ... if (pcmd_page_empty) { /* * EPC page being reclaimed /* * shares a PCMD page with an * PCMD page truncated * enclave page that is being * while requested from * faulted in. * reclaimer. */ */ sgx_encl_get_backing() <----------> sgx_encl_truncate_backing_page() } mutex_unlock(&encl->lock); } }
In this scenario there is a race between the reclaimer and the page fault handler when the reclaimer attempts to get access to the same PCMD page that is being truncated. This could result in the reclaimer writing to the PCMD page that is then truncated, causing the PCMD data to be lost, or in a new PCMD page being allocated. The lost PCMD data may still occur after protecting the backing store access with the mutex - this is fixed in the next patch. By ensuring the backing store is accessed with the mutex held the enclave page state can be made accurate with the SGX_ENCL_PAGE_BEING_RECLAIMED flag accurately reflecting that a page is in the process of being reclaimed.
Consistently protect the reclaimer's backing store access with the enclave's mutex to ensure that it can safely run concurrently with the page fault handler.
Cc: stable@vger.kernel.org Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer") Reported-by: Haitao Huang haitao.huang@intel.com Signed-off-by: Reinette Chatre reinette.chatre@intel.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Tested-by: Jarkko Sakkinen jarkko@kernel.org Tested-by: Haitao Huang haitao.huang@intel.com Link: https://lkml.kernel.org/r/fa2e04c561a8555bfe1f4e7adc37d60efc77387b.165238982... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/sgx/main.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -310,6 +310,7 @@ static void sgx_reclaimer_write(struct s sgx_encl_ewb(epc_page, backing); encl_page->epc_page = NULL; encl->secs_child_cnt--; + sgx_encl_put_backing(backing);
if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) { ret = sgx_encl_get_backing(encl, PFN_DOWN(encl->size), @@ -381,11 +382,14 @@ static void sgx_reclaim_pages(void) goto skip;
page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base); + + mutex_lock(&encl_page->encl->lock); ret = sgx_encl_get_backing(encl_page->encl, page_index, &backing[i]); - if (ret) + if (ret) { + mutex_unlock(&encl_page->encl->lock); goto skip; + }
- mutex_lock(&encl_page->encl->lock); encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED; mutex_unlock(&encl_page->encl->lock); continue; @@ -413,7 +417,6 @@ skip:
encl_page = epc_page->owner; sgx_reclaimer_write(epc_page, &backing[i]); - sgx_encl_put_backing(&backing[i]);
kref_put(&encl_page->encl->refcount, sgx_encl_release); epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
From: Reinette Chatre reinette.chatre@intel.com
commit af117837ceb9a78e995804ade4726ad2c2c8981f upstream.
Haitao reported encountering a WARN triggered by the ENCLS[ELDU] instruction faulting with a #GP.
The WARN is encountered when the reclaimer evicts a range of pages from the enclave when the same pages are faulted back right away.
Consider two enclave pages (ENCLAVE_A and ENCLAVE_B) sharing a PCMD page (PCMD_AB). ENCLAVE_A is in the enclave memory and ENCLAVE_B is in the backing store. PCMD_AB contains just one entry, that of ENCLAVE_B.
Scenario proceeds where ENCLAVE_A is being evicted from the enclave while ENCLAVE_B is faulted in.
sgx_reclaim_pages() {
...
/* * Reclaim ENCLAVE_A */ mutex_lock(&encl->lock); /* * Get a reference to ENCLAVE_A's * shmem page where enclave page * encrypted data will be stored * as well as a reference to the * enclave page's PCMD data page, * PCMD_AB. * Release mutex before writing * any data to the shmem pages. */ sgx_encl_get_backing(...); encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED; mutex_unlock(&encl->lock);
/* * Fault ENCLAVE_B */
sgx_vma_fault() {
mutex_lock(&encl->lock); /* * Get reference to * ENCLAVE_B's shmem page * as well as PCMD_AB. */ sgx_encl_get_backing(...) /* * Load page back into * enclave via ELDU. */ /* * Release reference to * ENCLAVE_B' shmem page and * PCMD_AB. */ sgx_encl_put_backing(...); /* * PCMD_AB is found empty so * it and ENCLAVE_B's shmem page * are truncated. */ /* Truncate ENCLAVE_B backing page */ sgx_encl_truncate_backing_page(); /* Truncate PCMD_AB */ sgx_encl_truncate_backing_page();
mutex_unlock(&encl->lock);
... } mutex_lock(&encl->lock); encl_page->desc &= ~SGX_ENCL_PAGE_BEING_RECLAIMED; /* * Write encrypted contents of * ENCLAVE_A to ENCLAVE_A shmem * page and its PCMD data to * PCMD_AB. */ sgx_encl_put_backing(...)
/* * Reference to PCMD_AB is * dropped and it is truncated. * ENCLAVE_A's PCMD data is lost. */ mutex_unlock(&encl->lock); }
What happens next depends on whether it is ENCLAVE_A being faulted in or ENCLAVE_B being evicted - but both end up with ENCLS[ELDU] faulting with a #GP.
If ENCLAVE_A is faulted then at the time sgx_encl_get_backing() is called a new PCMD page is allocated and providing the empty PCMD data for ENCLAVE_A would cause ENCLS[ELDU] to #GP
If ENCLAVE_B is evicted first then a new PCMD_AB would be allocated by the reclaimer but later when ENCLAVE_A is faulted the ENCLS[ELDU] instruction would #GP during its checks of the PCMD value and the WARN would be encountered.
Noting that the reclaimer sets SGX_ENCL_PAGE_BEING_RECLAIMED at the time it obtains a reference to the backing store pages of an enclave page it is in the process of reclaiming, fix the race by only truncating the PCMD page after ensuring that no page sharing the PCMD page is in the process of being reclaimed.
Cc: stable@vger.kernel.org Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page") Reported-by: Haitao Huang haitao.huang@intel.com Signed-off-by: Reinette Chatre reinette.chatre@intel.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Tested-by: Haitao Huang haitao.huang@intel.com Link: https://lkml.kernel.org/r/ed20a5db516aa813873268e125680041ae11dfcf.165238982... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/sgx/encl.c | 94 ++++++++++++++++++++++++++++++++++++++++- 1 file changed, 93 insertions(+), 1 deletion(-)
--- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -12,6 +12,92 @@ #include "encls.h" #include "sgx.h"
+#define PCMDS_PER_PAGE (PAGE_SIZE / sizeof(struct sgx_pcmd)) +/* + * 32 PCMD entries share a PCMD page. PCMD_FIRST_MASK is used to + * determine the page index associated with the first PCMD entry + * within a PCMD page. + */ +#define PCMD_FIRST_MASK GENMASK(4, 0) + +/** + * reclaimer_writing_to_pcmd() - Query if any enclave page associated with + * a PCMD page is in process of being reclaimed. + * @encl: Enclave to which PCMD page belongs + * @start_addr: Address of enclave page using first entry within the PCMD page + * + * When an enclave page is reclaimed some Paging Crypto MetaData (PCMD) is + * stored. The PCMD data of a reclaimed enclave page contains enough + * information for the processor to verify the page at the time + * it is loaded back into the Enclave Page Cache (EPC). + * + * The backing storage to which enclave pages are reclaimed is laid out as + * follows: + * Encrypted enclave pages:SECS page:PCMD pages + * + * Each PCMD page contains the PCMD metadata of + * PAGE_SIZE/sizeof(struct sgx_pcmd) enclave pages. + * + * A PCMD page can only be truncated if it is (a) empty, and (b) not in the + * process of getting data (and thus soon being non-empty). (b) is tested with + * a check if an enclave page sharing the PCMD page is in the process of being + * reclaimed. + * + * The reclaimer sets the SGX_ENCL_PAGE_BEING_RECLAIMED flag when it + * intends to reclaim that enclave page - it means that the PCMD page + * associated with that enclave page is about to get some data and thus + * even if the PCMD page is empty, it should not be truncated. + * + * Context: Enclave mutex (&sgx_encl->lock) must be held. + * Return: 1 if the reclaimer is about to write to the PCMD page + * 0 if the reclaimer has no intention to write to the PCMD page + */ +static int reclaimer_writing_to_pcmd(struct sgx_encl *encl, + unsigned long start_addr) +{ + int reclaimed = 0; + int i; + + /* + * PCMD_FIRST_MASK is based on number of PCMD entries within + * PCMD page being 32. + */ + BUILD_BUG_ON(PCMDS_PER_PAGE != 32); + + for (i = 0; i < PCMDS_PER_PAGE; i++) { + struct sgx_encl_page *entry; + unsigned long addr; + + addr = start_addr + i * PAGE_SIZE; + + /* + * Stop when reaching the SECS page - it does not + * have a page_array entry and its reclaim is + * started and completed with enclave mutex held so + * it does not use the SGX_ENCL_PAGE_BEING_RECLAIMED + * flag. + */ + if (addr == encl->base + encl->size) + break; + + entry = xa_load(&encl->page_array, PFN_DOWN(addr)); + if (!entry) + continue; + + /* + * VA page slot ID uses same bit as the flag so it is important + * to ensure that the page is not already in backing store. + */ + if (entry->epc_page && + (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) { + reclaimed = 1; + break; + } + } + + return reclaimed; +} + /* * Calculate byte offset of a PCMD struct associated with an enclave page. PCMD's * follow right after the EPC data in the backing storage. In addition to the @@ -47,6 +133,7 @@ static int __sgx_encl_eldu(struct sgx_en unsigned long va_offset = encl_page->desc & SGX_ENCL_PAGE_VA_OFFSET_MASK; struct sgx_encl *encl = encl_page->encl; pgoff_t page_index, page_pcmd_off; + unsigned long pcmd_first_page; struct sgx_pageinfo pginfo; struct sgx_backing b; bool pcmd_page_empty; @@ -58,6 +145,11 @@ static int __sgx_encl_eldu(struct sgx_en else page_index = PFN_DOWN(encl->size);
+ /* + * Address of enclave page using the first entry within the PCMD page. + */ + pcmd_first_page = PFN_PHYS(page_index & ~PCMD_FIRST_MASK) + encl->base; + page_pcmd_off = sgx_encl_get_backing_page_pcmd_offset(encl, page_index);
ret = sgx_encl_get_backing(encl, page_index, &b); @@ -99,7 +191,7 @@ static int __sgx_encl_eldu(struct sgx_en
sgx_encl_truncate_backing_page(encl, page_index);
- if (pcmd_page_empty) + if (pcmd_page_empty && !reclaimer_writing_to_pcmd(encl, pcmd_first_page)) sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
return ret;
From: Reinette Chatre reinette.chatre@intel.com
commit e3a3bbe3e99de73043a1d32d36cf4d211dc58c7e upstream.
A PCMD (Paging Crypto MetaData) page contains the PCMD structures of enclave pages that have been encrypted and moved to the shmem backing store. When all enclave pages sharing a PCMD page are loaded in the enclave, there is no need for the PCMD page and it can be truncated from the backing store.
A few issues appeared around the truncation of PCMD pages. The known issues have been addressed but the PCMD handling code could be made more robust by loudly complaining if any new issue appears in this area.
Add a check that will complain with a warning if the PCMD page is not actually empty after it has been truncated. There should never be data in the PCMD page at this point since it is was just checked to be empty and truncated with enclave mutex held and is updated with the enclave mutex held.
Suggested-by: Dave Hansen dave.hansen@linux.intel.com Signed-off-by: Reinette Chatre reinette.chatre@intel.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Tested-by: Haitao Huang haitao.huang@intel.com Link: https://lkml.kernel.org/r/6495120fed43fafc1496d09dd23df922b9a32709.165238982... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/sgx/encl.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -187,12 +187,20 @@ static int __sgx_encl_eldu(struct sgx_en kunmap_atomic(pcmd_page); kunmap_atomic((void *)(unsigned long)pginfo.contents);
+ get_page(b.pcmd); sgx_encl_put_backing(&b);
sgx_encl_truncate_backing_page(encl, page_index);
- if (pcmd_page_empty && !reclaimer_writing_to_pcmd(encl, pcmd_first_page)) + if (pcmd_page_empty && !reclaimer_writing_to_pcmd(encl, pcmd_first_page)) { sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off)); + pcmd_page = kmap_atomic(b.pcmd); + if (memchr_inv(pcmd_page, 0, PAGE_SIZE)) + pr_warn("PCMD page not empty after truncate.\n"); + kunmap_atomic(pcmd_page); + } + + put_page(b.pcmd);
return ret; }
From: Bryan O'Donoghue bryan.odonoghue@linaro.org
commit bb25f071fc92d3d227178a45853347c7b3b45a6b upstream.
The imx412/imx577 sensor has a reset line that is active low not active high. Currently the logic for this is inverted.
The right way to define the reset line is to declare it active low in the DTS and invert the logic currently contained in the driver.
The DTS should represent the hardware does i.e. reset is active low. So: + reset-gpios = <&tlmm 78 GPIO_ACTIVE_LOW>; not: - reset-gpios = <&tlmm 78 GPIO_ACTIVE_HIGH>;
I was a bit reticent about changing this logic since I thought it might negatively impact @intel.com users. Googling a bit though I believe this sensor is used on "Keem Bay" which is clearly a DTS based system and is not upstream yet.
Fixes: 9214e86c0cc1 ("media: i2c: Add imx412 camera sensor driver") Cc: stable@vger.kernel.org Signed-off-by: Bryan O'Donoghue bryan.odonoghue@linaro.org Reviewed-by: Jacopo Mondi jacopo@jmondi.org Reviewed-by: Daniele Alessandrelli daniele.alessandrelli@intel.com Signed-off-by: Sakari Ailus sakari.ailus@linux.intel.com Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/i2c/imx412.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/media/i2c/imx412.c +++ b/drivers/media/i2c/imx412.c @@ -1011,7 +1011,7 @@ static int imx412_power_on(struct device struct imx412 *imx412 = to_imx412(sd); int ret;
- gpiod_set_value_cansleep(imx412->reset_gpio, 1); + gpiod_set_value_cansleep(imx412->reset_gpio, 0);
ret = clk_prepare_enable(imx412->inclk); if (ret) { @@ -1024,7 +1024,7 @@ static int imx412_power_on(struct device return 0;
error_reset: - gpiod_set_value_cansleep(imx412->reset_gpio, 0); + gpiod_set_value_cansleep(imx412->reset_gpio, 1);
return ret; } @@ -1040,7 +1040,7 @@ static int imx412_power_off(struct devic struct v4l2_subdev *sd = dev_get_drvdata(dev); struct imx412 *imx412 = to_imx412(sd);
- gpiod_set_value_cansleep(imx412->reset_gpio, 0); + gpiod_set_value_cansleep(imx412->reset_gpio, 1);
clk_disable_unprepare(imx412->inclk);
From: Bryan O'Donoghue bryan.odonoghue@linaro.org
commit 9a199694c6a1519522ec73a4571f68abe9f13d5d upstream.
The enable path does - gpio - clock
The disable path does - gpio - clock
Fix the order on the power-off path so that power-off and power-on have the same ordering for clock and gpio.
Fixes: 9214e86c0cc1 ("media: i2c: Add imx412 camera sensor driver") Cc: stable@vger.kernel.org Signed-off-by: Bryan O'Donoghue bryan.odonoghue@linaro.org Reviewed-by: Jacopo Mondi jacopo@jmondi.org Reviewed-by: Daniele Alessandrelli daniele.alessandrelli@intel.com Signed-off-by: Sakari Ailus sakari.ailus@linux.intel.com Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/i2c/imx412.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/media/i2c/imx412.c +++ b/drivers/media/i2c/imx412.c @@ -1040,10 +1040,10 @@ static int imx412_power_off(struct devic struct v4l2_subdev *sd = dev_get_drvdata(dev); struct imx412 *imx412 = to_imx412(sd);
- gpiod_set_value_cansleep(imx412->reset_gpio, 1); - clk_disable_unprepare(imx412->inclk);
+ gpiod_set_value_cansleep(imx412->reset_gpio, 1); + return 0; }
From: Stefan Mahnke-Hartmann stefan.mahnke-hartmann@infineon.com
commit e57b2523bd37e6434f4e64c7a685e3715ad21e9a upstream.
Under certain conditions uninitialized memory will be accessed. As described by TCG Trusted Platform Module Library Specification, rev. 1.59 (Part 3: Commands), if a TPM2_GetCapability is received, requesting a capability, the TPM in field upgrade mode may return a zero length list. Check the property count in tpm2_get_tpm_pt().
Fixes: 2ab3241161b3 ("tpm: migrate tpm2_get_tpm_pt() to use struct tpm_buf") Cc: stable@vger.kernel.org Signed-off-by: Stefan Mahnke-Hartmann stefan.mahnke-hartmann@infineon.com Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/tpm/tpm2-cmd.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
--- a/drivers/char/tpm/tpm2-cmd.c +++ b/drivers/char/tpm/tpm2-cmd.c @@ -400,7 +400,16 @@ ssize_t tpm2_get_tpm_pt(struct tpm_chip if (!rc) { out = (struct tpm2_get_cap_out *) &buf.data[TPM_HEADER_SIZE]; - *value = be32_to_cpu(out->value); + /* + * To prevent failing boot up of some systems, Infineon TPM2.0 + * returns SUCCESS on TPM2_Startup in field upgrade mode. Also + * the TPM2_Getcapability command returns a zero length list + * in field upgrade mode. + */ + if (be32_to_cpu(out->property_cnt) > 0) + *value = be32_to_cpu(out->value); + else + rc = -ENODATA; } tpm_buf_destroy(&buf); return rc;
From: Xiu Jianfeng xiujianfeng@huawei.com
commit d0dc1a7100f19121f6e7450f9cdda11926aa3838 upstream.
Currently it returns zero when CRQ response timed out, it should return an error code instead.
Fixes: d8d74ea3c002 ("tpm: ibmvtpm: Wait for buffer to be set before proceeding") Signed-off-by: Xiu Jianfeng xiujianfeng@huawei.com Reviewed-by: Stefan Berger stefanb@linux.ibm.com Acked-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/tpm/tpm_ibmvtpm.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/char/tpm/tpm_ibmvtpm.c +++ b/drivers/char/tpm/tpm_ibmvtpm.c @@ -681,6 +681,7 @@ static int tpm_ibmvtpm_probe(struct vio_ if (!wait_event_timeout(ibmvtpm->crq_queue.wq, ibmvtpm->rtce_buf != NULL, HZ)) { + rc = -ENODEV; dev_err(dev, "CRQ response timed out\n"); goto init_irq_cleanup; }
From: Akira Yokosawa akiyks@gmail.com
commit 6d5aa418b3bd42cdccc36e94ee199af423ef7c84 upstream.
The reference to `explicit_in_reply_to` is pointless as when the reference was added in the form of "#15" [1], Section 15) was "The canonical patch format". The reference of "#15" had not been properly updated in a couple of reorganizations during the plain-text SubmittingPatches era.
Fix it by using `the_canonical_patch_format`.
[1]: 2ae19acaa50a ("Documentation: Add "how to write a good patch summary" to SubmittingPatches")
Signed-off-by: Akira Yokosawa akiyks@gmail.com Fixes: 5903019b2a5e ("Documentation/SubmittingPatches: convert it to ReST markup") Fixes: 9b2c76777acc ("Documentation/SubmittingPatches: enrich the Sphinx output") Cc: Jonathan Corbet corbet@lwn.net Cc: Mauro Carvalho Chehab mchehab@kernel.org Cc: stable@vger.kernel.org # v4.9+ Link: https://lore.kernel.org/r/64e105a5-50be-23f2-6cae-903a2ea98e18@gmail.com Signed-off-by: Jonathan Corbet corbet@lwn.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/process/submitting-patches.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/Documentation/process/submitting-patches.rst +++ b/Documentation/process/submitting-patches.rst @@ -77,7 +77,7 @@ as you intend it to.
The maintainer will thank you if you write your patch description in a form which can be easily pulled into Linux's source code management -system, ``git``, as a "commit log". See :ref:`explicit_in_reply_to`. +system, ``git``, as a "commit log". See :ref:`the_canonical_patch_format`.
Solve only one problem per patch. If your description starts to get long, that's a sign that you probably need to split up your patch.
From: Trond Myklebust trond.myklebust@hammerspace.com
commit 452284407c18d8a522c3039339b1860afa0025a8 upstream.
We need to filter out ENOMEM in nfs_error_is_fatal_on_server(), because running out of memory on our client is not a server error.
Reported-by: Olga Kornievskaia aglo@umich.edu Fixes: 2dc23afffbca ("NFS: ENOMEM should also be a fatal error.") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfs/internal.h | 1 + 1 file changed, 1 insertion(+)
--- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -827,6 +827,7 @@ static inline bool nfs_error_is_fatal_on case 0: case -ERESTARTSYS: case -EINTR: + case -ENOMEM: return false; } return nfs_error_is_fatal(err);
From: Chuck Lever chuck.lever@oracle.com
commit ce3c4ad7f4ce5db7b4f08a1e237d8dd94b39180b upstream.
nfsd4_release_lockowner() holds clp->cl_lock when it calls check_for_locks(). However, check_for_locks() calls nfsd_file_get() / nfsd_file_put() to access the backing inode's flc_posix list, and nfsd_file_put() can sleep if the inode was recently removed.
Let's instead rely on the stateowner's reference count to gate whether the release is permitted. This should be a reliable indication of locks-in-use since file lock operations and ->lm_get_owner take appropriate references, which are released appropriately when file locks are removed.
Reported-by: Dai Ngo dai.ngo@oracle.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfsd/nfs4state.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-)
--- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -7330,16 +7330,12 @@ nfsd4_release_lockowner(struct svc_rqst if (sop->so_is_open_owner || !same_owner_str(sop, owner)) continue;
- /* see if there are still any locks associated with it */ - lo = lockowner(sop); - list_for_each_entry(stp, &sop->so_stateids, st_perstateowner) { - if (check_for_locks(stp->st_stid.sc_file, lo)) { - status = nfserr_locks_held; - spin_unlock(&clp->cl_lock); - return status; - } + if (atomic_read(&sop->so_count) != 1) { + spin_unlock(&clp->cl_lock); + return nfserr_locks_held; }
+ lo = lockowner(sop); nfs4_get_stateowner(sop); break; }
From: Yuntao Wang ytcoode@gmail.com
commit a2aa95b71c9bbec793b5c5fa50f0a80d882b3e8d upstream.
The cnt value in the 'cnt >= BPF_MAX_TRAMP_PROGS' check does not include BPF_TRAMP_MODIFY_RETURN bpf programs, so the number of the attached BPF_TRAMP_MODIFY_RETURN bpf programs in a trampoline can exceed BPF_MAX_TRAMP_PROGS.
When this happens, the assignment '*progs++ = aux->prog' in bpf_trampoline_get_progs() will cause progs array overflow as the progs field in the bpf_tramp_progs struct can only hold at most BPF_MAX_TRAMP_PROGS bpf programs.
Fixes: 88fd9e5352fe ("bpf: Refactor trampoline update code") Signed-off-by: Yuntao Wang ytcoode@gmail.com Link: https://lore.kernel.org/r/20220430130803.210624-1-ytcoode@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/trampoline.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
--- a/kernel/bpf/trampoline.c +++ b/kernel/bpf/trampoline.c @@ -423,7 +423,7 @@ int bpf_trampoline_link_prog(struct bpf_ { enum bpf_tramp_prog_type kind; int err = 0; - int cnt; + int cnt = 0, i;
kind = bpf_attach_type_to_tramp(prog); mutex_lock(&tr->mutex); @@ -434,7 +434,10 @@ int bpf_trampoline_link_prog(struct bpf_ err = -EBUSY; goto out; } - cnt = tr->progs_cnt[BPF_TRAMP_FENTRY] + tr->progs_cnt[BPF_TRAMP_FEXIT]; + + for (i = 0; i < BPF_TRAMP_MAX; i++) + cnt += tr->progs_cnt[i]; + if (kind == BPF_TRAMP_REPLACE) { /* Cannot attach extension if fentry/fexit are in use. */ if (cnt) { @@ -512,16 +515,19 @@ out:
void bpf_trampoline_put(struct bpf_trampoline *tr) { + int i; + if (!tr) return; mutex_lock(&trampoline_mutex); if (!refcount_dec_and_test(&tr->refcnt)) goto out; WARN_ON_ONCE(mutex_is_locked(&tr->mutex)); - if (WARN_ON_ONCE(!hlist_empty(&tr->progs_hlist[BPF_TRAMP_FENTRY]))) - goto out; - if (WARN_ON_ONCE(!hlist_empty(&tr->progs_hlist[BPF_TRAMP_FEXIT]))) - goto out; + + for (i = 0; i < BPF_TRAMP_MAX; i++) + if (WARN_ON_ONCE(!hlist_empty(&tr->progs_hlist[i]))) + goto out; + /* This code will be executed even when the last bpf_tramp_image * is alive. All progs are detached from the trampoline and the * trampoline image is patched with jmp into epilogue to skip
From: Alexei Starovoitov ast@kernel.org
commit 4b6313cf99b0d51b49aeaea98ec76ca8161ecb80 upstream.
The combination of jit blinding and pointers to bpf subprogs causes: [ 36.989548] BUG: unable to handle page fault for address: 0000000100000001 [ 36.990342] #PF: supervisor instruction fetch in kernel mode [ 36.990968] #PF: error_code(0x0010) - not-present page [ 36.994859] RIP: 0010:0x100000001 [ 36.995209] Code: Unable to access opcode bytes at RIP 0xffffffd7. [ 37.004091] Call Trace: [ 37.004351] <TASK> [ 37.004576] ? bpf_loop+0x4d/0x70 [ 37.004932] ? bpf_prog_3899083f75e4c5de_F+0xe3/0x13b
The jit blinding logic didn't recognize that ld_imm64 with an address of bpf subprogram is a special instruction and proceeded to randomize it. By itself it wouldn't have been an issue, but jit_subprogs() logic relies on two step process to JIT all subprogs and then JIT them again when addresses of all subprogs are known. Blinding process in the first JIT phase caused second JIT to miss adjustment of special ld_imm64.
Fix this issue by ignoring special ld_imm64 instructions that don't have user controlled constants and shouldn't be blinded.
Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper") Reported-by: Andrii Nakryiko andrii@kernel.org Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Andrii Nakryiko andrii@kernel.org Acked-by: Martin KaFai Lau kafai@fb.com Link: https://lore.kernel.org/bpf/20220513011025.13344-1-alexei.starovoitov@gmail.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/core.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -1157,6 +1157,16 @@ struct bpf_prog *bpf_jit_blind_constants insn = clone->insnsi;
for (i = 0; i < insn_cnt; i++, insn++) { + if (bpf_pseudo_func(insn)) { + /* ld_imm64 with an address of bpf subprog is not + * a user controlled constant. Don't randomize it, + * since it will conflict with jit_subprogs() logic. + */ + insn++; + i++; + continue; + } + /* We temporarily need to hold the original ld64 insn * so that we can still access the first part in the * second blinding run.
From: Liu Jian liujian56@huawei.com
commit 45969b4152c1752089351cd6836a42a566d49bcf upstream.
The data length of skb frags + frag_list may be greater than 0xffff, and skb_header_pointer can not handle negative offset. So, here INT_MAX is used to check the validity of offset. Add the same change to the related function skb_store_bytes.
Fixes: 05c74e5e53f6 ("bpf: add bpf_skb_load_bytes helper") Signed-off-by: Liu Jian liujian56@huawei.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Song Liu songliubraving@fb.com Link: https://lore.kernel.org/bpf/20220416105801.88708-2-liujian56@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/filter.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/core/filter.c +++ b/net/core/filter.c @@ -1687,7 +1687,7 @@ BPF_CALL_5(bpf_skb_store_bytes, struct s
if (unlikely(flags & ~(BPF_F_RECOMPUTE_CSUM | BPF_F_INVALIDATE_HASH))) return -EINVAL; - if (unlikely(offset > 0xffff)) + if (unlikely(offset > INT_MAX)) return -EFAULT; if (unlikely(bpf_try_make_writable(skb, offset + len))) return -EFAULT; @@ -1722,7 +1722,7 @@ BPF_CALL_4(bpf_skb_load_bytes, const str { void *ptr;
- if (unlikely(offset > 0xffff)) + if (unlikely(offset > INT_MAX)) goto err_clear;
ptr = skb_header_pointer(skb, offset, len, to);
From: KP Singh kpsingh@kernel.org
commit dcf456c9a095a6e71f53d6f6f004133ee851ee70 upstream.
bpf_{sk,task,inode}_storage_free() do not need to use call_rcu_tasks_trace as no BPF program should be accessing the owner as it's being destroyed. The only other reader at this point is bpf_local_storage_map_free() which uses normal RCU.
The only path that needs trace RCU are:
* bpf_local_storage_{delete,update} helpers * map_{delete,update}_elem() syscalls
Fixes: 0fe4b381a59e ("bpf: Allow bpf_local_storage to be used by sleepable programs") Signed-off-by: KP Singh kpsingh@kernel.org Signed-off-by: Alexei Starovoitov ast@kernel.org Acked-by: Martin KaFai Lau kafai@fb.com Link: https://lore.kernel.org/bpf/20220418155158.2865678-1-kpsingh@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/bpf_local_storage.h | 4 ++-- kernel/bpf/bpf_inode_storage.c | 4 ++-- kernel/bpf/bpf_local_storage.c | 29 +++++++++++++++++++---------- kernel/bpf/bpf_task_storage.c | 4 ++-- net/core/bpf_sk_storage.c | 6 +++--- 5 files changed, 28 insertions(+), 19 deletions(-)
--- a/include/linux/bpf_local_storage.h +++ b/include/linux/bpf_local_storage.h @@ -143,9 +143,9 @@ void bpf_selem_link_storage_nolock(struc
bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage, struct bpf_local_storage_elem *selem, - bool uncharge_omem); + bool uncharge_omem, bool use_trace_rcu);
-void bpf_selem_unlink(struct bpf_local_storage_elem *selem); +void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool use_trace_rcu);
void bpf_selem_link_map(struct bpf_local_storage_map *smap, struct bpf_local_storage_elem *selem); --- a/kernel/bpf/bpf_inode_storage.c +++ b/kernel/bpf/bpf_inode_storage.c @@ -90,7 +90,7 @@ void bpf_inode_storage_free(struct inode */ bpf_selem_unlink_map(selem); free_inode_storage = bpf_selem_unlink_storage_nolock( - local_storage, selem, false); + local_storage, selem, false, false); } raw_spin_unlock_bh(&local_storage->lock); rcu_read_unlock(); @@ -149,7 +149,7 @@ static int inode_storage_delete(struct i if (!sdata) return -ENOENT;
- bpf_selem_unlink(SELEM(sdata)); + bpf_selem_unlink(SELEM(sdata), true);
return 0; } --- a/kernel/bpf/bpf_local_storage.c +++ b/kernel/bpf/bpf_local_storage.c @@ -106,7 +106,7 @@ static void bpf_selem_free_rcu(struct rc */ bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage, struct bpf_local_storage_elem *selem, - bool uncharge_mem) + bool uncharge_mem, bool use_trace_rcu) { struct bpf_local_storage_map *smap; bool free_local_storage; @@ -150,11 +150,16 @@ bool bpf_selem_unlink_storage_nolock(str SDATA(selem)) RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL);
- call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_rcu); + if (use_trace_rcu) + call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_rcu); + else + kfree_rcu(selem, rcu); + return free_local_storage; }
-static void __bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem) +static void __bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem, + bool use_trace_rcu) { struct bpf_local_storage *local_storage; bool free_local_storage = false; @@ -169,12 +174,16 @@ static void __bpf_selem_unlink_storage(s raw_spin_lock_irqsave(&local_storage->lock, flags); if (likely(selem_linked_to_storage(selem))) free_local_storage = bpf_selem_unlink_storage_nolock( - local_storage, selem, true); + local_storage, selem, true, use_trace_rcu); raw_spin_unlock_irqrestore(&local_storage->lock, flags);
- if (free_local_storage) - call_rcu_tasks_trace(&local_storage->rcu, + if (free_local_storage) { + if (use_trace_rcu) + call_rcu_tasks_trace(&local_storage->rcu, bpf_local_storage_free_rcu); + else + kfree_rcu(local_storage, rcu); + } }
void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage, @@ -214,14 +223,14 @@ void bpf_selem_link_map(struct bpf_local raw_spin_unlock_irqrestore(&b->lock, flags); }
-void bpf_selem_unlink(struct bpf_local_storage_elem *selem) +void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool use_trace_rcu) { /* Always unlink from map before unlinking from local_storage * because selem will be freed after successfully unlinked from * the local_storage. */ bpf_selem_unlink_map(selem); - __bpf_selem_unlink_storage(selem); + __bpf_selem_unlink_storage(selem, use_trace_rcu); }
struct bpf_local_storage_data * @@ -454,7 +463,7 @@ bpf_local_storage_update(void *owner, st if (old_sdata) { bpf_selem_unlink_map(SELEM(old_sdata)); bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata), - false); + false, true); }
unlock: @@ -532,7 +541,7 @@ void bpf_local_storage_map_free(struct b migrate_disable(); __this_cpu_inc(*busy_counter); } - bpf_selem_unlink(selem); + bpf_selem_unlink(selem, false); if (busy_counter) { __this_cpu_dec(*busy_counter); migrate_enable(); --- a/kernel/bpf/bpf_task_storage.c +++ b/kernel/bpf/bpf_task_storage.c @@ -102,7 +102,7 @@ void bpf_task_storage_free(struct task_s */ bpf_selem_unlink_map(selem); free_task_storage = bpf_selem_unlink_storage_nolock( - local_storage, selem, false); + local_storage, selem, false, false); } raw_spin_unlock_irqrestore(&local_storage->lock, flags); bpf_task_storage_unlock(); @@ -191,7 +191,7 @@ static int task_storage_delete(struct ta if (!sdata) return -ENOENT;
- bpf_selem_unlink(SELEM(sdata)); + bpf_selem_unlink(SELEM(sdata), true);
return 0; } --- a/net/core/bpf_sk_storage.c +++ b/net/core/bpf_sk_storage.c @@ -40,7 +40,7 @@ static int bpf_sk_storage_del(struct soc if (!sdata) return -ENOENT;
- bpf_selem_unlink(SELEM(sdata)); + bpf_selem_unlink(SELEM(sdata), true);
return 0; } @@ -75,8 +75,8 @@ void bpf_sk_storage_free(struct sock *sk * sk_storage. */ bpf_selem_unlink_map(selem); - free_sk_storage = bpf_selem_unlink_storage_nolock(sk_storage, - selem, true); + free_sk_storage = bpf_selem_unlink_storage_nolock( + sk_storage, selem, true, false); } raw_spin_unlock_bh(&sk_storage->lock); rcu_read_unlock();
From: Yuntao Wang ytcoode@gmail.com
commit b45043192b3e481304062938a6561da2ceea46a6 upstream.
The 'n_buckets * (value_size + sizeof(struct stack_map_bucket))' part of the allocated memory for 'smap' is never used after the memlock accounting was removed, thus get rid of it.
[ Note, Daniel:
Commit b936ca643ade ("bpf: rework memlock-based memory accounting for maps") moved `cost += n_buckets * (value_size + sizeof(struct stack_map_bucket))` up and therefore before the bpf_map_area_alloc() allocation, sigh. In a later step commit c85d69135a91 ("bpf: move memory size checks to bpf_map_charge_init()"), and the overflow checks of `cost >= U32_MAX - PAGE_SIZE` moved into bpf_map_charge_init(). And then 370868107bf6 ("bpf: Eliminate rlimit-based memory accounting for stackmap maps") finally removed the bpf_map_charge_init(). Anyway, the original code did the allocation same way as /after/ this fix. ]
Fixes: b936ca643ade ("bpf: rework memlock-based memory accounting for maps") Signed-off-by: Yuntao Wang ytcoode@gmail.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Link: https://lore.kernel.org/bpf/20220407130423.798386-1-ytcoode@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/stackmap.c | 1 - 1 file changed, 1 deletion(-)
--- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -100,7 +100,6 @@ static struct bpf_map *stack_map_alloc(u return ERR_PTR(-E2BIG);
cost = n_buckets * sizeof(struct stack_map_bucket *) + sizeof(*smap); - cost += n_buckets * (value_size + sizeof(struct stack_map_bucket)); smap = bpf_map_area_alloc(cost, bpf_map_attr_numa_node(attr)); if (!smap) return ERR_PTR(-ENOMEM);
From: Kumar Kartikeya Dwivedi memxor@gmail.com
commit 7b3552d3f9f6897851fc453b5131a967167e43c2 upstream.
It is not permitted to write to PTR_TO_MAP_KEY, but the current code in check_helper_mem_access would allow for it, reject this case as well, as helpers taking ARG_PTR_TO_UNINIT_MEM also take PTR_TO_MAP_KEY.
Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper") Signed-off-by: Kumar Kartikeya Dwivedi memxor@gmail.com Link: https://lore.kernel.org/r/20220319080827.73251-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -4832,6 +4832,11 @@ static int check_helper_mem_access(struc return check_packet_access(env, regno, reg->off, access_size, zero_size_allowed); case PTR_TO_MAP_KEY: + if (meta && meta->raw_mode) { + verbose(env, "R%d cannot write into %s\n", regno, + reg_type_str(env, reg->type)); + return -EACCES; + } return check_mem_region_access(env, regno, reg->off, access_size, reg->map_ptr->key_size, false); case PTR_TO_MAP_VALUE:
From: Kumar Kartikeya Dwivedi memxor@gmail.com
commit 97e6d7dab1ca4648821c790a2b7913d6d5d549db upstream.
The commit being fixed was aiming to disallow users from incorrectly obtaining writable pointer to memory that is only meant to be read. This is enforced now using a MEM_RDONLY flag.
For instance, in case of global percpu variables, when the BTF type is not struct (e.g. bpf_prog_active), the verifier marks register type as PTR_TO_MEM | MEM_RDONLY from bpf_this_cpu_ptr or bpf_per_cpu_ptr helpers. However, when passing such pointer to kfunc, global funcs, or BPF helpers, in check_helper_mem_access, there is no expectation MEM_RDONLY flag will be set, hence it is checked as pointer to writable memory. Later, verifier sets up argument type of global func as PTR_TO_MEM | PTR_MAYBE_NULL, so user can use a global func to get around the limitations imposed by this flag.
This check will also cover global non-percpu variables that may be introduced in kernel BTF in future.
Also, we update the log message for PTR_TO_BUF case to be similar to PTR_TO_MEM case, so that the reason for error is clear to user.
Fixes: 34d3a78c681e ("bpf: Make per_cpu_ptr return rdonly PTR_TO_MEM.") Reviewed-by: Hao Luo haoluo@google.com Signed-off-by: Kumar Kartikeya Dwivedi memxor@gmail.com Link: https://lore.kernel.org/r/20220319080827.73251-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -4847,13 +4847,23 @@ static int check_helper_mem_access(struc return check_map_access(env, regno, reg->off, access_size, zero_size_allowed); case PTR_TO_MEM: + if (type_is_rdonly_mem(reg->type)) { + if (meta && meta->raw_mode) { + verbose(env, "R%d cannot write into %s\n", regno, + reg_type_str(env, reg->type)); + return -EACCES; + } + } return check_mem_region_access(env, regno, reg->off, access_size, reg->mem_size, zero_size_allowed); case PTR_TO_BUF: if (type_is_rdonly_mem(reg->type)) { - if (meta && meta->raw_mode) + if (meta && meta->raw_mode) { + verbose(env, "R%d cannot write into %s\n", regno, + reg_type_str(env, reg->type)); return -EACCES; + }
buf_info = "rdonly"; max_access = &env->prog->aux->max_rdonly_access;
On Fri, 3 Jun 2022 19:42:44 +0200, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.17.13 release. There are 75 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 05 Jun 2022 17:38:05 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.17.13-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.17.y and the diffstat can be found below.
thanks,
greg k-h
5.17.13-rc1 Successfully Compiled and booted on my Raspberry PI 4b (8g) (bcm2711)
Tested-by: Fox Chen foxhlchen@gmail.com
On Fri, Jun 03, 2022 at 07:42:44PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.17.13 release. There are 75 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 05 Jun 2022 17:38:05 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.17.13-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.17.y and the diffstat can be found below.
thanks,
greg k-h
Tested rc1 against the Fedora build system (aarch64, armv7, ppc64le, s390x, x86_64), and boot tested x86_64. No regressions noted.
Tested-by: Justin M. Forbes jforbes@fedoraproject.org
Hi Greg,
On Fri, Jun 03, 2022 at 07:42:44PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.17.13 release. There are 75 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 05 Jun 2022 17:38:05 +0000. Anything received after that time might be too late.
Build test: mips (gcc version 11.3.1 20220531): 60 configs -> no failure arm (gcc version 11.3.1 20220531): 99 configs -> no failure arm64 (gcc version 11.3.1 20220531): 3 configs -> no failure x86_64 (gcc version 11.3.1 20220531): 4 configs -> no failure
Boot test: x86_64: Booted on my test laptop. No regression. x86_64: Booted on qemu. No regression. [1] arm64: Booted on rpi4b (4GB model). No regression. [2] mips: Booted on ci20 board. No regression. [3]
[1]. https://openqa.qa.codethink.co.uk/tests/1263 [2]. https://openqa.qa.codethink.co.uk/tests/1272 [3]. https://openqa.qa.codethink.co.uk/tests/1270
Tested-by: Sudip Mukherjee sudip.mukherjee@codethink.co.uk
-- Regards Sudip
On Fri, 3 Jun 2022 at 23:23, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.17.13 release. There are 75 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 05 Jun 2022 17:38:05 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.17.13-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.17.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 5.17.13-rc1 * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc * git branch: linux-5.17.y * git commit: 30200667e823559c51baeb2bd95c14b144cd8e5c * git describe: v5.17.11-188-g30200667e823 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.17.y/build/v5.17....
## Test Regressions (compared to v5.17.11-112-g118948632858) No test regressions found.
## Metric Regressions (compared to v5.17.11-112-g118948632858) No metric regressions found.
## Test Fixes (compared to v5.17.11-112-g118948632858) No test fixes found.
## Metric Fixes (compared to v5.17.11-112-g118948632858) No metric fixes found.
## Test result summary total: 135943, pass: 122628, fail: 435, skip: 12050, xfail: 830
## Build Summary * arc: 10 total, 10 passed, 0 failed * arm: 314 total, 314 passed, 0 failed * arm64: 58 total, 58 passed, 0 failed * i386: 52 total, 49 passed, 3 failed * mips: 37 total, 37 passed, 0 failed * parisc: 12 total, 12 passed, 0 failed * powerpc: 54 total, 54 passed, 0 failed * riscv: 22 total, 22 passed, 0 failed * s390: 21 total, 21 passed, 0 failed * sh: 24 total, 24 passed, 0 failed * sparc: 12 total, 12 passed, 0 failed * x86_64: 56 total, 55 passed, 1 failed
## Test suites summary * fwts * kunit * kvm-unit-tests * libgpiod * libhugetlbfs * log-parser-boot * log-parser-test * ltp-cap_bounds * ltp-cap_bounds-tests * ltp-commands * ltp-commands-tests * ltp-containers * ltp-containers-tests * ltp-controllers-tests * ltp-cpuhotplug-tests * ltp-crypto * ltp-crypto-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests * ltp-fcntl-locktests-tests * ltp-filecaps * ltp-filecaps-tests * ltp-fs * ltp-fs-tests * ltp-fs_bind * ltp-fs_bind-tests * ltp-fs_perms_simple * ltp-fs_perms_simple-tests * ltp-fsx * ltp-fsx-tests * ltp-hugetlb * ltp-hugetlb-tests * ltp-io * ltp-io-tests * ltp-ipc * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl * ltp-nptl-tests * ltp-open-posix-tests * ltp-pty * ltp-pty-tests * ltp-sched-tests * ltp-securebits * ltp-securebits-tests * ltp-syscalls-tests * ltp-tracing-tests * network-basic-tests * packetdrill * perf * rcutorture * ssuite * v4l2-compliance * vdso
-- Linaro LKFT https://lkft.linaro.org
On Fri, Jun 03, 2022 at 07:42:44PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.17.13 release. There are 75 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 05 Jun 2022 17:38:05 +0000. Anything received after that time might be too late.
Build results: total: 156 pass: 156 fail: 0 Qemu test results: total: 489 pass: 489 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On 6/3/22 10:42 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.17.13 release. There are 75 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 05 Jun 2022 17:38:05 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.17.13-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.17.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
linux-stable-mirror@lists.linaro.org