Driver API devm_krealloc() calls alloc_dr() with the wrong argument,
@total_new_size, which causes more memory to be allocated than required.
Fix this memory waste by using @new_size as the argument for alloc_dr().
Fixes: f82485722e5d ("devres: provide devm_krealloc()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com>
---
Previous discussion link:
https://lore.kernel.org/all/1718531655-29761-1-git-send-email-quic_zijuhu@q…
Changes since the original one:
- Correct title and commit message
- Add inline comments and stable tag
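For illustration only, the double counting described in the commit message can be modelled in userspace. This is a minimal sketch, not the kernel code: struct devres_hdr_model, alloc_dr_model() and the alloc_size_*() helpers are hypothetical names, and the sketch assumes only what the inline comment in the patch states, namely that alloc_dr() reserves room for the bookkeeping header on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the bookkeeping header (struct devres in the kernel). */
struct devres_hdr_model {
	void *slots[4];
};

/* Models alloc_dr(): the callee adds the header size to the requested size. */
static size_t alloc_dr_model(size_t size)
{
	return sizeof(struct devres_hdr_model) + size;
}

/* Before the fix: the caller passes a size that already includes the header. */
static size_t alloc_size_before_fix(size_t new_size)
{
	size_t total_new_size = sizeof(struct devres_hdr_model) + new_size;

	return alloc_dr_model(total_new_size);
}

/* After the fix: the caller passes the user-requested size directly. */
static size_t alloc_size_after_fix(size_t new_size)
{
	return alloc_dr_model(new_size);
}

/* Bytes over-allocated per call by the old argument. */
static size_t wasted_bytes(size_t new_size)
{
	size_t before = alloc_size_before_fix(new_size);
	size_t after = alloc_size_after_fix(new_size);

	assert(before >= after);
	return before - after;
}
```

In this model the waste is exactly one header per reallocation, independent of the requested size.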
drivers/base/devres.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/base/devres.c b/drivers/base/devres.c
index 3df0025d12aa..ff2247eec43c 100644
--- a/drivers/base/devres.c
+++ b/drivers/base/devres.c
@@ -896,9 +896,12 @@ void *devm_krealloc(struct device *dev, void *ptr, size_t new_size, gfp_t gfp)
/*
* Otherwise: allocate new, larger chunk. We need to allocate before
* taking the lock as most probably the caller uses GFP_KERNEL.
+ * alloc_dr() will call check_dr_size() to reserve extra memory
+ * for struct devres automatically, so size @new_size user request
+ * is delivered to it directly as devm_kmalloc() does.
*/
new_dr = alloc_dr(devm_kmalloc_release,
- total_new_size, gfp, dev_to_node(dev));
+ new_size, gfp, dev_to_node(dev));
if (!new_dr)
return NULL;
--
2.34.1
In the future, please send this to the regressions M/L and CC people
instead of just sending a private message.
For now, I've added the @regressions and @stable mailing lists as this
is an issue you find exposed specifically in the LTS series.
Hi Lars,
Can you please test 6.9.7? If this is still failing, can you please
check 6.10-rc6?
I'd like to understand if we just have a missing commit to backport or
it's a problem in the mainline kernel as well.
From the below description it's specifically with boost in passive
mode, right?
If 6.10-rc6 is still affected, can you please see if this commit helps?
https://git.kernel.org/pub/scm/linux/kernel/git/superm1/linux.git/commit/?h…
This is going into 6.11-rc1.
Perry, Jassmine,
Can you try to repro this using bleeding-edge or linux-next branches?
Thanks,
On 7/1/2024 4:33, Huang, Ray wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
>
> Hi all,
>
> Could you please help for a quick fix?
>
> -----Original Message-----
> From: Lars Wendler <wendler.lars(a)web.de>
> Sent: Monday, July 1, 2024 5:30 PM
> To: Huang, Ray <Ray.Huang(a)amd.com>
> Cc: gregkh(a)linuxfoundation.org
> Subject: linux-6.6.y: Regression in amd-pstate cpufreq driver since 6.6.34
>
> Hello dear kernel developers,
>
> I might have found a regression in the amd-pstate driver of linux-6.6 stable series. I haven't checked linux-master nor any other LTS branch.
>
>
> Now here's what I have found:
>
> Since linux-6.6.34 the following command fails:
>
> # echo 0 > /sys/devices/system/cpu/cpufreq/boost
> -bash: echo: write error: Invalid argument
>
> and indeed, disabling CPU boost seems to not work:
>
> # cat /sys/devices/system/cpu/cpufreq/boost
> 1
>
> I have bisected the issue to commit
> 8f893e52b9e030a25ea62e31271bf930b01f2f07:
>
> cpufreq: amd-pstate: Fix the inconsistency in max frequency units
>
> commit e4731baaf29438508197d3a8a6d4f5a8c51663f8 upstream.
>
> Reverting that commit (even on latest linux-6.6 release) gives me back the ability to disable CPU boost again.
>
> I can only reproduce this bug on my Zen4 machine:
>
> # lscpu | grep "^Model name:" | sed 's@[[:space:]][[:space:]]\+@ @'
> Model name: AMD Ryzen 7 7745HX with Radeon Graphics
>
> My older Zen3 machines seem not to be affected by this issue. All my Ryzen systems run on latest linux-6.6 kernels and have the following configuration regarding amd-pstate:
>
> # zgrep -F AMD_PSTATE /proc/config.gz
> CONFIG_X86_AMD_PSTATE=y
> CONFIG_X86_AMD_PSTATE_DEFAULT_MODE=2
> # CONFIG_X86_AMD_PSTATE_UT is not set
>
>
> If you need more information, please don't hesitate to ask.
>
> Kind regards
> Lars Wendler
Hi stable team,
Could you please backport [1] to linux-5.10.y?
I noticed a regression caused by [2], which has been in linux-5.10.y since v5.10.80.
After the sock_map_unhash() helper was removed in [2], sock elems added to the bpf sock map
via sock_hash_update_common() cannot be removed if they are in the icsk_accept_queue
of the listener sock. Since they have not been accept()ed, they cannot be removed via
sock_map_close()->sock_map_remove_links() either.
It can be reproduced in a network test with short-lived connections. If the server is
stopped during the test, there is a chance that some sock elems will remain in
the bpf sock map.
Commit [1] introduces the sock_map_destroy() helper, which invokes sock_map_remove_links()
on the inet_csk_listen_stop()->inet_child_forget()->inet_csk_destroy_sock() path and so
removes the sock elems from the bpf sock map in this situation.
[1] d8616ee2affc ("bpf, sockmap: Fix sk->sk_forward_alloc warn_on in sk_stream_kill_queues")
(link: https://lore.kernel.org/all/20220524075311.649153-1-wangyufen@huawei.com/)
[2] 8b5c98a67c1b ("bpf, sockmap: Remove unhash handler for BPF sockmap usage")
(link: https://lore.kernel.org/all/20211103204736.248403-3-john.fastabend@gmail.co…)
Thanks!
Wen Gu
From: Miaohe Lin <linmiaohe(a)huawei.com>
commit 35e351780fa9d8240dd6f7e4f245f9ea37e96c19 upstream.
Thorvald reported a WARNING [1]. The root cause is the race below:
CPU 1                                   CPU 2
fork                                    hugetlbfs_fallocate
 dup_mmap                                hugetlbfs_punch_hole
  i_mmap_lock_write(mapping);
  vma_interval_tree_insert_after -- Child vma is visible through i_mmap tree.
  i_mmap_unlock_write(mapping);
  hugetlb_dup_vma_private -- Clear vma_lock outside i_mmap_rwsem!
                                         i_mmap_lock_write(mapping);
                                         hugetlb_vmdelete_list
                                          vma_interval_tree_foreach
                                           hugetlb_vma_trylock_write -- Vma_lock is cleared.
  tmp->vm_ops->open -- Alloc new vma_lock outside i_mmap_rwsem!
                                           hugetlb_vma_unlock_write -- Vma_lock is assigned!!!
                                         i_mmap_unlock_write(mapping);
hugetlb_dup_vma_private() and hugetlb_vm_op_open() are called outside the
i_mmap_rwsem lock while the vma lock can be used at the same time. Fix this
by deferring linking file vma until vma is fully initialized. Those vmas
should be initialized first before they can be used.
Backport notes:
The first backport attempt (cec11fa2e) was reverted (dd782da4707). This is
the new backport of the original fix (35e351780fa9).
35e351780f ("fork: defer linking file vma until vma is fully initialized")
fixed a hugetlb locking race by moving a bunch of initialization code to earlier
in the function. The call to open() was included in the move but the call to
copy_page_range was not, effectively inverting their relative ordering. This
created an issue for the vfio code which assumes copy_page_range happens before
the call to open() - vfio's open zaps the vma so that the fault handler is
invoked later, but when we inverted the ordering, copy_page_range can set up
mappings post-zap which would prevent the fault handler from being invoked
later. This patch moves the call to copy_page_range to earlier than the call to
open() to restore the original ordering of the two functions while keeping the
fix for hugetlb intact.
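The ordering dependency described above can be shown with a toy userspace model. This is only a sketch under simplifying assumptions: struct vma_model and every *_model() helper are hypothetical names, a single flag stands in for the page-table state, and copy_page_range is assumed to always install mappings in the child.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one flag standing in for "page-table entries exist". */
struct vma_model {
	bool mapped;
};

/* Models copy_page_range(): the child inherits the parent's mappings. */
static void copy_page_range_model(struct vma_model *child)
{
	child->mapped = true;
}

/* Models an open() callback that zaps the vma so a later access must fault. */
static void open_zap_model(struct vma_model *child)
{
	child->mapped = false;
}

/* The fault handler only runs on first access if nothing is mapped. */
static bool first_access_faults(const struct vma_model *child)
{
	return !child->mapped;
}

/* Ordering restored by this backport: copy first, then open(). */
static bool faults_with_copy_then_open(void)
{
	struct vma_model child = { .mapped = false };

	copy_page_range_model(&child);
	open_zap_model(&child);
	return first_access_faults(&child);
}

/* Ordering after the first backport attempt: open() first, then copy. */
static bool faults_with_open_then_copy(void)
{
	struct vma_model child = { .mapped = false };

	open_zap_model(&child);
	copy_page_range_model(&child);
	return first_access_faults(&child);
}
```

In the model, only the copy-then-open ordering leaves the range unmapped, so the fault handler still gets invoked on first access.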
Commit aac6db75a9 made several changes to vfio_pci_core.c, including
removing the vfio-pci custom open function. This resolves the issue on
the main branch and so we only need to apply these changes when
backporting to stable branches.
35e351780f ("fork: defer linking file vma until vma is fully initialized") -> v6.9-rc5
aac6db75a9 ("vfio/pci: Use unmap_mapping_range()") -> v6.10-rc4
Link: https://lkml.kernel.org/r/20240410091441.3539905-1-linmiaohe@huawei.com
Fixes: 8d9bfb260814 ("hugetlb: add vma based lock for pmd sharing")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Reported-by: Thorvald Natvig <thorvald(a)google.com>
Closes: https://lore.kernel.org/linux-mm/20240129161735.6gmjsswx62o4pbja@revolver/T/ [1]
Reviewed-by: Jane Chu <jane.chu(a)oracle.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: Mateusz Guzik <mjguzik(a)gmail.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Peng Zhang <zhangpeng.00(a)bytedance.com>
Cc: Tycho Andersen <tandersen(a)netflix.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Leah Rumancik <leah.rumancik(a)gmail.com>
---
kernel/fork.c | 27 +++++++++++++--------------
1 file changed, 13 insertions(+), 14 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index 177ce7438db6..122d2cd124d5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -727,6 +727,19 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
} else if (anon_vma_fork(tmp, mpnt))
goto fail_nomem_anon_vma_fork;
vm_flags_clear(tmp, VM_LOCKED_MASK);
+ /*
+ * Copy/update hugetlb private vma information.
+ */
+ if (is_vm_hugetlb_page(tmp))
+ hugetlb_dup_vma_private(tmp);
+
+ if (!(tmp->vm_flags & VM_WIPEONFORK) &&
+ copy_page_range(tmp, mpnt))
+ goto fail_nomem_vmi_store;
+
+ if (tmp->vm_ops && tmp->vm_ops->open)
+ tmp->vm_ops->open(tmp);
+
file = tmp->vm_file;
if (file) {
struct address_space *mapping = file->f_mapping;
@@ -743,25 +756,11 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
i_mmap_unlock_write(mapping);
}
- /*
- * Copy/update hugetlb private vma information.
- */
- if (is_vm_hugetlb_page(tmp))
- hugetlb_dup_vma_private(tmp);
-
/* Link the vma into the MT */
if (vma_iter_bulk_store(&vmi, tmp))
goto fail_nomem_vmi_store;
mm->map_count++;
- if (!(tmp->vm_flags & VM_WIPEONFORK))
- retval = copy_page_range(tmp, mpnt);
-
- if (tmp->vm_ops && tmp->vm_ops->open)
- tmp->vm_ops->open(tmp);
-
- if (retval)
- goto loop_out;
}
/* a new mm has just been created */
retval = arch_dup_mmap(oldmm, mm);
--
2.45.2.803.g4e1b14247a-goog
In the case of the COW file, new updates and GC writes are already
separated into the page caches of the atomic file and the COW file. As in
some cases that use the meta inode for GC, there are race issues between a
foreground thread and the GC thread.
To handle them, we need to take care of when to invalidate and wait for
writeback of GC pages in COW files, as in the case of using the meta inode.
Also, a pointer from the COW inode to the original inode is required to
check the state of the original pages.
For the former, we can solve the problem by using the meta inode for GC
of COW files. Then let's get a page from the original inode in
move_data_block when GCing the COW file to avoid the race condition.
Fixes: 3db1de0e582c ("f2fs: change the current atomic write way")
Cc: stable(a)vger.kernel.org #v5.19+
Reviewed-by: Sungjong Seo <sj1557.seo(a)samsung.com>
Reviewed-by: Yeongjin Gil <youngjin.gil(a)samsung.com>
Signed-off-by: Sunmin Jeong <s_min.jeong(a)samsung.com>
---
fs/f2fs/data.c | 2 +-
fs/f2fs/f2fs.h | 7 ++++++-
fs/f2fs/file.c | 3 +++
fs/f2fs/gc.c | 12 ++++++++++--
fs/f2fs/inline.c | 2 +-
fs/f2fs/inode.c | 3 ++-
6 files changed, 23 insertions(+), 6 deletions(-)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 05158f89ef32..90ff0f6f7f7f 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2651,7 +2651,7 @@ bool f2fs_should_update_outplace(struct inode *inode, struct f2fs_io_info *fio)
return true;
if (IS_NOQUOTA(inode))
return true;
- if (f2fs_is_atomic_file(inode))
+ if (f2fs_used_in_atomic_write(inode))
return true;
/* rewrite low ratio compress data w/ OPU mode to avoid fragmentation */
if (f2fs_compressed_file(inode) &&
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 59c5117e54b1..4f9fd1c1d024 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -4267,9 +4267,14 @@ static inline bool f2fs_post_read_required(struct inode *inode)
f2fs_compressed_file(inode);
}
+static inline bool f2fs_used_in_atomic_write(struct inode *inode)
+{
+ return f2fs_is_atomic_file(inode) || f2fs_is_cow_file(inode);
+}
+
static inline bool f2fs_meta_inode_gc_required(struct inode *inode)
{
- return f2fs_post_read_required(inode) || f2fs_is_atomic_file(inode);
+ return f2fs_post_read_required(inode) || f2fs_used_in_atomic_write(inode);
}
/*
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 25b119cf3499..c9f0ba658cfd 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -2116,6 +2116,9 @@ static int f2fs_ioc_start_atomic_write(struct file *filp, bool truncate)
set_inode_flag(fi->cow_inode, FI_COW_FILE);
clear_inode_flag(fi->cow_inode, FI_INLINE_DATA);
+
+ /* Set the COW inode's cow_inode to the atomic inode */
+ F2FS_I(fi->cow_inode)->cow_inode = inode;
} else {
/* Reuse the already created COW inode */
ret = f2fs_do_truncate_blocks(fi->cow_inode, 0, true);
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 136b9e8180a3..76854e732b35 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1188,7 +1188,11 @@ static int ra_data_block(struct inode *inode, pgoff_t index)
};
int err;
- page = f2fs_grab_cache_page(mapping, index, true);
+ if (f2fs_is_cow_file(inode))
+ page = f2fs_grab_cache_page(F2FS_I(inode)->cow_inode->i_mapping,
+ index, true);
+ else
+ page = f2fs_grab_cache_page(mapping, index, true);
if (!page)
return -ENOMEM;
@@ -1287,7 +1291,11 @@ static int move_data_block(struct inode *inode, block_t bidx,
CURSEG_ALL_DATA_ATGC : CURSEG_COLD_DATA;
/* do not read out */
- page = f2fs_grab_cache_page(inode->i_mapping, bidx, false);
+ if (f2fs_is_cow_file(inode))
+ page = f2fs_grab_cache_page(F2FS_I(inode)->cow_inode->i_mapping,
+ bidx, false);
+ else
+ page = f2fs_grab_cache_page(inode->i_mapping, bidx, false);
if (!page)
return -ENOMEM;
diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
index ac00423f117b..0186ec049db6 100644
--- a/fs/f2fs/inline.c
+++ b/fs/f2fs/inline.c
@@ -16,7 +16,7 @@
static bool support_inline_data(struct inode *inode)
{
- if (f2fs_is_atomic_file(inode))
+ if (f2fs_used_in_atomic_write(inode))
return false;
if (!S_ISREG(inode->i_mode) && !S_ISLNK(inode->i_mode))
return false;
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index c26effdce9aa..c810304e2681 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -807,8 +807,9 @@ void f2fs_evict_inode(struct inode *inode)
f2fs_abort_atomic_write(inode, true);
- if (fi->cow_inode) {
+ if (fi->cow_inode && f2fs_is_cow_file(fi->cow_inode)) {
clear_inode_flag(fi->cow_inode, FI_COW_FILE);
+ F2FS_I(fi->cow_inode)->cow_inode = NULL;
iput(fi->cow_inode);
fi->cow_inode = NULL;
}
--
2.25.1