[PATCH 6.6 038/121] mm: page_alloc: control latency caused by zone PCP draining

7 Aug 2024

6.6-stable review patch.  If anyone has any objections, please let me know.
------------------
From: Lucas Stach <l.stach@pengutronix.de>
[ Upstream commit 55f77df7d715110299f12c27f4365bd6332d1adb ]
Patch series "mm/treewide: Remove pXd_huge() API", v2.
In previous work [1], we removed the pXd_large() API, which is arch
specific.  This patchset further removes the hugetlb pXd_huge() API.
Hugetlb was never special in creating huge mappings when compared with
other huge mappings.  Having a standalone API just to detect such pgtable
entries is more or less redundant, especially after the pXd_leaf() API set
was introduced with/without CONFIG_HUGETLB_PAGE.
When looking at this problem, a few issues were also exposed: we don't
have a clear definition of the *_huge() variant API.  This patchset
starts by cleaning up these issues, then replaces all *_huge() users with
*_leaf(), then drops all *_huge() code.
On x86/sparc, swap entries will be reported "true" in pXd_huge(), while
for all the other archs they're reported "false" instead.  This part is
done in patches 1-5, of which I suspect patch 1 can be seen as a bug fix,
but I'll leave that to HMM experts to decide.
Besides, there are three archs (arm, arm64, powerpc) that have slightly
different definitions between the *_huge() and *_leaf() variants.  I
tackled them separately so that it'll be easier for arch experts to chime
in when necessary.  This part is done in patches 6-9.
The final patches 10-14 do the rest of the final removal.  Since *_leaf()
will be the ultimate API in the future, and we seem to have quite some
confusion about how *_huge() APIs can be defined, they also provide a rich
comment for the *_leaf() API set to define it properly and avoid future
misuse; hopefully that'll also help new archs start supporting huge
mappings and avoid traps (like swap entries or PROT_NONE entry checks).
[1] https://lore.kernel.org/r/20240305043750.93762-1-peterx@redhat.com
This patch (of 14):
When the complete PCP is drained, a much larger number of pages than the
usual batch size may be freed at once, causing large IRQ and preemption
latency spikes, because they are all freed while holding the pcp and zone
spinlocks.
To avoid those latency spikes, limit the number of pages freed in a single
bulk operation to the common batch limit
(pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX).
Link: https://lkml.kernel.org/r/20240318200404.448346-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20240318200736.2835502-1-l.stach@pengutronix.de
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bjorn Andersson <andersson@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fabio Estevam <festevam@denx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Konrad Dybcio <konrad.dybcio@linaro.org>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: 66eca1021a42 ("mm/page_alloc: fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 mm/page_alloc.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d3a2c4d3dc3eb..2c40cf4f1eb2d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2185,12 +2185,15 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
  */
 static void drain_pages_zone(unsigned int cpu, struct zone *zone)
 {
-	struct per_cpu_pages *pcp;
+	struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
+	int count = READ_ONCE(pcp->count);
+
+	while (count) {
+		int to_drain = min(count, pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
+		count -= to_drain;
 
-	pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
-	if (pcp->count) {
 		spin_lock(&pcp->lock);
-		free_pcppages_bulk(zone, pcp->count, pcp, 0);
+		free_pcppages_bulk(zone, to_drain, pcp, 0);
 		spin_unlock(&pcp->lock);
 	}
 }
-- 
2.43.0