On Fri, Dec 09, 2022 at 09:19:37AM +0000, Akira Naribayashi (Fujitsu) wrote:
On Wed, 23 Nov 2022 10:26:05 +0000, Mei Gorman wrote:
On Wed, Nov 09, 2022 at 05:41:12AM +0000, Akira Naribayashi (Fujitsu) wrote:
On Mon, 7 Nov 2022 15:43:56 +0000, Mei Gorman wrote:
On Mon, Nov 07, 2022 at 12:32:34PM +0000, Akira Naribayashi (Fujitsu) wrote:
Under what circumstances will this panic occur? I assume those circumstances are pretty rare, given that 6e2b7044c1992 was nearly two years ago.
Did you consider the desirability of backporting this fix into earlier kernels?
The panic can occur on systems with multiple zones in a single pageblock.
Please provide an example of the panic and the zoneinfo.
This issue is occurring in our customer's environment, and the details cannot be shared publicly because they contain customer information. Also, the panic is occurring with the RHEL kernel and may not occur with the upstream community kernel. In other words, it is possible to panic on older kernels. I think this fix should be backported to the stable kernel series.
The reason it is rare is that it only happens in special configurations.
How is this special configuration created?
This is the case when the node boundary is not aligned to pageblock boundary.
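To illustrate the configuration, here is a minimal user-space sketch (not kernel code) of how a boundary that is not a multiple of the pageblock size leaves one pageblock spanning two zones or nodes. The function names, the pageblock size of 512 pages, and the PFN values are assumptions chosen for the example; they are not taken from the affected system.

```c
#include <assert.h>
#include <stdbool.h>

/* Round pfn down to the first pfn of its pageblock.
 * nr_pages is the pageblock size in pages and must be a power of two. */
unsigned long pageblock_start(unsigned long pfn, unsigned long nr_pages)
{
    assert(nr_pages != 0 && (nr_pages & (nr_pages - 1)) == 0);
    return pfn & ~(nr_pages - 1);
}

/* First pfn past the pageblock that contains pfn. */
unsigned long pageblock_end(unsigned long pfn, unsigned long nr_pages)
{
    return pageblock_start(pfn, nr_pages) + nr_pages;
}

/* True when a zone/node boundary falls inside a pageblock, so that the
 * pageblock holds pages belonging to both sides of the boundary. */
bool boundary_splits_pageblock(unsigned long boundary_pfn,
                               unsigned long nr_pages)
{
    return pageblock_start(boundary_pfn, nr_pages) != boundary_pfn;
}
```

With a hypothetical boundary at pfn 0x100100 and 512-page pageblocks, the pageblock covering [0x100000, 0x100200) contains pages from both sides of the boundary, so rounding a pfn down to its pageblock start can land outside the zone being compacted.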
In that case, does this work to avoid rescanning an area that was already isolated?
In the case of your patch, I think I need to clamp isolated_end as well, because sometimes isolated_end < start_pfn (its value before entering "Scan After") < end_pfn.
After re-reading the source, I think the problem is that min_pfn and low_pfn can be out of range in fast_isolate_freepages. How about the following patch?
Ok, makes sense and it is a condition that could happen because of pageblock alignment.
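For readers following along, the clamping idea discussed above can be sketched in user-space C. This is an illustration only and is not the patch posted in this thread: struct zone_span and clamp_pfn_to_zone() are hypothetical names, and the PFN values in the usage note are made up.

```c
#include <assert.h>

/* Hypothetical stand-in for the pfn range of a zone; end_pfn is exclusive. */
struct zone_span {
    unsigned long start_pfn;
    unsigned long end_pfn;
};

/* Force pfn into [start_pfn, end_pfn - 1] so that a value derived by
 * rounding to a pageblock boundary cannot point outside the zone. */
unsigned long clamp_pfn_to_zone(unsigned long pfn, const struct zone_span *z)
{
    assert(z->start_pfn < z->end_pfn);

    if (pfn < z->start_pfn)
        return z->start_pfn;
    if (pfn >= z->end_pfn)
        return z->end_pfn - 1;
    return pfn;
}
```

For a zone that starts at the unaligned pfn 0x100100, a candidate of 0x100000 (the start of the enclosing pageblock) would be moved up to 0x100100, while a pfn already inside the zone is returned unchanged.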