On Thu, Nov 24, 2022 at 01:08:57AM +0000, Dominic Jones wrote:
On Fri, Oct 28, 2022 at 02:51:43PM +0000, Dominic Jones wrote:
Updating the machine's kernel from v5.19.x to v6.0.x causes the machine to fail to boot. The machine boots successfully (and exhibits stable operation) with version v5.19.17 and multiple earlier releases in the 5.19 line. Multiple releases from the 6.0 line (including 6.0.0, 6.0.3, and 6.0.5), with no other changes to the software environment, do not boot. Instead, the machine hangs after loading services but before presenting a display manager; at this point it shows repetitive hard drive activity and then no apparent activity.
''uname'' output for the machine successfully running v5.19.17 is:
Linux [MACHINE_NAME] 5.19.17 #1 SMP PREEMPT_DYNAMIC Mon Oct 24 13:32:29 2022 i686 Intel(R) Atom(TM) CPU N270 @ 1.60GHz GenuineIntel GNU/Linux
The machine is an OCZ Neutrino netbook, running a custom OS build largely similar to LFS development. The kernel update uses ''make olddefconfig''.
Can you use 'git bisect' to find the offending change that causes this to happen?
Bisection is complete. Here's what it returned.
3a194f3f8ad01bce00bd7174aaba1563bcc827eb is the first bad commit
commit 3a194f3f8ad01bce00bd7174aaba1563bcc827eb
Author: Naoya Horiguchi naoya.horiguchi@nec.com
Date: Thu Jul 14 13:24:14 2022 +0900

mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry

follow_pud_mask() does not support non-present pud entry now. As long as I tested on x86_64 server, follow_pud_mask() still simply returns no_page_table() for non-present_pud_entry() due to pud_bad(), so no severe user-visible effect should happen. But generally we should call follow_huge_pud() for non-present pud entry for 1GB hugetlb page.

Update pud_huge() and follow_huge_pud() to handle non-present pud entries. The changes are similar to previous works for pud entries commit e66f17ff7177 ("mm/hugetlb: take page table lock in follow_huge_pmd()") and commit cbef8478bee5 ("mm/hugetlb: pmd_huge() returns true for non-present hugepage").

Link: https://lkml.kernel.org/r/20220714042420.1847125-3-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi naoya.horiguchi@nec.com
Reviewed-by: Miaohe Lin linmiaohe@huawei.com
Reviewed-by: Mike Kravetz mike.kravetz@oracle.com
Cc: David Hildenbrand david@redhat.com
Cc: kernel test robot lkp@intel.com
Cc: Liu Shixin liushixin2@huawei.com
Cc: Muchun Song songmuchun@bytedance.com
Cc: Oscar Salvador osalvador@suse.de
Cc: Yang Shi shy828301@gmail.com
Signed-off-by: Andrew Morton akpm@linux-foundation.org

 arch/x86/mm/hugetlbpage.c |  8 +++++++-
 mm/hugetlb.c              | 32 ++++++++++++++++++++++++++++++--
 2 files changed, 37 insertions(+), 3 deletions(-)
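For readers unfamiliar with how bisection narrows a regression down to a single commit, here is a minimal Python sketch of the underlying binary search. It is not how 'git bisect' is implemented (real history is a graph, not a list) and is only an illustration under stated assumptions: the commit names, the culprit, and the boots_badly() check are all hypothetical, the history is treated as linear, and every commit after the first bad one is assumed to be bad as well.

```python
from typing import Callable, List, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in a linear history.

    Assumptions: commits are ordered oldest to newest, commits[0] is
    known good, commits[-1] is known bad, and once a commit is bad
    every later commit is bad too.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]


# Hypothetical linear history of 16 commits; "c11" plays the culprit.
history: List[str] = [f"c{i}" for i in range(16)]
culprit = "c11"
tested: List[str] = []


def boots_badly(commit: str) -> bool:
    """Stand-in for 'build this kernel, boot it, see whether it hangs'."""
    tested.append(commit)
    return history.index(commit) >= history.index(culprit)


result = first_bad(history, boots_badly)
print(result, "found after", len(tested), "test boots")
```

Each test boot halves the remaining range, which is why a bisection between two kernel releases that are thousands of commits apart needs only a dozen or so builds.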
I got two replies here, so I'm responding to both for visibility.
From Greg K H:
Great! Please work with those developers to figure out why this is causing a problem for your system.
From Thorsten L:
Many thx for this. A fix for that particular commit was recently committed to 6.0.y: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=l...
That thus raises the question: does your problem still happen with the latest 6.0.y version?
Version 6.0.9 appears to fix the issue, with no regression as of 6.0.10. (The issue was still present in 6.0.7. I didn't test 6.0.8 since 6.0.9 had already appeared by the time bisection was complete.)
Thanks!
Dominic Jones jonesd@xmission.com