On 6/1/2016 10:00 PM, Minchan Kim wrote:
On Wed, Jun 01, 2016 at 01:40:48PM -0700, Shi, Yang wrote:
On 5/29/2016 11:11 PM, Minchan Kim wrote:
On Fri, May 27, 2016 at 11:16:41AM -0700, Shi, Yang wrote:
<snip>
If we goes this way, how to guarantee this race?
Thanks for pointing out this. It sounds reasonable. However, this should be only possible to happen on 32 bit since just 32 bit version page_is_idle() calls lookup_page_ext(), it doesn't do it on 64 bit.
And, such race condition should exist regardless of whether DEBUG_VM is enabled or not, right?
rcu might be good enough to protect it.
A quick fix may look like:
diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h index 8f5d4ad..bf0cd6a 100644 --- a/include/linux/page_idle.h +++ b/include/linux/page_idle.h @@ -77,8 +77,12 @@ static inline bool test_and_clear_page_young(struct page *page) static inline bool page_is_idle(struct page *page) { struct page_ext *page_ext;
rcu_read_lock(); page_ext = lookup_page_ext(page);
rcu_read_unlock();
- if (unlikely(!page_ext)) return false;
diff --git a/mm/page_ext.c b/mm/page_ext.c index 56b160f..94927c9 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -183,7 +183,6 @@ struct page_ext *lookup_page_ext(struct page *page) { unsigned long pfn = page_to_pfn(page); struct mem_section *section = __pfn_to_section(pfn); -#if defined(CONFIG_DEBUG_VM) || defined(CONFIG_PAGE_POISONING) /* * The sanity checks the page allocator does upon freeing a * page can reach here before the page_ext arrays are @@ -195,7 +194,7 @@ struct page_ext *lookup_page_ext(struct page *page) */ if (!section->page_ext) return NULL; -#endif
return section->page_ext + pfn;
}
@@ -279,7 +278,8 @@ static void __free_page_ext(unsigned long pfn) return; base = ms->page_ext + pfn; free_page_ext(base);
ms->page_ext = NULL;
rcu_assign_pointer(ms->page_ext, NULL);
synchronize_rcu();
How does it fix the problem? I cannot understand your point.
Assigning NULL pointer to page_Ext will be blocked until rcu_read_lock critical section is done, so the lookup and writing operations will be serialized. And, rcu_read_lock disables preempt too.
I meant your rcu_read_lock in page_idle should cover test_bit op.
Yes, definitely. Thanks for catching it.
One more thing, you should use rcu_dereference.
I will check which one is the best since I saw some use rcu_assign_pointer.
As well, please cover memory onlining case I mentioned in another thread as well as memory offlining.
I will look into it too.
Thanks, Yang
Anyway, to me, every caller of page_ext should prepare lookup_page_ext can return NULL anytime and they should use rcu_read_[un]lock, which is not good. :(