On Thu, Aug 17, 2023 at 12:01:26PM -0700, Nhat Pham wrote:
static bool lru_gen_test_recent(void *shadow, bool file, struct lruvec **lruvec, unsigned long *token, bool *workingset) {
- int memcg_id; unsigned long min_seq; struct mem_cgroup *memcg; struct pglist_data *pgdat;
- unpack_shadow(shadow, &memcg_id, &pgdat, token, workingset);
- unpack_shadow(shadow, &memcg, &pgdat, token, workingset);
- if (!mem_cgroup_disabled() && !memcg)
return false;
- memcg = mem_cgroup_from_id(memcg_id); *lruvec = mem_cgroup_lruvec(memcg, pgdat);
- mem_cgroup_put(memcg);
min_seq = READ_ONCE((*lruvec)->lrugen.min_seq[file]); return (*token >> LRU_REFS_WIDTH) == (min_seq & (EVICTION_MASK >> LRU_REFS_WIDTH));
This isn't quite right. lruvec's lifetime is bound to the memcg, so the put has to happen after the code is done accessing it.
lru_gen_refault() is still using the eviction lruvec after the recency test - but only if eviction_lruvec == refault_lruvec. This gives me pause, because they're frequently different. This is a common setup:
root - slice - service 1 `- service 2
where slice has the limit and will be the cause of evictions, whereas the service groups have the actual pages and see the refaults.
workingset_eviction() and workingset_refault() will do recency evaluation against slice, and refault accounting against service X.
MGLRU will use service X to determine recency, which 1) will not properly activate when one service is thrashing while the other is dominating the memory allowance of slice, and 2) will not detect refaults of pages shared between the services.
Maybe it's time to fix this first and bring the two codebases in unison. Then the recency test and eviction group lifetime can be encapsulated in test_recent(), and _refault() can just use folio_memcg() to do the activation and accounting against.