
Commit 81a19b7

ryncsn authored and 1Naim committed
mm/mglru: simplify and improve dirty writeback handling
Right now the flusher wakeup mechanism for MGLRU is less responsive and less likely to trigger than that of the classical LRU. The classical LRU wakes the flusher when one batch of folios passed to shrink_folio_list() is unevictable because it is under writeback; MGLRU instead checks and handles this only after the whole reclaim loop is done. We previously even saw OOM problems caused by the passive flusher; these were fixed, but the fix is still not perfect [1].

We have just unified the dirty folio counting and activation routine, so now move the dirty flush into the loop, right after shrink_folio_list(). This improves performance a lot for workloads involving heavy writeback, and prepares for throttling too.

A test with YCSB workloadb showed a major performance improvement:

Before this series:
  Throughput(ops/sec):     62485.02962831822
  AverageLatency(us):      500.9746963330107
  pgpgin:                  159347462
  workingset_refault_file: 34522071

After this commit:
  Throughput(ops/sec):     80857.08510208207
  AverageLatency(us):      386.653262968934
  pgpgin:                  112233121
  workingset_refault_file: 19516246

Performance is much better, with significantly fewer refaults. We also observed similar or larger gains for other real-world workloads.

We were concerned that the more eager dirty flush could cause extra wear on SSDs. That should not be a problem here: the wakeup condition is that dirty folios have been pushed to the tail of the LRU, which indicates memory pressure is already so high that writeback is blocking the workload.

Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Link: https://lore.kernel.org/linux-mm/20241026115714.1437435-1-jingxiangzeng.cas@gmail.com/ [1]
Signed-off-by: Kairui Song <kasong@tencent.com>
1 parent 3465074 commit 81a19b7

1 file changed: mm/vmscan.c (16 additions & 25 deletions)
@@ -4588,8 +4588,6 @@ static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
 	trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan,
 				scanned, skipped, isolated,
 				type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
-	if (type == LRU_GEN_FILE)
-		sc->nr.file_taken += isolated;
 
 	*isolatedp = isolated;
 	return scanned;
@@ -4697,12 +4695,27 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
 		return scanned;
 retry:
 	reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false, memcg);
-	sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
 	sc->nr_reclaimed += reclaimed;
 	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
 			type_scanned, reclaimed, &stat, sc->priority,
 			type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
 
+	/*
+	 * If too many file cache in the coldest generation can't be evicted
+	 * due to being dirty, wake up the flusher.
+	 */
+	if (stat.nr_unqueued_dirty == isolated) {
+		wakeup_flusher_threads(WB_REASON_VMSCAN);
+
+		/*
+		 * For cgroupv1 dirty throttling is achieved by waking up
+		 * the kernel flusher here and later waiting on folios
+		 * which are in writeback to finish (see shrink_folio_list()).
+		 */
+		if (!writeback_throttling_sane(sc))
+			reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
+	}
+
 	list_for_each_entry_safe_reverse(folio, next, &list, lru) {
 		DEFINE_MIN_SEQ(lruvec);
 
@@ -4871,28 +4884,6 @@ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 		cond_resched();
 	}
 
-	/*
-	 * If too many file cache in the coldest generation can't be evicted
-	 * due to being dirty, wake up the flusher.
-	 */
-	if (sc->nr.unqueued_dirty && sc->nr.unqueued_dirty == sc->nr.file_taken) {
-		struct pglist_data *pgdat = lruvec_pgdat(lruvec);
-
-		wakeup_flusher_threads(WB_REASON_VMSCAN);
-
-		/*
-		 * For cgroupv1 dirty throttling is achieved by waking up
-		 * the kernel flusher here and later waiting on folios
-		 * which are in writeback to finish (see shrink_folio_list()).
-		 *
-		 * Flusher may not be able to issue writeback quickly
-		 * enough for cgroupv1 writeback throttling to work
-		 * on a large system.
-		 */
-		if (!writeback_throttling_sane(sc))
-			reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
-	}
-
 	return need_rotate;
 }
