Hi All,

Many thanks to Zhongjiang for the candidate list of needed commits! The writeback-cgroup feature has been backported to LSK 4.1:

  git://git.linaro.org/kernel/linux-linaro-stable.git v4.1/topic/writeback-cg

It was mainly merged into LSK at ba2b0f32c on the linux-linaro-lsk-v4.1 branch. This feature provides cgroup2 writeback support.

The source code changes for this feature follow. Testing, review and comments are much appreciated!

Thanks
Alex
---
 CREDITS | 5 +
 Documentation/{cgroups => cgroup-legacy}/00-INDEX | 0
 .../blkio-controller.txt | 30 +-
 .../{cgroups => cgroup-legacy}/cgroups.txt | 4 +
 .../{cgroups => cgroup-legacy}/cpuacct.txt | 0
 .../{cgroups => cgroup-legacy}/cpusets.txt | 0
 .../{cgroups => cgroup-legacy}/devices.txt | 0
 .../freezer-subsystem.txt | 0
 .../{cgroups => cgroup-legacy}/hugetlb.txt | 0
 .../{cgroups => cgroup-legacy}/memcg_test.txt | 0
 .../{cgroups => cgroup-legacy}/memory.txt | 1 +
 .../{cgroups => cgroup-legacy}/net_cls.txt | 0
 .../{cgroups => cgroup-legacy}/net_prio.txt | 0
 Documentation/cgroup-legacy/pids.txt | 85 +
 .../unified-hierarchy.txt | 254 ++-
 Documentation/cgroup.txt | 1293 +++++++++++++++
 Documentation/filesystems/dax.txt | 7 +-
 MAINTAINERS | 1 +
 arch/Kconfig | 6 +
 arch/arm/include/asm/jump_label.h | 25 +-
 arch/arm/kernel/jump_label.c | 2 +-
 arch/arm64/include/asm/jump_label.h | 18 +-
 arch/arm64/kernel/jump_label.c | 2 +-
 arch/mips/include/asm/jump_label.h | 19 +-
 arch/mips/kernel/jump_label.c | 2 +-
 arch/powerpc/include/asm/jump_label.h | 19 +-
 arch/powerpc/kernel/jump_label.c | 2 +-
 arch/s390/include/asm/jump_label.h | 19 +-
 arch/s390/kernel/jump_label.c | 2 +-
 arch/sparc/include/asm/jump_label.h | 35 +-
 arch/sparc/kernel/jump_label.c | 2 +-
 arch/x86/include/asm/jump_label.h | 21 +-
 arch/x86/kernel/jump_label.c | 2 +-
 block/bio.c | 37 +-
 block/blk-cgroup.c | 754 ++++++---
 block/blk-core.c | 74 +-
 block/blk-integrity.c | 1 +
 block/blk-sysfs.c | 3 +-
 block/blk-throttle.c | 509 +++---
 block/blk.h | 5 -
 block/bounce.c | 8 +-
 block/cfq-iosched.c | 734 +++++---
 block/elevator.c | 4 +-
 block/genhd.c | 1 +
 drivers/block/drbd/drbd_int.h | 1 +
 drivers/block/drbd/drbd_main.c | 10 +-
 drivers/block/pktcdvd.c | 1 +
 drivers/char/raw.c | 1 +
 drivers/md/bcache/request.c | 1 +
 drivers/md/dm.c | 2 +-
 drivers/md/dm.h | 1 +
 drivers/md/md.h | 1 +
 drivers/md/raid1.c | 4 +-
 drivers/md/raid10.c | 2 +-
 drivers/mtd/devices/block2mtd.c | 1 +
 .../lustre/include/linux/lustre_patchless_compat.h | 4 +-
 fs/9p/v9fs.c | 50 +-
 fs/9p/vfs_super.c | 8 +-
 fs/block_dev.c | 32 +-
 fs/btrfs/disk-io.c | 11 +-
 fs/btrfs/extent_io.c | 2 -
 fs/buffer.c | 74 +-
 fs/dax.c | 192 ++-
 fs/ext2/file.c | 14 +-
 fs/ext2/inode.c | 1 +
 fs/ext2/super.c | 1 +
 fs/ext4/ext4.h | 2 +-
 fs/ext4/extents.c | 1 +
 fs/ext4/file.c | 27 +-
 fs/ext4/indirect.c | 1 +
 fs/ext4/inode.c | 35 +-
 fs/ext4/mballoc.c | 1 +
 fs/ext4/page-io.c | 9 +-
 fs/ext4/super.c | 9 +-
 fs/f2fs/node.c | 4 +-
 fs/f2fs/segment.h | 3 +-
 fs/fat/file.c | 1 +
 fs/fat/inode.c | 1 +
 fs/fs-writeback.c | 1177 +++++++++++---
 fs/fuse/file.c | 12 +-
 fs/gfs2/super.c | 2 +-
 fs/hfs/super.c | 1 +
 fs/hfsplus/super.c | 1 +
 fs/inode.c | 15 +-
 fs/internal.h | 2 +-
 fs/kernfs/dir.c | 23 +
 fs/kernfs/kernfs-internal.h | 1 -
 fs/mpage.c | 3 +
 fs/nfs/filelayout/filelayout.c | 1 +
 fs/nfs/internal.h | 2 +-
 fs/nfs/write.c | 3 +-
 fs/nilfs2/segbuf.c | 12 -
 fs/ocfs2/file.c | 1 +
 fs/reiserfs/super.c | 1 +
 fs/ufs/super.c | 1 +
 fs/xfs/xfs_aops.c | 12 +-
 fs/xfs/xfs_buf.h | 1 +
 fs/xfs/xfs_file.c | 1 +
 include/linux/backing-dev-defs.h | 259 +++
 include/linux/backing-dev.h | 506 ++++--
 include/linux/bio.h | 3 +
 {block => include/linux}/blk-cgroup.h | 381 +++--
 include/linux/blk_types.h | 1 -
 include/linux/blkdev.h | 22 +-
 include/linux/cgroup-defs.h | 529 ++++++
 include/linux/cgroup.h | 1088 ++++---------
 include/linux/cgroup_subsys.h | 30 +-
 include/linux/dax.h | 35 +
 include/linux/elevator.h | 2 +
 include/linux/fs.h | 41 +-
 include/linux/huge_mm.h | 9 +
 include/linux/hugetlb_cgroup.h | 4 +-
 include/linux/init_task.h | 8 -
 include/linux/jump_label.h | 202 ++-
 include/linux/kernfs.h | 9 +
 include/linux/list.h | 5 +
 include/linux/memcontrol.h | 431 ++++-
 include/linux/mm.h | 31 +-
 include/linux/mmzone.h | 6 +-
 include/linux/pagemap.h | 3 +-
 include/linux/sched.h | 80 +-
 include/linux/string.h | 1 +
 include/linux/swap.h | 11 +-
 include/linux/tracehook.h | 3 +
 include/linux/writeback.h | 221 ++-
 include/net/sock.h | 33 -
 include/trace/events/writeback.h | 187 ++-
 include/uapi/linux/magic.h | 1 +
 init/Kconfig | 22 +
 kernel/Makefile | 1 +
 kernel/cgroup.c | 1697 ++++++++++++--------
 kernel/cgroup_freezer.c | 25 +-
 kernel/cgroup_pids.c | 316 ++++
 kernel/cpuset.c | 103 +-
 kernel/events/core.c | 22 +-
 kernel/fork.c | 31 +-
 kernel/jump_label.c | 168 +-
 kernel/sched/core.c | 36 +-
 lib/string.c | 17 +
 mm/backing-dev.c | 682 ++++++--
 mm/fadvise.c | 2 +-
 mm/filemap.c | 34 +-
 mm/huge_memory.c | 50 +-
 mm/madvise.c | 1 +
 mm/memcontrol.c | 805 ++++------
 mm/memory-failure.c | 2 +-
 mm/memory.c | 30 +-
 mm/migrate.c | 12 +-
 mm/page-writeback.c | 1265 ++++++++++-----
 mm/page_alloc.c | 21 +-
 mm/readahead.c | 2 +-
 mm/rmap.c | 2 +
 mm/slab_common.c | 2 +-
 mm/truncate.c | 18 +-
 mm/vmscan.c | 86 +-
 net/core/netclassid_cgroup.c | 14 +-
 net/core/netprio_cgroup.c | 9 +-
 157 files changed, 10767 insertions(+), 4616 deletions(-)

Picked commits (bottom up sequence):
---
writeback: Fix performance regression in wb_over_bg_thresh()
cgroup: replace unified-hierarchy.txt with a proper cgroup v2 documentation
cgroup: rename Documentation/cgroups/ to Documentation/cgroup-legacy/
cgroup: replace __DEVEL__sane_behavior with cgroup2 fs type
cfq-iosched: fix the setting of IOPS mode on SSDs
block: Make CFQ default to IOPS mode on SSDs
writeback: initialize inode members that track writeback history
writeback: keep superblock pinned during cgroup writeback association switches
memcg: fix dirty page migration
block: detach bdev inode from its wb in __blkdev_put()
fs/block_dev.c: Remove WARN_ON() when inode writeback fails
cgroup: make sure a parent css isn't offlined before its children
cgroup: make sure a parent css isn't freed before its children
mm: page_alloc: generalize the dirty balance reserve
block: fix module reference leak on put_disk() call for cgroups throttle
memcg: fix memory.high target
memcg: ratify and consolidate over-charge handling
memcg: punt high overage reclaim to return-to-userland path
memcg: flatten task_struct->memcg_oom
mm: memcontrol: eliminate root memory.current
fs/writeback, rcu: Don't use list_entry_rcu() for pointer offsetting in bdi_split_work_to_wbs()
writeback: bdi_writeback iteration must not skip dying ones
writeback: sync_inodes_sb() must write out I_DIRTY_TIME inodes and always call wait_sb_inodes()
mm, vmscan: unlock page while waiting on writeback
cgroup_pids: don't account for the root cgroup
cgroup: fix handling of multi-destination migration from subtree_control enabling
cgroup_freezer: simplify propagation of CGROUP_FROZEN clearing in freezer_attach()
cgroup: pids: kill pids_fork(), simplify pids_can_fork() and pids_cancel_fork()
cgroup: pids: fix race between cgroup_post_fork() and cgroup_migrate()
cgroup: make css_set pin its css's to avoid use-afer-free
cgroup: fix cftype->file_offset handling
cgroup: pids: fix invalid get/put usage
cgroup: fix race condition around termination check in css_task_iter_next()
blkcg: don't create "io.stat" on the root cgroup
cgroup: drop cgroup__DEVEL__legacy_files_on_dfl
cgroup: replace error handling in cgroup_init() with WARN_ON()s
cgroup: add cgroup_subsys->free() method and use it to fix pids controller
cgroup: keep zombies associated with their original cgroups
cgroup: make css_set_rwsem a spinlock and rename it to css_set_lock
cgroup: don't hold css_set_rwsem across css task iteration
cgroup: reorganize css_task_iter functions
cgroup: factor out css_set_move_task()
cgroup: keep css_set and task lists in chronological order
cgroup: make cgroup_destroy_locked() test cgroup_is_populated()
cgroup: make css_sets pin the associated cgroups
cgroup: relocate cgroup_[try]get/put()
cgroup: move check_for_release() invocation
cgroup: replace cgroup_has_tasks() with cgroup_is_populated()
cgroup: make cgroup->nr_populated count the number of populated css_sets
cgroup: remove an unused parameter from cgroup_task_migrate()
cgroup: implement the PIDs subsystem
cgroup: fix too early usage of static_branch_disable()
cgroup: make cgroup_update_dfl_csses() migrate all target processes atomically
cgroup: separate out taskset operations from cgroup_migrate()
cgroup: reorder cgroup_migrate()'s parameters
cgroup, memcg, cpuset: implement cgroup_taskset_for_each_leader()
cpuset: migrate memory only for threadgroup leaders
cgroup: allow a cgroup subsystem to reject a fork
memcg: generate file modified notifications on "memory.events"
cgroup: generalize obtaining the handles of and notifying cgroup files
cgroup: restructure file creation / removal handling
cgroup: cosmetic updates to rebind_subsystems()
cgroup: make cgroup_addrm_files() clean up after itself on failures
cgroup: relocate cgroup_populate_dir()
cgroup: replace cftype->mode with CFTYPE_WORLD_WRITABLE
cgroup: replace "cgroup.populated" with "cgroup.events"
cgroup: replace cgroup_on_dfl() tests in controllers with cgroup_subsys_on_dfl()
cgroup: replace cgroup_subsys->disabled tests with cgroup_subsys_enabled()
memcg: move memcg_proto_active from sock.h
memcg, tcp_kmem: check for cg_proto in sock_update_memcg
memcg: restructure mem_cgroup_can_attach()
memcg: get rid of extern for functions in memcontrol.h
memcg: get rid of mem_cgroup_root_css for !CONFIG_MEMCG
memcg: export struct mem_cgroup
memcg: convert mem_cgroup->under_oom from atomic_t to int
memcg: remove unused mem_cgroup->oom_wakeups
jump_label: make static_key_enabled() work on static_key_true/false types too
cgroup: implement static_key based cgroup_subsys_enabled() and cgroup_subsys_on_dfl()
cgroup: add delegation section to unified hierarchy documentation
cgroup: require write perm on common ancestor when moving processes on the default hierarchy
cgroup: separate out cgroup_procs_write_permission() from __cgroup_procs_write()
MAINTAINERS: add a cgroup core co-maintainer
cgroup: fix uninitialised iterator in for_each_subsys_which
cgroup: replace explicit ss_mask checking with for_each_subsys_which
cgroup: use bitmask to filter for_each_subsys
cgroup: add seq_file forward declaration for struct cftype
cgroup: simplify threadgroup locking
sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem
sched, cgroup: reorganize threadgroup locking
cgroup: switch to unsigned long for bitmasks
kernfs: make kernfs_get_inode() public
locking/static_keys: Add selftest
locking/static_keys: Add a new static_key interface
locking/static_keys: Rework update logic
locking/static_keys: Add static_key_{en,dis}able() helpers
jump_label: Add jump_entry_key() helper
jump_label, locking/static_keys: Rename JUMP_LABEL_TYPE_* and related helpers to the static_key* pattern
module, jump_label: Fix module locking
jump_label: Rename JUMP_LABEL_{EN,DIS}ABLE to JUMP_LABEL_{JMP,NOP}
block: fix bounce_end_io
writeback: remove broken rbtree_postorder_for_each_entry_safe() usage in cgwb_bdi_destroy()
ext4: optimize ext4_writepage() for attempted 4k delalloc writes
dax: don't use set_huge_zero_page()
mm: add vmf_insert_pfn_pmd()
mm: export various functions for the benefit of DAX
dax: add huge page fault support
dax: move DAX-related functions to a new header
dax: expose __dax_fault for filesystems with locking constraints
dax: don't abuse get_block mapping for endio callbacks
writeback: memcg dirty_throttle_control should be initialized with wb->memcg_completions
writeback: fix bdi_writeback iteration in wakeup_dirtytime_writeback()
writeback: laptop_mode_timer_fn() needs rcu_read_lock() around bdi_writeback iteration
writeback: fix incorrect calculation of available memory for memcg domains
blkcg: use CGROUP_WEIGHT_* scale for io.weight on the unified hierarchy
blkcg: s/CFQ_WEIGHT_*/CFQ_WEIGHT_LEGACY_*/
blkcg: implement interface for the unified hierarchy
blkcg: misc preparations for unified hierarchy interface
blkcg: separate out tg_conf_updated() from tg_set_conf()
blkcg: move body parsing from blkg_conf_prep() to its callers
blkcg: mark existing cftypes as legacy
blkcg: rename subsystem name from blkio to io
blkcg: refine error codes returned during blkcg configuration
blkcg: remove unnecessary NULL checks from __cfqg_set_weight_device()
blkcg: reduce stack usage of blkg_rwstat_recursive_sum()
blkcg: remove cfqg_stats->sectors
blkcg: move io_service_bytes and io_serviced stats into blkcg_gq
blkcg: make blkg_[rw]stat_recursive_sum() to be able to index into blkcg_gq
blkcg: make blkcg_[rw]stat per-cpu
blkcg: add blkg_[rw]stat->aux_cnt and replace cfq_group->dead_stats with it
blkcg: consolidate blkg creation in blkcg_bio_issue_check()
blk-throttle: improve queue bypass handling
blkcg: move root blkg lookup optimization from throtl_lookup_tg() to __blkg_lookup()
blkcg: inline [__]blkg_lookup()
blkcg: replace blkcg_policy->cpd_size with ->cpd_alloc/free_fn() methods
blkcg: minor updates around blkcg_policy_data
blkcg: make blkcg_policy methods take a pointer to blkcg_policy_data
blk-throttle: clean up blkg_policy_data alloc/init/exit/free methods
blk-throttle: remove asynchrnous percpu stats allocation mechanism
blkcg: replace blkcg_policy->pd_size with ->pd_alloc/free_fn() methods
blkcg: make blkcg_activate_policy() allow NULL ->pd_init_fn
blkcg: restructure blkg_policy_data allocation in blkcg_activate_policy()
blkcg: remove unnecessary blkcg_root handling from css_alloc/free paths
blkcg: use blkg_free() in blkcg_init_queue() failure path
blkcg: remove unnecessary request_list->blkg NULL test in blk_put_rl()
cfq-iosched: charge async IOs to the appropriate blkcg's instead of the root
cfq-iosched: fold cfq_find_alloc_queue() into cfq_get_queue()
cfq-iosched: move cfq_group determination from cfq_find_alloc_queue() to cfq_get_queue()
cfq-iosched: remove @gfp_mask from cfq_find_alloc_queue()
blkcg, cfq-iosched: use GFP_NOWAIT instead of GFP_ATOMIC for non-critical allocations
cfq-iosched: minor cleanups
cfq-iosched: fix oom cfq_queue ref leak in cfq_set_request()
cfq-iosched: fix async oom queue handling
cfq-iosched: simplify control flow in cfq_get_queue()
writeback: update writeback tracepoints to report cgroup
kernfs: implement kernfs_path_len()
writeback: explain why @inode is allowed to be NULL for inode_congested()
writeback: remove wb_writeback_work->single_wait/done
writeback: bdi_for_each_wb() iteration is memcg ID based not blkcg
cgroup, writeback: don't enable cgroup writeback on traditional hierarchies
fs-writeback: unplug before cond_resched in writeback_sb_inodes
writeback: plug writeback in wb_writeback() and writeback_inodes_wb()
block: blkg_destroy_all() should clear q->root_blkg and ->root_rl.blkg
ext4: huge page fault support
ext2: huge page fault support
mm: add a pmd_fault handler
lib/string.c: introduce strreplace()
ext4: reject journal options for ext2 mounts
blkcg: fix blkcg_policy_data allocation bug
blkcg: implement all_blkcgs list
blkcg: blkcg_css_alloc() should grab blkcg_pol_mutex while iterating blkcg_policy[]
cfq-iosched: fix other locations where blkcg_to_cfqgd() can return NULL
cfq-iosched: fix sysfs oops when attempting to read unconfigured weights
cfq-iosched: move group scheduling functions under ifdef
cfq-iosched: fix the setting of IOPS mode on SSDs
block, cgroup: implement policy-specific per-blkcg data
block: Make CFQ default to IOPS mode on SSDs
block: add blk_set_queue_dying() to blkdev.h
blkcg: allow blkcg_pol_mutex to be grabbed from cgroup [file] methods
block/blk-cgroup.c: free per-blkcg data when freeing the blkcg
cgroup: introduce cgroup_subsys->legacy_name
cgroup: don't print subsystems for the default hierarchy
inode: rename i_wb_list to i_io_list
inode: add hlist_fake to avoid the inode hash lock in evict
writeback: plug writeback at a high level
blk-cgroup: Drop unlikely before IS_ERR(_OR_NULL)
cgroup: make cftype->private a unsigned long
writeback: fix initial dirty limit
cgroup: export cgrp_dfl_root
cgroup: define controller file conventions
cgroup, block: implement task_get_css() and use it in bio_associate_current()
cgroup: reorganize include/linux/cgroup.h
cgroup: separate out include/linux/cgroup-defs.h
Revert "cgroup, block: implement task_get_css() and use it in bio_associate_current()"
cgroup: fix idr_preload usage
cgroup: net_cls: fix false-positive "suspicious RCU usage"
block: export bio_associate_*() and wbc_account_io()
mm/page-writeback.c: initialize m_dirty to avoid compile warning
v9fs: fix error handling in v9fs_session_init()
Merge branch 'writeback-cg' into linux-4.1.y
mm, vmscan: Do not wait for page writeback for GFP_NOFS allocations
ext4: implement cgroup writeback support
ext4: replace ext4_io_submit->io_op with ->io_wbc
block: remove BIO_EOPNOTSUPP
writeback: don't drain bdi_writeback_congested on bdi destruction
writeback: don't embed root bdi_writeback_congested in bdi_writeback
ext4: avoid deadlocks in the writeback path by using sb_getblk_gfp
bufferhead: Add _gfp version for sb_getblk()
writeback, blkio: add documentation for cgroup writeback support
vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB
writeback: do foreign inode detection iff cgroup writeback is enabled
bdi: fix wrong error return value in cgwb_create()
writeback: disassociate inodes from dying bdi_writebacks
writeback: implement foreign cgroup inode bdi_writeback switching
writeback: add lockdep annotation to inode_to_wb()
writeback: use unlocked_inode_to_wb transaction in inode_congested()
writeback: implement unlocked_inode_to_wb transaction and use it for stat updates
writeback: implement [locked_]inode_to_wb_and_lock_list()
writeback: implement foreign cgroup inode detection
writeback: make writeback_control track the inode being written back
writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb()
mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use
writeback: implement memcg writeback domain based throttling
writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes
writeback: implement memcg wb_domain
writeback: update wb_over_bg_thresh() to use wb_domain aware operations
writeback: move over_bground_thresh() to mm/page-writeback.c
writeback: separate out domain_dirty_limits()
writeback: make __wb_writeout_inc() and hard_dirty_limit() take wb_domaas a parameter
writeback: add dirty_throttle_control->dom
writeback: add dirty_throttle_control->wb_completions
writeback: add dirty_throttle_control->pos_ratio
writeback: make __wb_calc_thresh() take dirty_throttle_control
writeback: add dirty_throttle_control->wb_bg_thresh
writeback: consolidate dirty throttle parameters into dirty_throttle_control
writeback: move global_dirty_limit into wb_domain
writeback: implement wb_domain
writeback: reorganize [__]wb_update_bandwidth()
writeback: clean up wb_dirty_limit()
memcg: make mem_cgroup_read_{stat|event}() iterate possible cpus instead of online
ext2: enable cgroup writeback support
mpage: make __mpage_writepage() honor cgroup writeback
buffer, writeback: make __block_write_full_page() honor cgroup writeback
writeback: dirty inodes against their matching cgroup bdi_writeback's
writeback: make writeback initiation functions handle multiple bdi_writeback's
writeback: restructure try_writeback_inodes_sb[_nr]()
writeback: implement wb_wait_for_single_work()
writeback: implement bdi_wait_for_completion()
writeback: add wb_writeback_work->auto_free
writeback: make wakeup_dirtytime_writeback() handle multiple bdi_writeback's
writeback: make wakeup_flusher_threads() handle multiple bdi_writeback's
writeback: make bdi_start_background_writeback() take bdi_writeback instead of backing_dev_info
writeback: make writeback_in_progress() take bdi_writeback instead of backing_dev_info
writeback: make laptop_mode_timer_fn() handle multiple bdi_writeback's
writeback: remove bdi_start_writeback()
writeback: implement bdi_for_each_wb()
writeback: make bdi->min/max_ratio handling cgroup writeback aware
writeback: don't issue wb_writeback_work if clean
writeback: make bdi_has_dirty_io() take multiple bdi_writeback's into account
writeback: implement backing_dev_info->tot_write_bandwidth
writeback: implement WB_has_dirty_io wb_state flag
writeback: implement and use inode_congested()
writeback, blkcg: propagate non-root blkcg congestion state
writeback, blkcg: restructure blk_{set|clear}_queue_congested()
writeback: make congestion functions per bdi_writeback
writeback: let balance_dirty_pages() work on the matching cgroup bdi_writeback
writeback: attribute stats to the matching per-cgroup bdi_writeback
writeback, blkcg: associate each blkcg_gq with the corresponding bdi_writeback_congested
writeback: make backing_dev_info host cgroup-specific bdi_writebacks
writeback: add {CONFIG|BDI_CAP|FS}_CGROUP_WRITEBACK
bdi: separate out congested state into a separate struct
writeback: add @gfp to wb_init()
bdi: make inode_to_bdi() inline
writeback: separate out include/linux/backing-dev-defs.h
writeback: reorganize mm/backing-dev.c
writeback: move backing_dev_info->wb_lock and ->worklist into bdi_writeback
writeback: s/bdi/wb/ in mm/page-writeback.c
writeback: move bandwidth related fields from backing_dev_info into bdi_writeback
writeback: move backing_dev_info->bdi_stat[] into bdi_writeback
writeback: move backing_dev_info->state into bdi_writeback
memcg: implement mem_cgroup_css_from_page()
blkcg: implement bio_associate_blkcg()
blkcg: implement task_get_blkcg_css()
cgroup, block: implement task_get_css() and use it in bio_associate_current()
blkcg: add blkcg_root_css
memcg: add mem_cgroup_root_css
blkcg: always create the blkcg_gq for the root blkcg
update !CONFIG_BLK_CGROUP dummies in include/linux/blk-cgroup.h
blkcg: move block/blk-cgroup.h to include/linux/blk-cgroup.h
memcg: add per cgroup dirty page accounting
page_writeback: revive cancel_dirty_page() in a restricted form