The patch titled
Subject: mm/huge_memory: rename freeze_page() to unmap_page()
has been added to the -mm tree. Its filename is
mm-huge_memory-rename-freeze_page-to-unmap_page.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-huge_memory-rename-freeze_page-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-huge_memory-rename-freeze_page-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mm/huge_memory: rename freeze_page() to unmap_page()
The term "freeze" is used in several ways in the kernel, and in mm it has
the particular meaning of forcing page refcount temporarily to 0.
freeze_page() is just too confusing a name for a function that unmaps a
page: rename it unmap_page(), and rename unfreeze_page() remap_page().
Went to change the mention of freeze_page() added later in mm/rmap.c, but
found it to be incorrect: ordinary page reclaim reaches there too; but the
substance of the comment still seems correct, so edit it down.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261514080.2275@eggly.anvils
Fixes: e9b61f19858a5 ("thp: reintroduce split_huge_page()")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/mm/huge_memory.c~mm-huge_memory-rename-freeze_page-to-unmap_page
+++ a/mm/huge_memory.c
@@ -2350,7 +2350,7 @@ void vma_adjust_trans_huge(struct vm_are
}
}
-static void freeze_page(struct page *page)
+static void unmap_page(struct page *page)
{
enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS |
TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
@@ -2365,7 +2365,7 @@ static void freeze_page(struct page *pag
VM_BUG_ON_PAGE(!unmap_success, page);
}
-static void unfreeze_page(struct page *page)
+static void remap_page(struct page *page)
{
int i;
if (PageTransHuge(page)) {
@@ -2483,7 +2483,7 @@ static void __split_huge_page(struct pag
spin_unlock_irqrestore(zone_lru_lock(page_zone(head)), flags);
- unfreeze_page(head);
+ remap_page(head);
for (i = 0; i < HPAGE_PMD_NR; i++) {
struct page *subpage = head + i;
@@ -2664,7 +2664,7 @@ int split_huge_page_to_list(struct page
}
/*
- * Racy check if we can split the page, before freeze_page() will
+ * Racy check if we can split the page, before unmap_page() will
* split PMDs
*/
if (!can_split_huge_page(head, &extra_pins)) {
@@ -2673,7 +2673,7 @@ int split_huge_page_to_list(struct page
}
mlocked = PageMlocked(page);
- freeze_page(head);
+ unmap_page(head);
VM_BUG_ON_PAGE(compound_mapcount(head), head);
/* Make sure the page is not on per-CPU pagevec as it takes pin */
@@ -2727,7 +2727,7 @@ int split_huge_page_to_list(struct page
fail: if (mapping)
xa_unlock(&mapping->i_pages);
spin_unlock_irqrestore(zone_lru_lock(page_zone(head)), flags);
- unfreeze_page(head);
+ remap_page(head);
ret = -EBUSY;
}
--- a/mm/rmap.c~mm-huge_memory-rename-freeze_page-to-unmap_page
+++ a/mm/rmap.c
@@ -1627,16 +1627,9 @@ static bool try_to_unmap_one(struct page
address + PAGE_SIZE);
} else {
/*
- * We should not need to notify here as we reach this
- * case only from freeze_page() itself only call from
- * split_huge_page_to_list() so everything below must
- * be true:
- * - page is not anonymous
- * - page is locked
- *
- * So as it is a locked file back page thus it can not
- * be remove from the page cache and replace by a new
- * page before mmu_notifier_invalidate_range_end so no
+ * This is a locked file-backed page, thus it cannot
+ * be removed from the page cache and replaced by a new
+ * page before mmu_notifier_invalidate_range_end, so no
* concurrent thread might update its page table to
* point at new page while a device still is using this
* page.
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-huge_memory-rename-freeze_page-to-unmap_page.patch
mm-huge_memory-splitting-set-mappingindex-before-unfreeze.patch
mm-huge_memory-fix-lockdep-complaint-on-32-bit-i_size_read.patch
mm-khugepaged-collapse_shmem-stop-if-punched-or-truncated.patch
mm-khugepaged-fix-crashes-due-to-misaccounted-holes.patch
mm-khugepaged-collapse_shmem-remember-to-clear-holes.patch
mm-khugepaged-minor-reorderings-in-collapse_shmem.patch
mm-khugepaged-collapse_shmem-without-freezing-new_page.patch
mm-khugepaged-collapse_shmem-do-not-crash-on-compound.patch
mm-khugepaged-fix-the-xas_create_range-error-path.patch
mm-put_and_wait_on_page_locked-while-page-is-migrated.patch
From: Dexuan Cui <decui(a)microsoft.com>
We can concurrently try to open the same sub-channel from 2 paths:
path #1: vmbus_onoffer() -> vmbus_process_offer() -> handle_sc_creation().
path #2: storvsc_probe() -> storvsc_connect_to_vsp() ->
-> storvsc_channel_init() -> handle_multichannel_storage() ->
-> vmbus_are_subchannels_present() -> handle_sc_creation().
They conflict with each other, but it was not an issue before the recent
commit ae6935ed7d42 ("vmbus: split ring buffer allocation from open"),
because at the beginning of vmbus_open() we checked newchannel->state so
only one path could succeed, and the other would return with -EINVAL.
After ae6935ed7d42, the failing path frees the channel's ringbuffer by
vmbus_free_ring(), and this causes a panic later.
Commit ae6935ed7d42 itself is good, and it just reveals the longstanding
race. We can resolve the issue by removing path #2, i.e. removing the
second vmbus_are_subchannels_present() in handle_multichannel_storage().
BTW, the comment "Check to see if sub-channels have already been created"
in handle_multichannel_storage() is incorrect: when we unload the driver,
we first close the sub-channel(s) and then close the primary channel, next
the host sends rescind-offer message(s) so primary->sc_list will become
empty. This means the first vmbus_are_subchannels_present() in
handle_multichannel_storage() is never useful.
Fixes: ae6935ed7d42 ("vmbus: split ring buffer allocation from open")
Cc: stable(a)vger.kernel.org
Cc: Long Li <longli(a)microsoft.com>
Cc: Stephen Hemminger <sthemmin(a)microsoft.com>
Cc: K. Y. Srinivasan <kys(a)microsoft.com>
Cc: Haiyang Zhang <haiyangz(a)microsoft.com>
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys(a)microsoft.com>
---
drivers/scsi/storvsc_drv.c | 61 +++++++++++++++++++-------------------
1 file changed, 30 insertions(+), 31 deletions(-)
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index f03dc03a42c3..8f88348ebe42 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -446,7 +446,6 @@ struct storvsc_device {
bool destroy;
bool drain_notify;
- bool open_sub_channel;
atomic_t num_outstanding_req;
struct Scsi_Host *host;
@@ -636,33 +635,38 @@ static inline struct storvsc_device *get_in_stor_device(
static void handle_sc_creation(struct vmbus_channel *new_sc)
{
struct hv_device *device = new_sc->primary_channel->device_obj;
+ struct device *dev = &device->device;
struct storvsc_device *stor_device;
struct vmstorage_channel_properties props;
+ int ret;
stor_device = get_out_stor_device(device);
if (!stor_device)
return;
- if (stor_device->open_sub_channel == false)
- return;
-
memset(&props, 0, sizeof(struct vmstorage_channel_properties));
- vmbus_open(new_sc,
- storvsc_ringbuffer_size,
- storvsc_ringbuffer_size,
- (void *)&props,
- sizeof(struct vmstorage_channel_properties),
- storvsc_on_channel_callback, new_sc);
+ ret = vmbus_open(new_sc,
+ storvsc_ringbuffer_size,
+ storvsc_ringbuffer_size,
+ (void *)&props,
+ sizeof(struct vmstorage_channel_properties),
+ storvsc_on_channel_callback, new_sc);
- if (new_sc->state == CHANNEL_OPENED_STATE) {
- stor_device->stor_chns[new_sc->target_cpu] = new_sc;
- cpumask_set_cpu(new_sc->target_cpu, &stor_device->alloced_cpus);
+ /* In case vmbus_open() fails, we don't use the sub-channel. */
+ if (ret != 0) {
+ dev_err(dev, "Failed to open sub-channel: err=%d\n", ret);
+ return;
}
+
+ /* Add the sub-channel to the array of available channels. */
+ stor_device->stor_chns[new_sc->target_cpu] = new_sc;
+ cpumask_set_cpu(new_sc->target_cpu, &stor_device->alloced_cpus);
}
static void handle_multichannel_storage(struct hv_device *device, int max_chns)
{
+ struct device *dev = &device->device;
struct storvsc_device *stor_device;
int num_cpus = num_online_cpus();
int num_sc;
@@ -679,21 +683,11 @@ static void handle_multichannel_storage(struct hv_device *device, int max_chns)
request = &stor_device->init_request;
vstor_packet = &request->vstor_packet;
- stor_device->open_sub_channel = true;
/*
* Establish a handler for dealing with subchannels.
*/
vmbus_set_sc_create_callback(device->channel, handle_sc_creation);
- /*
- * Check to see if sub-channels have already been created. This
- * can happen when this driver is re-loaded after unloading.
- */
-
- if (vmbus_are_subchannels_present(device->channel))
- return;
-
- stor_device->open_sub_channel = false;
/*
* Request the host to create sub-channels.
*/
@@ -710,23 +704,29 @@ static void handle_multichannel_storage(struct hv_device *device, int max_chns)
VM_PKT_DATA_INBAND,
VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
- if (ret != 0)
+ if (ret != 0) {
+ dev_err(dev, "Failed to create sub-channel: err=%d\n", ret);
return;
+ }
t = wait_for_completion_timeout(&request->wait_event, 10*HZ);
- if (t == 0)
+ if (t == 0) {
+ dev_err(dev, "Failed to create sub-channel: timed out\n");
return;
+ }
if (vstor_packet->operation != VSTOR_OPERATION_COMPLETE_IO ||
- vstor_packet->status != 0)
+ vstor_packet->status != 0) {
+ dev_err(dev, "Failed to create sub-channel: op=%d, sts=%d\n",
+ vstor_packet->operation, vstor_packet->status);
return;
+ }
/*
- * Now that we created the sub-channels, invoke the check; this
- * may trigger the callback.
+ * We need to do nothing here, because vmbus_process_offer()
+ * invokes channel->sc_creation_callback, which will open and use
+ * the sub-channel(s).
*/
- stor_device->open_sub_channel = true;
- vmbus_are_subchannels_present(device->channel);
}
static void cache_wwn(struct storvsc_device *stor_device,
@@ -1794,7 +1794,6 @@ static int storvsc_probe(struct hv_device *device,
}
stor_device->destroy = false;
- stor_device->open_sub_channel = false;
init_waitqueue_head(&stor_device->waiting_to_drain);
stor_device->device = device;
stor_device->host = host;
--
2.19.1
The patch titled
Subject: zram: writeback throttle
has been added to the -mm tree. Its filename is
zram-writeback-throttle.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/zram-writeback-throttle.patch
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/zram-writeback-throttle.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Minchan Kim <minchan(a)kernel.org>
Subject: zram: writeback throttle
On small memory systems there are lots of write IOs so if we use a flash
device as swap there would be serious flash wearout. To overcome this
problem, system developers need to design a write limitation strategy to
guarantee flash health for the entire product life.
This patch creates a new knob "writeback_limit" on zram. With that, if
the current writeback IO count (/sys/block/zramX/io_stat) exceeds the
limitation, zram stops further writeback until the admin can reset the
limit.
Link: http://lkml.kernel.org/r/20181127055429.251614-8-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan(a)kernel.org>
Cc: Joey Pabalinas <joeypabalinas(a)gmail.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work(a)gmail.com>
Cc: <stable(a)vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Documentation/ABI/testing/sysfs-block-zram | 9 +++
Documentation/blockdev/zram.txt | 2
drivers/block/zram/zram_drv.c | 47 ++++++++++++++++++-
drivers/block/zram/zram_drv.h | 2
4 files changed, 59 insertions(+), 1 deletion(-)
--- a/Documentation/ABI/testing/sysfs-block-zram~zram-writeback-throttle
+++ a/Documentation/ABI/testing/sysfs-block-zram
@@ -121,3 +121,12 @@ Description:
The bd_stat file is read-only and represents backing device's
statistics (bd_count, bd_reads, bd_writes) in a format
similar to block layer statistics file format.
+
+What: /sys/block/zram<id>/writeback_limit
+Date: November 2018
+Contact: Minchan Kim <minchan(a)kernel.org>
+Description:
+ The writeback_limit file is read-write and specifies the maximum
+ amount of writeback ZRAM can do. The limit could be changed
+ in run time and "0" means disable the limit.
+ No limit is the initial state.
--- a/Documentation/blockdev/zram.txt~zram-writeback-throttle
+++ a/Documentation/blockdev/zram.txt
@@ -164,6 +164,8 @@ reset WO trigger device r
mem_used_max WO reset the `mem_used_max' counter (see later)
mem_limit WO specifies the maximum amount of memory ZRAM can use
to store the compressed data
+writeback_limit WO specifies the maximum amount of write IO zram can
+ write out to backing device as 4KB unit
max_comp_streams RW the number of possible concurrent compress operations
comp_algorithm RW show and change the compression algorithm
compact WO trigger memory compaction
--- a/drivers/block/zram/zram_drv.c~zram-writeback-throttle
+++ a/drivers/block/zram/zram_drv.c
@@ -330,6 +330,40 @@ next:
}
#ifdef CONFIG_ZRAM_WRITEBACK
+
+static ssize_t writeback_limit_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t len)
+{
+ struct zram *zram = dev_to_zram(dev);
+ u64 val;
+ ssize_t ret = -EINVAL;
+
+ if (kstrtoull(buf, 10, &val))
+ return ret;
+
+ down_read(&zram->init_lock);
+ atomic64_set(&zram->stats.bd_wb_limit, val);
+ if (val == 0 || val > atomic64_read(&zram->stats.bd_writes))
+ zram->stop_writeback = false;
+ up_read(&zram->init_lock);
+ ret = len;
+
+ return ret;
+}
+
+static ssize_t writeback_limit_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ u64 val;
+ struct zram *zram = dev_to_zram(dev);
+
+ down_read(&zram->init_lock);
+ val = atomic64_read(&zram->stats.bd_wb_limit);
+ up_read(&zram->init_lock);
+
+ return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
+}
+
static void reset_bdev(struct zram *zram)
{
struct block_device *bdev;
@@ -571,6 +605,7 @@ static ssize_t writeback_store(struct de
char mode_buf[8];
unsigned long mode = -1UL;
unsigned long blk_idx = 0;
+ u64 wb_count, wb_limit;
sz = strscpy(mode_buf, buf, sizeof(mode_buf));
if (sz <= 0)
@@ -612,6 +647,11 @@ static ssize_t writeback_store(struct de
bvec.bv_len = PAGE_SIZE;
bvec.bv_offset = 0;
+ if (zram->stop_writeback) {
+ ret = -EIO;
+ break;
+ }
+
if (!blk_idx) {
blk_idx = alloc_block_bdev(zram);
if (!blk_idx) {
@@ -670,7 +710,7 @@ static ssize_t writeback_store(struct de
continue;
}
- atomic64_inc(&zram->stats.bd_writes);
+ wb_count = atomic64_inc_return(&zram->stats.bd_writes);
/*
* We released zram_slot_lock so need to check if the slot was
* changed. If there is freeing for the slot, we can catch it
@@ -694,6 +734,9 @@ static ssize_t writeback_store(struct de
zram_set_element(zram, index, blk_idx);
blk_idx = 0;
atomic64_inc(&zram->stats.pages_stored);
+ wb_limit = atomic64_read(&zram->stats.bd_wb_limit);
+ if (wb_limit != 0 && wb_count >= wb_limit)
+ zram->stop_writeback = true;
next:
zram_slot_unlock(zram, index);
}
@@ -1767,6 +1810,7 @@ static DEVICE_ATTR_RW(comp_algorithm);
#ifdef CONFIG_ZRAM_WRITEBACK
static DEVICE_ATTR_RW(backing_dev);
static DEVICE_ATTR_WO(writeback);
+static DEVICE_ATTR_RW(writeback_limit);
#endif
static struct attribute *zram_disk_attrs[] = {
@@ -1782,6 +1826,7 @@ static struct attribute *zram_disk_attrs
#ifdef CONFIG_ZRAM_WRITEBACK
&dev_attr_backing_dev.attr,
&dev_attr_writeback.attr,
+ &dev_attr_writeback_limit.attr,
#endif
&dev_attr_io_stat.attr,
&dev_attr_mm_stat.attr,
--- a/drivers/block/zram/zram_drv.h~zram-writeback-throttle
+++ a/drivers/block/zram/zram_drv.h
@@ -86,6 +86,7 @@ struct zram_stats {
atomic64_t bd_count; /* no. of pages in backing device */
atomic64_t bd_reads; /* no. of reads from backing device */
atomic64_t bd_writes; /* no. of writes from backing device */
+ atomic64_t bd_wb_limit; /* writeback limit of backing device */
#endif
};
@@ -113,6 +114,7 @@ struct zram {
*/
bool claim; /* Protected by bdev->bd_mutex */
struct file *backing_dev;
+ bool stop_writeback;
#ifdef CONFIG_ZRAM_WRITEBACK
struct block_device *bdev;
unsigned int old_block_size;
_
Patches currently in -mm which might be from minchan(a)kernel.org are
zram-fix-lockdep-warning-of-free-block-handling.patch
zram-fix-double-free-backing-device.patch
zram-refactoring-flags-and-writeback-stuff.patch
zram-introduce-zram_idle-flag.patch
zram-support-idle-huge-page-writeback.patch
zram-add-bd_stat-statistics.patch
zram-writeback-throttle.patch
The patch titled
Subject: zram: add bd_stat statistics
has been added to the -mm tree. Its filename is
zram-add-bd_stat-statistics.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/zram-add-bd_stat-statistics.patch
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/zram-add-bd_stat-statistics.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Minchan Kim <minchan(a)kernel.org>
Subject: zram: add bd_stat statistics
bd_stat represents things that happened in the backing device. Currently
it supports bd_counts, bd_reads and bd_writes which are helpful to
understand wearout of flash and memory saving.
Link: http://lkml.kernel.org/r/20181127055429.251614-7-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan(a)kernel.org>
Cc: Joey Pabalinas <joeypabalinas(a)gmail.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work(a)gmail.com>
Cc: <stable(a)vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Documentation/ABI/testing/sysfs-block-zram | 8 +++++
Documentation/blockdev/zram.txt | 11 +++++++
drivers/block/zram/zram_drv.c | 29 +++++++++++++++++++
drivers/block/zram/zram_drv.h | 5 +++
4 files changed, 53 insertions(+)
--- a/Documentation/ABI/testing/sysfs-block-zram~zram-add-bd_stat-statistics
+++ a/Documentation/ABI/testing/sysfs-block-zram
@@ -113,3 +113,11 @@ Contact: Minchan Kim <minchan(a)kernel.org
Description:
The writeback file is write-only and trigger idle and/or
huge page writeback to backing device.
+
+What: /sys/block/zram<id>/bd_stat
+Date: November 2018
+Contact: Minchan Kim <minchan(a)kernel.org>
+Description:
+ The bd_stat file is read-only and represents backing device's
+ statistics (bd_count, bd_reads, bd_writes) in a format
+ similar to block layer statistics file format.
--- a/Documentation/blockdev/zram.txt~zram-add-bd_stat-statistics
+++ a/Documentation/blockdev/zram.txt
@@ -221,6 +221,17 @@ line of text and contains the following
pages_compacted the number of pages freed during compaction
huge_pages the number of incompressible pages
+File /sys/block/zram<id>/bd_stat
+
+The stat file represents device's backing device statistics. It consists of
+a single line of text and contains the following stats separated by whitespace:
+ bd_count size of data written in backing device.
+ Unit: 4K bytes
+ bd_reads the number of reads from backing device
+ Unit: 4K bytes
+ bd_writes the number of writes to backing device
+ Unit: 4K bytes
+
9) Deactivate:
swapoff /dev/zram0
umount /dev/zram1
--- a/drivers/block/zram/zram_drv.c~zram-add-bd_stat-statistics
+++ a/drivers/block/zram/zram_drv.c
@@ -502,6 +502,7 @@ retry:
if (test_and_set_bit(blk_idx, zram->bitmap))
goto retry;
+ atomic64_inc(&zram->stats.bd_count);
return blk_idx;
}
@@ -511,6 +512,7 @@ static void free_block_bdev(struct zram
was_set = test_and_clear_bit(blk_idx, zram->bitmap);
WARN_ON_ONCE(!was_set);
+ atomic64_dec(&zram->stats.bd_count);
}
static void zram_page_end_io(struct bio *bio)
@@ -668,6 +670,7 @@ static ssize_t writeback_store(struct de
continue;
}
+ atomic64_inc(&zram->stats.bd_writes);
/*
* We released zram_slot_lock so need to check if the slot was
* changed. If there is freeing for the slot, we can catch it
@@ -757,6 +760,7 @@ static int read_from_bdev_sync(struct zr
static int read_from_bdev(struct zram *zram, struct bio_vec *bvec,
unsigned long entry, struct bio *parent, bool sync)
{
+ atomic64_inc(&zram->stats.bd_reads);
if (sync)
return read_from_bdev_sync(zram, bvec, entry, parent);
else
@@ -1013,6 +1017,25 @@ static ssize_t mm_stat_show(struct devic
return ret;
}
+#ifdef CONFIG_ZRAM_WRITEBACK
+static ssize_t bd_stat_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct zram *zram = dev_to_zram(dev);
+ ssize_t ret;
+
+ down_read(&zram->init_lock);
+ ret = scnprintf(buf, PAGE_SIZE,
+ "%8llu %8llu %8llu\n",
+ (u64)atomic64_read(&zram->stats.bd_count),
+ (u64)atomic64_read(&zram->stats.bd_reads),
+ (u64)atomic64_read(&zram->stats.bd_writes));
+ up_read(&zram->init_lock);
+
+ return ret;
+}
+#endif
+
static ssize_t debug_stat_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -1033,6 +1056,9 @@ static ssize_t debug_stat_show(struct de
static DEVICE_ATTR_RO(io_stat);
static DEVICE_ATTR_RO(mm_stat);
+#ifdef CONFIG_ZRAM_WRITEBACK
+static DEVICE_ATTR_RO(bd_stat);
+#endif
static DEVICE_ATTR_RO(debug_stat);
static void zram_meta_free(struct zram *zram, u64 disksize)
@@ -1759,6 +1785,9 @@ static struct attribute *zram_disk_attrs
#endif
&dev_attr_io_stat.attr,
&dev_attr_mm_stat.attr,
+#ifdef CONFIG_ZRAM_WRITEBACK
+ &dev_attr_bd_stat.attr,
+#endif
&dev_attr_debug_stat.attr,
NULL,
};
--- a/drivers/block/zram/zram_drv.h~zram-add-bd_stat-statistics
+++ a/drivers/block/zram/zram_drv.h
@@ -82,6 +82,11 @@ struct zram_stats {
atomic_long_t max_used_pages; /* no. of maximum pages stored */
atomic64_t writestall; /* no. of write slow paths */
atomic64_t miss_free; /* no. of missed free */
+#ifdef CONFIG_ZRAM_WRITEBACK
+ atomic64_t bd_count; /* no. of pages in backing device */
+ atomic64_t bd_reads; /* no. of reads from backing device */
+ atomic64_t bd_writes; /* no. of writes from backing device */
+#endif
};
struct zram {
_
Patches currently in -mm which might be from minchan(a)kernel.org are
zram-fix-lockdep-warning-of-free-block-handling.patch
zram-fix-double-free-backing-device.patch
zram-refactoring-flags-and-writeback-stuff.patch
zram-introduce-zram_idle-flag.patch
zram-support-idle-huge-page-writeback.patch
zram-add-bd_stat-statistics.patch
zram-writeback-throttle.patch
The patch titled
Subject: zram: support idle/huge page writeback
has been added to the -mm tree. Its filename is
zram-support-idle-huge-page-writeback.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/zram-support-idle-huge-page-writeb…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/zram-support-idle-huge-page-writeb…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Minchan Kim <minchan(a)kernel.org>
Subject: zram: support idle/huge page writeback
Add a new feature "zram idle/huge page writeback". In the zram-swap use
case, zram usually has many idle/huge swap pages. It's pointless to keep
them in memory (ie, zram).
To solve this problem, this feature introduces idle/huge page writeback to
the backing device so the goal is to save more memory space on embedded
systems.
Normal sequence to use idle/huge page writeback feature is as follows,
while (1) {
# mark allocated zram slot to idle
echo all > /sys/block/zram0/idle
# leave system working for several hours
# Unless there is no access for some blocks on zram,
# they are still IDLE marked pages.
echo "idle" > /sys/block/zram0/writeback
or/and
echo "huge" > /sys/block/zram0/writeback
# write the IDLE or/and huge marked slot into backing device
# and free the memory.
}
By per discussion:
https://lore.kernel.org/lkml/20181122065926.GG3441@jagdpanzerIV/T/#u,
This patch removes direct incommpressibe page writeback feature
(d2afd25114f4 ("zram: write incompressible pages to backing device")) so
we could regard it as regression because incompressible pages don't go to
backing storage automatically. Instead, users should do this via "echo
huge" > /sys/block/zram/writeback" manually.
If we hear some regression, we could restore the function.
Link: http://lkml.kernel.org/r/20181127055429.251614-6-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan(a)kernel.org>
Reviewed-by: Joey Pabalinas <joeypabalinas(a)gmail.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work(a)gmail.com>
Cc: <stable(a)vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Documentation/ABI/testing/sysfs-block-zram | 7
Documentation/blockdev/zram.txt | 28 +-
drivers/block/zram/Kconfig | 5
drivers/block/zram/zram_drv.c | 247 +++++++++++++------
drivers/block/zram/zram_drv.h | 1
5 files changed, 209 insertions(+), 79 deletions(-)
--- a/Documentation/ABI/testing/sysfs-block-zram~zram-support-idle-huge-page-writeback
+++ a/Documentation/ABI/testing/sysfs-block-zram
@@ -106,3 +106,10 @@ Description:
idle file is write-only and mark zram slot as idle.
If system has mounted debugfs, user can see which slots
are idle via /sys/kernel/debug/zram/zram<id>/block_state
+
+What: /sys/block/zram<id>/writeback
+Date: November 2018
+Contact: Minchan Kim <minchan(a)kernel.org>
+Description:
+ The writeback file is write-only and trigger idle and/or
+ huge page writeback to backing device.
--- a/Documentation/blockdev/zram.txt~zram-support-idle-huge-page-writeback
+++ a/Documentation/blockdev/zram.txt
@@ -238,11 +238,31 @@ line of text and contains the following
= writeback
-With incompressible pages, there is no memory saving with zram.
-Instead, with CONFIG_ZRAM_WRITEBACK, zram can write incompressible page
+With CONFIG_ZRAM_WRITEBACK, zram can write idle/incompressible page
to backing storage rather than keeping it in memory.
-User should set up backing device via /sys/block/zramX/backing_dev
-before disksize setting.
+To use the feature, admin should set up backing device via
+
+ "echo /dev/sda5 > /sys/block/zramX/backing_dev"
+
+before disksize setting. It supports only partition at this moment.
+If admin want to use incompressible page writeback, they could do via
+
+ "echo huge > /sys/block/zramX/write"
+
+To use idle page writeback, first, user need to declare zram pages
+as idle.
+
+ "echo all > /sys/block/zramX/idle"
+
+From now on, any pages on zram are idle pages. The idle mark
+will be removed until someone request access of the block.
+IOW, unless there is access request, those pages are still idle pages.
+
+Admin can request writeback of those idle pages at right timing via
+
+ "echo idle > /sys/block/zramX/writeback"
+
+With the command, zram writeback idle pages from memory to the storage.
= memory tracking
--- a/drivers/block/zram/Kconfig~zram-support-idle-huge-page-writeback
+++ a/drivers/block/zram/Kconfig
@@ -15,7 +15,7 @@ config ZRAM
See Documentation/blockdev/zram.txt for more information.
config ZRAM_WRITEBACK
- bool "Write back incompressible page to backing device"
+ bool "Write back incompressible or idle page to backing device"
depends on ZRAM
help
With incompressible page, there is no memory saving to keep it
@@ -23,6 +23,9 @@ config ZRAM_WRITEBACK
For this feature, admin should set up backing device via
/sys/block/zramX/backing_dev.
+ With /sys/block/zramX/{idle,writeback}, application could ask
+ idle page's writeback to the backing device to save in memory.
+
See Documentation/blockdev/zram.txt for more information.
config ZRAM_MEMORY_TRACKING
--- a/drivers/block/zram/zram_drv.c~zram-support-idle-huge-page-writeback
+++ a/drivers/block/zram/zram_drv.c
@@ -52,6 +52,9 @@ static unsigned int num_devices = 1;
static size_t huge_class_size;
static void zram_free_page(struct zram *zram, size_t index);
+static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
+ u32 index, int offset, struct bio *bio);
+
static int zram_slot_trylock(struct zram *zram, u32 index)
{
@@ -73,13 +76,6 @@ static inline bool init_done(struct zram
return zram->disksize;
}
-static inline bool zram_allocated(struct zram *zram, u32 index)
-{
-
- return (zram->table[index].flags >> (ZRAM_FLAG_SHIFT + 1)) ||
- zram->table[index].handle;
-}
-
static inline struct zram *dev_to_zram(struct device *dev)
{
return (struct zram *)dev_to_disk(dev)->private_data;
@@ -138,6 +134,13 @@ static void zram_set_obj_size(struct zra
zram->table[index].flags = (flags << ZRAM_FLAG_SHIFT) | size;
}
+static inline bool zram_allocated(struct zram *zram, u32 index)
+{
+ return zram_get_obj_size(zram, index) ||
+ zram_test_flag(zram, index, ZRAM_SAME) ||
+ zram_test_flag(zram, index, ZRAM_WB);
+}
+
#if PAGE_SIZE != 4096
static inline bool is_partial_io(struct bio_vec *bvec)
{
@@ -308,10 +311,14 @@ static ssize_t idle_store(struct device
}
for (index = 0; index < nr_pages; index++) {
+ /*
+ * Do not mark ZRAM_UNDER_WB slot as ZRAM_IDLE to close race.
+ * See the comment in writeback_store.
+ */
zram_slot_lock(zram, index);
- if (!zram_allocated(zram, index))
+ if (!zram_allocated(zram, index) ||
+ zram_test_flag(zram, index, ZRAM_UNDER_WB))
goto next;
-
zram_set_flag(zram, index, ZRAM_IDLE);
next:
zram_slot_unlock(zram, index);
@@ -546,6 +553,158 @@ static int read_from_bdev_async(struct z
return 1;
}
+#define HUGE_WRITEBACK 0x1
+#define IDLE_WRITEBACK 0x2
+
+static ssize_t writeback_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t len)
+{
+ struct zram *zram = dev_to_zram(dev);
+ unsigned long nr_pages = zram->disksize >> PAGE_SHIFT;
+ unsigned long index;
+ struct bio bio;
+ struct bio_vec bio_vec;
+ struct page *page;
+ ssize_t ret, sz;
+ char mode_buf[8];
+ unsigned long mode = -1UL;
+ unsigned long blk_idx = 0;
+
+ sz = strscpy(mode_buf, buf, sizeof(mode_buf));
+ if (sz <= 0)
+ return -EINVAL;
+
+ /* ignore trailing newline */
+ if (mode_buf[sz - 1] == '\n')
+ mode_buf[sz - 1] = 0x00;
+
+ if (!strcmp(mode_buf, "idle"))
+ mode = IDLE_WRITEBACK;
+ else if (!strcmp(mode_buf, "huge"))
+ mode = HUGE_WRITEBACK;
+
+ if (mode == -1UL)
+ return -EINVAL;
+
+ down_read(&zram->init_lock);
+ if (!init_done(zram)) {
+ ret = -EINVAL;
+ goto release_init_lock;
+ }
+
+ if (!zram->backing_dev) {
+ ret = -ENODEV;
+ goto release_init_lock;
+ }
+
+ page = alloc_page(GFP_KERNEL);
+ if (!page) {
+ ret = -ENOMEM;
+ goto release_init_lock;
+ }
+
+ for (index = 0; index < nr_pages; index++) {
+ struct bio_vec bvec;
+
+ bvec.bv_page = page;
+ bvec.bv_len = PAGE_SIZE;
+ bvec.bv_offset = 0;
+
+ if (!blk_idx) {
+ blk_idx = alloc_block_bdev(zram);
+ if (!blk_idx) {
+ ret = -ENOSPC;
+ break;
+ }
+ }
+
+ zram_slot_lock(zram, index);
+ if (!zram_allocated(zram, index))
+ goto next;
+
+ if (zram_test_flag(zram, index, ZRAM_WB) ||
+ zram_test_flag(zram, index, ZRAM_SAME) ||
+ zram_test_flag(zram, index, ZRAM_UNDER_WB))
+ goto next;
+
+ if ((mode & IDLE_WRITEBACK &&
+ !zram_test_flag(zram, index, ZRAM_IDLE)) &&
+ (mode & HUGE_WRITEBACK &&
+ !zram_test_flag(zram, index, ZRAM_HUGE)))
+ goto next;
+ /*
+ * Clearing ZRAM_UNDER_WB is duty of caller.
+ * IOW, zram_free_page never clear it.
+ */
+ zram_set_flag(zram, index, ZRAM_UNDER_WB);
+ /* Need for hugepage writeback racing */
+ zram_set_flag(zram, index, ZRAM_IDLE);
+ zram_slot_unlock(zram, index);
+ if (zram_bvec_read(zram, &bvec, index, 0, NULL)) {
+ zram_slot_lock(zram, index);
+ zram_clear_flag(zram, index, ZRAM_UNDER_WB);
+ zram_clear_flag(zram, index, ZRAM_IDLE);
+ zram_slot_unlock(zram, index);
+ continue;
+ }
+
+ bio_init(&bio, &bio_vec, 1);
+ bio_set_dev(&bio, zram->bdev);
+ bio.bi_iter.bi_sector = blk_idx * (PAGE_SIZE >> 9);
+ bio.bi_opf = REQ_OP_WRITE | REQ_SYNC;
+
+ bio_add_page(&bio, bvec.bv_page, bvec.bv_len,
+ bvec.bv_offset);
+ /*
+ * XXX: A single page IO would be inefficient for write
+ * but it would be not bad as starter.
+ */
+ ret = submit_bio_wait(&bio);
+ if (ret) {
+ zram_slot_lock(zram, index);
+ zram_clear_flag(zram, index, ZRAM_UNDER_WB);
+ zram_clear_flag(zram, index, ZRAM_IDLE);
+ zram_slot_unlock(zram, index);
+ continue;
+ }
+
+ /*
+ * We released zram_slot_lock so need to check if the slot was
+ * changed. If there is freeing for the slot, we can catch it
+ * easily by zram_allocated.
+ * A subtle case is the slot is freed/reallocated/marked as
+ * ZRAM_IDLE again. To close the race, idle_store doesn't
+ * mark ZRAM_IDLE once it found the slot was ZRAM_UNDER_WB.
+ * Thus, we could close the race by checking ZRAM_IDLE bit.
+ */
+ zram_slot_lock(zram, index);
+ if (!zram_allocated(zram, index) ||
+ !zram_test_flag(zram, index, ZRAM_IDLE)) {
+ zram_clear_flag(zram, index, ZRAM_UNDER_WB);
+ zram_clear_flag(zram, index, ZRAM_IDLE);
+ goto next;
+ }
+
+ zram_free_page(zram, index);
+ zram_clear_flag(zram, index, ZRAM_UNDER_WB);
+ zram_set_flag(zram, index, ZRAM_WB);
+ zram_set_element(zram, index, blk_idx);
+ blk_idx = 0;
+ atomic64_inc(&zram->stats.pages_stored);
+next:
+ zram_slot_unlock(zram, index);
+ }
+
+ if (blk_idx)
+ free_block_bdev(zram, blk_idx);
+ ret = len;
+ __free_page(page);
+release_init_lock:
+ up_read(&zram->init_lock);
+
+ return ret;
+}
+
struct zram_work {
struct work_struct work;
struct zram *zram;
@@ -603,57 +762,8 @@ static int read_from_bdev(struct zram *z
else
return read_from_bdev_async(zram, bvec, entry, parent);
}
-
-static int write_to_bdev(struct zram *zram, struct bio_vec *bvec,
- u32 index, struct bio *parent,
- unsigned long *pentry)
-{
- struct bio *bio;
- unsigned long entry;
-
- bio = bio_alloc(GFP_ATOMIC, 1);
- if (!bio)
- return -ENOMEM;
-
- entry = alloc_block_bdev(zram);
- if (!entry) {
- bio_put(bio);
- return -ENOSPC;
- }
-
- bio->bi_iter.bi_sector = entry * (PAGE_SIZE >> 9);
- bio_set_dev(bio, zram->bdev);
- if (!bio_add_page(bio, bvec->bv_page, bvec->bv_len,
- bvec->bv_offset)) {
- bio_put(bio);
- free_block_bdev(zram, entry);
- return -EIO;
- }
-
- if (!parent) {
- bio->bi_opf = REQ_OP_WRITE | REQ_SYNC;
- bio->bi_end_io = zram_page_end_io;
- } else {
- bio->bi_opf = parent->bi_opf;
- bio_chain(bio, parent);
- }
-
- submit_bio(bio);
- *pentry = entry;
-
- return 0;
-}
-
#else
static inline void reset_bdev(struct zram *zram) {};
-static int write_to_bdev(struct zram *zram, struct bio_vec *bvec,
- u32 index, struct bio *parent,
- unsigned long *pentry)
-
-{
- return -EIO;
-}
-
static int read_from_bdev(struct zram *zram, struct bio_vec *bvec,
unsigned long entry, struct bio *parent, bool sync)
{
@@ -1006,7 +1116,8 @@ out:
atomic64_dec(&zram->stats.pages_stored);
zram_set_handle(zram, index, 0);
zram_set_obj_size(zram, index, 0);
- WARN_ON_ONCE(zram->table[index].flags & ~(1UL << ZRAM_LOCK));
+ WARN_ON_ONCE(zram->table[index].flags &
+ ~(1UL << ZRAM_LOCK | 1UL << ZRAM_UNDER_WB));
}
static int __zram_bvec_read(struct zram *zram, struct page *page, u32 index,
@@ -1115,7 +1226,6 @@ static int __zram_bvec_write(struct zram
struct page *page = bvec->bv_page;
unsigned long element = 0;
enum zram_pageflags flags = 0;
- bool allow_wb = true;
mem = kmap_atomic(page);
if (page_same_filled(mem, &element)) {
@@ -1140,21 +1250,8 @@ compress_again:
return ret;
}
- if (unlikely(comp_len >= huge_class_size)) {
+ if (comp_len >= huge_class_size)
comp_len = PAGE_SIZE;
- if (zram->backing_dev && allow_wb) {
- zcomp_stream_put(zram->comp);
- ret = write_to_bdev(zram, bvec, index, bio, &element);
- if (!ret) {
- flags = ZRAM_WB;
- ret = 1;
- goto out;
- }
- allow_wb = false;
- goto compress_again;
- }
- }
-
/*
* handle allocation has 2 paths:
* a) fast path is executed with preemption disabled (for
@@ -1643,6 +1740,7 @@ static DEVICE_ATTR_RW(max_comp_streams);
static DEVICE_ATTR_RW(comp_algorithm);
#ifdef CONFIG_ZRAM_WRITEBACK
static DEVICE_ATTR_RW(backing_dev);
+static DEVICE_ATTR_WO(writeback);
#endif
static struct attribute *zram_disk_attrs[] = {
@@ -1657,6 +1755,7 @@ static struct attribute *zram_disk_attrs
&dev_attr_comp_algorithm.attr,
#ifdef CONFIG_ZRAM_WRITEBACK
&dev_attr_backing_dev.attr,
+ &dev_attr_writeback.attr,
#endif
&dev_attr_io_stat.attr,
&dev_attr_mm_stat.attr,
--- a/drivers/block/zram/zram_drv.h~zram-support-idle-huge-page-writeback
+++ a/drivers/block/zram/zram_drv.h
@@ -47,6 +47,7 @@ enum zram_pageflags {
ZRAM_LOCK = ZRAM_FLAG_SHIFT,
ZRAM_SAME, /* Page consists the same element */
ZRAM_WB, /* page is stored on backing_device */
+ ZRAM_UNDER_WB, /* page is under writeback */
ZRAM_HUGE, /* Incompressible page */
ZRAM_IDLE, /* not accessed page since last idle marking */
_
Patches currently in -mm which might be from minchan(a)kernel.org are
zram-fix-lockdep-warning-of-free-block-handling.patch
zram-fix-double-free-backing-device.patch
zram-refactoring-flags-and-writeback-stuff.patch
zram-introduce-zram_idle-flag.patch
zram-support-idle-huge-page-writeback.patch
zram-add-bd_stat-statistics.patch
zram-writeback-throttle.patch
The patch titled
Subject: zram: introduce ZRAM_IDLE flag
has been added to the -mm tree. Its filename is
zram-introduce-zram_idle-flag.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/zram-introduce-zram_idle-flag.patch
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/zram-introduce-zram_idle-flag.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Minchan Kim <minchan(a)kernel.org>
Subject: zram: introduce ZRAM_IDLE flag
To support idle page writeback with upcoming patches, this patch
introduces a new ZRAM_IDLE flag.
Userspace can mark zram slots as "idle" via
"echo all > /sys/block/zramX/idle"
which marks every allocated zram slot as ZRAM_IDLE.
User could see it by /sys/kernel/debug/zram/zram0/block_state.
300 75.033841 ...i
301 63.806904 s..i
302 63.806919 ..hi
Once there is IO for the slot, the mark will be disappeared.
300 75.033841 ...
301 63.806904 s..i
302 63.806919 ..hi
Therefore, 300th block is idle zpage. With this feature,
user can how many zram has idle pages which are waste of memory.
Link: http://lkml.kernel.org/r/20181127055429.251614-5-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan(a)kernel.org>
Cc: Joey Pabalinas <joeypabalinas(a)gmail.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work(a)gmail.com>
Cc: <stable(a)vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Documentation/ABI/testing/sysfs-block-zram | 8 ++
Documentation/blockdev/zram.txt | 10 ++-
drivers/block/zram/zram_drv.c | 57 ++++++++++++++++++-
drivers/block/zram/zram_drv.h | 1
4 files changed, 69 insertions(+), 7 deletions(-)
--- a/Documentation/ABI/testing/sysfs-block-zram~zram-introduce-zram_idle-flag
+++ a/Documentation/ABI/testing/sysfs-block-zram
@@ -98,3 +98,11 @@ Description:
The backing_dev file is read-write and set up backing
device for zram to write incompressible pages.
For using, user should enable CONFIG_ZRAM_WRITEBACK.
+
+What: /sys/block/zram<id>/idle
+Date: November 2018
+Contact: Minchan Kim <minchan(a)kernel.org>
+Description:
+ idle file is write-only and mark zram slot as idle.
+ If system has mounted debugfs, user can see which slots
+ are idle via /sys/kernel/debug/zram/zram<id>/block_state
--- a/Documentation/blockdev/zram.txt~zram-introduce-zram_idle-flag
+++ a/Documentation/blockdev/zram.txt
@@ -169,6 +169,7 @@ comp_algorithm RW show and change
compact WO trigger memory compaction
debug_stat RO this file is used for zram debugging purposes
backing_dev RW set up backend storage for zram to write out
+idle WO mark allocated slot as idle
User space is advised to use the following files to read the device statistics.
@@ -251,16 +252,17 @@ pages of the process with*pagemap.
If you enable the feature, you could see block state via
/sys/kernel/debug/zram/zram0/block_state". The output is as follows,
- 300 75.033841 .wh
- 301 63.806904 s..
- 302 63.806919 ..h
+ 300 75.033841 .wh.
+ 301 63.806904 s...
+ 302 63.806919 ..hi
First column is zram's block index.
Second column is access time since the system was booted
Third column is state of the block.
(s: same page
w: written page to backing store
-h: huge page)
+h: huge page
+i: idle page)
First line of above example says 300th block is accessed at 75.033841sec
and the block's state is huge so it is written back to the backing
--- a/drivers/block/zram/zram_drv.c~zram-introduce-zram_idle-flag
+++ a/drivers/block/zram/zram_drv.c
@@ -281,6 +281,47 @@ static ssize_t mem_used_max_store(struct
return len;
}
+static ssize_t idle_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t len)
+{
+ struct zram *zram = dev_to_zram(dev);
+ unsigned long nr_pages = zram->disksize >> PAGE_SHIFT;
+ int index;
+ char mode_buf[8];
+ ssize_t sz;
+
+ sz = strscpy(mode_buf, buf, sizeof(mode_buf));
+ if (sz <= 0)
+ return -EINVAL;
+
+ /* ignore trailing new line */
+ if (mode_buf[sz - 1] == '\n')
+ mode_buf[sz - 1] = 0x00;
+
+ if (strcmp(mode_buf, "all"))
+ return -EINVAL;
+
+ down_read(&zram->init_lock);
+ if (!init_done(zram)) {
+ up_read(&zram->init_lock);
+ return -EINVAL;
+ }
+
+ for (index = 0; index < nr_pages; index++) {
+ zram_slot_lock(zram, index);
+ if (!zram_allocated(zram, index))
+ goto next;
+
+ zram_set_flag(zram, index, ZRAM_IDLE);
+next:
+ zram_slot_unlock(zram, index);
+ }
+
+ up_read(&zram->init_lock);
+
+ return len;
+}
+
#ifdef CONFIG_ZRAM_WRITEBACK
static void reset_bdev(struct zram *zram)
{
@@ -638,6 +679,7 @@ static void zram_debugfs_destroy(void)
static void zram_accessed(struct zram *zram, u32 index)
{
+ zram_clear_flag(zram, index, ZRAM_IDLE);
zram->table[index].ac_time = ktime_get_boottime();
}
@@ -670,12 +712,13 @@ static ssize_t read_block_state(struct f
ts = ktime_to_timespec64(zram->table[index].ac_time);
copied = snprintf(kbuf + written, count,
- "%12zd %12lld.%06lu %c%c%c\n",
+ "%12zd %12lld.%06lu %c%c%c%c\n",
index, (s64)ts.tv_sec,
ts.tv_nsec / NSEC_PER_USEC,
zram_test_flag(zram, index, ZRAM_SAME) ? 's' : '.',
zram_test_flag(zram, index, ZRAM_WB) ? 'w' : '.',
- zram_test_flag(zram, index, ZRAM_HUGE) ? 'h' : '.');
+ zram_test_flag(zram, index, ZRAM_HUGE) ? 'h' : '.',
+ zram_test_flag(zram, index, ZRAM_IDLE) ? 'i' : '.');
if (count < copied) {
zram_slot_unlock(zram, index);
@@ -720,7 +763,10 @@ static void zram_debugfs_unregister(stru
#else
static void zram_debugfs_create(void) {};
static void zram_debugfs_destroy(void) {};
-static void zram_accessed(struct zram *zram, u32 index) {};
+static void zram_accessed(struct zram *zram, u32 index)
+{
+ zram_clear_flag(zram, index, ZRAM_IDLE);
+};
static void zram_debugfs_register(struct zram *zram) {};
static void zram_debugfs_unregister(struct zram *zram) {};
#endif
@@ -924,6 +970,9 @@ static void zram_free_page(struct zram *
#ifdef CONFIG_ZRAM_MEMORY_TRACKING
zram->table[index].ac_time = 0;
#endif
+ if (zram_test_flag(zram, index, ZRAM_IDLE))
+ zram_clear_flag(zram, index, ZRAM_IDLE);
+
if (zram_test_flag(zram, index, ZRAM_HUGE)) {
zram_clear_flag(zram, index, ZRAM_HUGE);
atomic64_dec(&zram->stats.huge_pages);
@@ -1589,6 +1638,7 @@ static DEVICE_ATTR_RO(initstate);
static DEVICE_ATTR_WO(reset);
static DEVICE_ATTR_WO(mem_limit);
static DEVICE_ATTR_WO(mem_used_max);
+static DEVICE_ATTR_WO(idle);
static DEVICE_ATTR_RW(max_comp_streams);
static DEVICE_ATTR_RW(comp_algorithm);
#ifdef CONFIG_ZRAM_WRITEBACK
@@ -1602,6 +1652,7 @@ static struct attribute *zram_disk_attrs
&dev_attr_compact.attr,
&dev_attr_mem_limit.attr,
&dev_attr_mem_used_max.attr,
+ &dev_attr_idle.attr,
&dev_attr_max_comp_streams.attr,
&dev_attr_comp_algorithm.attr,
#ifdef CONFIG_ZRAM_WRITEBACK
--- a/drivers/block/zram/zram_drv.h~zram-introduce-zram_idle-flag
+++ a/drivers/block/zram/zram_drv.h
@@ -48,6 +48,7 @@ enum zram_pageflags {
ZRAM_SAME, /* Page consists the same element */
ZRAM_WB, /* page is stored on backing_device */
ZRAM_HUGE, /* Incompressible page */
+ ZRAM_IDLE, /* not accessed page since last idle marking */
__NR_ZRAM_PAGEFLAGS,
};
_
Patches currently in -mm which might be from minchan(a)kernel.org are
zram-fix-lockdep-warning-of-free-block-handling.patch
zram-fix-double-free-backing-device.patch
zram-refactoring-flags-and-writeback-stuff.patch
zram-introduce-zram_idle-flag.patch
zram-support-idle-huge-page-writeback.patch
zram-add-bd_stat-statistics.patch
zram-writeback-throttle.patch